Kim Lab Website

Software Repository
Kim Laboratory
Computational Evolutionary Biology

latest news

06.29.2019

VisCello: for visualization of single-cell data.

06.27.2018

Sample data provenance from 1,347 RNAseq samples.

06.07.2018

ORNASEQ: Ontology for RNA sequencing.

contact us

Biology Department
School of Arts and Sciences
University of Pennsylvania
301a Lynch Laboratory
433 S University Avenue
Philadelphia, PA 19104 USA

off: (215) 746-5187
lab: (215) 898-8395
fax: (215) 898-8780

email: junhyong@sas.upenn.edu

additional links

PESS (Protein Empirical Structure Space) - Full-scale protein fold recognition

Maintained by Sarah Middleton

License (pdf)

Publication: Middleton, SA, Illuminati, J, and Kim, J. 2016. Complete fold annotation of the human proteome using a novel structural feature space. (Submitted)

Recent updates

11-10-16 - The paper version of PESS has been added, along with supplementary data.

Download for Linux

PESS 1.0.0 (81 MB) - paper version

Included in download:

All code needed for fold recognition predictions, except threading software (CNFalign_lite)
1,814 reference templates
Demo dataset

External Requirements:

Python 2 or 3 with NumPy and scikit-learn
CNFsearch, part of the RaptorX threading package. Version 1.66 was used to generate the paper results.

GitHub repository

GitHub repository - development versions

Getting started

Install the required software. The Anaconda distribution is highly recommended for easier installation of NumPy and scikit-learn.
Unzip PESS: tar -zxvf pess_1.0.0.tar.gz
Test that everything is working by running the demo dataset (see below).
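
Before running the demo, it can help to confirm that the Python requirements are visible to the interpreter you plan to use. The following is a minimal sketch, not part of PESS: it assumes Python 3.4 or later, and the helper name `missing_modules` is made up for illustration. It only checks the Python modules, not the RaptorX executables.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # scikit-learn is imported under the module name "sklearn".
    missing = missing_modules(["numpy", "sklearn"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("NumPy and scikit-learn were found.")
```

If a module is reported missing, install it in the same environment that the `python` command resolves to.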

Demo example

Using demo.fa (included with PESS download):

Note: in the following, replace <PATH_TO_PESS> with the relative or absolute path to where you unzipped the PESS code, and replace <NUM_CPU> with the number of CPUs you want to utilize.

Navigate to the directory where your RaptorX executables are stored. (You must run the PESS code from within this directory or RaptorX will not work properly.)

Run the threading script using the following command:
```
python <PATH_TO_PESS>/step1_threading.py <PATH_TO_PESS>/demo/demo.fa --cpu=<NUM_CPU> --out="<PATH_TO_PESS>/demo/demo_results"
```
Note that this step takes several minutes per sequence, so we recommend using multiple CPUs.

Run the classification script using the following command:
```
python <PATH_TO_PESS>/step2_classification.py <PATH_TO_PESS>/demo/demo_results/demo.scoremat --cpu=<NUM_CPU>
```
This step is faster, but can still benefit from multiple CPUs.
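
The two demo commands can also be driven from a small wrapper script. This is a sketch under stated assumptions rather than something shipped with PESS: the function names `build_commands` and `run_demo` are hypothetical, and the script names, demo paths, and `--cpu`/`--out` options are copied from the commands above. Each step is launched with the RaptorX directory as its working directory, as the demo instructions require.

```python
import os
import subprocess
import sys


def build_commands(pess_dir, num_cpu, python=sys.executable):
    """Build the argument lists for the two demo steps (hypothetical helper)."""
    demo_dir = os.path.join(pess_dir, "demo")
    results_dir = os.path.join(demo_dir, "demo_results")
    step1 = [
        python,
        os.path.join(pess_dir, "step1_threading.py"),
        os.path.join(demo_dir, "demo.fa"),
        "--cpu={}".format(num_cpu),
        "--out={}".format(results_dir),
    ]
    step2 = [
        python,
        os.path.join(pess_dir, "step2_classification.py"),
        os.path.join(results_dir, "demo.scoremat"),
        "--cpu={}".format(num_cpu),
    ]
    return [step1, step2]


def run_demo(pess_dir, raptorx_dir, num_cpu=1):
    """Run both steps in order, stopping if either exits with an error."""
    # Resolve the PESS path first, because each step runs inside raptorx_dir.
    pess_dir = os.path.abspath(pess_dir)
    for command in build_commands(pess_dir, num_cpu):
        subprocess.check_call(command, cwd=raptorx_dir)
```

Calling `run_demo("<PATH_TO_PESS>", "<PATH_TO_RAPTORX>", num_cpu=4)` with real paths would run threading and then classification. `<PATH_TO_RAPTORX>` is a placeholder introduced here for the directory that holds your RaptorX executables.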

The output is a file (<PATH_TO_PESS>/demo/demo_results/demo.scoremat.fold_preds.txt) that lists, for each sequence, the predicted fold, the distance to its nearest training neighbor (a rough indicator of confidence), and whether the prediction is considered high confidence (a nearest neighbor distance of <= 17.5).
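
The high-confidence rule described above is a simple threshold on the nearest neighbor distance. The sketch below restates that rule in Python for illustration only. It is not taken from the PESS code, the function and constant names are hypothetical, and the example distances are made up. It does not parse the output file, whose column layout is not described here.

```python
# Cutoff from the text: high confidence means nearest neighbor distance <= 17.5.
HIGH_CONFIDENCE_CUTOFF = 17.5


def is_high_confidence(nn_distance, cutoff=HIGH_CONFIDENCE_CUTOFF):
    """Return True when the nearest neighbor distance is at or below the cutoff."""
    return nn_distance <= cutoff


# Made-up distances, for illustration only (not PESS output).
for distance in (12.0, 17.5, 23.4):
    label = "high confidence" if is_high_confidence(distance) else "lower confidence"
    print("{:.1f} -> {}".format(distance, label))
```

Note that the comparison is inclusive, so a distance of exactly 17.5 still counts as high confidence.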

Paper data

Benchmark & training data (123 MB)
Human proteome data & predictions (133 MB)
Supplementary tables (3 MB)
