PESS (Protein Empirical Structure Space) - Full-scale protein fold recognition
Maintained by Sarah Middleton | License (pdf)
Publication: Middleton, SA, Illuminati, J, and Kim, J. 2016. Complete fold annotation of the human proteome using a novel structural feature space. (Submitted)
Recent updates
11-10-16 - The paper version of PESS has been added, along with supplementary data.
Download for Linux
- PESS 1.0.0 (81 MB) - paper version
Included in download:
- All code needed for fold recognition predictions, except threading software (CNFalign_lite)
- 1,814 reference templates
- Demo dataset
External Requirements:
- Python 2 or 3 with NumPy and scikit-learn
- CNFsearch, part of the RaptorX threading package. Version 1.66 was used to generate the paper results.
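Before running PESS, it can help to confirm that the Python requirements listed above are importable. The following is a minimal sketch, not part of the PESS distribution; the helper name check_modules is hypothetical. It only checks the Python packages and does not verify the RaptorX installation.

```python
import importlib


def check_modules(names):
    """Map each module name to its version string, or None if it is missing."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    # scikit-learn is imported under the name "sklearn".
    for name, version in check_modules(["numpy", "sklearn"]).items():
        print("%s: %s" % (name, version if version else "NOT INSTALLED"))
```

If either package is reported as NOT INSTALLED, install it before continuing.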
GitHub repository
- GitHub repository - development versions
Getting started
- Install the required software. Anaconda is highly recommended, as it simplifies installing NumPy and scikit-learn.
- Unzip PESS: tar -zxvf pess_1.0.0.tar.gz
- Test that everything is working by running the demo dataset (see below).
Demo example
Using demo.fa (included with PESS download):
Note: in the following, replace <PATH_TO_PESS> with the relative or absolute path to where you unzipped the PESS code, and replace <NUM_CPU> with the number of CPUs you want to use.
- Navigate to the directory where your RaptorX executables are stored. (You must run the PESS code from within this directory or RaptorX will not work properly.)
- Run the threading script using the following command:
python <PATH_TO_PESS>/step1_threading.py <PATH_TO_PESS>/demo/demo.fa --cpu=<NUM_CPU> --out="<PATH_TO_PESS>/demo/demo_results"
Note that this step will take several minutes per sequence, so we recommend using multiple CPUs.
- Run the classification script using the following command:
python <PATH_TO_PESS>/step2_classification.py <PATH_TO_PESS>/demo/demo_results/demo.scoremat --cpu=<NUM_CPU>
This step is faster, but can still benefit from multiple CPUs.
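The two demo commands above can also be chained from a small wrapper script. The sketch below is illustrative and not part of PESS: the function names build_commands and run_demo are hypothetical, and it assumes that step 1 names the score matrix after the input FASTA file (demo.fa gives demo.scoremat, as in the demo). The script names and the --cpu and --out options are taken from the commands shown above.

```python
import os
import subprocess
import sys


def build_commands(pess_dir, fasta, out_dir, num_cpu):
    """Build the argument lists for the threading and classification steps."""
    # Assumption: the score matrix is named after the FASTA file,
    # as in the demo (demo.fa -> demo.scoremat).
    base = os.path.splitext(os.path.basename(fasta))[0]
    scoremat = os.path.join(out_dir, base + ".scoremat")
    cpu_flag = "--cpu=%d" % num_cpu
    step1 = [sys.executable, os.path.join(pess_dir, "step1_threading.py"),
             fasta, cpu_flag, "--out=" + out_dir]
    step2 = [sys.executable, os.path.join(pess_dir, "step2_classification.py"),
             scoremat, cpu_flag]
    return step1, step2


def run_demo(pess_dir, raptorx_dir, num_cpu=1):
    """Run both demo steps from inside the RaptorX directory."""
    # Absolute paths keep working after the working directory changes.
    pess_dir = os.path.abspath(pess_dir)
    fasta = os.path.join(pess_dir, "demo", "demo.fa")
    out_dir = os.path.join(pess_dir, "demo", "demo_results")
    for cmd in build_commands(pess_dir, fasta, out_dir, num_cpu):
        # cwd is set because PESS must be run from the RaptorX directory.
        # check_call raises CalledProcessError if a step fails, so step 2
        # is never started on incomplete threading output.
        subprocess.check_call(cmd, cwd=raptorx_dir)
```

For example, run_demo("<PATH_TO_PESS>", "<PATH_TO_RAPTORX>", num_cpu=4) would run both steps in order, where <PATH_TO_RAPTORX> stands for the directory holding your RaptorX executables.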
Paper data
- Benchmark & training data (123 MB)
- Human proteome data & predictions (133 MB)
- Supplementary tables (3 MB)