PESS (Protein Empirical Structure Space) - Full-scale protein fold recognition

Maintained by Sarah Middleton

License (pdf)


Publication: Middleton, SA, Illuminati, J, and Kim, J. 2016. Complete fold annotation of the human proteome using a novel structural feature space. (Submitted)

Recent updates

11-10-16 - The paper version of PESS has been added, along with supplementary data.

Download for Linux


Included in download:
    • All code needed for fold recognition predictions, except threading software (CNFalign_lite)
    • 1,814 reference templates
    • Demo dataset

External Requirements:

GitHub repository

Getting started

  1. Install required software. The Anaconda package manager is highly recommended, as it makes installing numpy and scikit-learn easier.
  2. Unzip PESS: tar -zxvf pess_1.0.0.tar.gz
  3. Test that everything is working by running the demo dataset (see below).
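Before running the demo, it can help to confirm that the Python dependencies are importable. The following is a minimal sketch, not part of PESS: it checks only the two packages named above (numpy and scikit-learn), and the helper name find_missing is hypothetical. Other external requirements, such as the threading software, are not covered.

```python
import importlib.util

# Only the two packages named in the text; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ("numpy", "sklearn")


def find_missing(modules=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing Python packages: " + ", ".join(missing))
    else:
        print("All checked Python packages were found.")
```

If anything is reported missing, installing it through Anaconda (or another package manager) before step 3 should avoid import errors later.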

Demo example

Using demo.fa (included with PESS download):

Note: in the following, replace <PATH_TO_PESS> with the relative or absolute path to where you unzipped the PESS code, and replace <NUM_CPU> with the number of CPUs you want to utilize.
  1. Navigate to the directory where your RaptorX executables are stored. (You must run the PESS code from within this directory or RaptorX will not work properly.)

  2. Run the threading script using the following command:

     python <PATH_TO_PESS>/step1_threading.py <PATH_TO_PESS>/demo/demo.fa --cpu=<NUM_CPU> --out="<PATH_TO_PESS>/demo/demo_results" 
    Note that this step will take several minutes per sequence, so we recommend using multiple CPUs.

  3. Run the classification script using the following command:

     python <PATH_TO_PESS>/step2_classification.py <PATH_TO_PESS>/demo/demo_results/demo.scoremat --cpu=<NUM_CPU> 
    This step is faster, but can still benefit from multiple CPUs.
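The two commands above can also be assembled and run from a small Python wrapper. This is a sketch under stated assumptions, not part of PESS: the function names build_demo_commands and run_demo are hypothetical, and the only script names and flags used are the ones shown in the commands above. The wrapper runs both steps with the RaptorX directory as the working directory, as step 1 of the demo requires.

```python
import os
import subprocess


def build_demo_commands(pess_dir, num_cpu):
    """Return the threading and classification commands as argument lists."""
    out_dir = os.path.join(pess_dir, "demo", "demo_results")
    threading_cmd = [
        "python",
        os.path.join(pess_dir, "step1_threading.py"),
        os.path.join(pess_dir, "demo", "demo.fa"),
        f"--cpu={num_cpu}",
        f"--out={out_dir}",
    ]
    classification_cmd = [
        "python",
        os.path.join(pess_dir, "step2_classification.py"),
        os.path.join(out_dir, "demo.scoremat"),
        f"--cpu={num_cpu}",
    ]
    return threading_cmd, classification_cmd


def run_demo(pess_dir, raptorx_dir, num_cpu):
    """Run both steps in order from inside the RaptorX directory."""
    # Make the PESS path absolute so it stays valid after changing directory.
    pess_dir = os.path.abspath(pess_dir)
    for cmd in build_demo_commands(pess_dir, num_cpu):
        # check=True stops the demo if a step exits with an error.
        subprocess.run(cmd, cwd=raptorx_dir, check=True)
```

Calling run_demo with the PESS path, the RaptorX path, and a CPU count would run step 1 and then step 2. Building the commands as lists avoids shell quoting problems with paths that contain spaces.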
The output will be a file (<PATH_TO_PESS>/demo/demo_results/demo.scoremat.fold_preds.txt) containing the predicted fold for each sequence, its distance to the nearest training neighbor (which can give some idea of confidence), and whether the prediction is considered high confidence or not (based on having a nearest neighbor distance of <= 17.5).
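The high-confidence rule can be expressed in a few lines of Python. This is an illustrative sketch only: the exact column layout of the fold_preds.txt file is not described here, so the code works on already-parsed (sequence ID, fold, distance) tuples rather than reading the file, and the function names and the sample values are made up for the example. Only the 17.5 threshold comes from the text above.

```python
# Threshold stated in the text: nearest-neighbor distance <= 17.5.
HIGH_CONFIDENCE_MAX_DISTANCE = 17.5


def is_high_confidence(nn_distance, threshold=HIGH_CONFIDENCE_MAX_DISTANCE):
    """Return True if the nearest-neighbor distance is within the threshold."""
    return nn_distance <= threshold


def split_by_confidence(predictions):
    """Split (seq_id, fold, nn_distance) tuples into high and low confidence."""
    high, low = [], []
    for seq_id, fold, nn_distance in predictions:
        target = high if is_high_confidence(nn_distance) else low
        target.append((seq_id, fold, nn_distance))
    return high, low


if __name__ == "__main__":
    # Made-up values for illustration; not real PESS output.
    example = [("seq1", "foldA", 12.0), ("seq2", "foldB", 20.1)]
    high, low = split_by_confidence(example)
    print(len(high), "high confidence;", len(low), "low confidence")
```

A smaller distance means the sequence lies closer to a training example in the feature space, which is why the threshold is an upper bound.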

Paper data