Latest news


Sample data provenance from 1,347 RNA-Seq samples.



ORNASEQ: Ontology for RNA sequencing.



Documentation outlining our RNA-Seq schema.

schema ...

repo structure ...

qc metrics ...


PIVOT released (replacing IDV): an extensive R-based GUI for the visualization and analysis of RNA-Seq data.


PESS (Protein Empirical Structure Space) - Full-scale protein fold recognition

Maintained by Sarah Middleton. License (PDF)

Publication: Middleton, SA, Illuminati, J, and Kim, J. 2016. Complete fold annotation of the human proteome using a novel structural feature space. (Submitted)

Recent updates

11-10-16: The version of PESS used in the paper has been added, along with supplementary data.

Download for Linux

Included in download:
    • All code needed for fold recognition predictions, except the threading software (CNFalign_lite)
    • 1,814 reference templates
    • Demo dataset

External Requirements:

Getting started

  1. Install the required software. We highly recommend the Anaconda package manager, which simplifies installing numpy and scikit-learn.
  2. Unzip PESS: tar -zxvf pess_1.0.0.tar.gz
  3. Test that everything is working by running the demo dataset (see below).
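If you prefer to script these setup steps, the Python sketch below checks that the numpy and scikit-learn modules can be found and extracts the tarball with the standard-library tarfile module instead of the tar command. The helper names (check_dependencies, extract_archive) are hypothetical and not part of PESS, and the sketch does not check for the RaptorX threading software, which must be installed separately.

```python
import importlib.util
import tarfile
from pathlib import Path

# Import names of the Python packages named in step 1 (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ["numpy", "sklearn"]


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {module_name: bool}, True if the module can be found on the import path."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def extract_archive(archive_path, dest_dir="."):
    """Rough equivalent of `tar -zxvf <archive>`: extract a gzipped tarball into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()  # like tar's -v flag: report the members being extracted
        tar.extractall(dest)
    return names


if __name__ == "__main__":
    missing = [name for name, found in check_dependencies().items() if not found]
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("numpy and scikit-learn found")
```

For example, `extract_archive("pess_1.0.0.tar.gz")` unpacks the download into the current directory, after which the demo below is the real end-to-end test.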

Demo example

Using demo.fa (included with PESS download):

Note: in the following, replace <PATH_TO_PESS> with the relative or absolute path to where you unzipped the PESS code, and replace <NUM_CPU> with the number of CPUs you want to utilize.
  1. Navigate to the directory where your RaptorX executables are stored. (You must run the PESS code from within this directory or RaptorX will not work properly.)

  2. Run the threading script using the following command:

     python <PATH_TO_PESS>/ <PATH_TO_PESS>/demo/demo.fa --cpu=<NUM_CPU> --out="<PATH_TO_PESS>/demo/demo_results" 
    Note that this step takes several minutes per sequence, so we recommend using multiple CPUs.

  3. Run the classification script using the following command:

     python <PATH_TO_PESS>/ <PATH_TO_PESS>/demo/demo_results/demo.scoremat --cpu=<NUM_CPU> 
    This step is faster, but can still benefit from multiple CPUs.
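The two demo steps can also be driven from Python; this is a minimal sketch, not part of PESS. The commands above do not name the threading and classification scripts (they appear only as `<PATH_TO_PESS>/`), so `threading_script` and `classify_script` are hypothetical parameters you fill in yourself. The key detail carried over from step 1 is `cwd=raptorx_dir`, which runs each command from inside the RaptorX directory. Because the arguments are passed as a list rather than through a shell, the quotes around `--out` are not needed.

```python
import subprocess
from pathlib import Path


def build_demo_commands(threading_script, classify_script, pess_dir, num_cpu):
    """Return the threading and classification commands as argv lists.

    threading_script and classify_script are HYPOTHETICAL parameters: the paths of
    the two PESS scripts, which are not named in the commands on this page.
    """
    pess = Path(pess_dir)
    results = pess / "demo" / "demo_results"
    threading_cmd = [
        "python", str(threading_script), str(pess / "demo" / "demo.fa"),
        f"--cpu={num_cpu}", f"--out={results}",
    ]
    classify_cmd = [
        "python", str(classify_script), str(results / "demo.scoremat"),
        f"--cpu={num_cpu}",
    ]
    return [threading_cmd, classify_cmd]


def run_demo(commands, raptorx_dir):
    """Run each command in order with the RaptorX directory as the working directory."""
    for cmd in commands:
        # check=True stops the pipeline if the threading step fails
        subprocess.run(cmd, cwd=raptorx_dir, check=True)
```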
The output is a file (<PATH_TO_PESS>/demo/demo_results/demo.scoremat.fold_preds.txt) listing, for each sequence, the predicted fold, its distance to the nearest training neighbor (a rough indication of confidence), and whether the prediction is considered high confidence (nearest-neighbor distance <= 17.5).
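To filter predictions downstream, a parser along these lines may help. It is a sketch under an explicit assumption: the exact layout of fold_preds.txt is not documented here, so the code assumes tab-separated rows with the sequence ID, predicted fold, and nearest-neighbor distance in the first three columns, and it skips any header row. It recomputes the high-confidence call from the stated 17.5 threshold rather than trusting a particular flag column. Adjust `id_col`, `fold_col`, and `dist_col` to match the real file.

```python
import csv
from dataclasses import dataclass

# Threshold stated above: high confidence if nearest-neighbor distance <= 17.5.
HIGH_CONF_MAX_DIST = 17.5


@dataclass
class FoldPrediction:
    seq_id: str
    fold: str
    nn_distance: float

    @property
    def high_confidence(self) -> bool:
        return self.nn_distance <= HIGH_CONF_MAX_DIST


def read_fold_preds(lines, id_col=0, fold_col=1, dist_col=2):
    """Parse tab-separated prediction rows; the column positions are ASSUMPTIONS."""
    preds = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        try:
            dist = float(row[dist_col])
        except (ValueError, IndexError):
            continue  # header row or malformed line
        preds.append(FoldPrediction(row[id_col], row[fold_col], dist))
    return preds
```

Usage: `with open("demo.scoremat.fold_preds.txt") as fh: confident = [p for p in read_fold_preds(fh) if p.high_confidence]`.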

Paper data