latest news

06.29.2019

VisCello; for visualization of single cell data.

access info ...

06.27.2018

Sample data provenance from 1,347 RNAseq samples.

access info ...

06.07.2018

ORNASEQ: Ontology for RNA sequencing.

access info ...

NoFold: RNA structure clustering without folding or alignment

Maintained by Sarah Middleton License (pdf)


Publication: Middleton, S.A. and Kim, J. 2014. NoFold: RNA structure clustering without folding or alignment. RNA 20: 1671-1683.

About NoFold

NoFold is an approach for characterizing and clustering RNA secondary structures without computational folding or alignment. It works by mapping each RNA sequence of interest to a structural feature space, where each coordinate within the space corresponds to the probabilistic similarity of the sequence to an empirically defined structure model (e.g. Rfam family covariance models). NoFold provides scripts for mapping sequences to this structure space, extracting any robust clusters that are formed, and annotating those clusters with structural and functional information.

Recent updates

10-03-16 - Latest nightly source (21 MB)
02-26-14 - The paper version of NoFold (1.0.1) has been added.
02-25-14 - Supplemental data and results files for the paper have been updated.

Download for Linux


Included in download:
    • All code needed to run NoFold (scoring, clustering, annotation)
    • 1,973 calibrated Rfam covariance models
    • Pre-made threshold files appropriate for datasets of up to ~4,000 sequences
    • A script for generating thresholds specific to your dataset size, if needed
    • A demo dataset for testing your installation
External Requirements:

Git repository

Getting started

  1. Install required software. Add executables to your PATH if possible, otherwise you will need to supply a path to the folders containing the executables to NoFold (see README).
  2. Unzip NoFold: tar -zxvf nofold.tar.gz
  3. Navigate to /src/ directory: cd nofold/src
  4. Test that everything is working by running the demo dataset. See README for instructions.

Example usage

Using demo1.db (included with NoFold):

python score_and_normalize.py ../demo/demo1/demo1.db --cpus=4

python nofold_pipeline.py ../demo/demo1/demo1.zNorm.pcNorm100.zNorm.bitscore ../demo/demo1/demo1.db \
--cpus=4 --bounds-file=../thresh/bounds_30seq.txt --verbose

This scores the sequences and then extracts clusters based on the within-cluster distance thresholds defined in the bounds file. It outputs a file with annotation information about each identified cluster (example).

Paper data


Supplemental files:
    • Supplemental analyses: pdf (361 KB)
    • RESS axes loadings: txt (1.8 MB)
    • RESS axes correlations: txt (246 KB)
    • RESS axes loadings for synthetic structure PCA (Fig. 2B): png (12 KB)
    • Rfam test set LDA (after CM removal): txt (716 KB)
    • Rfam test set - distribution of sequences per cluster: pdf
    • Rfam test set - distribution of cluster diameters: pdf
    • Experimental datasets - distribution of sequences per cluster: pdf
    • Experimental datasets - distribution of cluster diameters: pdf

Datasets & clustering results