NoFold is an approach for characterizing and clustering RNA secondary structures without computational folding or alignment. It works by mapping each RNA sequence of interest to a structural feature space, where each coordinate within the space corresponds to the probabilistic similarity of the sequence to an empirically defined structure model (e.g. Rfam family covariance models). NoFold provides scripts for mapping sequences to this structure space, extracting any robust clusters that are formed, and annotating those clusters with structural and functional information.
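To make the idea of a structural feature space concrete, here is a minimal sketch. It assumes you already have a table of scores with one row per sequence and one column per structure model (for example, covariance model bitscores). Each column is z-normalized so that no single model dominates, and sequences are then compared by Euclidean distance in that space. The function names are hypothetical, and the single per-model z-score is a simplification: it does not reproduce NoFold's own normalization steps. The sketch targets Python 3 (it relies on true division).

```python
import math

def zscore_columns(matrix):
    """Z-normalize each column (one column per structure model) of a
    sequences-by-models score matrix; returns a new matrix."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        col = [row[j] for row in matrix]
        mean = sum(col) / n_rows
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n_rows)
        for i in range(n_rows):
            # A constant column carries no information, so map it to 0.
            out[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return out

def euclidean(a, b):
    """Distance between two sequences' feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Three sequences scored against two models (made-up numbers).
scores = [[1, 10], [2, 20], [3, 30]]
features = zscore_columns(scores)
print(euclidean(features[0], features[2]))
```

Sequences that score similarly across many models end up close together, which is what makes clustering in this space possible without folding or aligning them.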

Recent updates

10-03-16 - Latest nightly source (21 MB)
02-26-14 - The paper version of NoFold (1.0.1) has been added.
02-25-14 - Supplemental data and results files for the paper have been updated.

Download for Linux

NoFold 1.0.1 (19 MB) - latest stable release
README (17 KB)

Included in download:

All code needed to run NoFold (scoring, clustering, annotation)
1,973 calibrated Rfam covariance models
Pre-made threshold files appropriate for datasets of up to ~4,000 sequences
A script for generating thresholds specific to your dataset size, if needed
A demo dataset for testing your installation
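The pre-made threshold files are tied to dataset size. The following sketch picks one for a given number of sequences. It assumes a file-naming pattern generalized from the single example name in this page (bounds_30seq.txt). That pattern, the "smallest size that covers the dataset" rule, and the function name are hypothetical, so check the thresh/ folder and the README for the actual files and guidance.

```python
import re

# Hypothetical pattern generalized from the one name in the docs
# (bounds_30seq.txt); the shipped files may be named differently.
BOUNDS_RE = re.compile(r"^bounds_(\d+)seq\.txt$")

def choose_bounds_file(filenames, n_sequences):
    """Return the bounds file for the smallest dataset size that is at
    least n_sequences, or None if no pre-made file is large enough."""
    sized = []
    for name in filenames:
        match = BOUNDS_RE.match(name)
        if match:
            sized.append((int(match.group(1)), name))
    sized.sort()  # ascending by dataset size
    for size, name in sized:
        if size >= n_sequences:
            return name
    return None

print(choose_bounds_file(["bounds_30seq.txt", "bounds_100seq.txt"], 50))
```

A None result corresponds to the case this page describes for larger datasets, where the included threshold-generation script is the intended route.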

External Requirements:

Python 2.X - Used to run most of the code
Infernal (v1.0.2) (infernal-1.0.2.tar.gz, 15.2 MB) - Used for scoring
R with fastcluster package - Used for clustering and some statistics
LocARNA - Used to predict consensus structure of clusters
RNAz (optional) - Used to get additional statistics on cluster structures
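Before running NoFold, it can help to confirm that the external tools are on your PATH. This is a minimal Python 3.3+ sketch built on shutil.which. The executable names are assumptions: they are the usual binaries installed by each package (cmsearch from Infernal, R/Rscript, mlocarna from LocARNA, RNAz), not necessarily the ones NoFold itself invokes. Consult the README for the exact programs and for how to pass their folders to NoFold if they are not on your PATH.

```python
import shutil

# Assumed executable names per package; NoFold may call others.
REQUIRED = {
    "Infernal": ["cmsearch"],
    "R": ["R", "Rscript"],
    "LocARNA": ["mlocarna"],
}
OPTIONAL = {"RNAz": ["RNAz"]}

def missing_tools(groups, path=None):
    """Return {package: [missing executables]}; path=None searches $PATH."""
    missing = {}
    for package, exes in groups.items():
        absent = [exe for exe in exes if shutil.which(exe, path=path) is None]
        if absent:
            missing[package] = absent
    return missing

print("required, missing:", missing_tools(REQUIRED))
print("optional, missing:", missing_tools(OPTIONAL))
```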

Git repository

GitHub repository

Getting started

Install the required software. Add the executables to your PATH if possible; otherwise, you will need to give NoFold the paths to the folders containing them (see README).
Extract NoFold: tar -zxvf nofold.tar.gz
Navigate to the src/ directory: cd nofold/src
Test that everything is working by running NoFold on the demo dataset. See the README for instructions.
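After extracting, a quick layout check can catch a wrong working directory before the demo is run. The expected paths in this sketch come from the example commands, which run from nofold/src and refer to ../demo/demo1/demo1.db and ../thresh/bounds_30seq.txt. The function name is hypothetical, and the list covers only the files those commands use, not the full contents of the download.

```python
import os

# Files referenced by the example commands, relative to the nofold/ root.
EXPECTED = [
    os.path.join("src", "score_and_normalize.py"),
    os.path.join("src", "nofold_pipeline.py"),
    os.path.join("demo", "demo1", "demo1.db"),
    os.path.join("thresh", "bounds_30seq.txt"),
]

def missing_files(root):
    """Return the expected files that are absent under the unpacked root."""
    return [rel for rel in EXPECTED
            if not os.path.isfile(os.path.join(root, rel))]

print(missing_files("nofold"))
```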

Example usage

Using demo1.db (included with NoFold):

python score_and_normalize.py ../demo/demo1/demo1.db --cpus=4

python nofold_pipeline.py ../demo/demo1/demo1.zNorm.pcNorm100.zNorm.bitscore ../demo/demo1/demo1.db \
--cpus=4 --bounds-file=../thresh/bounds_30seq.txt --verbose
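The two commands can also be driven from a script. The sketch below rebuilds the example command lines and runs them in order with subprocess, stopping at the first failure. It assumes the scoring step names its output after the input file, as in the example (demo1.db becomes demo1.zNorm.pcNorm100.zNorm.bitscore). The helper names are hypothetical. Also, "python" must resolve to the Python 2 interpreter listed in the requirements, so substitute python2 where needed.

```python
import os
import subprocess

def demo_commands(db_path, bounds_file, cpus=4, python="python"):
    """Build the scoring and clustering command lines (run from nofold/src)."""
    stem, _ = os.path.splitext(db_path)
    # Assumed output naming, taken from the demo1 example.
    bitscore = stem + ".zNorm.pcNorm100.zNorm.bitscore"
    score = [python, "score_and_normalize.py", db_path, "--cpus=%d" % cpus]
    cluster = [python, "nofold_pipeline.py", bitscore, db_path,
               "--cpus=%d" % cpus, "--bounds-file=%s" % bounds_file,
               "--verbose"]
    return score, cluster

def run_demo(db_path, bounds_file, cpus=4, src_dir="."):
    """Run both steps; check_call raises if either exits non-zero."""
    for cmd in demo_commands(db_path, bounds_file, cpus):
        subprocess.check_call(cmd, cwd=src_dir)

# Example: run_demo("../demo/demo1/demo1.db", "../thresh/bounds_30seq.txt")
```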

The first command scores and normalizes the sequences; the second extracts clusters based on the within-cluster distance thresholds defined in the bounds file and writes a file with annotation information about each identified cluster (example).
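NoFold performs its clustering in R with the fastcluster package. The pure-Python sketch below only illustrates the general idea of extracting clusters under a distance threshold. It performs agglomerative clustering with complete linkage and refuses any merge that would push a cluster's diameter (its largest pairwise distance) above the limit. The choice of complete linkage and a single global threshold is an assumption for illustration, not NoFold's documented procedure.

```python
def threshold_clusters(dist, max_diameter):
    """Complete-linkage agglomerative clustering on a symmetric distance
    matrix; every returned cluster has diameter <= max_diameter."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: the farthest pair across the two clusters.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > max_diameter:
            break  # any further merge would exceed the threshold
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters

points = [0, 1, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
print(threshold_clusters(dist, 2))  # [[0, 1], [2, 3]]
```

Because merged clusters inherit the larger of their parts' diameters and the linkage distance, every merge that passes the check keeps the diameter within the limit.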

Paper data

Supplemental files:

Supplemental analyses: pdf (361 KB)
RESS axes loadings: txt (1.8 MB)
RESS axes correlations: txt (246 KB)
RESS axes loadings for synthetic structure PCA (Fig. 2B): png (12 KB)
Rfam test set LDA (after CM removal): txt (716 KB)
Rfam test set - distribution of sequences per cluster: pdf
Rfam test set - distribution of cluster diameters: pdf
Experimental datasets - distribution of sequences per cluster: pdf
Experimental datasets - distribution of cluster diameters: pdf

Datasets & clustering results

Synthetic structure test set
Rfam 20-family test set
Full Rfam "cross validation" set
Dendritically localized transcripts, 3' UTRs
Dendritically localized transcripts, retained introns
Non-canonical translation init sites
All paper data (510 MB)

Software Repository - Kim Laboratory - Computational Evolutionary Biology

NoFold: RNA structure clustering without folding or alignment
