CIPRES Simulation Data

Maintained by Sheng Guo. License (PDF).

Sequence simulation under a complex evolutionary model is part of the CIPRes project. The simulated RNA data can be downloaded here.

All simulations used the same starting (root) RNA sequence and its secondary structure, but were run on trees with different numbers of taxa. These trees are randomly sampled subtrees of a 1-million-taxon ultrametric binary tree, which resembles real phylogenetic trees and was created by Tracy Heath in the Hillis/Bull lab at the University of Texas at Austin.
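Random subtree sampling of this kind can be illustrated with a minimal Python sketch. It assumes leaves are drawn uniformly at random, and that any internal node left with a single child after pruning is removed, with its branch length added to the surviving edge. The dictionary tree representation and the names `induced_subtree` and `sample_subtree` are hypothetical; this is not necessarily how the original subtrees were extracted.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: each internal node label maps to its
# (child label, branch length) pairs; leaves never appear as keys.
Tree = Dict[str, List[Tuple[str, float]]]


def induced_subtree(children: Tree, root: str, keep: Set[str]) -> Tuple[Tree, str]:
    """Restrict the tree to the leaves in `keep`, suppressing unary nodes.

    Retained nodes keep their original labels, and the branch lengths of
    suppressed nodes are summed, so path lengths are preserved.
    """
    pruned: Tree = {}

    def build(node: str) -> Optional[Tuple[str, float]]:
        # Returns (retained node representing this clade, distance down to it),
        # or None when no kept leaf lies below `node`.
        if node not in children:  # leaf
            return (node, 0.0) if node in keep else None
        kept: List[Tuple[str, float]] = []
        for child, length in children[node]:
            result = build(child)
            if result is not None:
                rep, dist = result
                kept.append((rep, length + dist))
        if not kept:
            return None
        if len(kept) == 1:  # only one kept clade below: suppress this node
            return kept[0]
        pruned[node] = kept
        return (node, 0.0)

    result = build(root)
    if result is None:
        raise ValueError("none of the requested leaves are in the tree")
    return pruned, result[0]


def sample_subtree(children: Tree, root: str, k: int, seed: int = 0) -> Tuple[Tree, str]:
    """Draw k leaves uniformly at random and return their induced subtree."""
    leaves = sorted({c for kids in children.values() for c, _ in kids if c not in children})
    keep = set(random.Random(seed).sample(leaves, k))
    return induced_subtree(children, root, keep)
```

Because suppressed branches are summed rather than dropped, distances between retained nodes are unchanged, so a subtree of an ultrametric tree stays ultrametric, and its internal nodes keep their original (non-consecutive) labels. The recursion is fine for examples; very deep trees may exceed Python's default recursion limit.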

Eight sets of subtrees were used, with the number of taxa ranging from 128 to 16384. These subtrees are also ultrametric binary trees; because they were sampled at random, two trees with the same number of taxa are not necessarily identical. Here "ultrametric" applies to branch lengths measured in absolute time: the actual expected number of changes along each lineage depends on the molecule and is not clock-like.
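Since the trees are ultrametric in absolute time, every tip should lie at the same distance from the root. The Python sketch below checks this for a tree string such as the one stored in the trees block of each dataset. It assumes a simple Newick subset (unquoted labels without spaces, `label:length` on every non-root node, no comments); `tip_depths` and `is_ultrametric` are hypothetical names, and a full Newick parser would be needed for anything more general.

```python
import math
from typing import Dict, Tuple


def tip_depths(newick: str) -> Dict[str, float]:
    """Map each tip label to its path length from the root.

    Assumes a simple Newick subset: unquoted labels without spaces,
    `label:length` on every non-root node, and no [comments].
    """
    text = "".join(newick.split()).rstrip(";")  # drop whitespace and the final ';'
    pos = 0

    def read_label_and_length() -> Tuple[str, float]:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        label = text[start:pos]
        length = 0.0  # the root may have no branch length
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return label, length

    def parse() -> Tuple[Dict[str, float], float]:
        """Parse one subtree; return (tip -> distance below it, its branch length)."""
        nonlocal pos
        below: Dict[str, float] = {}
        if text[pos] == "(":
            pos += 1  # consume '('
            while True:
                child_tips, child_length = parse()
                for tip, dist in child_tips.items():
                    below[tip] = dist + child_length
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                else:
                    pos += 1  # consume ')'
                    break
            _, length = read_label_and_length()  # internal label (e.g. _I71419) unused
        else:
            label, length = read_label_and_length()
            below[label] = 0.0
        return below, length

    tips, _ = parse()  # the root's own branch length does not matter
    return tips


def is_ultrametric(newick: str, rel_tol: float = 1e-6) -> bool:
    """True if all tips are (numerically) equidistant from the root."""
    depths = list(tip_depths(newick).values())
    if not depths:
        return False
    return all(math.isclose(d, depths[0], rel_tol=rel_tol) for d in depths)
```

A relative tolerance is used because the branch lengths in the files are rounded to eight decimal places, so summed path lengths can differ in the last digits.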

Simulation parameters were tuned so that the simulated sequences resemble real small subunit rRNA (SSU rRNA) sequences in sequence identity, number of indels, ratio of substitutions to indels, and similar statistics.
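The page does not define these statistics precisely. As a hedged illustration, the Python sketch below computes one plausible version for a pair of aligned sequences: identity as matches over columns where both sequences have a nucleotide, substitutions as columns with two different nucleotides, and an indel as a maximal run of columns in which the same sequence has a gap. These definitions and the names `pair_stats` and `PairStats` are assumptions, not the tuning procedure actually used.

```python
from dataclasses import dataclass
from typing import Optional

GAP = "-"


@dataclass
class PairStats:
    identity: float     # matches / columns where both sequences have a nucleotide
    substitutions: int  # columns with two different nucleotides
    indels: int         # maximal runs of gap-vs-nucleotide columns on the same side


def pair_stats(a: str, b: str) -> PairStats:
    """Summarize one pair of rows taken from the same alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    matches = substitutions = indels = 0
    prev_gap_side: Optional[str] = None  # which sequence was gapped in the last column
    for x, y in zip(a.upper(), b.upper()):
        if x == GAP and y == GAP:
            continue  # column is empty for this pair; ignore it
        if x == GAP or y == GAP:
            side = "a" if x == GAP else "b"
            if side != prev_gap_side:
                indels += 1  # a new gap run starts here
            prev_gap_side = side
            continue
        prev_gap_side = None
        if x == y:
            matches += 1
        else:
            substitutions += 1
    aligned = matches + substitutions
    identity = matches / aligned if aligned else 0.0
    return PairStats(identity, substitutions, indels)
```

Under these definitions, the substitution-to-indel ratio is `substitutions / indels` whenever `indels` is nonzero.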

Each dataset is provided in NEXUS format with three blocks. The trees block records the tree used for the simulation. The characters block contains the aligned ancestral and extant RNA sequences; the labels of ancestral sequences start with "_I". (The trees keep the original node labels from the 1-million-taxon tree, so the labels in a subtree are not consecutive.) The crimson block lists the aligned secondary structures of the RNA molecules in Vienna (dot-bracket) format. An example is given below:

 
#NEXUS
begin trees; 
 tree tree1 = (((((35774:95.04406000,36059:95.04406000)_I71419:35.23316000, ((((38802:65.77072000,39768:65.77072000)_I77514......_I1:0.00000000;
end;

begin characters; 
   dimensions ntax =255 nchar =1902; 
   format datatype=rna gap=-; 
matrix
_I1       ---A-------------G--A--.....
_I2       ---A-------------G--A--.....
_I3       ---A-------------G--A--.....
...
128       ---C-------------G--A--.....
;
end;

begin crimson; 
matrix
_I1       ---.-------------(--(--.....
_I2       ---.-------------(--(--.....
_I3       ---.-------------(--(--.....
...
128       ---.-------------(--(--.....
;
end;
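A minimal Python sketch for reading such a file: it extracts the `matrix` rows of the characters and crimson blocks, separates ancestral rows (labels starting with "_I") from extant ones, and pairs the brackets of each dot-bracket structure by alignment column. It assumes the layout of the example above (one `label row` pair per line, `matrix` and the closing `;` on their own lines, no interleaving or quoted labels) and, as the example suggests, that structure rows share the alignment columns of the sequence rows. The function names are hypothetical; files that deviate from this layout would need a full NEXUS parser.

```python
import re
from typing import Dict, List, Tuple


def read_matrix(nexus_text: str, block: str) -> Dict[str, str]:
    """Return {label: row} from the `matrix` command of one NEXUS block."""
    pattern = rf"begin\s+{re.escape(block)}\s*;(.*?)\bend\s*;"
    found = re.search(pattern, nexus_text, re.IGNORECASE | re.DOTALL)
    if found is None:
        raise ValueError(f"no '{block}' block found")
    rows: Dict[str, str] = {}
    in_matrix = False
    for line in found.group(1).splitlines():
        line = line.strip()
        if line.lower() == "matrix":
            in_matrix = True
        elif line == ";":
            break  # end of the matrix
        elif in_matrix:
            parts = line.split(maxsplit=1)
            if len(parts) == 2:  # skip blank or label-less lines
                rows[parts[0]] = parts[1].replace(" ", "")
    return rows


def split_nodes(rows: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate ancestral rows (labels starting with '_I') from extant ones."""
    ancestral = {k: v for k, v in rows.items() if k.startswith("_I")}
    extant = {k: v for k, v in rows.items() if not k.startswith("_I")}
    return ancestral, extant


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Pair '(' with ')' in an aligned dot-bracket row (0-based columns).

    Dots (unpaired) and gaps ('-') are skipped, so the indices are
    alignment columns shared with the characters block.
    """
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for col, ch in enumerate(structure):
        if ch == "(":
            stack.append(col)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {col}")
            pairs.append((stack.pop(), col))
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return sorted(pairs)
```

Combining the two blocks, `seq[i]` and `seq[j]` for each `(i, j)` in `base_pairs(structure)` give the paired nucleotides of that row.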