CIPRes Simulation Data

Sequence simulation based on a complex evolution model is part of the CIPRes project. Here you can download the simulated RNA data.

The simulation used the same starting (root) RNA sequence, with its secondary structure, on trees with different numbers of taxa. These trees were randomly sampled subtrees of a 1-million-taxon ultrametric binary tree, which resembles real phylogenetic trees and was created by Tracy Heath in the Hillis/Bull lab at the University of Texas at Austin.

Eight sets of subtrees were used, with the number of taxa doubling from 128 to 16384. These subtrees are also ultrametric binary trees, and two trees with the same number of taxa are not necessarily the same, since each was sampled at random. "Ultrametric" here refers only to the absolute-time model of branching: the actual expected number of changes along each lineage is a function of the molecule and is not clock-like.
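The ultrametric property can be checked directly on a tree string: every leaf should lie at the same total branch length from the root. The following is a minimal sketch, assuming a plain Newick string with branch lengths and no quoted labels or comments; `parse_newick`, `leaf_depths`, `is_ultrametric` and `count_nodes` are hypothetical helper names, not part of the dataset. It also counts nodes, since a binary tree with n leaves has 2n - 1 nodes, which is consistent with the example below declaring ntax =255 for a matrix whose last extant label is 128 (128 extant plus 127 ancestral sequences).

```python
# Hypothetical helpers: check that a Newick tree with branch lengths is
# ultrametric (all root-to-leaf path lengths equal) and count its nodes.
# Assumes unquoted labels and no Newick comments.


def parse_newick(s):
    """Parse Newick into nested (children, label, branch_length) tuples."""
    s = "".join(s.split())  # drop line breaks/spaces from wrapped tree lines
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # consume '('
            children.append(node())
            while s[pos] == ",":
                pos += 1  # consume ','
                children.append(node())
            pos += 1  # consume ')'
        start = pos
        while pos < len(s) and s[pos] not in ",():;":
            pos += 1
        label = s[start:pos]
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1  # consume ':'
            start = pos
            while pos < len(s) and s[pos] not in ",();":
                pos += 1
            length = float(s[start:pos])
        return (children, label, length)

    return node()


def leaf_depths(tree, depth=0.0, out=None):
    """Map each leaf label to its total branch length from the root."""
    if out is None:
        out = {}
    children, label, length = tree
    depth += length
    if not children:
        out[label] = depth
    for child in children:
        leaf_depths(child, depth, out)
    return out


def is_ultrametric(tree, tol=1e-5):
    """True if all root-to-leaf distances agree within tol."""
    depths = leaf_depths(tree).values()
    return max(depths) - min(depths) <= tol


def count_nodes(tree):
    """Count leaves plus internal nodes (2n - 1 for a binary tree with n leaves)."""
    children, _, _ = tree
    return 1 + sum(count_nodes(child) for child in children)


example = "((A:1.0,B:1.0)_I2:2.0,C:3.0)_I1:0.0;"
tree = parse_newick(example)
print(is_ultrametric(tree), count_nodes(tree))  # True 5
```

To use it on a data file, pass the Newick string that follows `tree tree1 =`. Branch lengths are printed with eight decimals, so the check uses a small tolerance rather than exact equality. The parser is recursive, so very deep trees may exceed Python's default recursion limit; an iterative parser would be safer there.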

Simulation parameters were tuned so that the simulated sequences resemble real small-subunit rRNA (SSU rRNA) sequences in terms of sequence identity, number of indels, the ratio of substitutions to indels, and similar statistics.
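Statistics such as identity and indel counts can be measured on any two rows of the same alignment. The exact definitions used to tune the simulation are not stated here, so this is only a hypothetical sketch (`pairwise_stats` is an illustrative name): identity is taken over columns where both rows have a residue, and each maximal run of gaps in one row, ignoring columns gapped in both, is counted as one indel event.

```python
# Hypothetical helper: identity and a rough indel-event count for two rows
# taken from the same alignment (e.g. two extant sequences).


def pairwise_stats(a, b, gap="-"):
    """Return (identity, indel_runs) for two equal-length aligned rows.

    identity   = matching residues / columns where both rows have a residue
    indel_runs = maximal runs of gaps in one row only; columns gapped in
                 both rows are skipped, as they vanish in the pairwise view
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = matches = indel_runs = 0
    prev = None  # which row held the gap in the previous kept column
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue
        if x == gap or y == gap:
            side = "a" if x == gap else "b"
            if side != prev:
                indel_runs += 1  # a new gap run starts here
            prev = side
        else:
            compared += 1
            if x == y:
                matches += 1
            prev = None
    identity = matches / compared if compared else float("nan")
    return identity, indel_runs


print(pairwise_stats("ACG--U", "ACGAAC"))  # (0.75, 1)
```

Counting runs rather than individual gap columns approximates indel events; a real comparison with the simulation's targets would need the definitions the authors actually used.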

The dataset is provided in NEXUS format with three blocks. The trees block records the tree used for the simulation. The characters block contains the aligned ancestral and extant RNA sequences; the labels of ancestral sequences start with "_I". (The subtrees keep the original node labels from the 1-million-taxon tree, so the labels within a subtree are not consecutive.) The crimson block lists the aligned secondary structures of the RNA molecules in Vienna (dot-bracket) format. An example is given below:

#NEXUS
begin trees;
tree tree1 = (((((35774:95.04406000,36059:95.04406000)_I71419:35.23316000,((((38802:65.77072000,
39768:65.77072000)_I77514......_I1:0.00000000;
end;

begin characters;
dimensions ntax =255 nchar =1902;
format datatype=rna gap=-;
matrix
_I1       ---A-------------G--A--.....
_I2       ---A-------------G--A--.....
_I3       ---A-------------G--A--.....
...
128       ---C-------------G--A--.....
;
end;

begin crimson;
matrix
_I1       ---.-------------(--(--.....
_I2       ---.-------------(--(--.....
_I3       ---.-------------(--(--.....
...
128       ---.-------------(--(--.....
;
end;
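A minimal Python sketch for reading the characters and crimson blocks, assuming the non-interleaved layout of the example above (one `label sequence` row per line, matrix terminated by `;`). `read_matrices`, `split_ancestral` and `base_pairs` are hypothetical helpers, not a published parser; files with interleaved matrices, comments or quoted labels would need a full NEXUS reader. `base_pairs` recovers paired alignment columns from a Vienna string with a stack, treating '.' and the gap '-' as unpaired.

```python
# Hypothetical helpers for the non-interleaved layout shown in the example:
# each matrix row is "label<whitespace>aligned-row" on one line and the
# matrix ends with ';'. Not a general NEXUS parser.


def read_matrices(text):
    """Map lower-case block name -> {label: aligned row} for every matrix."""
    blocks, current, in_matrix = {}, None, False
    for raw in text.splitlines():
        line = raw.strip()
        low = line.lower()
        if in_matrix:
            done = line.endswith(";")  # ';' may sit on its own line or end a row
            line = line.rstrip(";").strip()
            if line:
                label, row = line.split(None, 1)
                blocks[current][label] = row.replace(" ", "")
            in_matrix = not done
        elif low.startswith("begin "):
            current = low[len("begin "):].rstrip(";").strip()
            blocks[current] = {}
        elif low == "matrix":
            in_matrix = True
    return blocks


def split_ancestral(rows):
    """Split rows into ancestral ('_I' labels) and extant sequences."""
    ancestral = {k: v for k, v in rows.items() if k.startswith("_I")}
    extant = {k: v for k, v in rows.items() if not k.startswith("_I")}
    return ancestral, extant


def base_pairs(structure):
    """Return sorted (i, j) column pairs from an aligned dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return sorted(pairs)
```

For instance, `read_matrices(text)["crimson"]["_I1"]` would give the root's aligned structure string, and `base_pairs` on it lists which alignment columns pair with each other.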




  • 128-taxon tree
    trial1 trial2 trial3 trial4 trial5 trial6 trial7 trial8 trial9 trial10 trial11 trial12 trial13 trial14 trial15 trial16 trial17 trial18 trial19 trial20
    all

  • 256-taxon tree
    trial1 trial2 trial3 trial4 trial5 trial6 trial7 trial8 trial9 trial10 trial11 trial12 trial13 trial14 trial15 trial16 trial17 trial18 trial19 trial20
    all

  • 512-taxon tree
    trial1 trial2 trial3 trial4 trial5 trial6 trial7 trial8 trial9 trial10 trial11 trial12 trial13 trial14 trial15 trial16 trial17 trial18 trial19 trial20
    all

  • 1024-taxon tree
    trial1 trial2 trial3 trial4 trial5 trial6 trial7 trial8 trial9 trial10 trial11 trial12 trial13 trial14 trial15 trial16 trial17 trial18 trial19 trial20
    all

  • 2048-taxon tree
    trial1 trial2 trial3 trial4 trial5 trial6 trial7 trial8 trial9 trial10 trial11 trial12 trial13 trial14 trial15 trial16 trial17 trial18 trial19 trial20
    all

  • 4096-taxon tree
    trial1 trial2 trial3 trial4 trial5 trial6 trial7 trial8 trial9 trial10 trial11 trial12 trial13 trial14 trial15 trial16 trial17 trial18 trial19 trial20
    all

  • 8192-taxon tree
    trial1 trial2 trial3 trial4 trial5 trial6 trial7 trial8 trial9 trial10 trial11 trial12 trial13 trial14 trial15 trial16 trial17 trial18 trial19 trial20
    all

  • 16384-taxon tree
    trial1 trial2 trial3 trial4 trial5 trial6 trial7 trial8 trial9 trial10 trial11 trial12 trial13 trial14 trial15 trial16 trial17 trial18 trial19 trial20
    all

  • 1-million-taxon tree (bzip2)
    trial 2 (24544 bp), trial 3 (24926 bp), trial 4 (25800 bp), trial 5 (25020 bp), trial 6 (24227 bp), trial 7 (24294 bp), trial 8 (24166 bp), trial 9 (24486 bp)
  • The CIPRes simulation data are managed by Sheng Guo. Email me with any questions or concerns.
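The 1-million-taxon files are bzip2-compressed, and Python's standard-library `bz2.open` can read them as text without decompressing to disk first. The sketch below reads lines lazily and stops at the first `dimensions` command, returning ntax and nchar. The file name is hypothetical, and the regular expression assumes `ntax` precedes `nchar`, as in the example above.

```python
import bz2
import re

# Assumes ntax appears before nchar, as in the example dataset.
DIMENSIONS = re.compile(
    r"dimensions\s+ntax\s*=\s*(\d+)\s+nchar\s*=\s*(\d+)", re.IGNORECASE
)


def parse_dimensions(lines):
    """Return (ntax, nchar) from the first matching 'dimensions' line."""
    for line in lines:
        match = DIMENSIONS.search(line)
        if match:
            return int(match.group(1)), int(match.group(2))
    raise ValueError("no 'dimensions ntax=... nchar=...' command found")


def read_dimensions(path):
    """Read a plain or .bz2 NEXUS file lazily, stopping at 'dimensions'."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt") as handle:
        return parse_dimensions(handle)


# Usage with a hypothetical file name:
# ntax, nchar = read_dimensions("trial2.nex.bz2")
```

Streaming matters here because the full file need not fit in memory just to inspect its header. Only the lines up to the `dimensions` command are read.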