latest news

06.27.2018

Sample data provenance from 1,347 RNAseq samples.

access info ...

06.07.2018

ORNASEQ: Ontology for RNA sequencing.

access info ...

07.17.2017

Documentation outlining our RNA-Seq schema.

schema ...

repo structure ...

qc metrics ...

03.27.2017

PIVOT released (replacing IDV): an extensive R-based GUI for the visualization and analysis of RNAseq data.

access info ...

RNAseq Sample Dataset

Maintained by Stephen Fisher   License (pdf)

This dataset consists of 93 metadata fields describing 1,347 RNAseq samples. The provenance covers the sample source, pre-sequencing preparation, sequencing, and primary sequencing analysis. The sequencing analysis was performed using the PennSCAP-T Pipeline. The dataset is provided as PROV-XML.

This dataset was curated and modified from real metadata collected from sequencing samples. Where possible, metadata terms were mapped to the ORNASEQ application ontology.
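For readers who have not worked with PROV-XML before, the following is a minimal sketch of listing the entities and activities in such a document with Python's standard library. It relies only on the W3C PROV namespace; the inline sample document, the `ex:` identifiers, and the helper name `list_ids` are hypothetical and are not taken from the actual dataset, whose element layout may differ.

```python
import xml.etree.ElementTree as ET

# W3C PROV namespace used by PROV-XML documents.
PROV_NS = "http://www.w3.org/ns/prov#"

# Hypothetical stand-in for a dataset file; the identifiers are invented.
SAMPLE = """<prov:document xmlns:prov="http://www.w3.org/ns/prov#"
                           xmlns:ex="http://example.org/">
  <prov:entity prov:id="ex:sample1"/>
  <prov:activity prov:id="ex:trim1"/>
  <prov:wasGeneratedBy>
    <prov:entity prov:ref="ex:sample1"/>
    <prov:activity prov:ref="ex:trim1"/>
  </prov:wasGeneratedBy>
</prov:document>"""


def list_ids(xml_text, kind):
    """Return the prov:id of each top-level element of the given kind.

    kind is a PROV element name such as "entity" or "activity".
    Only direct children of the document root are inspected, so the
    prov:ref references nested inside relations are not counted.
    """
    root = ET.fromstring(xml_text)
    tag = f"{{{PROV_NS}}}{kind}"
    id_attr = f"{{{PROV_NS}}}id"
    return [el.get(id_attr) for el in root.findall(tag)
            if el.get(id_attr) is not None]


if __name__ == "__main__":
    print("entities:", list_ids(SAMPLE, "entity"))
    print("activities:", list_ids(SAMPLE, "activity"))
```

To try this against the real data, the contents of a downloaded PROV-XML file would be passed in place of `SAMPLE`.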

Four versions of the pipeline were used for the primary, post-sequencing analysis. Each pipeline version consisted of five of the seven possible stages. The accompanying Excel table describes the metadata that uniquely defines each pipeline. Occasionally, pipelines were run incorrectly, either through intentional operator action or through error. These incorrect runs are included in the metadata. Metadata from a pipeline stage may also be missing for some samples.

Possible Pipeline Stages:

    • BLAST - a quality control check
    • FASTQC - a quality control check
    • TRIM - trim adapter contaminants
    • BOWTIE - map sequencing reads to a reference
    • STAR - map sequencing reads to a reference
    • HTSeq - assign mapped reads to genes
    • VERSE - assign mapped reads to genes
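Because stage metadata can be missing or reflect an incorrectly run pipeline, a quick sanity check on the stages recorded for a sample may be useful. The sketch below assumes the stage names have already been extracted from the metadata as plain strings; the function name `check_stages` and the example stage lists are hypothetical. It only checks that the names are known and that five distinct stages are present. It does not check the combination against the four pipeline versions, which are defined in the accompanying Excel table.

```python
# The seven possible stages listed above, normalized to upper case.
POSSIBLE_STAGES = ("BLAST", "FASTQC", "TRIM", "BOWTIE", "STAR", "HTSEQ", "VERSE")

# Each pipeline version consists of five of the seven stages.
STAGES_PER_PIPELINE = 5


def check_stages(recorded):
    """Summarize the stage names recorded for one sample.

    recorded is an iterable of stage names (case-insensitive).
    Returns a dict with the unknown names, the number of distinct
    known stages, and whether exactly five known stages (and no
    unknown ones) were recorded.
    """
    names = {name.upper() for name in recorded}
    known = names & set(POSSIBLE_STAGES)
    unknown = sorted(names - set(POSSIBLE_STAGES))
    return {
        "unknown": unknown,
        "count": len(known),
        "complete": not unknown and len(known) == STAGES_PER_PIPELINE,
    }


if __name__ == "__main__":
    # Illustrative stage lists only; not taken from the dataset.
    print(check_stages(["BLAST", "FASTQC", "TRIM", "STAR", "VERSE"]))
    print(check_stages(["FASTQC", "TRIM", "STAR"]))
```

A sample that comes back with `complete` set to false is not necessarily wrong; it may simply be one of the samples with missing stage metadata.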

Summary Table:


PROV-XML DATABASE:




Last updated June 2018