RNAseq Sample Dataset
Maintained by Stephen Fisher | License (pdf)
This dataset consists of 93 metadata fields describing 1,347 RNAseq samples. The data provenance includes the sample source, pre-sequencing preparations, sequencing, and primary sequencing analysis. The sequencing analysis was performed using the PennSCAP-T Pipeline. The dataset is provided as PROV-XML.
This dataset was curated and modified from real metadata collected from sequencing samples. When possible, metadata terms were mapped to the ORNASEQ application ontology.
Four versions of the pipeline were used for the primary, post-sequencing analysis. Each pipeline version consisted of five of the seven possible stages. The accompanying Excel table describes the metadata that uniquely defines each pipeline version. Occasionally pipelines were run incorrectly, either through intentional operator action or through operator error. These incorrect runs are included in the metadata. Metadata from a pipeline stage may also be missing for some samples.
Possible Pipeline Stages:
- BLAST - a quality control check
- FASTQC - a quality control check
- TRIM - trim adapter contaminants
- BOWTIE - map sequencing reads to a reference
- STAR - map sequencing reads to a reference
- HTSeq - assign mapped reads to genes
- VERSE - assign mapped reads to genes
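Because some samples have missing or incorrectly run stages, a simple sanity check on the stages recorded for a sample can be useful. The following is a minimal sketch in Python: it only checks that the recorded stage names are among the seven listed above and that five distinct stages are present. The function name and the input format (a list of stage-name strings) are assumptions made for illustration; the stage combinations that actually define each pipeline version are given in the Excel table and are not encoded here.

```python
# Stage names are taken from the list above; the rest is a hypothetical helper.
POSSIBLE_STAGES = {"BLAST", "FASTQC", "TRIM", "BOWTIE", "STAR", "HTSEQ", "VERSE"}
EXPECTED_STAGE_COUNT = 5  # each pipeline version uses five of the seven stages


def check_stages(recorded):
    """Return a list of problems found in one sample's recorded stage names."""
    names = {name.upper() for name in recorded}  # compare case-insensitively
    problems = []

    unknown = sorted(names - POSSIBLE_STAGES)
    if unknown:
        problems.append("unknown stages: " + ", ".join(unknown))

    known = names & POSSIBLE_STAGES
    if len(known) != EXPECTED_STAGE_COUNT:
        problems.append(
            f"expected {EXPECTED_STAGE_COUNT} distinct stages, found {len(known)}"
        )
    return problems


# Example: a sample missing two stages is flagged, a complete one is not.
complete = check_stages(["BLAST", "FASTQC", "TRIM", "STAR", "VERSE"])
incomplete = check_stages(["FASTQC", "TRIM", "STAR"])
```

An empty list means the sample passed both checks. A non-empty list does not say whether the cause was missing metadata or an incorrectly run pipeline; that distinction has to come from the metadata itself.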
Summary Table:
- Excel table (zip file)
PROV-XML Database:
- PROV-XML (1 MB zip file)
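As a starting point for exploring the unzipped PROV-XML file, the sketch below counts the top-level PROV elements in a document using Python's standard library. It relies only on the W3C PROV-XML namespace (http://www.w3.org/ns/prov#). The inline sample document, its identifiers, and the function name are hypothetical and stand in for the real file, whose exact layout is not described here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

PROV_NS = "http://www.w3.org/ns/prov#"  # namespace defined by the W3C PROV-XML schema

# Hypothetical stand-in for the real file; identifiers are invented for illustration.
SAMPLE = """<prov:document xmlns:prov="http://www.w3.org/ns/prov#"
                           xmlns:ex="http://example.org/">
  <prov:entity prov:id="ex:sample_1"/>
  <prov:activity prov:id="ex:trim_1"/>
  <prov:activity prov:id="ex:star_1"/>
</prov:document>"""


def count_prov_elements(root):
    """Count direct children of the document root that are in the PROV namespace."""
    prefix = "{" + PROV_NS + "}"
    counts = Counter()
    for child in root:
        if child.tag.startswith(prefix):
            counts[child.tag[len(prefix):]] += 1
    return counts


root = ET.fromstring(SAMPLE)
counts = count_prov_elements(root)

# Collect the prov:id attribute of every activity in document order.
activity_ids = [
    elem.get("{" + PROV_NS + "}id")
    for elem in root.iter("{" + PROV_NS + "}activity")
]
```

To run this against the downloaded data, replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()`, where `path` points to the extracted XML file.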
Last updated June 2018