Kim Lab Website

Software Repository
Kim Laboratory
Computational Evolutionary Biology

latest news

06.29.2019

VisCello: visualization of single-cell data.


06.27.2018

Sample data provenance from 1,347 RNAseq samples.


06.07.2018

ORNASEQ: Ontology for RNA sequencing.


contact us

Biology Department
School of Arts and Sciences
University of Pennsylvania
301a Lynch Laboratory
433 S University Avenue
Philadelphia, PA 19104 USA

off: (215) 746-5187
lab: (215) 898-8395
fax: (215) 898-8780

email: junhyong@sas.upenn.edu

RNAseq Sample Dataset

Maintained by Stephen Fisher

License (pdf)

This data set consists of 93 metadata fields describing 1,347 RNAseq samples. The data provenance includes the sample source, pre-sequencing preparations, sequencing, and primary sequencing analysis. The sequencing analysis was performed using the PennSCAP-T Pipeline. The dataset is provided as PROV-XML.
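
As a rough illustration of working with a PROV-XML file, the sketch below counts the top-level PROV records (entities, activities, relations) in a document. It assumes only the standard W3C PROV namespace; the inline sample document, the `ex:` identifiers, and the function name `count_prov_elements` are hypothetical and are not taken from this dataset. The file name inside the zip archive is not stated here, so the function takes the XML text directly.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard W3C PROV namespace used by PROV-XML documents.
PROV_NS = "http://www.w3.org/ns/prov#"


def count_prov_elements(xml_text):
    """Count the direct children of the root element that belong to the
    PROV namespace, keyed by local name (e.g. 'entity', 'activity')."""
    root = ET.fromstring(xml_text)
    prefix = "{" + PROV_NS + "}"
    counts = Counter()
    for child in root:
        if child.tag.startswith(prefix):
            counts[child.tag[len(prefix):]] += 1
    return counts


# Hypothetical miniature document, for illustration only.
SAMPLE = """<prov:document xmlns:prov="http://www.w3.org/ns/prov#"
                           xmlns:ex="http://example.org/">
  <prov:entity prov:id="ex:sample1"/>
  <prov:activity prov:id="ex:trim1"/>
  <prov:wasGeneratedBy>
    <prov:entity prov:ref="ex:sample1"/>
    <prov:activity prov:ref="ex:trim1"/>
  </prov:wasGeneratedBy>
</prov:document>"""

counts = count_prov_elements(SAMPLE)
for name in sorted(counts):
    print(name, counts[name])
```

Only direct children of the root are counted, so the `prov:ref` elements nested inside a relation are not mistaken for separate records.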

This data set was curated and modified from real metadata collected from sequencing samples. When possible, metadata terms were mapped to the ORNASEQ application ontology.

Four versions of the pipeline were used for the primary, post-sequencing analysis. Each pipeline version consisted of five of the seven possible stages. The accompanying Excel table describes the metadata that uniquely defines each pipeline. Occasionally, pipelines were run incorrectly, either through intentional operator action or by error. These errors are included in the metadata. Metadata from a pipeline stage may also be missing for some samples.

Possible Pipeline Stages:

BLAST - a quality control check
FASTQC - a quality control check
TRIM - trim adapter contaminants
BOWTIE - map sequencing reads to a reference
STAR - map sequencing reads to a reference
HTSeq - assign mapped reads to genes
VERSE - assign mapped reads to genes
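
Because each pipeline version uses five of the seven stages listed above, and some samples have incorrect or missing stage metadata, a simple consistency check can flag samples worth a closer look. The sketch below is one possible approach, not the lab's method: the function name `check_stages` and the list-of-names input format are assumptions, and it checks only the stage count, since the exact stages in each pipeline version are defined in the Excel table and are not reproduced here.

```python
# The seven possible stage names from the list above, upper-cased for matching.
POSSIBLE_STAGES = {"BLAST", "FASTQC", "TRIM", "BOWTIE", "STAR", "HTSEQ", "VERSE"}

# Each pipeline version consists of five of the seven stages.
EXPECTED_STAGE_COUNT = 5


def check_stages(recorded):
    """Compare the stage names recorded for one sample against the known
    stages. Matching is case-insensitive; duplicates are counted once."""
    seen = {name.strip().upper() for name in recorded}
    unknown = sorted(seen - POSSIBLE_STAGES)
    known = seen & POSSIBLE_STAGES
    return {
        "unknown": unknown,           # names not in the list of stages
        "known_count": len(known),    # distinct recognised stages
        "complete": not unknown and len(known) == EXPECTED_STAGE_COUNT,
    }


# Hypothetical samples, for illustration only.
print(check_stages(["BLAST", "FASTQC", "TRIM", "STAR", "VERSE"]))
print(check_stages(["FASTQC", "TRIM", "STAR"]))
```

A sample reported as incomplete is not necessarily wrong; it may simply reflect the missing or intentionally altered metadata described above.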

Summary Table:

Excel table (zip file)

PROV-XML DATABASE:

PROV-XML (1 MB zip file)

Last updated June 2018
