latest news


Sample data provenance from 1,347 RNA-Seq samples.


ORNASEQ: Ontology for RNA sequencing.


Documentation outlining our RNA-Seq schema.


PIVOT released (replacing IDV): an extensive R-based GUI for the visualization and analysis of RNA-Seq data.


RNA-Seq Data Quality Control

Maintained by Stephen Fisher. License (PDF)

This document describes the quality control metrics tracked by the Kim and Eberwine labs for next-generation sequencing samples.

  • Contamination:
    • Percent BLAST hits to non-target species. To compute this metric, we randomly select 5000 reads from each sample and BLAST them against the 'nr' database. We report the total number of hits, the number of hits to the target species, and the number of hits to each of the primary non-target species. This metric has proven particularly helpful in identifying novel sources of contamination.
  • Sample preparation issues: These metrics are typically available from the output of the primary analysis pipeline.
    • Average base quality.
    • Percent reads trimmed.
    • Percent reads discarded due to trimming.
    • Percent reads uniquely mapped.
    • Percent reads multi-mapped.
    • Percent reads not mapped.
    • Percent reads mapped to exonic, intronic, mitochondrial, and intergenic regions.
    • Average length of mapped reads.
    • Number of spike-in reads.
    • Number of reads missing mates.
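The contamination check can be sketched in Python as follows, assuming single-end FASTQ input with four-line records. The helper names (`read_fastq`, `reservoir_sample`, `to_fasta`, `summarize_hits`) are hypothetical and not taken from the labs' pipeline. The BLAST search against 'nr' is not shown: the sketch assumes its output has already been reduced to one best-hit species per read. Reservoir sampling is used here because it draws a uniform random sample of 5000 reads in a single pass without loading the whole file into memory.

```python
import random
from collections import Counter
from typing import Iterable, Iterator, Optional

# Hypothetical helpers sketching the contamination check; not the labs' actual code.


def read_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (read_id, sequence) from 4-line FASTQ records (assumed format)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line (unused here)
        read_id = header.strip()[1:].split()[0]  # drop '@' and any description
        yield read_id, seq


def reservoir_sample(items: Iterable, k: int = 5000,
                     rng: Optional[random.Random] = None) -> list:
    """Uniformly sample k items in one pass (reservoir sampling, Algorithm R)."""
    if rng is None:
        rng = random.Random()
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive bounds: i + 1 possible values
            if j < k:
                sample[j] = item  # item kept with probability k / (i + 1)
    return sample


def to_fasta(reads: Iterable[tuple[str, str]]) -> str:
    """Format sampled reads as FASTA text for use as a BLAST query."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in reads)


def summarize_hits(best_hit_species: dict[str, str], target: str,
                   non_targets: list[str]) -> dict[str, float]:
    """Tally reads by the species of their best BLAST hit.

    Assumption: best_hit_species maps read_id -> species of its top hit,
    and reads with no hit are simply absent from the mapping.
    """
    counts = Counter(best_hit_species.values())
    total = sum(counts.values())
    summary: dict[str, float] = {"total_hits": total, "target": counts[target]}
    for species in non_targets:
        summary[species] = counts[species]  # Counter returns 0 if absent
    # Percent of hits that landed on any species other than the target.
    summary["pct_non_target"] = (
        100.0 * (total - counts[target]) / total if total else 0.0
    )
    return summary
```

The FASTA text from `to_fasta` could then be passed to the BLAST+ `blastn` program; how individual hits are collapsed to a species (best hit only, taxonomy lookup, score cutoffs) is a decision this sketch leaves open.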
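Several of the sample preparation metrics reduce to simple arithmetic on counts the primary pipeline already reports. The following sketch assumes a hypothetical `ReadCounts` record filled in from pipeline logs and base qualities encoded as Phred+33. It also assumes that mapping percentages are taken over reads retained after trimming. The source does not specify which denominator is used.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReadCounts:
    """Hypothetical per-sample counts, assumed to be parsed from pipeline logs."""
    input_reads: int  # reads entering trimming
    trimmed: int      # reads with at least one base trimmed
    discarded: int    # reads dropped after trimming (e.g. too short)
    unique: int       # retained reads mapped to exactly one locus
    multi: int        # retained reads mapped to more than one locus


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def qc_percentages(c: ReadCounts) -> dict[str, float]:
    """Trimming and mapping percentages for one sample."""
    # Assumption: mapping percentages use reads retained after trimming.
    retained = c.input_reads - c.discarded
    unmapped = retained - c.unique - c.multi
    return {
        "pct_trimmed": percent(c.trimmed, c.input_reads),
        "pct_discarded": percent(c.discarded, c.input_reads),
        "pct_unique": percent(c.unique, retained),
        "pct_multi": percent(c.multi, retained),
        "pct_unmapped": percent(unmapped, retained),
    }


def region_percentages(region_counts: dict[str, int]) -> dict[str, float]:
    """Share of mapped reads per region (e.g. exonic, intronic, mitochondrial, intergenic)."""
    total = sum(region_counts.values())
    return {region: percent(n, total) for region, n in region_counts.items()}


def mean_base_quality(quality_strings: Iterable[str], offset: int = 33) -> float:
    """Average Phred score over all bases; Phred+33 encoding assumed by default."""
    total = 0
    n_bases = 0
    for qual in quality_strings:
        total += sum(ord(ch) - offset for ch in qual)
        n_bases += len(qual)
    return total / n_bases if n_bases else 0.0
```

With retained reads as the denominator, the unique, multi-mapped, and unmapped percentages sum to 100. Dividing by input reads would work just as well if the pipeline reports them that way.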

Last updated July 2017