RNA-Seq Data Quality Control
Maintained by Stephen Fisher | License (pdf)
This document describes the quality control metrics tracked by the Kim and Eberwine labs for next-generation sequencing samples.
- Contamination:
- Percent BLAST hits to non-target species. To compute this metric, we randomly select 5,000 reads from each sample and BLAST them against the 'nr' BLAST database. We report the total number of hits, the number of hits to the target species, and the number of hits to each of the primary non-target species. This has proven particularly helpful in identifying novel sources of contamination.
- Sample preparation issues: These metrics are typically available from the output of the primary analysis pipeline.
- Average base quality.
- Percent reads trimmed.
- Percent reads discarded due to trimming.
- Percent reads uniquely mapped.
- Percent reads multi-mapped.
- Percent reads not mapped.
- Percent reads mapped to exonic, intronic, mitochondrial, and intergenic regions.
- Average length of the reads mapped.
- Number of spike-in reads.
- Number of reads missing mates.
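The contamination metric above can be sketched in Python as follows. This is a minimal illustration and not the labs' actual pipeline: the function names (`read_fastq`, `sample_reads`, `summarize_hits`) are hypothetical, the input is assumed to be well-formed four-line FASTQ, and the BLAST search itself is left out. The sketch assumes that some separate step has already produced, for each sampled read, the species of its best hit (or `None` when there was no hit), and that the percentage is taken over reads with a hit.

```python
import random
from collections import Counter


def read_fastq(lines):
    """Yield (header, sequence) pairs from an iterable of FASTQ lines.

    Assumes four lines per record; an incomplete trailing record is dropped.
    """
    it = iter(lines)
    for header, seq, _plus, _qual in zip(it, it, it, it):
        yield header.strip(), seq.strip()


def sample_reads(records, k=5000, seed=0):
    """Reservoir-sample up to k records in one pass over the input.

    If fewer than k records exist, all of them are returned in order.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


def summarize_hits(hit_species, target):
    """Tally hits by species.

    hit_species: one entry per sampled read, either a species name or None
    (no hit). Assumption: the percentage uses reads with a hit as the
    denominator.
    """
    counts = Counter(s for s in hit_species if s is not None)
    total = sum(counts.values())
    target_hits = counts.pop(target, 0)
    non_target = total - target_hits
    percent = 100.0 * non_target / total if total else 0.0
    return {
        "total_hits": total,
        "target_hits": target_hits,
        "percent_non_target": percent,
        "non_target_by_species": counts.most_common(),
    }
```

In use, the sampled sequences would be written to a FASTA file, searched against 'nr', and the per-read species labels passed to `summarize_hits` together with the target species name. Reservoir sampling is chosen here only because it avoids loading a whole FASTQ file into memory; any uniform random selection of reads would serve the same purpose.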
Last updated July 2017