Penn SCAP-T Pipeline: Documentation

Module: RMDUP

This module removes duplicate reads from FASTQ files.

Usage:
    ngs.sh rmdup [-i inputDir] [-se] sampleID
Input:
    sampleID/inputDir/unaligned_1.fq
    sampleID/inputDir/unaligned_2.fq (paired-end reads)
Output:
    sampleID/rmdup/unaligned_1.fq
    sampleID/rmdup/unaligned_2.fq (paired-end reads)
    sampleID/rmdup/sampleID.rmdup.stats.txt
Requires:
    removeDuplicates.py
Options:
    -i inputDir - directory under sampleID containing the input files (default: init).
    -se - single-end reads (default: paired-end).
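The usage line and file layout above can be summarized in a short Python sketch. `rmdup_command` and `rmdup_paths` are hypothetical helpers written for illustration only; they are not part of the pipeline. They assume that paths are relative to the directory containing `sampleID`, and that with `-se` only the `_1` files are read and written. `S1` is a placeholder sample ID.

```python
import posixpath
from typing import Dict, List, Optional


def rmdup_command(sample_id: str, input_dir: Optional[str] = None,
                  single_end: bool = False) -> List[str]:
    """Build an `ngs.sh rmdup` argument list following the documented usage line."""
    cmd = ["ngs.sh", "rmdup"]
    if input_dir is not None:
        cmd += ["-i", input_dir]  # omit -i to fall back to the default (init)
    if single_end:
        cmd.append("-se")
    cmd.append(sample_id)
    return cmd


def rmdup_paths(sample_id: str, input_dir: str = "init",
                single_end: bool = False) -> Dict[str, List[str]]:
    """List the files rmdup reads and writes (assumed layout, relative paths)."""
    mates = ["1"] if single_end else ["1", "2"]  # assumption: -se uses only _1
    inputs = [posixpath.join(sample_id, input_dir, f"unaligned_{m}.fq") for m in mates]
    outputs = [posixpath.join(sample_id, "rmdup", f"unaligned_{m}.fq") for m in mates]
    outputs.append(posixpath.join(sample_id, "rmdup", f"{sample_id}.rmdup.stats.txt"))
    return {"inputs": inputs, "outputs": outputs}


print(rmdup_command("S1", single_end=True))  # ['ngs.sh', 'rmdup', '-se', 'S1']
print(rmdup_paths("S1")["outputs"])
# ['S1/rmdup/unaligned_1.fq', 'S1/rmdup/unaligned_2.fq', 'S1/rmdup/S1.rmdup.stats.txt']
```

Building the command as an argument list rather than a single string avoids shell-quoting problems if it is later passed to something like `subprocess.run`.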

Reads are considered duplicates if they match exactly. For paired-end reads, both mates must match exactly for the pair to be considered a duplicate. This step is very memory intensive and can require up to three times the input file size in RAM (e.g., if your FASTQ files total 20 GB, removing duplicates may use up to 60 GB of RAM).
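The matching rule can be illustrated with a minimal in-memory Python sketch. This is not the code of `removeDuplicates.py`. It assumes that reads "match" when their sequence lines are identical, that both FASTQ files list mates in the same order, and that each record spans exactly four lines. The function names `read_fastq`, `remove_duplicates`, and `write_fastq` are hypothetical.

```python
import io
from typing import Dict, Iterator, List, Optional, TextIO, Tuple

# One FASTQ record: (header, sequence, separator, quality).
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield FASTQ records, assuming each record spans exactly four lines."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline()
        sep = handle.readline()
        qual = handle.readline()
        if not qual:
            raise ValueError("truncated FASTQ record: " + header.strip())
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               sep.rstrip("\n"), qual.rstrip("\n"))


def remove_duplicates(
    reads1: Iterator[Record], reads2: Optional[Iterator[Record]] = None
) -> Tuple[List[Record], List[Record], Dict[str, int]]:
    """Keep the first occurrence of each read (or read pair); drop exact repeats.

    Assumption: a duplicate means identical sequence lines. For paired-end
    input both mates must match, so the key is the (mate 1, mate 2) pair.
    """
    seen = set()  # every distinct key stays in memory for the whole run
    kept1: List[Record] = []
    kept2: List[Record] = []
    total = 0
    if reads2 is None:
        pairs = ((r1, None) for r1 in reads1)
    else:
        pairs = zip(reads1, reads2)  # assumes mates are in the same order
    for r1, r2 in pairs:
        total += 1
        key = r1[1] if r2 is None else (r1[1], r2[1])
        if key in seen:
            continue
        seen.add(key)
        kept1.append(r1)
        if r2 is not None:
            kept2.append(r2)
    stats = {"total": total, "unique": len(kept1), "duplicates": total - len(kept1)}
    return kept1, kept2, stats


def write_fastq(records: List[Record], handle: TextIO) -> None:
    """Write records back out as four-line FASTQ entries."""
    for rec in records:
        handle.write("\n".join(rec) + "\n")


# Usage: pairs a and b are duplicates; pair c differs in mate 1.
r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nACGT\n+\nIIII\n@c/1\nTTTT\n+\nIIII\n")
r2 = io.StringIO("@a/2\nGGGG\n+\nIIII\n@b/2\nGGGG\n+\nIIII\n@c/2\nGGGG\n+\nIIII\n")
kept1, kept2, stats = remove_duplicates(read_fastq(r1), read_fastq(r2))
print(stats)  # {'total': 3, 'unique': 2, 'duplicates': 1}
```

Because every distinct sequence (or sequence pair) is held in a set, and the retained records are held in lists until written, memory use in this sketch grows with input size. That is consistent with the RAM requirement noted above; the factor of three is the module's stated bound, not something this sketch reproduces.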