Usage:
    ngs.sh rmdup [-i inputDir] [-se] sampleID
Input:
    sampleID/inputDir/unaligned_1.fq
    sampleID/inputDir/unaligned_2.fq (paired-end reads)
Output:
    sampleID/rmdup/unaligned_1.fq
    sampleID/rmdup/unaligned_2.fq (paired-end reads)
    sampleID/rmdup/sampleID.rmdup.stats.txt
Requires:
    removeDuplicates.py
Options:
    -i inputDir - location of source files (default: init).
    -se - single-end reads (default: paired-end).
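The usage line and options above map onto a fixed directory layout. The Python sketch below mirrors that mapping for illustration only; it is not part of ngs.sh, the helper names `parse_rmdup_args` and `rmdup_paths` are hypothetical, and it assumes that a single-end run reads and writes only the `_1` file.

```python
import argparse
from pathlib import Path
from typing import List, Sequence, Tuple


def parse_rmdup_args(argv: Sequence[str]) -> argparse.Namespace:
    """Parse arguments shaped like `ngs.sh rmdup [-i inputDir] [-se] sampleID`."""
    parser = argparse.ArgumentParser(prog="ngs.sh rmdup")
    # -i: subdirectory of sampleID/ holding the source FASTQ files
    parser.add_argument("-i", dest="input_dir", default="init")
    # -se: single-end reads; without it, paired-end input is assumed
    parser.add_argument("-se", dest="single_end", action="store_true")
    parser.add_argument("sample_id")
    return parser.parse_args(list(argv))


def rmdup_paths(sample_id: str, input_dir: str = "init",
                single_end: bool = False) -> Tuple[List[Path], List[Path]]:
    """Return (inputs, outputs) following the documented directory layout."""
    # Assumption (hypothetical helper): single-end runs use only the _1 file
    mates = ["1"] if single_end else ["1", "2"]
    base = Path(sample_id)
    inputs = [base / input_dir / f"unaligned_{m}.fq" for m in mates]
    outputs = [base / "rmdup" / f"unaligned_{m}.fq" for m in mates]
    outputs.append(base / "rmdup" / f"{sample_id}.rmdup.stats.txt")
    return inputs, outputs
```

For example, `rmdup_paths("S1", "init", single_end=True)` yields the input `S1/init/unaligned_1.fq` and the outputs `S1/rmdup/unaligned_1.fq` and `S1/rmdup/S1.rmdup.stats.txt`.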

Remove duplicate reads. Reads are considered duplicates only if they match exactly. For paired-end reads, both mates of a pair must match exactly for the pair to be considered a duplicate. This step is very RAM-intensive and may use up to three times the total input file size in RAM (e.g. if your FASTQ files total 20 GB, removing duplicates may use up to 60 GB of RAM).
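The exact-match rule can be illustrated with a minimal Python sketch. It is not the code of removeDuplicates.py: the function names `read_fastq` and `remove_duplicates` are hypothetical, reads are assumed to be compared on their sequence lines only (headers and quality strings ignored), and the two mate files are assumed to list pairs in the same order. Keeping one key per distinct read or pair in an in-memory set is one plausible reason memory use grows with input size.

```python
import io
from typing import Iterator, Optional, TextIO, Tuple

FastqRecord = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records: (header, sequence, '+' line, quality)."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        yield header, handle.readline(), handle.readline(), handle.readline()


def remove_duplicates(r1: TextIO, out1: TextIO,
                      r2: Optional[TextIO] = None,
                      out2: Optional[TextIO] = None) -> Tuple[int, int]:
    """Write the first copy of each read (or pair); return (total, kept).

    Assumption: reads are compared on their sequence lines only; headers
    and quality strings are ignored.
    """
    # Paired-end: walk both files in lockstep (assumes equal record counts)
    if r2 is None:
        pairs = ((rec, None) for rec in read_fastq(r1))
    else:
        pairs = zip(read_fastq(r1), read_fastq(r2))

    seen = set()  # one key per distinct read/pair, all held in RAM
    total = kept = 0
    for rec1, rec2 in pairs:
        total += 1
        # For pairs, the key covers both mates, so both must match
        key = (rec1[1], rec2[1] if rec2 is not None else None)
        if key in seen:
            continue
        seen.add(key)
        kept += 1
        out1.write("".join(rec1))
        if rec2 is not None and out2 is not None:
            out2.write("".join(rec2))
    return total, kept


# Usage: pairs a and b are identical; c shares mate 1 but not mate 2
r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nACGT\n+\nIIII\n@c/1\nACGT\n+\nIIII\n")
r2 = io.StringIO("@a/2\nTTTT\n+\nIIII\n@b/2\nTTTT\n+\nIIII\n@c/2\nGGGG\n+\nIIII\n")
out1, out2 = io.StringIO(), io.StringIO()
print(remove_duplicates(r1, out1, r2, out2))  # (3, 2)
```

Because the pair key contains both mates, pair c is kept even though its first mate matches pair a. The format of sampleID.rmdup.stats.txt is not described here, so the sketch only returns counts.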

Software Repository
Kim Laboratory
Computational Evolutionary Biology

Penn SCAP-T Pipeline: Documentation

Module: RMDUP
