Penn SCAP-T Pipeline: Documentation

Module: INIT

This module extracts the raw read file(s) from a repository and prepares them for processing.

Usage:
    ngs.sh init [-i inputDir] [-se] sampleID
Input:
    inputDir/sampleID/*_R1_*.gz
    inputDir/sampleID/*_R2_*.gz (paired-end reads)
Output:
    sampleID/init/unaligned_1.fq
    sampleID/init/unaligned_2.fq (paired-end reads)
Options:
    -i - parent directory containing the sample-specific subdirectory with the compressed fastq files (default: ./raw). The sampleID is appended to complete the directory path (i.e., inputDir/sampleID).
    -se - single-end reads (default: paired-end)

By default, this module expects the directory './raw/sampleID' to contain the demultiplexed reads for the sample called sampleID. The demultiplexed reads must be gzipped. Files containing the first reads must include '_R1_' in their filenames, and files containing the second reads must include '_R2_' (note the placement of the underscores). If inputDir is given, the read files are expected to reside in 'inputDir/sampleID'.
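The layout described above can be checked before running the module. The following is a minimal sketch, not part of ngs.sh: the function names has_match and check_init_input are hypothetical, and the third argument ('true' for single-end) is an assumption made for illustration, standing in for the -se flag.

```shell
# Hypothetical helpers (not part of ngs.sh) that check the input layout
# described above: inputDir/sampleID/*_R1_*.gz and, for paired-end
# data, inputDir/sampleID/*_R2_*.gz.

# Succeeds if at least one argument is an existing path. An unmatched
# glob is passed through literally, so the -e test then fails.
has_match() (
    for f in "$@"; do
        [ -e "$f" ] && exit 0
    done
    exit 1
)

# Usage: check_init_input inputDir sampleID [true|false]
# Third argument: 'true' for single-end reads (default: paired-end).
check_init_input() (
    input_dir="$1"
    sample_id="$2"
    single_end="${3:-false}"
    sample_dir="$input_dir/$sample_id"

    if [ ! -d "$sample_dir" ]; then
        echo "missing directory: $sample_dir" >&2
        exit 1
    fi
    if ! has_match "$sample_dir"/*_R1_*.gz; then
        echo "no *_R1_*.gz files in $sample_dir" >&2
        exit 1
    fi
    if [ "$single_end" != "true" ] && ! has_match "$sample_dir"/*_R2_*.gz; then
        echo "no *_R2_*.gz files in $sample_dir" >&2
        exit 1
    fi
    exit 0
)
```

For example, 'check_init_input ./raw sampleID' returns a non-zero status if './raw/sampleID' is missing or lacks the expected '_R1_' or '_R2_' files.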

The module uncompresses the raw files and places them in the directory './sampleID/init'. Output files are named 'unaligned_1.fq' (first reads) and 'unaligned_2.fq' (second reads). For single-end reads, only 'unaligned_1.fq' is generated.
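The effect of this step can be approximated in a few lines of shell. This is only a sketch of the behaviour described above, not the actual ngs.sh implementation: the function name init_sketch is hypothetical, and it assumes that multiple matching files are concatenated in glob (alphabetical) order, which the documentation does not state explicitly.

```shell
# Hypothetical sketch (not the ngs.sh source) of what the INIT module is
# described as doing: decompress the gzipped reads into sampleID/init.
# Usage: init_sketch inputDir sampleID [true|false]
# Third argument: 'true' for single-end reads (default: paired-end).
init_sketch() (
    input_dir="$1"
    sample_id="$2"
    single_end="${3:-false}"
    out_dir="$sample_id/init"

    mkdir -p "$out_dir" || exit 1

    # gzip -dc writes decompressed data to stdout. Assumption: when
    # several files match, they are concatenated in glob order.
    gzip -dc "$input_dir/$sample_id"/*_R1_*.gz > "$out_dir/unaligned_1.fq" || exit 1

    if [ "$single_end" != "true" ]; then
        gzip -dc "$input_dir/$sample_id"/*_R2_*.gz > "$out_dir/unaligned_2.fq" || exit 1
    fi
)
```

Calling 'init_sketch ./raw sampleID' from the working directory would produce 'sampleID/init/unaligned_1.fq' and 'sampleID/init/unaligned_2.fq'; passing 'true' as the third argument skips the second file.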