Penn SCAP-T Pipeline: Documentation

Module: PIPELINE

This module will run a single dataset through the various analysis modules, depending on the selected data type (RNASeq or WGS). The modules used are hardcoded in the ngs_PIPELINE.sh file.

Usage:
    ngs.sh pipeline [-i inputDir] [-o outputDir] [-t RNASeq | RNASeq_Stranded | RNASeq_Human | WGS] -p numProc -s species [-se] sampleID
Input:
    see individual commands
Output:
    see individual commands
Requires:
    see individual commands
OPTIONS:
    -i inputDir - parent directory containing the sample-specific subdirectory of compressed fastq files (default: ./raw). The sampleID is used to complete the directory path (i.e., inputDir/sampleID).
    -o outputDir - parent directory containing the sample-specific subdirectory of analysis files (default: ./analyzed). The sampleID is used to complete the directory path (i.e., outputDir/sampleID).
    -t type - RNASeq, RNASeq_Stranded, RNASeq_Human, or WGS (Whole Genome Sequencing) (default: RNASeq). RNASeq_Stranded assumes stranded reads for HTSeq counting and will generate intron counts. RNASeq_Human is the same as RNASeq_Stranded but also uses 'gene_name' for the name of the features in the HTSeq GTF file.
    -p numProc - number of CPUs to use.
    -s species - species from repository: /lab/repo/resources.
    -se - single-end reads (default: paired-end)
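
As an illustration of how these options combine, the sketch below assembles a pipeline command line from the documented defaults and shows how the sampleID completes a directory path. The helper names build_pipeline_cmd and sample_dir, the sample ID sample_001, and the processor count are hypothetical. The script only prints the command; it does not run ngs.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ngs.sh command line for one sample.
# It does not invoke ngs.sh; the defaults mirror the OPTIONS list above.
build_pipeline_cmd() {
    local sample_id="$1"
    local species="$2"
    local num_proc="$3"
    local type="${4:-RNASeq}"            # documented default for -t
    local input_dir="${5:-./raw}"        # documented default for -i
    local output_dir="${6:-./analyzed}"  # documented default for -o

    echo "ngs.sh pipeline -i $input_dir -o $output_dir -t $type -p $num_proc -s $species $sample_id"
}

# Hypothetical helper: the sampleID completes the path under the parent directory.
sample_dir() {
    local parent_dir="$1"
    local sample_id="$2"
    echo "$parent_dir/$sample_id"
}

build_pipeline_cmd sample_001 hg38.gencode21.stranded 8
sample_dir ./raw sample_001
```

Passing a fourth argument such as WGS or RNASeq_Stranded to build_pipeline_cmd would change the -t value in the printed command; single-end data would additionally need -se before the sampleID, which this sketch leaves out.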

The pipeline processes sequencing data using either the RNASeq or the WGS (Whole Genome Sequencing) module sequence. For RNASeq, the modules used are: init, fastqc, blast, trim, star, post, htseq, blastdb, and rsync. For WGS, the modules used are: init, fastqc, blast, trim, bowtie, SPAdes, post, and rsync. See the individual modules for documentation.

RNASeq modules and arguments (in order):

INIT
FASTQC
BLAST
TRIM -m 20 -q 53 -rAT 26 -rN -c $REPO_LOCATION/trim/contaminants.fa
FASTQC -i trim -o fastqc.trim
STAR
HTSEQ
 * if species = "hg38.gencode21.stranded" then [-stranded -introns]
POST
RSYNC
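
The ordered list above can be read as a simple sequence of module invocations. The following is a rough sketch of that sequence, not the actual contents of ngs_PIPELINE.sh: the function name rnaseq_steps is hypothetical, and it only prints each step with its documented arguments instead of executing any module.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the RNASeq module sequence for a given species.
rnaseq_steps() {
    local species="$1"
    local htseq_args=""

    # Per the note in the list, this species adds extra HTSEQ flags.
    if [ "$species" = "hg38.gencode21.stranded" ]; then
        htseq_args=" -stranded -introns"
    fi

    echo "INIT"
    echo "FASTQC"
    echo "BLAST"
    # Single quotes keep $REPO_LOCATION literal; it is not expanded here.
    echo 'TRIM -m 20 -q 53 -rAT 26 -rN -c $REPO_LOCATION/trim/contaminants.fa'
    echo "FASTQC -i trim -o fastqc.trim"
    echo "STAR"
    echo "HTSEQ${htseq_args}"
    echo "POST"
    echo "RSYNC"
}

rnaseq_steps hg38.gencode21.stranded
```

For any other species value the HTSEQ line is printed without extra flags, which matches the conditional noted in the list.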