RNA-Seq Data Repository

Maintained by Stephen Fisher. License (pdf)

This document describes the storage structure used by the Kim and Eberwine labs to store NGS samples.

  • Each NGS run is called a "sequencingRun" or "run" (e.g. all lanes in an Illumina flow cell)
  • Each sequencingRun is uniquely labeled (e.g. with an incremental number or a UUID) and put in its own directory
  • SequencingRun directories contain the following subdirectories:
    • raw: This contains demultiplexed and otherwise unprocessed fastq files. After demultiplexing, each sample has one or more fastq files. A subdirectory is created for each sample, using the sample name as the subdirectory name, and the sample-specific fastq files are compressed with gzip and placed into it. The fastq files containing the first read are expected to have the string "_R1" in their file names. The files containing the second read must have the string "_R2" in their file names.
    • analyzed: This contains the output files from the primary analysis pipeline. As with the raw subdirectory, here too a subdirectory is created for each sample using the sample name as the subdirectory name. Each stage of the analysis pipeline gets its own subdirectory in the respective sample subdirectory.
    • src: This contains the 'source' data supplied by the sequencing center. This might include BCL files and demultiplexing configuration files. Any sequencing-specific data files are stored here.
    • info: This contains any sequencing-related files that don't fit into the other directories. Examples include descriptive files from upstream sample preparation, such as bioanalyzer traces and images.
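
The layout described above can be sketched in a few lines of Python. This is only an illustration and not the labs' actual tooling: the function names `create_run_layout` and `list_reads`, the run label `run0001`, and the sample and file names are all hypothetical. The only details taken from this document are the four subdirectory names, the per-sample subdirectories under raw and analyzed, the gzip compression, and the "_R1"/"_R2" file name markers.

```python
import gzip
import tempfile
from pathlib import Path

# The four subdirectories every sequencingRun directory contains.
RUN_SUBDIRS = ("raw", "analyzed", "src", "info")


def create_run_layout(repo_root, run_id, sample_names):
    """Create a run directory with its four subdirectories, plus one
    per-sample subdirectory under raw/ and analyzed/ (hypothetical helper)."""
    run_dir = Path(repo_root) / str(run_id)
    for sub in RUN_SUBDIRS:
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    for sample in sample_names:
        # Sample names double as subdirectory names.
        (run_dir / "raw" / sample).mkdir(exist_ok=True)
        (run_dir / "analyzed" / sample).mkdir(exist_ok=True)
    return run_dir


def list_reads(run_dir, sample):
    """Group a sample's gzipped fastq files into first and second reads,
    using the "_R1" and "_R2" markers in the file names."""
    sample_dir = Path(run_dir) / "raw" / sample
    files = sorted(p.name for p in sample_dir.glob("*.gz"))
    return {
        "R1": [name for name in files if "_R1" in name],
        "R2": [name for name in files if "_R2" in name],
    }


if __name__ == "__main__":
    # Build a throwaway run in a temporary directory and add tiny
    # placeholder fastq files for one paired-end sample.
    with tempfile.TemporaryDirectory() as root:
        run_dir = create_run_layout(root, "run0001", ["sampleA", "sampleB"])
        for read in ("R1", "R2"):
            path = run_dir / "raw" / "sampleA" / f"sampleA_{read}.fastq.gz"
            with gzip.open(path, "wt") as handle:
                handle.write("@read1\nACGT\n+\nIIII\n")
        print(list_reads(run_dir, "sampleA"))
```

Per-stage subdirectories inside each analyzed sample directory are left out on purpose, since the stage names depend on the analysis pipeline in use.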

The file structure is illustrated in the image below.


Last updated July 2017