Problem with Sniper v1.6.1:
My first run with Sniper v1.5.9 went smoothly with 19-anc1 and 19-27. I successfully mapped and called SNPs to the S288C r64 reference genome.
I tried different sets of flags just to see how it responded and came up with this fine-tuning setting:
* -a -O --index -k 2 -d 10 -x 4 --bowtie /home/hoagiang/bin/bowtie-0.12.7 --kb --qual33 --minins 100 --maxins 500 (mapping via bowtie)
* --nosort -s --saveass -Q 40 --fwer 0.01,100 -x 4 (calling SNPs)
Summary stats have been achieved via those two runs:
* mapping: ~75% reads, perfect:mismatch=0.6:0.4, multiplicity=1.1-1.2
* SNPs: ~60k unique SNPs, homozygous=95%, heterozygous=5%
Shortly later, I upgraded Sniper to v1.6.1 to fix the bugs (insert size, multi-processors). However, new runs with 19-anc2 and YPS2073 posed serious problems:
* mapping: down to 63% reads, perfect:mismatch=0.4:0.6, multiplicity=1.1-1.2
* SNPs: 2-3k SNPs, mostly non-unique.
I then checked the quality of these samples. Accidentally, both samples are from the same lane 5, and has bad quality score at base 3 and 11. I expected this to affect mapping rate but not number of SNPs.
Going further in my investigation, I found that the total number of hypotheses has down enormously from ~11M to just 0.5M, which explained the small number of significant SNPs and the non-unique issue. Many loci has missed in the "mother" files.
I went back and ran Sniper in different ways: re-genotyping with different stringency, re-indexing then re-genotyping, with failed and succeeded samples (19-anc1, 19-anc2). Same old results.
I also trimmed the first 11 bases in the 19-anc2, ran Sniper v1.6.1, but yielded the same result.
One last chance is to downgrade to Sniper v1.5.9. And it works (by simply looking at the number of hypotheses).
I'm personally thinking there is something with indexing. So, I'm trying now to re-index with v1.5.9 and then call SNPs.
So far, these samples are mapped with v1.5.9: 19-anc1, 19-27, YPS2055, 2060, 2066, 2067, 2079, 3060.
19-anc2, 19-7, 19-8 and 2073 were first mapped with v1.6.1 (in mapping_v1.6.1/), later mapped with v1.5.9 (in mapping_v1.5.9/)
Updated Sniper setting (using v1.5.9):
* -a -O --index -k 2 -d 20 -x 4 --bowtie /home/hoagiang/bin/bowtie-0.12.7 --kb --qual33 --minins 100 --maxins 500 (mapping via bowtie)
* --nosort -s --saveass -Q 40 --fwer 0.01,100 (calling SNPs)
* to redo significance filtering: -o ???/ -i ~/read_maps_all/ -id ??? --resig --fwer 0.05,100
The FLAG field in SAM files: how to interprete it?
Filtering out unmapped can be done easily with SAMtools when combining with different flags. For example, one can run "samtools view -F 4" to remove all the unmapped reads out of the SAM files. One can go further by only extracting the reads mapping to specific strand, running with "-f 16". This is the FLAG field in the SAM format. However, I figure it may have a value of zero. I'm looking around and seems like not clear explanation what zero means in the mapping. The consensus is it's a mapped read to the forward strand, while 16 means a mapped read to the reverse strand. And keep in mind that it's NOT working with the "-F/f" option when you run SAMtools.
Simple way to count the number of unique reads from a SAM file
cat reads.sam | cut -f 3 | sort | uniq -c
How to compile Kent source utilities in the cluster: http://iamphioxus.org/2012/05/17/build-ucsc-kent-utilities-on-rice-sugr-cluster/
UCSC whole genome alignment pipeline: http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto
Popular commands for our cluster: slogin; cup.pl; top; yppasswd; history | grep vnc; ps aux | grep Xvnc; vncserver -geometry 1600x1024 :13; vncserver -kill :13; export DISPLAY=kimclust15:13;