-
Notifications
You must be signed in to change notification settings - Fork 0
Illumina (and SOLiD) sensitive read mapping tool (cloned from svn://scm.gforge.inria.fr/svnroot/storm/, original code from @marta- , with some work done by @yoann-dufresne)
laurentnoe/storm
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
SToRM - Seed-based Read Mapping tool for SOLiD or Illumina sequencing data [ABOUT] SToRM is a read mapping tool designed first for the SOLiD technology, but also available now for the classical Illumina technology. First, SToRM is based on seeding techniques adapted to the statistical characteristics of reads: the seeds and the alignment algorithms are designed to comply with the properties of the SOLiD color encoding or Illumina substitution model as well as the observed reading error distribution along the read. More information at http://bioinfo.cristal.univ-lille.fr/storm/storm.php [CONTENTS OF THIS ARCHIVE] README Makefile *.c, *.h - source code seeds/ - several seed families adapted for SOLiD/Illumina reads [COMPILING THE SOURCES] Requirements: gcc (>=4.2), OpenMP support, Linux or MacOS Command line: "make" / "make storm-color" / "make storm-nucleotide" (multithreaded) [RUNNING SToRM] Command line: "./storm -M 1 -g <path_2_reference.fas> -r <path_2_reads.fastq> [opts]" "./storm -h" for full parameter list A classic command line: "./storm-color -M 1 -g reference.fas -r reads.cs -o out.sam" "./storm-nucleotide -M 1 -g reference.fas -r reads.fq -o out.sam" Advanced command line: for "faster" /or/ "more sensitive" mapping : [0] obviously, use -M <int> to have all mapable reads being mapped !! Otherwise, only the best read is kept per reference position, which implies that not all the reads (duplicate, lower quality ones at the same reference position) are mapped. [1] use "larger" /or/ "smaller" scoring thresholds ... -z 150 -t 150 -z 120 -t 120 -z 100 -t 100 ... By default, setting the same value for both -z/-t is a good idea, unless you have ">15 indels" or "SOLiD color correction" reasons. ... -z 105 -t 120 -z 85 -t 100 -z 75 -t 90 ... note that E-value "may be somehow" checked with the ALP tool: http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html_ncbi/html/software/program.html?uid=6 [2] "increase" /or/ "decrease" the number of indels : ... -i 1 -i 3 (default value) -i 7 -i 15 for full SIMD support. ... -i 20 ... for final alignment support only : in that case, the SIMD code generates a warning explaining that it will compute at most 15 diagonals : you can thus change the SIMD threshold -z <int> to a smaller value than the final alignment threshold -t <int>. (see [1] before) [3] use "less" /or/ "more" seeds of "large" /or/ "small" weight: ... -s "####-##-#-####;###-#--##-##--###;####-#---#----#--####" -s "##-##-###-###;###--#-#--#--####;####--#--#---#---### ... note first that iedera can help you in the seed design : http://bioinfo.cristal.univ-lille.fr/yass/iedera.php ./iedera -spaced -n 3 -w 11,11 -s 11,22 -r 10000 -k -l 40 ./iedera -spaced -n 3 -w 10,10 -s 10,20 -r 10000 -k -l 40 => you can send requests to Laurent Noé for specific seeding note also that "chain processing" can be now applied in SToRM : unmmaped reads from the first step can be mapped in extra steps ... "./storm-nucleotide -M 1 -s <... big seed(s) ...> -g ref.fas -r reads.fq -o out.sam -O reads2.fq" "./storm-nucleotide -M 1 -s <... small seed(s) ...> -g ref.fas -r reads2.fq -o out2.sam -O reads3.fq" ... e.g. an example for single seed: "###-##--#-##-#-###" (weight 12) -then-> "####-#-#--##-###" (weight 11) -then-> "#####-##-###" (weight 10) theses 3 seeds are generated with : ./iedera -spaced -l 50 -w 12,12 -s 12,24 -r 10000 -k -> "###-##--#-##-#-###" ./iedera -spaced -l 45 -w 11,11 -s 11,22 -r 10000 -k -mx "###-##--#-##-#-###" -> "####-#-#--##-###" ./iedera -spaced -l 40 -w 10,10 -s 10,20 -r 10000 -k -mx "###-##--#-##-#-###,####-#-#--##-###" -> "#####-##-###" [4] "use" the -l option with smaller value (e.g. -l 10) /or/ "use" with bigger value (e.g. -l 1000)/"disable" with "<= 0" value [5] check your quality and change the -b 33 to -b 0 to disable the score scaling algorithm, or to -b 64 (for Illumina 1.5 to 1.7) accordingly ... [READING THE RESULT] SToRM generates its output in the SAM format. To visualise the result, we recommend use of the SAMtools package (http://samtools.sourceforge.net/ - http://www.htslib.org/). For both a SToRM output file "out.sam" and a reference file "ref.fas", the samtools command lines usually are: samtools faidx ref.fas samtools view -bT ref.fas out.sam > out.bam ## ## Only when needed, merge several bam files into one single : samtools merge -f out.bam out{1}.bam out{2}.bam ... out{42}.bam ## ## Only if the "-M <int>" option is set: the "out.sam"/"out.bam" ## are "read sorted" and not "ref sorted" in that case, so : samtools sort out.bam -o out_sorted.bam samtools index out_sorted.bam samtools tview out_sorted.bam ref.fas Please refer to the SAMtools documentation for more usage details.
About
Illumina (and SOLiD) sensitive read mapping tool (cloned from svn://scm.gforge.inria.fr/svnroot/storm/, original code from @marta- , with some work done by @yoann-dufresne)
Topics
Resources
Stars
Watchers
Forks
Packages 0
No packages published