GitHub - laurentnoe/storm: Illumina (and SOLiD) sensitive read mapping tool (cloned from svn://scm.gforge.inria.fr/svnroot/storm/, original code from @marta- , with some work done by @yoann-dufresne)

laurentnoe / storm Public

Notifications You must be signed in to change notification settings
Fork 0
Star 1

Illumina (and SOLiD) sensitive read mapping tool (cloned from svn://scm.gforge.inria.fr/svnroot/storm/, original code from @marta- , with some work done by @yoann-dufresne)

bioinfo.univ-lille.fr/storm/

1 star 0 forks Branches Tags Activity

Star

Notifications

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 110 Commits
documentation		documentation
seeds		seeds
CMakeLists.txt		CMakeLists.txt
Makefile		Makefile
Makefile.avx2		Makefile.avx2
Makefile.avx2.icc		Makefile.avx2.icc
Makefile.avx512bw.icc		Makefile.avx512bw.icc
Makefile.sse		Makefile.sse
Makefile.sse.icc		Makefile.sse.icc
Makefile.sse2		Makefile.sse2
Makefile.sse2.icc		Makefile.sse2.icc
Makefile.win32.sse2		Makefile.win32.sse2
Makefile.win64.sse2		Makefile.win64.sse2
README		README
alignment.c		alignment.c
alignment.h		alignment.h
alignment_simd.c		alignment_simd.c
alignment_simd.h		alignment_simd.h
codes.c		codes.c
codes.h		codes.h
doxygenconfig		doxygenconfig
genome_map.c		genome_map.c
genome_map.h		genome_map.h
hit_map.c		hit_map.c
hit_map.h		hit_map.h
index.c		index.c
index.h		index.h
load_data.c		load_data.c
load_data.h		load_data.h
main.c		main.c
read_quality.c		read_quality.c
read_quality.h		read_quality.h
reads_data.c		reads_data.c
reads_data.h		reads_data.h
reference_data.c		reference_data.c
reference_data.h		reference_data.h
run.c		run.c
run.h		run.h
seed.c		seed.c
seed.h		seed.h
stats.c		stats.c
stats.h		stats.h
util.c		util.c
util.h		util.h

Repository files navigation

SToRM - Seed-based Read Mapping tool for SOLiD or Illumina sequencing data

[ABOUT]

 SToRM is a read mapping tool designed first for the SOLiD technology, but
 also available now for the classical Illumina technology. First, SToRM is 
 based on seeding techniques adapted to the statistical characteristics of
 reads: the seeds and the alignment algorithms are designed to comply with
 the properties of the SOLiD color encoding or Illumina substitution model 
 as well as the observed reading error distribution along the read.

 More information at  http://bioinfo.cristal.univ-lille.fr/storm/storm.php


[CONTENTS OF THIS ARCHIVE]

    README
    Makefile
    *.c, *.h - source code
    seeds/   - several seed families adapted for SOLiD/Illumina reads


[COMPILING THE SOURCES]

 Requirements: 
    gcc (>=4.2), OpenMP support, Linux or MacOS

 Command line: 
    "make" / "make storm-color" / "make storm-nucleotide" (multithreaded)



[RUNNING SToRM]

  Command line: 
    "./storm -M 1 -g <path_2_reference.fas> -r <path_2_reads.fastq> [opts]"
    "./storm -h" for full parameter list

   A classic command line:
    "./storm-color      -M 1 -g reference.fas -r reads.cs -o out.sam"
    "./storm-nucleotide -M 1 -g reference.fas -r reads.fq -o out.sam"

  Advanced command line: 

    for "faster" /or/ "more sensitive" mapping :

     [0] obviously, use -M <int> to have all mapable reads being mapped !!

         Otherwise,  only the best read is kept per reference position,
         which implies that not all the reads (duplicate, lower quality
         ones at the same reference position) are mapped.

     [1] use "larger" /or/ "smaller" scoring thresholds
      ...
      -z 150 -t 150
      -z 120 -t 120
      -z 100 -t 100
      ...
      By default, setting the same value for both -z/-t is a good idea,
      unless you have ">15 indels" or "SOLiD color correction" reasons.
      ...
      -z 105 -t 120
      -z 85 -t 100
      -z 75 -t 90
      ...
     
      note that E-value "may be somehow" checked with the ALP tool:
         http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html_ncbi/html/software/program.html?uid=6


     [2] "increase" /or/ "decrease" the number of indels :
      ...
      -i 1
      -i 3 (default value)
      -i 7
      -i 15 
         for full SIMD support.
      ...
      -i 20
      ...
         for final alignment support only : in that case, the SIMD code
         generates a warning explaining that it will compute at most 15
         diagonals : you can thus change the SIMD threshold -z <int> to
         a smaller value than the final alignment threshold -t <int>.
         (see [1] before)

     [3] use "less" /or/ "more" seeds of "large" /or/ "small" weight:  
      ...
      -s "####-##-#-####;###-#--##-##--###;####-#---#----#--####"
      -s "##-##-###-###;###--#-#--#--####;####--#--#---#---###
      ...

      note first that iedera can help you in the seed design :
         http://bioinfo.cristal.univ-lille.fr/yass/iedera.php
             ./iedera -spaced -n 3 -w 11,11 -s 11,22 -r 10000 -k -l 40
             ./iedera -spaced -n 3 -w 10,10 -s 10,20 -r 10000 -k -l 40
      => you can send requests to Laurent Noé for specific seeding

      note also that "chain processing" can be now applied in SToRM :
        unmmaped reads from the first step can be mapped in extra steps
        ...
        "./storm-nucleotide -M 1 -s <...   big seed(s) ...> -g ref.fas -r reads.fq -o out.sam -O reads2.fq"
        "./storm-nucleotide -M 1 -s <... small seed(s) ...> -g ref.fas -r reads2.fq -o out2.sam -O reads3.fq"
        ...

        e.g. an example for single seed:
          "###-##--#-##-#-###" (weight 12)
               -then-> 
          "####-#-#--##-###"   (weight 11)
               -then->
          "#####-##-###"       (weight 10)
            
        theses 3 seeds are generated with :

          ./iedera -spaced -l 50 -w 12,12 -s 12,24 -r 10000 -k   
             ->  "###-##--#-##-#-###" 
          ./iedera -spaced -l 45 -w 11,11 -s 11,22 -r 10000 -k -mx "###-##--#-##-#-###"
             ->  "####-#-#--##-###"
          ./iedera -spaced -l 40 -w 10,10 -s 10,20 -r 10000 -k -mx "###-##--#-##-#-###,####-#-#--##-###"
             ->  "#####-##-###"

     [4] "use" the -l option with smaller value (e.g. -l 10) /or/ "use"
          with bigger value (e.g. -l 1000)/"disable" with "<= 0" value

     [5] check your quality and change the -b 33 to -b 0 to disable the 
         score scaling algorithm, or to -b 64 (for Illumina 1.5 to 1.7) 
         accordingly ...


[READING THE RESULT]

 SToRM generates its output in the SAM format. To visualise the result, we 
 recommend use of the SAMtools package (http://samtools.sourceforge.net/ - http://www.htslib.org/).

 For both a SToRM output file "out.sam" and a reference file "ref.fas", the 
 samtools command lines usually are:


     samtools faidx ref.fas

     samtools view -bT ref.fas out.sam > out.bam

     ##
     ## Only when needed, merge several bam files into one single : 
     samtools merge -f out.bam   out{1}.bam out{2}.bam ... out{42}.bam

     ##
     ## Only if the "-M <int>" option is set: the "out.sam"/"out.bam"
     ## are "read sorted" and not "ref sorted" in that case, so : 
     samtools sort out.bam -o out_sorted.bam

     samtools index out_sorted.bam

     samtools tview out_sorted.bam ref.fas


 Please refer to the SAMtools documentation for more usage details.

About

Illumina (and SOLiD) sensitive read mapping tool (cloned from svn://scm.gforge.inria.fr/svnroot/storm/, original code from @marta- , with some work done by @yoann-dufresne)

bioinfo.univ-lille.fr/storm/