Module: Sequencing

Niema Moshiri edited this page Jul 9, 2018 · 34 revisions

The Sequencing module simulates sequencing imperfections, such as the following:

  • Sequence subsampling per individual
  • Sequencing error
  • Post-processing
  • Consensus (ambiguity, etc.)

See the source code for the interface defined by the abstract class.

List of Implementations

  • Uses ART to simulate realistic Roche 454 reads (amplicon sequencing)
  • Generates one sequencing run per sampled individual
  • Requirements:
  • Config Parameters:
    • art_454_path: The path to your art_454 executable (or simply "art_454" if it is in your PATH variable)
    • art_454_options: The command-line arguments with which to run art_454 (excluding <-A|-B>, <INPUT_SEQ_FILE>, <OUTPUT_FILE_PREFIX>, and <#_READS/#_READ_PAIRS_PER_AMPLICON>)
      • To use default settings, simply use the empty string (i.e., "")
    • art_454_amplicon_mode: The desired mode of amplicon sequencing
      • Specify "single" for single-end amplicon sequencing
      • Specify "paired" for paired-end amplicon sequencing
    • art_454_reads_pairs_per_amplicon: Number of reads (single-end) or read pairs (paired-end) per amplicon
    • out_dir: The simulation's output directory
  • Uses ART to simulate realistic Roche 454 reads (paired-end)
  • Generates one sequencing run per sampled individual
  • Requirements:
  • Config Parameters:
    • art_454_path: The path to your art_454 executable (or simply "art_454" if it is in your PATH variable)
    • art_454_options: The command-line arguments with which to run art_454 (excluding <INPUT_SEQ_FILE>, <OUTPUT_FILE_PREFIX>, <FOLD_COVERAGE>, <MEAN_FRAG_LEN>, and <STD_DEV>)
      • To use default settings, simply use the empty string (i.e., "")
    • art_454_fold_coverage: The desired fold of read coverage
    • art_454_mean_frag_len: The average DNA fragment size for paired-end read simulation
    • art_454_std_dev: The standard deviation of the DNA fragment size for paired-end read simulation
    • out_dir: The simulation's output directory
  • Uses ART to simulate realistic Roche 454 reads (single-end)
  • Generates one sequencing run per sampled individual
  • Requirements:
  • Config Parameters:
    • art_454_path: The path to your art_454 executable (or simply "art_454" if it is in your PATH variable)
    • art_454_options: The command-line arguments with which to run art_454 (excluding <INPUT_SEQ_FILE>, <OUTPUT_FILE_PREFIX>, and <FOLD_COVERAGE>)
      • To use default settings, simply use the empty string (i.e., "")
    • art_454_fold_coverage: The desired fold of read coverage
    • out_dir: The simulation's output directory
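
As a rough illustration of how the single-end 454 parameters above could fit together, the sketch below assembles an argument list. The function name, the example file names, and the positional order (taken from the order in which the excluded placeholders are listed) are assumptions, not the module's actual code.

```python
import shlex

def build_art_454_single_end_cmd(config, input_seq_file, output_file_prefix):
    """Assemble an art_454 single-end argument list (hypothetical helper)."""
    cmd = [config["art_454_path"]]
    # art_454_options holds only the optional flags; "" adds nothing,
    # which corresponds to running with default settings
    cmd += shlex.split(config["art_454_options"])
    # The excluded positional arguments are appended last (assumed order:
    # <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <FOLD_COVERAGE>)
    cmd += [input_seq_file, output_file_prefix, str(config["art_454_fold_coverage"])]
    return cmd

# Illustrative values only
example_config = {
    "art_454_path": "art_454",
    "art_454_options": "",
    "art_454_fold_coverage": 10,
}
cmd = build_art_454_single_end_cmd(example_config, "seqs.fasta", "out/reads")
print(cmd)
```

Keeping the positional arguments out of art_454_options means a user cannot accidentally supply them twice.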
  • Uses ART to simulate realistic Illumina NGS sequence data from the true sequences
  • Generates one sequencing run per sampled individual
  • Requirements:
  • Config Parameters:
    • art_illumina_path: The path to your art_illumina executable (or simply "art_illumina" if it is in your PATH variable)
    • art_illumina_options: The command-line arguments with which to run art_illumina (excluding -i and -o)
    • out_dir: The simulation's output directory
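
Because art_illumina_options must not contain -i or -o, a caller might want to reject those flags before running the tool. The following is a minimal sketch under that assumption; the helper name, the file names, and the example options are illustrative and not taken from the module's source.

```python
import shlex

# Flags the module supplies itself (short and long forms of input/output)
RESERVED_FLAGS = {"-i", "--in", "-o", "--out"}

def build_art_illumina_cmd(config, input_fasta, output_prefix):
    """Assemble an art_illumina argument list (hypothetical helper)."""
    user_args = shlex.split(config["art_illumina_options"])
    clash = RESERVED_FLAGS.intersection(user_args)
    if clash:
        raise ValueError("art_illumina_options must not set: " + ", ".join(sorted(clash)))
    # Input and output are added here, never by the user
    return [config["art_illumina_path"]] + user_args + ["-i", input_fasta, "-o", output_prefix]

# Illustrative values only
illumina_config = {
    "art_illumina_path": "art_illumina",
    "art_illumina_options": "-ss HS25 -l 150 -f 10",
}
illumina_cmd = build_art_illumina_cmd(illumina_config, "seqs.fasta", "out/reads")
print(illumina_cmd)
```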
  • Uses ART to simulate realistic SOLiD reads (amplicon mate-pair, F3-R3)
  • Generates one sequencing run per sampled individual
  • Requirements:
  • Config Parameters:
    • art_SOLiD_path: The path to your art_SOLiD executable (or simply "art_SOLiD" if it is in your PATH variable)
    • art_SOLiD_options: The command-line arguments with which to run art_SOLiD (excluding <INPUT_SEQ_FILE>, <OUTPUT_FILE_PREFIX>, <LEN_READ>, and <READ_PAIRS_PER_AMPLICON>)
      • To use default settings, simply use the empty string (i.e., "")
    • art_SOLiD_len_read: The desired length of F3/R3 reads (max 75)
    • art_SOLiD_read_pairs_per_amplicon: The desired number of read pairs per amplicon
    • out_dir: The simulation's output directory
  • Uses ART to simulate realistic SOLiD reads (amplicon paired-end, F3-F5)
  • Generates one sequencing run per sampled individual
  • Requirements:
  • Config Parameters:
    • art_SOLiD_path: The path to your art_SOLiD executable (or simply "art_SOLiD" if it is in your PATH variable)
    • art_SOLiD_options: The command-line arguments with which to run art_SOLiD (excluding <INPUT_SEQ_FILE>, <OUTPUT_FILE_PREFIX>, <LEN_READ_F3>, <LEN_READ_F5>, and <READ_PAIRS_PER_AMPLICON>)
      • To use default settings, simply use the empty string (i.e., "")
    • art_SOLiD_len_read_F3: The desired length of F3 reads (max 75)
    • art_SOLiD_len_read_F5: The desired length of F5 reads (max 75)
    • art_SOLiD_read_pairs_per_amplicon: The desired number of read pairs per amplicon
    • out_dir: The simulation's output directory
  • Uses ART to simulate realistic SOLiD reads (amplicon single-end, F3)
  • Generates one sequencing run per sampled individual
  • Requirements:
  • Config Parameters:
    • art_SOLiD_path: The path to your art_SOLiD executable (or simply "art_SOLiD" if it is in your PATH variable)
    • art_SOLiD_options: The command-line arguments with which to run art_SOLiD (excluding <INPUT_SEQ_FILE>, <OUTPUT_FILE_PREFIX>, <LEN_READ>, and <READS_PER_AMPLICON>)
      • To use default settings, simply use the empty string (i.e., "")
    • art_SOLiD_len_read: The desired length of F3 reads (max 75)
    • art_SOLiD_reads_per_amplicon: The desired number of reads per amplicon
    • out_dir: The simulation's output directory
  • Uses ART to simulate realistic SOLiD reads (mate-pair, F3-R3)
  • Generates one sequencing run per sampled individual
  • Requirements:
  • Config Parameters:
    • art_SOLiD_path: The path to your art_SOLiD executable (or simply "art_SOLiD" if it is in your PATH variable)
    • art_SOLiD_options: The command-line arguments with which to run art_SOLiD (excluding <INPUT_SEQ_FILE>, <OUTPUT_FILE_PREFIX>, <LEN_READ>, <FOLD_COVERAGE>, <MEAN_FRAG_LEN>, and <STD_DEV>)
      • To use default settings, simply use the empty string (i.e., "")
    • art_SOLiD_len_read: The desired length of F3/R3 reads (max 75)
    • art_SOLiD_fold_coverage: The desired fold of read coverage
    • art_SOLiD_mean_frag_len: The mean fragment size for mate-pair read simulation
    • art_SOLiD_std_dev: The standard deviation of the fragment size for mate-pair simulation
    • out_dir: The simulation's output directory
  • Uses ART to simulate realistic SOLiD reads (paired-end, F3-F5)
  • Generates one sequencing run per sampled individual
  • Requirements:
  • Config Parameters:
    • art_SOLiD_path: The path to your art_SOLiD executable (or simply "art_SOLiD" if it is in your PATH variable)
    • art_SOLiD_options: The command-line arguments with which to run art_SOLiD (excluding <INPUT_SEQ_FILE>, <OUTPUT_FILE_PREFIX>, <LEN_READ_F3>, <LEN_READ_F5>, <FOLD_COVERAGE>, <MEAN_FRAG_LEN>, and <STD_DEV>)
      • To use default settings, simply use the empty string (i.e., "")
    • art_SOLiD_len_read_F3: The desired length of F3 reads (max 75)
    • art_SOLiD_len_read_F5: The desired length of F5 reads (max 75)
    • art_SOLiD_fold_coverage: The desired fold of read coverage
    • art_SOLiD_mean_frag_len: The mean fragment size for paired-end read simulation
    • art_SOLiD_std_dev: The standard deviation of the fragment size for paired-end read simulation
    • out_dir: The simulation's output directory
  • Uses ART to simulate realistic SOLiD reads (single-end, F3)
  • Generates one sequencing run per sampled individual
  • Requirements:
  • Config Parameters:
    • art_SOLiD_path: The path to your art_SOLiD executable (or simply "art_SOLiD" if it is in your PATH variable)
    • art_SOLiD_options: The command-line arguments with which to run art_SOLiD (excluding <INPUT_SEQ_FILE>, <OUTPUT_FILE_PREFIX>, <LEN_READ>, and <FOLD_COVERAGE>)
      • To use default settings, simply use the empty string (i.e., "")
    • art_SOLiD_len_read: The desired length of F3 reads (max 75)
    • art_SOLiD_fold_coverage: The desired fold of read coverage
    • out_dir: The simulation's output directory
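
The SOLiD read-length parameters above all carry a maximum of 75. A configuration could be checked against that limit before any reads are simulated, roughly as follows. The helper name and the lower bound of 1 are assumptions made for this sketch.

```python
MAX_SOLID_READ_LEN = 75  # upper limit stated for the art_SOLiD read-length parameters

def check_solid_read_length(config, key="art_SOLiD_len_read"):
    """Return the read length stored under `key`, or raise if it is out of range."""
    length = int(config[key])
    # Lower bound of 1 is an assumption; only the maximum is documented
    if not 1 <= length <= MAX_SOLID_READ_LEN:
        raise ValueError(f"{key} must be between 1 and {MAX_SOLID_READ_LEN}, got {length}")
    return length

# Illustrative values only
solid_config = {"art_SOLiD_len_read": 50}
read_len = check_solid_read_length(solid_config)
print(read_len)
```

The same check applies to art_SOLiD_len_read_F3 and art_SOLiD_len_read_F5 by passing a different `key`.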
  • Uses DWGSIM to simulate realistic NGS sequence data from the true sequences
  • Generates one sequencing run per sampled individual
  • Requirements:
    • DWGSIM
  • Config Parameters:
    • dwgsim_path: The path to your DWGSIM executable (or simply "dwgsim" if it is in your PATH variable)
    • dwgsim_options: The command-line options with which to run DWGSIM (just the options, not <in.ref.fa> or <out.prefix>)
      • To use default settings, simply use the empty string (i.e., "")
    • out_dir: The simulation's output directory
  • Uses Grinder to simulate realistic Sanger sequence data from the true sequences
  • Generates one sequencing run per sampled individual
  • Requirements:
  • Config Parameters:
    • grinder_path: The path to your Grinder executable (or simply "grinder" if it is in your PATH variable)
    • out_dir: The simulation's output directory
  • Does not output any sequences
  • Requirements:
    • None
  • Config Parameters:
    • None
  • Returns full error-free sequences for all viruses
  • Requirements:
    • None
  • Config Parameters:
    • out_dir: The simulation's output directory
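
The parameter lists above can also be read as a set of required keys per implementation. The sketch below reports which keys are missing from a configuration. The labels "dwgsim", "grinder", and "art_illumina" are hypothetical shorthand chosen for this example (the implementation names are not shown on this page); only the parameter names come from the lists above.

```python
# Hypothetical labels mapped to the config parameters listed for each implementation
REQUIRED_KEYS = {
    "art_illumina": ("art_illumina_path", "art_illumina_options", "out_dir"),
    "dwgsim": ("dwgsim_path", "dwgsim_options", "out_dir"),
    "grinder": ("grinder_path", "out_dir"),
}

def missing_keys(label, config):
    """List the required parameters for `label` that are absent from `config`."""
    return [key for key in REQUIRED_KEYS[label] if key not in config]

# Illustrative values only
partial_config = {"dwgsim_path": "dwgsim", "out_dir": "output"}
missing = missing_keys("dwgsim", partial_config)
print(missing)
```

An empty string still counts as present here, which matches the convention that "" selects the tool's default settings.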