Running BayesTyper using snakemake

Jonas Andreas Sibbesen edited this page Jun 15, 2018 · 5 revisions

The BayesTyper workflow involves several steps: generating variant candidates, counting k-mers, and running BayesTyper itself. Each step is described in more detail in the Readme.

We provide a snakemake workflow that automates the execution of the different parts of the pipeline all the way from BAM file(s) to final genotype estimates ready for downstream analysis.

Setting up BayesTyper with snakemake

  1. Install snakemake
    • Note: Snakemake requires Python 3, and the BayesTyper workflow additionally uses the pandas Python package. To get up and running quickly, run conda create -n <project_name> python=3.6 snakemake pandas followed by conda activate <project_name>
  2. Download the latest BayesTyper release (snakemake files included)
  3. Edit the config.yaml and samples.tsv files shipped with the workflow and move them to the directory where you want the pipeline outputs to be placed.
    • Important: Do not change the header of samples.tsv
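
Because the header of samples.tsv must stay untouched, it can help to compare the edited file against the copy shipped with the release before starting the pipeline. The following is a minimal sketch, not part of BayesTyper: the function names and the command-line usage are assumptions made for illustration, and it only compares the first line of the two files without interpreting any column names.

```python
import sys


def read_header(path):
    """Return the first line of a file without its trailing newline."""
    with open(path, encoding="utf-8") as handle:
        return handle.readline().rstrip("\r\n")


def header_unchanged(template_path, edited_path):
    """True if both files start with exactly the same header line."""
    return read_header(template_path) == read_header(edited_path)


# Hypothetical usage: python check_header.py <shipped samples.tsv> <edited samples.tsv>
if __name__ == "__main__" and len(sys.argv) == 3:
    template, edited = sys.argv[1], sys.argv[2]
    if header_unchanged(template, edited):
        print("samples.tsv header matches the shipped template")
    else:
        sys.exit("samples.tsv header differs from the shipped template")
```

The comparison is deliberately strict (tabs and spelling must match exactly), since the source only states that the header must not be changed and does not say which deviations the workflow would tolerate.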

Running BayesTyper using snakemake

  1. Navigate to the directory containing the config.yaml and samples.tsv files
  2. To execute the workflow on a cluster using the slurm scheduler, run: snakemake --jobs <max_num_jobs> --cluster "sbatch -t {params.runtime} --mem=<memory> --cores=<num_cores>" --snakefile <path_to_call_candidates_and_genotype.smk>
    • Note: To execute in other (cluster) environments, please refer to the snakemake documentation.
    • Note: We recommend using at least 128 GB of memory (--mem=128g) and as many cores as are available.
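
The slurm command above has several placeholders to fill in, and the {params.runtime} part has to reach snakemake as literal text inside a single quoted argument. As an illustration only, the sketch below assembles that command from Python and prints a shell-safe version of it. The helper names, the job count of 20, the 16 cores and the snakefile path are hypothetical values chosen for the example; only the 128g memory setting comes from the recommendation above.

```python
import shlex


def build_snakemake_command(max_jobs, memory, cores, snakefile):
    """Return the argument list for the slurm invocation described above."""
    # The doubled braces keep {params.runtime} as a literal placeholder in the
    # string; it is meant to be substituted by snakemake, not by this script.
    cluster = "sbatch -t {{params.runtime}} --mem={mem} --cores={cores}".format(
        mem=memory, cores=cores
    )
    return [
        "snakemake",
        "--jobs", str(max_jobs),
        "--cluster", cluster,
        "--snakefile", snakefile,
    ]


def to_shell(args):
    """Join arguments into one line, quoting those that need it."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Hypothetical values: 20 concurrent jobs, 16 cores, and an assumed path.
    command = build_snakemake_command(
        20, "128g", 16, "path/to/call_candidates_and_genotype.smk"
    )
    print(to_shell(command))
```

Printing the command rather than executing it leaves the decision to submit with the user; the printed line can be pasted into a shell from the directory that contains config.yaml and samples.tsv.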

Changing the candidate calling setup

The default workflow executes GATK-HaplotypeCaller, Platypus and Manta on all samples and combines the results with the BayesTyper variation prior (included in the data bundle). To remove a caller (e.g. GATK, to speed things up), comment out the corresponding input line in the bayestyper_combine_variants rule in the call_candidates.smk file.