Somatic Amplicon Pipeline

A snakemake pipeline for calling variants from amplicon data using Pisces.

What it does

The pipeline first replaces the read group header using Picard tools for compatibility with the GATK3 IndelRealigner. Indel realignment is then performed, followed by variant calling with Pisces.

Installation

The only prerequisite is snakemake. To install snakemake, you will need to install a Conda-based Python3 distribution. For this, Mambaforge is recommended. Once mamba is installed, snakemake can be installed like so:

mamba create -c conda-forge -c bioconda -n snakemake snakemake

Now activate the snakemake environment (you'll have to do this every time you want to run the pipeline):

conda activate snakemake

Now clone the repository:

git clone https://github.com/WEHIGenomicsRnD/somatic-amplicon-pipe.git
cd somatic-amplicon-pipe

Configuration

The configuration file is found under config/config.yaml. Make sure to set these options carefully. The main variables to set are the genome reference (FASTA file) and an intervals file (BED format) containing your target region.

Running

Place your aligned bam files (or symlink them) in a directory called 'mapped' in the pipeline root directory. Check that the pipeline has picked these up using:

snakemake --cores 1 --dry-run

You should see a list of steps to be processed on the target bam files.
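If your BAM files live outside the pipeline directory, the symlinking step described above could be scripted roughly as follows. This is a minimal sketch: link_bams is a hypothetical helper name and the source path in the usage comment is a placeholder; neither is part of the pipeline.

```shell
# Hypothetical helper (not shipped with the pipeline): symlink every *.bam
# file from a source directory into ./mapped, using absolute link targets.
link_bams() {
    src_dir="$1"
    mkdir -p mapped
    for bam in "$src_dir"/*.bam; do
        # Skip the literal pattern if the glob matched nothing.
        [ -e "$bam" ] || continue
        abs_dir="$(cd "$(dirname "$bam")" && pwd)"
        ln -sf "$abs_dir/$(basename "$bam")" mapped/
    done
}

# Usage, run from the pipeline root (the path is a placeholder):
# link_bams /path/to/aligned_bams
```

Absolute link targets are used so the links keep working regardless of the directory the pipeline is launched from. After linking, the dry run above should list the new samples.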

If you want to submit your jobs to the cluster using SLURM (recommended), use the following to run the pipeline:

snakemake --use-conda --conda-frontend mamba --profile slurm --jobs 8 --cores 24

To run locally omit the --profile slurm part.

NOTE about GATK3

GATK3 IndelRealigner will fail on the first run. Unfortunately, this is due to licensing restrictions: the GATK3 jar file is not bundled with the conda environment and has to be registered manually. First, download the GATK3 jar file (tested with v3.8.0). Then check the error log to find the conda environment path (it should look something like <working-directory>/.snakemake/conda/<id>). Copy the path and activate the environment like so:

conda activate <working-directory>/.snakemake/conda/<id>
gatk3-register <path-to-GATK-jar-file>
conda deactivate

You can now rerun the pipeline. Unfortunately, you'll have to activate the environment and re-register GATK3 for every new run from a clean directory.
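Since the environment ID differs between clean directories, it may be easier to locate the environment by script than to read it from the error log. The sketch below assumes the usual conda layout, in which the registration script sits at <env>/bin/gatk3-register; find_gatk3_env is a hypothetical helper name, not something provided by the pipeline.

```shell
# Hypothetical helper: print the path of each Snakemake-managed conda
# environment under <workdir>/.snakemake/conda that contains gatk3-register.
# Assumes the script lives in the environment's bin/ directory.
find_gatk3_env() {
    workdir="${1:-.}"
    for script in "$workdir"/.snakemake/conda/*/bin/gatk3-register; do
        # Skip the literal pattern if the glob matched nothing.
        [ -e "$script" ] || continue
        # Strip the trailing /bin/gatk3-register to get the environment root.
        dirname "$(dirname "$script")"
    done
}

# Usage, run from the pipeline working directory:
# env_path=$(find_gatk3_env .)
# conda activate "$env_path"
```

If more than one environment matches, the function prints one path per line, so check the output before passing it to conda activate.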

Output

The pipeline will generate all results under a results directory, including bam realignments and variant calls in VCF format.
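To get a quick overview of the variant calls after a run, something like the following could be used. It is only a sketch: list_vcfs is a hypothetical helper, and it makes no assumption about the subdirectory layout inside results beyond the VCF files ending in .vcf or .vcf.gz.

```shell
# Hypothetical helper: list all VCF files (plain or gzipped) found anywhere
# under the results directory, sorted by path.
list_vcfs() {
    results_dir="${1:-results}"
    find "$results_dir" -type f \( -name '*.vcf' -o -name '*.vcf.gz' \) | sort
}

# Usage, run from the pipeline root:
# list_vcfs results
```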
