Multispecies Mitochondrial Variant Analysis

This workflow performs a variant analysis on mitochondrial genomes using the bcftools variant caller. For human samples, the workflow also performs a variant analysis using mutserve. It is listed in the Snakemake Workflow Catalog where usage of standardized Snakemake workflows is described.

Local installation

Clone this repo

git clone https://github.com/tlenfers/multispecies_mitochondrial_variant_analysis.git
cd multispecies_mitochondrial_variant_analysis

Install dependencies

# download Miniconda3 installer
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# install Conda (answer 'yes' when prompted)
bash miniconda.sh
# update Conda
conda update -y conda
# setup channels 
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
# create & activate new env with installed deps
conda env create -n wf -f environment.yaml
conda activate wf

Configuration

Config files:

config.yaml - analysis-specific settings
environment.yaml - software dependencies and versions
samples.tsv - list of (paired) samples

Samples:

Put all sample names in a single column in samples.tsv.
In config.yaml, set the data folder that contains all files to be analysed.
- the default path is data/
Assumed naming convention:
- sampleName_R1.fastq.gz
- sampleName_R2.fastq.gz
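
The naming convention above can be used to build the sample list automatically. The following is a minimal sketch, not part of the workflow: the function name list_samples and the demo file names are hypothetical, and whether samples.tsv needs a header line is not stated here, so compare the result with the samples.tsv shipped in the repository.

```shell
# Sketch: derive sample names from paired FASTQ files (hypothetical helper).
list_samples() {
    local data_dir="$1"
    local r1 sample
    for r1 in "$data_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue    # glob matched nothing
        sample="$(basename "$r1" _R1.fastq.gz)"
        # keep the sample only if the R2 mate exists as well
        if [ -e "$data_dir/${sample}_R2.fastq.gz" ]; then
            printf '%s\n' "$sample"
        else
            printf 'warning: no R2 file for %s\n' "$sample" >&2
        fi
    done
}

# Demo on empty placeholder files in a temporary folder
demo_dir="$(mktemp -d)"
touch "$demo_dir/sampleA_R1.fastq.gz" "$demo_dir/sampleA_R2.fastq.gz" \
      "$demo_dir/sampleB_R1.fastq.gz" "$demo_dir/sampleB_R2.fastq.gz" \
      "$demo_dir/sampleC_R1.fastq.gz"
list_samples "$demo_dir" > "$demo_dir/samples.tsv"
cat "$demo_dir/samples.tsv"
```

In the demo, sampleC has no R2 file, so it is reported on stderr and left out of the list. For real data, the call would be along the lines of list_samples data > samples.tsv, with both paths adjusted to your setup.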

Reference:

To analyse dog, mouse or human samples, the corresponding reference is downloaded automatically. Specify the species to be analysed in config.yaml under reference.

If you want to analyse a different species or use your own reference, enter the name of the file and its path in config.yaml.

Execute the workflow

cd workflow
# 'dry' run only checks I/O files
snakemake -np

# To run multispecies variant analysis
snakemake -j n all --use-conda --use-singularity
# where n is the number of cores to use

# To run human variant analysis with mutserve
snakemake -j n all_human --use-conda --use-singularity
# where n is the number of cores to use

Output

All output files are written to /results, in a subfolder named after the reference file used, with separate subfolders for each caller.

/results/calls_bcftools contains all variants called with bcftools. The variants are normalized and saved as sample_name.vcf.gz. In addition, all variants are merged into the file mergerd.vcf.
- Example file without header:

    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	results/mapped/human/sample_name.bam
    chrM	73	.	A	G	225.417	.	DP=253;VDB=3.59147e-17;SGB=-0.693147;MQSBZ=0;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,240,5;MQ=60	GT:PL	1:255,0
    chrM	146	.	T	C	225.422	.	DP=242;VDB=0.795672;SGB=-0.693147;MQSBZ=0;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,165,52;MQ=60	GT:PL	1:255,0
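
To make the columns of such records easier to scan, the fields of interest can be pulled out with awk. This is a sketch under assumptions: the function name summarise_vcf is hypothetical, the two records are copied from the example above into a temporary file, and the FORMAT column is assumed to be GT:PL as shown there.

```shell
# Write the two example records to a temporary tab-separated file.
# printf reuses the format string for each group of ten arguments.
vcf="$(mktemp)"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    chrM 73 . A G 225.417 . \
    'DP=253;VDB=3.59147e-17;SGB=-0.693147;MQSBZ=0;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,240,5;MQ=60' \
    GT:PL 1:255,0 \
    chrM 146 . T C 225.422 . \
    'DP=242;VDB=0.795672;SGB=-0.693147;MQSBZ=0;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,165,52;MQ=60' \
    GT:PL 1:255,0 > "$vcf"

# Print position, substitution, read depth, genotype and PL per record.
summarise_vcf() {
    awk -F'\t' '
        /^#/ { next }                       # skip header lines
        {
            depth = "NA"
            n = split($8, info, ";")        # INFO column, key=value pairs
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^DP=/) depth = substr(info[i], 4)
            split($10, smp, ":")            # sample column, assumed GT:PL
            printf "%s\t%s>%s\tDP=%s\tGT=%s\tPL=%s\n", $2, $4, $5, depth, smp[1], smp[2]
        }' "$1"
}

summarise_vcf "$vcf"
```

If bcftools is available, a command such as bcftools query -f '%POS\t%REF\t%ALT[\t%GT\t%PL]\n' run on a sample_name.vcf.gz file should give a comparable table without custom parsing.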

/results/calls_mutserve contains all called variants using mutserve.
/results/mapped contains all aligned reads as sample_name.bam and their index file sample_name.bam.bai.
/results/plots contains the heatmap plots created for the bcftools caller. Example plots:
- ref_heatmap.pdf
- ref_heatmap_clusterrow.pdf
- The name of the samples is on the X-axis, the variants on the Y-axis
- The values of the heatmap are the Phred-scaled likelihood of the homozygous reference allele (scale 0-255; 255: reference is very unlikely -> alternative more likely).
- The alt_heatmap plots contain the Phred-scaled likelihood of the homozygous alternative allele, i.e. that the variant is present at this position (scale 0-255; 0: variant is present).
/results/sequences contains the consensus sequence of each sample relative to the reference used, in FASTA format, as sample_name.fa.
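
The heatmap values described above are Phred-scaled, so a value PL corresponds to a relative likelihood of 10^(-PL/10), with 0 marking the most likely allele. The following sketch illustrates that conversion only; the function name pl_to_likelihood is hypothetical and not part of the workflow.

```shell
# Convert a Phred-scaled likelihood (PL) to a relative likelihood 10^(-PL/10).
pl_to_likelihood() {
    awk -v pl="$1" 'BEGIN { printf "%.3g\n", exp(-pl / 10 * log(10)) }'
}

# A few values across the 0-255 scale used in the heatmaps
for pl in 0 10 20 255; do
    printf 'PL=%s\trelative likelihood=%s\n' "$pl" "$(pl_to_likelihood "$pl")"
done
```

A value of 10 means the allele is ten times less likely than the best-supported one, 20 means a hundred times less likely, and 255 is the upper end of the scale.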

Snakedeploy usage

The usage of this workflow is described in the Snakemake Workflow Catalog.

# To run human variant analysis with mutserve
snakemake --cores all all_human --use-conda --use-singularity

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository.

Planned features

This pipeline is a work in progress. The following features are planned:

mutserve caller:
- extraction of common variants
- plotting variant heatmaps based on variants called by mutserve
- building the consensus sequence
additional quality control step:
- remove samples with low coverage from further analysis
dynamic sizing of the heatmap based on samples and variants
