NGS.PRSS1-2caller is a toolkit for calling genetic variants at the PRSS1-PRSS2 locus. It addresses the problem of misaligned short reads from the pseudogenes PRSS3P2 and TRY7. Starting from NGS data already aligned to GRCh37 or GRCh38 (BAM format), NGS.PRSS1-2caller realigns short reads to the GRCh38 ALT contig (chr7_KI270803v1_alt) and detects SNVs, INDELs and CNVs at the PRSS1-PRSS2 locus with high accuracy and sensitivity. It can also annotate the biological consequences of a variant and perform variant phasing with population-level data.
If you use NGS.PRSS1-2caller, please cite our paper:
Lou H, Xie B, Wang Y, et al. Improved NGS variant calling tool for the PRSS1–PRSS2 locus. Gut. Published Online First: 14 March 2022. doi: 10.1136/gutjnl-2022-327203
NGS.PRSS1-2caller does not need to be installed. Before running it, replace the software paths in the parameter.txt file with the paths on your own system.
The following software versions have been tested and passed:
| software | version | weblink |
| --- | --- | --- |
| python | 2.7 | https://www.python.org/downloads |
| perl | 5.22.1 | https://www.perl.org/get.html |
| java | 11.0.1 | https://www.oracle.com/java/technologies/javase/jdk11-archive-downloads.html |
| R (with ggplot2, ggthemes) | 3.6.0 | https://www.r-project.org |
| samtools | 1.9 | https://sourceforge.net/projects/samtools/files/samtools |
| bwa | 0.7.17-r1188 | https://github.com/lh3/bwa/releases |
| freebayes | 1.3.5 | https://github.com/ekg/freebayes |
| snpEff | 4.3t | https://sourceforge.net/projects/snpeff/files |
| gatk | 4.1.7.0 | https://gatk.broadinstitute.org/hc/en-us/articles/360036194592-Getting-started-with-GATK4 |
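Before editing parameter.txt, it can help to see where each tool lives on your system. The following is a minimal sketch, not part of NGS.PRSS1-2caller: it assumes Python 3 and assumes the tools are reachable on your PATH under the executable names listed in `TOOLS` (these names are guesses and may differ on your machine). snpEff is usually shipped as a jar file, so it is left out here.

```python
import shutil

# Executable names are assumptions based on the table of tested software;
# adjust them to match your own installation.
TOOLS = ["python", "perl", "java", "Rscript", "samtools", "bwa", "freebayes", "gatk"]


def find_tools(names):
    """Return a dict mapping each tool name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        print("{}\t{}".format(name, path or "NOT FOUND"))
```

The printed paths can then be copied into parameter.txt by hand.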
The Python package of NGS.PRSS1-2caller will be released soon.
NGS.PRSS1-2caller.sh -i [filelist] -m [num] -p -n [name]
Required arguments
-i FILE Tab-separated bam file list including *sample ID* / *bam file location* / *sample read depth* (U for unknown)
Optional arguments
-m [num] Mapping quality (default=50)
-r Reference genome of input data (GRCh38/GRCh37, default=GRCh38)
-p Do phasing
-n Taskname
./NGS.PRSS1-2caller.sh -i example.list -p -n example
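The file passed to `-i` is a tab-separated list with three columns: sample ID, bam file location, and sample read depth (U for unknown). A minimal sketch for generating such a list is shown below. It assumes Python 3, one sample per line and no header row; the helper name `write_bam_list`, the sample IDs and the paths are hypothetical, and the exact depth format expected by the tool should be checked against the example list shipped in the exp/ directory.

```python
import csv


def write_bam_list(samples, out_path):
    """Write (sample_id, bam_path, depth) rows as a tab-separated list.

    A depth of None is written as "U" (unknown).
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for sample_id, bam_path, depth in samples:
            writer.writerow([sample_id, bam_path, "U" if depth is None else depth])


if __name__ == "__main__":
    # Hypothetical samples and paths, for illustration only.
    write_bam_list(
        [
            ("sampleA", "/path/to/sampleA.bam", 30),
            ("sampleB", "/path/to/sampleB.bam", None),
        ],
        "my_samples.list",
    )
```

The resulting file can then be passed to the script with `-i my_samples.list`.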
We generated the following results with the data in the NGS.PRSS1-2caller/exp/ directory. The example data are extracted bam files (NGS data aligned to GRCh38) from the 1000 Genomes Project and Human Genome Diversity Project samples (HG00581, HG03084, HG03490, HGDP00578). All output files can be found in the NGS.PRSS1-2caller/exp/example.out/ directory. NGS.PRSS1-2caller provides: 1) variants in variant call format (VCF) with phasing information, 2) variants with predicted biological consequences, 3) a plot of variant positions at the PRSS1-PRSS2 locus, 4) a list of large-impact variants.
1. Variants detected based on the GRCh38 ALT contig in VCF format (variants are phased if you add -p).
- example_PRSS_snpEff_ann.vcf.gz
2. Variants in VCF format with biological consequences annotated.
- example_PRSS_snpEff_ann.txt (extracted)
3. Variant position plot with missense, stop-gained and frameshift variants highlighted.
- example_PRSS_snpEff_ann.plot.pdf
4. List of missense, stop-gained and frameshift variants. For a full list of large-impact variants, please refer to the variants annotated with "HIGH" or "MODERATE" in the INFO column of the NAME_PRSS_snpEff_ann.txt file.
- example_PRSS_snpEff_ann.sum.txt
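To pull out the variants annotated as "HIGH" or "MODERATE" yourself, something like the following sketch may be a starting point. It is not part of NGS.PRSS1-2caller. It assumes Python 3 and assumes that the annotated VCF carries the standard snpEff `ANN` INFO field, in which each annotation is pipe-separated and the second, third and fourth subfields are the effect, the impact and the gene name. The layout of the extracted .txt file is not described here, so the sketch reads the VCF instead; the function names are hypothetical.

```python
import gzip

IMPACTS = {"HIGH", "MODERATE"}


def large_impact_records(lines, impacts=IMPACTS):
    """Yield (chrom, pos, ref, alt, effect, impact, gene) for matching ANN entries."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt = fields[:5]
        for entry in fields[7].split(";"):
            if not entry.startswith("ANN="):
                continue
            # One ANN value can hold several comma-separated annotations.
            for ann in entry[4:].split(","):
                parts = ann.split("|")
                if len(parts) > 3 and parts[2] in impacts:
                    yield (chrom, int(pos), ref, alt, parts[1], parts[2], parts[3])


def read_vcf(path):
    """Open a plain or gzip-compressed VCF and yield its large-impact records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for record in large_impact_records(handle):
            yield record
```

For example, `for rec in read_vcf("example_PRSS_snpEff_ann.vcf.gz"): print(rec)` would print one tuple per matching annotation, so a variant with several matching annotations appears more than once.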
Written by Wang Yimin & Xie Bo, conceptualized by Lou Haiyi.