Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using --usefiltered-truth results in incorrect FP calls for filtered variants in the truth (xcmp) #191

Open
melferink opened this issue Jul 24, 2024 · 0 comments

Comments

@melferink
Copy link

Hi, thanks for the great tool!

I’ve stumbled on a bug that causes FP variant calls when performing a pairwise comparison of VCFs (xcmp mode) while when taking all variants into account (that is, including non-PASS).
We've tested Hap.py v0.3.14 (and also present in 0.3.15 looking at the code)

Please correct me if I understood it incorrectly ;)
Argument can be --usefiltered-truth used to take filtered variants (non-Pass and non-‘dot’) into account. While this option works It appears that these variants will always result in a FP as the struth will always have the genotype ./.

Example:

Truth VCF (only 1 variant shown):
15 90294306 rs6496598 C G 21269.49 SnpCluster AC=2;AF=1.00;AN=2 GT:AD:DP:GQ:PL 1/1:0,59:59:99:2696,207,0
Query (only 1 variant shown):
15 90294306 rs6496598 C G 21269.49 SnpCluster AC=2;AF=1.00;AN=2 GT:AD:DP:GQ:PL 1/1:0,28:28:99:1440,108,0

Command:
hap.py truth.vcf.gz query.vcf.gz --usefiltered-truth --reference <ref_fasta> --threads 2 -R <bed-file> -T <bed_file> -o output

I’ve tested different varieties of options (including --preprocess-truth) and the result was always the same:

15 90294306 . C G 21269.5 SnpCluster BS=90294306 GT:BD:BK:BI:BVT:BLT:QQ ./.:.:.:.:NOCALL:nocall:. 1/1:FP:.:tv:SNP:homalt:21269.5

Although the position itself was in the output (this is not the case when --usefiltered-truth was not used), the truth genotype is ./. or NOCALL, and thus a FP in this specific case.

Diving in the code I’ve noticed that the bug seems to be in the xcmp command as option “--apply-filters-truth False” should have been passed to this command when “--usefiltered-truth” is used.

Example:
Using some intermediate chunk of the original run I’ve test xcmp: again with two VCF files (filter.vcf.gz and pass.vcf.gz) and focusing on a single variant (15: 90294306).
For filter.vcf.gz the filter status for that variant is SnpCluster, for pass.vcf.gz this is PASS

filter.vcf.gz
15 90294306 . C G 21269.5 SnpCluster AF=1 GT:AD:ADO:DP:GQ:PL 1/1:0,59:0:59:99:0

pass.vcf.gz
15 90294306 . C G 21269.5 PASS AF=1 GT:AD:ADO:DP:GQ:PL 1/1:0,28:0:28:99:0

Command:
xcmp filter.vcf.gz pass.vcf.gz -o test_merge_xcmp.bcf -r Homo_sapiens.GRCh37.GATK.illumina.fasta
Output:
15 90294306 . C G 21269.5 . AF=1;BS=90294306;IQQ=21269.5;ctype=simple:mismatch;gtt2=gt_homalt;kind=missing;type=FP GT ./. 1/1

Using the --apply-filters-truth False filter results in the expected output.
Command:
xcmp filter.vcf.gz pass.vcf.gz -o test_merge_xcmp.bcf -r Homo_sapiens.GRCh37.GATK.illumina.fasta --apply-filters-truth False

Output:
15 90294306 . C G 21269.5 SnpCluster AF=1;BS=90294306;IQQ=21269.5;ctype=simple:match;gtt1=gt_homalt;gtt2=gt_homalt;kind=match;type=TP GT 1/1 1/1

Btw. (sanity check for myself): using pass.vcf.gz as truth always results in a correct output, regardless is -apply-filters-truth is used

For me it makes sense to include “--apply-filters-truth False“ in the xcmp command if --usefiltered-truth is used in the main script. If I look into the code, this does not seems to be the case/ an option.
Could you add this to the code, or am I missing something here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant