You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I’ve stumbled on a bug that causes FP variant calls when performing a pairwise comparison of VCFs (xcmp mode) while when taking all variants into account (that is, including non-PASS).
We've tested Hap.py v0.3.14 (and also present in 0.3.15 looking at the code)
Please correct me if I understood it incorrectly ;)
Argument can be --usefiltered-truth used to take filtered variants (non-Pass and non-‘dot’) into account. While this option works It appears that these variants will always result in a FP as the struth will always have the genotype ./.
Example:
Truth VCF (only 1 variant shown): 15 90294306 rs6496598 C G 21269.49 SnpCluster AC=2;AF=1.00;AN=2 GT:AD:DP:GQ:PL 1/1:0,59:59:99:2696,207,0
Query (only 1 variant shown): 15 90294306 rs6496598 C G 21269.49 SnpCluster AC=2;AF=1.00;AN=2 GT:AD:DP:GQ:PL 1/1:0,28:28:99:1440,108,0
I’ve tested different varieties of options (including --preprocess-truth) and the result was always the same:
15 90294306 . C G 21269.5 SnpCluster BS=90294306 GT:BD:BK:BI:BVT:BLT:QQ ./.:.:.:.:NOCALL:nocall:. 1/1:FP:.:tv:SNP:homalt:21269.5
Although the position itself was in the output (this is not the case when --usefiltered-truth was not used), the truth genotype is ./. or NOCALL, and thus a FP in this specific case.
Diving in the code I’ve noticed that the bug seems to be in the xcmp command as option “--apply-filters-truth False” should have been passed to this command when “--usefiltered-truth” is used.
Example:
Using some intermediate chunk of the original run I’ve test xcmp: again with two VCF files (filter.vcf.gz and pass.vcf.gz) and focusing on a single variant (15: 90294306).
For filter.vcf.gz the filter status for that variant is SnpCluster, for pass.vcf.gz this is PASS
filter.vcf.gz 15 90294306 . C G 21269.5 SnpCluster AF=1 GT:AD:ADO:DP:GQ:PL 1/1:0,59:0:59:99:0
pass.vcf.gz 15 90294306 . C G 21269.5 PASS AF=1 GT:AD:ADO:DP:GQ:PL 1/1:0,28:0:28:99:0
Command: xcmp filter.vcf.gz pass.vcf.gz -o test_merge_xcmp.bcf -r Homo_sapiens.GRCh37.GATK.illumina.fasta
Output: 15 90294306 . C G 21269.5 . AF=1;BS=90294306;IQQ=21269.5;ctype=simple:mismatch;gtt2=gt_homalt;kind=missing;type=FP GT ./. 1/1
Using the --apply-filters-truth False filter results in the expected output.
Command: xcmp filter.vcf.gz pass.vcf.gz -o test_merge_xcmp.bcf -r Homo_sapiens.GRCh37.GATK.illumina.fasta --apply-filters-truth False
Output: 15 90294306 . C G 21269.5 SnpCluster AF=1;BS=90294306;IQQ=21269.5;ctype=simple:match;gtt1=gt_homalt;gtt2=gt_homalt;kind=match;type=TP GT 1/1 1/1
Btw. (sanity check for myself): using pass.vcf.gz as truth always results in a correct output, regardless is -apply-filters-truth is used
For me it makes sense to include “--apply-filters-truth False“ in the xcmp command if --usefiltered-truth is used in the main script. If I look into the code, this does not seems to be the case/ an option.
Could you add this to the code, or am I missing something here?
The text was updated successfully, but these errors were encountered:
Hi, thanks for the great tool!
I’ve stumbled on a bug that causes FP variant calls when performing a pairwise comparison of VCFs (xcmp mode) while when taking all variants into account (that is, including non-PASS).
We've tested Hap.py v0.3.14 (and also present in 0.3.15 looking at the code)
Please correct me if I understood it incorrectly ;)
Argument can be --usefiltered-truth used to take filtered variants (non-Pass and non-‘dot’) into account. While this option works It appears that these variants will always result in a FP as the struth will always have the genotype ./.
Example:
Truth VCF (only 1 variant shown):
15 90294306 rs6496598 C G 21269.49 SnpCluster AC=2;AF=1.00;AN=2 GT:AD:DP:GQ:PL 1/1:0,59:59:99:2696,207,0
Query (only 1 variant shown):
15 90294306 rs6496598 C G 21269.49 SnpCluster AC=2;AF=1.00;AN=2 GT:AD:DP:GQ:PL 1/1:0,28:28:99:1440,108,0
Command:
hap.py truth.vcf.gz query.vcf.gz --usefiltered-truth --reference <ref_fasta> --threads 2 -R <bed-file> -T <bed_file> -o output
I’ve tested different varieties of options (including --preprocess-truth) and the result was always the same:
15 90294306 . C G 21269.5 SnpCluster BS=90294306 GT:BD:BK:BI:BVT:BLT:QQ ./.:.:.:.:NOCALL:nocall:. 1/1:FP:.:tv:SNP:homalt:21269.5
Although the position itself was in the output (this is not the case when --usefiltered-truth was not used), the truth genotype is ./. or NOCALL, and thus a FP in this specific case.
Diving in the code I’ve noticed that the bug seems to be in the xcmp command as option “--apply-filters-truth False” should have been passed to this command when “--usefiltered-truth” is used.
Example:
Using some intermediate chunk of the original run I’ve test xcmp: again with two VCF files (filter.vcf.gz and pass.vcf.gz) and focusing on a single variant (15: 90294306).
For filter.vcf.gz the filter status for that variant is SnpCluster, for pass.vcf.gz this is PASS
filter.vcf.gz
15 90294306 . C G 21269.5 SnpCluster AF=1 GT:AD:ADO:DP:GQ:PL 1/1:0,59:0:59:99:0
pass.vcf.gz
15 90294306 . C G 21269.5 PASS AF=1 GT:AD:ADO:DP:GQ:PL 1/1:0,28:0:28:99:0
Command:
xcmp filter.vcf.gz pass.vcf.gz -o test_merge_xcmp.bcf -r Homo_sapiens.GRCh37.GATK.illumina.fasta
Output:
15 90294306 . C G 21269.5 . AF=1;BS=90294306;IQQ=21269.5;ctype=simple:mismatch;gtt2=gt_homalt;kind=missing;type=FP GT ./. 1/1
Using the --apply-filters-truth False filter results in the expected output.
Command:
xcmp filter.vcf.gz pass.vcf.gz -o test_merge_xcmp.bcf -r Homo_sapiens.GRCh37.GATK.illumina.fasta --apply-filters-truth False
Output:
15 90294306 . C G 21269.5 SnpCluster AF=1;BS=90294306;IQQ=21269.5;ctype=simple:match;gtt1=gt_homalt;gtt2=gt_homalt;kind=match;type=TP GT 1/1 1/1
Btw. (sanity check for myself): using pass.vcf.gz as truth always results in a correct output, regardless is -apply-filters-truth is used
For me it makes sense to include “--apply-filters-truth False“ in the xcmp command if --usefiltered-truth is used in the main script. If I look into the code, this does not seems to be the case/ an option.
Could you add this to the code, or am I missing something here?
The text was updated successfully, but these errors were encountered: