AI consensus calling error on WGS samples #131
With older versions of xgboost (1.7.1 and 1.6) I get different errors, which may be closer to an actual bug. They seem to be caused by:
Yeah, thanks for the report. ntree_limit has been deprecated in xgboost. Let me figure out the best way forward.
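One way to handle the deprecation while still supporting older installs is to choose the prediction keyword from the installed xgboost version: iteration_range (available from 1.4) on newer versions, ntree_limit on older ones. The helper below is a hypothetical sketch, not SomaticSeq's actual fix. It assumes a plain gbtree booster with one tree per boosting round (as in binary classification), where "number of trees" and "number of rounds" coincide; in both APIs a limit of 0 means "use all rounds".

```python
def predict_kwargs(xgb_version: str, n_rounds: int) -> dict:
    """Hypothetical helper: keyword arguments for Booster.predict that limit
    prediction to the first n_rounds boosting rounds.

    Assumes one tree per round, so the old ntree_limit and the new
    iteration_range refer to the same trees. n_rounds == 0 means "all rounds".
    """
    # Parse "major.minor" leniently, e.g. "2.0.3" -> (2, 0), "2.1.0rc1" -> (2, 1).
    parts = []
    for piece in xgb_version.split(".")[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 2:
        parts.append(0)
    major, minor = parts

    if (major, minor) >= (1, 4):
        # Newer API: half-open range [begin, end) of boosting rounds.
        return {"iteration_range": (0, n_rounds)}
    # Older API: number of trees to use (deprecated since 1.4).
    return {"ntree_limit": n_rounds}
```

Usage would look like booster.predict(dmatrix, **predict_kwargs(xgboost.__version__, n)), where booster, dmatrix and n are placeholders for whatever the calling code already has. Simply requiring xgboost 1.4 or newer and always passing iteration_range is the simpler alternative.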
I've managed to make it work with version 2.0.3 of xgboost:
I'm not sure whether, for your model, converting infinite values to NaN is the best approach. Turning -INF and INF into 0, or into a very large finite number, might work better.
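The three options mentioned above (infinity to NaN, to 0, or to a very large finite number) can be compared side by side with NumPy's nan_to_num. This is a sketch under stated assumptions, not what SomaticSeq or the commenter's patch actually does: the function name replace_infinities is hypothetical, and the "large" option uses the largest finite float32 on the assumption that XGBoost stores features as 32-bit floats. Existing NaN values are kept as NaN because XGBoost treats NaN as missing by default.

```python
import numpy as np


def replace_infinities(x, strategy: str = "nan") -> np.ndarray:
    """Hypothetical helper: return a copy of x with +/-inf replaced.

    "nan":   +/-inf -> NaN (treated as missing by XGBoost's default settings)
    "zero":  +/-inf -> 0.0
    "large": +/-inf -> +/- the largest finite float32
    Existing NaNs stay NaN under every strategy.
    """
    x = np.asarray(x, dtype=np.float64)
    if strategy == "nan":
        fill_pos = fill_neg = np.nan
    elif strategy == "zero":
        fill_pos = fill_neg = 0.0
    elif strategy == "large":
        big = float(np.finfo(np.float32).max)
        fill_pos, fill_neg = big, -big
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # nan=np.nan keeps existing NaNs as missing instead of turning them into 0.0;
    # nan_to_num returns a copy, so the input array is not modified.
    return np.nan_to_num(x, nan=np.nan, posinf=fill_pos, neginf=fill_neg)
```

Which option is right depends on what an infinite value means for a given feature: NaN lets the trees route it as missing, 0 can collide with genuine zeros, and a large sentinel preserves the "extreme" ordering.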
Thanks. iteration_range was introduced in v1.4. Think I'll make it
Do you know where the data got unexpected Inf or NaN values?
I don't know exactly where, because I'm processing on a cluster, but I could try running it again and saving the training dataset so you can explore it, if you want. I'm just not sure how.
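If the features can be saved as a tab-separated table with a header row (as far as I know SomaticSeq builds such a feature TSV before classification, but treat the exact file as an assumption), a short standard-library scan can report which rows and columns hold problem values. The function find_non_finite is hypothetical. It flags infinities and, optionally, finite values too large for a 32-bit float (assuming XGBoost converts features to float32). NaN is skipped by default because XGBoost normally treats it as missing; text columns such as CHROM or REF are ignored.

```python
import csv
import io
import math

# Largest finite 32-bit float, written as a Python (64-bit) float.
FLOAT32_MAX = 3.4028234663852886e38


def find_non_finite(tsv_text: str, check_float32: bool = True, include_nan: bool = False):
    """Hypothetical helper: yield (data_row_number, column_name, raw_value)
    for numeric cells that are +/-inf, optionally exceed the float32 range,
    or (if include_nan) are NaN. Non-numeric cells are skipped."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row_number, row in enumerate(reader, start=1):
        for column, raw in row.items():
            try:
                value = float(raw)
            except (TypeError, ValueError):
                continue  # text column, empty cell, or ragged row
            if math.isnan(value):
                if include_nan:
                    yield row_number, column, raw
                continue
            if math.isinf(value) or (check_float32 and abs(value) > FLOAT32_MAX):
                yield row_number, column, raw
```

For a file on disk this could be used as: with open(path) as fh: hits = list(find_non_finite(fh.read())), where path is a placeholder for wherever the table was saved.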
Looking around online, it seems people have hit that error when the data contains very large numbers (e.g., 1e300): https://stackoverflow.com/questions/67986268/xgboost-check-failed-valid-input-data-contains-inf-or-nan
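A likely reason a finite value like 1e300 triggers an "inf or nan" check: my understanding is that XGBoost stores feature values as 32-bit floats, and 1e300 is finite as a 64-bit float but far beyond the float32 maximum (about 3.4e38), so the conversion turns it into inf. The NumPy sketch below reproduces that overflow and shows one possible workaround, clipping to the float32 range before the data reaches XGBoost; the clipping step is a suggestion, not something taken from SomaticSeq.

```python
import warnings

import numpy as np

values = np.array([1.0, 1e30, 1e300, -1e300])  # 64-bit floats, all finite

with warnings.catch_warnings():
    # NumPy may warn about overflow during the cast; silence it for the demo.
    warnings.simplefilter("ignore", RuntimeWarning)
    as_float32 = values.astype(np.float32)

print(np.isfinite(values).tolist())      # [True, True, True, True]
print(np.isfinite(as_float32).tolist())  # [True, True, False, False]

# Possible workaround: clip to the float32 range first, then cast.
limit = np.finfo(np.float32).max
clipped = np.clip(values, -limit, limit).astype(np.float32)
print(np.isfinite(clipped).tolist())     # [True, True, True, True]
```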
I think I might have found why there are infinite values.
Hmm, maybe because DP4 and VAF have quality filters (e.g., minimum mapping quality or base-call quality) applied by the mutation caller that made the call.
I'm trying to run somaticseq_parallel on VCFs from several samples to make AI consensus calls.
I'm using SomaticSeq v3.7.3 and xgboost 2.0.2.
I ran all the mutation callers and then, with the resulting VCF files, ran the following command:
somaticseq_parallel.py \
    --classifier-snv /scratch4/nsobrei2/ggama1/training/somaticseq/ai_model_titration_ffpe_wgs_synth/SNV_model.classifier \
    --classifier-indel /scratch4/nsobrei2/ggama1/training/somaticseq/ai_model_titration_ffpe_wgs_synth/INDEL_model.classifier \
    --output-directory /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR \
    --genome-reference /scratch4/nsobrei2/references/ncbi_grch38_cipher/GRCh38_full_analysis_set_plus_decoy_hla.fa \
    -dbsnp /scratch4/nsobrei2/references/dbsnp/138_cipher/Homo_sapiens_assembly38.dbsnp138.vcf.gz \
    --threads 38 \
    paired \
    --tumor-bam-file /scratch4/nsobrei2/ggama1/germline-tumor/bams/BH12847_1_TUMOR.bam \
    --normal-bam-file /scratch4/nsobrei2/ggama1/germline-tumor/bams/BH12847_1_GERMLINE.bam \
    --mutect2-vcf /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.MuTect2.vcf.gz \
    --vardict-vcf /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.VarDict.vcf.gz \
    --somaticsniper-vcf /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.SomaticSniper.vcf.gz \
    --muse-vcf /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.MuSE.vcf.gz \
    --strelka-snv /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.Strelka.snv.vcf.gz \
    --strelka-indel /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.Strelka.indel.vcf.gz \
    --varscan-snv /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.VarScan2.snv.vcf.gz \
    --varscan-indel /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.VarScan2.indel.vcf.gz \
    --lofreq-snv /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.LoFreq.snv.vcf.gz \
    --lofreq-indel /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.LoFreq.indel.vcf.gz
This is the output with the error:
The training output for the AI model used in the command above was: