germ line module not detecting full chromosome #65

Open
aidanshoham12 opened this issue Jun 23, 2024 · 3 comments
Comments

@aidanshoham12

Hello, I seem to be having an issue with the germline module. I'm working with the bone marrow single-cell sample provided in the GitHub repository. When I run the germline module, I detect far fewer SNVs than expected (around 600, versus about 10,000 in your tutorial):
Window 1 [ chr20:273372-1542468 ] reference markers: 635 target markers: 635
From what I can see, the lower number of SNVs is probably due to the tool using only a small range of chr20 (as shown in the resource directory). I specified chr20 in the region.lst file the same way you do and still can't scan the entire chromosome. The time spent building the model also seems much shorter than it should be:
Number of markers: 576
Total time for building model: 9 seconds
Total time for sampling: 0 seconds
Total run time: 15 seconds
Given all of this, I think the tool may be having trouble determining the region in which to detect SNVs. As I understood it, specifying just chr20 in the region.lst file should cover the entire chromosome 20 rather than a section of it. Do you have any idea how to get the tool to cover a broader section of the chromosome? I'd be open to any ideas.
Thank you so much for your help!
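
The window reported in the log above spans only chr20:273372-1542468, so one quick check is which coordinate range the reference panel VCF actually covers. The following is a minimal sketch, assuming the panel is a gzip- or bgzip-compressed VCF; the function name `vcf_position_range` is hypothetical and not part of Monopogen.

```python
import gzip


def vcf_position_range(path):
    """Return {chrom: (first_pos, last_pos, n_records)} for a gzipped VCF.

    Hypothetical helper for illustration; it reads only the CHROM and POS
    columns and ignores header lines.
    """
    ranges = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and meta-information/header lines.
            if not line.strip() or line.startswith("#"):
                continue
            chrom, pos = line.split("\t", 2)[:2]
            pos = int(pos)
            if chrom in ranges:
                lo, hi, n = ranges[chrom]
                ranges[chrom] = (min(lo, pos), max(hi, pos), n + 1)
            else:
                ranges[chrom] = (pos, pos, 1)
    return ranges
```

If the reported range for chr20 is only a couple of megabases wide, the panel file itself is a subset, and the region.lst entry is probably not the cause.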

@aidanshoham12
Author

Hello,
I just wanted to post an update on the issue above. I think the problem was that I was using a copy of CCDG_14151_B01_GRM_WGS_2020-08-05_chr20.filtered.shapeit2-duohmm-phased.vcf.gz that contained only 2 Mb of SNVs instead of the whole chromosome, which would explain why I'm detecting fewer SNVs. I tried rerunning the sample using the downloaded panel for chromosome 1 and got the following error message:
(...)
[2024-06-24 10:21:24,990] INFO germline.py --nthreads = [1]
[2024-06-24 10:21:24,990] INFO germline.py --norun = [FALSE]
[2024-06-24 10:21:24,990] INFO Monopogen.py Checking existence of essenstial resource files...
[2024-06-24 10:21:25,004] INFO Monopogen.py Checking dependencies...
[mpileup] 1 samples in 1 input files
(mpileup) Max depth is above 1M. Potential memory hog!
Lines total/split/realigned/skipped: 209691145/485374/116534/0
Picked up JAVA_TOOL_OPTIONS: -Xmx2g
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.HashMap.resize(HashMap.java:702)
at java.base/java.util.HashMap.putVal(HashMap.java:661)
at java.base/java.util.HashMap.put(HashMap.java:610)
at java.base/java.util.HashSet.add(HashSet.java:221)
at main.Main.restrictToVcfMarkers(Main.java:343)
at main.Main.allData(Main.java:313)
at main.Main.main(Main.java:111)
gzip: path/to/germline/chr1.gp.vcf.gz: No such file or directory
path/to/germline/chr1.gp.vcf.gz: No such file or directory
Picked up JAVA_TOOL_OPTIONS: -Xmx2g
Exception in thread "main" java.lang.IllegalArgumentException: Missing line (#CHROM ...) after meta-information lines
File source: path/to/germline/chr1.germline.vcf
null
at vcf.VcfHeader.checkHeaderLine(VcfHeader.java:135)
at vcf.VcfHeader.<init>(VcfHeader.java:119)
at vcf.VcfIt.<init>(VcfIt.java:190)
at vcf.VcfIt.create(VcfIt.java:175)
at vcf.VcfIt.create(VcfIt.java:150)
at main.Main.allData(Main.java:297)
at main.Main.main(Main.java:111)
[2024-06-24 12:00:52,537] INFO Monopogen.py Success! See instructions above.
I'm not encountering this issue with the 2 Mb version of CCDG_14151_B01_GRM_WGS_2020-08-05_chr20.filtered.shapeit2-duohmm-phased.vcf.gz, but I am when using the whole chromosome. I think it might be related to the amount of RAM given to Beagle, as set by -Xmx2g in the "Picked up JAVA_TOOL_OPTIONS: -Xmx2g" line. Is it possible to raise the limit above 2g? I'm open to any suggestions.
Thank you so much for your help!
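
The "Picked up JAVA_TOOL_OPTIONS: -Xmx2g" line indicates that the JVM read its 2 GB heap limit from the `JAVA_TOOL_OPTIONS` environment variable. A minimal sketch of raising that limit for a child process is shown below. The helper name `env_with_java_heap` and the `16g` value are illustrative assumptions, and whether Monopogen's Beagle call honors the variable is also an assumption: a `-Xmx` passed explicitly on the `java` command line would be expected to take precedence.

```python
import os


def env_with_java_heap(heap, base_env=None):
    """Return a copy of the environment with JAVA_TOOL_OPTIONS set to -Xmx<heap>.

    Hypothetical helper: any existing -Xmx entry is replaced, and other
    options already present in JAVA_TOOL_OPTIONS are kept.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("JAVA_TOOL_OPTIONS", "")
    kept = [opt for opt in existing.split() if not opt.startswith("-Xmx")]
    env["JAVA_TOOL_OPTIONS"] = " ".join(kept + [f"-Xmx{heap}"])
    return env


# Usage sketch: pass the modified environment to the germline run.
# The command itself is left as a placeholder on purpose.
# import subprocess
# subprocess.run(your_monopogen_command, env=env_with_java_heap("16g"))
```

The heap size should stay below the physical memory available on the machine; the right value for a whole-chromosome panel is not stated in this thread.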

@jinzhuangdou
Collaborator

Do you have a chr1.gl.vcf.gz file generated in the germline folder? How many variants are included in the file?
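
One way to answer this without extra tools is to count the records directly. The following is a minimal sketch, assuming the file is a gzip- or bgzip-compressed VCF; the function name `count_vcf_records` is hypothetical and not part of Monopogen. It also reports whether a `#CHROM` header line is present, since the second stack trace above complains about a missing one.

```python
import gzip


def count_vcf_records(path):
    """Return (has_chrom_header, n_records) for a gzipped VCF.

    Hypothetical helper for illustration; every non-blank line that does
    not start with '#' is counted as one variant record.
    """
    has_header = False
    n_records = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                has_header = True
            elif line.strip() and not line.startswith("#"):
                n_records += 1
    return has_header, n_records


# Usage sketch, with the path as it appears in the log above:
# print(count_vcf_records("path/to/germline/chr1.gl.vcf.gz"))
```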

@aidanshoham12
Author

Hello, here are the three files produced after running the germline module, with their respective sizes:
8195748066, chr1.gl.vcf.gz
852, chr1.gp.log
859, chr1.phased.log
The chr1.gl.vcf.gz file contains 871954 SNVs detected by Monopogen. This seems to be around 4x the usual amount (about 249000 on chr1, according to a Google search). For reference, this is a tumor sample and is expected to contain many more SNVs than the normal tissues presented in the GitHub tutorial. I am interested in conducting lineage tracing in tumor samples, but I'm not sure whether the large number of SNVs will overwhelm the tool. Let me know what you think.
Thank you again!
