I've encountered an issue where Monopogen appears to treat multiple samples as a single sample during germline SNV calling, even though the preprocessing step correctly handled them as separate samples, each listed on its own row.
Key observations:
Preprocessing correctly identifies 4 samples.
Germline calling reports "1 samples in 4 input files".
All filtered BAM files have identical RG tags (SM:atac_possorted_bam).
I suspect the identical RG tags may cause this behavior. Is there a way to force Monopogen to treat each input file as a separate sample? Alternatively, should we modify the RG tags in our original BAM files? Any guidance on resolving this issue would be appreciated. Let me know if you need any additional information.
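If relabeling turns out to be the route, the change amounts to rewriting the SM field of each @RG header line. Below is a minimal sketch on SAM header text using only the standard library. The function name `set_rg_sample` is hypothetical and not part of Monopogen. It assumes the header is obtained with `samtools view -H` and written back with `samtools reheader`. Reads refer to read groups by ID, so leaving the ID untouched and changing only SM in the header should be enough.

```python
def set_rg_sample(header_text: str, sample: str) -> str:
    """Return SAM header text with every @RG line's SM field set to `sample`.

    Hypothetical helper: any existing SM: field is dropped and a new one is
    appended, so the RG ID (which reads reference) is left unchanged.
    """
    out_lines = []
    for line in header_text.splitlines():
        if line.startswith("@RG\t"):
            fields = line.split("\t")
            # Keep every tag except the old SM, then append the new SM.
            tags = [f for f in fields[1:] if not f.startswith("SM:")]
            line = "\t".join([fields[0], *tags, f"SM:{sample}"])
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

For example, a header whose @RG line reads `ID:rg1 SM:atac_possorted_bam` would come back with `SM:donorA` after `set_rg_sample(header, "donorA")`, with all other header lines unchanged.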
Update: the RG tags in my original BAM files are unique, and the preprocessing module treated the 4 inputs as separate samples, but it also rewrote the RG information so that it is identical across all filtered BAM files. The BamFilter function in germline.py should be modified to preserve the original read group information when it creates the filtered BAM files. The problem most likely arose because my original BAM files have the same name in different folders, which is very common; the shared SM value, atac_possorted_bam, appears to come from that file name.
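To illustrate the suspected cause, the sketch below contrasts deriving the sample name from the BAM file name with reusing a per-row sample ID. It assumes, hypothetically, that each row of the sample list pairs a sample ID with a BAM path; the function names and the `donorA/outs/...` paths are illustrative only and do not reflect Monopogen's actual BamFilter code or list format.

```python
from pathlib import PurePath


def sm_from_basename(rows):
    """Hypothetical current behavior: SM is taken from the BAM file name.

    Identically named files in different folders all get the same SM,
    so downstream tools see a single sample.
    """
    return [PurePath(path).stem for _, path in rows]


def sm_from_row_id(rows):
    """Hypothetical fix: reuse the sample ID given on each row of the list."""
    ids = [sample_id for sample_id, _ in rows]
    if len(set(ids)) != len(ids):
        raise ValueError("sample IDs in the list must be unique")
    return ids


if __name__ == "__main__":
    rows = [(f"donor{c}", f"donor{c}/outs/atac_possorted_bam.bam") for c in "ABCD"]
    print(sm_from_basename(rows))  # four identical names
    print(sm_from_row_id(rows))    # four distinct names
```

Keying on the row ID keeps samples distinct regardless of how the input files happen to be named on disk.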
bash atac_out/Script/runGermline_chr20.sh
[mpileup] 1 samples in 4 input files
(mpileup) Max depth is above 1M. Potential memory hog!
Lines total/split/realigned/skipped: 61434549/866633/85422/0
[2024-08-14 12:29:23,079] INFO Monopogen.py Success! See instructions above.
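The "1 samples in 4 input files" line is consistent with that diagnosis: mpileup groups reads into samples by the SM field of the @RG header lines, so four files sharing one SM value are reported as one sample. A quick check before rerunning is to count distinct SM values across the filtered BAM headers. This sketch assumes the header texts were collected beforehand (for example with `samtools view -H` on each filtered BAM); `distinct_samples` is a hypothetical helper.

```python
def distinct_samples(header_texts):
    """Return the sorted SM values found in @RG lines across SAM headers."""
    samples = set()
    for text in header_texts:
        for line in text.splitlines():
            if not line.startswith("@RG\t"):
                continue
            for field in line.split("\t")[1:]:
                if field.startswith("SM:"):
                    samples.add(field[len("SM:"):])
    return sorted(samples)
```

If this returns a single name for all four filtered BAMs, the SM values have been collapsed; four distinct names would be expected when each input is kept as its own sample.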