[User Story] Fix issue with filtered regions in CNVkit #1468
After refinement on 2024-07-26 we still don't know the best way forward. For option 1, updating the bedfile in the repo, there are a few options:
For option 2, we could proceed as outlined by adding a pre-processing step that modifies the bedfile before running CNVkit. It would, however, require an extra layer of documentation to ensure that we don't update this pre-processing step in the pipeline without also updating the PON. Note by @fevac: this is also not the preferred option if we want these changes, and the information about these issues, to be incorporated into future panels. If we only change it on the Balsamic side, it would be prone to being lost.
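To keep the pre-processing step and the PON from drifting apart, one option is to fingerprint the bedfile together with the padding settings, store that next to the PON, and check it before running CNVkit. This is a sketch under assumptions: the function names, the `padding_version` label and the `pon_metadata` dict are all hypothetical and not part of Balsamic.

```python
import hashlib
import json


def padding_fingerprint(bed_text: str, min_size: int, padding_version: str) -> str:
    """Return a SHA-256 digest of the raw bedfile plus the padding settings.

    Any change to the input bedfile, the minimum region size or the
    (hypothetical) padding_version label gives a different digest.
    """
    payload = json.dumps(
        {
            "bed_sha256": hashlib.sha256(bed_text.encode("utf-8")).hexdigest(),
            "min_size": min_size,
            "padding_version": padding_version,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_pon_matches(pon_metadata: dict, bed_text: str, min_size: int,
                      padding_version: str) -> None:
    """Raise if the PON was built with different bedfile pre-processing.

    pon_metadata is a hypothetical dict stored next to the PON; Balsamic
    does not necessarily record anything like this today.
    """
    expected = pon_metadata.get("padding_fingerprint")
    actual = padding_fingerprint(bed_text, min_size, padding_version)
    if expected != actual:
        raise ValueError(
            "Bedfile pre-processing differs from the one used to build the PON; "
            "rebuild the PON or revert the pre-processing change."
        )


bed = "chr1\t1000\t1001\nchr1\t5000\t5200\n"
# At PON build time: store the fingerprint together with the PON.
metadata = {"padding_fingerprint": padding_fingerprint(bed, 100, "v1")}
# At analysis time: identical settings pass without raising.
check_pon_matches(metadata, bed, 100, "v1")
```

The point of the check is that a forgotten PON rebuild fails loudly instead of silently comparing bins derived from two different bedfiles.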
Added some notes above. Also, would it be too complicated to store the panel file (or file path) in the reference folder per release instead of getting it from the repo (similar to what RD does)? Then maybe we could change it per Balsamic release without affecting so many systems in the future (and decouple it a bit). But I bet this comes with other issues too.
Nice 🙏 As a note on the note on option 2, we could definitely add instructions in the create target panel too, to avoid these situations in the future. We have a couple of different reference folders: the one with the databases, like production/cancer/reference, and the one in the Balsamic cache, production/cancer/balsamic_cache/[version]. At the moment the bedfiles are retrieved from the repo in the production/cancer/reference folder, which I think CG parses for the balsamic config case argument to find the right bedfile. But I guess you're suggesting adding it to balsamic init so that the bedfiles are downloaded to the Balsamic cache together with the other references. I think that sounds like a nice idea if it can be done. It would be nice to have the bedfiles that were used in a certain version of Balsamic saved somewhere 🤔 but I don't know all the benefits of it! I think @ivadym understands the pros and cons of this better than me.
Yes, MIP uses a centralized config file in Servers, but it still retrieves the panel bed from LIMS to generate its run config, which is exactly what we are currently doing in Balsamic. We have a versioning system for the PONs, so an option could be to extend that versioning to the target capture beds as well. For example, taking the latest target capture bed (
Hmm, so in this example is
I don't really know how CNVkit works, but instinctively it feels better if the bed region at least roughly corresponds to the region we're sequencing. Honestly, it probably doesn't matter much! Maybe it would slightly increase the number of variants and decrease the % off-target reads in the QC :D
If we decide to go with the new versioning strategy for these bedfiles, I have prepared this PR: https://github.com/Clinical-Genomics/target_capture_bed/pull/134
Sounds like we're going for the strategy of dynamically padding the bedfile before CNVkit instead: #1469
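If the bedfile is padded dynamically before CNVkit, backbone probe regions that sit close together may overlap after padding. A minimal sketch of merging overlapping or book-ended regions per chromosome, assuming the padded regions are held as (chrom, start, end) tuples with 0-based, half-open BED coordinates. Whether CNVkit actually requires merged input has not been checked here; bedtools merge does the same job directly on a file.

```python
from typing import List, Tuple

BedRegion = Tuple[str, int, int]


def merge_overlapping(regions: List[BedRegion]) -> List[BedRegion]:
    """Merge overlapping or book-ended intervals on the same chromosome.

    Sorting groups regions by chromosome name and then by start, so a
    single pass is enough to merge each run of overlapping intervals.
    """
    merged: List[BedRegion] = []
    for region in sorted(regions):
        chrom, start, end = region
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append(region)
    return merged


padded = [
    ("chr1", 951, 1051),
    ("chr1", 1000, 1100),
    ("chr1", 2000, 2100),
    ("chr2", 0, 100),
]
print(merge_overlapping(padded))
```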
Need
As a clinician I want to find CNVs with as high a resolution as possible. Currently some target capture bedfiles contain very small regions, usually 1 base, which correspond to CNV backbone SNV probe regions. These are automatically filtered out by CNVkit and ignored in the analysis.
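To see which regions in a bedfile are affected, a minimal sketch could list every region narrower than some cut-off. The 100 bp threshold below is a hypothetical value chosen for illustration; this issue does not state the size at which CNVkit drops a region.

```python
import io
from typing import Iterable, List, Tuple

# Hypothetical cut-off for illustration only.
MIN_REGION_SIZE = 100

BedRegion = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[BedRegion]:
    """Parse chrom/start/end from BED lines, skipping header and blank lines."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def small_regions(regions: List[BedRegion], min_size: int = MIN_REGION_SIZE) -> List[BedRegion]:
    """Return regions whose width (end - start, BED is half-open) is below min_size."""
    return [r for r in regions if r[2] - r[1] < min_size]


example = "chr1\t1000\t1001\nchr1\t5000\t5200\nchr2\t300\t301\n"
regions = parse_bed(io.StringIO(example))
print(small_regions(regions))
```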
See issue: https://github.com/Clinical-Genomics/target_capture_bed/issues/133 in target capture repo
And assessment: #1466
We need to increase the size of these regions, ideally for release 16, so that we can build the best possible PONs for the release.
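One way to increase the size of these regions is to widen each one to a minimum width, centred on the original position so the backbone SNV stays in the middle. A minimal sketch, assuming a hypothetical 100 bp minimum and plain (chrom, start, end) tuples with 0-based, half-open BED coordinates:

```python
from typing import List, Tuple

BedRegion = Tuple[str, int, int]


def pad_region(region: BedRegion, min_size: int) -> BedRegion:
    """Widen a region to at least min_size bp, centred on the original interval.

    The start is clamped at 0; any padding lost on the left is added on the
    right instead. The contig end is not checked (that needs contig lengths).
    """
    chrom, start, end = region
    width = end - start
    if width >= min_size:
        return region
    missing = min_size - width
    left = missing // 2
    right = missing - left
    new_start = max(0, start - left)
    shortfall = left - (start - new_start)  # padding that did not fit before 0
    return chrom, new_start, end + right + shortfall


def pad_regions(regions: List[BedRegion], min_size: int) -> List[BedRegion]:
    """Apply pad_region to every region, keeping the input order."""
    return [pad_region(r, min_size) for r in regions]


print(pad_regions([("chr1", 1000, 1001), ("chr1", 5000, 5200), ("chr2", 10, 11)], 100))
```

Clamping at 0 handles regions near the start of a contig; regions near the contig end would need the reference's contig lengths (for example from the FASTA index), which this sketch leaves out.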
Suggested approach
NEEDS REFINEMENT:
I don't know what the best approach is right now.
At the moment many bedfiles are affected by this issue, but we don't need to prioritise all of them right now. The most relevant bedfiles that I could see are:
Of these, the exome bedfile is probably the most important, especially as I plan to build a PON for this workflow for release 16.
The problem is how to implement the extension of the bed regions...
Considered alternatives
1. Change the bedfile in the target repo.
Pros:
Cons:
2. Change the bedfile in runtime in balsamic only for the CNVkit analysis
Pros:
Cons:
Deviation
No response
System requirements assessed
Requirements affected by this story
No response
Risk assessment needed
Risk assessment
No response
SOUPs
No response
Can be closed when
No response
Blockers
No response
Anything else?
No response