-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix variant region for ins and dup on intron-exon boundary #748
base: main
Are you sure you want to change the base?
Fix variant region for ins and dup on intron-exon boundary #748
Conversation
@reece and @andreasprlic would you mind taking a look at this and letting me know if you like the approach here better? |
Thanks, this looks technically more concise to me. I have a scientific question though: Is it established that insertions at the intron/exon boundary always end up as part of the exon? In other words, do we always want to report such insertions as frameshift variants, or is there a chance they are actually not impacting the coding sequence and remain part of the intron? If we are not sure, it feels as if there should be some config, that would allow to customize the behavior of the code. I am thinking of something like the |
AFAIK they will always be coding, along with the new conditions for duplications that were added. I can certainly create a feature flag for this but IMO I don't think its needed. I think the only thing that would change would be that these niche mapping cases would again return incorrect results. @gostachowiak do you have anything you want to add? |
@andreasprlic For the scientific question-- no, it is not established that insertions at the boundary will always be part of the exon. For any given mutation, it is impossible to know what will happen in real life. The idea here is that since the entire intron (including canonical splice sites) is intact, it will default to treating the inserted material as being part of the exon. Since the canonical splice site is well-defined, this seems like the most rational default assumption. I believe this is strictly better than the current behavior, where sometimes insertions at boundaries are treated as intronic, and sometimes as exonic, based on what offset values happen to appear in the nomenclature. The updated behavior is also consistent with other tools such as Mutalyzer. A feature flag would make sense, although in my opinion it would be better to deal with in a follow-up issue, because (1) it would be non-trivial, and (2) the current pull request at least establishes a consistent baseline (i.e. eliminates the arbitrary/inconsistent results that are currently returned). |
Thanks @gostachowiak for the explanation. I agree, more consistent behavior would be desirable. Having read your comment, my first thought was "this will make the variants look more pathogenic as a default". As such I had a chat with a variant scientist about this issue. The recommendation from that person is that such variants always need to be reviewed carefully and that as a general rule the goal would be to try to represent in the more benign representation, which would be as an intronic variant and potentially impacting splicing. (3' shuffling rule in HGVS nomenclature is not helpful here and would not play a role in the naming) As such, how hard would it be to have two options here: move such insertions into the intron and as an alternative move into exon? (One could create synonyms and let a scientist chose the preferred name) From a "hgvs as a library" perspective I can easily see that some labs have certain SOPs how they would report out variants in certain situations, such as this. However it is better if the library offers choice, rather than impose behavior when there are more than one possible answers. |
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
Fixes #655
This PR is offered as an alternative to #709. It solves the same problem, that certain ins or dup variants that occur at the intron/exon boundaries have non-zero offsets but are still coding. The approach taken here varies from the earlier PR in a few ways: