
Fix ins or dups where splice region is preserved #719

Open: wants to merge 2 commits into main

b0d0nne11
Contributor

Fixes #714.

Fixes c_to_p for ins or dup variants that span the intron/exon or exon/intron boundary where the splice site and splice region remain completely intact.

@b0d0nne11 b0d0nne11 requested a review from a team as a code owner January 23, 2024 19:25
@b0d0nne11
Contributor Author

We found some examples of duplications where the original logic here didn't shift the variant far enough to get the expected result. It turned out that, once shifted, these variants can't be written as duplications, since in a duplication the alt always follows the ref. I've added logic to rewrite these shifted variants as insertions before attempting to map them back to a var_p, and added tests covering these cases.
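A minimal sketch of the dup/ins relationship described above, using a toy model rather than the hgvs package: the reference is a plain Python string, coordinates are 0-based and half-open, and an insertion `(pos, seq)` means `seq` is inserted before `ref[pos]`. The helper names (`dup_as_insertion`, `insertion_as_dup`, `apply_insertion`) are hypothetical. A dup of `ref[start:end]` is the same sequence change as inserting that segment at `end`, but the reverse only holds when an identical reference copy sits immediately to the left of the insertion point.

```python
from typing import Optional, Tuple


def dup_as_insertion(ref: str, start: int, end: int) -> Tuple[int, str]:
    """A dup of ref[start:end] inserts a second copy right after the original."""
    return end, ref[start:end]


def insertion_as_dup(ref: str, pos: int, seq: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) dup interval if the insertion can be written as a dup.

    In a dup the inserted copy must follow an identical reference copy, so the
    bases just left of the insertion point must equal the inserted bases.
    """
    n = len(seq)
    if n and pos >= n and ref[pos - n:pos] == seq:
        return pos - n, pos
    return None  # must stay an ins


def apply_insertion(ref: str, pos: int, seq: str) -> str:
    """Build the alternate sequence produced by the insertion."""
    return ref[:pos] + seq + ref[pos:]
```

With a toy exon `CCAG` followed by an intron `GTAACC`, inserting `GTAA` at index 8 is expressible as a dup of `ref[4:8]`, while the same change placed at the exon/intron junction (index 4) produces an identical alternate sequence but has no reference copy to its left, so it can only be written as an insertion.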

@gostachowiak

gostachowiak commented Feb 7, 2024

High-level explanation of this pull request:

  • HGVS nomenclature has the 3' shifting rule, so all cdots and pdots are shifted to the right.
  • However, 3' shifting is arbitrary: a necessary evil for nomenclature purposes. Biology doesn't care about the 3' shifting rule.
  • Consider an example: a positive-strand gene where the first 8 bases of the intron are duplicated.
  • The cdot would be +1_+8dup.
  • Currently no pdot would be calculated, because both positions have an intronic offset.
  • But now consider the biology: after the duplication, there are 2 splice sites on the left side of the intron. Which is more likely to be used for splicing?
  • We obviously can't know for sure, but it seems to me that the most logical assumption is that the "inner" splice site will be used, leaving the extra inserted material within the coding sequence and resulting in a frameshift pdot. Why is this the best assumption?
    • When that splice site is used, the entire intronic sequence is totally intact.
    • It seems "safer" to calculate a pdot and rescue the variant; otherwise downstream applications will most likely filter it out because it's an insertion after the 8th position in the intron.
  • To handle this type of situation in the most general way possible, this is the approach:
    • Calculate the pdot the normal way.
    • If it is empty, shift the cdot in the REVERSE direction and calculate the pdot again. If you get a result, use it.
    • Basically, if you can get a pdot with either forward or reverse shifting, the entire intron is intact and we should bring in the pdot.
  • Note that the 3' shifting rule is still respected for both the cdot and pdot nomenclature.
    • Reverse shifting is only used as a tool when calculating the pdot from the cdot, which does not violate HGVS nomenclature.
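The forward-then-reverse approach above can be sketched on a toy model, assuming a plain string reference in transcript orientation (on a minus-strand gene the genomic direction of each shift flips) and insertions written as `(pos, seq)`, meaning `seq` is inserted before `ref[pos]`. Nothing here is the hgvs package API: `shift_3prime`, `shift_5prime`, `pdot_with_reverse_fallback`, the `EXON_END` layout and `toy_predict` (a stand-in for c_to_p that only answers when the intron is left intact) are all hypothetical.

```python
from typing import Callable, Optional, Tuple

# Toy insertion: (0-based index the bases are inserted before, inserted bases).
Insertion = Tuple[int, str]


def shift_3prime(ref: str, pos: int, seq: str) -> Insertion:
    """Shift an insertion right as far as the reference allows (HGVS 3' rule)."""
    while seq and pos < len(ref) and seq[0] == ref[pos]:
        seq = seq[1:] + seq[0]  # rotate so the sequence change stays the same
        pos += 1
    return pos, seq


def shift_5prime(ref: str, pos: int, seq: str) -> Insertion:
    """Shift the same insertion left, i.e. in the REVERSE direction."""
    while seq and pos > 0 and seq[-1] == ref[pos - 1]:
        seq = seq[-1] + seq[:-1]
        pos -= 1
    return pos, seq


def pdot_with_reverse_fallback(
    ref: str,
    pos: int,
    seq: str,
    predict: Callable[[int, str], Optional[str]],
) -> Optional[str]:
    """Try the normal 3'-shifted form first; if it yields nothing, retry 5'-shifted.

    Only the protein consequence uses the reverse shift; the reported cdot
    would still be the 3'-shifted form.
    """
    result = predict(*shift_3prime(ref, pos, seq))
    if result is None:
        result = predict(*shift_5prime(ref, pos, seq))
    return result


# Hypothetical toy layout: ref[:EXON_END] is exon, ref[EXON_END:] is intron.
EXON_END = 4


def toy_predict(pos: int, seq: str) -> Optional[str]:
    """Stand-in for c_to_p: an insertion at or before the exon end leaves the
    intron sequence untouched, so it gets a consequence; anything later is
    treated as intronic and gets none."""
    if pos > EXON_END:
        return None
    return "frameshift" if len(seq) % 3 else "in-frame insertion"
```

For `ref = "CCAGGTAACC"` (exon `CCAG`, intron `GTAACC`), the +1_+4dup is the insertion `(8, "GTAA")`. Its 3'-shifted form stays in the intron and `toy_predict` returns `None`, but the 5'-shifted form `(4, "GTAA")` lands exactly at the exon/intron junction, so the fallback reports a frameshift. An insertion deeper in the intron that cannot reach the junction still yields `None`.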

@gostachowiak

I also wanted to mention that we discovered this because some fraction of FLT3 ITDs currently get missed when using the hgvs package (including one added to the unit tests). So it is a high impact issue.

@b0d0nne11 b0d0nne11 force-pushed the 714-splice-region-preserved branch 2 times, most recently from f2f89da to 08cea20 February 8, 2024 15:34

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Mar 11, 2024
@ahwagner ahwagner removed the stale label Mar 11, 2024

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Apr 11, 2024
@b0d0nne11
Contributor Author

@reece or @ahwagner, can you remove the stale label here? We would still like to get this merged if possible. Thanks!

@jsstevenson jsstevenson added the keep alive label and removed the stale label Apr 12, 2024
@b0d0nne11
Contributor Author

We found a case where mapping the shifted variant raises an HGVSInvalidVariantError. I've added logic to handle this, plus a test case. The variant is NM_182758.2:c.2953-31_2953-26dup. As part of the shifting procedure, mapping it to the g. type yielded an unexpected transformation to NC_000015.9:g.53815545_53815550delinsC, which caused problems in later steps. I'm simply catching the error here, since we don't want to consider invalid variants.
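A minimal sketch of that error handling, assuming the reverse-shifted mapping can be wrapped as a single callable. `try_reverse_shifted_pdot` and `map_to_p` are hypothetical names, and the exception class is a local stand-in mirroring `hgvs.exceptions.HGVSInvalidVariantError` so the snippet runs without the package installed. Only the invalid-variant error is swallowed; any other exception still propagates.

```python
from typing import Callable, Optional, TypeVar

V = TypeVar("V")


class HGVSInvalidVariantError(Exception):
    """Local stand-in for hgvs.exceptions.HGVSInvalidVariantError."""


def try_reverse_shifted_pdot(
    map_to_p: Callable[[V], Optional[str]], shifted: V
) -> Optional[str]:
    """Map the reverse-shifted variant; an invalid intermediate means 'no pdot'."""
    try:
        return map_to_p(shifted)
    except HGVSInvalidVariantError:
        # e.g. the shifted dup unexpectedly became a g. delins during mapping:
        # skip this candidate instead of failing the whole c_to_p call.
        return None
```

Treating the error as "no result" keeps the fallback purely additive: a variant that previously produced no pdot still produces none, rather than now raising.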

@b0d0nne11 b0d0nne11 force-pushed the 714-splice-region-preserved branch from 08cea20 to 13f7e36 May 24, 2024 16:24
@b0d0nne11
Contributor Author

Also rebased on main

@b0d0nne11 b0d0nne11 force-pushed the 714-splice-region-preserved branch from 13f7e36 to efdf6d0 June 4, 2024 18:17
@b0d0nne11
Contributor Author

Rebased on main

@b0d0nne11
Contributor Author

Made a few non-functional edits:

  • Rebased on main
  • Moved tests to an isolated test file
  • Squashed some commits

Labels: keep alive (exempt issue from staleness checks)
Projects: None yet
Development: successfully merging this pull request may close these issues:
c_to_p at intron/exon boundary where splice region is preserved
4 participants