Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uncertain ranges #699

Open
wants to merge 22 commits into
base: main
Choose a base branch
from
Open

Uncertain ranges #699

wants to merge 22 commits into from

Conversation

theferrit32
Copy link

No description provided.

@theferrit32 theferrit32 requested a review from a team September 20, 2023 16:29
@theferrit32 theferrit32 self-assigned this Sep 20, 2023
@theferrit32
Copy link
Author

theferrit32 commented Sep 20, 2023

Haven't added for c. yet. The hgvs.pymeta changes are likely similar but will need to add tests.

Can use the expressions from here: #225

@korikuzma
Copy link
Contributor

@biocommons/hgvs-maintainers Would we be able to get feedback on this PR?

Copy link

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Dec 10, 2023
@andreasprlic
Copy link
Member

andreasprlic commented Dec 11, 2023

It would be great to get this PR in. Can we just make sure the "uncertain" field on BaseOffsetPosition or SimplePosition gets set correctly (and tested in the unit test)?

If we express a range in parenthesis, I'd assume we want to express that the whole range is uncertain. At least that's how I'd read the hgvs spec.

Thanks!

andreasprlic
andreasprlic previously approved these changes Dec 11, 2023
Copy link
Member

@andreasprlic andreasprlic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andreasprlic andreasprlic removed the stale Issue is stale and subject to automatic closing label Dec 11, 2023
…Since the behavior is a bit unspecified, we fall back to the inner (confident) interval of the uncertain range for this projection.
@andreasprlic
Copy link
Member

Playing around with this branch some more I realized that g_to_c projection was not yet working. As such I had a go at enabling this. Since the behavior of this projection is somewhat undefined, the current approach picks the inner (=confident) interval of the uncertain range and uses that for the .c projection. The resulting hgvs_c string is then "confident" and not "uncertain" any longer.

Copy link

github-actions bot commented Feb 2, 2024

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Feb 2, 2024
Copy link

This PR was closed because it has been stalled for 7 days with no activity.

@andreasprlic
Copy link
Member

Can we work on this one on Monday during our dev session?

@andreasprlic
Copy link
Member

andreasprlic commented Feb 20, 2024

Added imprecise hgvs_c strings, but it is not 100% there yet, since

NC_000009.11:g.(?_108337304)_(108337428_?)delNM_001079802.1:c.(-10)_(105+10)del at the moment. We want this to be NM_001079802.1:c.(?_-10)_(105+10_?)del. The problem is that the -10 comes out as a BaseOffsetPosition currently, when we prob would want it an Interval where the start has no "base" set. That prob. requires some more modifications on g_to_n in the alignmentmapper.

(see otherwise working cases in test_uncertain_projection_confidence).

Regarding partially precise events that were mentioned earlier: In the example above, there could be a slightly different event, where one might observe that there is indeed a precise breakpoint in the UTR on the left, but an unknown breakpoint in the UTR region downstream of the exon. It would prob be something like this: (uncertainty only on the right side) NC_000009.11:g.108337304_(108337428_?)del .

@reece reece marked this pull request as draft February 20, 2024 17:17
@reece
Copy link
Member

reece commented Feb 20, 2024

Thanks for working this issue, @andreasprlic.

I converted this PR to a draft to make it clear that it was a WIP.

@larrybabb
Copy link

@andreasprlic thank you for working on this. This is a fairly important issue for vrs-Python and the various datasets we are trying to transform from hgvs to vrs. Do you have any sense of when this work will be completed or if it could be handed off to another developer efficiently? No pressure but any information will assist our planning.

@andreasprlic
Copy link
Member

@larrybabb Thanks for calling this out. So far we have been treating this one as a "nice to have", not a "must have".

At this point I believe g_to_c for imprecise events is working. Right now I am looking into the other direction, c_to_g, which will require some "grammar" modifications in how to parse the hgvs-strings. Perhaps it is best if I push what I got so far up, then somebody else who has worked on the parser previously could try to get imprecise hgvs_c parsing to work? @theferrit32 perhaps? :-)

…uires parser modifications still. This PR contains unit tests that are still broken, but should parse once we got the c. parsing figured out. (plus some more potential modifications in the alignment mapper).
@andreasprlic
Copy link
Member

I pushed my current version of the code.

Note:

  • unit tests for imprecise g_to_c are passing.
  • unit test for partially precise g_to_c is passing too.
  • not passing yet: imprecise c_to_g.

@andreasprlic
Copy link
Member

@larrybabb for your use case, do you need the support in the direction g_to_c, or do you also need c_to_g? I do wonder if the vast majority of use cases are addressed by this PR as it is, and the missing parsing of imprecise c_to_g would be just useful for rare edge cases (and could be done via a future PR).

@larrybabb
Copy link

larrybabb commented Mar 4, 2024

... for your use case, do you need the support in the direction g_to_c, or do you also need c_to_g?

@andreasprlic at this point in time I do not need either the g_to_c or the c_to_g support. But if I look ahead a bit I think I might like to have the g_to_c support down the road. I do not foresee needing the c_to_g on these ambiguous endpoints.

@andreasprlic andreasprlic marked this pull request as ready for review March 4, 2024 19:11
@andreasprlic
Copy link
Member

@larrybabb Thanks, then I believe this PR is ready to review.

@@ -122,7 +122,7 @@ pro_ident = '=' -> hgvs.edit.AARefAlt(ref='',alt=''

# potentially indefinite/uncertain intervals
c_interval = def_c_interval | '(' def_c_interval:iv ')' -> iv._set_uncertain()
g_interval = def_g_interval | '(' def_g_interval:iv ')' -> iv._set_uncertain()
g_interval = uncertain_g_interval:iv | ('(' def_g_interval:iv ')' -> iv._set_uncertain()) | def_g_interval
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this. The second alternative parse (with parentheses) is for an uncertain g_interval.

Copy link
Member

@reece reece left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreasprlic @theferrit32 @larrybabb: I figured out my discomfort with the changes in around hgvs.pymeta:125. At the very least, I'd like to discuss this further because I think the current change breaks important parity across variant types.

We currently have coordinates like:

  • 5 (pos in the grammar)
  • 5_5 (def_x_interval)
  • (5_5) (def_x_interval with Interval.uncertain = True)

And now we're adding (5_5)_(5_5) for genomic sequences only. I think there are two problems with this. 1) We should have parallel constructs for other variant types, 2) uncertain_g_interval is a confusing name (to me) because it can be confused with the (5_5) form above.

It seems to me that we should be able to refactor this into atomic positions (as is) and a single interval with starts and ends that can be either positions or, now also intervals.

I haven't thought this through fully, but I think this could be implemented by creating a g_pos_or_interval rule, then using that in line 132 in lieu of the g_pos:start and :end.

Thanks for your patience. I'm not trying to drive you crazy. The parity of these rules is part of what makes the hgvs package useful in structuring HGVS variant descriptions, so I'm reluctant to toss this parity across rules without a compelling reason.

Copy link

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Apr 13, 2024
@jsstevenson jsstevenson added keep alive exempt issue from staleness checks and removed stale Issue is stale and subject to automatic closing labels Apr 14, 2024
@ahwagner ahwagner self-requested a review September 5, 2024 17:24
@reece
Copy link
Member

reece commented Sep 9, 2024

Per collective coding meeting discussion on 2024-09-09, here's the path forward:

  • Rework grammar changes to preserve analogous grammar across sequence types for def_g_interval, etc.
  • Update g-to-c projections to support transformations like those shown in support uncertain ranges #225 (comment)
  • Ensure tests for each of these

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep alive exempt issue from staleness checks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants