Uncertain ranges #699
…rtain ranges, for g. expressions
b67d9c7 to 5054739
Haven't added for Can use the expressions from here: #225 |
@biocommons/hgvs-maintainers Would we be able to get feedback on this PR? |
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
It would be great to get this PR in. Can we just make sure the "uncertain" field on BaseOffsetPosition or SimplePosition gets set correctly (and tested in the unit test)? If we express a range in parentheses, I'd assume we want to express that the whole range is uncertain. At least that's how I'd read the hgvs spec. Thanks! |
LGTM
…Since the behavior is a bit unspecified, we fall back to the inner (confident) interval of the uncertain range for this projection.
Playing around with this branch some more I realized that g_to_c projection was not yet working. As such I had a go at enabling this. Since the behavior of this projection is somewhat undefined, the current approach picks the inner (=confident) interval of the uncertain range and uses that for the .c projection. The resulting hgvs_c string is then "confident" and not "uncertain" any longer. |
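The fallback described above can be sketched in a few lines. This is only an illustration under stated assumptions: `Interval`, `UncertainRange`, and `inner_interval` are hypothetical names, not the hgvs package's own types, and both breakpoint ranges are assumed to have known integer bounds (no `?` positions). For a range such as (100_200)_(300_400), the inner interval is taken to run from the end of the left breakpoint range to the start of the right one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical closed interval of 1-based genomic positions."""
    start: int
    end: int


@dataclass(frozen=True)
class UncertainRange:
    """Hypothetical model of (a_b)_(c_d): each breakpoint lies in a range."""
    start: Interval  # possible positions of the left breakpoint
    end: Interval    # possible positions of the right breakpoint


def inner_interval(rng: UncertainRange) -> Interval:
    """Return the span that is covered wherever the breakpoints fall."""
    return Interval(rng.start.end, rng.end.start)


rng = UncertainRange(Interval(100, 200), Interval(300, 400))
inner = inner_interval(rng)
# The projected description is built from precise coordinates only,
# which is why the resulting string no longer carries uncertainty.
print(f"g.{inner.start}_{inner.end}")
```

Projecting only the inner interval loses the information that the event may extend further on either side, which is the trade-off the comment above points out.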
This PR was closed because it has been stalled for 7 days with no activity. |
Can we work on this one on Monday during our dev session? |
Added imprecise hgvs_c strings, but it is not 100% there yet, since
(see otherwise working cases in test_uncertain_projection_confidence). Regarding partially precise events that were mentioned earlier: In the example above, there could be a slightly different event, where one might observe that there is indeed a precise breakpoint in the UTR on the left, but an unknown breakpoint in the UTR region downstream of the exon. It would prob be something like this: (uncertainty only on the right side) |
Thanks for working this issue, @andreasprlic. I converted this PR to a draft to make it clear that it was a WIP. |
@andreasprlic thank you for working on this. This is a fairly important issue for vrs-Python and the various datasets we are trying to transform from hgvs to vrs. Do you have any sense of when this work will be completed or if it could be handed off to another developer efficiently? No pressure but any information will assist our planning. |
@larrybabb Thanks for calling this out. So far we have been treating this one as a "nice to have", not a "must have". At this point I believe g_to_c for imprecise events is working. Right now I am looking into the other direction, c_to_g, which will require some "grammar" modifications in how to parse the hgvs-strings. Perhaps it is best if I push what I got so far up, then somebody else who has worked on the parser previously could try to get imprecise hgvs_c parsing to work? @theferrit32 perhaps? :-) |
…uires parser modifications still. This PR contains unit tests that are still broken, but should parse once we got the c. parsing figured out. (plus some more potential modifications in the alignment mapper).
I pushed my current version of the code. Note:
|
@larrybabb for your use case, do you need the support in the direction |
@andreasprlic at this point in time I do not need either the |
@larrybabb Thanks, then I believe this PR is ready to review. |
@@ -122,7 +122,7 @@ pro_ident = '=' -> hgvs.edit.AARefAlt(ref='',alt=''

 # potentially indefinite/uncertain intervals
 c_interval = def_c_interval | '(' def_c_interval:iv ')' -> iv._set_uncertain()
-g_interval = def_g_interval | '(' def_g_interval:iv ')' -> iv._set_uncertain()
+g_interval = uncertain_g_interval:iv | ('(' def_g_interval:iv ')' -> iv._set_uncertain()) | def_g_interval
I'm confused by this. The second alternative parse (with parentheses) is for an uncertain g_interval.
…for uncertain intervals. Add test cases.
@andreasprlic @theferrit32 @larrybabb: I figured out my discomfort with the changes around hgvs.pymeta:125. At the very least, I'd like to discuss this further because I think the current change breaks important parity across variant types.
We currently have coordinates like:
5 (pos in the grammar)
5_5 (def_x_interval)
(5_5) (def_x_interval with Interval.uncertain = True)
And now we're adding (5_5)_(5_5) for genomic sequences only. I think there are two problems with this. 1) We should have parallel constructs for other variant types, 2) uncertain_g_interval is a confusing name (to me) because it can be confused with the (5_5) form above.
It seems to me that we should be able to refactor this into atomic positions (as is) and a single interval with starts and ends that can be either positions or, now also, intervals.
I haven't thought this through fully, but I think this could be implemented by creating a g_pos_or_interval rule, then using that in line 132 in lieu of the g_pos:start and :end.
Thanks for your patience. I'm not trying to drive you crazy. The parity of these rules is part of what makes the hgvs package useful in structuring HGVS variant descriptions, so I'm reluctant to toss this parity across rules without a compelling reason.
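To make the proposed refactor concrete, here is a rough sketch of a single interval whose start and end may each be a position or an (uncertain) interval. It is a hand-written parser in plain Python, not the package's Parsley grammar, and the names `Interval`, `parse`, and `pos_or_interval` are assumptions chosen for illustration; only plain integer positions are handled.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Interval:
    """Hypothetical interval; start/end are positions or nested intervals."""
    start: Union[int, "Interval"]
    end: Union[int, "Interval"]
    uncertain: bool = False


_TOKEN = re.compile(r"\d+|[()_]")


def parse(text: str) -> Union[int, Interval]:
    """Parse 5, 5_5, (5_5), (5_5)_(5_5), or mixed forms such as 5_(6_7)."""
    tokens = _TOKEN.findall(text)
    if "".join(tokens) != text:
        raise ValueError(f"unexpected characters in {text!r}")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"cannot parse {text!r}")
        pos += 1
        return tok

    def position() -> int:
        tok = take()
        if not tok.isdigit():
            raise ValueError(f"expected a position in {text!r}")
        return int(tok)

    def pos_or_interval() -> Union[int, Interval]:
        # '(' a '_' b ')' is an interval flagged as uncertain; else a position.
        if peek() == "(":
            take("(")
            start = position()
            take("_")
            end = position()
            take(")")
            return Interval(start, end, uncertain=True)
        return position()

    result = pos_or_interval()
    if peek() == "_":
        take("_")
        result = Interval(result, pos_or_interval())
    if peek() is not None:
        raise ValueError(f"trailing input in {text!r}")
    return result


for example in ("5", "5_5", "(5_5)", "(5_5)_(5_5)", "5_(6_7)"):
    print(example, "->", parse(example))
```

Because one rule covers every form, nothing in the sketch is specific to genomic coordinates, and a partially precise event such as 5_(6_7) parses without a separate rule.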
Per collective coding meeting discussion on 2024-09-09, here's the path forward:
|
No description provided.