Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Counterintuitive/wrong? offset from genomic to transcript mapping #332

Closed
jsstevenson opened this issue Jul 30, 2024 · 3 comments · Fixed by #352
Closed

Counterintuitive/wrong? offset from genomic to transcript mapping #332

jsstevenson opened this issue Jul 30, 2024 · 3 comments · Fixed by #352
Assignees
Labels
bug Something isn't working priority:high High priority

Comments

@jsstevenson
Copy link
Member

jsstevenson commented Jul 30, 2024

Describe the bug

Mapping the beginning genomic coordinate of exon 1 on ELN on a GRCh37 accession produces a genomic data description with an exon offset of -97. This is counterintuitive (wrong even?) -- I would expect it to map to the beginning of exon 1 on a GRCh38 accession, ie it should map to exon 1 with offset 0.

Steps to reproduce

from cool_seq_tool.app import CoolSeqTool
import asyncio

from cool_seq_tool.schemas import Strand

async def do_thing():
    cst = CoolSeqTool()
    result = await cst.ex_g_coords_mapper.genomic_to_transcript_exon_coordinates(
        alt_ac="NC_000007.13",
        start=73442503,  # this is the start of exon 1 on this gene/ac -- see https://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NC_000007.13&tkey=KAJKLycgLCUiKyYpDRw5JR9XRUpiWVlhdUpVf1zLfSA0VyXsgas3w_8zqs-w-8PLsK8&assm_context=GCF_000001405.25&app_context=gene&v=73442488:73442615&c=null&select=null&slim=0
        strand=Strand.POSITIVE,
        gene="ELN",
    )
    print(result.genomic_data.exon_start_offset)

asyncio.run(do_thing())
# -97

Expected behavior

Should be an exon offset of ~0

Current behavior

Gives an offset of -97

Possible reason(s)

_set_exon_offset looks fishy imho

Suggested fix

No response

Branch, commit, and/or version

efa56e1

Screenshots

No response

Environment details

mac

Additional details

I have yet to find another example of a gene on GRCh37 that maps to a transcript at all, so this might be specific to this gene

Contribution

Yes, I can create a PR for this fix.

@jsstevenson jsstevenson added the bug Something isn't working label Jul 30, 2024
@korikuzma
Copy link
Member

@jsstevenson by the way, I am making quite a few changes in the mapper class. So not sure if you want to tackle this or I can.

@korikuzma
Copy link
Member

@jarbesfeld says this expected output from #352 is correct:

{
  "gene": "ELN",
  "genomic_ac": "NC_000007.14",
  "tx_ac": "NM_000501.4",
  "seg_start": {
    "exon_ord": 0,
    "offset": 1,
    "genomic_location": {
      "type": "SequenceLocation",
      "sequenceReference": {
        "type": "SequenceReference",
        "refgetAccession": "SQ.F-LrLMe1SRpfUZHkQmvkVKFEGaoDeHul"
      },
      "start": 74028173
    }
  }
}

@korikuzma korikuzma linked a pull request Aug 20, 2024 that will close this issue
korikuzma added a commit that referenced this issue Aug 21, 2024
Close #345 and #332

* Update and fix bugs in`ExonGenomicCoordsMapper` 
  * Change output for public methods in `ExonGenomicCoordsMapper` (leverage VRS Sequence Location and improve structure for transcript segment data). Renamed `warnings` to `errors`.
  * Resolve offset / genomic location bugs (#345 and #332)
  * Remove `mane_transcript` instance variable and use `mane_transcript_mappings` instead.
  * Refactor code that was unnecessary or extra.
  * Rename arguments in `genomic_to_tx_segment`: `alt_ac` -> `genomic_ac`, `genomic_start` -> ` seg_start_genomic`, `genomic_end` -> `seg_end_genomic`
* pin `ga4gh.vrs` to `2.0.0a10`
* Remove `get_genes_and_alt_acs` from `UtaDatabase`. Moved this to `ExonGenomicCoordsMapper` and renamed to `_get_genomic_ac_gene`. Will return single gene since genomic accessions are not needed anymore.

---------

Co-authored-by: Jeremy Arbesfeld <jarbesfeld@gmail.com>
korikuzma added a commit that referenced this issue Aug 21, 2024
Close #345 and #332

* Update and fix bugs in`ExonGenomicCoordsMapper` 
  * Change output for public methods in `ExonGenomicCoordsMapper` (leverage VRS Sequence Location and improve structure for transcript segment data). Renamed `warnings` to `errors`.
  * Resolve offset / genomic location bugs (#345 and #332)
  * Remove `mane_transcript` instance variable and use `mane_transcript_mappings` instead.
  * Refactor code that was unnecessary or extra.
  * Rename arguments in `genomic_to_tx_segment`: `alt_ac` -> `genomic_ac`, `genomic_start` -> ` seg_start_genomic`, `genomic_end` -> `seg_end_genomic`
* pin `ga4gh.vrs` to `2.0.0a10`
* Remove `get_genes_and_alt_acs` from `UtaDatabase`. Moved this to `ExonGenomicCoordsMapper` and renamed to `_get_genomic_ac_gene`. Will return single gene since genomic accessions are not needed anymore.

---------

Co-authored-by: Jeremy Arbesfeld <jarbesfeld@gmail.com>
korikuzma added a commit that referenced this issue Aug 21, 2024
Close #345 and #332

* Update and fix bugs in`ExonGenomicCoordsMapper` 
  * Change output for public methods in `ExonGenomicCoordsMapper` (leverage VRS Sequence Location and improve structure for transcript segment data). Renamed `warnings` to `errors`.
  * Resolve offset / genomic location bugs (#345 and #332)
  * Remove `mane_transcript` instance variable and use `mane_transcript_mappings` instead.
  * Refactor code that was unnecessary or extra.
  * Rename arguments in `genomic_to_tx_segment`: `alt_ac` -> `genomic_ac`, `genomic_start` -> ` seg_start_genomic`, `genomic_end` -> `seg_end_genomic`
* pin `ga4gh.vrs` to `2.0.0a10`
* Remove `get_genes_and_alt_acs` from `UtaDatabase`. Moved this to `ExonGenomicCoordsMapper` and renamed to `_get_genomic_ac_gene`. Will return single gene since genomic accessions are not needed anymore.

---------

Co-authored-by: Jeremy Arbesfeld <jarbesfeld@gmail.com>
@korikuzma
Copy link
Member

This was completed in #352

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority:high High priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants