Wrong HGVSP annotation for some RefSeq sequences for GRCh37 #103
This affects MAP3K14 and was already reported to Illumina in September 2022, as I just learned from a colleague. It also affects the results generated by LocalApp for the TSO500 and TSO COMP panels. |
Hi Henrik, Did you provide the correct HGVS g. notation? Annotating that variant with Nirvana and Biocommons HGVS reveals overlaps with Looking at the latest version of RefSeq for GRCh37, I would expect to see
|
Here's the annotation for {
"transcript":"NM_002447.4",
"source":"RefSeq",
"bioType":"mRNA",
"codons":"Ggc/Ggc",
"aminoAcids":"G",
"cdnaPos":"3847",
"cdsPos":"3583",
"exons":"17/20",
"proteinPos":"1195",
"geneId":"4486",
"hgnc":"MST1R",
"consequence":[
"synonymous_variant"
],
"hgvsc":"NM_002447.4:c.3583=",
"hgvsp":"NP_002438.2:p.(Gly1195=)",
"isCanonical":true,
"proteinId":"NP_002438.2"
} |
Hi Michael, you are right, I mixed up the HGVS g. notation for two cases that I looked at. MST1R has a different issue, caused by a difference between the genomic backbones of GRCh37 and GRCh38: at this position GRCh37 carries a SNP allele, which GRCh38 replaces with the common variant. As a result, the hgvsp derived from Nirvana with GRCh37 does not match the NP sequence. The NP sequence has the amino acid encoded by GRCh38, where the change is synonymous, while translating from GRCh37 yields a missense change. It is interesting that your internal version 3.20 reports the synonymous change for MST1R, so you must have corrected something here. With 3.18.1, the latest public release, the result looks different. |
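To illustrate the MST1R point, here is a minimal Python sketch of how the same observed codon can be classified differently depending on which reference it is compared against. The transcript codon `GGC` is taken from the `Ggc/Ggc` codons in the annotation posted above. The genome-derived codon `AGC` is an assumption made for illustration only; it is not stated anywhere in this thread. The helper `classify` is hypothetical and not part of Nirvana.

```python
# Only the two codons needed for this illustration (standard genetic code).
CODON_TO_AA = {"GGC": "Gly", "AGC": "Ser"}

def classify(ref_codon, alt_codon):
    """Classify a codon change as synonymous or missense."""
    ref_aa, alt_aa = CODON_TO_AA[ref_codon], CODON_TO_AA[alt_codon]
    return "synonymous_variant" if ref_aa == alt_aa else "missense_variant"

observed = "GGC"        # codon carried by the sample ("Ggc" in the posted annotation)
transcript_ref = "GGC"  # codon in the RefSeq transcript, which agrees with GRCh38
genome_ref = "AGC"      # ASSUMPTION: codon implied by the GRCh37 genome sequence

print("vs. RefSeq transcript:", classify(transcript_ref, observed))
print("vs. GRCh37 genome:    ", classify(genome_ref, observed))
```

Under these assumptions the first comparison is synonymous and the second is missense, which matches the discrepancy described above.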
Thanks for the quick reply, @heseber! In an old release of the TSO500 software, Nirvana 3.2.3 was used, and it produced the following incorrect annotation:
Nirvana 3.2.3
{
"transcript":"NM_003954.3",
"source":"RefSeq",
"bioType":"protein_coding",
"codons":"gCc/gTc",
"aminoAcids":"A/V",
"cdnaPos":"1743",
"cdsPos":"1634",
"exons":"9/16",
"proteinPos":"545",
"geneId":"9020",
"hgnc":"MAP3K14",
"consequence":[
"missense_variant"
],
"hgvsc":"NM_003954.3:c.1634C>T",
"hgvsp":"NP_003945.2:p.(Ala545Val)",
"isCanonical":true,
"proteinId":"NP_003945.2"
}
Subsequent versions of TSO500 used Nirvana 3.2.5.1 and Nirvana 3.2.6. Both provide the correct annotation:
Nirvana 3.2.5.1
{
"transcript":"NM_003954.3",
"source":"RefSeq",
"bioType":"protein_coding",
"codons":"ggC/ggT",
"aminoAcids":"G",
"cdnaPos":"1744",
"cdsPos":"1635",
"exons":"9/16",
"proteinPos":"545",
"geneId":"9020",
"hgnc":"MAP3K14",
"consequence":[
"synonymous_variant"
],
"hgvsc":"NM_003954.3:c.1635C>T",
"hgvsp":"NM_003954.3:c.1635C>T(p.(Gly545=))",
"isCanonical":true,
"proteinId":"NP_003945.2"
}
Here we see the differences in Nirvana 3.2.6
Nirvana 3.2.6 uses data directly from RefSeq and therefore annotates this transcript accurately:
{
"transcript":"NM_003954.5",
"source":"RefSeq",
"bioType":"mRNA",
"codons":"ggC/ggT",
"aminoAcids":"G",
"cdnaPos":"1716",
"cdsPos":"1635",
"exons":"9/16",
"proteinPos":"545",
"geneId":"9020",
"hgnc":"MAP3K14",
"consequence":[
"synonymous_variant"
],
"hgvsc":"NM_003954.5:c.1635C>T",
"hgvsp":"NM_003954.5:c.1635C>T(p.(Gly545=))",
"isCanonical":true,
"proteinId":"NP_003945.2"
}
Nirvana 3.16.1 - 3.19.0
I can also confirm that the regular Nirvana releases (3.16.1 and 3.19.0) annotate this incorrectly, mostly because the input data had some artifacts.
Nirvana 3.20
Our latest internal release, Nirvana 3.20.0, pulls all gene and transcript data directly from RefSeq and Ensembl. Therefore, like Nirvana 3.2.6, it annotates correctly:
{
"transcript":"NM_003954.5",
"source":"RefSeq",
"bioType":"mRNA",
"codons":"ggC/ggT",
"aminoAcids":"G",
"cdnaPos":"1716",
"cdsPos":"1635",
"exons":"9/16",
"proteinPos":"545",
"geneId":"9020",
"hgnc":"MAP3K14",
"consequence":[
"synonymous_variant"
],
"hgvsc":"NM_003954.5:c.1635C>T",
"hgvsp":"NP_003945.2:p.(Gly545=)",
"isCanonical":true,
"proteinId":"NP_003945.2"
} |
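To make the differences between versions easier to see, the records can be compared field by field. The following is a minimal Python sketch, not part of Nirvana; `diff_records` is a hypothetical helper, and the two dictionaries contain only a subset of the fields copied from the Nirvana 3.2.3 and Nirvana 3.20 outputs above.

```python
def diff_records(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

# Subset of the Nirvana 3.2.3 record (incorrect annotation).
nirvana_3_2_3 = {
    "transcript": "NM_003954.3",
    "codons": "gCc/gTc",
    "cdsPos": "1634",
    "proteinPos": "545",
    "consequence": ["missense_variant"],
    "hgvsp": "NP_003945.2:p.(Ala545Val)",
}

# Subset of the Nirvana 3.20 record (correct annotation).
nirvana_3_20 = {
    "transcript": "NM_003954.5",
    "codons": "ggC/ggT",
    "cdsPos": "1635",
    "proteinPos": "545",
    "consequence": ["synonymous_variant"],
    "hgvsp": "NP_003945.2:p.(Gly545=)",
}

for field, (before, after) in diff_records(nirvana_3_2_3, nirvana_3_20).items():
    print(f"{field}: {before} -> {after}")
```

In this subset, `proteinPos` is the only field that stays the same; the CDS position moves by one base, and the codons, consequence, and hgvsp change with it.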
Dear Michael, |
Example:
NC_000003.11:g.49928691T>C
Corrected: NC_000017.10:g.43350892G>A
This is annotated for RefSeq as a missense variant NP_003945.2:p.(Ala545Val) with codons gCc/gTc, which is wrong because the frame is erroneously off by 1. The true change is ggC/ggT, which is a synonymous mutation Gly545Gly.
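The off-by-one effect can be reproduced with a small Python sketch. It is an illustration only and not how Nirvana computes its annotations. The CDS positions 1634 and 1635 and the codon strings are taken from the annotations in this thread; `codon_index` and `protein_change` are hypothetical helpers, and the codon table covers only the four codons involved.

```python
# Minimal codon table: only the four codons that appear in this issue.
CODON_TO_AA = {"GCC": "Ala", "GTC": "Val", "GGC": "Gly", "GGT": "Gly"}

def codon_index(cds_pos):
    """Return (1-based codon number, 0-based offset within the codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3

def protein_change(codons, protein_pos):
    """Translate a 'ref/alt' codon string into a p. description."""
    ref, alt = (c.upper() for c in codons.split("/"))
    ref_aa, alt_aa = CODON_TO_AA[ref], CODON_TO_AA[alt]
    if ref_aa == alt_aa:
        return f"p.({ref_aa}{protein_pos}=)"
    return f"p.({ref_aa}{protein_pos}{alt_aa})"

# Off-by-one CDS position: the change lands on the middle base of codon 545.
print(codon_index(1634), protein_change("gCc/gTc", 545))
# Correct CDS position: the change lands on the third base of codon 545.
print(codon_index(1635), protein_change("ggC/ggT", 545))
```

Both positions fall in codon 545, but shifting the change from the third base to the second base turns a silent Gly change into Ala to Val.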