KeyError: exon in xpore dataprep #217
Hi @mumdark, can you send me your GTF file? Thanks! Best wishes,
Dear @yuukiiwa, Thanks! I've uploaded the first 100 lines of the GTF file I'm using. Best regards,
I modified the following code in `dataprep.py`. The original (commented out) was:
```python
# convert genomic positions to tx positions (original)
#if is_gff < 0:
#    for id in dict:
#        tx_pos,tx_start=[],0
#        for pair in dict[id]["exon"]:
#            tx_end=pair[1]-pair[0]+tx_start
#            tx_pos.append((tx_start,tx_end))
#            tx_start=tx_end+1
#        dict[id]['tx_exon']=tx_pos
```
I changed it to the following, which removes transcripts that don't contain the `exon` key:
```python
if is_gff < 0:
    id_names = list(dict.keys())
    for id in id_names:
        tx_pos, tx_start = [], 0
        # Drop transcripts that have no exon entries and report them
        if 'exon' not in dict[id].keys():
            removed_value = dict.pop(id, None)
            print("Remove: ", id, "-", removed_value)
            continue
        for pair in dict[id]["exon"]:
            tx_end = pair[1] - pair[0] + tx_start
            tx_pos.append((tx_start, tx_end))
            tx_start = tx_end + 1
        dict[id]['tx_exon'] = tx_pos
```
However, is it correct to remove the transcripts that lack the `exon` key?
Additionally, I set the p-value to 0.5, but `diffmod.table` only has around 300 rows. Is this normal? Thanks! Best regards,
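To make the arithmetic in the conversion loop concrete, here is a minimal standalone sketch that applies the same `tx_end = end - start + tx_start` / `tx_start = tx_end + 1` steps to a hypothetical toy annotation. The transcript IDs, coordinates, and the function name `add_tx_exons` are made up for illustration and are not part of xpore. It uses `annotation` instead of `dict` to avoid shadowing Python's built-in, and, like the original loop, it assumes each transcript's exons are already stored in transcript order.

```python
# Hypothetical toy annotation: transcript ID -> {"exon": [(genomic_start, genomic_end), ...]}.
# Mirrors the dict[id]["exon"] layout used above; IDs and coordinates are invented.
annotation = {
    "tx_multi": {"exon": [(100, 199), (300, 349)]},
    "tx_single": {"exon": [(500, 599)]},
    "tx_no_exon": {},  # a transcript whose exon rows were never recorded
}


def add_tx_exons(annotation):
    """Convert genomic exon coordinates to transcript coordinates in place.

    Transcripts without an "exon" entry are removed from `annotation` and
    returned in a separate dict so they can be inspected afterwards.
    Assumes exons are listed in transcript order (same as the loop above).
    """
    removed = {}
    for tx_id in list(annotation):  # iterate over a copy of the keys so popping is safe
        if "exon" not in annotation[tx_id]:
            removed[tx_id] = annotation.pop(tx_id)
            continue
        tx_pos, tx_start = [], 0
        for start, end in annotation[tx_id]["exon"]:
            tx_end = end - start + tx_start  # exon length - 1, offset by bases already used
            tx_pos.append((tx_start, tx_end))
            tx_start = tx_end + 1  # next exon starts right after this one
        annotation[tx_id]["tx_exon"] = tx_pos
    return removed


removed = add_tx_exons(annotation)
print(annotation["tx_multi"]["tx_exon"])   # [(0, 99), (100, 149)]
print(annotation["tx_single"]["tx_exon"])  # [(0, 99)]
print(sorted(removed))                     # ['tx_no_exon']
```

A single-exon transcript goes through the same loop without special handling, as long as its exons are stored as a list containing one `(start, end)` tuple.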
Hi @mumdark, Sorry for the delayed reply! If the flags […] You can always remove the following from the […]
Thanks! Best wishes,
Hi, thank you for developing such an excellent tool!
I encountered an error while running the `dataprep` function in xpore, as follows:
I carefully checked line 242 of `dataprep.py` and found that some transcripts in `dict` do not have their exon start and end points annotated, as shown in the screenshot for ENSMUST00000193812.
Subsequently, I tested the `readAnnotation` function, adding the following check code to inspect the `attrDict`, `type`, `start`, and `end` variables for ENSMUST00000193812:
The output is as follows:
This indicates that the `exon` line for ENSMUST00000193812 appears above its `transcript` line, so this single-exon transcript does not get the expected information from the following condition in the code.
I then added the following code to work around this ordering problem:
Although this resolved the issue, I encountered an error in the next loop, because not all genes have multiple exons.
For example, it does not produce an error for this key:
But it does produce an error for this key, which has only one pair of exon coordinates:
I am unsure whether this is a bug or a problem with the ordering of lines in the GTF file.
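One way to make annotation parsing independent of line order is to collect `exon` rows keyed on `transcript_id`, so it no longer matters whether a transcript's `exon` line comes before or after its `transcript` line. The following is a hypothetical standalone sketch, not xpore's `readAnnotation`: the functions `parse_attributes` and `read_exons` and the toy GTF rows are invented for illustration. It stores every transcript's exons as a list of `(start, end)` tuples, including single-exon transcripts, and sorts them by genomic start while ignoring strand.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GTF attribute column such as 'gene_id "G1"; transcript_id "T1";'."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def read_exons(gtf_lines):
    """Group exon coordinates by transcript_id, regardless of line order.

    Every transcript maps to a list of (start, end) tuples, even when it has
    a single exon, so downstream loops can always iterate over pairs.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # gene/transcript/CDS rows are not needed to collect exons
        attrs = parse_attributes(cols[8])
        exons[attrs["transcript_id"]].append((int(cols[3]), int(cols[4])))
    # Sort by genomic start; strand handling is deliberately left out of this sketch.
    return {tx_id: sorted(pairs) for tx_id, pairs in exons.items()}


# Hypothetical rows: TX_SINGLE has its exon line before its transcript line,
# and TX_MULTI lists its exons out of genomic order.
gtf = [
    'chr1\tsrc\texon\t100\t199\t.\t+\t.\tgene_id "G1"; transcript_id "TX_SINGLE";',
    'chr1\tsrc\ttranscript\t100\t199\t.\t+\t.\tgene_id "G1"; transcript_id "TX_SINGLE";',
    'chr1\tsrc\ttranscript\t300\t499\t.\t+\t.\tgene_id "G2"; transcript_id "TX_MULTI";',
    'chr1\tsrc\texon\t400\t499\t.\t+\t.\tgene_id "G2"; transcript_id "TX_MULTI";',
    'chr1\tsrc\texon\t300\t349\t.\t+\t.\tgene_id "G2"; transcript_id "TX_MULTI";',
]
print(read_exons(gtf))
# {'TX_SINGLE': [(100, 199)], 'TX_MULTI': [(300, 349), (400, 499)]}
```

For minus-strand transcripts the transcript order is the reverse of the genomic order, so a real implementation would need to take the strand column into account before converting to transcript coordinates.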
Thanks!