I would like to use the GFF3toolkit to remove some gene models (all with one isoform, taken from an external list) from a GFF3 file. I first ran
gff3_QC -g assembly_MAKER1.gff -f assembly.fa -o QC_report1 -s QC_stats1
and got this report:
==> QC_report <==
Line_num Error_code Error_level Error_tag
['Line 1'] Esf0014 Error ["##gff-version" missing from the first line]
['Line 15079'] Esf0012 Info [Found 5 Ns in CDS feature of length 296 using the external FASTA, consists of 1 segment (start, length): (210940, 5)]
==> QC_stats <==
Error_code Number_of_problematic_models Error_level Error_tag
Esf0014 1 Error ##gff-version" missing from the first line
Esf0012 1 Info Found Ns in a feature using the external FASTA
(I can fix the header myself)
I wonder how I can use gff3_fix to remove ~1500 genes (gene, mRNA, exon, and CDS lines): is it possible to create a 4-column file to submit to -qc_r? Can I use any of the error codes that have a "delete_model" function? Is there a way to specify the gene ID instead of the line number?
Also, is there a feature to remove gene models whose protein sequence does not start with M?
Thanks,
Dario
Hi @dcopetti - that's an interesting use case! I suppose you could hack a QC report file to get that done. The QC reports are line-based because not every feature in GFF3 is required to have an ID, so there is no way to specify a gene ID directly. Instead, you could provide the line number of each gene feature and assign it an error code that uses the delete_model function (https://github.com/NAL-i5K/GFF3toolkit/blob/master/docs/gff3_fix.py-documentation.rst). I've never tried this, but it might work.
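To make the suggestion above concrete, here is a rough, untested sketch of how such a hacked report could be generated from a list of gene IDs. It assumes the report is tab-separated with the same four columns as the report quoted in the question, and that the line number is written as `['Line N']`, as it appears there. The function names, the default error tag text, and the `error_code` argument are hypothetical; the error code has to be replaced with one that the gff3_fix documentation lists as using delete_model.

```python
import re


def gene_line_numbers(gff_lines, gene_ids):
    """Map each wanted gene ID to the 1-based line number of its gene feature."""
    wanted = set(gene_ids)
    found = {}
    for line_num, line in enumerate(gff_lines, start=1):
        if line.startswith("#"):
            continue  # directives and comments still count toward line numbers
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        match = re.search(r"(?:^|;)ID=([^;]+)", cols[8])
        if match and match.group(1) in wanted:
            found[match.group(1)] = line_num
    return found


def build_qc_report(gff_lines, gene_ids, error_code,
                    error_level="Error",
                    error_tag="[Listed for removal]"):
    """Build report rows in the assumed 4-column, tab-separated layout.

    error_code is a placeholder: pick a code that the gff3_fix
    documentation says is handled by delete_model.
    """
    rows = ["Line_num\tError_code\tError_level\tError_tag"]
    hits = gene_line_numbers(gff_lines, gene_ids)
    for line_num in sorted(hits.values()):
        rows.append(f"['Line {line_num}']\t{error_code}\t{error_level}\t{error_tag}")
    return rows
```

The rows could then be written to a file and passed to gff3_fix with -qc_r. Line numbers must come from exactly the GFF3 file that is handed to gff3_fix, so any header fix should be made before the line numbers are collected.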
The gff3toolkit doesn't have a function to flag or delete models with partial protein sequences.
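Since the toolkit does not cover this, the check would have to happen outside it. A minimal sketch, assuming you already have a protein FASTA for the gene models (produced by some other tool) in which the first word of each header is the sequence ID; the function name is hypothetical:

```python
def proteins_not_starting_with_m(fasta_lines):
    """Return IDs of FASTA records whose sequence does not start with M.

    Records with no sequence at all are flagged as well.
    """
    flagged = []
    current_id = None
    seen_residue = False
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record: an empty sequence is also flagged.
            if current_id is not None and not seen_residue:
                flagged.append(current_id)
            header = line[1:].split()
            current_id = header[0] if header else ""
            seen_residue = False
        elif current_id is not None and not seen_residue:
            # Only the first sequence line of a record needs inspecting.
            seen_residue = True
            if line[0].upper() != "M":
                flagged.append(current_id)
    if current_id is not None and not seen_residue:
        flagged.append(current_id)
    return flagged
```

Protein IDs usually correspond to mRNA features rather than genes, so the flagged IDs would still need to be mapped to their parent genes (via the Parent attribute) before those genes are added to a removal list.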