Added support for partial genes prokka-1.12 #219

Open
wants to merge 1 commit into
base: master


@jennahd jennahd commented Feb 23, 2017

In Summary:

  • Prodigal can now be run without the -c option (closed ends) allowing for partial gene prediction.
  • Prodigal's GFF output is now parsed with Bio::Tools::GFF, using Prodigal's 'partial' tag to find genes running off contig edges.
  • Bio::Location::Fuzzy is used to correctly set the start and end of CDSs. The frame and codon_start attributes are also set.
  • Some tweaking was needed to print the tbl file correctly; I think all possible cases are covered.
  • Added a new --partialgenes option to prokka.
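To illustrate the idea behind the 'partial' tag, here is a minimal Python sketch (not the Perl code in this pull request). It assumes Prodigal's documented two-character `partial` attribute, where the first character refers to the left edge and the second to the right edge and `1` means the gene is unfinished at that edge, and the feature-table convention of marking a partial 5' end with `<` and a partial 3' end with `>`. The function names `parse_partial` and `tbl_coordinates` are hypothetical.

```python
def parse_partial(attributes: str) -> tuple[bool, bool]:
    """Return (left_partial, right_partial) from a Prodigal GFF attribute column.

    Assumes the attribute column looks like 'ID=1_1;partial=10;start_type=Edge;...'.
    A missing 'partial' key is treated as a complete gene ('00').
    """
    fields = dict(
        item.split("=", 1) for item in attributes.strip().split(";") if "=" in item
    )
    code = fields.get("partial", "00")
    return code[0] == "1", code[1] == "1"


def tbl_coordinates(start: int, end: int, strand: str, attributes: str) -> tuple[str, str]:
    """Format the two coordinate columns of a feature-table (tbl) line.

    The first column is the 5' end and the second the 3' end, so the order of
    the coordinates and of the partial flags is swapped on the minus strand.
    """
    left_partial, right_partial = parse_partial(attributes)
    if strand == "+":
        five, three = start, end
        five_partial, three_partial = left_partial, right_partial
    else:
        five, three = end, start
        five_partial, three_partial = right_partial, left_partial
    first = ("<" if five_partial else "") + str(five)
    second = (">" if three_partial else "") + str(three)
    return first, second
```

For example, a plus-strand gene that runs off the left contig edge gets a `<` on its first coordinate, while a minus-strand gene running off the same edge gets a `>` on its second coordinate, because the left edge is its 3' end.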

We (@novigit and I) applied these changes made by @lguy for prokka version 1.11 (issue #37) to prokka version 1.12.

In addition we made one other fix and completed some tests:

  • For partial genes on contig edges, in some cases proteins were being called in the wrong frame, but gene translation for these cases works properly now.
  • Running with --partialgenes now gives the same ORF predictions as prodigal without -c (except for ORFs that overlap with tRNAs etc. or those otherwise removed by prokka).
  • Running without --partialgenes gives the same ORF predictions as prodigal with -c (except for the ORFs that overlap with tRNAs etc. or those otherwise removed by prokka), i.e. the same output as prokka 1.12 before these changes.
  • No frameshift issues with partial genes were noted when running with several different gcode values.

@lguy lguy left a comment

Apart from the two small things in README.md (line 43, should be as in the original; ll. 478 and following, should be as in the original), I'd recommend accepting those changes. I believe they have been tested thoroughly. We've been using the version with --partialgene (based on prokka 1.11) for the past 2 years without issue.
Did @jcmcnch also have the opportunity to test them?

jcmcnch commented Feb 24, 2017

@jennahd @lguy @novigit This is fantastic, thank you so much! I pulled your changed code and ran it on my dataset and it works like a charm. Without the --partialgenes option enabled, I got 2525 ORFs, and with it I get 4590 ORFs! A lot of those new ones are on short contigs and might be meaningless, but I was also definitely missing a lot of edge genes contained in larger contigs. Even for some of the shorter contigs that have edges on both sides, prokka gave functional annotations - information I can keep now thanks to your changes!

So bottom line, it definitely works well and allows me to parse the maximum amount of meaningful data from a fragmented assembly. This will be very handy for both SAGs and metagenomes! Many thanks for sharing your hard work!

rdenise added a commit to rdenise/prokka that referenced this pull request Jan 4, 2022
Adding the improvement from pull request tseemann#219 to remove the `-c` in prodigal in metagenome and adding the `--partialgenes` option.
Also adding improvement (1) from issue tseemann#474
rdenise added a commit to rdenise/prokka that referenced this pull request Jan 4, 2022