Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prodigal to pyrodigal #2306

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open

prodigal to pyrodigal #2306

wants to merge 17 commits into from

Conversation

meren
Copy link
Member

@meren meren commented Jul 13, 2024

This PR changes the default gene caller in anvi'o from prodigal to pyrodigal. Gene calling associated flags in anvi-gen-contigs-database remain the same, and --prodigal-single-mode continues to modify single/meta procedures in the pyrodigal module. I didn't change the names of these flags since pyrodigal uses prodigal code in the background.

The current PR does not implement the --prodigal-translation-table parameter. Because I couldn't find a way to do it properly at this time, but we can take a look at that once everything else is working and/or until someone complains (the parameter is there, but it produces 'not implemented' error, so there is no way to miss if/when one wants to use it).

The PR also adds a new parameter, --full-gene-calling-report, which generates a separate TAB-delimited text file with everything imaginable about gene calls (which is way more than the amount of data we store for each gene call at this point).

The prodigal driver is still in the codebase and works, although there is no way to invoke prodigal instead of pyrodigal through the CLI at the moment. If we want to keep it for any reason, we can, but I think we should remove it from the codebase.

@FlorianTrigodet
Copy link
Contributor

Sounds good to me. We should even switch to pyrodigal-gv. We also need to update the rest of the code base that uses "Prodigal" as default gene caller.

@Ge0rges
Copy link
Collaborator

Ge0rges commented Aug 29, 2024

Seems like the pulp dependency is really an issue of snakemake that's on the way to being solved. For now we might want to keep it included, and remove it when fixed upstream in the future.

@Ge0rges
Copy link
Collaborator

Ge0rges commented Sep 3, 2024

@meren what do you think about defaulting to pyrodigal-gv as Florian suggested? Looking into it briefly It seems that the viral models are added, so I think this would be a good default, though we could make it a flag to call GeneFinder or ViralGeneFinder.

@meren
Copy link
Member Author

meren commented Sep 4, 2024

I'm happy with that. We should make the change and wait for @FlorianTrigodet to thumbs up and merge the branch if everything is working out well.

We also should leave behind a small note (perhaps as a blog post on anvio.org) to mark this important change, to thank the developer of pyrodigal, and explain people what it means. It could be very short but I think it would be responsible to do so. Perhaps Florian would like to do that too since he pushed for it all this far? :)

@FlorianTrigodet
Copy link
Contributor

Latest updates:
I modified most of the codebase related to Prodigal, and updated the driver for pyrodigal-gv.

There are two flags, --prodigal-single-mode and --prodigal-translation-table, that I did not modify to something like --pyrodigal-gv-single-mode. If you think it is confusing to keep anything with the word 'prodigal' let me know and I'll updated the code.

I also found this unexpected function in utils.py:

def get_missing_programs_for_hmm_analysis():
    missing_programs = []
    for p in ['prodigal', 'hmmscan']:
        try:
            is_program_exists(p)
        except ConfigError:
            missing_programs.append(p)
    return missing_programs

It is only used in anvi-run-hmms. Any reasons why we needed to check for prodigal here? If there is no good reasons, I'll remove prodigal here.

Last, I need to update all the contigs.db in the tests directory. I will probably just modify the dbs, unless there is a good reason to re-run anvi-gen-contigs-database.

@Ge0rges
Copy link
Collaborator

Ge0rges commented Oct 7, 2024

I think it's more user friendly to keep the flags named the same way. The underlying engine is still prodigal after all.

@Ge0rges
Copy link
Collaborator

Ge0rges commented Oct 24, 2024

@FlorianTrigodet I believe the function you removed from utils is also used on line 15 in anvio/data/hmm/__init__.py . So importing anvio.data.hmm currently crashes (e.g. when running anvi-run-hmms).

Only found this because I currently live on the branch, it's my happy place :>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants