feat: support default versions for multiple Ensembl databases #69

Open
uniqueg opened this issue Sep 6, 2023 · 0 comments
Labels
future will not be fixed for NOW

Comments


uniqueg commented Sep 6, 2023

Problem

The various Ensembl databases for genome resources (core Ensembl, Metazoa, Fungi, Protists, Plants, Bacteria) each have their own versioning. However, ZARP-cli currently allows setting only a single default release version. This can cause problems, especially for users who frequently run analyses on libraries from multiple sources.

For example, a version number of, say, 50 could represent the latest version of one database but a very old version of another. It is also quite possible that the version desired for one database is not yet (or no longer) available in another.

Solution

By defining database-specific default versions, users or groups of users will be able to run all of their analyses on a common recent release version for each group of organisms/sources.

Context

For reference, the current latest versions and corresponding release dates for the individual databases are:

  • Ensembl: 110 (July '23)
  • Ensembl Metazoa: 57 (July '23)
  • Ensembl Fungi: 57 (July '23)
  • Ensembl Protists: 57 (July '23)
  • Ensembl Plants: 57 (July '23)
  • Ensembl Bacteria: 57 (July '23)

From this it seems that releases are coordinated and that only two different version numberings are in use (currently at 110 and 57).

Therefore, it would, at least for the moment, be sufficient to provide just one more default version parameter (shared by Metazoa, Fungi, Protists, Plants and Bacteria).
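As a rough illustration of the two-parameter idea, here is a minimal sketch of how a default version could be picked per database. All names (`DefaultVersions`, `ensembl_genomes`, `default_version_for`) are hypothetical and not part of ZARP-cli; the grouping of the five non-core databases under one parameter simply follows the observation above.

```python
from dataclasses import dataclass
from typing import Optional

# Non-core databases that, per the list above, currently share one release number.
# The lowercase keys are an assumption made for this sketch.
NON_CORE_DATABASES = {"metazoa", "fungi", "protists", "plants", "bacteria"}


@dataclass
class DefaultVersions:
    """Hypothetical container for the two default release versions."""
    ensembl: Optional[int] = None          # core Ensembl, e.g. 110
    ensembl_genomes: Optional[int] = None  # all non-core databases, e.g. 57


def default_version_for(database: str, defaults: DefaultVersions) -> Optional[int]:
    """Return the default release version for the given database name."""
    key = database.strip().lower()
    if key == "ensembl":
        return defaults.ensembl
    if key in NON_CORE_DATABASES:
        return defaults.ensembl_genomes
    raise ValueError(f"Unknown Ensembl database: {database!r}")


defaults = DefaultVersions(ensembl=110, ensembl_genomes=57)
core_version = default_version_for("Ensembl", defaults)
plants_version = default_version_for("plants", defaults)
```

If the release numbers of the non-core databases ever diverge, the same function could fall back to a per-database dictionary without changing its callers.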

Suggested implementation

ZARP-cli currently has no knowledge of which organism/source is fetched from which Ensembl database. Therefore, in addition to the new parameter, this information needs to be encoded somewhere, preferably as an additional column in `./data/genome_assemblies_map.tsv`.
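To make the extra-column idea concrete, here is a minimal sketch of reading such a mapping. The actual layout of `genome_assemblies_map.tsv` is not shown in this issue, so the column names (`source`, `assembly`, `database`), the example rows and the function `load_source_databases` are all assumptions for illustration only.

```python
import csv
import io
from typing import Dict, TextIO


def load_source_databases(
    handle: TextIO,
    source_col: str = "source",      # hypothetical column name
    database_col: str = "database",  # hypothetical new column
) -> Dict[str, str]:
    """Map each organism/source to the Ensembl database it is fetched from."""
    reader = csv.DictReader(handle, delimiter="\t")
    mapping: Dict[str, str] = {}
    for row in reader:
        mapping[row[source_col]] = row[database_col].strip().lower()
    return mapping


# Illustrative in-memory table; these rows are made up and are not the real file.
example = io.StringIO(
    "source\tassembly\tdatabase\n"
    "homo_sapiens\tGRCh38\tensembl\n"
    "arabidopsis_thaliana\tTAIR10\tplants\n"
)
source_to_db = load_source_databases(example)
```

With a lookup like this, the database name for a given source could be used to select the matching default version at the point where the release is resolved.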

@uniqueg uniqueg added the future will not be fixed for NOW label Sep 6, 2023
@uniqueg uniqueg changed the title from "Provide separate default versions for individual Ensembl databases" to "feat: support default versions for multiple Ensembl databases" Sep 6, 2023