Problem
The various Ensembl databases for genome resources (core Ensembl, Metazoa, Fungi, Protists, Plants, Bacteria) each have their own versioning. However, ZARP-cli currently only allows setting a single default release version. This can cause problems, especially for users who frequently analyze libraries from multiple sources.
For example, a release number of, say, 50 could be the latest release of one database but a very old one for another. It is also quite possible that the desired release is available in one database but not yet (or no longer) in another.
Solution
Defining database-specific default versions would allow users, or groups of users, to run all of their analyses against a common, recent release for each group of organisms/sources.
Context
For reference, the current latest versions and corresponding release dates for the individual databases are:
Ensembl: 110 (July '23)
Ensembl Metazoa: 57 (July '23)
Ensembl Fungi: 57 (July '23)
Ensembl Protists: 57 (July '23)
Ensembl Plants: 57 (July '23)
Ensembl Bacteria: 57 (July '23)
From this, it seems that releases are coordinated and that only two numbering schemes are in use (currently at 110 and 57).
Therefore, at least for the moment, it would be sufficient to provide just one additional default version parameter, shared by Metazoa, Fungi, Protists, Plants and Bacteria.
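A minimal Python sketch of the two-parameter idea, assuming hypothetical names throughout: the `DefaultReleases` container, its `ensembl` and `ensembl_genomes` fields, and the lowercase database identifiers are illustrative only and are not part of ZARP-cli. Core Ensembl gets one default, while all Ensembl Genomes divisions share the other:

```python
from dataclasses import dataclass

# Hypothetical identifiers for the divisions that share one numbering scheme.
ENSEMBL_GENOMES_DIVISIONS = frozenset(
    {"metazoa", "fungi", "protists", "plants", "bacteria"}
)


@dataclass(frozen=True)
class DefaultReleases:
    """Hypothetical pair of default release parameters."""

    ensembl: int  # default for core Ensembl
    ensembl_genomes: int  # shared default for all Ensembl Genomes divisions

    def for_database(self, database: str) -> int:
        """Return the default release for a (case-insensitive) database name."""
        name = database.lower()
        if name == "ensembl":
            return self.ensembl
        if name in ENSEMBL_GENOMES_DIVISIONS:
            return self.ensembl_genomes
        raise ValueError(f"unknown Ensembl database: {database!r}")


# Values taken from the July '23 releases listed above.
defaults = DefaultReleases(ensembl=110, ensembl_genomes=57)
print(defaults.for_database("ensembl"))  # 110
print(defaults.for_database("Plants"))  # 57
```

Keeping the divisions in one set means that, should their release numbers ever diverge, the set could be split into further fields without changing the callers of `for_database`.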
Suggested implementation
ZARP-cli currently has no knowledge of which organism/source is fetched from which Ensembl database. Therefore, apart from adding a new parameter, this information needs to be encoded somewhere, preferably as an additional column in ./data/genome_assemblies_map.tsv.
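A minimal sketch of the lookup, assuming a hypothetical extra column named `ensembl_database` in `./data/genome_assemblies_map.tsv`. The other column names (`organism`, `assembly`), the example rows and the default values are illustrative and do not reflect the file's actual layout. The organism's database decides which default release applies:

```python
import csv
import io

# Hypothetical excerpt of ./data/genome_assemblies_map.tsv with an added
# "ensembl_database" column; the real column names may differ.
EXAMPLE_TSV = (
    "organism\tassembly\tensembl_database\n"
    "homo_sapiens\tGRCh38\tensembl\n"
    "arabidopsis_thaliana\tTAIR10\tplants\n"
)

# Hypothetical defaults: one for core Ensembl, one shared by the divisions.
DEFAULT_ENSEMBL_RELEASE = 110
DEFAULT_ENSEMBL_GENOMES_RELEASE = 57
ENSEMBL_GENOMES_DIVISIONS = {"metazoa", "fungi", "protists", "plants", "bacteria"}


def load_database_map(handle) -> dict[str, str]:
    """Map each organism to the Ensembl database it is fetched from."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["organism"]: row["ensembl_database"] for row in reader}


def default_release(organism: str, database_map: dict[str, str]) -> int:
    """Pick the default release based on the organism's database."""
    database = database_map[organism]  # raises KeyError if organism is unmapped
    if database == "ensembl":
        return DEFAULT_ENSEMBL_RELEASE
    if database in ENSEMBL_GENOMES_DIVISIONS:
        return DEFAULT_ENSEMBL_GENOMES_RELEASE
    raise ValueError(f"unknown Ensembl database {database!r} for {organism!r}")


db_map = load_database_map(io.StringIO(EXAMPLE_TSV))
print(default_release("homo_sapiens", db_map))  # 110
print(default_release("arabidopsis_thaliana", db_map))  # 57
```

In practice the handle would be the opened TSV file rather than an in-memory string.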
uniqueg changed the title from "Provide separate default versions for individual Ensembl databases" to "feat: support default versions for multiple Ensembl databases" on Sep 6, 2023.