Reads [sample2taxid].csv (see sample-processing repo), keeps rows where matched_rank != "Species", renames and reorders columns to match the taxonomy request spreadsheet requirements, and writes the result to a .tsv file to be emailed to ENA for taxid creation.
Requires pygbif to be installed in the conda env; it is used to fetch GBIF IDs from the GBIF Backbone Taxonomy via the GBIF API.
usage: python [path/to/sample2taxid.csv] taxonomy_request.tsv species_output.csv
- path/to/[sample2taxid].csv = path to user-named output.csv file from sample-processing repo.
- taxonomy_request.tsv = .tsv file containing necessary fields for requesting taxonomic id creation by ENA. Can be named anything (see below).
- species_output.csv = .csv file containing rows from sample2taxid.csv where matched_rank == 'species'.
proposed_name | name_type | host | project_id | description |
177658 | Apatania stylata | BGE: [Process ID] | [GBIF ID] | |
177627 | Agapetus iridipennis | BGE: [Process ID] | [GBIF ID] | |
177860 | Diplectrona meridionalis | BGE: [Process ID] | [GBIF ID] | |
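The split between the two output files can be sketched as follows. This is a minimal illustration, not the script itself: only the matched_rank column comes from the description above, while the process_id and taxon columns, the helper names, and the sample rows are hypothetical. It also assumes the rank comparison should ignore case, and it leaves out the column renaming and reordering step.

```python
import csv
import io


def split_by_rank(rows, rank_field="matched_rank"):
    """Partition rows into (needs_request, species_level).

    Rows matched at species rank go to species_level; everything else
    needs a taxid request. The comparison ignores case (an assumption).
    """
    needs_request, species_level = [], []
    for row in rows:
        rank = (row.get(rank_field) or "").strip().lower()
        if rank == "species":
            species_level.append(row)
        else:
            needs_request.append(row)
    return needs_request, species_level


def rows_to_text(rows, delimiter):
    """Serialise rows as delimited text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up input; column names other than matched_rank are hypothetical.
SAMPLE_CSV = (
    "process_id,taxon,matched_rank\n"
    "P1,Apatania stylata,genus\n"
    "P2,Agapetus iridipennis,species\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
needs_request, species_level = split_by_rank(rows)
request_tsv = rows_to_text(needs_request, "\t")   # would become taxonomy_request.tsv
species_csv = rows_to_text(species_level, ",")    # would become species_output.csv
```

In a real run the two strings would be written to the file names given on the command line instead of being kept in memory.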
Species with inconsistent GBIF IDs are written to gbif_inconsistent.tsv for review. A GBIF ID is treated as inconsistent if any of the following applies:
- Multiple synonymous GBIF IDs
- Confidence < 95%
- Status other than 'ACCEPTED'
- Class != Insecta
- MatchType != EXACT
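The criteria above can be expressed as a small check on one GBIF match record. This is a sketch under assumptions: the record is taken to be a dict with the field names shown in the GBIF ID inconsistency example in this document, the function name and reason labels are hypothetical, and "multiple synonymous GBIF IDs" is interpreted here as the synonym flag being set or acceptedUsageKey differing from usageKey. The script may detect that case differently.

```python
def inconsistency_reasons(match, min_confidence=95, expected_class="Insecta"):
    """Return a list of reasons why a GBIF match record looks inconsistent.

    An empty list means the record passed every check.
    """
    reasons = []

    # Assumed reading of "multiple synonymous GBIF IDs": the matched usage
    # is a synonym, or it points to a different accepted usage key.
    accepted = match.get("acceptedUsageKey")
    if match.get("synonym") or (
        accepted is not None and accepted != match.get("usageKey")
    ):
        reasons.append("synonym")

    if match.get("confidence", 0) < min_confidence:
        reasons.append("low_confidence")
    if match.get("status") != "ACCEPTED":
        reasons.append("not_accepted")
    if match.get("class") != expected_class:
        reasons.append("wrong_class")
    if match.get("matchType") != "EXACT":
        reasons.append("not_exact")
    return reasons


# Values copied from the Erotesis melanella example record in this document.
example = {
    "usageKey": 8753555,
    "canonicalName": "Erotesis melanella",
    "status": "SYNONYM",
    "confidence": 98,
    "matchType": "EXACT",
    "class": "Insecta",
    "synonym": True,
    "acceptedUsageKey": 1436745,
}

example_reasons = inconsistency_reasons(example)
```

Returning the reasons instead of a single boolean makes it possible to add a column to gbif_inconsistent.tsv explaining why each row was flagged, which may help with manual review.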
taxonomy_request.tsv is emailed to ENA to request species-level taxID creation.
- Figure out what to do when GBIF IDs are inconsistent.
- Parse new taxIDs created by ENA to file. It is currently unclear how ENA will return new taxIDs after creation, and how to get them into the ENA sample registration form for sample accession number creation.
GBIF ID inconsistency example:
usageKey | scientificName | canonicalName | rank | status | confidence | matchType | kingdom | phylum | order | family | genus | species | kingdomKey | phylumKey | classKey | orderKey | familyKey | genusKey | speciesKey | synonym | class | index | acceptedUsageKey |
8753555 | Erotesis melanella McLachlan, 1884 | Erotesis melanella | SPECIES | SYNONYM | 98 | EXACT | Animalia | Arthropoda | Trichoptera | Leptoceridae | Adicella | Adicella melanella | 1 | 54 | 216 | 1003 | 4395 | 1436670 | 1436745 | True | Insecta | 5 | 1436745 |