New field literatureCited in GBIF EML profile #29

Open
mike-podolskiy90 opened this issue Apr 11, 2024 · 7 comments
@mike-podolskiy90
Contributor

mike-podolskiy90 commented Apr 11, 2024

There's no clear view for this field yet.

My questions/concerns:

  • The current EML field eml/additionalMetadata/metadata/gbif/bibliography: keep or remove?
  • From the IPT perspective: what would the input look like?
  • Where and how do we store it? We have to be able to re-create it from the Dataset class when writing EML.
  • Which BibTeX fields would there be (e.g. @article/@book/..., title, author, year, etc.)?

EML details https://eml.ecoinformatics.org/eml-schema

related #5

@MattBlissett @mdoering

mike-podolskiy90 self-assigned this Apr 11, 2024

@mdoering
Member

I would remove eml/additionalMetadata/metadata/gbif/bibliography now that there is an official replacement in EML. It does introduce a problem, though: before, we had a list of citation strings with ids (e.g. a DOI), and now it's a structured reference. We could recommend putting the entire citation into the BibTeX title, but that's a bit of a hack. The alternative is to live with parallel structures, which isn't great either and may be even worse.

Input from the IPT perspective could be structured or unstructured. Structured input is a bit of work, but in an ideal scenario you would only be asked for the BibTeX fields relevant to the citation type you chose. An option to just enter a DOI and retrieve the BibTeX record from Crossref would be very convenient, see https://github.com/CatalogueOfLife/coldp/blob/master/docs/publishing-guide-txtree.md#references
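
To make the DOI idea concrete, here is a minimal sketch of such a lookup using DOI content negotiation (requesting `application/x-bibtex` from doi.org). The class name `DoiBibTexLookup` and its methods are made up for illustration and are not part of any GBIF or ChecklistBank API. It assumes Java 11+ and a DOI whose registration agency supports BibTeX content negotiation; DOIs containing characters that are illegal in a URI would need encoding first.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper: resolves a DOI to a BibTeX record via content negotiation. */
final class DoiBibTexLookup {

  /** Builds the GET request; the Accept header asks the resolver for BibTeX. */
  static HttpRequest buildRequest(String doi) {
    return HttpRequest.newBuilder(URI.create("https://doi.org/" + doi))
        .header("Accept", "application/x-bibtex")
        .GET()
        .build();
  }

  /** Sends the request and returns the raw BibTeX text, or fails on a non-200 response. */
  static String fetch(String doi) throws IOException, InterruptedException {
    // doi.org answers with a redirect to the registration agency, so follow redirects.
    HttpClient client = HttpClient.newBuilder()
        .followRedirects(HttpClient.Redirect.NORMAL)
        .build();
    HttpResponse<String> response =
        client.send(buildRequest(doi), HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      throw new IOException("No BibTeX for DOI " + doi + ", HTTP " + response.statusCode());
    }
    return response.body();
  }
}
```

Calling `DoiBibTexLookup.fetch(doi)` with a registered DOI should return the BibTeX text, which could then pre-fill the structured form for the user to review.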

@thomasstjerne implemented a form for bibtex entries in the CLB metadata forms: https://www.checklistbank.org/tools/metadata-generator
It currently lacks a create-from-scratch option, so just enter title: 1234 in the YAML form and hit the edit button to see the form in the source section at the very bottom:

image

Storage-wise, I would create a new BibTexReference class and use a list of those in Dataset, with:

  • String id
  • BibTexTypeEnum type
  • Map<String, String> fields

The new enum should hold the 14 entry types given here: https://bibtex.eu/types/
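
A rough sketch of what that model could look like follows. The three members come from the list above; everything else (the `put` and `toBibTex` helpers, the constructor) is an assumption added for illustration, and the serializer does no escaping of braces or special characters.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** The 14 standard BibTeX entry types. */
enum BibTexTypeEnum {
  ARTICLE, BOOK, BOOKLET, CONFERENCE, INBOOK, INCOLLECTION, INPROCEEDINGS,
  MANUAL, MASTERSTHESIS, MISC, PHDTHESIS, PROCEEDINGS, TECHREPORT, UNPUBLISHED
}

/** Hypothetical model class following the proposal; not an existing GBIF API class. */
final class BibTexReference {
  private final String id;
  private final BibTexTypeEnum type;
  // LinkedHashMap keeps the fields in insertion order when writing them back out.
  private final Map<String, String> fields = new LinkedHashMap<>();

  BibTexReference(String id, BibTexTypeEnum type) {
    this.id = id;
    this.type = type;
  }

  /** Adds or replaces a field; field names are stored in lower case. */
  BibTexReference put(String field, String value) {
    fields.put(field.toLowerCase(Locale.ROOT), value);
    return this;
  }

  String getId() { return id; }
  BibTexTypeEnum getType() { return type; }
  Map<String, String> getFields() { return fields; }

  /** Renders the entry as BibTeX text; values are wrapped in braces without escaping. */
  String toBibTex() {
    StringBuilder sb = new StringBuilder();
    sb.append('@').append(type.name().toLowerCase(Locale.ROOT)).append('{').append(id);
    for (Map.Entry<String, String> e : fields.entrySet()) {
      sb.append(",\n  ").append(e.getKey()).append(" = {").append(e.getValue()).append('}');
    }
    sb.append("\n}");
    return sb.toString();
  }
}
```

Keeping the fields in a plain map means the class does not need to change when a rarely used BibTeX field turns up, at the cost of no compile-time checking of field names per entry type.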

Implementation-wise, there is a very basic jbibtex library that is useful for reading, parsing and writing BibTeX. But I don't think we want to depend on it in our API.

@mike-podolskiy90
Contributor Author

Thank you Markus

I can imagine that maintaining two separate entities would be messy. Could we perhaps find a way to migrate all existing resources to literatureCited and remove bibliography?

@mdoering
Member

mdoering commented Apr 15, 2024

I see 2 options:

  • Use anystyle.io to parse citations. It can be run locally and also trained on our cases. It is pretty good (try it), but it would certainly make some mistakes.
  • Adopt a convention: either place the entire citation in the title, or try to parse only author and year and put the rest into the title.
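
As an illustration of the second option, here is a deliberately naive sketch: it tries to split off a leading author part and a four-digit year, puts the remainder into the title, and falls back to "entire citation in title" when no year is found. The class name `CitationSplitter` and the regular expression are assumptions made up for this example; real citation strings vary far more than this pattern handles.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits "Authors (Year) rest" style citation strings. */
final class CitationSplitter {

  // author: shortest leading text; year: 1500-2099, optionally in parentheses and
  // optionally followed by a letter (e.g. 2020a); rest: everything after the year.
  private static final Pattern AUTHOR_YEAR = Pattern.compile(
      "^(?<author>.+?),?\\s*\\(?(?<!\\d)(?<year>(?:1[5-9]|20)\\d{2})(?!\\d)[a-z]?\\)?[.,:]?\\s+(?<rest>.+)$");

  /** Returns BibTeX-style fields; only "title" is set when the pattern does not match. */
  static Map<String, String> split(String citation) {
    Map<String, String> fields = new LinkedHashMap<>();
    String text = citation.trim();
    Matcher m = AUTHOR_YEAR.matcher(text);
    if (m.matches()) {
      fields.put("author", m.group("author").trim());
      fields.put("year", m.group("year"));
      fields.put("title", m.group("rest").trim());
    } else {
      // Fallback convention: keep the whole citation string as the title.
      fields.put("title", text);
    }
    return fields;
  }
}
```

For a string such as "Doe J, Smith A (2020) A checklist of things. Journal of Examples 12: 1-10." this yields separate author and year fields, with the title and journal details left together in the title field.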

I have been struggling with this for years and don't think there is a perfect solution.

Updating existing resources touches on a much broader problem. Do we plan to migrate all EML to 1.3 in the registry?
I would expect us to keep the exact copy as it came in, even if it was version 1.2, but use 1.3 for all our generated and dynamic output. You would still be able to request the original 1.2 version. The IPT can simply upgrade existing resources when someone upgrades their IPT, probably with the exception of already published and archived ones.

@mike-podolskiy90
Contributor Author

I have a feeling we might want to postpone the introduction of literatureCited as we did for the multi-language support. Looks like it's big enough to be a dedicated thing. @MattBlissett your thoughts please?

@mdoering
Member

For ChecklistBank it would be really great to have, but I understand the difficulties.

@mdoering
Member

mdoering commented Apr 15, 2024

I just checked ChecklistBank's EML 1.2 parser: it places the entire citation into the title and makes it a book-type BibTeX entry.
It then also checks the identifier and populates the DOI or link field if it is a DOI or an http-based identifier:

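  // Wraps an unparsed citation string: the whole string becomes the title of a book-type entry.
  public static Citation create(String citation) {
    Citation c = new Citation();
    c.setTitle(citation);
    c.setType(CSLType.BOOK);
    return c;
  }

  // Same as above, but also interprets the identifier as a DOI, a URI, or a plain note.
  public static Citation create(String citation, String identifier) {
    if (StringUtils.isBlank(citation)) return null;
    Citation c = create(citation.trim());
    if (!StringUtils.isBlank(identifier)) {
      var opt = DOI.parse(identifier);
      if (opt.isPresent()) {
        // A valid DOI is stored as such and doubles as the citation id.
        c.setDoi(opt.get());
        c.setId(c.getDoi().getDoiName());
      } else {
        try {
          // Not a DOI: keep it as a link if it is a syntactically valid URI.
          URI link = URI.create(identifier);
          c.setUrl(link.toString());
        } catch (IllegalArgumentException e) {
          // Neither a DOI nor a URI: keep the raw identifier as a note.
          c.setNote(identifier);
        }
      }
    }
    return c;
  }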

mike-podolskiy90 changed the title from "New field literatureCited in GBIF EML profile 1.3" to "New field literatureCited in GBIF EML profile" on Jul 31, 2024

@mike-podolskiy90
Contributor Author

This is a complex issue, and we need to be careful when implementing it. But first, we need to release Metadata profile 1.3.
