Update GBIF's EML profile — multilingual support #6

Open
MattBlissett opened this issue Aug 7, 2023 · 1 comment

MattBlissett commented Aug 7, 2023

Multilingual support for dataset descriptions and other dataset metadata

The issue to update the EML profile does not include multilingual support. This would be a significant extension to the Registry's APIs, in terms of the work required for GBIF and potentially the long-term overhead of always handling multiple languages.

Some interest was shown in this issue, but as this is a significant change (in implementation and API complexity) we should check whether there is still demand for it.

The following elements would gain multilingual support. A publisher could then choose to provide equivalent metadata in two or more languages for each of them.

  • Existing: dataset/title — this element is already supported, but we will support data in multiple languages:
    Old:

    <title>
      Vernal pool amphibian density data, Isla Vista, 1990-1996.
    </title>

    New:

    <title> <!-- Language taken from the overall dataset language -->
      Vernal pool amphibian density data, Isla Vista, 1990-1996.
      <value xml:lang="fr">
        Données de densité d'amphibiens de la piscine vernale, Isla Vista, 1990-1996.
      </value>
    </title>
  • Existing: dataset/abstract, "a brief overview of the resource."

  • Existing: dataset/purpose, "a synopsis of the purpose of this dataset."

  • New: dataset/introduction, "an overview of the background and context for the dataset."

  • New: dataset/gettingStarted, "a high level overview of interpretation, structure and content of the dataset."

  • New: dataset/acknowledgements, "text that acknowledges funders and other key contributors."
    Three new elements will be supported, all with multilingual support.
    Old:

    <abstract>
      <para>
        &lt;em&gt;Reef Life Survey&lt;/em&gt; (RLS) aims to improve biodiversity conservation...
      </para>
    </abstract>

    New, showing all available formatting using DocBook:

    <abstract>
      <para><emphasis>Reef Life Survey</emphasis> (RLS) aims to improve biodiversity conservation...</para>
      <section>
        <title>A separate section</title>
        <para>More text</para>
        <para>And more text, with
          <itemizedlist>
            <listitem>First item</listitem>
          </itemizedlist>
          <orderedlist>
            <listitem>First item</listitem>
          </orderedlist>
          <section>
            <title>A sub-section</title>
            <emphasis>Emphasis</emphasis>
            CO<subscript>2</subscript> (or just CO₂)
            m<superscript>3</superscript> (or just m³)
            <literalLayout>
              x = fn(y, z)
            </literalLayout>
          </section>
          <ulink url="https://example.org"><citetitle>Example link</citetitle></ulink>
        </para>
      </section>
    
      <!-- And then the same can be provided in another language -->
      <para lang="fr"><emphasis>Enquête sur la vie des récifs</emphasis> (EVR) vise à améliorer...</para>
      <section lang="fr">
        <title>Title in French</title>
        <para>In French...</para>
        <para>...etc.</para>
      </section>
    </abstract>

    Note that we expect an element to be either fully translated, or not translated at all.
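
To make the proposed title form concrete, here is a minimal sketch of how a consumer might read it with Python's standard library. It assumes the element's own text belongs to the overall dataset language (taken to be "en" here purely for illustration) and that translations arrive as value children carrying xml:lang, as in the "New" example above. The function name title_by_language is hypothetical and not part of any GBIF library.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def normalise(text):
    """Collapse the indentation whitespace found in pretty-printed EML."""
    return " ".join((text or "").split())


def title_by_language(title_elem, default_lang):
    """Return {language: text} for a <title> in the proposed form.

    The element's own text is attributed to default_lang (assumption:
    the overall dataset language); each <value xml:lang="..."> child
    contributes one translation.
    """
    titles = {}
    own_text = normalise(title_elem.text)
    if own_text:
        titles[default_lang] = own_text
    for value in title_elem.findall("value"):
        lang = value.get(XML_LANG)
        text = normalise(value.text)
        if lang and text:
            titles[lang] = text
    return titles


# The "New" title example from above, without the explanatory comment.
SAMPLE = """
<title>
  Vernal pool amphibian density data, Isla Vista, 1990-1996.
  <value xml:lang="fr">
    Données de densité d'amphibiens de la piscine vernale, Isla Vista, 1990-1996.
  </value>
</title>
"""

titles = title_by_language(ET.fromstring(SAMPLE), default_lang="en")
for lang, text in titles.items():
    print(lang, "->", text)
```

A per-language dictionary like this maps directly onto the JSON shape discussed further down, which is why the sketch returns a dict rather than a list.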

Earlier proposal and implementation

We describe the older method in the GMP guide, but never added support for this in the GBIF Registry. Now we could implement the newer method.

This matters for a new profile version, but exposing the result needs care because it changes the Registry API. Perhaps we could implement something without changing /v1/dataset (e.g. return only the primary language, or use the Accept-Language header), with a parameter to 'explode' the supported terms into objects keyed by language code:

{
  "title": {
    "en": "...",
    "fr": "..."
  } ...
}

Similar changes would probably be required for the organization and node entities.
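
As a rough illustration of the backwards-compatible option, the sketch below picks a single string per field from the Accept-Language header and only returns the per-language object when explicitly asked. Nothing here exists in the Registry today: the function names, the explode flag name and the fallback to the primary language are all assumptions made for the sake of the example, and the header parsing is deliberately simplified.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into lowercase tags, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q value has weight 1
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            weighted.append((-quality, position, tag))
    # Highest quality first; ties keep the order given in the header.
    return [tag for _, _, tag in sorted(weighted)]


def render_field(translations, primary_lang, accept_language=None, explode=False):
    """Return one string by default, or the whole language map if explode is set.

    translations is assumed to be keyed by lowercase language codes,
    e.g. {"en": "...", "fr": "..."}.
    """
    if explode:
        return dict(translations)
    for tag in parse_accept_language(accept_language or ""):
        if tag == "*":
            break  # any language is acceptable: use the primary one
        # Try the exact tag, then its primary subtag (fr-ca -> fr).
        for candidate in (tag, tag.split("-")[0]):
            if candidate in translations:
                return translations[candidate]
    return translations[primary_lang]


title = {
    "en": "Vernal pool amphibian density data, Isla Vista, 1990-1996.",
    "fr": "Données de densité d'amphibiens de la piscine vernale, Isla Vista, 1990-1996.",
}

print(render_field(title, "en"))
print(render_field(title, "en", accept_language="fr-CA,fr;q=0.9,en;q=0.8"))
print(render_field(title, "en", explode=True))
```

With this shape, existing clients that send no header and no parameter keep receiving a plain string in the primary language, so the current response format would stay unchanged for them.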

MattBlissett commented Oct 17, 2023

Result of the survey at the Global Nodes Meeting, with 39 votes:

  • 49% Nice to have
  • 36% High priority
  • 15% Not important for us
