Extend metha-cat to extract metadata records #33

Open
nichtich opened this issue Jun 22, 2023 · 3 comments


@nichtich

metha-cat should allow extracting harvested metadata records without the OAI envelope (that is, oai:record/oai:metadata/*).

@miku
Owner

miku commented Jun 22, 2023

Just curious, is that something documented in the spec?

Another question: aren't there XML tools that would allow specific XML tag extraction? That way, we could keep functionality separate.

@nichtich
Author

As far as I understand the spec, oai:metadata is nested in oai:OAI-PMH/oai:ListRecords/oai:record and it may contain any XML elements.

<complexType name="metadataType">
  <annotation>
    <documentation>Metadata must be expressed in XML that complies
     with another XML Schema (namespace=#other). Metadata must be 
      explicitly qualified in the response.</documentation>
  </annotation>
  <sequence>
    <any namespace="##other" processContents="strict"/>
  </sequence>
</complexType>

I'm only interested in the child element(s) of oai:metadata because this is the actual payload. I'm surprised people keep the OAI envelope.

A workaround is to apply an XSLT to each record (when using find and xargs) or to the whole set (when using metha-cat), but this is far from a one-liner:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Extract OAI metadata records from OAI-PMH responses -->
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:oai="http://www.openarchives.org/OAI/2.0/" exclude-result-prefixes="oai">
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/Response">
    <records>
      <xsl:apply-templates select="ListRecords/oai:record"/>
    </records>
  </xsl:template>

  <xsl:template match="/Records">
    <records>
      <xsl:apply-templates select="oai:record"/>
    </records>
  </xsl:template>

  <xsl:template match="oai:record">
      <xsl:copy-of select="oai:metadata/*"/>
  </xsl:template>
</xsl:transform>  

I suppose the extraction would need only a few lines and an optional command-line flag in the metha source code.
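
To illustrate how small the extraction logic is, here is a minimal sketch in Python using only the standard library. It is not metha code: the function name `extract_payloads` and the sample input are made up for illustration, and the sample assumes a wrapper element like the `Records` root matched by the XSLT above.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH envelope elements (record, header, metadata).
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def extract_payloads(xml_text):
    """Return the child elements of every oai:metadata found in xml_text.

    Hypothetical helper for illustration; not part of metha.
    """
    root = ET.fromstring(xml_text)
    payloads = []
    # iter() walks the whole tree, so the name of the wrapper element
    # around the records does not matter.
    for record in root.iter(f"{{{OAI_NS}}}record"):
        metadata = record.find(f"{{{OAI_NS}}}metadata")
        if metadata is None:
            # Deleted records carry a header but no metadata element.
            continue
        payloads.extend(list(metadata))
    return payloads


# Made-up sample input with one regular and one deleted record.
SAMPLE = """<Records xmlns:oai="http://www.openarchives.org/OAI/2.0/">
  <oai:record>
    <oai:header><oai:identifier>oai:example.org:1</oai:identifier></oai:header>
    <oai:metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Example title</dc:title>
      </oai_dc:dc>
    </oai:metadata>
  </oai:record>
  <oai:record>
    <oai:header status="deleted"><oai:identifier>oai:example.org:2</oai:identifier></oai:header>
  </oai:record>
</Records>"""

if __name__ == "__main__":
    for element in extract_payloads(SAMPLE):
        print(ET.tostring(element, encoding="unicode"))
```

The sketch parses the whole input into memory, so a real implementation for large harvests would probably want a streaming parser instead.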

@miku
Owner

miku commented Jun 22, 2023

I'm only interested in the child element(s) of oai:metadata because this is the actual payload. I'm surprised people keep the OAI envelope.

Yes, the metadata is the most interesting part. But if the output were just the metadata (oai:record/oai:metadata/*), without the envelope, one would miss the "set specifier" (setSpec) in the header, which is sometimes useful.

<record>
  <header status="">
    <identifier>oai:ojs.pkp.sfu.ca:article/2287</identifier>
    <datestamp>2020-06-22T11:34:53Z</datestamp>
    <setSpec>EJOEH:E</setSpec>
    <setSpec>driver</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" ....oai_dc.xsd">
      <dc:title xml:lang="it-IT">Editoriale / Editorial</dc:title>
      <dc:creator>Szmigielski, S.</dc:creator>
      ...
    </oai_dc:dc>
  </metadata>
</record>
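
One possible middle ground would be to extract the payload but carry the identifier and set specifiers along with it. The following Python sketch shows the idea under some assumptions: `records_with_sets` is a hypothetical helper, not a metha function, and the sample is an abridged version of the record above with the namespace declarations written out.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for elements in the OAI-PMH namespace.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def records_with_sets(xml_text):
    """Yield identifier, setSpec values and payload for each record.

    Hypothetical helper for illustration; not part of metha.
    """
    root = ET.fromstring(xml_text)
    for record in root.iter(OAI + "record"):
        header = record.find(OAI + "header")
        metadata = record.find(OAI + "metadata")
        yield {
            "identifier": header.findtext(OAI + "identifier") if header is not None else None,
            "sets": [s.text for s in header.findall(OAI + "setSpec")] if header is not None else [],
            # Deleted records have no metadata element, hence the empty list.
            "payload": list(metadata) if metadata is not None else [],
        }


# Abridged sample record; namespace declarations are spelled out here.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header status="">
    <identifier>oai:ojs.pkp.sfu.ca:article/2287</identifier>
    <datestamp>2020-06-22T11:34:53Z</datestamp>
    <setSpec>EJOEH:E</setSpec>
    <setSpec>driver</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title xml:lang="it-IT">Editoriale / Editorial</dc:title>
    </oai_dc:dc>
  </metadata>
</record>"""

if __name__ == "__main__":
    for row in records_with_sets(SAMPLE_RECORD):
        print(row["identifier"], row["sets"], len(row["payload"]))
```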

Thanks for the XSLT.

but this is far from a one-liner:

I appreciate the desire for one-liners.

miku changed the title from "Extend meta-cat to extract metadata records" to "Extend metha-cat to extract metadata records" on Jun 22, 2023