
Maintaining authority files

andrew-morrison edited this page Jan 3, 2018 · 2 revisions

Purpose

Authority files are necessary to implement separate search indexes for entities such as works, people or places.

These entities are mentioned throughout the manuscript TEI files, and users will be able to find them by searching the text of those descriptions. But to control which of them are discoverable via browsing and dedicated search indexes, lists must be maintained in separate XML files. Identifiers, which must be unique within the catalogue, link entries in those authority files to the elements in the manuscript descriptions (e.g. author, persName, placeName, settlement, etc.) and form part of the URLs of the pages that will be created for them on the web site.

The controlled vocabulary aspect of an authority file allows variations to be indexed as synonyms. For example, if the name of an author of a work in a manuscript is recorded in its native-language form, then the authority file can provide latinized and other versions. Place names can be disambiguated and historical spellings added. One version is chosen for display on the page that will be created for each entity on the web site, but this choice does not affect what is displayed in the manuscript description.
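As a rough illustration of how variant forms might sit alongside a display form, here is a minimal sketch of a single entry. The xml:id value and the "display" and "variant" type values are assumptions made for this example, not confirmed conventions of the processing scripts.

```xml
<!-- Illustrative only: the identifier and the type values are assumed -->
<person xml:id="person_0001">
    <!-- The form chosen for display on the entity's own page -->
    <persName type="display">Avicenna</persName>
    <!-- Other forms that could be indexed as synonyms -->
    <persName type="variant" xml:lang="ar">ابن سينا</persName>
    <persName type="variant">Ibn Sina</persName>
</person>
```

A search for any of the three forms could then lead to the same entry, while each manuscript description keeps whatever form its cataloguer recorded.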

Additional information can be added, such as geographical coordinates for places, dates of birth and death for people, and references to other resources such as VIAF or Library of Congress vocabularies. These are not currently used by the web site, but features could be built on top of such information in the future.
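One possible way to record this kind of additional information, using standard TEI elements, is sketched below. The identifiers, the type values on the name elements, and the choice of idno for the external reference are assumptions for illustration; the VIAF value is deliberately left as a placeholder.

```xml
<!-- Illustrative only: identifiers and type values are assumed -->
<place xml:id="place_0001" type="settlement">
    <placeName type="display">Oxford</placeName>
    <location>
        <!-- Approximate latitude and longitude, separated by a space -->
        <geo>51.752 -1.258</geo>
    </location>
</place>

<person xml:id="person_0002">
    <persName type="display">Dante Alighieri</persName>
    <birth when="1265"/>
    <death when="1321"/>
    <!-- Placeholder: replace with the real identifier from VIAF -->
    <idno type="VIAF">VIAF-ID-GOES-HERE</idno>
</person>
```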

Setting up authority files

Authority files should be kept in the root folder of your catalogue's GitHub repository (not in either the collections or processing subfolders) and named appropriately (e.g. "works.xml" for a works authority file).

The contents should be TEI documents, although not using the same customized schema as for manuscript descriptions. The following template can be used to start the file:

TODO

The entries added within depend on the type of entity:

  • Works should be represented by bibl elements within a listBibl parent element.
    • TODO: More rules
  • People should be represented by person elements within a listPerson parent element.
    • TODO: More rules
  • Places should be represented by place elements within a listPlace parent element.
    • Use a type attribute to indicate whether each is a 'settlement', a 'region', or a 'country'
    • TODO: More rules
  • Organizations should be represented by org elements within a listOrg parent element.
    • TODO: More rules
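To make the nesting described in the list above concrete, here is a minimal sketch of what a works authority file could look like. It is not the project's official template: the header text, the identifier, and the "display" type value are placeholders assumed for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative skeleton only, not the official template -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>Works authority file</title>
            </titleStmt>
            <publicationStmt>
                <p>Placeholder publication statement.</p>
            </publicationStmt>
            <sourceDesc>
                <p>Placeholder source description.</p>
            </sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <!-- One list per file: listBibl here, or listPerson,
                 listPlace or listOrg in the other authority files -->
            <listBibl>
                <bibl xml:id="work_0001">
                    <title type="display">Divina Commedia</title>
                </bibl>
            </listBibl>
        </body>
    </text>
</TEI>
```

A people, places or organizations file would follow the same outline, swapping listBibl and bibl for the matching pair of elements.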

In all cases, an xml:id attribute must be specified, containing a unique ID, which cross-references the key attributes of the corresponding elements in manuscript descriptions.
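The cross-reference can be pictured as the same string appearing in two places. In this sketch the identifier "person_0002" and the "display" type value are invented for illustration.

```xml
<!-- In the authority file: the entry carries the identifier -->
<person xml:id="person_0002">
    <persName type="display">Dante Alighieri</persName>
</person>

<!-- In a manuscript description: the key attribute repeats it -->
<author key="person_0002">Dante Alighieri</author>
```

Because xml:id values must be valid XML names, they cannot begin with a digit, which is one reason a prefix such as "person_" is a convenient choice.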

If multiple people are going to be editing the authority files, conflicts are likely. TODO: Move stuff about resolving them here?

Setting up the web site

When you have decided which authority files you want to maintain, and begun to create them, raise an issue in your repository on GitHub to request the processing scripts required to read the authority files, cross-reference them with the manuscript descriptions, and build indexes to enable browsing on the web site.