
Processing and Indexing

andrew-morrison edited this page Dec 21, 2017 · 3 revisions

****** DRAFT ******

This is typically a task that one or two people per catalogue should take on, performing it perhaps once a week, once a month, or ad hoc, depending on your needs.

Setting up the processing scripts

  1. If you haven't already, clone the repository for your catalogue from GitHub (see Editing TEI files). As well as the TEI manuscript descriptions, this downloads copies of the processing scripts. However, the first time you run them you will need to set up a few additional things.
  2. Install Git: go to https://git-scm.com/downloads, then download and run the installer, accepting all the default options. On Windows this also installs Git Bash, which is used below.
  3. Open a command prompt and enter java -version. If that returns an error message, or reports a version older than 1.6 ("Java 6"), you will need to install or update Java on your PC.
  4. Ask the Bodleian for access to the Solr server (Solr is the search engine that powers the web site). You can proceed with the following steps while waiting, but until your access has been granted you will get the message "Emptying Solr failed" when the scripts reach the point at which the web site would be updated.
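If you are setting up several machines, the checks in steps 2 and 3 can be scripted. The following is a minimal sketch, not part of the catalogue repository: the function names java_major_version and check_prerequisites are hypothetical, and it assumes a Bash shell (such as Git Bash) and that java -version reports its version in quotes, as in java version "1.8.0_291" or openjdk version "11.0.2". It treats anything older than 1.6 as too old, matching step 3.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the catalogue repository.
# Confirms that Git is installed and that Java is version 1.6 or later.

# Print the major version from a Java version string, e.g.
# "1.8.0_291" (old "1.x" scheme) -> 8, "11.0.2" (newer scheme) -> 11.
java_major_version() {
    local version="$1"
    local first="${version%%.*}"
    if [ "$first" = "1" ]; then
        # Old scheme: the major version is the second component.
        local rest="${version#1.}"
        echo "${rest%%[!0-9]*}"
    else
        echo "${first%%[!0-9]*}"
    fi
}

check_prerequisites() {
    if ! command -v git >/dev/null 2>&1; then
        echo "Git not found: install it from https://git-scm.com/downloads" >&2
        return 1
    fi
    if ! command -v java >/dev/null 2>&1; then
        echo "Java not found: install or update Java on this PC" >&2
        return 1
    fi
    # java -version writes to stderr, e.g.: java version "1.8.0_291"
    local raw major
    raw=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    major=$(java_major_version "$raw")
    if [ -z "$major" ] || [ "$major" -lt 6 ]; then
        echo "Java version '$raw' is older than 1.6 (or unreadable): please update Java" >&2
        return 1
    fi
    echo "OK: $(git --version); Java $raw"
}
```

Source it in Git Bash or a terminal and run check_prerequisites; it prints a message and returns a non-zero status if Git or Java is missing or Java is too old. Old-style versions such as 1.8 are mapped to their major number (8) so that both numbering schemes can be compared against 6.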

Running the processing scripts

  1. Find the local copy of your catalogue's repository, wherever you chose to put it when you cloned it.
  2. On Windows, right-click the processing folder and select Git Bash Here. On macOS or Linux, open a terminal and cd to that directory.
  3. Ensure that your local copy of the TEI files and processing scripts is up to date with the central repository on GitHub. See Editing TEI files for how to do this in the GitHub Desktop client, or, with Git installed, simply run git pull.
  4. Enter ./index-all-qa.sh
  5. This processes the TEI manuscript descriptions and generates the files Solr needs to populate the indexes on the web site. It does so for each index in turn (typically one index for each of the main buttons on the web site, although depending on your catalogue's needs there may be more). The generated files are stored in the html and solr subfolders; these are blocked from being committed to GitHub and should not be edited manually. The process can take several minutes.
  6. The script checks for discrepancies between the authority files and the TEI manuscript descriptions. If it finds any, it reports them and asks whether you wish to send the files to be re-indexed anyway. Issues are categorized as either "warnings" (something that will cause a visible but minor problem on the web site, such as a broken link) or "info" messages (things that might result in something being left out, but which visitors won't notice). If anything more serious goes wrong, it is logged as an error and you will not be able to re-index.
  7. If you want to update the web site without being prompted about minor issues, run ./index-all-qa.sh force. If you just want to build the files and check for data issues, but not update the web site, run ./index-all-qa.sh noindex.
  8. The above instructions will update the QA site. To update the live site, run ./index-all-prd.sh instead.
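Steps 3 to 8 can be combined into a small helper, run from the processing folder. This is a hypothetical sketch, not a script shipped with the catalogue: indexing_command and run_indexing are illustrative names, and it assumes index-all-prd.sh accepts the same optional force and noindex arguments as index-all-qa.sh, which the steps above do not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented steps; not part of the repository.
# Assumes it is run from the catalogue's processing folder, and that
# index-all-prd.sh accepts the same optional arguments as index-all-qa.sh.

# Print the indexing command for a target site ("qa" or "prd") and an
# optional mode ("force" or "noindex"); fail on anything else.
indexing_command() {
    local target="$1" mode="${2:-}"
    case "$target" in
        qa|prd) ;;
        *) echo "Unknown target '$target' (use qa or prd)" >&2; return 1 ;;
    esac
    case "$mode" in
        ""|force|noindex) ;;
        *) echo "Unknown mode '$mode' (use force, noindex, or nothing)" >&2; return 1 ;;
    esac
    echo "./index-all-${target}.sh${mode:+ $mode}"
}

# Bring the local copy up to date, then run the chosen indexing script.
run_indexing() {
    local target="$1" mode="${2:-}"
    # Validate arguments before touching the repository.
    indexing_command "$target" "$mode" >/dev/null || return 1
    git pull || return 1
    if [ -n "$mode" ]; then
        "./index-all-${target}.sh" "$mode"
    else
        "./index-all-${target}.sh"
    fi
}
```

For example, run_indexing qa noindex pulls the latest changes and checks the data without updating the web site, while run_indexing prd updates the live site. Validating the arguments first means a typo such as run_indexing live stops before anything is pulled or indexed.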