Update MEI parsing and creation of OMR search tokens #845

Merged
merged 14 commits into from
May 15, 2024

Conversation

dchiller
Collaborator

@dchiller dchiller commented Apr 11, 2024

This PR makes a number of changes to helpers/mei_processing/mei_parser.py and helpers/mei_processing/mei_tokenizer.py and their associated type and test files. These changes include:

  • refactoring MEITokenizer so that we no longer return two different types of ngram documents (one on the neume level and one on the neume component level) but a single type. This single type always includes pitch (and therefore contour and interval) information, and also includes neume names if the ngram coincides with a set of complete neumes. This refactoring ensures that: 1. we can return pitch information when a neume name is queried; and 2. we don't have multiple ngrams (one containing pitch information and one containing neume names) for the same set of pitches.
  • removing empty syllables and neumes from an MEI file during parsing. It seems that previous versions of the MEI encoding step in the OMR process could create these empty objects. While this issue has been fixed, we will, at least for a little while, encounter files from before the fix.
  • modifying the dictionaries created by MEITokenizer to include fields required for indexing (id and type) and fields that we want to be easily available in the documents returned by Solr (manuscript and folio)
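
To illustrate the first two points, here is a minimal sketch of the idea. It is not the code in mei_tokenizer.py: the `Neume` tuple shape, the function names, the `pitch_names` / `neume_names` keys, and the underscore separator are all assumptions made for this example. It drops empty neumes first, then builds one document per pitch ngram and attaches neume names only when the ngram starts and ends on neume boundaries.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified input: each neume is (neume_name, [pitch names]).
Neume = Tuple[str, List[str]]


def drop_empty_neumes(neumes: List[Neume]) -> List[Neume]:
    """Discard neumes that contain no components."""
    return [neume for neume in neumes if neume[1]]


def make_ngrams(neumes: List[Neume], min_n: int, max_n: int) -> List[Dict[str, str]]:
    """Build one document per pitch ngram. Neume names are added only
    when the ngram covers a set of complete neumes."""
    pitches: List[str] = []
    starts: Dict[int, int] = {}  # pitch index where a neume begins -> neume index
    ends: Dict[int, int] = {}    # pitch index just past a neume's end -> neume index
    for i, (_, components) in enumerate(neumes):
        starts[len(pitches)] = i
        pitches.extend(components)
        ends[len(pitches)] = i

    docs: List[Dict[str, str]] = []
    for n in range(min_n, max_n + 1):
        for start in range(0, len(pitches) - n + 1):
            stop = start + n
            doc = {"pitch_names": "_".join(pitches[start:stop])}
            # The ngram coincides with complete neumes only if it begins
            # at a neume start and stops at a neume end.
            if start in starts and stop in ends:
                covered = neumes[starts[start]: ends[stop] + 1]
                doc["neume_names"] = "_".join(name for name, _ in covered)
            docs.append(doc)
    return docs


# Hypothetical example data: the middle neume is empty and gets dropped.
example = drop_empty_neumes(
    [("punctum", ["c"]), ("punctum", []), ("clivis", ["d", "c"])]
)
example_docs = make_ngrams(example, min_n=1, max_n=3)
```

In this sketch every document carries pitch information, so a query by neume name can still return pitches, and each span of pitches yields exactly one document.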

Additional refactoring includes:

  • changing neume_type to neume_name (and NeumeType to NeumeName, etc.)
  • adding a neume's system to the neume component objects it contains
  • adding a few additional development dependencies for typing and linting

@dchiller dchiller changed the title Mei parsing updates Update MEI parsing and creation of OMR search tokens Apr 11, 2024
@dchiller dchiller marked this pull request as ready for review April 11, 2024 22:10
@jacobdgm jacobdgm removed their request for review April 12, 2024 15:08
@dchiller dchiller requested review from lucasmarchd01 and removed request for lucasmarchd01 April 23, 2024 11:53
@dchiller
Collaborator Author

@lucasmarchd01 Could you give this another look? Thanks!

@dchiller dchiller merged commit 37bf9ae into DDMAL:main May 15, 2024
2 checks passed
@dchiller dchiller deleted the mei-parsing-updates branch May 15, 2024 16:48
3 participants