Update MEI parsing and creation of OMR search tokens #845

dchiller · 2024-04-11T18:58:51Z

This PR makes a number of changes to helpers/mei_processing/mei_parser.py and helpers/mei_processing/mei_tokenizer.py and their associated type and test files. These changes include:

- refactoring MEITokenizer so that it no longer returns two different types of ngram documents (one at the neume level and one at the neume component level) but a single type of ngram document. This single type always includes pitch (and therefore contour and interval) information and also includes neume names if the ngram coincides with a set of complete neumes. This refactoring ensures that: 1. pitch information is returned when a neume name is queried; and 2. we don't have multiple ngrams (one containing pitch information and one containing neume names) for the same set of pitches.
- removing empty syllables and neumes from an MEI file during parsing. It seems that previous versions of the MEI encoding step in the OMR process could create these empty objects. While this issue has been fixed, we will, at least for a little while, encounter files from before the fix.
- modifying the dictionaries created by MEITokenizer to include fields required for indexing (id and type) and fields that we want to be easily available in the documents returned by Solr (manuscript and folio)
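To illustrate the single ngram document type described in this list, here is a minimal sketch. It is not the code in mei_tokenizer.py: the function name `create_ngram_documents`, the simplified `Neume` input type, the field names `pitch_names` and `neume_names`, the `"omr_ngram"` type value, and the id scheme are all assumptions made for the example. Contour and interval fields are left out to keep it short.

```python
from typing import Dict, List, TypedDict


class Neume(TypedDict):
    # Hypothetical, simplified input: a neume name and its pitch names in order.
    neume_name: str
    pitches: List[str]


def create_ngram_documents(
    neumes: List[Neume], min_n: int, max_n: int, manuscript: str, folio: str
) -> List[Dict[str, str]]:
    """Build one document per pitch ngram; add neume names only when the
    ngram starts at the first component of a neume and ends at the last
    component of a neume."""
    # Flatten to (pitch, neume index, is first in neume, is last in neume).
    components = []
    for idx, neume in enumerate(neumes):
        pitches = neume["pitches"]
        for pos, pitch in enumerate(pitches):
            components.append((pitch, idx, pos == 0, pos == len(pitches) - 1))

    documents: List[Dict[str, str]] = []
    for n in range(min_n, max_n + 1):
        for start in range(len(components) - n + 1):
            window = components[start : start + n]
            doc = {
                # Assumed id scheme; any unique string would do for indexing.
                "id": f"{manuscript}_{folio}_{n}_{start}",
                "type": "omr_ngram",  # assumed type value
                "manuscript": manuscript,
                "folio": folio,
                "pitch_names": "_".join(c[0] for c in window),
            }
            if window[0][2] and window[-1][3]:
                first, last = window[0][1], window[-1][1]
                doc["neume_names"] = "_".join(
                    neumes[i]["neume_name"] for i in range(first, last + 1)
                )
            documents.append(doc)
    return documents


example_neumes: List[Neume] = [
    {"neume_name": "pes", "pitches": ["d", "e"]},
    {"neume_name": "punctum", "pitches": ["f"]},
]
example_docs = create_ngram_documents(example_neumes, 1, 3, "123", "001r")
```

In this sketch the ngram `d_e` carries the neume name `pes`, while `e_f` has pitch information only because it starts in the middle of a neume. Each set of pitches therefore yields exactly one document.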

Additional refactoring includes:

- changing neume_type to neume_name (and NeumeType to NeumeName, etc.)
- adding a neume's system to the neume component objects it contains
- adding a few additional development dependencies for typing and linting
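Two of the parsing changes described above, dropping empty neumes and syllables and copying a neume's system onto its neume components, could look roughly like the following sketch. It uses only the standard library and is not the code in mei_parser.py. The function names, the output dictionary keys, and the assumption that system breaks (`sb`) are siblings of `syllable` elements inside a `layer` are all hypothetical simplifications. The sketch also assumes that an "empty" neume has no `nc` children and an "empty" syllable has no non-empty neumes.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

MEI_NS = {"mei": "http://www.music-encoding.org/ns/mei"}


def remove_empty_elements(layer: ET.Element) -> None:
    """Remove neumes without neume components, then syllables without neumes."""
    for syllable in list(layer.findall("mei:syllable", MEI_NS)):
        for neume in list(syllable.findall("mei:neume", MEI_NS)):
            if neume.find("mei:nc", MEI_NS) is None:
                syllable.remove(neume)
        if syllable.find("mei:neume", MEI_NS) is None:
            layer.remove(syllable)


def parse_neume_components(layer: ET.Element) -> List[Dict[str, Union[str, int]]]:
    """Collect neume components, tagging each with the current system number."""
    system = 0
    components: List[Dict[str, Union[str, int]]] = []
    for element in layer:
        tag = element.tag.split("}")[-1]  # strip the namespace prefix
        if tag == "sb":
            system = int(element.get("n", system + 1))
        elif tag == "syllable":
            for neume in element.findall("mei:neume", MEI_NS):
                for nc in neume.findall("mei:nc", MEI_NS):
                    components.append(
                        {
                            "pname": nc.get("pname"),
                            "octave": int(nc.get("oct")),
                            "system": system,
                        }
                    )
    return components


SAMPLE = """
<layer xmlns="http://www.music-encoding.org/ns/mei">
  <sb n="1"/>
  <syllable><neume><nc pname="d" oct="3"/><nc pname="e" oct="3"/></neume></syllable>
  <syllable><neume/></syllable>
  <sb n="2"/>
  <syllable><neume/><neume><nc pname="f" oct="3"/></neume></syllable>
</layer>
""".strip()

sample_layer = ET.fromstring(SAMPLE)
remove_empty_elements(sample_layer)
sample_components = parse_neume_components(sample_layer)
```

Cleaning the tree before extracting components means downstream code such as the tokenizer never has to special-case empty objects from older files.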

app/public/cantusdata/helpers/mei_processing/mei_tokenizer.py

- change "s" (stay) to "r" (repeat) in ContourType
- specify intervals as semitone intervals (to disambiguate from other types of intervals that will be introduced later)
- fix a number of comments that were missed in the neume_type -> neume_name transition
- adjust tests of the MEI parser for various name changes
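As a rough illustration of the contour and semitone-interval conventions named in these notes, the sketch below derives both from a sequence of pitches. Only the name `ContourType` and the "r" (repeat) value come from the notes above. The "u" and "d" values, the helper names, and the pitch representation as (pitch name, octave) pairs are assumptions, and accidentals are ignored.

```python
from typing import List, Literal, Tuple

# "r" (repeat) is from the notes above; "u" and "d" are assumed values.
ContourType = Literal["u", "d", "r"]

# Semitone offsets of the natural pitch names above C.
PITCH_CLASS = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}


def to_semitones(pname: str, octave: int) -> int:
    """Map a natural pitch to an absolute semitone count."""
    return 12 * octave + PITCH_CLASS[pname]


def semitone_intervals(pitches: List[Tuple[str, int]]) -> List[int]:
    """Signed semitone distance between each pair of consecutive pitches."""
    values = [to_semitones(pname, octave) for pname, octave in pitches]
    return [later - earlier for earlier, later in zip(values, values[1:])]


def contours(intervals: List[int]) -> List[ContourType]:
    """Reduce each interval to up, down, or repeat."""
    return ["u" if i > 0 else "d" if i < 0 else "r" for i in intervals]


example_pitches = [("d", 3), ("f", 3), ("f", 3), ("e", 3)]
example_intervals = semitone_intervals(example_pitches)
example_contours = contours(example_intervals)
```

Counting in semitones keeps the interval unambiguous: d to f is 3 here, whereas a count in scale steps would give 2.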


dchiller · 2024-05-14T16:09:36Z

@lucasmarchd01 Could you give this another look? Thanks!

dchiller changed the title ~~Mei parsing updates~~ Update MEI parsing and creation of OMR search tokens Apr 11, 2024

dchiller requested review from jacobdgm and lucasmarchd01 April 11, 2024 22:10

dchiller marked this pull request as ready for review April 11, 2024 22:10

jacobdgm removed their request for review April 12, 2024 15:08

dchiller requested review from lucasmarchd01 and removed request for lucasmarchd01 April 23, 2024 11:53

lucasmarchd01 reviewed May 7, 2024

app/public/cantusdata/helpers/mei_processing/mei_tokenizer.py (three review threads, all resolved; two marked outdated)

dchiller requested a review from lucasmarchd01 May 8, 2024 11:26

dchiller added 14 commits May 14, 2024 12:07

Add manuscript and folio identifiers to MEITokenizer types

8408d7b

Add removal of empty neumes to MEIParser

6aa7577

Add id and type fields to ngram documents

3a96ddb

Add django-related development dependencies

d0232f2

Remove empty syllables along with empty neumes

b24b7c5

Change neume_type to neume_name

aff51c4

Add djangorestframework stubs

6ee2b88

Add system to NeumeComponent type

08d4712

Fix neume_name type in Neume type Rename NeumeType to NeumeName

Rewrite MEITokenizer for combine neume component- and neume-level documents

7138019

Fix NeumeName types in mei_parser.py

95cd6f0

Add break to neume name collection while loop

9128255

Move NgramDocument type to mei_parsing_types.py

322a7b6

Add types to _create_documents_from_neume_ngrams

526011d

dchiller force-pushed the mei-parsing-updates branch from 59eaed7 to 526011d Compare May 14, 2024 16:09

lucasmarchd01 approved these changes May 15, 2024

dchiller merged commit 37bf9ae into DDMAL:main May 15, 2024
2 checks passed

dchiller deleted the mei-parsing-updates branch May 15, 2024 16:48

dchiller mentioned this pull request May 15, 2024

Add command for indexing MEI files in solr #848

Merged
