This repository contains Python scripts and a workflow for (a) taking an MFA forced-alignment model that was trained on one language, and (b) running that model on a different language.
I tested the code with some Armenian data by aligning with an English model (and some other high-resource models). The alignment seems to work well.
The rationale is that for low-resource languages, it takes a lot of data (sound files, transcriptions, pronunciation dictionaries) to create a high-quality alignment model. As a stepping stone, you can run a model from a high-resource language (like English) on your low-resource language (like Armenian). The generated alignments seem to be quite sensible. In my anecdotal experience, the alignments I get from an English-based model (trained on over 1,000 hours) are better than the alignments from a custom-made model (based on 1-20 hours of data).
The following workflow explains the steps for running the scripts alongside MFA. There are example files in `Examples`. A lot of the background work was done thanks to TextGridTools.
- Ensure that MFA is installed and working on your system.
- Ensure you have a high-resource acoustic model like `english_mfa`.
- Ensure you have an original pronunciation dictionary, called `pronDictOriginal.txt`. The dictionary should have the format of `word IPA`.
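For illustration, a couple of hypothetical entries (Armenian words with space-separated IPA phones; see the files in `Examples` for the real format):

```
բարի b ɑ ɾ i
ռադիո r ɑ d i o
```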
- Review the list of phones in the `english_mfa` model.
- Create a phone-mapping file like `phoneMapping.txt`. This file will map phones that exist in the low-resource language's pronunciation dictionary `pronDictOriginal.txt` but which are absent in `english_mfa`. For every such non-English phone, write an approximate English phone: for example, a non-English trill /r/ can be approximated by the English flap /ɾ/. For an example, see here.
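As a sketch, and assuming the file lists one `sourcePhone targetPhone` pair per line (check `phoneMapping.txt` in `Examples` for the exact format), the mapping might look like this, with illustrative pairs:

```
r ɾ
χ h
```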
- Convert your original dictionary `pronDictOriginal.txt` into an intermediate dictionary `pronDictIntermediate.txt` by running the following command:

```
python convertPronDict.py pronDictOriginal.txt phoneMapping.txt pronDictIntermediate.txt
```

This command will replace the non-English phones with English phones. The script should return errors if there are any issues in your original dictionary or phone-mapping file.
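For intuition, the conversion amounts to a per-phone substitution over each dictionary entry. Here is a minimal sketch of that idea (not the actual `convertPronDict.py`; the file formats are the assumptions described above):

```python
# Minimal sketch of the per-phone substitution behind the dictionary
# conversion. Assumes "sourcePhone targetPhone" per line in the mapping file
# and "word phone phone ..." per line in the dictionary. The real script
# also writes wordTranscriptions.pkl and reports errors in the input files.

def load_mapping(path):
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                mapping[parts[0]] = parts[1]
    return mapping

def convert_dictionary(dict_in, mapping_path, dict_out):
    mapping = load_mapping(mapping_path)
    with open(dict_in, encoding="utf-8") as fin, \
         open(dict_out, "w", encoding="utf-8") as fout:
        for line in fin:
            word, *phones = line.split()
            # Replace non-English phones; phones already in the English
            # model's inventory pass through unchanged.
            converted = [mapping.get(p, p) for p in phones]
            fout.write(word + " " + " ".join(converted) + "\n")

convert_dictionary("pronDictOriginal.txt", "phoneMapping.txt",
                   "pronDictIntermediate.txt")
```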
- Keep note of the generated file `wordTranscriptions.pkl`, which will be used to transfer information about the dictionary across the Python files.
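If you want to inspect that file, it is a regular pickle; its exact structure is an implementation detail of the scripts, so treat this as a quick peek rather than a stable interface:

```python
import pickle

# Load and skim the dictionary information passed between the scripts.
with open("wordTranscriptions.pkl", "rb") as f:
    word_transcriptions = pickle.load(f)
print(type(word_transcriptions))
```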
- Validate MFA on your dictionary and corpus to make sure there are no non-English phones:

```
mfa validate $CORPUS_DIRECTORY pronDictIntermediate.txt english_mfa --ignore_acoustics
```
- Run the MFA aligner on your corpus with the intermediate dictionary:

```
mfa align $CORPUS_DIRECTORY pronDictIntermediate.txt english_mfa $OUTPUT_DIRECTORY --clean --overwrite
```
- Convert the generated alignments from English phones back to non-English phones:

```
python convertAlignments.py wordTranscriptions.pkl $OUTPUT_DIRECTORY
```
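For intuition, the back-conversion boils down to relabeling phone intervals in the output TextGrids. Here is a simplified sketch using TextGridTools; the real `convertAlignments.py` recovers the per-word original phones from `wordTranscriptions.pkl`, and the tier name, file name, and one-to-one mapping here are assumptions:

```python
import tgt  # TextGridTools: pip install tgt

# Illustrative reverse mapping (English phone -> original phone).
reverse_mapping = {"ɾ": "r"}

tg = tgt.io.read_textgrid("utterance1.TextGrid")  # hypothetical file name
phone_tier = tg.get_tier_by_name("phones")
for interval in phone_tier.intervals:
    # Relabel each aligned phone back to its original symbol.
    interval.text = reverse_mapping.get(interval.text, interval.text)
tgt.io.write_to_file(tg, "utterance1_converted.TextGrid", format="long")
```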
There are measures in place to minimize variation in the data, but I haven't yet incorporated fixes for some likely common errors.
- You can have words with multiple possible pronunciations. However, the conversion code currently cannot handle converting an alignment where a segment was deleted. To allow this level of flexibility, the conversion would likely need to incorporate a type of shortest-edit-distance algorithm (see the sketch after this list).
- I haven't tested the conversion scripts on edge cases like case sensitivity.
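For concreteness, here is one way such an alignment could work: a standard Levenshtein-style dynamic program that pairs two phone sequences and flags deletions. This is a sketch of the general idea, not code from this repository:

```python
# Sketch: align two phone sequences with edit distance so that deleted
# segments (paired with None) can be detected during back-conversion.

def align_phones(source, target):
    n, m = len(source), len(target)
    # cost[i][j] = edit distance between source[:i] and target[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (source[i - 1] != target[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back to recover the pairing.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (source[i - 1] != target[j - 1])):
            pairs.append((source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((source[i - 1], None))  # segment deleted
            i -= 1
        else:
            pairs.append((None, target[j - 1]))  # segment inserted
            j -= 1
    return pairs[::-1]

# The aligned /ɾ/ has no counterpart, i.e. it was deleted:
print(align_phones(["b", "ɑ", "ɾ", "i"], ["b", "ɑ", "i"]))
```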
It would be interesting to also use this workflow to examine how different high-resource language models handle the same data from different languages. Feel free to contact me if you have any ideas for collaboration or fixes.