Written by Joscha S. Rieber (Fraunhofer IAIS) in 2020
This project shows how to train a spoken language recognizer from scratch that distinguishes between German and English. Together with data from Mozilla Common Voice, these notebooks serve as a playground for building strong models. With the data, pre-processing and model used here, the accuracy is 93.8 %.
This repository is also described in more detail in my article published by Towards AI.
- On Linux:
- Download this repository from GitHub
- Call "bash run.sh"
- This script first checks whether the environment is ready. If it is not, it downloads Miniconda and creates the conda environment. Please note that "wget" must be installed for this to succeed.
- Now go through the notebooks in the right order and follow the given instructions.
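
Before calling the script, it can help to confirm that the command-line tools mentioned in the steps above are available. The following is only a minimal sketch and not part of this repository: the function name `missing_tools` and the default tool list are assumptions chosen for illustration, and `run.sh` performs its own environment check independently of it.

```python
import shutil


def missing_tools(tools=("bash", "wget")):
    """Return the names in `tools` that cannot be found on the PATH.

    Hypothetical helper for illustration; the default list reflects the
    tools named in the setup steps (bash to run the script, wget for the
    Miniconda download).
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

If the list comes back empty, the prerequisites named above are present. Anything else the script needs is handled by the script itself.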
A fast CPU is recommended for data augmentation and pre-processing. For model training, a suitable GPU is necessary. I have tested the scripts with an Nvidia P5000 and an Nvidia Tesla K80. The Mozilla Common Voice dataset is very large, so processing all of the data may take a long time.
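
As a rough way to see what hardware a machine offers before starting, the sketch below reports the number of logical CPU cores and whether the Nvidia driver utility `nvidia-smi` is on the PATH. It is an illustrative assumption rather than something the notebooks do: the function name `hardware_summary` is hypothetical, and finding `nvidia-smi` only suggests that an Nvidia driver is installed, not that the deep learning framework can actually use the GPU.

```python
import os
import shutil


def hardware_summary():
    """Collect a coarse overview of the local hardware.

    Hypothetical helper for illustration only. `cpu_cores` may be None
    if the platform cannot report it; `nvidia_smi_found` is merely a
    hint that an Nvidia driver is installed.
    """
    return {
        "cpu_cores": os.cpu_count(),
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    summary = hardware_summary()
    print("Logical CPU cores:", summary["cpu_cores"])
    print("nvidia-smi on PATH:", summary["nvidia_smi_found"])
```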
- Bartz et al.: Language Identification Using Deep Convolutional Recurrent Neural Networks
- Paul-Louis Pietz Prove: Spoken Language Recognition
- Sarthak et al.: Spoken Language Identification using ConvNets
- Szegedy et al.: Rethinking the Inception Architecture for Computer Vision
- Husein Zolkepli: Sound Augmentation Librosa