Deepaudio-tts is a framework for training neural-network-based Text-to-Speech (TTS) models. It includes, or will include, popular neural network architectures for TTS and vocoder models.
To make features such as mixed-precision training, multi-node training, and TPU training easy to use, the framework is built on PyTorch Lightning and Hydra. It is still in development.
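To illustrate how these two pieces typically fit together, here is a minimal, self-contained sketch of the Lightning-plus-Hydra pattern. It is not this project's actual entry point: `ToyModule`, the config keys, and the dummy data are hypothetical placeholders.

```python
import hydra
import torch
import pytorch_lightning as pl
from omegaconf import DictConfig
from torch.utils.data import DataLoader, TensorDataset


class ToyModule(pl.LightningModule):
    """Hypothetical stand-in for an acoustic-model LightningModule."""

    def __init__(self, lr: float = 1e-3):
        super().__init__()
        self.net = torch.nn.Linear(80, 80)  # e.g. 80-dim mel frames
        self.lr = lr

    def training_step(self, batch, batch_idx):
        (x,) = batch
        # Toy reconstruction loss, just to make the example complete.
        return torch.nn.functional.mse_loss(self.net(x), x)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)


@hydra.main(config_path=None, version_base=None)
def main(cfg: DictConfig) -> None:
    # Hydra parses command-line overrides (e.g. `+lr=1e-2`) into `cfg`;
    # Lightning's Trainer handles devices, precision, multi-node, etc.
    model = ToyModule(lr=cfg.get("lr", 1e-3))
    data = DataLoader(TensorDataset(torch.randn(64, 80)), batch_size=8)
    trainer = pl.Trainer(max_epochs=1, accelerator="auto", logger=False)
    trainer.fit(model, data)


if __name__ == "__main__":
    main()
```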
- Preprocess your data. (Scripts coming soon; for now you can follow the PaddleSpeech tutorial for this step.)
- Train the model. Choose an experiment from deepaudio/tts/cli/configs/experiment, then train with the following commands:
$ export PYTHONPATH="${PYTHONPATH}:/dir/of/this/project/"
$ python -m deepaudio.tts.cli.train experiment=tacotron2 datamodule.train_metadata=/your/path/to/train_metadata datamodule.dev_metadata=/your/path/to/dev_metadata
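Because configuration is handled by Hydra, any other value in the config tree can be overridden from the command line in the same `key=value` style. For example, to switch experiments (assuming a fastspeech2 config exists under deepaudio/tts/cli/configs/experiment):

$ python -m deepaudio.tts.cli.train experiment=fastspeech2 datamodule.train_metadata=/your/path/to/train_metadata datamodule.dev_metadata=/your/path/to/dev_metadata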
- Tacotron2
- FastSpeech2
- Transformer TTS
- Parallel WaveGAN
- HiFiGAN
- VITS
- Remove redundant code.
- Make deepaudio.tts.models cleaner.
- Other models.
- Pretrained models.
- ONNX export
- TorchScript (jit) export (a generic sketch of both exports follows this list)
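For the two export items above, the standard PyTorch mechanisms look roughly like this. This is a generic sketch, not this project's API: `DummyModel` and the tensor shapes are placeholders for a trained acoustic or vocoder model.

```python
import torch
import torch.nn as nn


class DummyModel(nn.Module):
    """Hypothetical stand-in for a trained model."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(80, 80)

    def forward(self, x):
        return self.proj(x)


model = DummyModel().eval()
example = torch.randn(1, 100, 80)  # e.g. 100 frames of 80-dim mel features

# TorchScript ("jit") export: trace the model with an example input.
scripted = torch.jit.trace(model, example)
scripted.save("model.pt")

# ONNX export, with the frame axis marked as dynamic.
torch.onnx.export(
    model,
    example,
    "model.onnx",
    input_names=["mels"],
    output_names=["out"],
    dynamic_axes={"mels": {1: "frames"}},
)
```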
This is a personal project, so I don't have enough GPU resources to run many experiments. I appreciate any kind of feedback or contribution. Please feel free to open a pull request for small items such as bug fixes or experiment results. If you have any questions, please open an issue.
A lot of the code here is borrowed from ESPnet and PaddleSpeech.