We open-sourced our three simulated datasets: VCTK-Art, VCTK-Pro, and AISHELL3-Pro. The download link will be attached soon. Check out the audio samples.
Please refer to `environment.yml`. If you have Miniconda/Anaconda installed, you can create the environment directly:

```
conda env create -f environment.yml
```
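Once the environment is created and activated, a quick sanity check like the one below can confirm that PyTorch resolved correctly (a minimal sketch; the exact package set is defined by `environment.yml`, so adjust if your environment differs):

```python
# Minimal environment sanity check. Assumes PyTorch is in environment.yml,
# which is typical for VITS-based pipelines; adjust if your env differs.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```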
We open-sourced our inference code and checkpoints. Here are the steps to perform inference:

- Clone this repository.
- Download the VITS pretrained model; here we use `pretrained_ljs.pth`.
- Download the Stutter-Solver checkpoints, create a folder named `saved_models` under `stutter-solver`, and put all downloaded models into it (a sketch for spot-checking a checkpoint follows this list).
- We also provide testing datasets for quick inference; you can download them here.
- Build Monotonic Alignment Search:

  ```
  cd stutter-solver/monotonic_align
  python setup.py build_ext --inplace
  ```

- Run `stutter-solver/etc/inference.ipynb` to perform inference step by step.
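Before stepping through the notebook, it can help to confirm a downloaded checkpoint loads cleanly. A minimal sketch, assuming the checkpoints are standard PyTorch files (the file name below is a placeholder for whichever checkpoint you placed in `saved_models`):

```python
# Spot-check a downloaded checkpoint; "checkpoint.pth" is a placeholder name.
import torch

ckpt = torch.load("stutter-solver/saved_models/checkpoint.pth", map_location="cpu")
if isinstance(ckpt, dict):
    # Typical checkpoints store weights (and sometimes optimizer state) in a dict.
    print("top-level keys:", list(ckpt.keys())[:10])
else:
    print("loaded object of type:", type(ckpt))
```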
We use VITS as our TTS model.

- Clone this repository.
- Download the VITS pretrained models; here we need `pretrained_vctk.pth` for multi-speaker synthesis.
- Create a folder `dysfluency_simulation/path/to` and put the downloaded model into it.
- Build Monotonic Alignment Search (a smoke-test sketch follows this list):

  ```
  cd dysfluency_simulation/monotonic_align
  python setup.py build_ext --inplace
  ```

- Generate simulated speech:

  ```
  # Phoneme level
  python generate_phn.py
  # Word level
  python generate_word.py
  ```
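After the extension builds, you can smoke-test it from the `dysfluency_simulation` directory. The sketch below follows the `maximum_path` interface from the upstream VITS codebase; this repo's fork may differ, so treat the call and the shapes as assumptions:

```python
# Smoke-test the compiled monotonic_align extension (run from dysfluency_simulation).
# The maximum_path signature follows upstream VITS; verify against this repo.
import torch
import monotonic_align

b, t_spec, t_text = 1, 8, 5                # frames >= tokens for a valid monotone path
neg_cent = torch.randn(b, t_spec, t_text)  # alignment scores (illustrative values)
mask = torch.ones(b, t_spec, t_text)       # all positions marked valid
path = monotonic_align.maximum_path(neg_cent, mask)
print(path.shape)                          # binary monotonic alignment, same shape
```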
- Switch to the `vits-chinese` branch. We use vits_chinese as our TTS model for Chinese simulation; download the checkpoints according to its README and place them in the specified path.
- Build Monotonic Alignment Search:

  ```
  cd monotonic_align
  python setup.py build_ext --inplace
  ```

- Generate simulated speech (a quick audio spot-check sketch follows this list):

  ```
  python generate_dysfluency.py
  ```
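A quick way to sanity-check the generated audio (a minimal sketch; the output path and file name are hypothetical, and it assumes the `soundfile` package is available in your environment):

```python
# Spot-check one generated sample; the path below is a placeholder, adjust as needed.
import soundfile as sf

audio, sr = sf.read("generated/sample_0.wav")
print(f"{len(audio) / sr:.2f} s at {sr} Hz")
```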
If you find our paper helpful, please cite it as:
```bibtex
@misc{zhou2024stuttersolverendtoendmultilingualdysfluency,
      title={Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection},
      author={Xuanru Zhou and Cheol Jun Cho and Ayati Sharma and Brittany Morin and David Baquirin and Jet Vonk and Zoe Ezzes and Zachary Miller and Boon Lead Tee and Maria Luisa Gorno Tempini and Jiachen Lian and Gopala Anumanchipalli},
      year={2024},
      eprint={2409.09621},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2409.09621},
}
```