EEND (End-to-End Neural Diarization) is a neural-network-based speaker diarization method.
- BLSTM EEND (INTERSPEECH 2019)
- Self-attentive EEND (ASRU 2019)
The EEND extension for a variable number of speakers is also provided in this repository.
- Self-attentive EEND with encoder-decoder based attractors
- NVIDIA CUDA GPU
- CUDA Toolkit (8.0 <= version <= 10.1)
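
If you are unsure which CUDA Toolkit version is installed, a quick check (assuming `nvcc` is on your `PATH`) is:

```bash
# Reports the installed CUDA Toolkit version; it should be between 8.0 and 10.1
nvcc --version
```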
To build the tools (Kaldi and the Python environment), run:

```bash
cd tools
make
```

- This command builds kaldi at `tools/kaldi`
  - If you want to use a pre-built kaldi:
    ```bash
    cd tools
    make KALDI=<existing_kaldi_root>
    ```
    This option makes a symlink at `tools/kaldi`
- This command also extracts miniconda3 at `tools/miniconda3`, and creates a conda environment named 'eend'
- Then, it installs Chainer and cupy into the 'eend' environment
  - CUDA in `/usr/local/cuda/` is used by default
  - If you need to specify your CUDA path:
    ```bash
    cd tools
    make CUDA_PATH=/your/path/to/cuda-8.0
    ```
    This command installs cupy-cudaXX according to your CUDA version. See https://docs-cupy.chainer.org/en/stable/install.html#install-cupy
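
After the build finishes, a quick sanity check such as the one below can confirm that the 'eend' environment exists and that cupy can see your GPU. This is only a sketch assuming the default layout described above (miniconda3 extracted at `tools/miniconda3`).

```bash
# Activate the 'eend' environment created by the Makefile (assumed default location)
source tools/miniconda3/bin/activate eend

# Check that Chainer imports and cupy can see at least one CUDA device
python -c "import chainer, cupy; print(chainer.__version__, cupy.cuda.runtime.getDeviceCount(), 'GPU(s) visible')"
```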
- Modify `egs/mini_librispeech/v1/cmd.sh` according to your job scheduler: if you use your local machine, use "run.pl"; if you use Grid Engine, use "queue.pl"; if you use SLURM, use "slurm.pl". For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html. A hypothetical sketch is shown below.
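
As an illustration only, a Kaldi-style cmd.sh typically exports backend commands like the ones below. The variable names here are placeholders and may differ from the ones this recipe actually reads, so edit the existing file rather than pasting this in.

```bash
# Hypothetical cmd.sh sketch -- pick the backend matching your environment.
# Local machine:
export train_cmd="run.pl"
export infer_cmd="run.pl"
# Grid Engine (with an example memory request):
# export train_cmd="queue.pl --mem 4G"
# SLURM:
# export train_cmd="slurm.pl"
```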
```bash
cd egs/mini_librispeech/v1
./run_prepare_shared.sh
./run.sh
```
- If you use encoder-decoder based attractors [3], modify `run.sh` to use `config/eda/{train,infer}.yaml` (a sketch follows this list).
- See `RESULT.md` and compare with your result.
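
For illustration, switching to the EDA setup usually means pointing the recipe's config variables at the EDA YAML files. The variable names below are assumptions about how `run.sh` is organized, not an exact excerpt:

```bash
# Hypothetical lines in run.sh after switching to the EDA configs
train_config=config/eda/train.yaml   # assumed variable name
infer_config=config/eda/infer.yaml   # assumed variable name
```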
- Modify `egs/callhome/v1/cmd.sh` according to your job scheduler: if you use your local machine, use "run.pl"; if you use Grid Engine, use "queue.pl"; if you use SLURM, use "slurm.pl". For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html.
- Modify `egs/callhome/v1/run_prepare_shared.sh` according to the storage paths of your corpora (a sketch follows below).
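
The block below is purely illustrative; the actual variable names and the list of required corpora are defined inside `run_prepare_shared.sh`, so adapt the paths there rather than copying this verbatim.

```bash
# Hypothetical corpus-path settings -- variable names are placeholders.
callhome_dir=/path/to/CALLHOME        # assumed: where the CALLHOME corpus is stored
train_corpora_root=/path/to/corpora   # assumed: root of the corpora used for training/simulation
```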
```bash
cd egs/callhome/v1
./run_prepare_shared.sh
# If you want to conduct 1-4 speaker experiments, also run the command below.
# You also have to set the paths to your corpora properly.
./run_prepare_shared_eda.sh
```
```bash
# Self-attention-based model using 2-speaker mixtures
./run.sh
# BLSTM-based model using 2-speaker mixtures
local/run_blstm.sh
# Self-attention-based model with encoder-decoder based attractors using 1-4-speaker mixtures
./run_eda.sh
```
[1] Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe, "End-to-End Neural Speaker Diarization with Permutation-free Objectives," Proc. INTERSPEECH, pp. 4300-4304, 2019
[2] Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe, "End-to-End Neural Speaker Diarization with Self-attention," Proc. ASRU, pp. 296-303, 2019
[3] Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu, "End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors," Proc. INTERSPEECH, 2020
@inproceedings{Fujita2019Interspeech,
author={Yusuke Fujita and Naoyuki Kanda and Shota Horiguchi and Kenji Nagamatsu and Shinji Watanabe},
title={{End-to-End Neural Speaker Diarization with Permutation-free Objectives}},
booktitle={Interspeech},
pages={4300--4304},
year=2019
}