This repository accompanies the paper "FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning", INTERSPEECH 2022.
- Download the model checkpoint to perform knowledge distillation (e.g. HuBERT Base):
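A minimal download sketch, assuming the HuBERT Base teacher checkpoint from the public fairseq release (the URL below is an assumption; verify it against the fairseq HuBERT page):

```bash
# Fetch the HuBERT Base teacher checkpoint (URL assumed from the fairseq release)
wget -P ./checkpoints/ https://dl.fbaipublicfiles.com/hubert/hubert_base_ls960.pt
```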
- Download the LibriSpeech dataset.
- Modify the configuration file in `/data/conf/`. The configuration file `fithubert.yaml` contains all the settings for reproducing FitHuBERT. Set the path to the teacher model checkpoint at `teacher_model`, and the root path to the LibriSpeech dataset at `libri_root`.
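A minimal excerpt showing the two entries to edit in `fithubert.yaml` (the paths below are illustrative placeholders, not shipped defaults):

```yaml
# data/conf/fithubert.yaml (excerpt) -- paths are placeholders
teacher_model: ./checkpoints/hubert_base_ls960.pt  # teacher checkpoint used for distillation
libri_root: /path/to/LibriSpeech                   # root directory of the LibriSpeech corpus
```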
Then, run the following command:
```bash
python train.py --config ./data/conf/fithubert.yaml
```
After training, the model checkpoints and the corresponding configuration file will be created at `/results/pretrain/`.
- Download and install the S3PRL toolkit.
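A typical editable install, assuming S3PRL is cloned next to this repository (standard S3PRL setup; adjust to your environment):

```bash
# Clone S3PRL and install it in editable mode
git clone https://github.com/s3prl/s3prl.git
cd s3prl
pip install -e .
```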
- Copy the `fithubert` folder into `s3prl/upstream/`, for example as sketched below.
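A copy command sketch, assuming the `fithubert` folder sits at this repository's root and the S3PRL checkout shares the same parent directory (adjust the destination to wherever your S3PRL installation keeps its `upstream/` directory):

```bash
# Register FitHuBERT as an S3PRL upstream by copying the folder into upstream/
cp -r ./fithubert ../s3prl/upstream/
```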
- Run the following command to use the FitHuBERT model for automatic speech recognition (ASR):

```bash
python run_downstream.py -m train -n FitHuBERT-ASR -u fithubert -d asr -s last_hidden_state -k <path to .ckpt file> -g <path to .yaml file>
```
Refer to the SUPERB docs for more information on usage details and data preparation.
For our checkpoints, check the links below!
- FitHuBERT-100h
- FitHuBERT-960h
- FitW2V2-960h
To cite our paper:
```bibtex
@article{lee2022fithubert,
  title={FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning},
  author={Lee, Yeonghyeon and Jang, Kangwook and Goo, Jahyun and Jung, Youngmoon and Kim, Hoirin},
  journal={arXiv preprint arXiv:2207.00555},
  year={2022}
}
```