16 kHz checkpoint #5

yxlu-0102 · 2023-11-08T06:34:24Z

Excuse me, what bandwidth range was your 16 kHz model trained on, and can it extend waveforms from 2 kHz, 4 kHz, and 8 kHz to 16 kHz?

yoyolicoris · 2023-11-08T07:40:27Z

Hi @yxlu-0102,
The checkpoints I gave are for 48 kHz. We downsampled the output to 16 kHz for evaluation.
The training settings are the same as in the NU-Wave 2 paper.

yoyolicoris · 2023-12-26T13:23:20Z

> Excuse me, what bandwidth range was your 16 kHz model trained on, and can it extend waveforms from 2 kHz, 4 kHz, and 8 kHz to 16 kHz?

Sorry, I misunderstood your question (I was thinking about the checkpoints I gave in #4 😅).
The 16 kHz UDM checkpoint in the repository was trained on full bandwidth. Yes, it can upsample speech from any sample rate below 16 kHz to 16 kHz. However, we found it performs poorly for input rates below 8 kHz.

aanugraha · 2024-09-27T06:14:02Z

If I upsample from 4 kHz to 16 kHz using the provided 16-kHz UDM model, is the following command correct? I adapted the command from here.

python -W ignore vctk_infer.py ckpt/vctk_16k_udm/saved/training_checkpoint_500000.pt ckpt/vctk_16k_udm/.hydra/config.yaml VCTK-Corpus-0.92/wav48_silence_trimmed_wav/s5 --rate 4 -T 50 --infer-type manifold --downsample-type stft --lr 0.67 --out out_vctk_infer

Strangely, I found 12-kHz input signals and 48-kHz output signals in the output directory (out_vctk_infer). Do I need to downsample the VCTK Corpus to 16 kHz beforehand?

In addition, does it mean that the 16-kHz UDM model can output 48 kHz? If so, what are the differences between the 16-kHz UDM model and the 48-kHz UDM model?

Thank you!

yoyolicoris · 2024-09-27T10:12:22Z

> Strangely, I found 12-kHz input signals and 48-kHz output signals in the output directory (out_vctk_infer). Do I need to downsample the VCTK Corpus to 16 kHz beforehand?

Exactly. That's what we did for the evaluation.
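
The 12 kHz inputs follow from simple arithmetic if `--rate` is read as an integer downsampling factor applied to the files' own sample rate. That reading is an assumption: it is consistent with the output reported above, but it is not confirmed from the script's source. A small sketch, where `low_res_rate` is a hypothetical helper and not part of the repository:

```python
def low_res_rate(source_rate_hz: int, factor: int) -> float:
    """Sample rate of the degraded input, assuming the source is decimated by `factor`."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return source_rate_hz / factor


# 48 kHz VCTK files with --rate 4: the case reported above (12 kHz inputs).
print(low_res_rate(48_000, 4))

# Files resampled to 16 kHz first, then --rate 4: the intended 4 kHz -> 16 kHz setup.
print(low_res_rate(16_000, 4))
```

Under this reading, the factor stays at 4 and only the sample rate of the files passed to the script changes.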

aanugraha · 2024-09-28T00:01:28Z

Thank you for the answer!

Could you comment on the following point?

> ..., what are the differences between the 16-kHz UDM model and the 48-kHz UDM model?

This would also clarify a statement in a previous comment.

> The 16 kHz UDM checkpoint in the repository was trained on full bandwidth.

yoyolicoris · 2024-09-28T02:01:27Z

The kHz figure is just the sample rate of the data the model was trained on, so the evaluation data has to be at the same rate.
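
Since the evaluation data has to be at the model's training rate, checking the files before running inference can catch the mismatch described earlier in the thread. Below is a minimal sketch using only Python's standard `wave` module. `wav_sample_rate` and `matches_model_rate` are hypothetical helpers, not functions from the repository, and the sketch assumes the corpus has already been converted to PCM WAV files.

```python
import wave


def wav_sample_rate(path: str) -> int:
    """Read the sample rate (in Hz) from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path: str, model_rate_hz: int = 16_000) -> bool:
    """Return True if the file's sample rate equals the rate the model was trained on."""
    return wav_sample_rate(path) == model_rate_hz
```

A file for which `matches_model_rate` returns `False` would need resampling to the model's rate before it is passed to the inference script.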
