16 kHz checkpoint #5

Open
yxlu-0102 opened this issue Nov 8, 2023 · 6 comments

@yxlu-0102

Excuse me, what bandwidth range was your 16 kHz model trained on, and can it extend waveforms from 2 kHz, 4 kHz, and 8 kHz to 16 kHz?

@yoyolicoris
Member

Hi @yxlu-0102 ,
the checkpoints I gave are for 48 kHz. We downsampled the output to 16 kHz for evaluation.
The training settings are the same as in the NU-Wave 2 paper.

@yoyolicoris
Member

> Excuse me, what bandwidth range was your 16 kHz model trained on, and can it extend waveforms from 2 kHz, 4 kHz, and 8 kHz to 16 kHz?

Sorry, I misunderstood your question (I was thinking about the checkpoints I gave in #4 😅).
The 16 kHz UDM checkpoint in the repository was trained on full bandwidth. Yes, it can upsample speech from any sample rate below 16 kHz to 16 kHz. However, we found that it performs poorly for input rates below 8 kHz.
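
To make the bandwidth question concrete: a signal sampled at a given rate can only contain frequencies up to half that rate (the Nyquist limit), so upsampling to 16 kHz means generating everything between the input's Nyquist frequency and 8 kHz. The sketch below is only that general arithmetic; the helper name `band_to_reconstruct` is hypothetical and not part of the repository, and it does not claim to explain why quality drops below 8 kHz.

```python
def band_to_reconstruct(input_sr_hz: int, target_sr_hz: int) -> tuple[float, float]:
    """Return the (low, high) band in Hz that is absent from the input signal.

    Hypothetical helper for illustration; not part of the repository.
    """
    if not 0 < input_sr_hz < target_sr_hz:
        raise ValueError("input rate must be positive and below the target rate")
    # Content above input_sr/2 is missing; the output can hold up to target_sr/2.
    return input_sr_hz / 2, target_sr_hz / 2


for sr in (2000, 4000, 8000):
    low, high = band_to_reconstruct(sr, 16000)
    share = (high - low) / high  # fraction of the output band that must be generated
    print(f"{sr} Hz input: generate {low:.0f}-{high:.0f} Hz ({share:.0%} of the output band)")
```

The lower the input rate, the larger the share of the output band the model has to fill in.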

@aanugraha

aanugraha commented Sep 27, 2024

If I upsample from 4 kHz to 16 kHz using the provided 16-kHz UDM model, is the following command correct? I adapted the command from here.

python -W ignore vctk_infer.py ckpt/vctk_16k_udm/saved/training_checkpoint_500000.pt ckpt/vctk_16k_udm/.hydra/config.yaml VCTK-Corpus-0.92/wav48_silence_trimmed_wav/s5 --rate 4 -T 50 --infer-type manifold --downsample-type stft --lr 0.67 --out out_vctk_infer

Strangely, I found 12-kHz input signals and 48-kHz output signals in the output directory (out_vctk_infer). Do I need to downsample the VCTK Corpus to 16 kHz beforehand?

In addition, does this mean that the 16-kHz UDM model can output 48 kHz? If so, what are the differences between the 16-kHz UDM model and the 48-kHz UDM model?

Thank you!

@yoyolicoris
Member

> Strangely, I found 12-kHz input signals and 48-kHz output signals in the output directory (out_vctk_infer). Do I need to downsample the VCTK Corpus to 16 kHz beforehand?

Exactly. That's what we did for the evaluation.
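
The rates observed above are consistent with `--rate` acting as a downsampling factor applied to whatever sample rate the input files have. That reading is an assumption inferred from this thread (48 kHz files with `--rate 4` gave 12 kHz inputs), not something confirmed against the script; the helper `low_res_rate_hz` is hypothetical. A minimal sketch of the arithmetic:

```python
def low_res_rate_hz(source_sr_hz: int, rate_factor: int) -> float:
    """Sample rate of the degraded input, ASSUMING --rate divides the file's rate.

    Hypothetical helper; the interpretation of --rate is inferred from the thread.
    """
    if source_sr_hz <= 0 or rate_factor <= 0:
        raise ValueError("sample rate and factor must be positive")
    return source_sr_hz / rate_factor


# Raw VCTK files at 48 kHz with --rate 4: the input becomes 12 kHz, as observed.
print(low_res_rate_hz(48000, 4))
# Files downsampled to 16 kHz first, same --rate 4: the input becomes 4 kHz.
print(low_res_rate_hz(16000, 4))
```

Under that assumption, the 4 kHz to 16 kHz experiment needs the corpus at 16 kHz before running the command.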

@aanugraha

Thank you for the answer!

Could you comment on the following point?

> ..., what are the differences between the 16-kHz UDM model and the 48-kHz UDM model?

This would also clarify a statement in a previous comment.

> The 16 kHz UDM checkpoint in the repository was trained on full bandwidth.

@yoyolicoris
Member

The kHz figure is simply the sample rate of the audio the model was trained on, so the evaluation data must be at the same sample rate.
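
One possible way to bring audio to the model's sample rate is polyphase resampling with SciPy, sketched below on a synthetic tone. This is an illustration only and not necessarily the preprocessing the authors used; the function name `resample_to` is hypothetical, and reading and writing the actual VCTK files is left out.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_to(audio: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample a 1-D signal with polyphase filtering (includes anti-aliasing).

    Hypothetical helper; not the repository's own preprocessing code.
    """
    g = gcd(orig_sr, target_sr)
    # Reduce the ratio, e.g. 48000 -> 16000 becomes up=1, down=3.
    return resample_poly(audio, target_sr // g, orig_sr // g)


# One second of a 440 Hz tone at 48 kHz stands in for a VCTK utterance.
t = np.arange(48000) / 48000
tone_48k = np.sin(2 * np.pi * 440 * t)

tone_16k = resample_to(tone_48k, 48000, 16000)
print(tone_16k.shape)
```

Each resampled file would then be saved at 16 kHz and passed to the inference script in place of the 48 kHz originals.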
