[Enhancement] New fully open source TTS with steerable voice characteristics #21

phirsch · 2024-04-16T07:46:32Z

Just wanted to bring this new TTS library+model to your attention which allows voice characteristics to be steered via a separate prompt:

https://github.com/huggingface/parler-tts (impressive demos on the HF space linked there).

Afterthought: Wondering whether an LLM might be able to derive such prompts from a pure text transcript...

FlorianEagox · 2024-04-16T07:50:08Z

Ooh thanx so much for sharing this with me! I will look into it and consider integrating it if it's a good fit!

phirsch · 2024-04-16T23:55:20Z

FYI: mkiol/dsnote/issues/122 might be relevant and unfortunately limit the usefulness of this model until huggingface/parler-tts/issues/11 is fixed/implemented.

Feel free to close the issue if you prefer.

FlorianEagox · 2024-04-17T14:40:55Z

Thanks again! I'll leave it open to remember to check out this project from time to time. <3

MethanJess · 2024-05-15T15:09:35Z

@FlorianEagox there are also other really cool TTS models you could implement if you ever get the chance:

MetaVoice: a very realistic and emotional TTS that can also clone a voice with one shot or finetuning, but it requires at least 12 GB of VRAM.
MeloTTS: the results are kinda realistic and emotional, and the audio quality is really nice. It is also very lightweight, so it can generate very long sentences in less than a second, and it has finetuning support.
OpenVoice V2: pretty much just MeloTTS but with one-shot voice cloning support (it sounds worse than Melo in my opinion). Here's a demo: https://huggingface.co/spaces/myshell-ai/OpenVoiceV2

phirsch · 2024-10-21T07:20:28Z

And there is another new steerable open source model which looks promising (and even seems to support translation internally, but only EN/CN for now):

https://github.com/SWivid/F5-TTS

MethanJess · 2024-10-27T01:52:39Z

Honestly, I really loved the new GPT-SoVITS V2; it also has really fast generation.
