[Enhancement] New fully open source TTS with steerable voice characteristics #21

Open
phirsch opened this issue Apr 16, 2024 · 6 comments

Comments

@phirsch

phirsch commented Apr 16, 2024

Just wanted to bring this new TTS library and model to your attention. It allows voice characteristics to be steered via a separate prompt:

https://github.com/huggingface/parler-tts (impressive demos on the HF space linked there).
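
For anyone skimming, here is a rough sketch of what "steered via a separate prompt" looks like in practice. It is modeled on the usage example in the parler-tts README as of this writing, so the checkpoint name and the `generate` arguments are assumptions that may have changed since. The `describe_voice` helper and its parameters are made up purely for illustration and are not part of the library.

```python
def describe_voice(speaker="A female speaker", pitch="slightly low-pitched",
                   pace="moderate", quality="very clear"):
    """Compose a plain-English voice description (hypothetical helper)."""
    return (f"{speaker} with a {pitch} voice speaks at a {pace} pace, "
            f"with {quality} audio quality.")


def synthesize(text, description, out_path="out.wav",
               checkpoint="parler-tts/parler_tts_mini_v0.1"):
    """Speak `text` in the voice described by `description` (sketch only)."""
    # Heavy imports are deferred so describe_voice works without them installed.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(checkpoint).to(device)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # The description steers the voice; the prompt is the text that gets spoken.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids,
                                prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

Calling `synthesize("Hey, how are you doing today?", describe_voice(pace="fast"))` would download the checkpoint on first use and write a WAV file. Only the description changes between voices, while the spoken text stays the same.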

Afterthought: Wondering whether an LLM might be able to derive such prompts from a pure text transcript...

@FlorianEagox
Owner

Ooh thanx so much for sharing this with me! I will look into it and consider integrating it if it's a good fit!

@phirsch
Author

phirsch commented Apr 16, 2024

FYI: mkiol/dsnote/issues/122 might be relevant and unfortunately limit the usefulness of this model until huggingface/parler-tts/issues/11 is fixed/implemented.

Feel free to close the issue if you prefer.

@FlorianEagox
Owner

Thanks again! I'll leave it open to remember to check out this project from time to time. <3

@MethanJess

@FlorianEagox there are also some other really cool TTS models you could implement if you ever get the chance:

  • MetaVoice: a very realistic and emotional TTS that can also clone a voice, either one-shot or via fine-tuning, but it requires at least 12 GB of VRAM
  • MeloTTS: the results are kinda realistic and emotional, and the audio quality is really nice. It is also very lightweight, so it can generate very long sentences in less than a second, and it supports fine-tuning
  • OpenVoice V2: pretty much just MeloTTS but with one-shot voice cloning support (it sounds worse than MeloTTS in my opinion). Here's a demo: https://huggingface.co/spaces/myshell-ai/OpenVoiceV2

@phirsch
Author

phirsch commented Oct 21, 2024

And there is another new steerable open source model which looks promising (and even seems to support translation internally, but only EN/CN for now):

https://github.com/SWivid/F5-TTS

@MethanJess

Honestly, I really loved the new GPT-SoVITS V2; it also generates really fast.

3 participants