Inquiry Regarding Dataset and Training Details for Ganji's Voice in the Pertts-Streamlit Project #9

AliYazdanifar · 2024-10-19T07:46:30Z


Hello,
First and foremost, I would like to express my sincere appreciation to the Datacula team and dear Sadegh for all their hard work.
While reviewing your project, a few questions came up. I hope the answers will help me contribute more effectively to the development of the pertts-streamlit project on Git, and also make progress on my own project.

Based on my test of https://tts.datacula.com/, it appears that Mr. Ganji’s voice sample demonstrates higher quality compared to Amir’s, with most sentences being delivered more clearly. Considering this, I reviewed the dataset you have made available for the project and noticed that while Amir’s dataset was included, Ganji’s dataset was not present.

From my examination of Amir’s dataset, it seems to contain around 10 hours of audio, and the texts appear to have been corrected using a tool (the texts differed from the transcriptions). This raised a few questions, and I would greatly appreciate it if you could clarify the following:

How many hours of data were used for Ganji’s dataset, and how extensive is the data?
To what step was the dataset trained to achieve this level of quality?
For generating the audiobook dataset, was the same method used as for Amir’s dataset?
Would it be possible to release Ganji’s dataset for further review?
Were any different parameters used in the VITS training for Ganji’s dataset?
Was the training conducted using Coqui’s method, or were other methods, such as ESPnet, employed?

Thank you very much in advance for your time and assistance. I look forward to your response.

SadeghKrmi · 2024-10-20T12:15:43Z

Maintainer

Hello and greetings to you, and thank you for your positive view of this project.
In answer to your questions:

How many hours of data were used for Ganji’s dataset, and how extensive is the data?
Ganji's dataset consisted of about 8 hours of training data.
To what step was the dataset trained to achieve this level of quality?

Because the Ganji model was trained on top of the previous "Amir" model, it reached the result I wanted after about 2000 epochs. In practice, you should check the error on the training data yourself.

For generating the audiobook dataset, was the same method used as for Amir’s dataset?
No. To produce Ganji's dataset, several people helped as part of a speech-to-text project (I swear I grew old over this dataset).
Would it be possible to release Ganji’s dataset for further review?

Unfortunately not. I was not the only contributor to this dataset, and based on discussions with my teammates, I do not have permission to release it.

Were any different parameters used in the VITS training for Ganji’s dataset?
No, the parameters used were the same as those of the previous model.
Was the training conducted using Coqui’s method, or were other methods, such as ESPnet, employed?

I did the training with the VITS method from piper. I tried other methods as well, such as coqui, but at least in my case I did not get good results (though coqui does offer several more methods).
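
For anyone who wants to compare their own data against the sizes mentioned in this thread (around 10 hours for Amir, about 8 hours for Ganji), here is a minimal sketch for measuring a dataset's total audio length. It assumes the clips are uncompressed WAV files somewhere under one directory; the function name `total_hours` and that layout are illustrative assumptions, not something taken from the project.

```python
import wave
from pathlib import Path


def total_hours(dataset_dir):
    """Sum the duration of every .wav file under dataset_dir, in hours.

    Assumption: clips are PCM WAV files readable by the stdlib wave module.
    """
    total_seconds = 0.0
    for wav_path in sorted(Path(dataset_dir).rglob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav_file:
            # duration of one clip = number of frames / frames per second
            total_seconds += wav_file.getnframes() / wav_file.getframerate()
    return total_seconds / 3600.0
```

Calling `total_hours("path/to/wavs")` (a hypothetical path) returns the combined duration in hours. Other audio formats would need a different reader.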
