Feature Request: Automatic Trimming #2
Comments
Silence removal and better speech awareness are definitely things I want to add. I am also hoping to add an option to detect and adjust the pitch of each segment to maintain consistency (like Adobe Voco). The best results usually come from aligning to another speech sample, as this ensures decent timing; Adobe Voco accomplishes this by aligning to a speech synthesizer. The transcript-only mode does not align to a speech synthesizer by default and instead chooses the longest phones (this is why di-ff-er-in-tuh is pieced together slowly). This can be changed with the "Choose Method" dropdown, and an alignment file can be specified in the "Destination" section. At the moment the only aligners that work on the website are Gentle and WebMAUS (by default it sends requests to Gentle, but it's easier to use Gentle directly so there is no need for a proxy). The JSON file from Gentle can be uploaded to the website in the "Destination" section, which often gives better results as it has decent timing information. I'm really surprised you managed to work out these problems already! I hope to improve this a lot more, so thank you for the feedback!
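To make the "longest phones" behaviour concrete, here is a minimal sketch, not the project's actual code. It assumes a Gentle-style alignment JSON in which each successfully aligned word has a `start` time and a list of `phones` with `phone` and `duration` fields; the function names `phone_segments` and `longest_per_phone` are hypothetical.

```python
import json


def phone_segments(gentle_json):
    """Turn Gentle-style alignment JSON into (label, start, end) tuples.

    Assumed layout: {"words": [{"case": "success", "start": ...,
    "phones": [{"phone": "dh_B", "duration": ...}, ...]}, ...]}
    """
    data = json.loads(gentle_json)
    segments = []
    for word in data.get("words", []):
        # Skip words the aligner could not place in the audio.
        if word.get("case") != "success":
            continue
        t = word["start"]
        for ph in word.get("phones", []):
            # Drop the position suffix ("dh_B" -> "dh").
            label = ph["phone"].split("_")[0]
            segments.append((label, t, t + ph["duration"]))
            t += ph["duration"]
    return segments


def longest_per_phone(segments):
    """For each phone label, keep the (start, end) of its longest occurrence."""
    best = {}
    for label, start, end in segments:
        if label not in best or (end - start) > (best[label][1] - best[label][0]):
            best[label] = (start, end)
    return best
```

Picking the longest occurrence of every phone is what makes a word assembled this way sound slow; with real timing from an alignment file, each phone's own duration can be used instead.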
If you're interested, I am currently working on an audio editor that is meant to be an online replacement for Adobe Audition. It's still very buggy with lots of missing features, and there is currently no way to align from this page, but eventually the old website will be replaced with this editor. I hope to add a bunch of features such as silence removal, crossfading, stretching, and pitch shifting, as well as options to easily swap out words and phones with alternatives.
I have added my todo list to the issues section, as it was previously in a text file.
Some voice samples have a long tail or just plain silence after the word. Detecting an audio level below a threshold and marking it as a tail would be useful: the tail could be cut off when the sample is used in the middle of a word or sentence, but preserved when it's at the end of a sentence.
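A rough sketch of that tail-detection idea, assuming samples are floats in the range -1.0 to 1.0. The names `find_tail_start` and `trim_tail` and the default `threshold` and `min_tail` values are made up for illustration; a real implementation would more likely measure a smoothed level (for example RMS over short windows) than individual samples.

```python
def find_tail_start(samples, threshold=0.02, min_tail=3):
    """Return the index where the trailing quiet region begins.

    Returns len(samples) when there is no tail of at least min_tail samples.
    """
    end = len(samples)
    # Walk backwards while the signal stays below the threshold.
    while end > 0 and abs(samples[end - 1]) < threshold:
        end -= 1
    # Ignore very short quiet runs so normal decay is not clipped.
    if len(samples) - end < min_tail:
        return len(samples)
    return end


def trim_tail(samples, at_sentence_end, threshold=0.02, min_tail=3):
    """Cut the quiet tail unless the sample closes a sentence."""
    if at_sentence_end:
        return list(samples)
    return list(samples[:find_tail_start(samples, threshold, min_tail)])
```

For example, `trim_tail(samples, at_sentence_end=False)` drops the quiet run at the end, while passing `at_sentence_end=True` returns the sample unchanged.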
Also, there are some long vowel sounds that could be automatically chopped down just fine with the cross-fading in Audition. I think a good way to handle this would be to align all the parts of words to a user-specified BPM, since people tend to speak with a rhythm, and cut them to divisions of a beat. You'd likely need a way to index the lengths of all the different parts of words so that di-ff-er-in-tuh isn't pieced together slowly, though, so maybe it's not a great idea. FL Studio's Speech Synthesizer, based on DEC Talk or KlattTalk, has a BPM option, and it sounds pretty natural given the quality of speech synthesis.
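The BPM idea could be sketched roughly like this, assuming segment durations are in seconds and that each segment is snapped to the nearest multiple of a beat subdivision. The function names and the default of four divisions per beat are assumptions for illustration only.

```python
def beat_grid_step(bpm, divisions_per_beat=4):
    """Length in seconds of one subdivision of a beat."""
    return 60.0 / bpm / divisions_per_beat


def quantize_duration(duration, bpm, divisions_per_beat=4):
    """Snap a segment duration to the nearest grid step (at least one step)."""
    step = beat_grid_step(bpm, divisions_per_beat)
    steps = max(1, round(duration / step))
    return steps * step
```

At 120 BPM with four divisions per beat, the grid step is 0.125 s, so a 0.30 s vowel would be cut to 0.25 s and a 0.05 s consonant would be given one full step of 0.125 s.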