Feature Request: Automatic Trimming #2

Closed
torridgristle opened this issue Sep 6, 2019 · 3 comments

@torridgristle

Some voice samples have a long tail, or just plain silence, after the word. Detecting when the audio level drops below a threshold and marking that region as a tail seems useful: when the sample is used in the middle of a word or sentence the tail can be cut off, but when it's at the end of a sentence it's preserved.
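
A rough sketch of the threshold idea, not anything the project currently does. It assumes the sample is already decoded to a list of floats in [-1.0, 1.0]; the function names, the -40 dBFS threshold and the 10 ms window are all made-up defaults for illustration.

```python
import math

def find_tail_start(samples, sample_rate, threshold_db=-40.0, window_ms=10.0):
    """Return the index where the trailing quiet region begins.

    Walks backwards from the end in fixed windows and stops at the first
    window whose RMS level reaches the threshold. Returns len(samples)
    when there is no quiet tail, and 0 when the whole sample is quiet.
    """
    window = max(1, int(sample_rate * window_ms / 1000.0))
    threshold = 10.0 ** (threshold_db / 20.0)  # dBFS -> linear amplitude
    end = len(samples)
    while end > 0:
        start = max(0, end - window)
        chunk = samples[start:end]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms >= threshold:
            break  # this window is loud enough, so the tail starts at `end`
        end = start
    return end

def trim_tail(samples, sample_rate, keep_tail, **kwargs):
    """Cut the quiet tail unless the segment ends a sentence (keep_tail=True)."""
    if keep_tail:
        return list(samples)
    return list(samples[:find_tail_start(samples, sample_rate, **kwargs)])
```

The tail boundary is only accurate to one window, which is probably fine if a short crossfade is applied at the cut anyway.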

There are also some long vowel sounds that could be automatically chopped down just fine with the cross-fading in Audition. I think a good way to handle this would be to align all the parts of words to a user-specified BPM, since people tend to speak with a rhythm, and cut them to divisions of a beat. You'd likely need a way to index the lengths of all the different parts of words so that di-ff-er-in-tuh isn't pieced together slowly, though, so maybe it's not a great idea. FL Studio's Speech Synthesizer, based on DECtalk or KlattTalk, has a BPM option and it sounds pretty natural given the quality of speech synthesis.
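
To make the beat-division idea concrete, here is a minimal sketch of the arithmetic only. `quantize_duration` and `divisions_per_beat` are hypothetical names, and it only ever shortens a segment (it never stretches one), which is an assumption about how the cut would work.

```python
import math

def quantize_duration(duration_s, bpm, divisions_per_beat=4):
    """Cut a segment length down to a whole number of beat divisions.

    One beat lasts 60 / bpm seconds; a division is that split into
    `divisions_per_beat` equal steps. Segments shorter than one step
    are returned unchanged, since cutting them to zero makes no sense.
    """
    step = 60.0 / bpm / divisions_per_beat
    if duration_s < step:
        return duration_s
    # Small epsilon guards against float error just below a whole step.
    steps = math.floor(duration_s / step + 1e-9)
    return steps * step
```

At 120 BPM with four divisions per beat a step is 0.125 s, so a 0.3 s vowel would be cut to 0.25 s, while a 0.1 s consonant is left alone.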

@MysteryPancake MysteryPancake self-assigned this Sep 7, 2019
@MysteryPancake MysteryPancake added the enhancement New feature or request label Sep 7, 2019
@MysteryPancake
Owner

MysteryPancake commented Sep 7, 2019

Silence removal and better speech awareness are definitely things I want to add. I am also hoping to add an option to detect and adjust the pitch of each segment to maintain consistency (like Adobe Voco).

The best results usually come from aligning to another speech sample, as this ensures decent timing. Adobe Voco accomplishes this by aligning to a speech synthesizer. The transcript-only mode does not align to a speech synthesizer; instead, it chooses the longest phones by default (this is why di-ff-er-in-tuh is pieced together slowly). This can be changed with the "Choose Method" dropdown, and an alignment file can be specified in the "Destination" section.
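
As an illustration of why the choice method matters, here is a simplified sketch, not the website's actual code. The candidate dictionaries, `choose_longest` and `choose_closest` are hypothetical; the point is only that picking the longest recording of each phone ignores timing, while picking the one closest to a target duration (for example from an alignment file) does not.

```python
def choose_longest(candidates):
    """Pick the longest recording of a phone (no timing information used)."""
    return max(candidates, key=lambda c: c["duration"])

def choose_closest(candidates, target_duration):
    """Pick the recording whose length is nearest to a target duration."""
    return min(candidates, key=lambda c: abs(c["duration"] - target_duration))

# Three hypothetical recordings of the same phone, durations in seconds.
candidates = [
    {"id": "a", "duration": 0.05},
    {"id": "b", "duration": 0.30},
    {"id": "c", "duration": 0.12},
]
longest = choose_longest(candidates)        # recording "b"
closest = choose_closest(candidates, 0.10)  # recording "c"
```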

At the moment, the only aligners that work on the website are Gentle and WebMAUS (by default it sends requests to Gentle, but it's easier to use Gentle directly so that no proxy is needed). The JSON file from Gentle can be uploaded to the website in the "Destination" section, which often gives better results as it has decent timing information.
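
For anyone curious what timing information is in that file, here is a small sketch of reading per-phone timings from it. The field names (`words`, `case`, `start`, `phones`, `phone`, `duration`) are assumptions based on typical Gentle output and should be checked against a real file; `phone_timings` is a hypothetical helper, not part of the website.

```python
import json

def phone_timings(gentle_json_text):
    """Return (word, phone, start_s, end_s) tuples from Gentle-style JSON.

    Assumes each successfully aligned word has a "start" time and a list
    of phones with durations, so phone start times are accumulated from
    the word start. Words that were not aligned are skipped.
    """
    data = json.loads(gentle_json_text)
    timings = []
    for word in data.get("words", []):
        if word.get("case") != "success":
            continue  # e.g. words not found in the audio carry no timing
        t = word["start"]
        for p in word.get("phones", []):
            timings.append((word["word"], p["phone"], t, t + p["duration"]))
            t += p["duration"]
    return timings
```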

I'm really surprised you managed to work out these problems already! I hope to improve this a lot more so thank you for the feedback!

@MysteryPancake
Owner

MysteryPancake commented Sep 7, 2019

If you're interested, I am currently working on an audio editor which is supposed to act as an online replacement for Adobe Audition. It's still very buggy with lots of missing features and there is currently no way to align from this page, but eventually the old website will be replaced with this editor.

I hope to add a bunch of features such as silence removal, crossfading, stretching, pitching, but also options to easily swap out words and phones with alternatives.

@MysteryPancake
Owner

I have added my todo list to the issues section, as it was previously in a text file.
Since this issue is now on that list, I will close it for now while I work down the list.
