Apply Viterbi algorithm to predict voiced/unvoiced state of every frame based on confidence array #26
Conversation
Using an HMM model, similar to the pYIN algorithm.
More information about this method can be found in Section 2.2 of the pYIN paper: https://www.eecs.qmul.ac.uk/~simond/pub/2014/MauchDixon-PYIN-ICASSP2014.pdf
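To illustrate the idea (this is a minimal sketch, not the code in this PR): a two-state Viterbi decode over the per-frame confidence array, where the transition matrix strongly favors staying in the current state. The emission model and the `switch_prob` value are illustrative assumptions.

```python
# Minimal sketch of two-state (unvoiced/voiced) Viterbi smoothing of a
# confidence array. switch_prob and the emission model are illustrative
# assumptions, not values taken from this PR.
import numpy as np

def viterbi_voicing(confidence, switch_prob=0.01):
    """Return a 0/1 array (unvoiced/voiced) for each frame."""
    n = len(confidence)
    # Emission probabilities: voiced frames should have high confidence,
    # unvoiced frames low confidence.
    emission = np.stack([1.0 - confidence, confidence], axis=1)  # shape (n, 2)
    # Transition matrix favors staying in the current state, which
    # suppresses millisecond-scale flicker between the two states.
    transition = np.array([[1.0 - switch_prob, switch_prob],
                           [switch_prob, 1.0 - switch_prob]])
    log_emission = np.log(emission + 1e-12)
    log_transition = np.log(transition)

    # Standard Viterbi recursion in log space.
    delta = np.zeros((n, 2))
    backpointer = np.zeros((n, 2), dtype=int)
    delta[0] = np.log(0.5) + log_emission[0]
    for t in range(1, n):
        scores = delta[t - 1][:, None] + log_transition  # (prev_state, next_state)
        backpointer[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emission[t]

    # Backtrace the most likely state sequence.
    states = np.zeros(n, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(n - 2, -1, -1):
        states[t] = backpointer[t + 1, states[t + 1]]
    return states
```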
@sannawag @0b01 CREPE already supports Viterbi decoding. @sannawag could you perhaps elaborate a little more on why this feature is needed and how it differs from what's already supported? Thanks!
Thanks for your response! @justinsalamon @0b01 My goal is to use the output of the program to determine when the singer is active versus silent at the perceptual level. That should change at the level of seconds, not milliseconds. If I set a hard threshold based on confidence, I get quick alternation between the two states, which can be seen as the thick vertical lines in the plots below.
[plots and plotting code omitted from this capture]
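The original plotting code is not preserved here; the following is a hypothetical re-creation of the comparison under a hard threshold, assuming the standard `crepe.predict` outputs. The input filename and the 0.5 threshold are illustrative.

```python
# Hypothetical sketch of the hard-threshold comparison plot (the original
# code is not in this capture). 'singer.wav' and the 0.5 threshold are
# illustrative assumptions.
import crepe
import matplotlib.pyplot as plt
from scipy.io import wavfile

sr, audio = wavfile.read('singer.wav')
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)

# Hard threshold on confidence: tends to flicker between 0 and 1 at the
# millisecond level rather than changing at the level of seconds.
hard_voicing = (confidence > 0.5).astype(float)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(time, confidence, label='confidence')
ax1.plot(time, hard_voicing, label='hard threshold (0.5)')
ax1.legend()
ax2.plot(time, frequency * hard_voicing, label='frequency, unvoiced zeroed')
ax2.set_xlabel('time (s)')
ax2.legend()
plt.show()
```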
Thanks @sannawag, I'll have to give this a closer look, so it might take some time before I can give more feedback. As a general note, it's helpful to first post an issue to discuss the problem, solution, and implementation details before making a PR, so we can reach consensus on those things prior to implementation. It's our fault for not providing contribution guidelines (to be amended via #58). Could you please open an issue, explain what the problem is (as you have via these plots and code), describe your proposed solution, and cross-reference this PR? Thanks!
Thanks for the feedback, @justinsalamon! I've submitted an issue. |
This feature delimits regions of activation and silence (in monophonic recordings). I am submitting a pull request in case it would be useful for others as well, and I am very open to feedback.
The modification was added as a function in core: "predict_voicing". The function returns a sequence of 0s and 1s, indicating the voicing state predicted by a Gaussian HMM.
Some more details about the function and API modification: "predict_voicing" can be called independently after calling "predict", as described in the update to the documentation. It is also possible to set the "apply-voicing" flag when calling crepe from the command line. This will cause the program to call "predict_voicing", multiply the result by the "frequency" array (setting unvoiced frames to zero), and save the new array, "voiced_frequency", to disk.
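A sketch of how the proposed API would be used, based on the description above. The exact location and signature of "predict_voicing" in the PR may differ; `crepe.core.predict_voicing(confidence)` is an assumption made for illustration.

```python
# Sketch of the proposed usage, based on the PR description; the exact
# signature/location of predict_voicing may differ from the actual PR.
import crepe
from scipy.io import wavfile

sr, audio = wavfile.read('singer.wav')  # illustrative input file
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)

# Proposed addition: a 0/1 voicing decision per frame, predicted by a
# Gaussian HMM fitted to the confidence values.
is_voiced = crepe.core.predict_voicing(confidence)

# Zero out frames predicted as unvoiced, mirroring what the "apply-voicing"
# command-line flag is described to do before saving "voiced_frequency".
voiced_frequency = frequency * is_voiced
```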