
Apply Viterbi algorithm to predict voiced/unvoiced state of every frame based on confidence array #26

Open

sannawag wants to merge 5 commits into master
Conversation

sannawag

This feature delimits regions of voicing and silence in monophonic recordings. I am submitting a pull request in case it is useful for others as well, and I am very open to feedback.

The modification adds a function to core, "predict_voicing", which returns a sequence of 0s and 1s giving the predicted voicing state of each frame according to a Gaussian HMM.

Some more details about the function and the API modification: "predict_voicing" can be called independently after calling "predict", as described in the update to the documentation. It is also possible to set the "apply-voicing" flag when calling crepe from the command line. This causes the program to call "predict_voicing", multiply the result by the "frequency" array (setting unvoiced frames to zero), and save the resulting array, "voiced_frequency", to disk.
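
For illustration, here is a minimal usage sketch. The exact signature of "predict_voicing" shown below (taking the confidence array) is an assumption for this sketch, as is the input path:

import crepe
from scipy.io import wavfile

sr, audio = wavfile.read('vocals.wav')  # any monophonic recording
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)

# Assumed signature: decode a 0/1 voicing state per frame from the confidence array
is_voiced = crepe.predict_voicing(confidence)
voiced_frequency = frequency * is_voiced  # unvoiced frames set to zero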

@0b01

0b01 commented May 20, 2020

More information about this method can be found in Section 2.2 of the pYIN paper: https://www.eecs.qmul.ac.uk/~simond/pub/2014/MauchDixon-PYIN-ICASSP2014.pdf

@justinsalamon
Collaborator

@sannawag @0b01 CREPE already supports Viterbi decoding: crepe.predict(audio, sr, viterbi=True). For voicing activation we've found that a simple threshold on the returned voicing confidence values works well (where the confidence value for each frame is given by the maximum activation value in the activation matrix).
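
For example, a minimal sketch of the thresholding approach (the 0.5 threshold and the input path are illustrative):

import crepe
from scipy.io import wavfile

sr, audio = wavfile.read('vocals.wav')  # illustrative input
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)

# Per-frame voicing decision by thresholding the confidence
voiced_frequency = frequency * (confidence > 0.5)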

@sannawag could you perhaps elaborate a little more about why this feature is needed and how it differs from what's already supported?

Thanks!

@sannawag
Author

sannawag commented Jun 3, 2020

Thanks for your response, @justinsalamon @0b01! My goal is to use the output of the program to determine when the singer is active versus silent at the perceptual level, which should change on the scale of seconds, not milliseconds. If I set a hard threshold on the confidence, I get rapid alternation between the two states, visible as the thick vertical lines in the plots below (generated with the --viterbi flag set). That's what I hope to smooth out using Viterbi.

[Screenshots: plots of the thresholded voicing decisions over time; the rapid voiced/unvoiced alternation appears as thick vertical lines]

Code for this plot:

import csv
import matplotlib.pyplot as plt
import numpy as np

f0 = []
conf = []
thresh = 0.5

# Read the CREPE output CSV (columns: time, frequency, confidence)
with open('MUSDB18HQ/train/Music Delta - Hendrix/vocals.f0.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            f0.append(float(row[1]))
            conf.append(float(row[2]))
            line_count += 1
    print(f'Processed {line_count} lines.')

voiced = [1 if c > thresh else 0 for c in conf]
# plt.plot(np.array(f0) * np.array(voiced))
plt.plot(np.array(voiced))
plt.show()
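
And here is a minimal sketch of the kind of smoothing I have in mind. This toy two-state Viterbi decoder with illustrative transition probabilities stands in for the Gaussian HMM in the PR; it reuses the conf list from the snippet above:

import numpy as np

def viterbi_voicing(conf, switch_prob=0.01):
    """Most likely voiced/unvoiced path given per-frame confidences."""
    conf = np.asarray(conf, dtype=float)
    n = len(conf)
    # Emission log-likelihoods: row 0 = unvoiced, row 1 = voiced
    emit = np.log(np.stack([1.0 - conf, conf]) + 1e-12)
    # Self-transitions dominate, so rapid state flips are penalized
    trans = np.log(np.array([[1 - switch_prob, switch_prob],
                             [switch_prob, 1 - switch_prob]]))
    score = emit[:, 0].copy()
    back = np.zeros((2, n), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + trans  # cand[i, j]: score of moving from state i to j
        back[:, t] = np.argmax(cand, axis=0)
        score = cand.max(axis=0) + emit[:, t]
    # Backtrace the most likely state sequence
    path = np.zeros(n, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[path[t], t]
    return path

voiced_smooth = viterbi_voicing(conf)
plt.plot(voiced_smooth)
plt.show()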

@justinsalamon
Collaborator

Thanks @sannawag, I'll have to give this a closer look, so it might take some time before I can give more feedback.

As a general note, it's helpful to first post an issue to discuss the problem, solution, and implementation details before making a PR, so we can reach consensus on those things prior to implementation. It's our fault for not providing contribution guidelines (to be amended via #58).

Could you please open an issue, explain what the problem is (as you have via these plots and code), your proposed solution, and cross reference this PR?

Thanks!

@sannawag
Author

sannawag commented Jun 8, 2020

Thanks for the feedback, @justinsalamon! I've submitted an issue.
