Apply Viterbi algorithm to predict voiced/unvoiced state of every frame based on confidence array #26
Conversation
Using an HMM model, similar to the pYIN algorithm.
More information about this method can be found in Section 2.2 of the pYIN paper: https://www.eecs.qmul.ac.uk/~simond/pub/2014/MauchDixon-PYIN-ICASSP2014.pdf
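To illustrate the idea (this is a minimal sketch, not the code in this PR): a two-state Viterbi decode over the per-frame confidence array, where the transition matrix strongly favors staying in the current state. The emission model and the `switch_prob` value are illustrative assumptions.

```python
# Minimal sketch of two-state (unvoiced/voiced) Viterbi smoothing of a
# confidence array. switch_prob and the emission model are illustrative
# assumptions, not values taken from this PR.
import numpy as np

def viterbi_voicing(confidence, switch_prob=0.01):
    """Return a 0/1 array (unvoiced/voiced) for each frame."""
    n = len(confidence)
    # Emission probabilities: voiced frames should have high confidence,
    # unvoiced frames low confidence.
    emission = np.stack([1.0 - confidence, confidence], axis=1)  # shape (n, 2)
    # Transition matrix favors staying in the current state, which
    # suppresses millisecond-scale flicker between the two states.
    transition = np.array([[1.0 - switch_prob, switch_prob],
                           [switch_prob, 1.0 - switch_prob]])
    log_emission = np.log(emission + 1e-12)
    log_transition = np.log(transition)

    # Standard Viterbi recursion in log space.
    delta = np.zeros((n, 2))
    backpointer = np.zeros((n, 2), dtype=int)
    delta[0] = np.log(0.5) + log_emission[0]
    for t in range(1, n):
        scores = delta[t - 1][:, None] + log_transition  # (prev_state, next_state)
        backpointer[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emission[t]

    # Backtrace the most likely state sequence.
    states = np.zeros(n, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(n - 2, -1, -1):
        states[t] = backpointer[t + 1, states[t + 1]]
    return states
```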
@sannawag @0b01 CREPE already supports Viterbi decoding. @sannawag could you perhaps elaborate a little more on why this feature is needed and how it differs from what's already supported? Thanks!
Thanks for your response! @justinsalamon @0b01 My goal is to use the output of the program to determine when the singer is active versus silent at the perceptual level. That should change at the level of seconds, not milliseconds. If I set a hard threshold based on confidence, I get quick alternation between the two states, which can be seen as the thick vertical lines in the plots below.
[plots and plotting code omitted from this capture]
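The original plotting code is not preserved here; the following is a hypothetical re-creation of the comparison under a hard threshold, assuming the standard `crepe.predict` outputs. The input filename and the 0.5 threshold are illustrative.

```python
# Hypothetical sketch of the hard-threshold comparison plot (the original
# code is not in this capture). 'singer.wav' and the 0.5 threshold are
# illustrative assumptions.
import crepe
import matplotlib.pyplot as plt
from scipy.io import wavfile

sr, audio = wavfile.read('singer.wav')
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)

# Hard threshold on confidence: tends to flicker between 0 and 1 at the
# millisecond level rather than changing at the level of seconds.
hard_voicing = (confidence > 0.5).astype(float)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(time, confidence, label='confidence')
ax1.plot(time, hard_voicing, label='hard threshold (0.5)')
ax1.legend()
ax2.plot(time, frequency * hard_voicing, label='frequency, unvoiced zeroed')
ax2.set_xlabel('time (s)')
ax2.legend()
plt.show()
```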
Thanks @sannawag, I'll have to give this a closer look, so it might take some time before I can give more feedback. As a general note, it's helpful to first post an issue to discuss the problem, solution, and implementation details before making a PR, so we can reach consensus on those things prior to implementation. It's our fault for not providing contribution guidelines (to be amended via #58). Could you please open an issue, explain what the problem is (as you have via these plots and code), describe your proposed solution, and cross-reference this PR? Thanks!
Thanks for the feedback, @justinsalamon! I've submitted an issue. |
This feature delimits regions of activation and silence (in monophonic recordings). I am submitting a pull request in case it would be useful for others as well, and I am very open to feedback.
The modification was added as a function in core: "predict_voicing". The function returns a sequence of 0s and 1s, indicating the voicing state predicted by a Gaussian HMM.
Some more details about the function and API modification: "predict_voicing" can be called independently after calling "predict", as described in the update to the documentation. It is also possible to set the "apply-voicing" flag when calling crepe from the command line. This will cause the program to call "predict_voicing", multiply the result by the "frequency" array (setting unvoiced frames to zero), and save the new array, "voiced_frequency", to disk.
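A sketch of how the proposed API would be used, based on the description above. The exact location and signature of "predict_voicing" in the PR may differ; `crepe.core.predict_voicing(confidence)` is an assumption made for illustration.

```python
# Sketch of the proposed usage, based on the PR description; the exact
# signature/location of predict_voicing may differ from the actual PR.
import crepe
from scipy.io import wavfile

sr, audio = wavfile.read('singer.wav')  # illustrative input file
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)

# Proposed addition: a 0/1 voicing decision per frame, predicted by a
# Gaussian HMM fitted to the confidence values.
is_voiced = crepe.core.predict_voicing(confidence)

# Zero out frames predicted as unvoiced, mirroring what the "apply-voicing"
# command-line flag is described to do before saving "voiced_frequency".
voiced_frequency = frequency * is_voiced
```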