
[WIP] Decoding from the categorical posterior distribution #38

Open
wants to merge 1 commit into base: master

Conversation

maxrmorrison

Implements the proposed alternative to the original continuous posterior decoding of the network output, as suggested in #37 (a minimal sketch follows the task list below).

Remaining tasks:

  • Test on original data. My tests have all been conducted on speech data. This proposed alternative should be run on the same data as the original continuous decoding for a fair comparison.
  • I took the phrasing "a PR that replaces hmmlearn with librosa.sequence.viterbi" literally—so the hmmlearn decoding has been removed. Based on the performance comparison, we may want to allow both decoding options.
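
For context, here is a minimal sketch of decoding a pitch-bin path from the categorical posterior with librosa.sequence.viterbi. The salience.npy placeholder, the transition_local prior, and its width of 12 are assumptions for illustration, not necessarily what this PR implements:

```python
import numpy as np
import librosa

# salience: network output with shape (n_frames, 360), one posterior
# over pitch bins per frame (placeholder input for illustration).
salience = np.load('salience.npy')

# Band-diagonal transition prior favoring small pitch movements between
# adjacent frames; the triangular window and width=12 are assumptions.
transition = librosa.sequence.transition_local(salience.shape[1], width=12)

# librosa expects observations with shape (n_states, n_frames) and
# values in [0, 1]; normalize each frame to a proper distribution.
prob = salience.T / salience.T.sum(axis=0, keepdims=True)

# Most likely pitch-bin path under the resulting HMM.
path = librosa.sequence.viterbi(prob, transition)
```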

@justinsalamon
Collaborator

@maxrmorrison would it be possible to carve out the relevant viterbi code only, so as to avoid having the entirety of librosa as a (heavy) dependency?

@jongwook
Member

jongwook commented Sep 1, 2019

I wouldn't mind adding librosa as a dependency

@justinsalamon
Collaborator

@jongwook what's your justification? The greater the dependency chain, the greater the technical debt (= more maintenance) and the greater the chances of users running into issues during installation. It also inflates the required storage space and limits deployment on less standard/embedded platforms if some dependencies don't support them.

It seems like all the sequence modelling code is concentrated in one source file, so why not extract only what we need?

@maxrmorrison
Author

@justinsalamon we can further narrow the dependency requirement to just two functions (librosa.sequence.viterbi and librosa.sequence._viterbi) and a custom error class (librosa.util.exceptions.ParameterError). Those three have no other dependencies within librosa. I would be okay with extracting them. Where would be the best place to give proper attribution that those functions and the exception came from librosa?
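
If the extraction route were taken, one hypothetical convention for attribution would be a header comment in the vendored module; the file name and wording below are illustrative only:

```python
# crepe/_viterbi.py -- hypothetical vendored module (name is illustrative)
#
# viterbi() and _viterbi() are copied from librosa/sequence.py, and
# ParameterError from librosa/util/exceptions.py, both distributed
# under librosa's ISC license: https://github.com/librosa/librosa


class ParameterError(Exception):
    """Exception class for malformed parameters (copied from librosa)."""
```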

@jongwook
Member

jongwook commented Sep 2, 2019

@justinsalamon Justifications:

Since we already depend on resampy, I don't think adding librosa would be a substantial roadblock anywhere, especially when we already require all of TensorFlow and Keras. TensorFlow itself is ~500 MB, while a librosa install is just 2.6 MB; for comparison, crepe itself takes up ~160 MB because of the pretrained weights.

On less standard/embedded platforms, TensorFlow will require special treatment anyway, and I don't think there is a platform where TensorFlow runs without modification but librosa can't.

Since librosa is the de facto standard library for audio analysis in Python, I expect many "customers" of crepe already have it installed.

In terms of maintenance, duplicating librosa code means taking on the responsibility of backporting any future fixes to librosa's Viterbi code, so it's debatable which approach creates more "technical debt".

@justinsalamon
Collaborator

I'd be in favor of carving out only the needed functionality, but @jongwook it's your final call.

@jongwook
Member

jongwook commented Sep 5, 2019

It appears that llvmlite, inherited via numba, can sometimes be an issue on ARM and Windows hosts.

I personally had no problem installing librosa on Windows and a Raspberry Pi, so I suspect those issues are isolated ones. Meanwhile, we rely on numba via resampy anyway, and easy alternatives like SciPy's resampling methods have many drawbacks.

Also, it was my intentional choice not to use librosa that led us to use scipy.io.wavfile and support only WAV files. With librosa we could just use librosa.load(file, sr=16000) to support a wide variety of file formats.
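
For illustration, loading and resampling would become a single call (the file name below is a placeholder):

```python
import librosa

# librosa.load decodes many formats (WAV, FLAC, OGG, MP3, ...) via its
# soundfile/audioread backends and resamples to the requested rate,
# here crepe's expected 16 kHz, in one step.
audio, sr = librosa.load('example.mp3', sr=16000, mono=True)
```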

So I'd like to keep the current PR, which calls librosa.sequence.viterbi, to preserve the separation of concerns and avoid code duplication. Sorry for putting on the dictator hat, @justinsalamon!

@maxrmorrison Thanks for the PR! The code looks good to me, but let me confirm that it definitely improves on the RWC-synth dataset compared to hmmlearn.
