Suitable for detecting speech/singing in music? #546

rokezu · 2024-10-05T19:52:11Z

Hi! I'm trying out several tools to organize my music. For work I prefer instrumental music only (I get carried away by the singing and get distracted), so I have been trying silero-vad in Python. However, I'm not getting good results. For example, with a song like this one (some Star Wars-y Latino cumbia): https://www.youtube.com/watch?v=3awbR_EzYr8 , speech_timestamps comes back as an empty list. The same happens with children's songs where many kids are singing, and so on.

So, maybe I'm doing something wrong? I first convert the audio to mono, 16 kHz with ffmpeg/sox-resampler.
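Silero VAD works on mono audio at 16 kHz (8 kHz is also supported), so it is worth confirming the converted file before blaming the model. The following is a minimal sketch that uses only the standard-library `wave` module. `check_vad_input` is a hypothetical helper name, and the sketch assumes the converted file is a PCM WAV (the only kind `wave` can read), e.g. one produced with something like `ffmpeg -i song.mp3 -ac 1 -ar 16000 song.wav`.

```python
import wave


def check_vad_input(path, expected_rate=16000):
    """Return a list of problems with a WAV file meant for a mono 16 kHz VAD.

    An empty list means the channel count and sample rate look right.
    """
    problems = []
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, got {rate} Hz")
    return problems
```

For example, `check_vad_input("song.wav")` returns `[]` for a correctly converted file and lists the mismatches otherwise.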

I'm running the example "get speech timestamps from full audio file" with my own files. Since I don't know much about these models, my guess is that a voice pitched like the one in the video above is not picked up as a regular voice, and the same goes for children's voices? Thanks for any insight!
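For reference, the full-file workflow looks roughly like the sketch below, extended with a hypothetical `speech_fraction` helper that turns the returned segments into a single "how much of this track is voice" number for sorting a library. It assumes the pip `silero-vad` package (`load_silero_vad`, `read_audio`, `get_speech_timestamps`) and 16 kHz input; `looks_instrumental` and its 5% cutoff are illustrative choices, not part of the library.

```python
def speech_fraction(timestamps, total_samples):
    """Share of the track covered by detected segments.

    `timestamps` is a list of {"start": int, "end": int} dicts in samples,
    the default output format of get_speech_timestamps.
    """
    if total_samples <= 0:
        return 0.0
    covered = sum(seg["end"] - seg["start"] for seg in timestamps)
    return min(covered / total_samples, 1.0)


def looks_instrumental(path, threshold=0.5, max_vocal_fraction=0.05):
    """Hypothetical sorting rule: 'instrumental' if little of the track is flagged."""
    # Imported lazily so speech_fraction works without silero-vad installed.
    from silero_vad import load_silero_vad, read_audio, get_speech_timestamps

    model = load_silero_vad()
    wav = read_audio(path, sampling_rate=16000)
    timestamps = get_speech_timestamps(
        wav, model, threshold=threshold, sampling_rate=16000
    )
    fraction = speech_fraction(timestamps, len(wav))
    return fraction <= max_vocal_fraction, fraction
```

Given the behaviour described in this thread, a fraction of 0.0 does not prove a track is instrumental: sung vocals may simply never cross the speech threshold.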

Answered by snakers4

snakers4 · 2024-10-06T03:54:55Z · Maintainer

Hi, you can do several things:

Try a v4 VAD - it works better with very noisy inputs;
Plot a chart with the built-in function (pass the parameter visualize_probs=True) and see how it behaves overall;
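Both suggestions can be tried side by side. The sketch below loads an older release through `torch.hub` (the `snakers4/silero-vad:v4.0` tag is an assumption; check the repository's tags), asks `get_speech_timestamps` to plot its probabilities with `visualize_probs=True`, and also collects the per-window probabilities directly so you can see how much of a track clears different thresholds. `window_probs`, `fraction_above` and `inspect_track` are hypothetical helper names, and the 512-sample window at 16 kHz is assumed to match what the utilities use.

```python
def fraction_above(probs, thresholds=(0.2, 0.3, 0.5)):
    """Share of windows whose speech probability reaches each threshold."""
    if not probs:
        return {t: 0.0 for t in thresholds}
    return {t: sum(p >= t for p in probs) / len(probs) for t in thresholds}


def window_probs(model, wav, sampling_rate=16000, window=512):
    """Per-window speech probabilities, one fixed-size chunk at a time."""
    import torch  # lazy import: only needed when a model is actually run

    probs = []
    model.reset_states()
    for start in range(0, len(wav), window):
        chunk = wav[start:start + window]
        if len(chunk) < window:
            # Zero-pad the final chunk so every call sees a full window.
            chunk = torch.nn.functional.pad(chunk, (0, window - len(chunk)))
        probs.append(model(chunk, sampling_rate).item())
    model.reset_states()
    return probs


def inspect_track(path, sampling_rate=16000):
    import torch

    # Assumption: the v4 release is reachable under this git tag.
    model, utils = torch.hub.load(
        repo_or_dir="snakers4/silero-vad:v4.0", model="silero_vad"
    )
    get_speech_timestamps, _, read_audio, _, _ = utils

    wav = read_audio(path, sampling_rate=sampling_rate)
    # Plots the probability curve (the option suggested above).
    get_speech_timestamps(
        wav, model, sampling_rate=sampling_rate, visualize_probs=True
    )
    return fraction_above(window_probs(model, wav, sampling_rate))
```

If a sung track shows many windows between 0.2 and 0.5 but few above 0.5, lowering `threshold` in `get_speech_timestamps` may be worth trying before switching tools.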

These VADs are designed for streaming. Non-streaming VADs typically work better on such domains, but we have not decided whether to publish a non-streaming VAD as well.
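To make "designed for streaming" concrete: a streaming VAD is fed the audio one small window at a time and reports speech starts and ends as it goes, using only what it has heard so far. Below is a minimal sketch with the package's `VADIterator`, assuming the pip `silero-vad` package and 16 kHz input and following the pattern of the project's streaming example. `pair_events` and `stream_segments` are hypothetical helpers that turn the emitted events into (start, end) segments in seconds.

```python
def pair_events(events, total_seconds=None):
    """Turn streamed {'start': t} / {'end': t} events into (start, end) pairs."""
    segments, open_start = [], None
    for event in events:
        if "start" in event:
            open_start = event["start"]
        elif "end" in event and open_start is not None:
            segments.append((open_start, event["end"]))
            open_start = None
    # A segment still open when the audio runs out is closed at the track end.
    if open_start is not None and total_seconds is not None:
        segments.append((open_start, total_seconds))
    return segments


def stream_segments(path, threshold=0.5, sampling_rate=16000, window=512):
    from silero_vad import load_silero_vad, read_audio, VADIterator

    model = load_silero_vad()
    wav = read_audio(path, sampling_rate=sampling_rate)
    vad_iterator = VADIterator(
        model, threshold=threshold, sampling_rate=sampling_rate
    )

    events = []
    for start in range(0, len(wav), window):
        chunk = wav[start:start + window]
        if len(chunk) < window:
            break  # the iterator is fed full windows only
        event = vad_iterator(chunk, return_seconds=True)
        if event:
            events.append(event)
    vad_iterator.reset_states()
    return pair_events(events, total_seconds=len(wav) / sampling_rate)
```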

1 reply

dgoryeo Oct 14, 2024

Hi @snakers4, what are examples of non-streaming VADs? Is Pyannote considered a non-streaming VAD? Thanks.
