Hi! I'm trying out several tools to organize my music. For work I prefer instrumental music only (I get carried away by the singing and get distracted), so I have been trying silero-vad in Python. However, I'm not getting good results. For example, for a song like this one (some Star-Wars-y Latino cumbia): https://www.youtube.com/watch?v=3awbR_EzYr8, speech_timestamps returns an empty list. The same happens with children's songs where many kids are singing, and so on. Maybe I'm doing something wrong? I first convert the audio with ffmpeg/sox-resampler to mono, 16 kHz, and then run the example "get speech timestamps from full audio file" on my own files. Since I don't know much about these models, my guess is that a singing voice like the one in the video above is not picked up as regular speech, and the same goes for children's voices? Thanks for any insight!
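A minimal sketch of the preprocessing step described above, assuming ffmpeg is on the PATH. The helper names (`ffmpeg_mono_16k_cmd`, `convert_to_mono_16k`, `is_mono_16k_wav`) are hypothetical. The last helper uses Python's standard `wave` module to confirm that the converted file really is one channel at 16 kHz before it goes to the VAD. The optional soxr path assumes an ffmpeg build with libsoxr.

```python
import subprocess
import wave


def ffmpeg_mono_16k_cmd(src: str, dst: str, use_soxr: bool = False) -> list[str]:
    """Build an ffmpeg command that downmixes `src` to mono and resamples it to 16 kHz."""
    cmd = ["ffmpeg", "-y", "-i", src]  # -y: overwrite dst if it already exists
    if use_soxr:
        # Use the SoX resampler instead of ffmpeg's default (needs ffmpeg built with libsoxr).
        cmd += ["-af", "aresample=16000:resampler=soxr"]
    # -ac 1: one audio channel (mono); -ar 16000: 16 kHz output sample rate.
    cmd += ["-ac", "1", "-ar", "16000", dst]
    return cmd


def convert_to_mono_16k(src: str, dst: str, use_soxr: bool = False) -> None:
    """Run the conversion; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_mono_16k_cmd(src, dst, use_soxr), check=True)


def is_mono_16k_wav(path: str) -> bool:
    """Return True if the WAV header reports 1 channel at 16000 Hz."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels() == 1 and wf.getframerate() == 16000
```

For example, call `convert_to_mono_16k("song.mp3", "song_16k.wav")` and check `is_mono_16k_wav("song_16k.wav")` before loading the file for the VAD.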
-
Hi, you can do several things:
- try the v4 VAD, which works better with very noisy inputs;
- plot the speech probabilities (`visualize_probs=True`) and see how the model behaves overall.

These VADs are designed for streaming. Non-streaming VADs typically work better on such domains, but we have not decided whether to publish a non-streaming VAD as well.
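One way to read the suggestion to plot probabilities: `get_speech_timestamps` turns per-window speech probabilities into segments using a `threshold` (0.5 by default), so if singing over a loud mix never pushes the probability that high, the returned list is simply empty. The sketch below is a simplified stand-in for that step, not silero's actual code. The helper names, the 0.15 hysteresis gap, the 512-sample window and the probability values are all assumptions chosen for illustration.

```python
def segments_above_threshold(probs, threshold=0.5, neg_threshold=None, min_windows=1):
    """Group consecutive windows into (start, end) index pairs, end exclusive.

    A segment opens when a window's probability reaches `threshold` and closes
    at the first window that drops below `neg_threshold` (hysteresis).
    """
    if neg_threshold is None:
        neg_threshold = threshold - 0.15  # assumed gap, loosely mirroring silero's behaviour
    segments = []
    start = None
    for i, p in enumerate(probs):
        if start is None:
            if p >= threshold:
                start = i
        elif p < neg_threshold:
            if i - start >= min_windows:
                segments.append((start, i))
            start = None
    # Close a segment that is still open at the end of the audio.
    if start is not None and len(probs) - start >= min_windows:
        segments.append((start, len(probs)))
    return segments


def windows_to_seconds(segments, window_samples=512, sample_rate=16000):
    """Convert window-index segments to seconds (512 samples at 16 kHz = 32 ms per window)."""
    return [(s * window_samples / sample_rate, e * window_samples / sample_rate)
            for s, e in segments]


# Illustrative, made-up probabilities for a sung passage hovering below 0.5.
probs = [0.1, 0.35, 0.42, 0.4, 0.38, 0.1, 0.05]
print(segments_above_threshold(probs))                 # []
print(segments_above_threshold(probs, threshold=0.3))  # [(1, 5)]
```

With the real library the corresponding knob is the `threshold` argument, e.g. `get_speech_timestamps(wav, model, threshold=0.3, visualize_probs=True)`. A lower threshold may also flag more non-vocal passages, so it is worth looking at the plotted probabilities before picking a value.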