
Are different samples in a batch processed independently? #142

Answered by snakers4
RuABraun asked this question in Q&A

The V3 models are meant to be used chunk-wise, even for large or long files:

silero-vad/utils_vad.py, lines 198 to 211 in f6b1294:

model.reset_states()
min_speech_samples = sampling_rate * min_speech_duration_ms / 1000
min_silence_samples = sampling_rate * min_silence_duration_ms / 1000
speech_pad_samples = sampling_rate * speech_pad_ms / 1000
audio_length_samples = len(audio)

speech_probs = []
for current_start_sample in range(0, audio_length_samples, window_size_samples):
    chunk = audio[current_start_sample: current_start_sample + window_size_samples]
    if len(chunk) < window_size_samples:
        chunk = torch.nn.functional.pad(chunk, (0, int(window_size_samples - len(chunk))))
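
For context, here is a minimal sketch (not from the thread) of how that loop extends to several files: the state is reset before each file, so every file, or batch sample, is handled independently, while the chunks within one file are fed sequentially through the stateful model. The torch.hub arguments, the 512-sample window size, and the chunk_speech_probs helper are assumptions; adjust them to the model version you use.

import torch

# Standard torch.hub entry point for silero-vad (assumed, not quoted from the thread).
model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')

SAMPLING_RATE = 16000
WINDOW_SIZE_SAMPLES = 512  # assumed chunk size; pick one your model version accepts

def chunk_speech_probs(audio: torch.Tensor) -> list:
    """Per-chunk speech probabilities for a single 1-D waveform."""
    model.reset_states()  # fresh state per file, so files never influence one another
    probs = []
    for start in range(0, len(audio), WINDOW_SIZE_SAMPLES):
        chunk = audio[start: start + WINDOW_SIZE_SAMPLES]
        if len(chunk) < WINDOW_SIZE_SAMPLES:
            chunk = torch.nn.functional.pad(chunk, (0, WINDOW_SIZE_SAMPLES - len(chunk)))
        probs.append(model(chunk, SAMPLING_RATE).item())  # chunks within a file are sequential
    return probs

# Two dummy waveforms standing in for two files / batch samples:
wav_a = torch.randn(SAMPLING_RATE * 3)
wav_b = torch.randn(SAMPLING_RATE * 5)
probs_a = chunk_speech_probs(wav_a)
probs_b = chunk_speech_probs(wav_b)  # unaffected by wav_a because of reset_states()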

Replies: 1 comment 6 replies

Answer selected by snakers4