Run WhisperSeg as a Web Service

Running WhisperSeg as a web service makes it possible to separate the environment in which WhisperSeg runs from the environment in which the segmenting function is called. For example, we can set up a WhisperSeg segmenting service on one machine and call that service from different working environments (MATLAB, a web page frontend, a Jupyter notebook) at different physical locations.

This makes it easy to call WhisperSeg from MATLAB and is essential for setting up a web page for automatic vocal segmentation.

Step 1: Starting the segmenting service

In a terminal, go to the main folder of this repository, and run the following command:

python segment_service.py --flask_port 8050 --model_path nccratliri/whisperseg-large-ms-ct2 --device cuda

Explanation of the parameters:

  • flask_port: the port that this service will keep listening to. Requests sent to this port will be handled by this service.
  • model_path: the path to the WhisperSeg model. This can be either the original Hugging Face model, e.g., nccratliri/whisperseg-large-ms, or the CTranslate2-converted model, e.g., nccratliri/whisperseg-large-ms-ct2. If you choose to use the CTranslate2-converted model, please make sure the converted model exists. If you have a different trained WhisperSeg checkpoint, replace "nccratliri/whisperseg-large-ms-ct2" with the path to that checkpoint.
  • device: the device on which to run WhisperSeg, either cuda or cpu. By default, the model runs on cuda.

Note: The terminal that runs this service needs to be kept open. On a Linux system, one can first create a new screen session and run the service inside it, so that the service keeps running in the background.
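
Optionally, before sending requests from a client machine, one can check that the service port is reachable. This is only a convenience sketch (not part of the repository); it assumes the host name and port used in the command above.

```python
import socket

def service_is_reachable(host="localhost", port=8050, timeout=3.0):
    """Return True if a TCP connection to the segmenting service can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(service_is_reachable())  # expects the service from Step 1 to be running
```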

Step 2: Calling the segmenting service

Call the segmenting service in Python:

For example, we are segmenting a zebra finch recording:

import requests,json,base64
import pandas as pd
import librosa

## define a function for segmentation
def call_segment_service( service_address, 
                          audio_file_path,
                          sr = None,
                          channel_id = 0,
                          min_frequency=None,
                          spec_time_step=None,
                          min_segment_length=None,
                          eps=None,
                          num_trials=3,
                          adobe_audition_compatible=False
                        ):
    if sr is None:
        sr = librosa.get_samplerate(audio_file_path)
    # read the audio file and encode it as a base64 string for JSON transport
    with open(audio_file_path, 'rb') as f:
        audio_file_base64_string = base64.b64encode( f.read() ).decode('ASCII')
    # send the request to the segmenting service and return the parsed JSON result
    response = requests.post( service_address,
                              data = json.dumps( {
                                  "audio_file_base64_string":audio_file_base64_string,
                                  "channel_id":channel_id,
                                  "sr":sr,
                                  "min_frequency":min_frequency,
                                  "spec_time_step":spec_time_step,
                                  "min_segment_length":min_segment_length,
                                  "eps":eps,
                                  "num_trials":num_trials,
                                  "adobe_audition_compatible":adobe_audition_compatible
                              } ),
                              headers = {"Content-Type": "application/json"}
                            )
    return response.json()

Note (Important):

  1. Running the above code does not require any further dependencies, and it does not load any model locally.
  2. The service_address is composed of SEGMENTING_SERVER_IP_ADDRESS + ":" + FLASK_PORT_NUMBER + "/segment". If the server is running on the local machine, the SEGMENTING_SERVER_IP_ADDRESS is "http://localhost"; otherwise, you will need to know the IP address of the server machine (see the sketch after this list).
  3. channel_id is useful when the input audio file has multiple channels. In this case, channel_id can be used to specify which channel to segment. By default channel_id = 0, i.e., the first channel is used for segmentation.
  4. The parameter adobe_audition_compatible controls the format of the returned segmentation results. If adobe_audition_compatible=True, the returned result is a dictionary that is compatible with Adobe Audition: after converting this dictionary to a pandas DataFrame and then to a CSV file, the CSV file can be loaded directly into Adobe Audition. If adobe_audition_compatible=False, the result is a simple dictionary containing only "onset", "offset" and "cluster".
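
As an illustration of notes 2 and 3, the sketch below builds the service address for a remote server and requests segmentation of the second channel of a multi-channel file. The IP address and file name are made-up placeholders, not values from this repository.

```python
# hypothetical remote server and multi-channel recording, for illustration only
server_ip = "http://192.168.1.25"   # placeholder: the segmenting server's IP address
flask_port = 8050
service_address = f"{server_ip}:{flask_port}/segment"

prediction = call_segment_service(
    service_address,
    "multi_channel_recording.wav",  # placeholder multi-channel audio file
    channel_id = 1                  # segment the second channel instead of the first
)
```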

Get the Adobe Audition compatible segmentation results

prediction = call_segment_service( "http://localhost:8050/segment", 
                          "../data/example_subset/Zebra_finch/test_adults/zebra_finch_g17y2U-f00007.wav",  
                          adobe_audition_compatible = True
                        )
## we can convert the returned dictionary into a pandas DataFrame
df = pd.DataFrame(prediction)
df
|    | Name | Start    | Duration | Time Format | Type | Description |
|----|------|----------|----------|-------------|------|-------------|
| 0  |      | 0:00.010 | 0:00.063 | decimal     | Cue  |             |
| 1  |      | 0:00.380 | 0:00.067 | decimal     | Cue  |             |
| 2  |      | 0:00.603 | 0:00.070 | decimal     | Cue  |             |
| 3  |      | 0:00.758 | 0:00.074 | decimal     | Cue  |             |
| 4  |      | 0:00.912 | 0:00.571 | decimal     | Cue  |             |
| 5  |      | 0:01.812 | 0:00.070 | decimal     | Cue  |             |
| 6  |      | 0:01.963 | 0:00.074 | decimal     | Cue  |             |
| 7  |      | 0:02.073 | 0:00.570 | decimal     | Cue  |             |
| 8  |      | 0:02.840 | 0:00.053 | decimal     | Cue  |             |
| 9  |      | 0:02.982 | 0:00.081 | decimal     | Cue  |             |
| 10 |      | 0:03.112 | 0:00.171 | decimal     | Cue  |             |
| 11 |      | 0:03.668 | 0:00.074 | decimal     | Cue  |             |
| 12 |      | 0:03.828 | 0:00.070 | decimal     | Cue  |             |
| 13 |      | 0:03.953 | 0:00.570 | decimal     | Cue  |             |
| 14 |      | 0:05.158 | 0:00.065 | decimal     | Cue  |             |
| 15 |      | 0:05.323 | 0:00.070 | decimal     | Cue  |             |
| 16 |      | 0:05.468 | 0:00.575 | decimal     | Cue  |             |

We can save the df to an Adobe Audition compatible CSV file (note: index=False and sep="\t" are necessary!):

df.to_csv( "prediction_result.csv", index = False, sep="\t")
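
As a quick sanity check (not part of the original workflow), the saved file can be read back with pandas using the same tab separator to confirm the layout:

```python
df_check = pd.read_csv("prediction_result.csv", sep="\t")
print(df_check.head())
```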

Get the simple segmentation results

prediction = call_segment_service( "http://localhost:8050/segment", 
                          "../data/example_subset/Zebra_finch/test_adults/zebra_finch_g17y2U-f00007.wav",  
                          adobe_audition_compatible = False
                        )
## we can convert the returned dictionary into a pandas DataFrame
df = pd.DataFrame(prediction)
df
|    | onset | offset | cluster |
|----|-------|--------|---------|
| 0  | 0.010 | 0.073  | vocal   |
| 1  | 0.380 | 0.447  | vocal   |
| 2  | 0.603 | 0.673  | vocal   |
| 3  | 0.758 | 0.832  | vocal   |
| 4  | 0.912 | 1.483  | vocal   |
| 5  | 1.812 | 1.882  | vocal   |
| 6  | 1.963 | 2.037  | vocal   |
| 7  | 2.073 | 2.643  | vocal   |
| 8  | 2.840 | 2.893  | vocal   |
| 9  | 2.982 | 3.063  | vocal   |
| 10 | 3.112 | 3.283  | vocal   |
| 11 | 3.668 | 3.742  | vocal   |
| 12 | 3.828 | 3.898  | vocal   |
| 13 | 3.953 | 4.523  | vocal   |
| 14 | 5.158 | 5.223  | vocal   |
| 15 | 5.323 | 5.393  | vocal   |
| 16 | 5.468 | 6.043  | vocal   |
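
As a further illustration (not part of the service itself), the returned onset/offset times can be used to cut the predicted segments out of the original recording, e.g., with librosa. This is a minimal sketch assuming the simple result format shown above:

```python
import librosa

audio_path = "../data/example_subset/Zebra_finch/test_adults/zebra_finch_g17y2U-f00007.wav"
y, sr = librosa.load(audio_path, sr=None)  # keep the native sampling rate

# collect one numpy array per predicted segment
segments = [
    y[int(onset * sr): int(offset * sr)]
    for onset, offset in zip(prediction["onset"], prediction["offset"])
]
print(f"extracted {len(segments)} segments")
```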

Call the segmenting service in MATLAB:

First, define a MATLAB function:

function response = call_segment_service(service_address, audio_file_path, sr, channel_id, min_frequency, spec_time_step, min_segment_length, eps, num_trials, adobe_audition_compatible)

    % read the audio file as raw bytes, then close the file handle
    fileID = fopen(audio_file_path, 'r');
    fileData = fread(fileID, inf, 'uint8=>uint8');
    fclose(fileID);

    audio_file_base64_string = matlab.net.base64encode( fileData );
    data = struct('audio_file_base64_string', audio_file_base64_string, ...
                  "channel_id", channel_id, ...
                  "sr", sr, ...
                  "min_frequency", min_frequency, ...
                  "spec_time_step", spec_time_step, ...
                  "min_segment_length", min_segment_length, ...
                  "eps", eps, ...
                  "num_trials", num_trials, ... 
                  "adobe_audition_compatible", adobe_audition_compatible );
    jsonData = jsonencode(data);

    options = weboptions( 'RequestMethod', 'POST', 'MediaType', 'application/json'  );
    response = webwrite(service_address, jsonData, options);

end

Then call the function in the MATLAB console:

prediction = call_segment_service( 'http://localhost:8050/segment', '/Users/meilong/Downloads/zebra_finch_g17y2U-f00007.wav', 32000, 0, 0, 0.0025, 0.01, 0.02, 3, 0 )
disp(prediction)