Running WhisperSeg as a web service makes it possible to disentangle the environment in which WhisperSeg runs from the environment in which the segmenting function is called. For example, we can set up a WhisperSeg segmenting service on one machine, and call this service from different working environments (MATLAB, a webpage frontend, a Jupyter notebook) at different physical locations.
This enables an easy way to call WhisperSeg from MATLAB and is essential for setting up a web page for automatic vocal segmentation.
In a terminal, go to the main folder of this repository, and run the following command:
```bash
python segment_service.py --flask_port 8050 --model_path nccratliri/whisperseg-large-ms-ct2 --device cuda
```
Explanation of the parameters:
- flask_port: the port that this service listens on. Requests sent to this port will be handled by this service.
- model_path: the path to the WhisperSeg model. This can be either an original Hugging Face model, e.g., nccratliri/whisperseg-large-ms, or a CTranslate2-converted model, e.g., nccratliri/whisperseg-large-ms-ct2. If you choose the CTranslate2-converted model, please make sure the converted model exists. If you have a different trained WhisperSeg checkpoint, replace "nccratliri/whisperseg-large-ms-ct2" with the path to that checkpoint.
- device: where to run WhisperSeg, either cuda or cpu. By default the model runs on cuda.
Note: The terminal that runs this service needs to stay open. On a Linux system, one can first create a new screen session (e.g., with screen -S whisperseg), start the service inside it, and then detach (Ctrl-A followed by D) so that the service keeps running in the background.
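Before sending real requests, you can check that the service is reachable by testing whether the port accepts connections. Below is a minimal sketch using only Python's standard library; the host and port are assumptions matching the launch command above:

```python
import socket

## hypothetical quick check that the segmenting service is listening;
## adjust host/port to match the --flask_port you chose above
def service_is_up(host="localhost", port=8050, timeout=3):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(service_is_up())  ## True once segment_service.py is running
```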
For example, the following code segments a zebra finch recording:
```python
import requests, json, base64
import pandas as pd
import librosa

## define a function for segmentation
def call_segment_service( service_address,
                          audio_file_path,
                          sr = None,
                          channel_id = 0,
                          min_frequency = None,
                          spec_time_step = None,
                          min_segment_length = None,
                          eps = None,
                          num_trials = 3,
                          adobe_audition_compatible = False
                        ):
    ## if the sampling rate is not given, read it from the audio file header
    if sr is None:
        sr = librosa.get_samplerate(audio_file_path)
    ## encode the raw audio bytes as a base64 string so they can be sent as JSON
    audio_file_base64_string = base64.b64encode( open(audio_file_path, 'rb').read() ).decode('ASCII')
    response = requests.post( service_address,
                              data = json.dumps( {
                                  "audio_file_base64_string": audio_file_base64_string,
                                  "channel_id": channel_id,
                                  "sr": sr,
                                  "min_frequency": min_frequency,
                                  "spec_time_step": spec_time_step,
                                  "min_segment_length": min_segment_length,
                                  "eps": eps,
                                  "num_trials": num_trials,
                                  "adobe_audition_compatible": adobe_audition_compatible
                              } ),
                              headers = {"Content-Type": "application/json"}
                            )
    return response.json()
```
Note (Important):
- Running the above code does not require any further dependencies, nor does it load any model in the calling environment.
- The service_address is composed of SEGMENTING_SERVER_IP_ADDRESS + ":" + FLASK_PORT_NUMBER + "/segment" (see the short sketch after this list). If the service is running on the local machine, SEGMENTING_SERVER_IP_ADDRESS is "http://localhost"; otherwise, you will need to know the IP address of the server machine.
- channel_id is useful when the input audio file has multiple channels; it specifies which channel to segment. By default, channel_id = 0, meaning the first channel is used for segmentation.
- The parameter adobe_audition_compatible controls the format of the returned segmentation results. If adobe_audition_compatible=True, the returned result is a dictionary compatible with Adobe Audition: after converting the dictionary to a pandas DataFrame and saving it as a csv file, this csv file can be loaded directly into Adobe Audition. If adobe_audition_compatible=False, the result is a simple dictionary containing only "onset", "offset" and "cluster".
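For instance, here is a minimal sketch of composing the service_address; the values below are placeholders matching the launch command above, to be replaced with your own server IP and port:

```python
## placeholder values: replace with your server's IP address and the
## flask_port used when launching segment_service.py
SEGMENTING_SERVER_IP_ADDRESS = "http://localhost"
FLASK_PORT_NUMBER = "8050"
service_address = SEGMENTING_SERVER_IP_ADDRESS + ":" + FLASK_PORT_NUMBER + "/segment"
## -> "http://localhost:8050/segment"
```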
Now call the service on the example recording:

```python
prediction = call_segment_service( "http://localhost:8050/segment",
                                   "../data/example_subset/Zebra_finch/test_adults/zebra_finch_g17y2U-f00007.wav",
                                   adobe_audition_compatible = True
                                 )
## we can convert the returned dictionary into a pandas DataFrame
df = pd.DataFrame(prediction)
df
```
| | Name | Start | Duration | Time Format | Type | Description |
|---|---|---|---|---|---|---|
| 0 | | 0:00.010 | 0:00.063 | decimal | Cue | |
| 1 | | 0:00.380 | 0:00.067 | decimal | Cue | |
| 2 | | 0:00.603 | 0:00.070 | decimal | Cue | |
| 3 | | 0:00.758 | 0:00.074 | decimal | Cue | |
| 4 | | 0:00.912 | 0:00.571 | decimal | Cue | |
| 5 | | 0:01.812 | 0:00.070 | decimal | Cue | |
| 6 | | 0:01.963 | 0:00.074 | decimal | Cue | |
| 7 | | 0:02.073 | 0:00.570 | decimal | Cue | |
| 8 | | 0:02.840 | 0:00.053 | decimal | Cue | |
| 9 | | 0:02.982 | 0:00.081 | decimal | Cue | |
| 10 | | 0:03.112 | 0:00.171 | decimal | Cue | |
| 11 | | 0:03.668 | 0:00.074 | decimal | Cue | |
| 12 | | 0:03.828 | 0:00.070 | decimal | Cue | |
| 13 | | 0:03.953 | 0:00.570 | decimal | Cue | |
| 14 | | 0:05.158 | 0:00.065 | decimal | Cue | |
| 15 | | 0:05.323 | 0:00.070 | decimal | Cue | |
| 16 | | 0:05.468 | 0:00.575 | decimal | Cue | |
We can save the df to an Adobe Audition compatible csv file as follows (note: index=False and sep="\t" are necessary!):
```python
df.to_csv( "prediction_result.csv", index = False, sep="\t" )
```
Calling the service with adobe_audition_compatible = False returns the simple format instead:

```python
prediction = call_segment_service( "http://localhost:8050/segment",
                                   "../data/example_subset/Zebra_finch/test_adults/zebra_finch_g17y2U-f00007.wav",
                                   adobe_audition_compatible = False
                                 )
## we can convert the returned dictionary into a pandas DataFrame
df = pd.DataFrame(prediction)
df
```
| | onset | offset | cluster |
|---|---|---|---|
| 0 | 0.010 | 0.073 | vocal |
| 1 | 0.380 | 0.447 | vocal |
| 2 | 0.603 | 0.673 | vocal |
| 3 | 0.758 | 0.832 | vocal |
| 4 | 0.912 | 1.483 | vocal |
| 5 | 1.812 | 1.882 | vocal |
| 6 | 1.963 | 2.037 | vocal |
| 7 | 2.073 | 2.643 | vocal |
| 8 | 2.840 | 2.893 | vocal |
| 9 | 2.982 | 3.063 | vocal |
| 10 | 3.112 | 3.283 | vocal |
| 11 | 3.668 | 3.742 | vocal |
| 12 | 3.828 | 3.898 | vocal |
| 13 | 3.953 | 4.523 | vocal |
| 14 | 5.158 | 5.223 | vocal |
| 15 | 5.323 | 5.393 | vocal |
| 16 | 5.468 | 6.043 | vocal |
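Since the simple format stores plain onset and offset times in seconds, it is straightforward to post-process. For example, a small sketch (the output file name here is arbitrary):

```python
## compute each segment's duration and save the result as a regular csv
df["duration"] = df["offset"] - df["onset"]
df.to_csv( "segments.csv", index = False )
```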
First, define a MATLAB function:
```matlab
function response = call_segment_service( service_address, audio_file_path, sr, channel_id, min_frequency, spec_time_step, min_segment_length, eps, num_trials, adobe_audition_compatible )
    % read the audio file as raw bytes and encode them as a base64 string
    fileID = fopen(audio_file_path, 'r');
    fileData = fread(fileID, inf, 'uint8=>uint8');
    fclose(fileID);
    audio_file_base64_string = matlab.net.base64encode( fileData );
    % assemble the request payload and send it as JSON
    data = struct( 'audio_file_base64_string', audio_file_base64_string, ...
                   'channel_id', channel_id, ...
                   'sr', sr, ...
                   'min_frequency', min_frequency, ...
                   'spec_time_step', spec_time_step, ...
                   'min_segment_length', min_segment_length, ...
                   'eps', eps, ...
                   'num_trials', num_trials, ...
                   'adobe_audition_compatible', adobe_audition_compatible );
    jsonData = jsonencode(data);
    options = weboptions( 'RequestMethod', 'POST', 'MediaType', 'application/json' );
    response = webwrite(service_address, jsonData, options);
end
```
Then call this function in the MATLAB console:
```matlab
prediction = call_segment_service( 'http://localhost:8050/segment', '/Users/meilong/Downloads/zebra_finch_g17y2U-f00007.wav', 32000, 0, 0, 0.0025, 0.01, 0.02, 3, 0 )
disp(prediction)
```