Releases: bklynhlth/openwillis
v2.2
OpenWillis v2.2
Release date: Wednesday August 14th, 2024
Version 2.2 introduces new capabilities to improve speaker labeling during speech transcription. It also introduces new features for preprocessing videos with multiple faces to better support downstream measurement of facial and emotional expressivity.
If you have feedback or questions, please bring them up in the Discussions tab.
Contributors
GeorgeEfstathiadis
vjbytes102
Kcmcveigh
maworthington
WillisDiarize v1.0
A new function was added for correcting speaker labeling errors after speech transcription. This function takes the JSON file of a transcript as input, passes it through an ensemble LLM, and outputs the corrected JSON file.
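As a rough illustration (not the exact API), assuming the function is exposed as `willisdiarize` under the usual `import openwillis as ow` alias and accepts the transcript JSON produced by a transcription function; the real module path, parameter names, and return format may differ:

```python
import json

import openwillis as ow  # assumed import alias; check the package documentation

# Load the transcript JSON produced by one of the transcription functions.
with open("interview_transcript.json") as f:
    transcript_json = json.load(f)

# Hypothetical call: pass the transcript through the WillisDiarize ensemble
# and receive a copy with corrected speaker labels (signature assumed).
corrected_json = ow.willisdiarize(transcript_json)

with open("interview_transcript_corrected.json", "w") as f:
    json.dump(corrected_json, f, indent=2)
```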
WillisDiarize AWS v1.0
WillisDiarize AWS performs the same task as the previous function. However, it is best suited for users who are operating within their own EC2 instance. This function assumes the user has already deployed the WillisDiarize model as a SageMaker endpoint (see the Getting Started page for instructions).
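A comparable sketch for the SageMaker route, assuming a `willisdiarize_aws` function that takes the transcript along with an endpoint name and AWS region; these keyword names and the endpoint name are placeholders, not the documented signature:

```python
import json

import openwillis as ow  # assumed import alias

with open("interview_transcript.json") as f:
    transcript_json = json.load(f)

# Hypothetical call: assumes the WillisDiarize model is already deployed as a
# SageMaker endpoint (see the Getting Started page). The keyword names and
# the endpoint name below are placeholders.
corrected_json = ow.willisdiarize_aws(
    transcript_json,
    endpoint_name="willisdiarize-endpoint",  # placeholder SageMaker endpoint
    region="us-east-1",                      # region where the endpoint runs
)

with open("interview_transcript_corrected.json", "w") as f:
    json.dump(corrected_json, f, indent=2)
```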
Speech transcription with AWS v1.2 / Speech transcription with Whisper v1.2
Updated to add the option of applying the WillisDiarize functions to correct speaker labeling errors before the JSON output file is created.
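For example, the AWS transcription call might accept a flag that runs WillisDiarize on the speaker labels before the JSON is returned; the keyword names below are assumptions, not the documented signature:

```python
import openwillis as ow  # assumed import alias

# Hypothetical usage: transcribe with Amazon Transcribe and correct the
# speaker labels with WillisDiarize in one step. The keywords `speakers` and
# `willisdiarize`, and the return format, are assumptions.
result = ow.speech_transcription_aws(
    "interview_audio.wav",
    speakers=2,          # assumed: expected number of speakers
    willisdiarize=True,  # assumed: apply label correction before output
)
```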
Speech characteristics v3.1
Added the ability to compute the speech coherence variable sets only when desired, via the option parameter, to avoid unnecessary computational burden.
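A sketch of how the option parameter might be used to request only the coherence variables; the accepted values are assumptions, so check the function's wiki page for the real ones:

```python
import json

import openwillis as ow  # assumed import alias

with open("interview_transcript.json") as f:
    transcript_json = json.load(f)

# Hypothetical: compute only the speech coherence variable set. The value
# passed to `option` is an assumption; other values presumably select other
# variable sets or the full output.
characteristics = ow.speech_characteristics(
    transcript_json,
    option="coherence",
)
```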
Vocal acoustics v2.1
Updated to include the option to calculate framewise summary statistics only for voiced segments longer than 100ms.
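A sketch of what enabling that behavior might look like; the keyword names and units are assumptions rather than the documented signature:

```python
import openwillis as ow  # assumed import alias

# Hypothetical: restrict framewise summary statistics to voiced segments
# longer than 100 ms. The keyword names `voiced_segments` and `min_duration`
# (in seconds) are assumptions.
acoustics = ow.vocal_acoustics(
    "phonation_sample.wav",
    voiced_segments=True,
    min_duration=0.1,  # 100 ms
)
```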
Video preprocessing for faces v1.0
This function adds preprocessing capabilities for video files containing the faces of more than one individual. For contexts such as video calls and recordings of clinic visits, this function detects unique faces; the output can be used to apply the facial_expressivity and emotional_expressivity functions to a single unique face in a video.
Video cropping v1.0
This function, designed to be used in conjunction with preprocess_face_video, allows the user to adjust parameters related to cropping and trimming videos to extract frames for each unique face.
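A sketch of how the two video functions might chain into the expressivity measures; `preprocess_face_video`, `facial_expressivity`, and `emotional_expressivity` are named above, while the cropping function's name (`create_cropped_video` here) and all keyword arguments are assumptions:

```python
import openwillis as ow  # assumed import alias

# 1. Detect unique faces in a multi-person recording, e.g. a recorded
#    clinic visit (return format assumed).
face_index = ow.preprocess_face_video("clinic_visit.mp4")

# 2. Crop and trim the video into frames for one unique face. The function
#    name and keyword below are placeholders for the video cropping function.
cropped_video = ow.create_cropped_video(
    "clinic_visit.mp4",
    face_index,
    face_id=0,  # assumed: index of the unique face to extract
)

# 3. Run the downstream expressivity measures on the single-face output.
facial = ow.facial_expressivity(cropped_video)
emotional = ow.emotional_expressivity(cropped_video)
```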
General updates
Updated Pyannote from 3.0.0 to 3.1.1 to match WhisperX dependencies.
v2.1
OpenWillis v2.1
Release date: Thursday March 21st, 2024
Version 2.1 adds new measures to the vocal acoustics and speech characteristics analyses, specifically relating to major depressive disorder (MDD), schizophrenia, and Parkinson's disease. It also expands speaker identification support (in the speech transcription functions) to more clinical interview scripts.
If you have feedback or questions, please bring them up in the Discussions tab.
Contributors
Speech characteristics v3.0
Added new measures based on recent reports in the scientific literature on speech characteristics associated with schizophrenia and depression; the function now includes variables such as speech coherence, sentence tangentiality, and semantic perplexity, along with improved measurement of parts of speech.
Vocal acoustics v2.0
New measures are added that are grouped in several categories:
- Relative variations and durations of pauses related to Parkinson's disease
- Depression-related cepstral variables
- Vocal tremor variables (to be run on sustained vowel phonation)
- Advanced variables: Normalized Amplitude Quotient (NAQ), Opening Quotient (OQ), and Harmonic Richness Factor (HRF)
Added the ability to compute these variable sets only when desired, via the option parameter, to avoid unnecessary computational burden.
Removed:
- Min/max features, which we noticed were not useful or interpretable
- Pause characteristics (these were redundant with speech characteristics)
Speech Transcription with AWS v1.1 / Speech Transcription with Whisper v1.1
Added speaker identification support for more clinical interview scripts (a usage sketch follows the list):
- HAM-A, conducted in accordance with the Hamilton Anxiety Rating Scale (SIGH-A)
- CAPS past week, conducted in accordance with the DSM-5 (CAPS-5) Past Week Version
- CAPS past month, conducted in accordance with the DSM-5 (CAPS-5) Past Month Version
- CAPS DSM-IV, conducted in accordance with the Clinician-Administered PTSD Scale for DSM-IV
- MINI, conducted in accordance with Version 7.0.2 for DSM-5
- CAINS, conducted in accordance with CAINS (v1.0)
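When a recording follows one of these scripts, speaker identification can presumably be requested at transcription time; the sketch below uses an assumed `context` keyword to name the script, which may not match the actual parameter:

```python
import openwillis as ow  # assumed import alias

# Hypothetical: transcribe a structured clinical interview and label the
# speakers as clinician vs. participant. The keywords `speakers` and
# `context`, and the value string, are assumptions.
result = ow.speech_transcription_aws(
    "mini_interview.wav",
    speakers=2,
    context="mini",  # assumed: selects the MINI (v7.0.2 for DSM-5) script
)
```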
v2.0
OpenWillis v2.0
Release date: Monday February 5th, 2024
Version 2.0 adds support for GPS analysis and addresses potential issues caused by the min_turn_length functionality in the speech characteristics function.
If you have feedback or questions, please bring them up in the Discussions tab.
Contributors
vjbytes102
anzarabbas
GeorgeEfstathiadis
General updates
Upgraded requirements by bumping transformers to version 4.36.0 and downgrading Vosk to version 0.3.44 to avoid installation issues on macOS.
Speech characteristics v2.3
Updated logic for calculating variables that are affected when a minimum turn length is specified.
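For reference, a sketch of a call that specifies a minimum turn length; the call signature and the units of the threshold are assumptions:

```python
import json

import openwillis as ow  # assumed import alias

with open("interview_transcript.json") as f:
    transcript_json = json.load(f)

# Hypothetical: turns shorter than the threshold are excluded from the
# turn-level variables. The keyword name comes from the release note above;
# the units (words) are an assumption.
characteristics = ow.speech_characteristics(
    transcript_json,
    min_turn_length=5,
)
```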
GPS analysis v1.0
A new function for GPS analysis was added; it calculates clinically meaningful measures from passively collected GPS data. Specifically, it measures (see the sketch after this list):
- Time and speed of travel
- Time spent idle
- Home-related variables, such as time spent at home and maximum distance from home
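A sketch of how the new function might be called, assuming it is exposed as `gps_analysis`, takes a CSV of timestamped coordinates, and accepts a timezone; all of these details are assumptions rather than the documented interface:

```python
import openwillis as ow  # assumed import alias

# Hypothetical: compute mobility measures (travel time and speed, idle time,
# home-related variables) from passively collected GPS data. The function
# name, input format, and keyword are assumptions.
gps_summary = ow.gps_analysis(
    "participant_gps.csv",        # assumed: timestamp, latitude, longitude columns
    timezone="America/New_York",  # assumed: local timezone of the participant
)

print(gps_summary)
```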
v1.6
OpenWillis v1.6
Release date: Wednesday November 15th, 2023
Version 1.6 brings significant changes leading to flexibility in speech transcription, speaker separation, and subsequent quantification of speech characteristics.
Users can now easily choose between different speech transcription models and separate audio files with multiple speakers regardless of the transcription model used. The speech characteristics function has been updated to support outputs from any of these routes.
If you have feedback or questions, please reach out.
Contributors
General updates
There are now three separate speech transcription functions: one using Vosk, one using WhisperX, and one using Amazon Transcribe, each with its own pros and cons as described below.
| Function | Description |
| --- | --- |
| Speech Transcription with Vosk | Speech transcription conducted locally on a user's machine; needs fewer computational resources but is less accurate |
| Speech Transcription with Whisper | Speech transcription conducted locally on a user's machine; needs greater computational resources but is more accurate |
| Speech Transcription with AWS | Speech transcription conducted via the Amazon Transcribe API; requires (typically paid) access to the API and AWS resources |
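In code, choosing a route might look like the sketch below; the function names are inferred from the titles above, and the argument conventions (local path vs. S3 URI, AWS credentials configured separately) are assumptions:

```python
import openwillis as ow  # assumed import alias

audio_path = "session_audio.wav"

# Local, lightweight route (less accurate).
vosk_json = ow.speech_transcription_vosk(audio_path)

# Local, heavier route (more accurate).
whisper_json = ow.speech_transcription_whisper(audio_path)

# Cloud route via Amazon Transcribe; assumes AWS credentials are configured
# and (typically paid) API access. The S3 URI is a placeholder.
aws_json = ow.speech_transcription_aws("s3://your-bucket/session_audio.wav")
```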
Finally, the Speech Characteristics function has been updated to support JSON transcripts from each of the speech transcription functions. It also contains bug fixes for issues that previously prevented certain variables from being calculated in some contexts.
v1.5
OpenWillis v1.5
Release date: Thursday Oct 5th, 2023
Version 1.5 brings refined methods for speech transcription and speaker separation. OpenWillis is now able to use Whisper for speech transcription. This integration ensures consistent transcription accuracy, whether processed locally or on cloud-based servers, and introduces support for multiple languages.
If you have feedback or questions, please reach out.
Contributors
General updates
The speech transcription and speaker separation functions have been updated to allow a processing workflow similar to that of the cloud-based speech transcription and speaker separation functions, through the integration of Whisper as one of the available transcription models. This also prompted a revision to the Speech Characteristics function so that it supports JSON files produced by Whisper.
Speech transcription v2.0
The new speech transcription function can use WhisperX to transcribe speech to text; it can label speakers when multiple speakers are present and includes integrated speaker identification for structured clinical interviews.
Speaker separation v2.0
The speaker separation function has been updated to support JSON files with labeled speakers, which the user can now obtain by leaning on WhisperX during speech transcription. In this scenario, it simply splits the speakers based on the labels in the JSON file.
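A sketch of that labeled-JSON route, assuming a speaker separation function that takes the audio file plus the WhisperX transcript; the function name and return format below are assumptions:

```python
import json

import openwillis as ow  # assumed import alias

with open("session_transcript.json") as f:
    labeled_json = json.load(f)  # WhisperX output with speaker labels

# Hypothetical: split the recording into per-speaker audio using the labels
# already present in the transcript (function name and output assumed).
speaker_audio = ow.speaker_separation(
    "session_audio.wav",
    labeled_json,
)
```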
Speech characteristics v2.1
The speech characteristics function now supports JSON files acquired through WhisperX. All output variables remain the same.
v1.3
OpenWillis v1.3
Release date: August 17th, 2023
Version 1.3 now supports multi-speaker speech analysis and video-based eye blink detection.
If you have feedback or questions, please reach out.
Contributors
General updates
- Updated versions for dependencies for improved optimization:
- Tensorflow: 2.9.0 to 2.11.1
- Protobuf: 3.20.0 to 3.20.2
Version 2.0 of the speech characteristics function processes multi-speaker JSONs, allowing the user to select which speaker to analyze. Outputs are now segmented by word, phrase, turn, and overall file. Refer to speech transcription cloud v1.0 for how to acquire labeled transcripts.
The new eye blink rate function allows for precise quantification of both basic blink rates and blink characteristics from videos of an individual.
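A sketch of how the blink function might be invoked, assuming it is exposed as `eye_blink_rate` and returns framewise blink events plus a summary; both the name and the return format are assumptions:

```python
import openwillis as ow  # assumed import alias

# Hypothetical: quantify blink rate and blink characteristics from a video
# of a single individual (return format assumed).
blinks, summary = ow.eye_blink_rate("participant_video.mp4")

print(summary)  # e.g. blinks per minute, average blink duration (assumed)
```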
For improved scalability, we’ve isolated speaker separation based on pre-labeled multi-speaker JSONs into its own function. The existing speaker separation v1.1 function is now meant to work on JSONs without speaker labels.
v1.2
OpenWillis v1.2
Release date: June 14th, 2023
The v1.2 release improves OpenWillis’ speech analysis capabilities and streamlines processing workflows.
If you have feedback or questions, please do reach out.
Contributors
General updates
- For better accessibility, all method description documentation has been moved from Google Docs to the repo’s wiki, a much more appropriate place for it.
- The example uses from the notebook included in the code have been moved to the same methods description documents in the wiki, consolidating this information in one place.
Repository updates
We have restructured the folder organization: Functions are now categorized based on the modality of data they process. This will feel more intuitive to independent contributors.
Function updates
We've separated speech transcription into two functions:
- Speech transcription v1.1: This uses locally executable models for speech transcription, maintaining the functionality of the previous version of the same method.
- Speech transcription cloud v1.0: This new function uses cloud-based models for speech transcription, specifically incorporating Amazon Transcribe. Users must input their own AWS credentials for this. A notable feature of this version is its ability to label speakers in a dual-speaker audio file. In the case of clinical interview recordings, speakers can also be identified as 'clinician' or 'participant', with these labels included in the outputted JSON.
The speaker separation function has been updated to accommodate both transcription workflows:
- The locally executable models that separate speakers remain the same; the difference is that they now use the JSON output from the speech transcription v1.1 function for improved efficiency.
- When the user employs the speech transcription cloud v1.0 function to get a JSON with speaker labels included, the speaker separation function can simply use those labels to separate the audio into individual files for each speaker. This is a much faster option.
In response to these function modifications, we are also releasing speech characteristics v1.1, which enables users to choose which speaker they wish to calculate speech characteristics from thanks to the labeling included in the output JSON file from the cloud-based speech transcription function.