GitHub - actuallyyun/JoyScribe: A delightful AI powered tool to transcribe large audio/video files into pleasant text form.

About

This tool transcribes audio to text using OpenAI's Speech to text API, and post process it using Text generation API.

Getting started

Pre-requisitions

Before you begin, ensure you have met the following requirements:

You have ffmpeg installed. Please refer to Getting ffmpeg set up
You have a valid OpenAI API key and configurated.
You have Python 3.9 or higher installed on your machine.

Installation

Clone the repository:

git clone git@github.com:actuallyyun/JoyScribe.git

Navigate to the project directory:
```
cd JoyScribe
```
Install the required dependencies. You can do this by running:

pip install -r requirements.txt

How to use

Open a terminal and navigate to the project directory.
Run the script with the following command:
```
python transcribe.py --file <path_to_audio_file> --output <output_directory>
```
Replace <path_to_audio_file> with the path to your audio file and <output_directory> with the directory where you want to save the output files.
Wait for the the program to finish. You can follow the terminal to see the progress.

To customize the prompt in the post_processing.py file, you can modify the system_prompt variable.

How I built it

Test it with short audio

By default, the Whisper API only supports files that are less than 25 MB.

This is a limitation I have to address since my audio files are bigger than 25 MB.

But first, I want to make sure my setup with OpenAI works.

So I tested it with a shorter audio, run the script, waited for a few seconds, and woala, it worked.

Research solutions for longer audio inputs

OpenAI's documentation recommened the to use the PyDub open source Python package to split the audio

But this packages has some dependencies, and one of which is ffmpeg. The offical ffmpeg website is a bit confusing and only provides download option.

I use Mac OS and perfer install packages with brew. And indeed, brew has this package.

The logic is straightforward: cut the audio into smaller chunks, save it to a directory and pass the segement one by one to whisper.

Optimizations

Up until this point,it does the job transcribing it. However, the response is not easy to read. It does not include puncutations and does not have formatting.

Using gpt to post process the transcript is a common practice.

To simplify things, I decided to use one assistant for the job. It does the following:

Format the text
Convert traditional Chinese characters to simplified Chinese
Format English with Chinese translations included in parenthese
Extract subheading every 4 paragraphs

Future improvements

Keep fine tuning the assistant to be more useful.
Clean the audio before segementing. For example, trim leading silence in audio, this increases Wisper's transcribing performance.

Got suggestions?

For any inquiries or feedback, please reach out to:

Name: Yun Ji
Email: this.jiyun@gmail.com
LinkedIn: your-linkedin-profile

Feel free to connect with me!

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
images		images
.gitignore		.gitignore
README.md		README.md
post_processing.py		post_processing.py
requirements.txt		requirements.txt
transcribe.py		transcribe.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Getting started

Pre-requisitions

Installation

How to use

How I built it

Optimizations

Future improvements

Got suggestions?

About

Releases

Packages

Languages

actuallyyun/JoyScribe

Folders and files

Latest commit

History

Repository files navigation

About

Getting started

Pre-requisitions

Installation

How to use

How I built it

Optimizations

Future improvements

Got suggestions?

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages