
Transcription of 1 minute file takes 20 - 40 seconds #113

Open
martin-opensky opened this issue Oct 12, 2024 · 2 comments

@martin-opensky

Hello,

I am using the faster-whisper-server on a Mac M1 with the following start command:

docker run --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:latest-cpu

And I'm seeing no performance improvement over the original OpenAI model with the following command:
curl http://localhost:8000/v1/audio/transcriptions -F "file=@test.wav" -F "stream=false" -F "language=en" -F "model=Systran/faster-whisper-small"

This is a 1-minute file, which takes between 20 and 40 seconds to transcribe depending on the model size.

Transcribing the same audio file using Systran's faster-whisper directly takes around 1-3 seconds.
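
For context, this is roughly what I mean by using faster-whisper directly (a minimal sketch of the Python API; the compute type is an assumption, adjust to your setup):

    from faster_whisper import WhisperModel

    # Load the same small model on CPU; int8 is an assumed quantization setting.
    model = WhisperModel("Systran/faster-whisper-small", device="cpu", compute_type="int8")

    # transcribe() returns a lazy generator, so join the segments to force decoding.
    segments, info = model.transcribe("test.wav", language="en")
    print(" ".join(segment.text for segment in segments))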

I'm really unsure why this would be the case. Can anyone shed some light on what may be causing this?

Many thanks
Martin

fedirz (Owner) commented Oct 12, 2024

Try running the following command instead:

docker run --publish 8000:8000 --env WHISPER__INFERENCE_DEVICE=auto --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:latest-cpu

Please let me know if that improves the inference speed or not.
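
If it helps, here's a small sketch for timing the request end to end so we're comparing like with like (assumes the Python requests package; the endpoint and form fields are taken from your curl command above):

    import time
    import requests

    start = time.perf_counter()
    with open("test.wav", "rb") as f:
        response = requests.post(
            "http://localhost:8000/v1/audio/transcriptions",
            files={"file": f},
            data={"language": "en", "model": "Systran/faster-whisper-small"},
        )
    print(response.json())
    print(f"elapsed: {time.perf_counter() - start:.1f}s")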


yimejky commented Nov 26, 2024

I believe this is happening because the CPU version in Docker cannot utilize more than one thread. I even tried overriding the config via environment variables, and it's still the same :(

docker run \
    -p 8000:8000 \
    --cpus=10 \
    -e WHISPER__INFERENCE_DEVICE=auto \
    -e WHISPER__CPU_THREADS=10 \
    -e OMP_NUM_THREADS=10 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --name faster-whisper-server fedirz/faster-whisper-server:latest-cpu
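
For reference, the knob these variables are supposed to reach is the cpu_threads argument of WhisperModel when calling the library directly (a short sketch; the model name is taken from the thread above):

    from faster_whisper import WhisperModel

    # Direct-library equivalent of WHISPER__CPU_THREADS=10:
    # cpu_threads sets the number of CPU threads used for inference.
    model = WhisperModel(
        "Systran/faster-whisper-small",
        device="cpu",
        cpu_threads=10,
    )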
