Whisper quality validation in CI #216

Open
FL33TW00D opened this issue Jun 19, 2024 · 0 comments
Labels: 10x skills needed, help wanted (Extra attention is needed)

FL33TW00D (Collaborator) commented Jun 19, 2024

I don't care how fast your bit-crunched model can spit out ****** tokens.

Predictable. We practiced accuracy-driven development where our internal testing infrastructure validates code and model commits on Whisper accuracy evaluation benchmarks comprising librispeech (~2.6k short audio clips, ~5 hours total) and earnings22 (~120 long audio clips, ~120 hours total) datasets. Results of periodic testing are published here. This approach enables us to detect and mitigate quality-of-inference (more on this below) regressions due to code changes in WhisperKit as well as performance and functional regressions from lower levels of the software stack. This helps us improve time-to-detect and time-to-fix most issues with best-effort. Taking it a step further, we offer customer-level SLAs to detect and fix all issues within a maximum time period for specific model and device versions to developers or enterprises.
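The quoted approach hinges on measuring transcription accuracy on every commit. The usual metric for LibriSpeech is word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a minimal illustration, assuming a simplified normalizer (lowercasing and stripping punctuation) in place of Whisper's own English text normalizer, and a hypothetical absolute tolerance of 0.002 (0.2 percentage points) for flagging a regression against a stored baseline.

```python
import re
from typing import Iterable, List, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase, replace punctuation (except apostrophes) with spaces, split.

    Simplified stand-in for Whisper's English text normalizer (assumption).
    """
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return text.split()


def word_edits(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    edits = 0
    words = 0
    for ref, hyp in pairs:
        r, h = normalize(ref), normalize(hyp)
        edits += word_edits(r, h)
        words += len(r)
    if words == 0:
        raise ValueError("references contain no words")
    return edits / words


def is_regression(baseline_wer: float, current_wer: float, tolerance: float = 0.002) -> bool:
    """True if the current WER exceeds the baseline by more than `tolerance` (absolute)."""
    return current_wer > baseline_wer + tolerance
```

For example, `corpus_wer([("the cat sat on the mat", "the cat sat on mat"), ("Hello, world!", "hello world")])` counts 1 edit over 8 reference words, a WER of 0.125. Pooling edits across the corpus, rather than averaging per-clip WER, keeps very short clips from dominating the score.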

The above quote from Argmax is spot on; we need to do exactly this.

I want a CI job that runs every Whisper variant and quantization format across LibriSpeech.
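One way to structure such a job is a matrix over every (variant, quantization format) pair, each compared against its own stored baseline WER. The sketch below assumes a hypothetical `evaluate(variant, fmt)` hook that transcribes the LibriSpeech test split with that model and returns corpus WER. The variant list uses OpenAI's published Whisper model sizes, while the quantization labels `f32`, `f16`, and `q8` are placeholders, not this project's confirmed format names; the 0.002 tolerance is likewise an assumption.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# Placeholder names (assumption): adjust to the project's actual variants/formats.
VARIANTS = ["tiny", "base", "small", "medium", "large-v3"]
QUANT_FORMATS = ["f32", "f16", "q8"]


@dataclass
class Failure:
    variant: str
    fmt: str
    baseline: Optional[float]  # None when no baseline has been recorded yet
    current: float


def run_quality_matrix(
    variants: List[str],
    formats: List[str],
    evaluate: Callable[[str, str], float],
    baselines: Dict[Tuple[str, str], float],
    tolerance: float = 0.002,
) -> List[Failure]:
    """Evaluate every (variant, format) pair and collect WER regressions."""
    failures: List[Failure] = []
    for variant, fmt in product(variants, formats):
        current = evaluate(variant, fmt)  # hypothetical hook: returns corpus WER
        baseline = baselines.get((variant, fmt))
        # Fail on a missing baseline, or when WER rises past the tolerance.
        if baseline is None or current > baseline + tolerance:
            failures.append(Failure(variant, fmt, baseline, current))
    return failures
```

A missing baseline is treated as a failure so that a newly added combination cannot pass unchecked. In CI, the job would then fail on a non-empty result, e.g. `sys.exit(1 if failures else 0)`.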

FL33TW00D added the help wanted and 10x skills needed labels on Jun 19, 2024.