Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3. In other words, it's the exact same model, except that the number of decoder layers has been reduced from 32 to 4. As a result, the model is significantly faster, at the expense of a minor quality degradation.
| Option | Description |
| --- | --- |
| Serverless (Docs) | Immediately run the model on pre-configured GPUs and pay per token. |
| On-demand Deployment (Docs) | On-demand deployments give you dedicated GPUs for Whisper V3 Turbo using Fireworks' reliable, high-performance system with no rate limits. |
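Since the serverless option is pay-per-token, one quick way to try it is through an OpenAI-compatible client. The sketch below is illustrative only: the base URL, model identifier, environment variable, and file name are placeholder assumptions, so consult the Fireworks docs for the exact values.

```python
import os
from openai import OpenAI

# Hypothetical OpenAI-compatible setup; the base_url and model id below
# are placeholders -- check the Fireworks docs for the real values.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

# Send an audio file for transcription via the standard
# audio.transcriptions endpoint of the OpenAI Python SDK.
with open("sample.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-v3-turbo",  # placeholder model id
        file=audio_file,
    )

print(transcript.text)
```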
Whisper V3 Turbo is a fine-tuned variant of OpenAI's Whisper large-v3 model in which the decoder layers were reduced from 32 to 4, delivering much faster inference with only minor quality loss. The model was created by OpenAI and is served on Fireworks AI.
Whisper V3 Turbo is optimized for fast, multilingual speech transcription.
Whisper models process up to 30 seconds of audio per forward pass (known as the model's "receptive field"). Longer recordings must therefore be split into sequential 30-second windows and transcribed chunk by chunk or via streaming, as in the sketch below.
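For local experimentation, the open-source checkpoint can be run with Hugging Face's `transformers` pipeline, which performs this 30-second chunking automatically. A minimal sketch, assuming `torch`, `transformers`, and `ffmpeg` are installed; the file name and device choice are placeholders:

```python
import torch
from transformers import pipeline

# Load the open-source checkpoint; dtype and device are illustrative.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    device="cuda:0",      # or "cpu"
    chunk_length_s=30,    # split long audio into 30-second windows
)

# Transcribe a long recording; the pipeline chunks the audio and
# stitches the per-window transcripts back together.
result = asr("meeting_recording.mp3", return_timestamps=True)
print(result["text"])
```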
Known limitations include:
- The 30-second receptive field: longer audio must be chunked or streamed, as described above.
- Hallucinated or repetitive output, most often on silence, music, or other non-speech audio.
- Lower accuracy on low-resource languages and heavily accented speech than on high-resource languages.
Whisper V3 Turbo has approximately 809 million parameters.
Whisper V3 Turbo was released under the MIT License.