Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on more than 5 million hours of labeled data, Whisper generalises well to many datasets and domains in a zero-shot setting.
Serverless | Run the model immediately on pre-configured GPUs with pay-per-token pricing. |
On-demand Deployment | On-demand deployments give you dedicated GPUs for Whisper V3 Large using Fireworks' reliable, high-performance system with no rate limits. |
Whisper V3 Large is a multilingual, Transformer-based automatic-speech-recognition (ASR) and speech-translation model created by OpenAI and hosted on Fireworks AI.
Whisper V3 Large is best suited for:
The model's receptive field is 30 seconds of audio per inference window.
Fireworks recommends chunking longer audio into 30-second segments (with optional overlap) for stable performance.
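The chunking recommendation above can be sketched as follows. This is a minimal illustration, not Fireworks' implementation: the function name `chunk_spans`, its parameters, and the 16 kHz default sample rate are assumptions made for the example. It only computes sample index ranges; loading and slicing the audio is left to whatever audio library you use.

```python
def chunk_spans(total_samples, sample_rate=16_000, window_s=30.0, overlap_s=0.0):
    """Return (start, end) sample index pairs covering the whole recording.

    Hypothetical helper: each span is at most `window_s` seconds long, and
    consecutive spans share `overlap_s` seconds of audio.
    """
    if overlap_s < 0 or overlap_s >= window_s:
        raise ValueError("overlap_s must be >= 0 and smaller than window_s")

    window = int(window_s * sample_rate)            # samples per chunk
    step = window - int(overlap_s * sample_rate)    # hop between chunk starts

    spans = []
    start = 0
    while start < total_samples:
        end = min(start + window, total_samples)
        spans.append((start, end))
        if end == total_samples:                    # last chunk reached the end
            break
        start += step
    return spans


# Example: a 70-second recording at 16 kHz, with and without 5 s of overlap.
seventy_seconds = 70 * 16_000
no_overlap = chunk_spans(seventy_seconds)
with_overlap = chunk_spans(seventy_seconds, overlap_s=5.0)
```

With overlap, adjacent chunks repeat a few seconds of audio, so the transcripts need de-duplication at the seams; how to merge them is outside the scope of this sketch.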
Yes. Whisper V3 Large supports 16 quantized variants, including 4-bit and 8-bit.
Known limitations of Whisper V3 Large include:
Whisper V3 Large has approximately 1.54 billion parameters.
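As a rough back-of-the-envelope illustration of what the parameter count means for the quantized variants mentioned earlier, the sketch below estimates the memory needed for the weights alone at different bit widths. The helper `weight_bytes` is hypothetical, and the estimate ignores activations, buffers, and any quantization metadata, so real memory use will be higher.

```python
PARAMS = 1_540_000_000  # approximate parameter count stated above


def weight_bytes(n_params, bits_per_weight):
    """Bytes needed to store the weights only (assumes uniform bit width)."""
    return n_params * bits_per_weight // 8


for bits in (16, 8, 4):
    gigabytes = weight_bytes(PARAMS, bits) / 1e9
    print(f"{bits}-bit weights: ~{gigabytes:.2f} GB")
```

Under these assumptions the weights take roughly 3.08 GB at 16 bits, 1.54 GB at 8 bits, and 0.77 GB at 4 bits.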