Whisper V3
ServerlessAudio
Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting.
Whisper V3 is available via Fireworks' Speech-to-Text APIs, where you are billed based on the duration of the transcribed audio. The API supports multiple languages and additional features, including forced alignment.
You can call the Fireworks Speech-to-Text API using HTTP requests from any language. See the API reference here:
Generate a model response using the speech-transcription endpoint of whisper-v3. API reference
import requests

# Open the audio file and send it as a multipart upload, along with the
# model name and decoding options.
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://audio-prod.us-virginia-1.direct.fireworks.ai/v1/audio/transcriptions",
        headers={"Authorization": "Bearer <YOUR_API_KEY>"},
        files={"file": f},
        data={
            "model": "accounts/fireworks/models/whisper-v3",
            "temperature": "0",
            "vad_model": "silero",
        },
    )

if response.status_code == 200:
    print(response.json())
else:
    print(f"Error: {response.status_code}", response.text)
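Once a request succeeds, `response.json()` returns the transcription as a dictionary. As a minimal sketch of working with that result, the snippet below parses a hypothetical response: the top-level `"text"` field and the per-segment `start`/`end`/`text` fields shown here are assumptions modeled on common Whisper-style output, not a guaranteed schema, so check the API reference for the exact shape your request returns.

```python
import json

# Hypothetical transcription response; field names are assumptions
# modeled on typical Whisper-style output.
raw = json.dumps({
    "text": "Hello world. How are you?",
    "segments": [
        {"start": 0.0, "end": 1.2, "text": "Hello world."},
        {"start": 1.2, "end": 2.5, "text": "How are you?"},
    ],
})

response_body = json.loads(raw)

# Full transcript in one string.
transcript = response_body.get("text", "")

# One "[start-end] text" line per segment, useful for subtitles or review.
lines = [
    f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['text']}"
    for seg in response_body.get("segments", [])
]
print(transcript)
print("\n".join(lines))
```

If your request only returns the plain transcript, the `segments` list is simply absent and the loop yields no lines.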