Whisper V3 Turbo
Serverless, Audio
Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3. In other words, it is the same model, except that the number of decoder layers has been reduced from 32 to 4. As a result, the model is much faster, at the expense of a minor degradation in quality.
Whisper V3 Turbo is available via Fireworks' Speech-to-Text APIs, where you are billed based on the duration of the transcribed audio. The API supports multiple languages and additional features, including forced alignment.
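Because billing follows audio duration, it can help to measure a file before sending it. The following is a minimal sketch, not an official pricing tool: it reads the duration of a WAV file with Python's standard `wave` module and multiplies it by a per-minute rate that you supply. The helper names and the rate used in the demo are hypothetical; the actual rate must come from the current Fireworks pricing page.

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    """Return the duration in seconds of a WAV file (path or binary file-like object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_cost_usd(duration_seconds: float, usd_per_minute: float) -> float:
    """Estimate cost from duration; the rate is supplied by the caller, not known here."""
    return duration_seconds / 60.0 * usd_per_minute


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, used only to demo the helpers."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    clip = make_silent_wav(90.0)
    seconds = wav_duration_seconds(clip)
    # 0.01 USD per minute is a made-up rate used only to show the arithmetic.
    print(f"{seconds:.1f} s -> ${estimate_cost_usd(seconds, usd_per_minute=0.01):.4f}")
```

The sketch handles WAV input only; compressed formats such as MP3 would need a third-party decoder to measure duration.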
You can call the Fireworks Speech-to-Text API using HTTP requests from any language. The API reference covers three endpoints:
Transcribe audio to text in the language in which the audio was spoken.
import requests

with open("<AUDIO_FILE_PATH>", "rb") as f:
    response = requests.post(
        "https://api.fireworks.ai/inference/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer <YOUR_API_KEY>"},
        files={"file": f},
        data={
            "model": "whisper-v3-turbo",
            "vad_model": "silero",
            "alignment_model": "tdnn_ffn",
            "preprocessing": "none",
            "temperature": "0",
            "timestamp_granularities": "segment"
        },
    )

if response.status_code == 200:
    print(response.json())
else:
    print(f"Error: {response.status_code}", response.text)
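Since the request above asks for segment-level timestamps, one common next step is turning the segments into subtitles. The sketch below converts a list of segments into SRT text. It assumes each segment is a dictionary with `start` and `end` in seconds and a `text` string, which follows the widely used OpenAI-style verbose JSON layout; the actual Fireworks response shape is not shown in this page and may require a verbose response format, so check the API reference before relying on these keys. The function names are hypothetical.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as SRT cues.

    Assumed input shape (not confirmed by this page):
    [{"start": 0.0, "end": 1.25, "text": "Hello"}, ...]
    """
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}")
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"
```

With a successful response, usage might look like `segments_to_srt(response.json()["segments"])`, again assuming a `segments` key is present.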
Translate audio from many languages into English text.
import requests

with open("<AUDIO_FILE_PATH>", "rb") as f:
    response = requests.post(
        "https://api.fireworks.ai/inference/v1/audio/translations",
        headers={"Authorization": f"Bearer <YOUR_API_KEY>"},
        files={"file": f},
        data={
            "model": "whisper-v3-turbo",
            "vad_model": "silero",
            "alignment_model": "tdnn_ffn",
            "preprocessing": "none",
            "temperature": "0",
            "timestamp_granularities": "segment"
        },
    )

if response.status_code == 200:
    print(response.json())
else:
    print(f"Error: {response.status_code}", response.text)
Run forced alignment over audio and a transcript. That is, compute the start and end timestamps of each word in the transcript and return them.
import requests

with open("<AUDIO_FILE_PATH>", "rb") as f:
    response = requests.post(
        "https://api.fireworks.ai/inference/v1/audio/alignments",
        headers={"Authorization": f"Bearer <YOUR_API_KEY>"},
        files={"file": f},
        data={
            "text": "<TEXT_TO_ALIGN>",
            "vad_model": "silero",
            "alignment_model": "tdnn_ffn",
            "preprocessing": "none",
            "temperature": "0",
            "timestamp_granularities": "segment"
        },
    )

if response.status_code == 200:
    print(response.json())
else:
    print(f"Error: {response.status_code}", response.text)
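The three examples differ only in the endpoint path and the form fields, so they can share one helper. The sketch below is one possible way to do that, not an official client: `build_audio_request` and `call_audio_api` are hypothetical names, and reading the key from a `FIREWORKS_API_KEY` environment variable is an assumption made here to avoid hard-coding credentials. The URLs and field names are the ones used in the examples above.

```python
import os

BASE_URL = "https://api.fireworks.ai/inference/v1/audio"
ENDPOINTS = ("transcriptions", "translations", "alignments")


def build_audio_request(endpoint: str, api_key: str, **fields) -> dict:
    """Assemble the URL, headers, and form fields for one of the three endpoints."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint {endpoint!r}; expected one of {ENDPOINTS}")
    if not api_key:
        raise ValueError("missing API key")
    return {
        "url": f"{BASE_URL}/{endpoint}",
        "headers": {"Authorization": f"Bearer {api_key}"},
        # Form fields are sent as strings, as in the examples above.
        "data": {key: str(value) for key, value in fields.items()},
    }


def call_audio_api(endpoint: str, audio_path: str, **fields):
    """Upload an audio file to the chosen endpoint and return the parsed JSON."""
    # Third-party dependency, imported here so the builder above needs only the stdlib.
    import requests

    # FIREWORKS_API_KEY is an assumed variable name, not one defined by the API.
    request = build_audio_request(
        endpoint, os.environ.get("FIREWORKS_API_KEY", ""), **fields
    )
    with open(audio_path, "rb") as f:
        response = requests.post(
            request["url"],
            headers=request["headers"],
            data=request["data"],
            files={"file": f},
            timeout=120,
        )
    response.raise_for_status()  # raise on non-2xx instead of printing the error
    return response.json()
```

For example, `call_audio_api("alignments", "<AUDIO_FILE_PATH>", text="<TEXT_TO_ALIGN>", vad_model="silero", alignment_model="tdnn_ffn")` would mirror the alignment request shown above.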