
Fireworks Summer Audio Updates: Fastest Transcription now with Diarization and Batch API
By Fireworks AI|5/20/2025
Last winter, Fireworks launched the market’s fastest Whisper-based speech transcription service (as measured by Artificial Analysis), offering both streaming and pre-recorded transcription.
Since then we’ve been overwhelmed by the customer response. We’ve already seen a variety of innovative use cases built on top of Fireworks, ranging from drive-thru analytics to customer service evaluations. However, we believe the industry is just getting started with audio.
Today, we’re improving our transcription service by introducing two new capabilities: speaker diarization for pre-recorded audio and an audio Batch API.
One of our most common feature requests for pre-recorded transcription was speaker diarization: the ability to identify who is speaking, and when, in an audio recording. It’s crucial for use cases like detailed meeting transcripts and per-speaker analytics for phone calls.
Our Speaker Diarization feature is built for real-world scale and reliability.
Get started with diarization today in code through the docs, or try it in our UI playground.
To enable word-level speaker diarization, set response_format to verbose_json, include word in timestamp_granularities, and set diarize to true. You can also bound the number of detected speakers with min_speakers and max_speakers, as in the example below.
These settings ensure that each word in the response includes a speaker_id field.
import requests

API_KEY = "<YOUR_API_KEY>"

# Transcribe a local file with word-level speaker diarization enabled
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://audio-prod.us-virginia-1.direct.fireworks.ai/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={
            "model": "whisper-v3",
            "temperature": "0",
            "vad_model": "silero",
            "response_format": "verbose_json",
            "timestamp_granularities": "word,segment",
            "diarize": "true",
            "min_speakers": "1",
            "max_speakers": "2",
        },
    )

if response.status_code == 200:
    print(response.json())
else:
    print(f"Error: {response.status_code}", response.text)
In the diarized response, we’ve added a speaker_id field to each word and segment to indicate who’s speaking.
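To make that concrete, here’s a minimal sketch of grouping the word-level output by speaker. It assumes the verbose_json payload exposes a words list whose entries carry word and the new speaker_id field; treat the exact field names as assumptions and confirm them against the example response in the docs.

from collections import defaultdict

def words_by_speaker(transcription):
    """Group word-level entries from a diarized verbose_json response by speaker.

    Assumes each entry in transcription["words"] includes "word" and the new
    "speaker_id" field (confirm the exact schema in the docs).
    """
    grouped = defaultdict(list)
    for word in transcription.get("words", []):
        grouped[word.get("speaker_id", "unknown")].append(word["word"])
    return {speaker: " ".join(words) for speaker, words in grouped.items()}

# Usage with the diarization example above:
# print(words_by_speaker(response.json()))

From here, per-speaker text can feed downstream analytics like talk-time ratios or per-agent call summaries.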
Ready to try? Diarization normally includes a 40% compute surcharge — but we’re waiving that fee for the rest of May.
The second major update is the Batch API. We’ve heard from both enterprise and public platform customers about the challenges of transcribing large volumes of pre-recorded audio efficiently.
We’re solving these problems with the introduction of our audio Batch API.
The Batch API is built for large transcription workloads that don’t need results in real time. The flow is straightforward: submit each audio file to the batch endpoint, receive an account_id and batch_id in the response, then poll the batch status endpoint until your results are ready.
For more details, see the Create Batch Request and Check Batch Status docs.
Now, here’s a simple code example to show how it works:
import requests
import time
import sys

FIREWORKS_API_KEY = "<YOUR_API_KEY>"
file_list = ["audio1.flac", "audio2.flac"]  # List of files to submit

batch_ids = []
account_ids = []

# Submit requests
for idx, filename in enumerate(file_list):
    with open(filename, "rb") as f:
        response = requests.post(
            "https://audio-batch.link.fireworks.ai/v1/audio/transcriptions?endpoint_id=audio-prod",
            headers={"Authorization": f"Bearer {FIREWORKS_API_KEY}"},
            files={"file": f},
            data={
                "model": "whisper-v3",
                "temperature": "0",
                "vad_model": "silero",
                "response_format": "json",
            },
        )
    if response.status_code != 200:
        print(f"[Request {idx+1}] Error: {response.status_code}", response.text)
        sys.exit(1)
    data = response.json()
    account_ids.append(data["account_id"])
    batch_ids.append(data["batch_id"])
    print(f"[Request {idx+1}] Batch submitted successfully.")
    print(data)

# Wait before polling (you can adjust the delay as needed)
wait_seconds = 10
print(f"Waiting {wait_seconds} seconds before polling...")
time.sleep(wait_seconds)

# Poll for results
for i in range(len(file_list)):
    response = requests.get(
        f"https://audio-batch.link.fireworks.ai/v1/accounts/{account_ids[i]}/batch_job/{batch_ids[i]}",
        headers={"Authorization": f"Bearer {FIREWORKS_API_KEY}"},
    )
    if response.status_code == 200:
        print(f"[Result {i+1}]")
        print(response.json())
    else:
        print(f"[Result {i+1}] Error: {response.status_code}", response.text)
To explore more use cases—like managing the progress of each batch request locally and parsing raw responses—check out this cookbook.
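As a rough sketch of the first of those patterns, here’s one way to track each submitted batch job locally until it finishes. It reuses the status endpoint and IDs from the example above; the status field checked at the end is a hypothetical name, so confirm the actual payload schema against the Check Batch Status docs and the cookbook.

import time
import requests

FIREWORKS_API_KEY = "<YOUR_API_KEY>"

def wait_for_batches(account_ids, batch_ids, poll_interval=10, timeout=600):
    """Poll each submitted batch job until it finishes or the timeout expires.

    Assumes the status payload contains a 'status' field (hypothetical name;
    see the Check Batch Status docs for the actual schema).
    """
    pending = {bid: aid for bid, aid in zip(batch_ids, account_ids)}
    results = {}
    deadline = time.time() + timeout

    while pending and time.time() < deadline:
        for batch_id, account_id in list(pending.items()):
            resp = requests.get(
                f"https://audio-batch.link.fireworks.ai/v1/accounts/{account_id}/batch_job/{batch_id}",
                headers={"Authorization": f"Bearer {FIREWORKS_API_KEY}"},
            )
            if resp.status_code != 200:
                continue  # transient error; retry on the next pass
            payload = resp.json()
            # Hypothetical completion check -- adjust to the real status field
            if payload.get("status") in ("completed", "failed"):
                results[batch_id] = payload
                del pending[batch_id]
        if pending:
            time.sleep(poll_interval)

    return results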
Ready to try? The Batch API is 100% free for the next two weeks. Sign up now and start batch-processing your audio workloads (transcription, translation, and more) at scale.
With speaker diarization and batch processing onboard, it’s even easier to build AI applications with Fireworks. Our audio service pairs with text inference and other modalities to drive compound AI pipelines for contact-center analytics, large-scale transcription, and media indexing—linking speech, text, imagery, and domain-specific models in one streamlined workflow.
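As one illustrative sketch of such a pipeline (not an official recipe), you could transcribe a call with the audio API and then pass the transcript to a Fireworks-hosted text model through the OpenAI-compatible chat completions endpoint. The model id below is an assumption; substitute whichever model you actually use.

import requests

FIREWORKS_API_KEY = "<YOUR_API_KEY>"

# Step 1: transcribe the call (same endpoint as the diarization example above).
# With response_format "json", the payload carries the transcript in "text".
with open("call.mp3", "rb") as f:
    transcript = requests.post(
        "https://audio-prod.us-virginia-1.direct.fireworks.ai/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {FIREWORKS_API_KEY}"},
        files={"file": f},
        data={"model": "whisper-v3", "response_format": "json"},
    ).json()["text"]

# Step 2: feed the transcript to a text model for summarization.
# The model id is illustrative; swap in the Fireworks-hosted model you use.
summary = requests.post(
    "https://api.fireworks.ai/inference/v1/chat/completions",
    headers={"Authorization": f"Bearer {FIREWORKS_API_KEY}"},
    json={
        "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",  # assumed model id
        "messages": [
            {"role": "user", "content": f"Summarize this call transcript:\n\n{transcript}"}
        ],
    },
).json()["choices"][0]["message"]["content"]

print(summary)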
Fireworks makes it easy to build AI applications on top of the fastest inference, with one place to serve and combine models across text, audio, image, and other modalities.
Keep in touch with us on Discord or Twitter, and stay tuned for more Fireworks audio updates coming soon!