Fireworks Blog

Kimi K3 is competitive with Fable; Kimi K3 + Fable is SoTA.

We ran both Kimi K3 and Fable 5 through ~1,000 agentic benchmark tasks. They tie on the overall top-line numbers but specialize beneath the surface: K3 leads on terminal and dev tooling tasks, while Fable leads on web and multi-language tasks. Most importantly, we demonstrate that efficient routing between them improves overall accuracy and dramatically reduces token spend.

Developer Experience | 9/12/2025

Understanding Embeddings and Reranking at Scale

Model Releases | 8/26/2025

DeepSeek V3.1 now on Fireworks AI!

Developer Experience | 8/25/2025

LLM Eval Driven Development with Claude Code

Benchmarks | 8/15/2025

Your AI Benchmark is Lying to You. Here's How We Caught It

Developer Experience | 8/14/2025

Test-Driven Agent Development with Eval Protocol

Developer Experience | 8/12/2025

Quality first: how Fireworks.ai is the go-to place for gpt-oss

Model Releases | 8/5/2025

Introducing OpenAI gpt-oss (20b & 120b)

Model Releases | 8/4/2025

Announcing Eval Protocol

Model Releases | 8/1/2025

Qwen3 Decoded: Choosing the Right Model For Your Task

Developer Experience | 8/1/2025

Kimi K2: Deep Dive into model performance and use-cases

Model Releases | 7/31/2025

Run bulk async workloads with Fireworks Batch API

Benchmarks | 7/30/2025

Fireworks Real-World Benchmarks: Find the Best OSS Model for the Job

Model Releases | 7/29/2025

Introducing Vision-Language Model Fine-tuning: Tailor VLMs to Your Domain

Case Studies | 7/25/2025

How Notion Cuts Latency 4x and Scales Enterprise AI Workflows with Fireworks AI

Developer Experience | 7/22/2025

A Deep Dive into MLA training/inference difference and why QK-Clip from Kimi is such an elegant idea

Model Releases | 7/22/2025

VibeRL: When AI Trains AI

Case Studies | 7/17/2025

Sentient & Fireworks Powers Decentralized AI At Viral Scale

Model Releases | 7/15/2025

Fireworks AI Now Supports Amazon SageMaker

Developer Experience | 7/15/2025

Deep-dive into MuonClip: Fixing Attention Score Explosions in Transformer Training

Developer Experience | 7/11/2025

Understanding Function Calling: The Bridge to Agentic AI

Developer Experience | 7/10/2025

Using Model-as-a-Judge for Reward in Reinforcement Fine Tuning