Fireworks Blog

Kimi K3 is competitive with Fable; Kimi K3 + Fable is SoTA.

We ran both Kimi K3 and Fable 5 through ~1,000 agentic benchmark tasks. They tie on the overall top-line numbers but specialize beneath the surface: K3 leads on terminal and dev tooling tasks, while Fable leads on web and multi-language tasks. Most importantly, we demonstrate that efficient routing between them improves overall accuracy and dramatically reduces token spend.

Model Releases | 7/9/2025

Introducing FLUX.1 Kontext on Fireworks

Model Releases | 6/22/2025

Unlock Your Tools: Fireworks Adds OpenAI-Response API with MCP Support (Beta)

Model Releases | 6/16/2025

Build for Scale with Fireworks Virtual Cloud (GA)

Model Releases | 6/14/2025

3D FireOptimizer: Automating the Multi-Dimensional Tradeoffs in LLM Serving

Model Releases | 6/13/2025

Introducing Supervised Fine Tuning V2

Model Releases | 6/9/2025

Reinforcement Fine Tuning (Beta): Train expert open models to surpass closed frontier models

Company News | 5/29/2025

Fireworks DevDay 2025 Wrapped

Independent benchmarking of Fireworks shows >250 tokens / second on DeepSeek V3

Model Releases | 5/28/2025

FireAttention V4: Industry-Leading Latency and Cost Efficiency with FP4

Developer Experience | 5/21/2025

Building an open-source Browser Agent on Fireworks AI

Developer Experience | 5/19/2025

Agentic AI Systems

Developer Experience | 5/12/2025

Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial

Model Releases | 5/6/2025

Qwen 3 on Fireworks AI: Controllable Chain-of-Thought and Tool Calling at Frontier Scale

Developer Experience | 4/28/2025

Optimizing Llama 4 Maverick on Fireworks AI

Developer Experience | 4/9/2025

Building Enterprise-Scale RAG Systems with Fireworks AI and MongoDB Atlas

Model Releases | 3/18/2025

Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference

Model Releases | 3/18/2025

Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud

Model Releases | 3/12/2025

Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost

Model Releases | 2/14/2025

Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and Action

Developer Experience | 2/7/2025

DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical

Model Releases | 2/5/2025

DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining

Developer Experience | 2/1/2025

From text to task: Constrained generation for structured extraction in R1