Fireworks Blog

Kimi K3 is competitive with Fable; Kimi K3 + Fable is SoTA.

We ran both Kimi K3 and Fable 5 through ~1,000 agentic benchmark tasks. They tie on the top-line numbers but specialize beneath the surface: K3 leads on terminal and dev tooling tasks, while Fable leads on web and multi-language tasks. Most importantly, we demonstrate that efficient routing between them improves overall accuracy and dramatically reduces token spend.

Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?

Developer Experience | 1/31/2025

Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient

Model Releases | 1/30/2025

Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels

Developer Experience | 1/27/2025

DeepSeek R1: All you need to know 🐳

Model Releases | 1/24/2025

Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI

Case Studies | 1/22/2025

DeepSeek V3 just got vision capabilities!

Model Releases | 12/18/2024

20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds

Model Releases | 12/9/2024

How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks

Case Studies | 12/8/2024

Fireworks f1: A breakthrough in complex reasoning with Compound AI

Model Releases | 11/15/2024

How Upwork and Fireworks deliver faster, smarter proposals for freelancers

Case Studies | 11/11/2024

FLUX.1 on Fireworks: Fast, frugal, and flexible

Model Releases | 10/22/2024

FireAttention V3: Enabling AMD as a viable alternative for GPU inference

Developer Experience | 10/15/2024

Three projects, one platform: A developer's winning streak with Fireworks AI

Case Studies | 10/14/2024

Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference

Model Releases | 9/25/2024

How Enterprises are using Multimodal Models in production with Fireworks

Case Studies | 9/25/2024

Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency

Developer Experience | 9/18/2024

FireOptimizer: Customizing latency and quality for your production inference workload

Model Releases | 8/30/2024

Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction

Developer Experience | 8/29/2024

Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1

Developer Experience | 8/14/2024

How Fireworks evaluates quantization precisely and interpretably

Developer Experience | 8/1/2024

Introducing Llama 3.1 inference endpoints in partnership with Meta

Model Releases | 7/23/2024
