Announcing our Series D and $1B ARR

Fireworks Blog

Kimi K3 is competitive with Fable; Kimi K3 + Fable is SoTA.

We ran both Kimi K3 and Fable 5 through ~1,000 agentic benchmark tasks. They tie on the overall top-line numbers but specialize beneath the surface: K3 leads on terminal and dev-tooling tasks, while Fable leads on web and multi-language tasks. Most importantly, we demonstrate that efficient routing between them improves overall accuracy and dramatically reduces token spend.

Company News | 7/11/2024

Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems

Developer Experience | 6/23/2024

How Cursor built Fast Apply using the Speculative Decoding API

Model Releases | 6/20/2024

FireAttention V2: 12x faster to make Long Contexts practical for Online Inference

Model Releases | 6/17/2024

Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost

Model Releases | 6/3/2024

Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM

Developer Experience | 6/3/2024

GPUs on-demand: Not serverless, not reserved, but some third thing

Developer Experience | 5/8/2024

Code Generation with Large Language Models - Fireworks AI Take

Developer Experience | 5/6/2024

Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell

Model Releases | 4/18/2024

Partnering with Meta to bring Llama 3 to Fireworks’ inference and fine-tuning

Developer Experience | 4/17/2024

Getting Started with Stability’s API Powered by Fireworks

Developer Experience | 3/21/2024

Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI

Developer Experience | 3/10/2024

Training-Inference Parity in MoE Models: Where Numerics Drift

Model Releases | 3/8/2024

Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference

Model Releases | 3/1/2024

Fireworks Platform Spring 2024 Updates

Model Releases | 2/20/2024

FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights

Model Releases | 2/20/2024

Why do all LLMs need structured output modes?

Model Releases | 1/18/2024

FireLLaVA: the first commercially permissive OSS LLaVA model

Developer Experience | 1/8/2024

FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs

Model Releases | 12/20/2023

Fireworks Raises the Quality Bar with Function Calling Model and API Release

Model Releases | 12/14/2023

Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release

Developer Experience | 11/3/2023

LLM Inference Performance Benchmarking (Part 1)