Fireworks Blog

3D FireOptimizer: Automating the Multi-Dimensional Tradeoffs in LLM Serving

Introducing Supervised Fine Tuning V2

6/13/2025
Vision Model Platform Updates: Enhanced Capabilities and New Features

6/12/2025
Building AI agents with the Fireworks Experimentation Platform (GA) and Build SDK (Beta)

6/11/2025
Build customizable, real-time voice agents with Fireworks Voice Agent Platform (Beta)

6/10/2025
Reinforcement Fine Tuning (Beta): Train expert open models to surpass closed frontier models

6/9/2025
Building a High-Quality Synthetic Data Pipeline for Supervised Fine-Tuning

6/4/2025
Fireworks DevDay 2025 Wrapped

5/29/2025
FireAttention V4: Industry-Leading Latency and Cost Efficiency with FP4

Independent benchmarking of Fireworks shows >250 tokens/second on DeepSeek V3

5/28/2025
Building an open-source Browser Agent on Fireworks AI

5/21/2025
Fireworks Summer Audio Updates: Fastest Transcription now with Diarization and Batch API

5/20/2025
Agentic AI Systems

5/19/2025
Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial

5/12/2025
Qwen 3 on Fireworks AI: Controllable Chain-of-Thought and Tool Calling at Frontier Scale

5/6/2025
Optimizing Llama 4 Maverick on Fireworks AI

4/28/2025
Building Enterprise-Scale RAG Systems with Fireworks AI and MongoDB Atlas

4/9/2025
Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference

3/18/2025
Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud

3/18/2025
Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost

3/12/2025
Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and Action

2/14/2025
DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical

2/7/2025
DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining

2/5/2025
From text to task: Constrained generation for structured extraction in R1

2/1/2025
Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?

1/31/2025
Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient

1/30/2025
Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels

1/27/2025
DeepSeek R1: All you need to know 🐳

1/24/2025
Fireworks Streaming Transcription: 300ms with Whisper-v3-large-quality

1/23/2025
Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI

1/22/2025
Document inlining: Crossing the modality gap with Compound AI

12/23/2024
DeepSeek V3 just got vision capabilities!

12/18/2024
20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds

12/9/2024
How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks

12/8/2024
Fireworks f1: A breakthrough in complex reasoning with Compound AI

11/15/2024
How Upwork and Fireworks deliver faster, smarter proposals for freelancers

11/11/2024
FLUX.1 on Fireworks: Fast, frugal, and flexible

10/22/2024
FireAttention V3: Enabling AMD as a viable alternative for GPU inference

10/15/2024
Three projects, one platform: A developer's winning streak with Fireworks AI

10/14/2024
Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference

9/25/2024
How Enterprises are using Multimodal Models in production with Fireworks

9/25/2024
Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency

9/18/2024
FireOptimizer: Customizing latency and quality for your production inference workload

8/30/2024
Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction

8/29/2024
Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1

8/14/2024
How Fireworks evaluates quantization precisely and interpretably

8/1/2024
Introducing Llama 3.1 inference endpoints in partnership with Meta

7/23/2024
Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems

7/11/2024
How Cursor built Fast Apply using the Speculative Decoding API

6/23/2024
FireAttention V2: 12x faster to make Long Contexts practical for Online Inference

6/20/2024
Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost

6/17/2024
Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM

6/3/2024
GPUs on-demand: Not serverless, not reserved, but some third thing

6/3/2024
Code Generation with Large Language Models - Fireworks AI Take

5/8/2024
Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell

5/6/2024
Partnering with Meta to bring Llama 3 to Fireworks’ inference and fine-tuning

4/18/2024
Getting Started with Stability’s API Powered by Fireworks

4/17/2024
Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI

3/21/2024
Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference

3/8/2024
Fireworks Platform Spring 2024 Updates

3/1/2024
FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights

2/20/2024
Why do all LLMs need structured output modes?

2/20/2024
FireLLaVA: the first commercially permissive OSS LLaVA model

1/18/2024
FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs

1/8/2024
Fireworks Raises the Quality Bar with Function Calling Model and API Release

12/20/2023
Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release

12/14/2023
LLM Inference Performance Benchmarking (Part 1)

11/3/2023
New in Fireworks: Image-to-Image and ControlNet support for SSD-1B and SDXL!

11/2/2023
Fireworks.ai Achieves SOC 2 Type II and HIPAA Compliance

10/27/2023
Accelerating Code Completion with Fireworks Fast LLM Inference

10/11/2023
Fireworks.ai Now Available on LangChain Prompt Playground

10/2/2023
Simplifying Code Infilling with Code Llama and Fireworks.ai

9/12/2023
Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning

8/29/2023
Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform

8/17/2023
Multi-Query Attention is All You Need

7/12/2023