Fireworks Blog

Announcing Virtual Cloud on Fireworks AI

Build for Scale with Fireworks Virtual Cloud (GA)

Announcing Updated 3D FireOptimizer

3D FireOptimizer: Automating the Multi-Dimensional Tradeoffs in LLM Serving

6/14/2025
Updated Supervised Fine Tuning

Introducing Supervised Fine Tuning V2

6/13/2025
Updated Vision Model Platform

Vision Model Platform Updates: Enhanced Capabilities and New Features

6/12/2025
Announcing Experimentation Platform

Building AI agents with the Fireworks Experimentation Platform (GA) and Build SDK (Beta)

6/11/2025
Announcing Voice Agent Platform Beta

Build customizable, real-time voice agents with Fireworks Voice Agent Platform (Beta)

6/10/2025
Reinforcement fine tuning announcement

Reinforcement Fine Tuning (Beta): Train expert open models to surpass closed frontier models

6/9/2025
Building a High-Quality Synthetic Data Pipeline for Supervised Fine-Tuning

6/4/2025
Fireworks DevDay 2025 Wrapped

5/29/2025
Independent benchmarking of Fireworks shows >250 tokens/second on DeepSeek V3

FireAttention V4: Industry-Leading Latency and Cost Efficiency with FP4

5/28/2025
Building an open-source Browser Agent on Fireworks AI

5/21/2025
Fireworks Summer Audio Updates

Fireworks Summer Audio Updates: Fastest Transcription now with Diarization and Batch API

5/20/2025
Agentic AI Systems

5/19/2025
Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial

5/12/2025
Qwen 3 on Fireworks AI

Qwen 3 on Fireworks AI: Controllable Chain-of-Thought and Tool Calling at Frontier Scale

5/6/2025
Llama 4 Maverick on Fireworks AI

Optimizing Llama 4 Maverick on Fireworks AI

4/28/2025
RAG application using MongoDB Atlas and Fireworks AI

Building Enterprise-Scale RAG Systems with Fireworks AI and MongoDB Atlas

4/9/2025
Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference

3/18/2025
Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud

3/18/2025
Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost

3/12/2025
Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and Action

2/14/2025
DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical

2/7/2025
DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining

2/5/2025
From text to task: Constrained generation for structured extraction in R1

2/1/2025
Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?

1/31/2025
Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient

1/30/2025
Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels

1/27/2025
DeepSeek R1: All you need to know 🐳

1/24/2025
Fireworks Streaming Transcription: 300ms with Whisper-v3-large-quality

1/23/2025
Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI

1/22/2025
Document inlining: Crossing the modality gap with Compound AI

12/23/2024
DeepSeek V3 just got vision capabilities!

12/18/2024
20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds

12/9/2024
How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks

12/8/2024
Fireworks f1: A breakthrough in complex reasoning with Compound AI

11/15/2024
How Upwork and Fireworks deliver faster, smarter proposals for freelancers

11/11/2024
FLUX.1 on Fireworks: Fast, frugal, and flexible

10/22/2024
FireAttention V3: Enabling AMD as a viable alternative for GPU inference

10/15/2024
Three projects, one platform: A developer's winning streak with Fireworks AI

10/14/2024
Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference

9/25/2024
How Enterprises are using Multimodal Models in production with Fireworks

9/25/2024
Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency

9/18/2024
FireOptimizer: Customizing latency and quality for your production inference workload

8/30/2024
Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction

8/29/2024
Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1

8/14/2024
How Fireworks evaluates quantization precisely and interpretably

8/1/2024
Introducing Llama 3.1 inference endpoints in partnership with Meta

7/23/2024
Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems

7/11/2024
How Cursor built Fast Apply using the Speculative Decoding API

6/23/2024
FireAttention V2: 12x faster to make Long Contexts practical for Online Inference

6/20/2024
Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost

6/17/2024
Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM

6/3/2024
GPUs on-demand: Not serverless, not reserved, but some third thing

6/3/2024
Code Generation with Large Language Models - Fireworks AI Take

5/8/2024
Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell

5/6/2024
Partnering with Meta to bring Llama 3 to Fireworks' inference and fine-tuning

4/18/2024
Getting Started with Stability’s API Powered by Fireworks

4/17/2024
Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI

3/21/2024
Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference

3/8/2024
Fireworks Platform Spring 2024 Updates

3/1/2024
FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights

2/20/2024
Why do all LLMs need structured output modes?

2/20/2024
FireLLaVA: the first commercially permissive OSS LLaVA model

1/18/2024
FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs

1/8/2024
Fireworks Raises the Quality Bar with Function Calling Model and API Release

12/20/2023
Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release

12/14/2023
LLM Inference Performance Benchmarking (Part 1)

11/3/2023
New in Fireworks: Image-to-Image and ControlNet support for SSD-1B and SDXL!

11/2/2023
Fireworks.ai Achieves SOC 2 Type II and HIPAA Compliance

10/27/2023
Accelerating Code Completion with Fireworks Fast LLM Inference

10/11/2023
Fireworks.ai Now Available on LangChain Prompt Playground

10/2/2023
Simplifying Code Infilling with Code Llama and Fireworks.ai

9/12/2023
Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning

8/29/2023
Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform

8/17/2023
Multi-Query Attention is All You Need

7/12/2023