
Fireworks Blog

Turn Your LLM into a Classifier for $2

Unlock Advanced Reasoning with NVIDIA Nemotron Nano 2 Models on Fireworks AI
12/2/2025

Unlock Advanced Reasoning with NVIDIA Nemotron Nano 2 Models on Fireworks AI

Fireworks Expands AWS Alliance: Strategic Collaboration Agreement
11/24/2025

Fireworks Expands AWS Alliance: Strategic Collaboration Agreement + GenAI Competency

Eval Protocol: RL on your agents, in any environment
11/20/2025

Eval Protocol: RL on your agents, in any environment

Fireworks ISO Certifications
11/19/2025

Fireworks Achieves Triple ISO Certification, giving Enterprises Full Control and Trust in AI at Scale

50 Trillion Tokens Per Day: The State of Agent Environments
11/19/2025

50 Trillion Tokens Per Day: The State of Agent Environments

Fireworks RFT: Build AI Agents with fine-tuned open models that outperform frontier closed models
11/10/2025

Fireworks RFT: Build AI agents with fine-tuned open models that outperform frontier closed models

By Leela S. Karumbunathan
RADPAIR and Fireworks Unlock Smarter Radiology Workflows
11/9/2025

Modernizing Healthcare with AI: How RADPAIR and Fireworks Unlock Smarter Radiology Workflows

Vercel and Fireworks Partnership
Product
11/3/2025

40X Faster, and Smarter Outputs: How Vercel Turbocharged their Code Fixing Model with Open Models, Speculative Decoding and Reinforcement Fine Tuning on Fireworks

Genspark’s Deep Research Agent Outperforms a Frontier Closed Model in Quality and Tool Calls using Fireworks Reinforcement Fine Tuning, Achieving a 50% Cost Reduction
10/31/2025

Genspark’s Deep Research Agent Outperforms a Frontier Closed Model in Quality and Tool Calls using Fireworks RFT, Achieving a 50% Cost Reduction

Series C
10/28/2025

We raised $250M To Help Enterprises Own Their AI

Deploy NVIDIA Nemotron Nano 2 VL on Fireworks
10/27/2025

Accelerate your Vision Pipelines with the new NVIDIA Nemotron Nano 2 VL Model on Fireworks AI

Deployment Shapes: One-Click Deployment Configured for You
10/23/2025

Deployment Shapes: One-Click Deployment Configured For You

Fireworks and AMD
10/20/2025

Fireworks and AMD partner to power the next gen of AI infrastructure on AMD Instinct™ GPUs

LLM on the edge: Model picking with Fireworks Eval Protocol + Ollama
10/15/2025

LLM on the edge: Model picking with Fireworks Eval Protocol + Ollama

Announcing Embeddings and Reranking on Fireworks AI
10/9/2025

Announcing Embeddings and Reranking On Fireworks AI

Deep-Dive into LLM Fine Tuning
10/6/2025

Deep-Dive into LLM Fine-Tuning

Production-Ready AI Agents with Optimized Inference with AWS AgentCore
10/2/2025

Production-Ready AI Agents with Optimized Inference with AWS AgentCore

Fireworks for Startups
10/1/2025

Launching Fireworks for Startups Program!

Audio September Release
9/25/2025

Audio September Release - Streaming Transcription V2 and Streaming Speaker Diarization

9/22/2025

Traces Are All You Need (to rank LLMs)

Understanding Embeddings and Reranking at Scale
9/12/2025

Understanding Embeddings and Reranking at Scale

DeepSeek V3.1
8/26/2025

DeepSeek V3.1 now on Fireworks AI!

Eval Driven Development with Claude Code
8/25/2025

LLM Eval Driven Development with Claude Code

Your AI Benchmark is Lying to You. Here's How We Caught It
8/15/2025

Your AI Benchmark is Lying to You. Here's How We Caught It

Test driven agent development with eval protocol
8/14/2025

Test-Driven Agent Development with Eval Protocol

Quality first: how Fireworks.ai is the go-to place for gpt-oss
8/12/2025

Quality first: how Fireworks.ai is the go-to place for gpt-oss

GPT-OSS Models
8/5/2025

Introducing OpenAI gpt-oss (20b & 120b)

Eval Protocol
8/4/2025

Announcing Eval Protocol

Qwen 3 Decoded
8/1/2025

Qwen3 Decoded: Choosing the Right Model For Your Task

Kimi K2 Deep Dive
8/1/2025

Kimi K2: Deep Dive into model performance and use-cases

Fireworks AI Batch API
7/31/2025

Run bulk async workloads with Fireworks Batch API

Real-world leaderboard
7/30/2025

Fireworks Real-World Benchmarks: Find the Best OSS Model for the Job

Introducing Vision-Language Model Fine-tuning
7/29/2025

Introducing Vision-Language Model Fine-tuning: Tailor VLMs to Your Domain

Notion
7/25/2025

How Notion Cuts Latency 4x and Scales Enterprise AI Workflows with Fireworks AI

VibeRL: When AI Trains AI
7/22/2025

VibeRL: When AI Trains AI

Fireworks Sagemaker
7/15/2025

Fireworks AI Now Supports Amazon SageMaker

Deep-dive into MuonClip: Fixing Attention Score Explosions in Transformer Training
7/15/2025

Deep-dive into MuonClip: Fixing Attention Score Explosions in Transformer Training

Understanding Function Calling: The Bridge to Agentic AI
7/11/2025

Understanding Function Calling: The Bridge to Agentic AI

Sentient & Fireworks Powers Decentralized AI At Viral Scale
7/11/2025

Sentient & Fireworks Powers Decentralized AI At Viral Scale

Using Model as Judge for Reward in Reinforcement Fine Tuning
7/10/2025

Using Model-as-a-Judge for Reward in Reinforcement Fine Tuning

Flux Kontext on Fireworks
7/9/2025

Introducing FLUX.1 Kontext on Fireworks

Announcing Response API with MCP
6/22/2025

Unlock Your Tools: Fireworks Adds OpenAI-Response API with MCP Support (Beta)

Fast Food Group Drive Thru
6/17/2025

Global Fast Food Group Transforms Drive-Thru with Real-Time Voice Intelligence with Fireworks

Announcing Virtual Cloud on Fireworks AI
6/16/2025

Build for Scale with Fireworks Virtual Cloud (GA)

Announcing Updated 3D FireOptimizer
6/14/2025

3D FireOptimizer: Automating the Multi-Dimensional Tradeoffs in LLM Serving

Updated Supervised Fine Tuning
6/13/2025

Introducing Supervised Fine Tuning V2

Updated Vision Model Platform
6/12/2025

Vision Model Platform Updates: Enhanced Capabilities and New Features

Announcing Experimentation Platform
6/11/2025

Building AI agents with the Fireworks Experimentation Platform (GA) and Build SDK (Beta)

Announcing Voice Agent Platform Beta
6/10/2025

Build customizable, real-time voice agents with Fireworks Voice Agent Platform (Beta)

Reinforcement fine tuning announcement
6/9/2025

Reinforcement Fine Tuning (Beta): Train expert open models to surpass closed frontier models

Building a high-quality Synthetic Data Pipeline for Supervised Fine-Tuning
6/4/2025

Building a High‑Quality Synthetic Data Pipeline for Supervised Fine‑Tuning

Fireworks AI Dev Day 2025 Wrapped
5/29/2025

Fireworks DevDay 2025 Wrapped

Independent benchmarking of Fireworks shows >250 tokens / second on DeepSeek V3
5/28/2025

FireAttention V4: Industry-Leading Latency and Cost Efficiency with FP4

Building an open-source Browser Agent on Fireworks AI
Demo
5/21/2025

Building an open-source Browser Agent on Fireworks AI

Fireworks Summer Audio Updates
5/20/2025

Fireworks Summer Audio Updates: Fastest Transcription now with Diarization and Batch API

Agentic AI Systems
5/19/2025

Agentic AI Systems

Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial
5/12/2025

Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial

Qwen 3 on Fireworks AI
5/6/2025

Qwen 3 on Fireworks AI: Controllable Chain-of-Thought and Tool Calling at Frontier Scale

Llama 4 Maverick on Fireworks AI
4/28/2025

Optimizing Llama 4 Maverick on Fireworks AI

RAG application using MongoDB Atlas and Fireworks AI
4/9/2025

Building Enterprise-Scale RAG Systems with Fireworks AI and MongoDB Atlas

Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference
3/18/2025

Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference

Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud
3/18/2025

Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud

Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost
3/12/2025

Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost

Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and Action
2/14/2025

Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and Action

DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical
2/7/2025

DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical

DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining
2/5/2025

DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining

From text to task: Constrained generation for structured extraction in R1
2/1/2025

From text to task: Constrained generation for structured extraction in R1

Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?
1/31/2025

Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?

Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient
1/30/2025

Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient

Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels
1/27/2025

Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels

DeepSeek R1: All you need to know 🐳
1/24/2025

DeepSeek R1: All you need to know 🐳

Fireworks Streaming Transcription: 300ms with Whisper-v3-large-quality
1/23/2025

Fireworks Streaming Transcription: 300ms with Whisper-v3-large-quality

Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI
1/22/2025

Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI

Document inlining: Crossing the modality gap with Compound AI
12/23/2024

Document inlining: Crossing the modality gap with Compound AI

DeepSeek V3 just got vision capabilities!
12/18/2024

DeepSeek V3 just got vision capabilities!

20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds
12/9/2024

20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds

How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks
12/8/2024

How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks

Fireworks f1: A breakthrough in complex reasoning with Compound AI
11/15/2024

Fireworks f1: A breakthrough in complex reasoning with Compound AI

How Upwork and Fireworks deliver faster, smarter proposals for freelancers
11/11/2024

How Upwork and Fireworks deliver faster, smarter proposals for freelancers

FLUX.1 on Fireworks: Fast, frugal, and flexible
10/22/2024

FLUX.1 on Fireworks: Fast, frugal, and flexible

FireAttention V3: Enabling AMD as a viable alternative for GPU inference
10/15/2024

FireAttention V3: Enabling AMD as a viable alternative for GPU inference

Three projects, one platform: A developer's winning streak with Fireworks AI
10/14/2024

Three projects, one platform: A developer's winning streak with Fireworks AI

Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference
9/25/2024

Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference

How Enterprises are using Multimodal Models in production with Fireworks
9/25/2024

How Enterprises are using Multimodal Models in production with Fireworks

Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency
9/18/2024

Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency

FireOptimizer: Customizing latency and quality for your production inference workload
8/30/2024

FireOptimizer: Customizing latency and quality for your production inference workload

Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction
8/29/2024

Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction

Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1
8/14/2024

Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1

How Fireworks evaluates quantization precisely and interpretably
8/1/2024

How Fireworks evaluates quantization precisely and interpretably

Introducing Llama 3.1 inference endpoints in partnership with Meta
7/23/2024

Introducing Llama 3.1 inference endpoints in partnership with Meta

Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems
7/11/2024

Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems

How Cursor built Fast Apply using the Speculative Decoding API
6/23/2024

How Cursor built Fast Apply using the Speculative Decoding API

FireAttention V2: 12x faster to make Long Contexts practical for Online Inference
6/20/2024

FireAttention V2: 12x faster to make Long Contexts practical for Online Inference

Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost
6/17/2024

Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost

Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM
6/3/2024

Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM

GPUs on-demand: Not serverless, not reserved, but some third thing
6/3/2024

GPUs on-demand: Not serverless, not reserved, but some third thing

Code Generation with Large Language Models - Fireworks AI Take
5/8/2024

Code Generation with Large Language Models - Fireworks AI Take

Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell
5/6/2024

Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell

Partnering with Meta to bring Llama 3 to Fireworks’ inference and fine-tuning
4/18/2024

Partnering with Meta to bring Llama 3 to Fireworks’ inference and fine-tuning

Getting Started with Stability’s API Powered by Fireworks
4/17/2024

Getting Started with Stability’s API Powered by Fireworks

Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI
3/21/2024

Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI

Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference
3/8/2024

Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference

Fireworks Platform Spring 2024 Updates
3/1/2024

Fireworks Platform Spring 2024 Updates

FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights
2/20/2024

FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights

Why do all LLMs need structured output modes?
2/20/2024

Why do all LLMs need structured output modes?

FireLLaVA: the first commercially permissive OSS LLaVA model
1/18/2024

FireLLaVA: the first commercially permissive OSS LLaVA model

FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs
1/8/2024

FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs

Fireworks Raises the Quality Bar with Function Calling Model and API Release
12/20/2023

Fireworks Raises the Quality Bar with Function Calling Model and API Release

Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release
12/14/2023

Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release

LLM Inference Performance Benchmarking (Part 1)
11/3/2023

LLM Inference Performance Benchmarking (Part 1)

New in Fireworks: Image-to-Image and ControlNet support for SSD-1B and SDXL!
11/2/2023

New in Fireworks: Image-to-Image and ControlNet support for SSD-1B and SDXL!

Fireworks.ai Achieves SOC 2 Type II and HIPAA Compliance
10/27/2023

Fireworks.ai Achieves SOC 2 Type II and HIPAA Compliance

Accelerating Code Completion with Fireworks Fast LLM Inference
10/11/2023

Accelerating Code Completion with Fireworks Fast LLM Inference

Fireworks.ai Now Available on LangChain Prompt Playground
10/2/2023

Fireworks.ai Now Available on LangChain Prompt Playground

Simplifying Code Infilling with Code Llama and Fireworks.ai
9/12/2023

Simplifying Code Infilling with Code Llama and Fireworks.ai

Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning
8/29/2023

Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning

Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform
8/17/2023

Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform

Multi-Query Attention is All You Need
7/12/2023

Multi-Query Attention is All You Need