Fireworks Blog
Fireworks Acquires Hathora to Accelerate Global Compute Orchestration
2/3/2026
The Benchmark Gap: What It Takes to Ship Kimi K2.5
1/30/2026
The Missing Piece of the OpenClaw Mania: Truly ‘Own Your AI’ with Fireworks AI
1/27/2026
Build powerful agents on OSS models with Blazing Fast Inference on Fireworks
1/26/2026
Kimi K2.5 is Live on Fireworks: Vibe Coding, Agents, and Full-Parameter RFT
1/23/2026
Turning Production Logs into Evaluation Datasets: A Data-Driven Approach
12/31/2025
DPO, your simplest RL pipeline with two rollouts
12/17/2025
Self-Improving Agents, Powered by Your Evals
12/15/2025
NVIDIA Nemotron 3 Nano on Fireworks: The Engine for Next-Generation AI Agents
12/10/2025
Best Practices for Multi-Turn RL
12/4/2025
Turn Your LLM into a Calibrated Classifier for $2
12/2/2025
Unlock Advanced Reasoning with NVIDIA Nemotron Nano 2 Models on Fireworks AI
11/24/2025
Fireworks Expands AWS Alliance: Strategic Collaboration Agreement + GenAI Competency
11/20/2025
Eval Protocol: RL on your agents, in any environment
11/19/2025
Fireworks Achieves Triple ISO Certification, giving Enterprises Full Control and Trust in AI at Scale
11/19/2025
50 Trillion Tokens Per Day: The State of Agent Environments
11/10/2025
Fireworks RFT: Build AI agents with fine-tuned open models that outperform frontier closed models
11/9/2025
Modernizing Healthcare with AI: How RADPAIR and Fireworks Unlock Smarter Radiology Workflows
11/3/2025
40X Faster, and Smarter Outputs: How Vercel Turbocharged their Code Fixing Model with Open Models, Speculative Decoding and Reinforcement Fine Tuning on Fireworks
10/31/2025
Genspark’s Deep Research Agent Outperforms a Frontier Closed Model in Quality and Tool Calls using Fireworks RFT, Achieving a 50% Cost Reduction
10/28/2025
We raised $250M To Help Enterprises Own Their AI
10/27/2025
Accelerate your Vision Pipelines with the new NVIDIA Nemotron Nano 2 VL Model on Fireworks AI
10/23/2025
Deployment Shapes: One-Click Deployment Configured For You
10/20/2025
Fireworks and AMD partner to power the next gen of AI infrastructure on AMD Instinct™ GPUs
10/15/2025
LLM on the edge: Model picking with Fireworks Eval Protocol + Ollama
10/9/2025
Announcing Embeddings and Reranking On Fireworks AI
10/6/2025
Deep-Dive into LLM Fine-Tuning
10/2/2025
Production-Ready AI Agents with Optimized Inference with AWS AgentCore
10/1/2025
Launching Fireworks for Startups Program!
9/22/2025
Traces Are All You Need (to rank LLMs)
9/12/2025
Understanding Embeddings and Reranking at Scale
8/26/2025
DeepSeek V3.1 now on Fireworks AI!
8/25/2025
LLM Eval Driven Development with Claude Code
8/15/2025
Your AI Benchmark is Lying to You. Here's How We Caught It
8/14/2025
Test-Driven Agent Development with Eval Protocol
8/12/2025
Quality first: how Fireworks.ai is the go-to place for gpt-oss
8/5/2025
Introducing OpenAI gpt-oss (20b & 120b)
8/4/2025
Announcing Eval Protocol
8/1/2025
Qwen3 Decoded: Choosing the Right Model For Your Task
8/1/2025
Kimi K2: Deep Dive into model performance and use-cases
7/31/2025
Run bulk async workloads with Fireworks Batch API
7/30/2025
Fireworks Real-World Benchmarks: Find the Best OSS Model for the Job
7/29/2025
Introducing Vision-Language Model Fine-tuning: Tailor VLMs to Your Domain
7/25/2025
How Notion Cuts Latency 4x and Scales Enterprise AI Workflows with Fireworks AI
7/22/2025
A Deep Dive into MLA training/inference difference and why QK-Clip from Kimi is such an elegant idea
7/22/2025
VibeRL: When AI Trains AI
7/17/2025
Sentient & Fireworks Powers Decentralized AI At Viral Scale
7/15/2025
Fireworks AI Now Supports Amazon SageMaker
7/15/2025
Deep-dive into MuonClip: Fixing Attention Score Explosions in Transformer Training
7/11/2025
Understanding Function Calling: The Bridge to Agentic AI
7/10/2025
Using Model-as-a-Judge for Reward in Reinforcement Fine Tuning
7/9/2025
Introducing FLUX.1 Kontext on Fireworks
6/22/2025
Unlock Your Tools: Fireworks Adds OpenAI-Response API with MCP Support (Beta)
6/16/2025
Build for Scale with Fireworks Virtual Cloud (GA)
6/14/2025
3D FireOptimizer: Automating the Multi-Dimensional Tradeoffs in LLM Serving
6/13/2025
Introducing Supervised Fine Tuning V2
6/12/2025
Vision Model Platform Updates: Enhanced Capabilities and New Features
6/11/2025
Building AI agents with the Fireworks Experimentation Platform (GA) and Build SDK (Beta)
6/9/2025
Reinforcement Fine Tuning (Beta): Train expert open models to surpass closed frontier models
6/4/2025
Building a High‑Quality Synthetic Data Pipeline for Supervised Fine‑Tuning
5/29/2025
Fireworks DevDay 2025 Wrapped
5/28/2025
FireAttention V4: Industry-Leading Latency and Cost Efficiency with FP4
5/21/2025
Building an open-source Browser Agent on Fireworks AI
5/19/2025
Agentic AI Systems
5/12/2025
Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial
5/6/2025
Qwen 3 on Fireworks AI: Controllable Chain-of-Thought and Tool Calling at Frontier Scale
4/28/2025
Optimizing Llama 4 Maverick on Fireworks AI
4/9/2025
Building Enterprise-Scale RAG Systems with Fireworks AI and MongoDB Atlas
3/18/2025
Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference
3/18/2025
Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud
3/12/2025
Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost
2/14/2025
Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and Action
2/7/2025
DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical
2/5/2025
DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining
2/1/2025
From text to task: Constrained generation for structured extraction in R1
1/31/2025
Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?
1/30/2025
Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient
1/27/2025
Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels
1/24/2025
DeepSeek R1: All you need to know 🐳
1/22/2025
Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI
12/18/2024
DeepSeek V3 just got vision capabilities!
12/9/2024
20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds
12/8/2024
How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks
11/15/2024
Fireworks f1: A breakthrough in complex reasoning with Compound AI
11/11/2024
How Upwork and Fireworks deliver faster, smarter proposals for freelancers
10/22/2024
FLUX.1 on Fireworks: Fast, frugal, and flexible
10/15/2024
FireAttention V3: Enabling AMD as a viable alternative for GPU inference
10/14/2024
Three projects, one platform: A developer's winning streak with Fireworks AI
9/25/2024
Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference
9/25/2024
How Enterprises are using Multimodal Models in production with Fireworks
9/18/2024
Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency
8/30/2024
FireOptimizer: Customizing latency and quality for your production inference workload
8/29/2024
Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction
8/14/2024
Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1
8/1/2024
How Fireworks evaluates quantization precisely and interpretably
7/23/2024
Introducing Llama 3.1 inference endpoints in partnership with Meta
7/11/2024
Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems
6/23/2024
How Cursor built Fast Apply using the Speculative Decoding API
6/20/2024
FireAttention V2: 12x faster to make Long Contexts practical for Online Inference
6/17/2024
Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost
6/3/2024
Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM
6/3/2024
GPUs on-demand: Not serverless, not reserved, but some third thing
5/8/2024
Code Generation with Large Language Models - Fireworks AI Take
5/6/2024
Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell
4/18/2024
Partnering with Meta to bring Llama 3 to Fireworks' inference and fine-tuning
4/17/2024
Getting Started with Stability’s API Powered by Fireworks
3/21/2024
Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI
3/8/2024
Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference
3/1/2024
Fireworks Platform Spring 2024 Updates
2/20/2024
FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights
2/20/2024
Why do all LLMs need structured output modes?
1/18/2024
FireLLaVA: the first commercially permissive OSS LLaVA model
1/8/2024
FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs
12/20/2023
Fireworks Raises the Quality Bar with Function Calling Model and API Release
12/14/2023
Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release
11/3/2023
LLM Inference Performance Benchmarking (Part 1)
11/2/2023
New in Fireworks: Image-to-Image and ControlNet support for SSD-1B and SDXL!
10/27/2023
Fireworks.ai Achieves SOC 2 Type II and HIPAA Compliance
10/11/2023
Accelerating Code Completion with Fireworks Fast LLM Inference
10/2/2023
Fireworks.ai Now Available on LangChain Prompt Playground
9/12/2023
Simplifying Code Infilling with Code Llama and Fireworks.ai
8/29/2023
Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning
8/17/2023
Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform
7/12/2023
Multi-Query Attention is All You Need