DeepSeek V3 0324, an updated version of the state-of-the-art DeepSeek V3 model, is now available. Try it now or read our DeepSeek quickstart!

Featured Blogs

Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud

Discover how Fireworks AI Developer Cloud accelerates AI innovation with faster, optimized DeepSeek R1 deployments. Learn about new GPU options, improved speed, and enhanced developer tools for efficient, scalable AI solutions.

Read More
Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference

3/18/2025

View Article
Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost

3/12/2025

View Article
Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and Action

2/14/2025

View Article
DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical

2/7/2025

View Article
DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining

2/5/2025

View Article
From text to task: Constrained generation for structured extraction in R1

2/1/2025

View Article
Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?

1/31/2025

View Article
Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient

1/30/2025

View Article
Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels

1/27/2025

View Article
DeepSeek R1: All you need to know 🐳

1/24/2025

View Article
Fireworks Streaming Transcription: 300ms with Whisper-v3-large-quality

1/23/2025

View Article
Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI

1/22/2025

View Article
Document inlining: Crossing the modality gap with Compound AI

12/23/2024

View Article
DeepSeek V3 just got vision capabilities!

12/18/2024

View Article
20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds

12/9/2024

View Article
How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks

12/8/2024

View Article
Fireworks f1: A breakthrough in complex reasoning with Compound AI

11/15/2024

View Article
How Upwork and Fireworks deliver faster, smarter proposals for freelancers

11/11/2024

View Article
FLUX.1 on Fireworks: Fast, frugal, and flexible

10/22/2024

View Article
FireAttention V3: Enabling AMD as a viable alternative for GPU inference

10/15/2024

View Article
Three projects, one platform: A developer's winning streak with Fireworks AI

10/14/2024

View Article
Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference

9/25/2024

View Article
How Enterprises are using Multimodal Models in production with Fireworks

9/25/2024

View Article
Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency

9/18/2024

View Article
FireOptimizer: Customizing latency and quality for your production inference workload

8/30/2024

View Article
Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction

8/29/2024

View Article
Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1

8/14/2024

View Article
How Fireworks evaluates quantization precisely and interpretably

8/1/2024

View Article
Introducing Llama 3.1 inference endpoints in partnership with Meta

7/23/2024

View Article
Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems

7/11/2024

View Article
How Cursor built Fast Apply using the Speculative Decoding API

6/23/2024

View Article
FireAttention V2: 12x faster to make Long Contexts practical for Online Inference

6/20/2024

View Article
Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost

6/17/2024

View Article
Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM

6/3/2024

View Article
GPUs on-demand: Not serverless, not reserved, but some third thing

6/3/2024

View Article
Code Generation with Large Language Models - Fireworks AI Take

5/8/2024

View Article
Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell

5/6/2024

View Article
Partnering with Meta to bring Llama 3 to Fireworks’ inference and fine-tuning

4/18/2024

View Article
Getting Started with Stability’s API Powered by Fireworks

4/17/2024

View Article
Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI

3/21/2024

View Article
Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference

3/8/2024

View Article
Fireworks Platform Spring 2024 Updates

3/1/2024

View Article
FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights

2/20/2024

View Article
Why do all LLMs need structured output modes?

2/20/2024

View Article
FireLLaVA: the first commercially permissive OSS LLaVA model

1/18/2024

View Article
FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs

1/8/2024

View Article
Fireworks Raises the Quality Bar with Function Calling Model and API Release

12/20/2023

View Article
Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release

12/14/2023

View Article
LLM Inference Performance Benchmarking (Part 1)

11/3/2023

View Article
New in Fireworks: Image-to-Image and ControlNet support for SSD-1B and SDXL!

11/2/2023

View Article
Fireworks.ai Achieves SOC 2 Type II and HIPAA Compliance

10/27/2023

View Article
Accelerating Code Completion with Fireworks Fast LLM Inference

10/11/2023

View Article
Fireworks.ai Now Available on LangChain Prompt Playground

10/2/2023

View Article
Simplifying Code Infilling with Code Llama and Fireworks.ai

9/12/2023

View Article
Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning

8/29/2023

View Article
Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform

8/17/2023

View Article
Multi-Query Attention is All You Need

7/12/2023

View Article