Fireworks RFT now available! Fine-tune open models that outperform frontier models. Try today

Pricing to seamlessly scale from idea to enterprise

Start building in seconds, self-serve. Contact us for enterprise deployments with faster speeds, lower costs, and higher rate limits.

Serverless Pricing

Pay per token, with high rate limits and postpaid billing. Get started with $1 in free credits.

Text and Vision

Base model$ / 1M tokens
Less than 4B parameters$0.10
4B - 16B parameters$0.20
More than 16B parameters$0.90
MoE 0B - 56B parameters (e.g. Mixtral 8x7B)$0.50
MoE 56.1B - 176B parameters (e.g. DBRX, Mixtral 8x22B)$1.20
DeepSeek V3 family$0.56 input, $1.68 output
DeepSeek R1 0528$1.35 input, $5.4 output
GLM-4.5, GLM-4.6$0.55 input, $2.19 output
Meta Llama 3.1 405B$3.00
Meta Llama 4 Maverick (Basic)$0.22 input, $0.88 output
Meta Llama 4 Scout (Basic)$0.15 input, $0.60 output
Qwen3 235B Family, GLM-4.5 Air$0.22 input, $0.88 output
Qwen3 30B, Qwen Coder Flash$0.15 input, $0.60 output
Kimi K2 Instruct, Kimi K2 Thinking$0.60 input, $2.50 output
Qwen3 Coder 480B$0.45 input, $1.80 output
OpenAI gpt-oss-120b$0.15 input, $0.60 output
OpenAI gpt-oss-20b$0.07 input, $0.30 output

Discounts for prompt caching are available for enterprise deployments. Contact us to learn more.

Batch inference is priced at 50% of our serverless pricing for both input and output tokens. Learn more here.

Speech to Text (STT)

Pay per second of audio input

Model$ / audio minute (billed per second)
Whisper-v3-large$0.0015
Whisper-v3-large-turbo$0.0009
Streaming ASR v1$0.0032
Streaming ASR v2$0.0035
  • Diarization adds a 40% surcharge to pricing
  • Batch API prices are reduced 40%

Image Generation

Image model name$ / step Approx $ / image
All Non-Flux Models (SDXL, Playground, etc)$0.00013 per step ($0.0039 per 30 step image)$0.0002 per step ($0.006 per 30 step image)
FLUX.1 [dev]$0.0005 per step ($0.014 per 28 step image)N/A on serverless
FLUX.1 [schnell]$0.00035 per step ($0.0014 per 4 step image)N/A on serverless
FLUX.1 Kontext Pro$0.04 per imageN/A
FLUX.1 Kontext Max$0.08 per imageN/A

All models besides the Flux Kontext models are charged by the number of inference steps (denoising iterations). The Flux Kontext models are charged a flat rate per generated image.

Embeddings

Base model parameter count$ / 1M input tokens
up to 150M$0.008
150M - 350M$0.016
Qwen3 8B$0.1

Fine Tuning Pricing

Serve fine-tuned models for the same price as base models.

Supervised & Preference Fine Tuning

Priced per 1M training tokens

Base ModelSupervised Fine TuningDirect Preference Optimization
Models up to 16B parameters$0.50$1.00
Models 16.1B - 80B$3.00$6.00
Models 80B - 300B (e.g. Qwen3-235B, gpt-oss-120B)$6.00$12.00
Models >300B (e.g. DeepSeek V3, Kimi K2)$10.00$20.00

SFT and DPO prices are shown in $ per 1M training tokens. Training tokens can be estimated with number of tokens in training dataset * number of epochs. Estimation should be multiplied by the average number conversation turns /2 for tuning with intermediate thinking traces.

Fine-tuning with images (VLM supervised fine-tuning) is also billed per 1M tokens. See this FAQ on calculating image tokens.

Reinforcement Fine Tuning

Reinforcement fine tuning jobs are priced per GPU hour (billed per second), at the same price as Fireworks on-demand deployment. Please see the section below for details on RFT pricing.

On-Demand Pricing

Pay per GPU second, with no extra charges for start-up times

On demand deployments

GPU Type$ / hour (billed per second)
A100 80 GB GPU$2.90
H100 80 GB GPU$4.00
H200 141 GB GPU$6.00
B200 180 GB GPU$9.00

For estimates of per-token prices, see this blog. Results vary by use case, but we often observe improvements like ~250% higher throughput and 50% faster speed on Fireworks compared to open source inference engines.