Base model | $ / 1M tokens |
---|---|
Less than 4B parameters | $0.10 |
4B - 16B parameters | $0.20 |
More than 16B parameters | $0.90 |
MoE 0B - 56B parameters (e.g. Mixtral 8x7B) | $0.50 |
MoE 56.1B - 176B parameters (e.g. DBRX, Mixtral 8x22B) | $1.20 |
DeepSeek V3 | $0.90 |
DeepSeek R1 (Fast) | $3.00 input, $8.00 output |
DeepSeek R1 0528 (Fast) | $3.00 input, $8.00 output |
GLM-4.5 and DeepSeek R1 (Basic) | $0.55 input, $2.19 output |
Meta Llama 3.1 405B | $3.00 |
Meta Llama 4 Maverick (Basic) | $0.22 input, $0.88 output |
Meta Llama 4 Scout (Basic) | $0.15 input, $0.60 output |
Qwen3 235B Family and GLM-4.5 Air | $0.22 input, $0.88 output |
Qwen3 30B and Qwen Coder Flash | $0.15 input, $0.60 output |
Kimi K2 Instruct | $0.60 input, $2.50 output |
Qwen3 Coder 480B | $0.45 input, $1.80 output |
OpenAI gpt-oss-120b | $0.15 input, $0.60 output |
OpenAI gpt-oss-20b | $0.07 input, $0.30 output |
Discounts for prompt caching are available for enterprise deployments. Contact us to learn more.
Batch inference is priced at 50% of our serverless pricing for both input and output tokens.
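As a quick sketch of how these rates translate into a bill: a request costs tokens × the per-million-token rate, and batch jobs are billed at half the serverless rate. The dictionary keys below are illustrative labels for a few rows of the table above, not API model identifiers.

```python
# Rates from the serverless pricing table, in $ per 1M tokens.
# Keys are illustrative labels, not real model IDs.
RATES = {
    "deepseek-v3": {"input": 0.90, "output": 0.90},
    "deepseek-r1-fast": {"input": 3.00, "output": 8.00},
    "llama4-maverick-basic": {"input": 0.22, "output": 0.88},
}

def serverless_cost(model: str, input_tokens: int, output_tokens: int,
                    batch: bool = False) -> float:
    """Estimated cost in dollars; batch inference is billed at 50%."""
    r = RATES[model]
    cost = (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000
    return cost * 0.5 if batch else cost

# 2M input + 500K output tokens on DeepSeek R1 (Fast):
print(serverless_cost("deepseek-r1-fast", 2_000_000, 500_000))              # 10.0
print(serverless_cost("deepseek-r1-fast", 2_000_000, 500_000, batch=True))  # 5.0
```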
Audio transcription is billed per second of audio input.
Model | $ / audio minute (billed per second) |
---|---|
Whisper-v3-large | $0.0015 |
Whisper-v3-large-turbo | $0.0009 |
Streaming ASR v1 | $0.0032 |
Streaming ASR v2 | $0.0035 |
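Because billing is per second, a clip is charged pro rata against the per-minute rate, i.e. (rate / 60) × seconds of audio. A minimal sketch, with illustrative keys for the rates above:

```python
# $ per audio minute, billed per second (rates from the table above).
AUDIO_RATES = {
    "whisper-v3-large": 0.0015,
    "whisper-v3-large-turbo": 0.0009,
    "streaming-asr-v2": 0.0035,
}

def audio_cost(model: str, seconds: float) -> float:
    """Pro-rated cost in dollars for `seconds` of audio."""
    return AUDIO_RATES[model] / 60 * seconds

# A 90-second clip on Whisper-v3-large:
print(round(audio_cost("whisper-v3-large", 90), 6))  # 0.00225
```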
Image model name | $ / step | Approx $ / image |
---|---|---|
All non-Flux models (SDXL, Playground, etc.) | $0.00013 | $0.0039 (30-step image) |
FLUX.1 [dev] | $0.0005 | $0.014 (28-step image) |
FLUX.1 [schnell] | $0.00035 | $0.0014 (4-step image) |
FLUX.1 Kontext Pro | N/A (flat rate) | $0.04 per image |
FLUX.1 Kontext Max | N/A (flat rate) | $0.08 per image |
All models besides the Flux Kontext models are charged by the number of inference steps (denoising iterations). The Flux Kontext models are charged a flat rate per generated image.
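The two billing schemes above can be sketched as follows; the per-step prices and flat rates come from the table, while the function and dictionary names are illustrative:

```python
# Step-billed models: cost = price per step * steps * images.
def image_cost(price_per_step: float, steps: int, images: int = 1) -> float:
    return price_per_step * steps * images

# Kontext models: flat rate per generated image (from the table above).
FLAT_RATE = {"flux-kontext-pro": 0.04, "flux-kontext-max": 0.08}

# 10 FLUX.1 [dev] images at 28 denoising steps each:
print(round(image_cost(0.0005, 28, images=10), 4))   # 0.14
# 10 FLUX.1 Kontext Max images:
print(round(FLAT_RATE["flux-kontext-max"] * 10, 2))  # 0.8
```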
Base model parameter count | $ / 1M input tokens |
---|---|
up to 150M | $0.008 |
150M - 350M | $0.016 |
Base Model | $ / 1M training tokens |
---|---|
Models up to 16B parameters | $0.50 |
Models 16.1B - 80B parameters | $3.00 |
Models 80B - 300B parameters (e.g. Qwen3-235B, gpt-oss-120B) | $6.00 |
Models >300B parameters (e.g. DeepSeek V3, Kimi K2) | $10.00 |
Fine-tuning with images (VLM supervised fine-tuning) is billed per token as well. See this FAQ on calculating image tokens.
There is no additional cost for storing LoRA fine-tunes up to the quota for an account.
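A rough cost sketch for a fine-tuning job, assuming training tokens = dataset tokens × epochs (a common convention, though the exact accounting may differ; the tier labels below are illustrative):

```python
# $ per 1M training tokens, by model size tier (rates from the table above).
TUNING_RATES = {
    "<=16B": 0.50,
    "16.1B-80B": 3.00,
    "80B-300B": 6.00,
    ">300B": 10.00,
}

def finetune_cost(tier: str, dataset_tokens: int, epochs: int = 1) -> float:
    """Estimated cost in dollars, assuming training tokens = dataset * epochs."""
    return TUNING_RATES[tier] * dataset_tokens * epochs / 1_000_000

# A 50M-token dataset for 3 epochs on an 80B-300B model (e.g. gpt-oss-120B):
print(finetune_cost("80B-300B", 50_000_000, epochs=3))  # 900.0
```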
GPU Type | $ / hour (billed per second) |
---|---|
A100 80 GB GPU | $2.90 |
H100 80 GB GPU | $4.00 |
H200 141 GB GPU | $6.00 |
B200 180 GB GPU | $9.00 |
AMD MI300X | $4.99 |
For estimates of per-token prices, see this blog. Results vary by use case, but we often observe improvements such as ~250% higher throughput and ~50% faster speeds on Fireworks compared to open-source inference engines.
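Since GPUs are priced per hour but billed per second, a deployment is charged (hourly rate / 3600) × seconds × number of GPUs. A minimal sketch with illustrative keys for the rates above:

```python
# $ per GPU-hour, billed per second (rates from the table above).
GPU_RATES = {
    "a100-80gb": 2.90,
    "h100-80gb": 4.00,
    "h200-141gb": 6.00,
    "b200-180gb": 9.00,
    "mi300x": 4.99,
}

def gpu_cost(gpu: str, seconds: float, num_gpus: int = 1) -> float:
    """Per-second-billed cost in dollars for an on-demand deployment."""
    return GPU_RATES[gpu] / 3600 * seconds * num_gpus

# Two H100s for 45 minutes:
print(round(gpu_cost("h100-80gb", 45 * 60, num_gpus=2), 2))  # 6.0
```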