| Base model | $ / 1M tokens |
|---|---|
| Less than 4B parameters | $0.10 |
| 4B - 16B parameters | $0.20 |
| More than 16B parameters | $0.90 |
| MoE up to 56B parameters (e.g. Mixtral 8x7B) | $0.50 |
| MoE 56.1B - 176B parameters (e.g. DBRX, Mixtral 8x22B) | $1.20 |
| DeepSeek V3 family | $0.90 |
| DeepSeek R1 0528 | $1.35 input, $5.40 output |
| GLM-4.5, GLM-4.6 | $0.55 input, $2.19 output |
| Meta Llama 3.1 405B | $3.00 |
| Meta Llama 4 Maverick (Basic) | $0.22 input, $0.88 output |
| Meta Llama 4 Scout (Basic) | $0.15 input, $0.60 output |
| Qwen3 235B Family and GLM-4.5 Air | $0.22 input, $0.88 output |
| Qwen3 30B and Qwen Coder Flash | $0.15 input, $0.60 output |
| Kimi K2 Instruct | $0.60 input, $2.50 output |
| Qwen3 Coder 480B | $0.45 input, $1.80 output |
| OpenAI gpt-oss-120b | $0.15 input, $0.60 output |
| OpenAI gpt-oss-20b | $0.07 input, $0.30 output |
Discounts for prompt caching are available for enterprise deployments. Contact us to learn more.
Batch inference is priced at 50% of our serverless pricing for both input and output tokens. Learn more here.
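As a worked example of the token pricing above, the sketch below computes a request's cost from the per-1M-token rates in the table, including the 50% batch discount. The helper function and its signature are illustrative only, not part of any SDK:

```python
# Illustrative cost calculator for serverless token pricing.
# Rates are $ per 1M tokens, taken from the pricing table above.

def token_cost(input_tokens, output_tokens, input_rate, output_rate=None, batch=False):
    """Dollar cost of one request. output_rate defaults to input_rate for
    models with a single blended price. Batch inference is 50% of serverless."""
    if output_rate is None:
        output_rate = input_rate
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return cost * 0.5 if batch else cost

# DeepSeek R1 0528: $1.35 input, $5.40 output per 1M tokens
# 200K input + 50K output tokens ≈ $0.54 serverless, ≈ $0.27 via batch
print(round(token_cost(200_000, 50_000, 1.35, 5.40), 4))
print(round(token_cost(200_000, 50_000, 1.35, 5.40, batch=True), 4))
```

The same function covers single-rate models (e.g. Llama 3.1 405B at $3.00) by omitting `output_rate`.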
Audio models are priced per audio minute and billed per second of audio input.
| Model | $ / audio minute (billed per second) |
|---|---|
| Whisper-v3-large | $0.0015 |
| Whisper-v3-large-turbo | $0.0009 |
| Streaming ASR v1 | $0.0032 |
| Streaming ASR v2 | $0.0035 |
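To translate the per-minute rates above into a per-request cost, the per-second billing works out to the minute rate divided by 60. This sketch assumes the audio duration is rounded up to the next whole second; the function and dictionary keys are illustrative:

```python
import math

# Illustrative audio transcription cost from the table's $/audio-minute rates,
# billed per second (assumed here: duration rounds up to the next whole second).
RATE_PER_MINUTE = {
    "whisper-v3-large": 0.0015,
    "whisper-v3-large-turbo": 0.0009,
    "streaming-asr-v1": 0.0032,
    "streaming-asr-v2": 0.0035,
}

def audio_cost(model, duration_seconds):
    billed_seconds = math.ceil(duration_seconds)
    return billed_seconds * RATE_PER_MINUTE[model] / 60

# A 90.4 s clip with Whisper-v3-large: 91 s * $0.0015 / 60 ≈ $0.0023
print(round(audio_cost("whisper-v3-large", 90.4), 4))
```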
| Image model name | Price |
|---|---|
| All non-FLUX models (SDXL, Playground, etc.) | $0.00013 – $0.0002 per step ($0.0039 – $0.006 per 30-step image) |
| FLUX.1 [dev] | $0.0005 per step ($0.014 per 28-step image) |
| FLUX.1 [schnell] | $0.00035 per step ($0.0014 per 4-step image) |
| FLUX.1 Kontext Pro | $0.04 per image |
| FLUX.1 Kontext Max | $0.08 per image |
All models besides the Flux Kontext models are charged by the number of inference steps (denoising iterations). The Flux Kontext models are charged a flat rate per generated image.
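The two billing modes above can be sketched as follows. The per-step and per-image rates come from the table; the function and model keys are illustrative, not an SDK API:

```python
# Illustrative image-generation cost: Kontext models are flat-rate per image,
# all other models bill per denoising (inference) step.
PER_STEP = {"sdxl": 0.00013, "flux-1-dev": 0.0005, "flux-1-schnell": 0.00035}
PER_IMAGE = {"flux-kontext-pro": 0.04, "flux-kontext-max": 0.08}

def image_cost(model, images=1, steps=None):
    if model in PER_IMAGE:
        return images * PER_IMAGE[model]       # flat rate, steps irrelevant
    return images * steps * PER_STEP[model]    # billed per step

# 10 SDXL images at 30 steps: 10 * 30 * $0.00013 ≈ $0.039
print(round(image_cost("sdxl", images=10, steps=30), 4))
# 10 Kontext Pro images: 10 * $0.04 = $0.40 regardless of steps
print(round(image_cost("flux-kontext-pro", images=10), 4))
```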
| Base model parameter count | $ / 1M input tokens |
|---|---|
| up to 150M | $0.008 |
| 150M - 350M | $0.016 |
| Base Model | $ / 1M training tokens |
|---|---|
| Models up to 16B parameters | $0.50 |
| Models 16.1B - 80B | $3.00 |
| Models 80.1B - 300B (e.g. Qwen3-235B, gpt-oss-120B) | $6.00 |
| Models >300B (e.g. DeepSeek V3, Kimi K2) | $10.00 |
Fine-tuning with images (VLM supervised fine-tuning) is billed per token as well. See this FAQ on calculating image tokens.
There is no additional cost for storing LoRA fine-tunes up to the quota for an account.
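A fine-tuning job's cost follows from the table: training tokens are the dataset's token count times the number of epochs, billed at the rate for the base model's size tier. This is a minimal sketch; the function name is illustrative:

```python
# Illustrative fine-tuning cost: training tokens = dataset tokens * epochs,
# billed at the per-1M-token rate for the base model's size tier above.
def finetune_cost(dataset_tokens, epochs, rate_per_million):
    return dataset_tokens * epochs / 1_000_000 * rate_per_million

# 50M-token dataset, 3 epochs, on a model up to 16B ($0.50 / 1M training tokens):
# 150M training tokens -> $75.00
print(finetune_cost(50_000_000, 3, 0.50))
```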
| GPU Type | $ / hour (billed per second) |
|---|---|
| A100 80 GB GPU | $2.90 |
| H100 80 GB GPU | $4.00 |
| H200 141 GB GPU | $6.00 |
| B200 180 GB GPU | $9.00 |
For estimates of per-token prices, see this blog. Results vary by use case, but we often observe ~250% higher throughput and ~50% faster generation speed on Fireworks compared to open-source inference engines.
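Since on-demand GPUs are billed per second at the hourly rates above, a deployment's cost is simply seconds used times the hourly rate over 3600. A minimal sketch (the function and GPU keys are illustrative):

```python
# Illustrative on-demand GPU cost, billed per second at the hourly rates above.
HOURLY_RATE = {"a100": 2.90, "h100": 4.00, "h200": 6.00, "b200": 9.00}

def gpu_cost(gpu, seconds, num_gpus=1):
    return num_gpus * seconds * HOURLY_RATE[gpu] / 3600

# 90 minutes on one H100: 5400 s * $4.00 / 3600 = $6.00
print(gpu_cost("h100", 90 * 60))
```

Per-second billing means a deployment that runs for 10 minutes costs exactly one sixth of the hourly rate, with no hourly minimum implied by the table.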