
Pricing to seamlessly scale from idea to enterprise

Start building in seconds, self-serve. Contact us for enterprise deployments with faster speeds, lower costs, and higher rate limits.

Serverless Pricing

Pay per token, with high rate limits and postpaid billing. Get started with $1 in free credits.

Text and Vision Models

Current serverless prices for our most popular models, along with details on the Turbo and Priority tiers, are listed on the Docs page below:

Details for each individual model are also available on its listing page in the model library.

  • Note that cached input tokens are priced at 50% of the standard input token price for all text and vision language models, unless otherwise specified.
    Likewise, batch inference is priced at 50% of serverless pricing for both input and output tokens. Learn more here.
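To illustrate how the cached-input and batch discounts combine, here is a minimal sketch of the billing arithmetic. The per-token prices in the example are hypothetical placeholders, not actual rates; consult the Docs page for current pricing.

```python
# Sketch of serverless token billing with the 50% cached-input and
# 50% batch-inference discounts described above.
def serverless_cost(input_tokens, cached_tokens, output_tokens,
                    input_price_per_m, output_price_per_m, batch=False):
    """Return the cost in dollars for one request's token usage.

    cached_tokens must be a subset of input_tokens; they are billed
    at 50% of the input token price. Batch inference halves the total.
    """
    fresh_tokens = input_tokens - cached_tokens
    cost = (fresh_tokens * input_price_per_m
            + cached_tokens * input_price_per_m * 0.5   # cached input: 50%
            + output_tokens * output_price_per_m) / 1_000_000
    if batch:
        cost *= 0.5  # batch inference: 50% of serverless pricing
    return cost

# Example: 100k input tokens (40k cached), 20k output tokens,
# at a hypothetical $0.90 per 1M tokens for both input and output.
print(round(serverless_cost(100_000, 40_000, 20_000, 0.90, 0.90), 4))
```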

Embeddings

| Base model parameter count | $ / 1M input tokens |
| --- | --- |
| Up to 150M | $0.008 |
| 150M - 350M | $0.016 |
| Qwen3 8B | $0.10 |

Fine Tuning Pricing

Serve fine-tuned models for the same price as base models.

Supervised & Preference Fine Tuning

Priced per 1M training tokens

| Base Model | LoRA SFT | LoRA DPO | Full Param SFT | Full Param DPO |
| --- | --- | --- | --- | --- |
| Models up to 16B parameters | $0.50 | $1.00 | $1.00 | $2.00 |
| Models 16.1B - 80B | $3.00 | $6.00 | $6.00 | $12.00 |
| Models 80B - 300B (e.g. Qwen3-235B, gpt-oss-120B) | $6.00 | $12.00 | $12.00 | $24.00 |
| Models >300B (e.g. DeepSeek V3, Kimi K2) | $10.00 | $20.00 | $20.00 | $40.00 |
  • SFT and DPO prices are shown in $ per 1M training tokens. Training tokens can be estimated as (number of tokens in the training dataset) × (number of epochs). When tuning with intermediate thinking traces, multiply this estimate by the average number of conversation turns divided by 2.
  • Please note that when fine-tuning with reasoning traces, including the reasoning_content field on assistant turns increases the total number of tuned tokens, because multi-turn conversations are unrolled into user, assistant, and thinking traces. For further details, please refer to example 2 in the documentation on SFT fine tuning.
  • Fine-tuning with images (VLM supervised fine-tuning) is also billed per 1M tokens. See this FAQ on calculating image tokens.
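The estimation rule above can be sketched in Python. This is a rough planning estimate, not the exact token accounting used at billing time, and the dataset sizes and prices in the example are hypothetical.

```python
def estimate_training_tokens(dataset_tokens, epochs, avg_turns=None):
    """Rough estimate of billable training tokens for SFT/DPO.

    dataset_tokens: total number of tokens in the training dataset
    epochs:         number of training epochs
    avg_turns:      average conversation turns per example; pass this only
                    when tuning with intermediate thinking traces, where
                    the estimate is multiplied by avg_turns / 2.
    """
    tokens = dataset_tokens * epochs
    if avg_turns is not None:
        tokens *= avg_turns / 2
    return int(tokens)

def training_cost(training_tokens, price_per_m):
    """Training cost in dollars at a listed $/1M-training-token rate."""
    return training_tokens / 1_000_000 * price_per_m

# Example: a 5M-token dataset trained for 3 epochs with LoRA SFT
# on a model up to 16B parameters ($0.50 per 1M training tokens).
tokens = estimate_training_tokens(5_000_000, 3)
print(tokens, round(training_cost(tokens, 0.50), 2))
```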

Reinforcement Fine Tuning

Reinforcement fine tuning jobs are priced per GPU hour (billed per second), at the same rate as Fireworks on-demand deployments. Please see the On-Demand Pricing section below for details on RFT pricing.

On-Demand Pricing

Pay per GPU second, with no extra charge for start-up time.

On demand deployments

| GPU Type | Price ($) per hour |
| --- | --- |
| H100 80 GB GPU | $7.00 |
| H200 141 GB GPU | $7.00 |
| B200 180 GB GPU | $10.00 |
| B300 288 GB GPU | $12.00 |
  • For estimates of per-token prices, see this blog. Results vary by use case, but we often observe improvements like ~250% higher throughput and ~50% faster speeds on Fireworks compared to open-source inference engines.
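A back-of-the-envelope way to convert GPU-hour pricing into an effective per-token price is to divide the hourly rate by the deployment's sustained throughput. The throughput figure below is hypothetical; see the linked blog for measured numbers for your model and hardware.

```python
def cost_per_million_tokens(gpu_price_per_hour, tokens_per_second):
    """Effective $/1M tokens for an on-demand deployment, given the
    GPU's hourly price and sustained generation throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000

# Example: one H100 at $7.00/hr sustaining a hypothetical 2,000 tokens/s
# works out to roughly $0.97 per 1M tokens.
print(round(cost_per_million_tokens(7.00, 2000), 3))
```

Because on-demand billing is per second, short-lived deployments pay only for the seconds they actually run, so the per-token estimate above holds independent of deployment duration.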