
FIREWORKS TRAINING - NOW IN PREVIEW

Train and Deploy Your Models at the Frontier

Full-parameter training, custom loss functions, and frontier RL. All on the same infrastructure already serving production for Cursor, Vercel, and Genspark.

Own Your Model, Own Your Future. Make your data your moat.

___________________________________________

THREE ENTRY POINTS, ONE PLATFORM

Start where you are. Go as far as you need.

Training & inference together on one platform. Choose the level of control you need.

FULL-PARAMETER TRAINING

From a single node to 1T parameters at frontier scale.

Most training services cap out at LoRA. LoRA is the right starting point: fast, cost-effective, and well-suited for rapid iteration. But LoRA and full-parameter training learn in meaningfully different ways. LoRA learns less and forgets less. Full-parameter produces behavioral changes that adapter-based methods can't reach.
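To make the scale difference concrete, here is a back-of-the-envelope comparison of trainable parameter counts. The layer size and rank below are illustrative assumptions, not a specific Fireworks model:

```python
# Rough parameter-count comparison: LoRA adapters vs. full-parameter training.
# Figures are illustrative; real models have many weight matrices of varying shapes.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A rank-r LoRA adapter adds two low-rank factors: A (d_in x r) and B (r x d_out)."""
    return d_in * rank + rank * d_out

d = 4096          # hidden size of one projection matrix (illustrative)
full = d * d      # full-parameter training updates every weight in the matrix
lora = lora_params(d, d, rank=16)   # LoRA updates only the low-rank factors

print(f"full-parameter: {full:,} trainable weights per matrix")
print(f"LoRA (r=16):    {lora:,} trainable weights per matrix")
print(f"LoRA trains {100 * lora / full:.2f}% of the weights")
```

At rank 16, the adapter touches well under 1% of the weights in this matrix, which is why LoRA iterates fast and forgets little, and why full-parameter training can reach behaviors LoRA cannot.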

Fireworks Training now supports full-parameter training across the model catalog, from small dense models on a single node up to Kimi K2.5 at 1 trillion parameters. We handle the distributed systems complexity at every scale (composable parallelism, precision tuning, and streaming RL pipelines) so you don't have to.

LoRA and full-parameter run on the same platform. You don't have to choose your ceiling when you start.

ONE INFRASTRUCTURE

What you train is exactly what you serve.

Fireworks runs production inference across DeepSeek, Kimi, Qwen, and others at scale. That experience is built into the training platform. The numerical edge cases that surface in frontier MoE models aren't hypothetical to us. We've debugged them in production.

A trained checkpoint becomes a live endpoint in seconds. No format conversion, no serving stack migration. Training and inference share the same kernels, the same hardware, so model behavior in training is model behavior in production.

We publish the k3 KL divergence between training and inference checkpoints for every model in our catalog; we consider values below 0.01 production-grade. If your training and serving stacks disagree numerically, your evals are measuring the gap between them, not model quality.
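The k3 estimator referenced above can be computed directly from per-token log-probabilities. A minimal sketch (the function name and example values are ours, not the Fireworks API; the estimator itself is the standard low-variance k3 form, (r - 1) - log r, with r the probability ratio):

```python
import math

def k3_kl(logp_train, logp_infer):
    """Monte Carlo k3 estimate of KL(train || infer) from per-token log-probs.

    For tokens sampled from the training-stack policy, with r = p_infer / p_train,
    (r - 1) - log(r) is an unbiased, low-variance, always-nonnegative
    per-sample estimate of the KL divergence.
    """
    estimates = []
    for lp_t, lp_i in zip(logp_train, logp_infer):
        log_r = lp_i - lp_t
        estimates.append((math.exp(log_r) - 1.0) - log_r)
    return sum(estimates) / len(estimates)

# Identical stacks give exactly 0; slight numerical drift gives a small positive KL.
same = k3_kl([-1.2, -0.3, -2.5], [-1.2, -0.3, -2.5])
drift = k3_kl([-1.2, -0.3, -2.5], [-1.25, -0.28, -2.6])
print(same, drift)
```

Because every per-token term is nonnegative, the estimate never hides drift through cancellation, which makes it a good check that training-time and serving-time numerics agree.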


DEPLOY CUSTOM MODELS AT SCALE

Serve hundreds of fine-tuned models on a single GPU

Training a great model is only half the battle. Fireworks multi-LoRA deployments serve hundreds of personalized LoRA models per GPU, deployed in one click, with zero extra infrastructure cost. Your training flywheel produces better models. Multi-LoRA makes deploying them economically viable.
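The economics come down to adapter size. A rough sketch of the memory math, using illustrative assumptions (the byte figures below are not measured Fireworks numbers):

```python
# Why multi-LoRA serving is economical: adapters are tiny next to the base model.
# All sizes are illustrative assumptions, not measured Fireworks figures.

GB = 1024**3

base_model_bytes = 16 * GB        # e.g. an 8B-class model in fp16 (~2 bytes/param)
adapter_bytes = 64 * 1024**2      # one LoRA adapter; tens of MB is typical
gpu_memory_bytes = 80 * GB        # a single 80 GB GPU
serving_overhead = 24 * GB        # KV cache, activations, runtime (assumed)

budget = gpu_memory_bytes - base_model_bytes - serving_overhead
adapters_per_gpu = budget // adapter_bytes

print(f"adapters that fit alongside one base model: {adapters_per_gpu}")
```

Under these assumptions, hundreds of adapters share a single copy of the base weights on one GPU, so each additional fine-tuned model costs megabytes, not a dedicated deployment.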

Serve personalized models at scale

100s of models per GPU

1-click deployment

Zero extra infra cost
PROVEN IN PRODUCTION

AI teams building the future train on Fireworks

Sourcegraph

"Fireworks has been a fantastic partner in building AI dev tools at Sourcegraph. Their fast, reliable model inference lets us focus on fine-tuning, AI-powered code search, and deep code context, making Cody the best AI coding assistant. They are responsive and ship at an amazing pace."

Beyang Liu | CTO at Sourcegraph
rLLM

"The rLLM team is dedicated to pushing the boundaries of autonomous AI, which means our time is best spent on innovation rather than managing backend clusters. The Fireworks Training SDK lets us focus on our research instead of wrestling with infrastructure. The platform is fast, well-optimized, and just works."

Kyle Montgomery & Sijun Tan | Core Contributors, rLLM
Why did Cursor roll out Composer 2 with @FireworksAI_HQ?

"...because it's way more performant than the open source engines and is what we use in production. our rl inference scales elastically and globally because of it. when we have low prod traffic we scale up RL, when we have high prod traffic, we scale down RL."

Federico Cassano | AI Researcher at Cursor

"Vercel’s v0 model is a composite model. The SOTA in this space changes every day, so you don’t want to tie yourself to a single model. Using a fine-tuned reinforcement learning model with Fireworks, we perform substantially better than SOTA. In our evaluation, Sonnet 3.5 compiled at 62%, and we got our error-free generation rate well into the 90s."

Malte Ubl | CTO at Vercel

"By partnering with Fireworks to fine-tune models, we reduced latency from about 2 seconds to 350 milliseconds, significantly improving performance and enabling us to launch AI features at scale. That improvement is a game changer for delivering reliable, enterprise-scale AI."

Sarah Sachs | AI Lead at Notion
COMPARISON TO ALTERNATIVES

Fireworks AI is the only platform to combine full-spectrum training and built-in inference

ALTERNATIVE | EXAMPLES | THE LIMITATION | FIREWORKS ADVANTAGE
Closed Models | OpenAI, Anthropic | No weight ownership. High cost. Zero portability. No retraining loop. | ✅ Open-source models you fully own. Retrain and redeploy continuously.
Training-Only | Fragmented vendors | Train here, serve elsewhere. Every iteration pays a migration tax. | ✅ Unified platform. Training completes → model is live → collect data → retrain.
Cloud-Native | AWS, GCP | Training and inference are separate silos. No open model expertise. | ✅ Model-agnostic. 1-click hot-loading from training to inference.
Self-managed | PyTorch distributed | 3-6 months of infra work before your first model trains. Ongoing ops burden. | ✅ Deploy on day one. Engineers build applications, not DevOps.
FAQ

Common Questions

What does "preview" mean? Is this production-ready?

Preview means the platform is live and serving real production workloads today. Cursor, Vercel, and Genspark are all running on Fireworks Training in production. It also means pricing and some features are still stabilizing before GA. Enterprise SLAs and GA timelines are available on request. Talk to our team if you have specific requirements.



Is my training data used to train Fireworks models?

No. Your data is used solely to fine-tune your models. We do not use or share your training data for any other purpose, and we stand firmly by our zero data retention policy.


What's the difference between Training Agent, Managed Training, and Training API?


Training Agent is fully automated: describe your goal, upload data, get a deployed model. No ML knowledge required. Currently LoRA-only.

Managed Training gives you control over the training method (SFT, DPO, or RFT) while we handle all infrastructure. Supports full-parameter training.

Training API gives you full algorithmic control: bring your own training loop, write custom loss functions, run frontier RL. For advanced ML teams and researchers. See the comparison table above for a full breakdown.
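As an illustration of what a custom loss can look like in practice, here is a dependency-free toy sketch of completion-only cross-entropy, where prompt tokens are masked out of the loss. This is our own example, not the Fireworks Training API:

```python
def masked_token_loss(logprobs, loss_mask):
    """Average negative log-likelihood over tokens where loss_mask is 1.

    logprobs:  per-token log-probability of the target token under the model.
    loss_mask: 1 for completion tokens, 0 for prompt tokens to exclude.
    """
    total = sum(-lp for lp, m in zip(logprobs, loss_mask) if m)
    count = sum(loss_mask)
    return total / max(count, 1)

# Prompt tokens (mask 0) are ignored; only completion tokens drive the loss.
loss = masked_token_loss(
    logprobs=[-0.1, -0.2, -1.5, -0.5],
    loss_mask=[0, 0, 1, 1],
)
print(loss)
```

With the Training API, logic like this lives in your own training loop, so you can weight tokens, add auxiliary terms, or swap in an RL objective without changing platforms.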


How long does a training run take?

It depends on model size, dataset size, and training method. A small LoRA job on Qwen3 8B with a few thousand examples typically completes in under an hour. Larger full-parameter runs on frontier models take longer. See the cost estimator in our docs for estimates by scenario.

START BUILDING TODAY

Build your continuously learning flywheel today

Whether you want fully self-serve (agentic) tooling or a fully managed service, the Fireworks AI training platform helps you train open-source models to frontier-quality performance. Continuously improving production models are here.