
FIREWORKS TRAINING - NOW IN PREVIEW

Train and Deploy Your Models at the Frontier

Full-parameter training, custom loss functions, and frontier RL. All on the same infrastructure already serving production for Cursor, Vercel, and Genspark.

Own Your Model, Own Your Future. Make your data your moat.

___________________________________________

THREE ENTRY POINTS, ONE PLATFORM

Start where you are. Go as far as you need.

Training & inference together on one platform. Choose the level of control you need.

FULL-PARAMETER TRAINING

From a single node to 1T parameters at frontier scale.

Most training services cap out at LoRA. LoRA is the right starting point: fast, cost-effective, and well-suited for rapid iteration. But LoRA and full-parameter training learn in meaningfully different ways. LoRA learns less and forgets less. Full-parameter produces behavioral changes that adapter-based methods can't reach.
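To make the scale difference concrete, here is a back-of-the-envelope comparison of trainable parameter counts. The layer size and rank below are illustrative assumptions, not a specific Fireworks model:

```python
# Rough parameter-count comparison: LoRA adapters vs. full-parameter training.
# Figures are illustrative; real models have many weight matrices of varying shapes.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A rank-r LoRA adapter adds two low-rank factors: A (d_in x r) and B (r x d_out)."""
    return d_in * rank + rank * d_out

d = 4096          # hidden size of one projection matrix (illustrative)
full = d * d      # full-parameter training updates every weight in the matrix
lora = lora_params(d, d, rank=16)   # LoRA updates only the low-rank factors

print(f"full-parameter: {full:,} trainable weights per matrix")
print(f"LoRA (r=16):    {lora:,} trainable weights per matrix")
print(f"LoRA trains {100 * lora / full:.2f}% of the weights")
```

At rank 16, the adapter touches well under 1% of the weights in this matrix, which is why LoRA iterates fast and forgets little, and why full-parameter training can reach behaviors LoRA cannot.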

Fireworks Training now supports full-parameter training across the model catalog, from small dense models on a single node up to Kimi K2.5 at 1 trillion parameters. We handle the distributed systems complexity at every scale (composable parallelism, precision tuning, and streaming RL pipelines) so you don't have to.

LoRA and full-parameter run on the same platform. You don't have to choose your ceiling when you start.

ONE INFRASTRUCTURE

What you train is exactly what you serve.

Fireworks runs production inference across DeepSeek, Kimi, Qwen, and others at scale. That experience is built into the training platform. The numerical edge cases that surface in frontier MoE models aren't hypothetical to us. We've debugged them in production.

A trained checkpoint becomes a live endpoint in seconds. No format conversion, no serving stack migration. Training and inference share the same kernels, the same hardware, so model behavior in training is model behavior in production.

We publish the k3 KL divergence between training and inference checkpoints for every model in our catalog; we consider values below 0.01 production-grade. If your training and serving stacks disagree numerically, your evals are measuring the gap between them, not model quality.
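The k3 estimator referenced above can be computed directly from per-token log-probabilities. A minimal sketch (the function name and example values are ours, not the Fireworks API; the estimator itself is the standard low-variance k3 form, (r - 1) - log r, with r the probability ratio):

```python
import math

def k3_kl(logp_train, logp_infer):
    """Monte Carlo k3 estimate of KL(train || infer) from per-token log-probs.

    For tokens sampled from the training-stack policy, with r = p_infer / p_train,
    (r - 1) - log(r) is an unbiased, low-variance, always-nonnegative
    per-sample estimate of the KL divergence.
    """
    estimates = []
    for lp_t, lp_i in zip(logp_train, logp_infer):
        log_r = lp_i - lp_t
        estimates.append((math.exp(log_r) - 1.0) - log_r)
    return sum(estimates) / len(estimates)

# Identical stacks give exactly 0; slight numerical drift gives a small positive KL.
same = k3_kl([-1.2, -0.3, -2.5], [-1.2, -0.3, -2.5])
drift = k3_kl([-1.2, -0.3, -2.5], [-1.25, -0.28, -2.6])
print(same, drift)
```

Because every per-token term is nonnegative, the estimate never hides drift through cancellation, which makes it a good check that training-time and serving-time numerics agree.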


DEPLOY CUSTOM MODELS AT SCALE

Serve hundreds of fine-tuned models on a single GPU

Training a great model is only half the battle. Fireworks multi-LoRA deployments serve hundreds of personalized LoRA models per GPU, deployed in one click, with zero extra infrastructure cost. Your training flywheel produces better models. Multi-LoRA makes deploying them economically viable.
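The economics come down to adapter size. A rough sketch of the memory math, using illustrative assumptions (the byte figures below are not measured Fireworks numbers):

```python
# Why multi-LoRA serving is economical: adapters are tiny next to the base model.
# All sizes are illustrative assumptions, not measured Fireworks figures.

GB = 1024**3

base_model_bytes = 16 * GB        # e.g. an 8B-class model in fp16 (~2 bytes/param)
adapter_bytes = 64 * 1024**2      # one LoRA adapter; tens of MB is typical
gpu_memory_bytes = 80 * GB        # a single 80 GB GPU
serving_overhead = 24 * GB        # KV cache, activations, runtime (assumed)

budget = gpu_memory_bytes - base_model_bytes - serving_overhead
adapters_per_gpu = budget // adapter_bytes

print(f"adapters that fit alongside one base model: {adapters_per_gpu}")
```

Under these assumptions, hundreds of adapters share a single copy of the base weights on one GPU, so each additional fine-tuned model costs megabytes, not a dedicated deployment.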

Serve personalized models at scale

100s of models per GPU

1-click deployment

Zero extra infra cost
PROVEN IN PRODUCTION

AI teams building the future train on Fireworks

Sourcegraph

"Fireworks has been a fantastic partner in building AI dev tools at Sourcegraph. Their fast, reliable model inference lets us focus on fine-tuning, AI-powered code search, and deep code context, making Cody the best AI coding assistant. They are responsive and ship at an amazing pace."

Beyang Liu | CTO at Sourcegraph
rLLM

"The rLLM team is dedicated to pushing the boundaries of autonomous AI, which means our time is best spent on innovation rather than managing backend clusters. The Fireworks Training SDK lets us focus on our research instead of wrestling with infrastructure. The platform is fast, well-optimized, and just works."

Kyle Montgomery & Sijun Tan | Core Contributors, rLLM
Why did Cursor roll out Composer 2 with @FireworksAI_HQ?

"...because it's way more performant than the open source engines and is what we use in production. our rl inference scales elastically and globally because of it. when we have low prod traffic we scale up RL, when we have high prod traffic, we scale down RL."

Federico Cassano | AI Researcher at Cursor

"Vercel’s v0 model is a composite model. The SOTA in this space changes every day, so you don’t want to tie yourself to a single model. Using a fine-tuned reinforcement learning model with Fireworks, we perform substantially better than SOTA. In our evaluation, Sonnet 3.5 compiled at 62%, and we got our error-free generation rate well into the 90s."

Malte Ubl | CTO at Vercel

"By partnering with Fireworks to fine-tune models, we reduced latency from about 2 seconds to 350 milliseconds, significantly improving performance and enabling us to launch AI features at scale. That improvement is a game changer for delivering reliable, enterprise-scale AI."

Sarah Sachs | AI Lead at Notion
COMPARISON TO ALTERNATIVES

Fireworks AI is the only platform to combine full-spectrum training and built-in inference

ALTERNATIVE | EXAMPLES | THE LIMITATION | FIREWORKS ADVANTAGE
Closed Models | OpenAI, Anthropic | No weight ownership. High cost. Zero portability. No retraining loop. | ✅ Open-source models you fully own. Retrain and redeploy continuously.
Training-Only | Fragmented vendors | Train here, serve elsewhere. Every iteration pays a migration tax. | ✅ Unified platform. Training completes → model is live → collect data → retrain.
Cloud-Native | AWS, GCP | Training and inference are separate silos. No open model expertise. | ✅ Model-agnostic. 1-click hot-loading from training to inference.
Self-managed | PyTorch distributed | 3-6 months of infra work before your first model trains. Ongoing ops burden. | ✅ Deploy on day one. Engineers build applications, not DevOps.
FAQ

Common Questions

What does "preview" mean? Is this production-ready?

Preview means the platform is live and serving real production workloads today. Cursor, Vercel, and Genspark are all running on Fireworks Training in production. It also means pricing and some features are still stabilizing before GA. Enterprise SLAs and GA timelines are available on request. Talk to our team if you have specific requirements.



Is my training data used to train Fireworks models?

No. Your data is used solely to fine-tune your models. We do not use or share your training data for any other purpose, and we stand firmly by our zero data retention policy.


What's the difference between Training Agent, Managed Training, and Training API?


Training Agent is fully automated: describe your goal, upload data, get a deployed model. No ML knowledge required. Currently LoRA-only.

Managed Training gives you control over the training method (SFT, DPO, or RFT) while we handle all infrastructure. Supports full-parameter training.

Training API gives you full algorithmic control: bring your own training loop, write custom loss functions, run frontier RL. For advanced ML teams and researchers. See the comparison table above for a full breakdown.
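As an illustration of what a custom loss can look like in practice, here is a dependency-free toy sketch of completion-only cross-entropy, where prompt tokens are masked out of the loss. This is our own example, not the Fireworks Training API:

```python
def masked_token_loss(logprobs, loss_mask):
    """Average negative log-likelihood over tokens where loss_mask is 1.

    logprobs:  per-token log-probability of the target token under the model.
    loss_mask: 1 for completion tokens, 0 for prompt tokens to exclude.
    """
    total = sum(-lp for lp, m in zip(logprobs, loss_mask) if m)
    count = sum(loss_mask)
    return total / max(count, 1)

# Prompt tokens (mask 0) are ignored; only completion tokens drive the loss.
loss = masked_token_loss(
    logprobs=[-0.1, -0.2, -1.5, -0.5],
    loss_mask=[0, 0, 1, 1],
)
print(loss)
```

With the Training API, logic like this lives in your own training loop, so you can weight tokens, add auxiliary terms, or swap in an RL objective without changing platforms.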


How long does a training run take?

It depends on model size, dataset size, and training method. A small LoRA job on Qwen3 8B with a few thousand examples typically completes in under an hour. Larger full-parameter runs on frontier models take longer. See the cost estimator in our docs for estimates by scenario.

START BUILDING TODAY

Build your continuously learning flywheel today

Whether you want fully self-serve (agentic) tooling or a fully managed service, the Fireworks AI training platform helps you train open-source models to frontier-quality performance. Continuously improving production models are here.