
NOW AVAILABLE IN PREVIEW

RL Rollouts at Frontier Speed & Scale

RL is ~80% inference. Run yours on the Fireworks Training Platform.

___________________________________________

In synchronous RL, rollouts consume 70–80% of wall-clock time. Inference, not the trainer, is the dominant lever on iteration speed and time to production.

Fireworks operates one of the fastest inference platforms in production — the same stack behind Cursor's Composer 2 — and exposes it as a rollout fleet you can plug any trainer into.

Vercel

"Vercel’s v0 model is a composite model. The SOTA in this space changes every day, so you don’t want to tie yourself to a single model. Using a fine-tuned reinforcement learning model with Fireworks, we perform substantially better than SOTA. In our evaluation, Sonnet 3.5 compiled at 62%, and we got our error-free generation rate well into the 90s."

Malte Ubl | CTO at Vercel
rLLM

"The rLLM team is dedicated to pushing the boundaries of autonomous AI, which means our time is best spent on innovation rather than managing backend clusters. The Fireworks Training SDK lets us focus on our research instead of wrestling with infrastructure. The platform is fast, well-optimized, and just works."

Kyle Montgomery & Sijun Tan | Core Contributors, rLLM
Cursor
Why did Cursor roll out Composer 2 with @FireworksAI_HQ?

"...because it's way more performant than the open source engines and is what we use in production. our rl inference scales elastically and globally because of it. when we have low prod traffic we scale up RL, when we have high prod traffic, we scale down RL."

Federico Cassano | AI Researcher at Cursor
Production-scale Inference Infra

RL Rollouts - At a Glance

  • Live in production today, powering Cursor's Composer 2
  • Full-parameter rollouts of 1T-parameter models on B200/B300
  • Bring any trainer: open-source or in-house
  • Same kernels in training and serving — checkpoint to live endpoint in seconds

Train where you want. Roll out on Fireworks.

Whether you want a fully managed workflow or a Tinker-compatible setup, Fireworks powers the rollout layer: bring your trainer, and Fireworks handles rollout inference, policy updates, and fleet orchestration.

Why Choose Fireworks for RL Rollouts

How It Works

  1. Train with your preferred stack.
  2. Publish checkpoints to shared object storage.
  3. Signal Fireworks when a new policy is ready.
  4. Fireworks hot-loads the policy across the rollout fleet.
  5. Continue sampling through standard OpenAI-compatible APIs. Multi-turn rollouts get KV cache reuse across turns via session affinity headers. Choose your sync vs. async tradeoff; staleness stays bounded and predictable.
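The loop above can be sketched against any OpenAI-compatible endpoint. This is an illustrative sketch only: the endpoint URL, the session-affinity header name, and the notify step are assumptions for the example, not documented Fireworks API details.

```python
# Hypothetical rollout-sampling sketch against an OpenAI-compatible endpoint.
# ROLLOUT_URL and the "x-session-affinity" header name are assumptions.
import json
import urllib.request

ROLLOUT_URL = "https://example-rollout-endpoint/v1/chat/completions"  # assumed


def affinity_headers(session_id: str, api_key: str) -> dict:
    """Headers for a multi-turn rollout. Pinning a session ID (header name
    assumed here) lets the fleet route all turns of one rollout to the same
    replica, so the KV cache is reused across turns."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "x-session-affinity": session_id,  # assumed header name
    }


def sample_turn(messages: list, session_id: str, api_key: str) -> dict:
    """One rollout turn through the standard chat-completions API shape."""
    body = json.dumps({"model": "current-policy", "messages": messages}).encode()
    req = urllib.request.Request(
        ROLLOUT_URL,
        data=body,
        headers=affinity_headers(session_id, api_key),
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because sampling goes through the standard chat-completions shape, an existing trainer's inference client can usually be pointed at the rollout fleet by swapping the base URL and adding the affinity header.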

Because training and serving share one engine, your rollout fleet runs at production-inference speed from day one.

GET STARTED

Ready to run RL at scale?

Tell us about your trainer, model size, and current rollout latency → we will size the speedup and help design the right architecture for your workload.