Keep your trainer. Scale your rollouts on Fireworks.

The only dedicated rollout inference for teams running their own RL. Get elastic capacity, production-grade inference, and sub-minute weight updates.

Read the docs

Talk to our team

Frontier Coding Intelligence

Cursor Improves Composer with Fireworks

Cursor uses Fireworks to generate large volumes of realistic coding trajectories for reinforcement learning. Fireworks handles the high-throughput, distributed inference underneath the training loop, so Cursor’s researchers can focus on improving Composer’s data, rewards, and behavior.

Learn more

Why run rollouts on Fireworks

Your trainer stays. The rollouts run on Fireworks.

Point your rollout fleet at Fireworks through an OpenAI-compatible sampling API and a weight-update API. You own training; we run the rollouts at scale, with production-grade serving, sub-minute weight updates, and numerics built for RL.

WHAT YOU GET:

Faster, less expensive runs with optimized inference

Run rollouts on the same industry-leading Fireworks inference stack used for production serving, for maximum throughput.

Delta-compressed updates

Refresh rollout fleets with deltas, not full checkpoints. A delta is typically ~2% of the model's size and swaps in under a minute.
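To make "deltas, not full checkpoints" concrete, here is a minimal sketch of one plausible scheme: store only the flat indices and new values of weights that changed between two checkpoints, then rebuild the new checkpoint from the old one. The actual Fireworks delta format and compression are not described here, so treat `make_delta` and `apply_delta` as hypothetical helpers; `float16` stands in for bf16, which NumPy lacks, and the 2% update is simulated to mirror the ~2% figure.

```python
import numpy as np

def make_delta(old: np.ndarray, new: np.ndarray):
    """Sparse delta: flat indices where `new` differs from `old`, plus the new values."""
    old_flat, new_flat = old.ravel(), new.ravel()
    idx = np.flatnonzero(old_flat != new_flat)
    return idx, new_flat[idx]

def apply_delta(weights: np.ndarray, idx: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Rebuild the new checkpoint from the old one plus the sparse delta."""
    out = weights.copy()
    out.ravel()[idx] = values  # ravel() of a contiguous copy is a view, so this writes into `out`
    return out

rng = np.random.default_rng(0)
old = rng.standard_normal((1000, 1000)).astype(np.float16)
new = old.copy()
# Hypothetical update touching 2% of entries (simulated, not measured data).
touched = rng.choice(old.size, size=old.size // 50, replace=False)
new.ravel()[touched] += np.float16(0.01)

idx, values = make_delta(old, new)
restored = apply_delta(old, idx, values)
print(f"delta covers {idx.size / old.size:.2%} of entries")
```

A production encoder would also compress the index list (for example with a bitmask or run-length coding) rather than ship raw 64-bit indices.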

Async RL, no idle GPUs

Keep trainer and rollout workers running continuously instead of waiting on each other.
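The sketch below shows the async pattern in miniature, assuming in-process threads stand in for the trainer and the rollout fleet and an integer `policy_version` stands in for the weights. Workers never wait for a training step to finish; they tag each sample with the version that produced it, and the trainer publishes a new version after each step. All names are hypothetical.

```python
import queue
import threading

# Each item is (worker_id, policy_version that generated the sample).
rollout_queue: "queue.Queue[tuple[int, int]]" = queue.Queue(maxsize=64)
policy_version = 0
version_lock = threading.Lock()
stop = threading.Event()

def rollout_worker(worker_id: int) -> None:
    """Keep sampling with the latest published weights; never block on the trainer."""
    while not stop.is_set():
        with version_lock:
            version = policy_version
        # ... generate a trajectory with weights `version` here ...
        try:
            rollout_queue.put((worker_id, version), timeout=0.1)
        except queue.Full:
            pass  # trainer is behind; loop re-reads the version and retries

def trainer(num_steps: int, batch_size: int) -> list:
    """Consume whatever rollouts are ready, take a step, then publish new weights."""
    global policy_version
    versions_seen = []
    for _ in range(num_steps):
        batch = [rollout_queue.get() for _ in range(batch_size)]
        versions_seen.append([v for _, v in batch])
        # ... gradient update on `batch` goes here ...
        with version_lock:
            policy_version += 1  # workers pick this up on their next sample
    return versions_seen

workers = [threading.Thread(target=rollout_worker, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
versions = trainer(num_steps=5, batch_size=8)
stop.set()
for w in workers:
    w.join()
```

Because the queue holds up to 64 samples and the trainer takes 8 per step, early batches are made of older versions; the queue bound is what keeps that off-policy lag from growing without limit.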

Multi-region capacity

Scale across regions instead of standing up one large, co-located cluster.

Share capacity across product lines

Shift capacity between product lines such as RL rollouts, production inference, and training to maximize utilization.

Focus on RL, not rollout infrastructure.

Area	Build It Yourself	Fireworks
Capacity	Manage massive, co-located clusters for training and rollouts	Elastic rollout capacity across regions
Utilization	GPUs idle during synchronization	Async RL keeps rollouts running
Weight Updates	Build checkpoint distribution yourself	Delta-compressed updates built in
Numerics	Maintain trainer-sampler numeric alignment yourself	RL-specific sampling alignment features that help keep model updates verifiable
Team	Dedicated inference platform engineering	Managed rollout infrastructure

Customer validation

Frontier teams run their rollouts on Fireworks

"Fireworks has been a key partner in helping us train and serve the models behind Cursor at scale. Their platform supports the high-throughput RL workloads and production inference required for Composer, giving us the speed, reliability, and efficiency to keep pushing the frontier of AI coding."

Sualeh Asif | CPO at Cursor

"By partnering with Fireworks to fine-tune models, we reduced latency from about 2 seconds to 350 milliseconds, significantly improving performance and enabling us to launch AI features at scale. That improvement is a game changer for delivering reliable, enterprise-scale AI."

Sarah Sachs | AI Lead at Notion

"I mean, the other alternative is we would build one in-house, but we have finite engineers like everybody else. We would prefer to have engineers make training more efficient and more precise rather than spin up an inference effort."

Federico Cassano | AI Researcher at Cursor

“On Fireworks, combining open-source worker models with frontier tool use and post-training closes much of the gap to frontier performance on Legal Agent Benchmark, while improving cost efficiency and system controllability.”

Niko Grupen | Head of Applied Research at Harvey

"Vercel’s v0 model is a composite model. The SOTA in this space changes every day, so you don’t want to tie yourself to a single model. Using a fine-tuned reinforcement learning model with Fireworks, we perform substantially better than SOTA."

Malte Ubl | CTO at Vercel

Close the loop with Training

Or run the whole loop on Fireworks

Not running your own trainer yet? Fireworks also offers fully managed RL and a Tinker-compatible SDK, so you can train and serve on one stack. Same kernels, same quantization, no handoff. What you train is what you serve.

Explore Training

Fireworks Blog

Go deeper on the engineering

Developer Experience | 3/22/2026

Frontier RL Is Cheaper Than You Think

Developer Experience | 3/10/2024

Training-Inference Parity in MoE Models: Where Numerics Drift

3/28/2026

The Fine-Tuning Bottleneck Isn't the Algorithm

FAQ

Common Questions

Can I keep my own trainer?

Yes. Keep your trainer where it is, upload checkpoints to shared object storage, and use Fireworks for rollout serving and weight-update orchestration. The bring-your-own-trainer integration exposes three pieces: OpenAI-compatible sampling, a weight-update API, and status reporting.
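The control flow can be sketched as upload, request update, poll status, then sample. This is a minimal sketch under loud assumptions: `FakeRolloutService` is an in-memory stand-in, and `request_weight_update`, `get_status`, `sample`, and the object-store key format are hypothetical names, not the Fireworks API. Check the docs for the real endpoints.

```python
import time
from typing import Dict

# Stands in for shared object storage (for example, a cloud bucket).
shared_store: Dict[str, bytes] = {}

class FakeRolloutService:
    """In-memory stand-in for the rollout side. Method names are hypothetical."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self.store = store
        self.version = 0
        self.state = "ready"

    def request_weight_update(self, uri: str, version: int) -> None:
        if uri not in self.store:
            raise KeyError(f"checkpoint not found: {uri}")
        self.state = "updating"
        # A real service would fetch, decode, and swap asynchronously.
        self.version = version
        self.state = "ready"

    def get_status(self) -> dict:
        return {"state": self.state, "version": self.version}

    def sample(self, prompt: str) -> dict:
        # Stand-in for an OpenAI-compatible completion request.
        return {"text": f"<completion for {prompt!r}>", "policy_version": self.version}

def publish(store: Dict[str, bytes], service: FakeRolloutService,
            version: int, weights: bytes, timeout_s: float = 5.0) -> None:
    uri = f"bucket/run/step-{version}"           # hypothetical object-store key
    store[uri] = weights                         # 1. trainer uploads the checkpoint
    service.request_weight_update(uri, version)  # 2. ask the rollout fleet to load it
    deadline = time.monotonic() + timeout_s
    while service.get_status()["version"] != version:  # 3. poll until it is live
        if time.monotonic() > deadline:
            raise TimeoutError(f"version {version} not live after {timeout_s}s")
        time.sleep(0.05)

service = FakeRolloutService(shared_store)
for step in range(1, 4):
    publish(shared_store, service, step, weights=b"\x00" * 16)
    result = service.sample("def fib(n):")  # 4. sample with the new weights
```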

How fast are weight updates?

Delta-compressed updates are typically about 2% of the full model. Distribution across regions takes a few minutes, and the in-memory GPU swap stays under a minute because download and decode are pipelined ahead of the swap.
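The reason the swap itself stays short is that the slow work happens before it. Below is a CPU-side sketch of that pipelining, assuming each shard's delta is a hypothetical `(indices, values)` pair fetched by a caller-supplied `fetch_delta` function: shards are downloaded and decoded in parallel into staging copies while serving keeps reading the old weights, and the only step that touches live weights is a single reference swap. On real hardware the final step would be a device-memory update rather than a Python assignment.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

import numpy as np

Delta = Tuple[np.ndarray, np.ndarray]  # (flat indices, new values); hypothetical format

class LiveWeights:
    """Weights the server reads on every request; replaced only by swap()."""

    def __init__(self, shards: Dict[str, np.ndarray]) -> None:
        self._shards = shards
        self._lock = threading.Lock()

    def current(self) -> Dict[str, np.ndarray]:
        with self._lock:
            return self._shards

    def swap(self, staged: Dict[str, np.ndarray]) -> None:
        with self._lock:
            self._shards = staged  # the only step that touches live weights

def stage_update(live: LiveWeights, fetch_delta: Callable[[str], Delta],
                 workers: int = 4) -> Dict[str, np.ndarray]:
    """Download and decode every shard in parallel while serving continues."""
    current = live.current()

    def prepare(name: str) -> Tuple[str, np.ndarray]:
        idx, values = fetch_delta(name)  # network I/O in a real system
        shard = current[name].copy()     # decode into a staging copy, not live memory
        shard.ravel()[idx] = values
        return name, shard

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(prepare, current))

live = LiveWeights({"layer0": np.zeros(8, dtype=np.float32),
                    "layer1": np.ones(8, dtype=np.float32)})
remote = {"layer0": (np.array([1, 3]), np.array([0.5, 0.25], dtype=np.float32)),
          "layer1": (np.array([0]), np.array([2.0], dtype=np.float32))}
old = live.current()
staged = stage_update(live, remote.__getitem__)
live.swap(staged)
```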

How do you handle staleness?

Async RL works with a bounded, predictable off-policy delay. The systems layer keeps that gap small by making weight movement a routine background operation; your algorithm absorbs the rest.
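On the algorithm side, two common tools are a hard staleness bound and a truncated importance weight. The sketch below shows both; it is one generic option, not a description of what Fireworks or any specific trainer does, and `max_lag`, `cap`, and the `Rollout` fields are hypothetical. The importance weight is π/μ, computed from log-probs as exp(log π − log μ) and clipped at `cap` to bound variance.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Rollout:
    policy_version: int   # weights version that generated this sample
    logp_behavior: float  # log-prob recorded by the sampler (mu)

def split_by_staleness(batch: List[Rollout], trainer_version: int,
                       max_lag: int) -> Tuple[List[Rollout], List[Rollout]]:
    """Keep rollouts at most `max_lag` versions behind the trainer; set aside the rest."""
    fresh = [r for r in batch if trainer_version - r.policy_version <= max_lag]
    stale = [r for r in batch if trainer_version - r.policy_version > max_lag]
    return fresh, stale

def truncated_is_weight(logp_current: float, logp_behavior: float, cap: float = 2.0) -> float:
    """Importance weight pi/mu for one token, truncated at `cap`."""
    return min(math.exp(logp_current - logp_behavior), cap)

batch = [Rollout(10, -1.2), Rollout(8, -0.7), Rollout(5, -2.0)]
fresh, stale = split_by_staleness(batch, trainer_version=10, max_lag=2)
trainer_logps = [-1.0, -0.9]  # hypothetical log-probs recomputed by the trainer
weights = [truncated_is_weight(lp, r.logp_behavior) for lp, r in zip(trainer_logps, fresh)]
```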

What about MoE numerics?

The same kernels run in the trainer and the rollouts. Measured trainer-sampler divergence is published per model, and router replay aligns expert selection so training and sampling stay numerically consistent.
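Router replay can be illustrated in a few lines: the sampler records which experts each token was routed to, and the trainer reuses those indices instead of recomputing top-k from its own, slightly different logits. This NumPy sketch assumes top-k routing with a softmax over the selected experts' logits (one common gating choice); `route` and `replay_idx` are hypothetical names, and the logits are made-up values chosen to show a near-tie flipping.

```python
from typing import Optional, Tuple

import numpy as np

def route(router_logits: np.ndarray, k: int,
          replay_idx: Optional[np.ndarray] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Top-k MoE routing over [tokens, experts] logits.
    With replay_idx, reuse the expert choices recorded at sampling time."""
    if replay_idx is None:
        idx = np.argsort(-router_logits, axis=-1)[:, :k]
    else:
        idx = replay_idx
    selected = np.take_along_axis(router_logits, idx, axis=-1)
    # Softmax over the chosen experts' logits (assumed gating scheme).
    gates = np.exp(selected - selected.max(axis=-1, keepdims=True))
    return idx, gates / gates.sum(axis=-1, keepdims=True)

# Sampler and trainer see nearly identical logits; tiny numeric drift flips a near-tie.
sampler_logits = np.array([[2.0, 1.0000, 0.9999, -1.0]])
trainer_logits = np.array([[2.0, 0.9999, 1.0000, -1.0]])

recorded_idx, _ = route(sampler_logits, k=2)  # stored alongside the rollout
drifted_idx, _ = route(trainer_logits, k=2)   # without replay: picks a different expert
replayed_idx, gates = route(trainer_logits, k=2, replay_idx=recorded_idx)
```

With replay, gradients still flow through the trainer's own router logits, but the set of experts matches what actually produced the sampled tokens.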

How do I get started?

Rollouts run on reserved capacity. Start with a small proof of concept to validate the integration, then scale across regions. Talk to our team to size a deployment.

Scale your RL rollouts on Fireworks

Point your trainer at production-grade rollout inference, or run the full loop on one stack.