

Cursor set out to build a coding model optimized for one environment: software engineering inside Cursor. Rather than relying solely on general-purpose foundation models, the team combined continual pre-training, large-scale reinforcement learning, and production feedback loops to create a specialized model that improves through real developer workflows.
Composer 2 builds on Kimi 2.5 and is trained on long-horizon software engineering tasks including tool use, debugging, terminal execution, and multi-file code edits. The result is frontier-level coding performance while remaining significantly more cost-efficient and reliable than larger general-purpose models (Composer 2 Technical Report).
As these training loops scaled, reinforcement learning stopped being just a modeling technique and became an infrastructure problem. Each rollout is effectively a full development session, requiring fast inference, consistent execution environments, and frequent synchronization of large model states across distributed clusters. Training stability, environment fidelity, and rollout throughput all became first-order system constraints (Frontier RL Blog).
Fireworks provides the inference layer that makes these RL loops practical. It powers distributed rollout and inference infrastructure across multiple clusters, enabling Cursor to run large-scale reinforcement learning without building and operating a dedicated inference stack internally. This lets the team focus on improving model behavior rather than scaling low-level systems. This builds on an earlier collaboration focused on high-performance inference for coding workloads (Fireworks x Cursor).
"We have finite engineers like everybody else. We would prefer to have engineers make training more efficient and more precise rather than spin up an inference effort."
- Federico Cassano, Research Lead, Cursor
Cursor’s optimization target is a single environment: software engineering inside Cursor.
That constraint changes the learning problem. Instead of spreading capacity across broad general-purpose capability, Composer 2 focuses on the workflows developers actually run: editing repositories, debugging failures, executing terminal commands, and completing multi-step tool interactions.
The base model (Kimi 2.5) provides general coding ability, but most gains come from post-training. Cursor combines continual mid-training on code with reinforcement learning over full software engineering sessions.
Each training run is a full development trajectory: long-horizon debugging, tool calls, environment feedback, and iterative code changes.
This is where the distinction becomes operational: Mid-training teaches the model to write code. Reinforcement learning teaches it to write correct code inside a real system.
As Federico Cassano puts it,
"Mid-training teaches the model to write code. Reinforcement learning teaches it to write correct code."
This shift is what turns a strong coding model into a reliable coding agent, and it introduces a new set of constraints around environment fidelity, rollout scale, and distributed infrastructure design.
As RL expanded, it became clear that model quality and infrastructure quality were tightly coupled.
Each rollout behaves like a full engineering session, with tool execution, terminal commands, file edits, and evaluation before any weight update occurs. That means RL performance depends not just on the model, but on how faithfully the environment mirrors production.
Small mismatches matter. Models can exploit gaps between training and production behavior, optimizing for reward signals rather than real correctness.
In practice, this turns RL into an environment fidelity problem as much as a learning problem.
The Cursor team observed this directly: when environments are imperfect, models learn shortcuts that break in production. RL does not just optimize behavior, it amplifies whatever signal the system provides.
This is why infrastructure becomes inseparable from model performance.
Fireworks provides the inference and rollout layer that allows Cursor to scale RL without building a dedicated inference system.
Instead of centralizing training in a single massive cluster, Cursor runs RL across 3–4 distributed global clusters unified through Fireworks infrastructure.
Key capabilities include:
Model synchronization is handled through compressed weight updates rather than full parameter transfers, reducing overhead while maintaining consistency across regions.
This makes it possible to treat inference capacity as a shared resource between production and training, accelerating RL cycles without separate infrastructure investment.
Composer is not just trained with reinforcement learning. RL is used as a mechanism for shaping real product behavior.
This produces measurable improvements in:
Rather than treating RL as a research loop, Cursor operationalized it as part of the system that defines product quality.
Composer delivers frontier-level coding performance with materially lower cost and higher operational efficiency.
| Benchmark | Score |
|---|---|
| CursorBench | 61.3 |
| Terminal-Bench | 61.7 |
| SWE-bench Multilingual | 73.7 |
Early internal and community evaluations show Composer competing with or exceeding leading proprietary coding models in terminal-style and agentic tasks.
Composer reflects a broader shift in how AI systems are built:
This validates a deeper shift: the next advantage in AI is not just model choice, but ownership of the post-training loop.
Cursor demonstrates that frontier coding performance is increasingly a function of reinforcement learning systems, not just model scale.
Fireworks enables this shift by making large-scale RL and inference infrastructure practical, distributed, and cost-efficient.
The result is a new category of system: models that continuously improve inside the environments where they are used.
Owning that loop is becoming the defining advantage in AI systems.