"Vercel’s v0 model is a composite model. The SOTA in this space changes every day, so you don’t want to tie yourself to a single model. Using a fine-tuned reinforcement learning model with Fireworks, we perform substantially better than SOTA. In our evaluation, Sonnet 3.5 compiled at 62%, and we got our error-free generation rate well into the 90s."


"The rLLM team is dedicated to pushing the boundaries of autonomous AI, which means our time is best spent on innovation rather than managing backend clusters. The Fireworks Training SDK lets us focus on our research instead of wrestling with infrastructure. The platform is fast, well-optimized, and just works."

why did Cursor roll out Composer 2 with @FireworksAI_HQ?
"...because it's way more performant than the open source engines and is what we use in production. our rl inference scales elastically and globally because of it. when we have low prod traffic we scale up RL, when we have high prod traffic, we scale down RL."

Yes. Fireworks' Bring Your Own Trainer (BYOT) integration lets you run your own training loop while offloading large-scale inference to Fireworks. You create a hot-load deployment pointed at your external bucket (S3, MinIO, Nebius), upload checkpoints on your own cadence, signal Fireworks when a new snapshot is ready, and run rollouts via the standard OpenAI-compatible API. Fireworks handles the distributed weight swap, KV cache management, and inference serving. Note that this is currently an early-access feature; contact Fireworks to enable it on your account.
More info: https://docs.fireworks.ai/fine-tuning/rl-rollout-integration
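The BYOT loop described above can be sketched as a small driver that alternates remote rollouts with local training. This is a minimal sketch, not the Fireworks SDK: ByotHooks, run_byot_loop, and every hook name are hypothetical placeholders you would wire to your own trainer, your bucket client, the signal request, and an OpenAI-compatible client. Only the order of operations comes from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ByotHooks:
    """Hypothetical hooks for a Bring Your Own Trainer loop; none of these names come from the Fireworks docs."""

    rollout: Callable[[int], List[str]]          # run rollouts on Fireworks via the OpenAI-compatible API
    train_step: Callable[[int, List[str]], str]  # train locally on those samples, return a local checkpoint path
    upload: Callable[[str, int], str]            # copy the checkpoint to your bucket (S3, MinIO, Nebius), return its URI
    signal_ready: Callable[[str], None]          # tell Fireworks the snapshot at that URI is ready to hot-load


def run_byot_loop(hooks: ByotHooks, num_steps: int, upload_every: int = 1) -> List[str]:
    """Alternate remote rollouts with local training; return the snapshot URIs that were signaled."""
    if upload_every <= 0:
        raise ValueError("upload_every must be positive")
    signaled: List[str] = []
    for step in range(num_steps):
        samples = hooks.rollout(step)               # served by whatever weights are currently hot-loaded
        ckpt_path = hooks.train_step(step, samples)
        if step % upload_every == 0:                # upload on your own cadence
            uri = hooks.upload(ckpt_path, step)
            hooks.signal_ready(uri)                 # Fireworks then performs the distributed weight swap
            signaled.append(uri)
    return signaled
```

Keeping each side effect behind a hook makes the control flow testable with fakes and keeps the choice of storage client and HTTP client outside the loop.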
No, and you shouldn't. The recommended approach is to upload a full HuggingFace-format checkpoint for the first step and then every 20th or 30th step after that. For all intermediate steps, publish an incremental snapshot using the ARC2 (arc_v2) format, which diffs against the currently loaded snapshot. Incremental snapshots significantly reduce both upload time and load time during training. If an incremental hot-load fails, fall back to a new full snapshot.
More info: https://docs.fireworks.ai/fine-tuning/rl-rollout-delta-checkpoints
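The full-versus-incremental cadence above, including the fallback, fits in a couple of small functions. This is a minimal sketch assuming 0-indexed training steps and a publish hook that raises synchronously when an incremental hot-load fails; publish, PublishFn, and IncrementalLoadError are hypothetical names, and in practice you may only learn about a failed load by inspecting the ledger.

```python
from typing import Callable


class IncrementalLoadError(Exception):
    """Hypothetical error raised by your publish hook when an arc_v2 hot-load fails."""


# Hypothetical hook: publish(step, kind) uploads and signals one snapshot, where kind is
# "full" (HuggingFace-format checkpoint) or "arc_v2" (incremental diff against the loaded snapshot).
PublishFn = Callable[[int, str], None]


def snapshot_kind(step: int, full_every: int = 20) -> str:
    """Full snapshot at step 0 and every `full_every` steps after that; arc_v2 otherwise."""
    if full_every <= 0:
        raise ValueError("full_every must be positive")
    return "full" if step % full_every == 0 else "arc_v2"


def publish_step(step: int, publish: PublishFn, full_every: int = 20) -> str:
    """Publish one step's snapshot; return the kind that was ultimately published."""
    kind = snapshot_kind(step, full_every)
    if kind == "arc_v2":
        try:
            publish(step, "arc_v2")
            return "arc_v2"
        except IncrementalLoadError:
            pass  # incremental hot-load failed: fall back to a new full snapshot
    publish(step, "full")
    return "full"
```

Passing full_every=30 gives the other cadence mentioned above; only the modulus changes.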
It depends on which transition mode is configured on your deployment. The default for RL is async transition: in-flight requests are paused during the weight swap and then resumed on the same HTTP connection with their KV state intact, so they continue streaming rather than restarting. New requests are queued until the swap completes, which may show up as elevated time-to-first-token. No 4xx/5xx errors are returned for the swap itself, though you can set the x-fireworks-hot-load-drain-timeout request header (default 90 seconds) to receive an HTTP 425 if the swap exceeds that window. The alternative is synchronous transition, where in-flight requests finish on the old weights before the swap begins, and new requests receive HTTP 425 until the swap is done.
More info: https://docs.fireworks.ai/fine-tuning/rl-rollout-debugging
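If your rollout workers can see HTTP 425 (either in synchronous transition mode, or in async mode with the drain-timeout header set), a client-side retry with backoff is one way to absorb it. This is a minimal sketch, not a Fireworks client: the header name comes from the text above, but sending the value as whole seconds is an assumption, and request_with_hot_load_retry and the send hook are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

# Header name from the Fireworks docs; sending the value as whole seconds is an assumption.
DRAIN_TIMEOUT_HEADER = "x-fireworks-hot-load-drain-timeout"

# Hypothetical transport hook: send(headers) performs one inference request, returns (status, body).
SendFn = Callable[[Dict[str, str]], Tuple[int, str]]


def request_with_hot_load_retry(
    send: SendFn,
    drain_timeout_s: int = 30,
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry with exponential backoff while the deployment answers 425 (weight swap in progress)."""
    if max_attempts <= 0:
        raise ValueError("max_attempts must be positive")
    headers = {DRAIN_TIMEOUT_HEADER: str(drain_timeout_s)}
    for attempt in range(max_attempts):
        status, body = send(headers)
        if status != 425:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay_s * (2 ** attempt))  # 1s, 2s, 4s, ... with the defaults
    return status, body  # still 425 after the last attempt; let the caller decide
```

Injecting sleep keeps the backoff schedule testable without real delays; in production you would leave it as time.sleep.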
Run firectl get ledger <deployment_id> to dump the full snapshot history. Each row shows the snapshot identity, whether it was a full or incremental load, per-replica readiness timestamps, and any load errors. If the deployment itself is unhealthy (e.g., OOM during merge, crash loop), run firectl deployment get <deployment_id> and check status and latestStatus.reason alongside the ledger. If the delta chain is wedged, you can reset the ledger entirely via a DELETE call to the ledger endpoint. This clears server-side history without deleting the deployment. After a reset, your next signal must be a full snapshot since there is nothing to diff against.
More info: https://docs.fireworks.ai/fine-tuning/rl-rollout-debugging
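Because a ledger reset leaves nothing to diff against, your trainer's own bookkeeping has to forget its diff base at the same time. This is a minimal sketch assuming you track the loaded snapshot identity client-side; SnapshotChain and its methods are hypothetical and are not part of firectl or any Fireworks SDK.

```python
from typing import Optional


class SnapshotChain:
    """Hypothetical client-side tracker of the snapshot an arc_v2 upload would diff against."""

    def __init__(self) -> None:
        self.base: Optional[str] = None  # identity of the currently loaded snapshot, if any

    def next_kind(self) -> str:
        # With no loaded snapshot there is nothing to diff against, so only a full snapshot works.
        return "full" if self.base is None else "arc_v2"

    def record_loaded(self, snapshot_id: str) -> None:
        # Call once the ledger shows the snapshot as loaded on your replicas.
        self.base = snapshot_id

    def reset(self) -> None:
        # Mirror a server-side ledger reset: history is cleared, so forget the diff base.
        self.base = None
```

Calling reset() right after the DELETE call guarantees the next signal you send is a full snapshot.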
You control this per snapshot via the reset_prompt_cache field in the hot-load signal request. The current default (all) refills prompt cache broadly after the swap. Setting it to new_session preserves the cache namespace for existing multi-turn session IDs while new sessions refill. Setting it to none preserves prompt cache state entirely across the swap. This field only affects what can be reused after the swap; it does not interrupt the active turn in an in-flight request.
More info: https://docs.fireworks.ai/fine-tuning/rl-rollout-debugging
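The three reset_prompt_cache values can be captured as an enum plus a simplified model of what each one allows to be reused after a swap. This is a sketch under stated assumptions: the field name and its three values come from the text above, but can_reuse_prompt_cache treats "all" as a full reset (the docs only say it refills "broadly"), and the helper names and the way the fragment merges into the rest of the signal request are hypothetical.

```python
from enum import Enum


class ResetPromptCache(str, Enum):
    """Documented values of the reset_prompt_cache field in the hot-load signal request."""

    ALL = "all"                  # current default: prompt cache refills broadly after the swap
    NEW_SESSION = "new_session"  # existing multi-turn session IDs keep their cache namespace
    NONE = "none"                # prompt cache state is preserved entirely across the swap


def can_reuse_prompt_cache(mode: ResetPromptCache, session_existed_before_swap: bool) -> bool:
    """Simplified model: may a request after the swap reuse prompt cache built before it?"""
    if mode is ResetPromptCache.ALL:
        return False
    if mode is ResetPromptCache.NEW_SESSION:
        return session_existed_before_swap
    return True


def reset_prompt_cache_field(mode: ResetPromptCache) -> dict:
    """Fragment to merge into your signal request body (the surrounding payload shape is assumed)."""
    return {"reset_prompt_cache": mode.value}
```

None of these modes affects the turn that is already in flight during the swap; they only change what later requests can reuse.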