
Kimi K2.5 is Moonshot AI’s flagship agentic model and a new SOTA open model. It unifies vision and text, thinking and non-thinking modes, and multi-agent execution into one model.
We are launching Day-0 support for Kimi K2.5. Fireworks offers the fastest endpoint for all Kimi K2 series models, as well as fine-tuning for Kimi K2 models. Additionally, we now offer a full-parameter RL tuning private preview for Kimi K2.5, enabling application builders to fine-tune the SOTA open-source VLM for use cases like vibe coding and agentic workflows. Sign up for the full-parameter RL tuning waitlist here.
Kimi K2.5 demonstrates that open-source models are now surpassing their closed-source counterparts. The chart details the benchmarks where Kimi K2.5 achieves SOTA results, including agentic benchmarks (HLE Full, BrowseComp, and Deepsearch) and vision benchmarks (OmniDocBench 1.5).

Below is an in-depth look at its core application areas, highlighting its advanced capabilities across multimodal agent use cases.

Kimi K2.5 is a multimodal model supporting image and video understanding. For text-only processing, developers can use the Kimi K2 model series including Kimi K2 Thinking and Kimi K2 0905.
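Image understanding requests follow the familiar OpenAI-compatible chat format, where a user message carries both text and image parts. The sketch below builds such a request for Fireworks' chat completions endpoint; the model identifier and image URL are illustrative assumptions, so check the Fireworks model library for the exact Kimi K2.5 name.

```python
import json

# Fireworks' OpenAI-compatible inference endpoint.
FIREWORKS_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_vision_request(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-compatible chat request pairing text with an image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# Model name and image URL below are hypothetical placeholders.
payload = build_vision_request(
    "accounts/fireworks/models/kimi-k2p5",
    "Describe the chart in this screenshot.",
    "https://example.com/chart.png",
)
print(json.dumps(payload, indent=2))
```

The same payload works for a text-only Kimi K2 model by dropping the `image_url` part and passing a plain string as the message content.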
Fireworks now supports full-parameter RL tuning for Kimi K2.5 in private preview, allowing product developers to customize the model to their specific product use cases and exceed the quality of closed models. For companies already tuning models with LoRA, full-parameter fine-tuning offers an additional lever to get the best model quality. Sign up for the waitlist here.
We are launching full parameter RL fine-tuning for Kimi K2.5 with Tinker API compatibility. Researchers get low-level compute primitives—forward, forward_backward, optimizer_step, save_weight—while we handle the distributed training infrastructure.
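The primitives above compose into an ordinary training loop. The sketch below is purely illustrative: the `TrainerClient` stub stands in for a Tinker-compatible client (actual method signatures may differ), and the reward-averaging loss is a placeholder for a real RL objective.

```python
from dataclasses import dataclass, field

@dataclass
class TrainerClient:
    """Illustrative stand-in for a Tinker-compatible training client.

    The real API exposes these primitives over managed distributed
    infrastructure; this stub only tracks state so the loop is runnable.
    """
    step: int = 0
    checkpoints: list = field(default_factory=list)

    def forward(self, batch):
        # Forward pass only, e.g. to score sampled rollouts.
        return {"logprobs": [0.0] * len(batch)}

    def forward_backward(self, batch, loss_fn):
        # Compute the loss and accumulate gradients on remote workers.
        return {"loss": loss_fn(batch)}

    def optimizer_step(self):
        # Apply the accumulated gradients.
        self.step += 1

    def save_weight(self, name):
        # Persist a named checkpoint that inference can hot-load.
        self.checkpoints.append(name)

def rl_loss(batch):
    # Placeholder for a policy-gradient style objective over rollouts.
    return sum(reward for _, reward in batch) / len(batch)

client = TrainerClient()
rollouts = [("prompt-1", 0.5), ("prompt-2", 1.0)]  # (sample, reward) pairs

for epoch in range(3):
    client.forward_backward(rollouts, rl_loss)
    client.optimizer_step()
client.save_weight("kimi-k2p5-rl-epoch3")  # hypothetical checkpoint name

print(client.step, client.checkpoints)
```

Because the optimizer step is decoupled from the backward pass, callers can accumulate gradients over several batches before stepping, which is the main point of exposing these low-level primitives.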
Global teams can train in one region and hot-load checkpoints into inference deployments elsewhere without managing cross-region weight transfers manually, allowing them to scale up their training to saturate the whole data center. The Fireworks RL trainer + model format handles all the serialization, compression, and ledger bookkeeping.
Fireworks offers the fastest endpoint for all Kimi K2 series models. Data from Artificial Analysis' independent benchmarks shows that Fireworks consistently provides best-in-class performance for Moonshot's Kimi models, including Kimi K2 Thinking and Kimi K2 0905, outperforming the next-closest GPU inference provider by up to 60%. Faster speed is essential for real-time user experience, application productivity, and operational efficiency.


Fireworks provides 60% better performance with our proprietary customization engine than open-source frameworks like vLLM. With speculative decoding, Fireworks achieves up to 200 tokens/s on Kimi K2.5 VL. Stay tuned as our engineering team continues to optimize performance.

Fireworks enables users to control the reasoning behavior of the Kimi K2.5 model and inspect its reasoning history for greater transparency. Click here for more details.
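As a sketch of what that control looks like in practice, the example below builds a request with a reasoning-control field and extracts the reasoning history from a response. The `reasoning_effort` parameter and `reasoning_content` field names are assumptions for illustration; consult the linked Fireworks docs for the exact names used with Kimi K2.5.

```python
def build_reasoning_request(model: str, prompt: str, effort: str) -> dict:
    """Build a chat request that steers the model's reasoning behavior.

    The `reasoning_effort` field name is an assumption, not a confirmed
    Fireworks parameter for Kimi K2.5.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }

payload = build_reasoning_request(
    "accounts/fireworks/models/kimi-k2p5",  # hypothetical identifier
    "What is 2 + 2?",
    "low",
)

# A response shaped like a thinking model's output, where the chain of
# thought is returned alongside the final answer (field name assumed).
sample_response = {
    "choices": [{
        "message": {
            "content": "The answer is 4.",
            "reasoning_content": "2 + 2 = 4, so the answer is 4.",
        }
    }]
}

message = sample_response["choices"][0]["message"]
print(message["reasoning_content"])  # inspect the reasoning history
print(message["content"])            # final answer only
```

Keeping the reasoning trace separate from the final answer lets applications log or audit the model's thinking without surfacing it to end users.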