
Kimi K2 Thinking is the latest, most capable version of open-source thinking model. Starting with Kimi K2, it was built as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with 256k context window, achieving lossless reductions in inference latency and GPU memory usage.
ServerlessDocs | Immediately run model on pre-configured GPUs and pay-per-token |
On-demand DeploymentDocs | On-demand deployments give you dedicated GPUs for Kimi K2 Thinking using Fireworks' reliable, high-performance system with no rate limits. |
Run queries immediately, pay only for usage
Kimi K2 Thinking is the latest version of Moonshot AI's open-source “thinking model,” designed for advanced reasoning tasks. It interleaves step-by-step chain-of-thought reasoning with autonomous tool use, achieving strong performance across benchmarks like HLE, AIME25, and BrowseComp.
Kimi K2 Thinking is optimized for:
The maximum context length is 256k tokens.
The usable context is 256K tokens. The model maintains coherence across long sequences and supports up to 200–300 sequential tool-use steps without degradation.
Yes. Kimi K2 Thinking is natively trained for INT4 quantization using Quantization-Aware Training (QAT), enabling lossless performance and up to 2x faster generation.
The recommended default temperature is 1.0.
Fireworks allows a maximum output of 4096 tokens per completion by default.
Yes. It supports OpenAI-style function-calling, and developers must provide a list of tools for each request.
The model has 1 trillion total parameters, with 32 billion active parameters per forward pass using a Mixture-of-Experts architecture with 384 experts, 8 selected per token.
Tokens are metered as input (prompt) and output (completion) separately. Fireworks charges per 1M tokens input/output.
Kimi K2 Thinking is released under a Modified MIT License, permitting commercial use with some additional terms.