GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/Moonshot AI/Kimi K2 Thinking
mooshot

Kimi K2 Thinking

Ready
model path:accounts/fireworks/models/kimi-k2-thinking

Kimi K2 Thinking is the latest, most capable version of open-source thinking model. Starting with Kimi K2, it was built as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with 256k context window, achieving lossless reductions in inference latency and GPU memory usage.

Kimi K2 Thinking API Features

Fine-tuning

Docs

Kimi K2 Thinking can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use Kimi K2 Thinking on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Kimi K2 Thinking FAQs

What is Kimi K2 Thinking and who developed it?

Kimi K2 Thinking is the latest version of Moonshot AI's open-source “thinking model,” designed for advanced reasoning tasks. It interleaves step-by-step chain-of-thought reasoning with autonomous tool use, achieving strong performance across benchmarks like HLE, AIME25, and BrowseComp.

What applications and use cases does Kimi K2 Thinking excel at?

Kimi K2 Thinking is optimized for:

  • Agentic systems and tool-augmented reasoning
  • Coding (SWE-bench, LiveCodeBench, OJ-Bench)
  • Autonomous search (BrowseComp, FinSearchComp-T3)
  • Longform writing and conversational AI
  • Enterprise RAG and complex reasoning tasks like AIME25, GPQA
What is the maximum context length for Kimi K2 Thinking?

The maximum context length is 256k tokens.

What is the usable context window for Kimi K2 Thinking?

The usable context is 256K tokens. The model maintains coherence across long sequences and supports up to 200–300 sequential tool-use steps without degradation.

Does Kimi K2 Thinking support quantized formats (4-bit/8-bit)?

Yes. Kimi K2 Thinking is natively trained for INT4 quantization using Quantization-Aware Training (QAT), enabling lossless performance and up to 2x faster generation.

What is the default temperature of Kimi K2 Thinking on Fireworks AI?

The recommended default temperature is 1.0.

What is the maximum output length Fireworks allows for Kimi K2 Thinking?

Fireworks allows a maximum output of 4096 tokens per completion by default.

Does Kimi K2 Thinking support function-calling schemas?

Yes. It supports OpenAI-style function-calling, and developers must provide a list of tools for each request.

How many parameters does Kimi K2 Thinking have?

The model has 1 trillion total parameters, with 32 billion active parameters per forward pass using a Mixture-of-Experts architecture with 384 experts, 8 selected per token.

How are tokens counted (prompt vs completion)?

Tokens are metered as input (prompt) and output (completion) separately. Fireworks charges per 1M tokens input/output.

What license governs commercial use of Kimi K2 Thinking?

Kimi K2 Thinking is released under a Modified MIT License, permitting commercial use with some additional terms.

Metadata

State
Ready
Created on
11/6/2025
Kind
Base model
Provider
Moonshot AI

Specification

Calibrated
Yes
Mixture-of-Experts
Yes
Parameters
1.02T

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
262k tokens
Function Calling
Supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported