Kimi K2 Thinking API & Playground

What is Kimi K2 Thinking and who developed it?

Kimi K2 Thinking is the latest version of Moonshot AI's open-source “thinking model,” designed for advanced reasoning tasks. It interleaves step-by-step chain-of-thought reasoning with autonomous tool use, achieving strong performance across benchmarks like HLE, AIME25, and BrowseComp.

What applications and use cases does Kimi K2 Thinking excel at?

Kimi K2 Thinking is optimized for:

Agentic systems and tool-augmented reasoning
Coding (SWE-bench, LiveCodeBench, OJ-Bench)
Autonomous search (BrowseComp, FinSearchComp-T3)
Longform writing and conversational AI
Enterprise RAG and complex reasoning tasks like AIME25, GPQA

What is the maximum context length for Kimi K2 Thinking?

The maximum context length is 256k tokens.

What is the usable context window for Kimi K2 Thinking?

The usable context is 256K tokens. The model maintains coherence across long sequences and supports up to 200–300 sequential tool-use steps without degradation.

Does Kimi K2 Thinking support quantized formats (4-bit/8-bit)?

Yes. Kimi K2 Thinking is natively trained for INT4 quantization using Quantization-Aware Training (QAT), enabling lossless performance and up to 2x faster generation.

What is the default temperature of Kimi K2 Thinking on Fireworks AI?

The recommended default temperature is 1.0.

What is the maximum output length Fireworks allows for Kimi K2 Thinking?

Fireworks allows a maximum output of 4096 tokens per completion by default.

Does Kimi K2 Thinking support function-calling schemas?

Yes. It supports OpenAI-style function-calling, and developers must provide a list of tools for each request.

How many parameters does Kimi K2 Thinking have?

The model has 1 trillion total parameters, with 32 billion active parameters per forward pass using a Mixture-of-Experts architecture with 384 experts, 8 selected per token.

How are tokens counted (prompt vs completion)?

Tokens are metered as input (prompt) and output (completion) separately. Fireworks charges per 1M tokens input/output.

What license governs commercial use of Kimi K2 Thinking?

Kimi K2 Thinking is released under a Modified MIT License, permitting commercial use with some additional terms.

Serverless Docs	Immediately run model on pre-configured GPUs and pay-per-token
On-demand Deployment Docs	On-demand deployments give you dedicated GPUs for Kimi K2 Thinking using Fireworks' reliable, high-performance system with no rate limits.

Kimi K2 Thinking

Kimi K2 Thinking API Features

Serverless

On-demand Deployment

Available Serverless

Kimi K2 Thinking FAQs

Metadata

Specification

Supported Functionality