
Kimi K2 Thinking (Moonshot AI)
fireworks/kimi-k2-thinking

    Kimi K2 Thinking is Moonshot AI's latest and most capable open-source thinking model. Built on Kimi K2, it is a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state of the art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool use across 200–300 sequential calls. K2 Thinking is also natively quantized to INT4 and offers a 256K-token context window, cutting inference latency and GPU memory usage without quality loss.

    Kimi K2 Thinking API Features

    Serverless

    Docs

    Immediately run the model on pre-configured GPUs and pay per token

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for Kimi K2 Thinking using Fireworks' reliable, high-performance system with no rate limits.

    Available Serverless

    Run queries immediately, pay only for usage

    $0.60 input / $2.50 output per 1M tokens
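
    A minimal sketch of a serverless query, assuming Fireworks' OpenAI-compatible chat completions endpoint at api.fireworks.ai/inference/v1 and the full model path accounts/fireworks/models/kimi-k2-thinking (verify both in the Fireworks docs):

        # Sketch: query Kimi K2 Thinking serverless via the OpenAI-compatible client.
        # Endpoint and model path are assumptions; confirm them in the Fireworks docs.
        from openai import OpenAI

        client = OpenAI(
            base_url="https://api.fireworks.ai/inference/v1",
            api_key="FIREWORKS_API_KEY",  # replace with your Fireworks API key
        )

        response = client.chat.completions.create(
            model="accounts/fireworks/models/kimi-k2-thinking",  # assumed model path
            messages=[{"role": "user", "content": "Explain INT4 quantization in two sentences."}],
        )
        print(response.choices[0].message.content)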

    Kimi K2 Thinking FAQs

    What is Kimi K2 Thinking and who developed it?

    Kimi K2 Thinking is the latest version of Moonshot AI's open-source “thinking model,” designed for advanced reasoning tasks. It interleaves step-by-step chain-of-thought reasoning with autonomous tool use, achieving strong performance across benchmarks like HLE, AIME25, and BrowseComp.

    What applications and use cases does Kimi K2 Thinking excel at?

    Kimi K2 Thinking is optimized for:

    • Agentic systems and tool-augmented reasoning
    • Coding (SWE-bench, LiveCodeBench, OJ-Bench)
    • Autonomous search (BrowseComp, FinSearchComp-T3)
    • Long-form writing and conversational AI
    • Enterprise RAG and complex reasoning tasks such as AIME25 and GPQA

    What is the maximum context length for Kimi K2 Thinking?

    The maximum context length is 256K tokens.

    What is the usable context window for Kimi K2 Thinking?

    The usable context is 256K tokens. The model maintains coherence across long sequences and supports up to 200–300 sequential tool-use steps without degradation.

    Does Kimi K2 Thinking support quantized formats (4-bit/8-bit)?

    Yes. Kimi K2 Thinking is natively trained for INT4 quantization using Quantization-Aware Training (QAT), preserving quality (lossless) while delivering up to 2x faster generation.

    What is the default temperature of Kimi K2 Thinking on Fireworks AI?

    The recommended default temperature is 1.0.

    What is the maximum output length Fireworks allows for Kimi K2 Thinking?

    Fireworks allows a maximum output of 4096 tokens per completion by default.
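
    A minimal sketch of setting these per-request values explicitly, under the same endpoint and model-path assumptions as the example above:

        # Sketch: set temperature and the output cap explicitly on a request.
        from openai import OpenAI

        client = OpenAI(
            base_url="https://api.fireworks.ai/inference/v1",  # assumed endpoint
            api_key="FIREWORKS_API_KEY",
        )

        response = client.chat.completions.create(
            model="accounts/fireworks/models/kimi-k2-thinking",  # assumed model path
            messages=[{"role": "user", "content": "Outline a plan to verify a proof step by step."}],
            temperature=1.0,   # recommended default for this model
            max_tokens=4096,   # Fireworks' default per-completion output cap
        )
        print(response.choices[0].message.content)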

    Does Kimi K2 Thinking support function-calling schemas?

    Yes. It supports OpenAI-style function calling; developers provide the list of available tools with each request.
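
    A hedged sketch of such a request, using a hypothetical get_weather tool and the same assumed endpoint and model path as above:

        # Sketch: OpenAI-style tool definition passed to Kimi K2 Thinking.
        from openai import OpenAI

        client = OpenAI(
            base_url="https://api.fireworks.ai/inference/v1",  # assumed endpoint
            api_key="FIREWORKS_API_KEY",
        )

        tools = [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }]

        response = client.chat.completions.create(
            model="accounts/fireworks/models/kimi-k2-thinking",  # assumed model path
            messages=[{"role": "user", "content": "What's the weather in Tokyo right now?"}],
            tools=tools,
        )
        print(response.choices[0].message.tool_calls)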

    How many parameters does Kimi K2 Thinking have?

    The model has 1 trillion total parameters, with 32 billion active parameters per forward pass using a Mixture-of-Experts architecture with 384 experts, 8 selected per token.

    How are tokens counted (prompt vs completion)?

    Tokens are metered as input (prompt) and output (completion) separately. Fireworks charges per 1M tokens input/output.
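
    For example, at the serverless rates listed above, a request with a 12,000-token prompt and a 3,000-token completion would cost roughly:

        # Illustrative cost arithmetic at the listed serverless rates.
        INPUT_RATE = 0.60 / 1_000_000   # dollars per input token
        OUTPUT_RATE = 2.50 / 1_000_000  # dollars per output token

        prompt_tokens = 12_000
        completion_tokens = 3_000

        cost = prompt_tokens * INPUT_RATE + completion_tokens * OUTPUT_RATE
        print(f"${cost:.4f}")  # -> $0.0147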

    What license governs commercial use of Kimi K2 Thinking?

    Kimi K2 Thinking is released under a Modified MIT License, permitting commercial use with some additional terms.

    Metadata

    State: Ready
    Created on: 11/6/2025
    Kind: Base model
    Provider: Moonshot AI
    Hugging Face: Kimi-K2-Thinking

    Specification

    Calibrated: Yes
    Mixture-of-Experts: Yes
    Parameters: 1T

    Supported Functionality

    Fine-tuning: Not supported
    Serverless: Supported
    Serverless LoRA: Not supported
    Context Length: N/A
    Function Calling: Supported
    Embeddings: Not supported
    Rerankers: Not supported
    Image input: Not supported