
/Z.ai/GLM-4.6
fireworks/glm-4p6

    As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.

    GLM-4.6 API Features

    Fine-tuning

    Docs

    GLM-4.6 can be customized with your data to improve responses. Fireworks uses LoRA to train and deploy your personalized model efficiently.
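    Fine-tuning jobs consume chat-formatted training data. A minimal sketch of preparing a JSONL dataset follows; the exact messages schema Fireworks expects is an assumption here, so check the fine-tuning docs before uploading.

```python
import json

# Hypothetical chat-format training examples (the "messages" schema is an
# assumption; consult the Fireworks fine-tuning docs for the exact format).
examples = [
    {"messages": [
        {"role": "user", "content": "Summarize: LoRA trains low-rank adapters."},
        {"role": "assistant", "content": "LoRA fine-tunes small adapter matrices instead of all model weights."},
    ]},
]

# Write one JSON object per line (JSONL).
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity-check: every line parses and carries a messages list.
with open("train.jsonl") as f:
    rows = [json.loads(line) for line in f]
assert all("messages" in r for r in rows)
```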

    Serverless

    Docs

    Immediately run the model on pre-configured GPUs and pay per token.

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for GLM-4.6 using Fireworks' reliable, high-performance system with no rate limits.
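    On either serverless or a dedicated deployment, the model is reachable through Fireworks' OpenAI-compatible chat completions endpoint. A sketch of assembling the request; the model identifier `accounts/fireworks/models/glm-4p6` is inferred from this page's slug and may differ, so verify it in the model library.

```python
import json
import os

# Fireworks' OpenAI-compatible inference endpoint.
BASE_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_request(prompt: str) -> dict:
    """Assemble a chat-completions payload for GLM-4.6 (model id assumed)."""
    return {
        "model": "accounts/fireworks/models/glm-4p6",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

payload = build_request("Write a haiku about fireworks.")
headers = {
    "Authorization": f"Bearer {os.environ.get('FIREWORKS_API_KEY', '')}",
    "Content-Type": "application/json",
}
# Actually sending the request requires a valid API key, e.g.:
# requests.post(BASE_URL, headers=headers, data=json.dumps(payload))
```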

    Available Serverless

    Run queries immediately, pay only for usage

    $0.55 / $2.19
    Per 1M Tokens (input/output)
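    At these rates, the cost of a request is straightforward to estimate:

```python
# Listed serverless rates, converted to dollars per token.
INPUT_PRICE = 0.55 / 1_000_000   # $ per input token
OUTPUT_PRICE = 2.19 / 1_000_000  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one serverless request at the listed rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# e.g. a 10k-token prompt with a 2k-token completion:
cost = request_cost(10_000, 2_000)  # 0.0055 + 0.00438 ≈ $0.00988
```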

    GLM-4.6 FAQs

    What is GLM-4.6 and who developed it?

    GLM-4.6 is the latest version in the GLM (General Language Model) series developed by Zhipu AI (Z.ai). It introduces enhancements in long-context reasoning, agentic behavior, code generation, and search capabilities. The model builds upon GLM-4.5, delivering improvements across multiple domains.

    What applications and use cases does GLM-4.6 excel at?

    GLM-4.6 is optimized for:

    • Code assistance
    • Conversational AI
    • Agentic systems
    • Search
    • Multimedia
    • Enterprise RAG (retrieval-augmented generation)

    What is the maximum context length for GLM-4.6?

    GLM-4.6 supports a context length of 202,752 tokens on Fireworks AI.

    What is the usable context window for GLM-4.6?

    Fireworks supports the full 202,752 tokens, but the model was benchmarked using up to 128K in evaluations.
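    A client can enforce this limit before sending a request. A rough sketch, where token counts come from the caller (e.g. a tokenizer):

```python
CONTEXT_LIMIT = 202_752  # tokens supported on Fireworks

def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """True if prompt plus requested completion fits in the context window."""
    return prompt_tokens + max_new_tokens <= CONTEXT_LIMIT

assert fits_context(128_000, 4_096)      # within the benchmarked 128K regime
assert not fits_context(200_000, 8_192)  # 208,192 > 202,752
```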

    Does GLM-4.6 support quantized formats (4-bit/8-bit)?

    GLM-4.6 fully supports quantization, including 4-bit and 8-bit formats.
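    Quantization mainly matters for memory footprint. A back-of-the-envelope estimate of weight storage at different bit widths, using the parameter count from the specification table below (real deployments add KV-cache and activation overhead on top):

```python
PARAMS = 352.8e9  # total parameters, per the specification table

def weight_gib(bits_per_param: float) -> float:
    """Approximate weight memory in GiB at a given quantization width."""
    return PARAMS * bits_per_param / 8 / 2**30

fp16 = weight_gib(16)  # ~657 GiB
int8 = weight_gib(8)   # ~329 GiB
int4 = weight_gib(4)   # ~164 GiB
```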

    How many parameters does GLM-4.6 have?

    GLM-4.6 is a Mixture-of-Experts model with approximately 353 billion total parameters (352.8B, per the specification table below).

    What rate limits apply on the shared endpoint?

    On the shared serverless endpoint, Fireworks' standard account rate limits apply. Deploying GLM-4.6 on-demand gives you dedicated GPU infrastructure with no rate limits.

    What license governs commercial use of GLM-4.6?

    GLM-4.6 is released under the MIT License, allowing commercial use.

    Metadata

    State
    Ready
    Created on
    10/1/2025
    Kind
    Base model
    Provider
    Z.ai
    Hugging Face
    GLM-4.6

    Specification

    Calibrated
    Yes
    Mixture-of-Experts
    Yes
    Parameters
    352.8B

    Supported Functionality

    Fine-tuning
    Supported
    Serverless
    Supported
    Serverless LoRA
    Not supported
    Context Length
    202.8k tokens
    Function Calling
    Supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Not supported
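    Since the table lists function calling as supported, a request can attach an OpenAI-style `tools` array, which Fireworks' chat API mirrors. A minimal sketch; the model id is inferred from this page's slug and the `get_weather` tool is purely illustrative.

```python
def build_tool_request(prompt: str) -> dict:
    """Chat payload with one illustrative tool definition attached."""
    return {
        "model": "accounts/fireworks/models/glm-4p6",  # id inferred from the page slug
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical example tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

req = build_tool_request("What's the weather in Paris?")
```

If the model decides to call the tool, the response carries a `tool_calls` entry whose arguments the client executes and feeds back as a `tool`-role message.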