Fireworks RFT now available! Fine-tune open models that outperform frontier models. Try today

Model Library
/Z.ai/GLM-4.5-Air
fireworks/glm-4p5-air

    The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.

    GLM-4.5-Air API Features

    Fine-tuning

    Docs

    GLM-4.5-Air can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for GLM-4.5-Air using Fireworks' reliable, high-performance system with no rate limits.

    GLM-4.5-Air FAQs

    What is GLM-4.5-Air and who developed it?

    GLM-4.5-Air is a compact, open-source large language model developed by Zhipu AI. It is part of the GLM-4.5 family, optimized for intelligent agent applications. GLM-4.5-Air features 106 billion total parameters and 12 billion active parameters and supports hybrid reasoning with two execution modes: "thinking" (for complex tasks) and "non-thinking" (for fast responses).

    What applications and use cases does GLM-4.5-Air excel at?

    GLM-4.5-Air is designed for:

    • Conversational AI
    • Reasoning-intensive tasks
    • Agentic system operations
    • Code generation and tool use

    Its hybrid reasoning capabilities make it suitable for intelligent agent environments and real-world task planning.

    What is the maximum context length for GLM-4.5-Air?

    The maximum context length for GLM-4.5-Air is 131,072 tokens (131.1k).

    Does GLM-4.5-Air support quantized formats (4-bit/8-bit)?

    Yes. The model lists 52 quantized variants, including 4-bit and 8-bit for efficient inference.

    How many parameters does GLM-4.5-Air have?

    GLM-4.5-Air has 106 billion total parameters and 12 billion active parameters. It is a dense model that does not use a Mixture-of-Experts (MoE) architecture.

    Is fine-tuning supported for GLM-4.5-Air?

    No, fine-tuning is not supported on Fireworks.

    What rate limits apply on the shared endpoint?

    GLM-4.5-Air is available on both serverless (pay-per-token at $0.22 per 1M input tokens and $0.88 per 1M output tokens) and on-demand deployments with no rate limits.

    What license governs commercial use of GLM-4.5-Air?

    GLM-4.5-Air is released under the MIT license, which allows commercial use and secondary development.

    Metadata

    State
    Ready
    Created on
    8/1/2025
    Kind
    Base model
    Provider
    Z.ai
    Hugging Face
    GLM-4.5-Air

    Specification

    Calibrated
    No
    Mixture-of-Experts
    Yes
    Parameters
    106.9B

    Supported Functionality

    Fine-tuning
    Supported
    Serverless
    Not supported
    Serverless LoRA
    Not supported
    Context Length
    131.1k tokens
    Function Calling
    Supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Not supported