GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/Z.ai/GLM-4.5-Air
model path:accounts/fireworks/models/glm-4p5-air

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.

GLM-4.5-Air API Features

Fine-tuning

Docs

GLM-4.5-Air can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use GLM-4.5-Air on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

GLM-4.5-Air FAQs

What is GLM-4.5-Air and who developed it?

GLM-4.5-Air is a compact, open-source large language model developed by Zhipu AI. It is part of the GLM-4.5 family, optimized for intelligent agent applications. GLM-4.5-Air features 106 billion total parameters and 12 billion active parameters and supports hybrid reasoning with two execution modes: "thinking" (for complex tasks) and "non-thinking" (for fast responses).

What applications and use cases does GLM-4.5-Air excel at?

GLM-4.5-Air is designed for:

  • Conversational AI
  • Reasoning-intensive tasks
  • Agentic system operations
  • Code generation and tool use

Its hybrid reasoning capabilities make it suitable for intelligent agent environments and real-world task planning.

What is the maximum context length for GLM-4.5-Air?

The maximum context length for GLM-4.5-Air is 131,072 tokens (131.1k).

Does GLM-4.5-Air support quantized formats (4-bit/8-bit)?

Yes. The model lists 52 quantized variants, including 4-bit and 8-bit for efficient inference.

How many parameters does GLM-4.5-Air have?

GLM-4.5-Air has 106 billion total parameters and 12 billion active parameters. It is a dense model that does not use a Mixture-of-Experts (MoE) architecture.

Is fine-tuning supported for GLM-4.5-Air?

No, fine-tuning is not supported on Fireworks.

What rate limits apply on the shared endpoint?

GLM-4.5-Air is available on both serverless (pay-per-token at $0.22 per 1M input tokens and $0.88 per 1M output tokens) and on-demand deployments with no rate limits.

What license governs commercial use of GLM-4.5-Air?

GLM-4.5-Air is released under the MIT license, which allows commercial use and secondary development.

Metadata

State
Ready
Created on
8/1/2025
Kind
Base model
Provider
Z.ai

Specification

Calibrated
No
Mixture-of-Experts
Yes
Parameters
106B

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
131k tokens
Function Calling
Supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported