GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/Z.ai/GLM-4.5
model path:accounts/fireworks/models/glm-4p5

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.

GLM-4.5 API Features

Fine-tuning

Docs

GLM-4.5 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use GLM-4.5 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

GLM-4.5 FAQs

What is GLM-4.5 and who developed it?

GLM-4.5 is a hybrid MoE large language model developed by Zhipu AI. It is designed for reasoning, coding, and agentic tasks. Fireworks AI offers serverless and on-demand inference for the model.

What applications and use cases does GLM-4.5 excel at?

GLM-4.5 is optimized for:

  • Agentic applications (e.g., function calling, web browsing)
  • Advanced reasoning (math, logic, science)
  • Coding tasks (project creation, bug fixing, agentic coding)
  • Full-stack development and presentation generation with tool integration.
What is the maximum context length for GLM-4.5?

The maximum context length is 131,072 tokens.

What is the usable context window for GLM-4.5?

The full 128K context length is usable, given the appropriate hardware configuration (e.g., H100x32 or H200x16).

Does GLM-4.5 support quantized formats (4-bit/8-bit)?

Yes. GLM-4.5 is available in FP8 versions, suitable for inference-optimized deployments.

What are known failure modes of GLM-4.5?

While not explicitly known failure modes, the model shows lower performance on:

  • HLE (14.4%) - reasoning under human-like evaluation
  • BrowseComp (26.4%) - complex web search tasks
Does GLM-4.5 support streaming responses and function-calling schemas?

Yes. GLM-4.5 supports native function calling and OpenAI-style tool schemas. It also supports speculative decoding and streaming via SGLang and vLLM.

How many parameters does GLM-4.5 have?

GLM-4.5 has 355B total parameters, with 32B active parameters per forward pass.

What license governs commercial use of GLM-4.5?

GLM-4.5 is released under the MIT license, allowing commercial and derivative use.

Metadata

State
Ready
Created on
7/29/2025
Kind
Base model
Provider
Z.ai
Hugging Face
zai-org/GLM-4.5

Specification

Calibrated
Yes
Mixture-of-Experts
Yes
Parameters
352B

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
131k tokens
Function Calling
Supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported