GLM-5

GLM-5 is Z.ai's SOTA model targeting complex systems engineering and long-horizon agentic tasks. It uses a mixture of experts architecture, so it only activates 40 billion of its 744 billion parameters. This model uses Deepseek Sparse Attention to select only the most relevant tokens for attention, reducing the cost of long-context processing. GLM-5 continues improving on top of GLM-4.7 for coding and agentic use cases, and it's also great for document generation for enterprise workloads.

GLM-5 API Features

Serverless Docs	Immediately run model on pre-configured GPUs and pay-per-token
On-demand Deployment Docs	On-demand deployments give you dedicated GPUs for GLM-5 using Fireworks' reliable, high-performance system with no rate limits.

Available Serverless

Run queries immediately, pay only for usage

$1.00 / $0.20 / $3.20

Per 1M Tokens (input/cached input/output)

Metadata

State

Ready

Created on

2/11/2026

Kind

Base model

Provider

N/A

Hugging Face

GLM-5

Specification

Calibrated

Mixture-of-Experts

Parameters

700B

Supported Functionality

Fine-tuning

Not supported

Serverless

Supported

Context Length

202.8k tokens

Function Calling

Supported

Embeddings

Not supported

Rerankers

Not supported

Support image input

Not supported

GLM-5

GLM-5 API Features

Serverless

On-demand Deployment

Available Serverless

Metadata

Specification

Supported Functionality