Qwen 3.7 Plus is now available on Serverless, exclusively on Fireworks. Try it today.

Model Library
/Z.ai/GLM-5
model path:accounts/fireworks/models/glm-5

GLM-5 is Z.ai's SOTA model targeting complex systems engineering and long-horizon agentic tasks. It uses a mixture of experts architecture, so it only activates 40 billion of its 744 billion parameters. This model uses Deepseek Sparse Attention to select only the most relevant tokens for attention, reducing the cost of long-context processing. GLM-5 continues improving on top of GLM-4.7 for coding and agentic use cases, and it's also great for document generation for enterprise workloads.

GLM-5 API Features

On-demand Deployment

Docs

On-demand deployments allow you to use GLM-5 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Metadata

State
Ready
Created on
2/11/2026
Kind
Base model
Provider
Z.ai
Hugging Face
zai-org/GLM-5

Specification

Calibrated
No
Mixture-of-Experts
Yes
Parameters
743B

Supported Functionality

Fine-tuning
Not supported
Serverless
Not supported
Context Length
202k tokens
Function Calling
Supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported