GLM-4.5V is based on ZhipuAI’s next-generation flagship text foundation model, GLM-4.5-Air (106B total parameters, 12B active). It continues the technical approach of GLM-4.1V-Thinking and achieves SOTA performance among models of the same scale on 42 public vision-language benchmarks, covering common tasks such as image, video, and document understanding, as well as GUI agent operations.
On-demand Deployment (Docs): On-demand deployments give you dedicated GPUs for GLM-4.5V using Fireworks' reliable, high-performance system with no rate limits.
GLM-4.5V is a vision-language model developed by ZhipuAI (Z.ai). Built on the GLM-4.5-Air architecture (106B total parameters, 12B active), it continues the GLM-4.1V technical lineage, achieves state-of-the-art (SOTA) results across 42 public vision-language benchmarks, and supports image, video, and document understanding as well as GUI agent operations.
GLM-4.5V is designed for real-world multimodal reasoning and excels at the following (a usage sketch appears after the list):
- Image reasoning (scene understanding, complex multi-image analysis, spatial recognition)
- Video understanding (long-video segmentation and event recognition)
- GUI tasks (screen reading, icon recognition, desktop operation assistance)
- Complex chart and long-document parsing
- Grounding (precise localization of visual elements)
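As an illustration, here is a minimal sketch of sending an image to GLM-4.5V on Fireworks through the OpenAI-compatible chat completions endpoint. The model identifier `accounts/fireworks/models/glm-4p5v` and the image URL are assumptions for illustration; check the model page for the exact id.

```python
# Minimal sketch: image understanding with GLM-4.5V via Fireworks'
# OpenAI-compatible API. The model id below is an assumption; confirm
# the exact identifier on the Fireworks model page.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/glm-4p5v",  # assumed model id
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart and summarize its key trend."},
                {
                    "type": "image_url",
                    # Hypothetical image URL for illustration.
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```

The same message format extends to multi-image analysis and document screenshots; only the `content` list changes.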
The model supports a maximum context length of 131.1k tokens.
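To avoid silently exceeding that window on long documents, it can help to budget tokens before sending a request. Below is a rough sketch; the 4-characters-per-token heuristic is only an approximation, since GLM uses its own tokenizer and image inputs consume additional tokens.

```python
# Rough token budgeting against GLM-4.5V's 131.1k-token context window.
# The chars/4 heuristic only approximates the real tokenizer's counts.
MAX_CONTEXT_TOKENS = 131_072  # assumed exact value behind the "131.1k" figure
RESERVED_FOR_OUTPUT = 4_096   # leave headroom for the model's reply

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text."""
    return len(text) // 4 + 1

def fits_in_context(prompt: str) -> bool:
    """Check whether a prompt plausibly fits, keeping output headroom."""
    return estimate_tokens(prompt) <= MAX_CONTEXT_TOKENS - RESERVED_FOR_OUTPUT

if __name__ == "__main__":
    doc = "lorem ipsum " * 50_000  # placeholder long document
    print("fits:", fits_in_context(doc))
```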
Documented limitations include:
Fine-tuning is not supported for GLM-4.5V on Fireworks AI.
GLM-4.5V is released under the MIT License, and commercial use is permitted.