GLM-4.5V is based on ZhipuAI’s next-generation flagship text foundation model, GLM-4.5-Air (106B total parameters, 12B active). It continues the technical approach of GLM-4.1V-Thinking and achieves SOTA performance among models of the same scale on 42 public vision-language benchmarks, covering common tasks such as image, video, and document understanding, as well as GUI agent operations.
On-demand Deployment (Docs): On-demand deployments give you dedicated GPUs for GLM-4.5V using Fireworks' reliable, high-performance system with no rate limits.
GLM-4.5V is a vision-language model developed by ZhipuAI (Z.ai). Built on the GLM-4.5-Air architecture (106B total parameters, 12B active), it continues the GLM-4.1V technical lineage, achieves state-of-the-art (SOTA) results across 42 public vision-language benchmarks, and supports image, video, and document understanding as well as GUI agent operations.
GLM-4.5V is designed for real-world multimodal reasoning and excels at the following (a usage sketch appears after the list):
- Image reasoning (scene understanding, complex multi-image analysis, spatial recognition)
- Video understanding (long-video segmentation and event recognition)
- GUI tasks (screen reading, icon recognition, desktop operation assistance)
- Complex chart and long-document parsing
- Grounding (precise localization of visual elements)
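As an illustration, here is a minimal sketch of sending an image to GLM-4.5V on Fireworks through the OpenAI-compatible chat completions endpoint. The model identifier `accounts/fireworks/models/glm-4p5v` and the image URL are assumptions for illustration; check the model page for the exact id.

```python
# Minimal sketch: image understanding with GLM-4.5V via Fireworks'
# OpenAI-compatible API. The model id below is an assumption; confirm
# the exact identifier on the Fireworks model page.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/glm-4p5v",  # assumed model id
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart and summarize its key trend."},
                {
                    "type": "image_url",
                    # Hypothetical image URL for illustration.
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```

The same message format extends to multi-image analysis and document screenshots; only the `content` list changes.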
The model supports a maximum context length of 131.1k tokens.
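To avoid silently exceeding that window on long documents, it can help to budget tokens before sending a request. Below is a rough sketch; the 4-characters-per-token heuristic is only an approximation, since GLM uses its own tokenizer and image inputs consume additional tokens.

```python
# Rough token budgeting against GLM-4.5V's 131.1k-token context window.
# The chars/4 heuristic only approximates the real tokenizer's counts.
MAX_CONTEXT_TOKENS = 131_072  # assumed exact value behind the "131.1k" figure
RESERVED_FOR_OUTPUT = 4_096   # leave headroom for the model's reply

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text."""
    return len(text) // 4 + 1

def fits_in_context(prompt: str) -> bool:
    """Check whether a prompt plausibly fits, keeping output headroom."""
    return estimate_tokens(prompt) <= MAX_CONTEXT_TOKENS - RESERVED_FOR_OUTPUT

if __name__ == "__main__":
    doc = "lorem ipsum " * 50_000  # placeholder long document
    print("fits:", fits_in_context(doc))
```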
Documented limitations include:
Fine-tuning is not supported for GLM-4.5V on Fireworks AI.
GLM-4.5V is released under the MIT License, and commercial use is permitted.