The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
Fine-tuning (Docs): GLM-4.5 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
Serverless (Docs): Immediately run the model on pre-configured GPUs and pay per token.
On-demand deployment (Docs): On-demand deployments give you dedicated GPUs for GLM-4.5 using Fireworks' reliable, high-performance system, with no rate limits.
GLM-4.5 is a hybrid MoE large language model developed by Zhipu AI. It is designed for reasoning, coding, and agentic tasks. Fireworks AI offers serverless and on-demand inference for the model.
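As a sketch of serverless usage, the request below targets Fireworks' OpenAI-compatible chat completions endpoint. The model slug `accounts/fireworks/models/glm-4p5` is an assumption; check the model page for the exact identifier. The call is only sent if a `FIREWORKS_API_KEY` environment variable is set.

```python
import json
import os
import urllib.request  # stdlib only; any HTTP client works

# Assumed model slug on Fireworks; verify against the model page.
MODEL = "accounts/fireworks/models/glm-4p5"
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Summarize MoE models in one sentence."}
    ],
    "max_tokens": 256,
}

api_key = os.environ.get("FIREWORKS_API_KEY")
if api_key:  # skip the network call when no key is configured
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Billing is per token, so `max_tokens` bounds the worst-case cost of a single request.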
GLM-4.5 is optimized for reasoning, coding, and agentic (tool-use) workloads.
The maximum context length is 131,072 tokens (128K). The full context window is usable given an appropriate hardware configuration (e.g., H100x32 or H200x16).
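A minimal pre-flight check against the 131,072-token window can be sketched as below. The ~4 characters/token ratio is a rough heuristic for English text, not the model's actual tokenizer, so treat the result as an estimate only.

```python
MAX_CONTEXT = 131_072  # GLM-4.5 context window, in tokens

def fits_in_context(prompt: str, max_new_tokens: int,
                    chars_per_token: float = 4.0) -> bool:
    """Rough check that prompt + generation fit in the context window.

    Uses a ~4 chars/token heuristic; for exact counts, run the model's
    tokenizer on the prompt instead.
    """
    est_prompt_tokens = len(prompt) / chars_per_token
    return est_prompt_tokens + max_new_tokens <= MAX_CONTEXT

print(fits_in_context("hello " * 1000, max_new_tokens=1024))  # True
```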
Yes. GLM-4.5 is available in FP8 versions, suitable for inference-optimized deployments.
While no failure modes are explicitly documented, the model shows lower performance on some task categories.
Yes. GLM-4.5 supports native function calling and OpenAI-style tool schemas. It also supports speculative decoding and streaming via SGLang and vLLM.
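An OpenAI-style tool schema, as referenced above, might look like the following. The `get_weather` tool and the model slug are hypothetical placeholders for illustration.

```python
# OpenAI-style tool schema for GLM-4.5's native function calling.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool name
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

request_body = {
    "model": "accounts/fireworks/models/glm-4p5",  # assumed slug
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call a tool
    "stream": False,        # set True for token-by-token streaming
}
```

When the model decides to call a tool, the response carries a `tool_calls` entry with the function name and JSON-encoded arguments, which your application executes and feeds back as a `tool` role message.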
GLM-4.5 has 355B total parameters, with 32B active parameters per forward pass.
GLM-4.5 is released under the MIT license, allowing commercial and derivative use.