As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.
Fine-tuningDocs | GLM-4.6 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model |
ServerlessDocs | Immediately run model on pre-configured GPUs and pay-per-token |
On-demand DeploymentDocs | On-demand deployments give you dedicated GPUs for GLM-4.6 using Fireworks' reliable, high-performance system with no rate limits. |
Run queries immediately, pay only for usage
GLM-4.6 is the latest version in the GLM (General Language Model) series developed by Zhipu AI (Z.ai). It introduces enhancements in long-context reasoning, agentic behavior, code generation, and search capabilities. The model builds upon GLM-4.5, delivering improvements across multiple domains.
GLM-4.6 is optimized for:
GLM-4.6 supports a context length of 202,752 tokens on Fireworks AI.
Fireworks supports the full 202,752 tokens, but the model was benchmarked using up to 128K in evaluations.
GLM-4.6 fully supports quantization, including 4-bit and 8-bit formats.
GLM-4.6 has 357 billion parameters.
GLM-4.6 is released under the MIT License, allowing commercial use.