
/Z.ai/GLM-4.6
fireworks/glm-4p6

    As the latest iteration in the GLM series, GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.

    GLM-4.6 API Features

    Fine-tuning

    Docs

    GLM-4.6 can be customized with your data to improve responses. Fireworks uses LoRA to train and deploy your personalized model efficiently.
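    Fine-tuning jobs consume chat-formatted training data. A minimal sketch of preparing a JSONL dataset follows; the exact messages schema Fireworks expects is an assumption here, so check the fine-tuning docs before uploading.

```python
import json

# Hypothetical chat-format training examples (the "messages" schema is an
# assumption; consult the Fireworks fine-tuning docs for the exact format).
examples = [
    {"messages": [
        {"role": "user", "content": "Summarize: LoRA trains low-rank adapters."},
        {"role": "assistant", "content": "LoRA fine-tunes small adapter matrices instead of all model weights."},
    ]},
]

# Write one JSON object per line (JSONL).
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity-check: every line parses and carries a messages list.
with open("train.jsonl") as f:
    rows = [json.loads(line) for line in f]
assert all("messages" in r for r in rows)
```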

    Serverless

    Docs

    Immediately run the model on pre-configured GPUs and pay per token.

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for GLM-4.6 using Fireworks' reliable, high-performance system with no rate limits.
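    On either serverless or a dedicated deployment, the model is reachable through Fireworks' OpenAI-compatible chat completions endpoint. A sketch of assembling the request; the model identifier `accounts/fireworks/models/glm-4p6` is inferred from this page's slug and may differ, so verify it in the model library.

```python
import json
import os

# Fireworks' OpenAI-compatible inference endpoint.
BASE_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_request(prompt: str) -> dict:
    """Assemble a chat-completions payload for GLM-4.6 (model id assumed)."""
    return {
        "model": "accounts/fireworks/models/glm-4p6",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

payload = build_request("Write a haiku about fireworks.")
headers = {
    "Authorization": f"Bearer {os.environ.get('FIREWORKS_API_KEY', '')}",
    "Content-Type": "application/json",
}
# Actually sending the request requires a valid API key, e.g.:
# requests.post(BASE_URL, headers=headers, data=json.dumps(payload))
```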

    Available Serverless

    Run queries immediately, pay only for usage

    $0.55 / $2.19
    Per 1M Tokens (input/output)
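    At these rates, the cost of a request is straightforward to estimate:

```python
# Listed serverless rates, converted to dollars per token.
INPUT_PRICE = 0.55 / 1_000_000   # $ per input token
OUTPUT_PRICE = 2.19 / 1_000_000  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one serverless request at the listed rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# e.g. a 10k-token prompt with a 2k-token completion:
cost = request_cost(10_000, 2_000)  # 0.0055 + 0.00438 ≈ $0.00988
```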

    GLM-4.6 FAQs

    What is GLM-4.6 and who developed it?

    GLM-4.6 is the latest version in the GLM (General Language Model) series developed by Zhipu AI (Z.ai). It introduces enhancements in long-context reasoning, agentic behavior, code generation, and search capabilities. The model builds upon GLM-4.5, delivering improvements across multiple domains.

    What applications and use cases does GLM-4.6 excel at?

    GLM-4.6 is optimized for:

    • Code assistance
    • Conversational AI
    • Agentic systems
    • Search
    • Multimedia
    • Enterprise RAG (retrieval-augmented generation)

    What is the maximum context length for GLM-4.6?

    GLM-4.6 supports a context length of 202,752 tokens on Fireworks AI.

    What is the usable context window for GLM-4.6?

    Fireworks supports the full 202,752 tokens, but the model was benchmarked using up to 128K in evaluations.
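    A client can enforce this limit before sending a request. A rough sketch, where token counts come from the caller (e.g. a tokenizer):

```python
CONTEXT_LIMIT = 202_752  # tokens supported on Fireworks

def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """True if prompt plus requested completion fits in the context window."""
    return prompt_tokens + max_new_tokens <= CONTEXT_LIMIT

assert fits_context(128_000, 4_096)      # within the benchmarked 128K regime
assert not fits_context(200_000, 8_192)  # 208,192 > 202,752
```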

    Does GLM-4.6 support quantized formats (4-bit/8-bit)?

    GLM-4.6 fully supports quantization, including 4-bit and 8-bit formats.
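    Quantization mainly matters for memory footprint. A back-of-the-envelope estimate of weight storage at different bit widths, using the parameter count from the specification table below (real deployments add KV-cache and activation overhead on top):

```python
PARAMS = 352.8e9  # total parameters, per the specification table

def weight_gib(bits_per_param: float) -> float:
    """Approximate weight memory in GiB at a given quantization width."""
    return PARAMS * bits_per_param / 8 / 2**30

fp16 = weight_gib(16)  # ~657 GiB
int8 = weight_gib(8)   # ~329 GiB
int4 = weight_gib(4)   # ~164 GiB
```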

    How many parameters does GLM-4.6 have?

    GLM-4.6 is a Mixture-of-Experts model with approximately 353 billion total parameters (352.8B, per the specification table below).

    What rate limits apply on the shared endpoint?

    On the shared serverless endpoint, Fireworks' standard account rate limits apply. Deploying GLM-4.6 on-demand gives you dedicated GPU infrastructure with no rate limits.

    What license governs commercial use of GLM-4.6?

    GLM-4.6 is released under the MIT License, allowing commercial use.

    Metadata

    State
    Ready
    Created on
    10/1/2025
    Kind
    Base model
    Provider
    Z.ai
    Hugging Face
    GLM-4.6

    Specification

    Calibrated
    Yes
    Mixture-of-Experts
    Yes
    Parameters
    352.8B

    Supported Functionality

    Fine-tuning
    Supported
    Serverless
    Supported
    Serverless LoRA
    Not supported
    Context Length
    202.8k tokens
    Function Calling
    Supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Not supported
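    Since the table lists function calling as supported, a request can attach an OpenAI-style `tools` array, which Fireworks' chat API mirrors. A minimal sketch; the model id is inferred from this page's slug and the `get_weather` tool is purely illustrative.

```python
def build_tool_request(prompt: str) -> dict:
    """Chat payload with one illustrative tool definition attached."""
    return {
        "model": "accounts/fireworks/models/glm-4p6",  # id inferred from the page slug
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical example tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

req = build_tool_request("What's the weather in Paris?")
```

If the model decides to call the tool, the response carries a `tool_calls` entry whose arguments the client executes and feeds back as a `tool`-role message.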