GLM 5.2 API & Playground

GLM-5.2 introduces a robust 1M-token context and advanced, multi-effort coding capabilities to significantly enhance performance on long-horizon tasks. Its new IndexShare architecture and improved MTP layer simultaneously boost efficiency by reducing per-token FLOPs and increasing speculative decoding lengths.

GLM 5.2 API Features

Serverless Docs	GLM 5.2 is available via Fireworks' serverless API, where you pay per token. There are several ways to call the Fireworks API, including Fireworks' Python client, the REST API, or OpenAI's Python client.
On-demand Deployment Docs	On-demand deployments allow you to use GLM 5.2 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.
Fine-tuning Docs	GLM 5.2 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

Available Serverless

Run queries immediately, pay only for usage

$1.40 / $0.14 / $4.40

Per 1M Tokens (input/cached input/output)

Metadata

State

Ready

Created on

6/16/2026

Kind

Base model

Provider

Z.ai

Hugging Face

zai-org/GLM-5.2

Specification

Calibrated

Mixture-of-Experts

Yes

Parameters

743B

Supported Functionality

Fine-tuning

Supported

Serverless

Supported

Context Length

1040k tokens

Function Calling

Supported

Embeddings

Not supported

Rerankers

Not supported

Support image input

Not supported

GLM 5.2

GLM 5.2 API Features

Serverless

On-demand Deployment

Fine-tuning

Available Serverless

Metadata

Specification

Supported Functionality