
Qwen3 235B A22B
fireworks/qwen3-235b-a22b

    Latest state-of-the-art Qwen3 model: 235B total parameters with 22B active per forward pass

    Qwen3 235B A22B API Features

    Fine-tuning


    Qwen3 235B A22B can be customized with your data to improve responses. Fireworks uses LoRA to train and deploy your personalized model efficiently.

    Serverless


    Run the model immediately on pre-configured GPUs and pay per token.
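
A serverless pay-per-token call can be sketched as follows. This is a hedged sketch: Fireworks exposes an OpenAI-compatible chat completions endpoint, and the model id below is an assumption based on this page's `fireworks/qwen3-235b-a22b` slug — adjust it for your account.

```python
import json
import os
import urllib.request

# Hedged sketch of a serverless pay-per-token request against Fireworks'
# OpenAI-compatible chat completions endpoint. The model id is assumed
# from this page's slug and may differ for your account.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
MODEL_ID = "accounts/fireworks/models/qwen3-235b-a22b"

def build_payload(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble the JSON request body for one chat completion."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt: str) -> str:
    """Send the request; expects FIREWORKS_API_KEY in the environment."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```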

    On-demand Deployment


    On-demand deployments give you dedicated GPUs for Qwen3 235B A22B using Fireworks' reliable, high-performance system with no rate limits.

    Available on Serverless

    Run queries immediately, pay only for usage

    $0.22 / $0.88
    Per 1M Tokens (input/output)
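
At these rates, per-request cost is simple arithmetic; a small sketch:

```python
# Serverless pricing at the listed $0.22 (input) / $0.88 (output)
# rates per 1M tokens.
INPUT_RATE = 0.22 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.88 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the listed serverless rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
```

For example, a request with 10,000 input tokens and 2,000 output tokens costs about $0.0040.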

    Qwen3 235B A22B FAQs

    What is Qwen3-235B-A22B and who developed it?

    Qwen3-235B-A22B is a large Mixture-of-Experts (MoE) language model developed by Qwen (Alibaba Group). It is part of the Qwen3 series and includes 235 billion total parameters, with 22 billion active at inference time. The model features dual-mode reasoning (“thinking” and “non-thinking”), agent capabilities, and multilingual instruction following.

    What applications and use cases does Qwen3-235B-A22B excel at?

    Qwen3-235B-A22B is optimized for:

    • Code assistance
    • Conversational AI
    • Agentic systems and tool integration
    • Search
    • Multimedia
    • Enterprise RAG

    What is the maximum context length for Qwen3-235B-A22B?

    The model natively supports 32,768 tokens and can be extended to 131,072 tokens using YaRN scaling on Fireworks.
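
The YaRN extension corresponds to a `rope_scaling` override like the following (a sketch based on the Qwen3 model card; a factor of 4.0 scales the native 32,768-token window to 131,072):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Note the failure-modes entry below: applying a large factor to short-context workloads can degrade quality, so enable it only when long inputs are expected.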

    What is the usable context window for Qwen3-235B-A22B?

    Fireworks supports up to 131.1K tokens of usable context with YaRN scaling enabled.

    Does Qwen3-235B-A22B support quantized formats (4-bit/8-bit)?

    Yes, it supports 43 quantized variants, including 4-bit and 8-bit formats.

    What is the maximum output length Fireworks allows for Qwen3-235B-A22B?

    Recommended output length is 32,768 tokens, with support for up to 38,912 tokens in long-form benchmarking scenarios (e.g., math, programming).

    What are known failure modes of Qwen3-235B-A22B?

    • Endless repetition when using greedy decoding
    • Performance degradation with inappropriate rope_scaling on short contexts
    • Potential language mixing at high presence_penalty values
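
The repetition failure mode is tied to greedy decoding, which is why the Qwen3 model card recommends sampled decoding instead. A hedged sketch of request parameters, with values taken from that card's recommendation:

```python
# Suggested sampling settings to avoid the greedy-decoding repetition
# failure mode (values from the Qwen3 model card; adjust to taste).
RECOMMENDED_SAMPLING = {
    "temperature": 0.6,  # avoid temperature=0, i.e. greedy decoding
    "top_p": 0.95,
    "top_k": 20,
}
```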

    Does Qwen3-235B-A22B support streaming responses and function-calling schemas?

    Yes. Both streaming responses and function calling are supported on Fireworks; the model also supports these features in serving frameworks such as vLLM.
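
As an illustration, a streaming request carrying an OpenAI-style tool schema might be assembled like this. This is a sketch: `get_weather` is a purely hypothetical tool, and the model id is assumed from this page's slug.

```python
# Hedged sketch: a streaming chat request with one OpenAI-style
# function-calling tool attached. get_weather is illustrative only.
def build_tool_payload(prompt: str) -> dict:
    return {
        "model": "accounts/fireworks/models/qwen3-235b-a22b",  # assumed id
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # tokens arrive incrementally as server-sent events
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Look up current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }
```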

    How many parameters does Qwen3-235B-A22B have?

    Qwen3-235B-A22B has 235.1 billion total parameters with 22 billion active parameters.

    Is fine-tuning supported for Qwen3-235B-A22B?

    Yes. Fireworks supports LoRA-based fine-tuning for this model.
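
Fine-tuning datasets for chat models are typically JSONL, one conversation per line in the chat-messages format; a hedged sketch of a single training record (field names follow the common OpenAI-style convention and should be checked against the fine-tuning docs):

```json
{"messages": [{"role": "user", "content": "Summarize our refund policy."}, {"role": "assistant", "content": "Refunds are available within 30 days of purchase."}]}
```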

    What rate limits apply on the shared endpoint?

    When deployed on-demand, there are no rate limits. Serverless mode is also available for pay-per-token access.

    What license governs commercial use of Qwen3-235B-A22B?

    The model is released under the Apache 2.0 license, which allows commercial use with proper attribution.

    Metadata

    State
    Ready
    Created on
    4/29/2025
    Kind
    Base model
    Provider
    Qwen
    Hugging Face
    Qwen3-235B-A22B

    Specification

    Calibrated
    No
    Mixture-of-Experts
    Yes
    Parameters
    235.1B

    Supported Functionality

    Fine-tuning
    Supported
    Serverless
    Supported
    Serverless LoRA
    Not supported
    Context Length
    131.1k tokens
    Function Calling
    Supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Not supported