GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/Qwen/Qwen3 235B A22B
Quen Logo Mark

Qwen3 235B A22B

Ready
model path:accounts/fireworks/models/qwen3-235b-a22b

Latest Qwen3 state of the art model, 235B with 22B active parameter model

Qwen3 235B A22B API Features

Fine-tuning

Docs

Qwen3 235B A22B can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use Qwen3 235B A22B on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Qwen3 235B A22B FAQs

What is Qwen3-235B-A22B and who developed it?

Qwen3-235B-A22B is a large Mixture-of-Experts (MoE) language model developed by Qwen (Alibaba Group). It is part of the Qwen3 series and includes 235 billion total parameters, with 22 billion active at inference time. The model features dual-mode reasoning (“thinking” and “non-thinking”), agent capabilities, and multilingual instruction following.

What applications and use cases does Qwen3-235B-A22B excel at?

Qwen3-235B-A22B is optimized for:

  • Code assistance
  • Conversational AI
  • Agentic systems and tool integration
  • Search
  • Multimedia
  • Enterprise RAG
What is the maximum context length for Qwen3-235B-A22B?

The model natively supports 32,768 tokens and can be extended to 131,072 tokens using YaRN scaling on Fireworks.

What is the usable context window for Qwen3-235B-A22B?

Fireworks supports up to 131.1K tokens of usable context with YaRN scaling enabled.

Does Qwen3-235B-A22B support quantized formats (4-bit/8-bit)?

Yes, it supports 43 quantized variants, including 4-bit and 8-bit formats.

What is the maximum output length Fireworks allows for Qwen3-235B-A22B?

Recommended output length is 32,768 tokens, with support for up to 38,912 tokens in long-form benchmarking scenarios (e.g., math, programming).

What are known failure modes of Qwen3-235B-A22B?
  • Endless repetition when using greedy decoding
  • Performance degradation with inappropriate rope_scaling on short contexts
  • Potential language mixing at high presence_penalty values
Does Qwen3-235B-A22B support streaming responses and function-calling schemas?

Yes, streaming (and frameworks like vLLM) and function calling are supported on Fireworks.

How many parameters does Qwen3-235B-A22B have?

Qwen3-235B-A22B has 235.1 billion total parameters with 22 billion active parameters.

Is fine-tuning supported for Qwen3-235B-A22B?

Yes. Fireworks supports LoRA-based fine-tuning for this model.

What rate limits apply on the shared endpoint?

When deployed on-demand, there are no rate limits. Serverless mode is also available for pay-per-token access.

What license governs commercial use of Qwen3-235B-A22B?

The model is released under the Apache 2.0 license, which allows commercial use with proper attribution.

Metadata

State
Ready
Created on
4/29/2025
Kind
Base model
Provider
Qwen

Specification

Calibrated
Yes
Mixture-of-Experts
Yes
Parameters
235B

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
131k tokens
Function Calling
Supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported