Updated FP8 version of Qwen3-235B-A22B in non-thinking mode, with improved tool use, coding, instruction following, logical reasoning, and text comprehension capabilities.
Qwen3 235B A22B Instruct 2507 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
Run the model immediately on pre-configured GPUs and pay per token.
On-demand deployments give you dedicated GPUs for Qwen3 235B A22B Instruct 2507 using Fireworks' reliable, high-performance system with no rate limits.
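As a minimal sketch of calling the model over either deployment path, the example below uses the OpenAI-compatible Python client against Fireworks' inference endpoint. The model identifier and the FIREWORKS_API_KEY environment variable are assumptions and may differ from your account's configuration.

```python
import os
from openai import OpenAI  # pip install openai

# Fireworks exposes an OpenAI-compatible endpoint; the base URL and model slug
# below are assumptions and may need adjusting for your account/deployment.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen3-235b-a22b-instruct-2507",
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of MoE models in two sentences."}
    ],
    max_tokens=16384,  # matches the recommended maximum output length noted below
    temperature=0.7,
)
print(response.choices[0].message.content)
```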
Qwen3-235B-A22B-Instruct-2507 is an instruction-tuned, non-thinking mode large language model developed by Alibaba’s Qwen team. It is a mixture-of-experts (MoE) model with 235 billion total parameters (22B active) and is optimized for reasoning, tool use, coding, and long-context tasks.
The model is designed for instruction following, logical reasoning, text comprehension, coding, tool use, and long-context tasks.
The model supports a native context length of 262,144 tokens, and can be extended up to 1,010,000 tokens using Dual Chunk Attention and sparse attention mechanisms.
While the model supports up to 1M tokens, the recommended usable context for most tasks is up to 16,384 tokens, due to memory and latency considerations.
Yes. The model is available in FP8 quantized format, which improves inference speed and reduces memory usage.
The recommended maximum output length is 16,384 tokens, aligned with guidance from the Qwen team for generation quality and stability.
Known challenges include:
Yes. The model supports streaming generation and agentic tool use via Qwen-Agent, which provides built-in support for function calling and tool integration through configurable MCP files.
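A minimal sketch of a Qwen-Agent setup follows, adapted from the patterns in Qwen's documentation. The endpoint URL, model identifier, and MCP server entry are illustrative assumptions; point them at whichever OpenAI-compatible deployment serves the model.

```python
import os
from qwen_agent.agents import Assistant  # pip install -U qwen-agent

# LLM endpoint configuration -- the server URL and model slug are assumptions.
llm_cfg = {
    'model': 'accounts/fireworks/models/qwen3-235b-a22b-instruct-2507',
    'model_server': 'https://api.fireworks.ai/inference/v1',
    'api_key': os.environ.get('FIREWORKS_API_KEY', ''),
}

# Tools: one MCP server definition plus Qwen-Agent's built-in code interpreter.
tools = [
    {'mcpServers': {
        'time': {'command': 'uvx', 'args': ['mcp-server-time']},
    }},
    'code_interpreter',
]

bot = Assistant(llm=llm_cfg, function_list=tools)

messages = [{'role': 'user', 'content': 'What is the current UTC time?'}]
# bot.run streams intermediate states; each iteration yields the responses so far.
for responses in bot.run(messages=messages):
    pass
print(responses)
```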
The model has 235 billion total parameters, with 22 billion active per token using a Mixture-of-Experts architecture.
Token pricing is split between input and output: $0.22 for input tokens and $0.88 for output tokens.
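As a quick back-of-the-envelope illustration of how the split pricing adds up, the snippet below estimates the cost of a single request, assuming the listed rates apply per 1M tokens.

```python
# Hypothetical cost estimate for one request, assuming the listed rates
# ($0.22 input / $0.88 output) are per 1M tokens.
INPUT_RATE = 0.22 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.88 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of a single completion."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 50k-token prompt with a 2k-token response.
print(f"${estimate_cost(50_000, 2_000):.4f}")  # -> $0.0128
```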
The model is released under the Apache 2.0 license, permitting commercial use with attribution.
Provider: Qwen
Context length: 262,144 tokens
Serverless: Available
On-demand deployment: Available
Pricing (input / output tokens): $0.22 / $0.88