GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/Qwen/Qwen3 235B A22B Instruct 2507
Quen Logo Mark

Qwen3 235B A22B Instruct 2507

Ready
model path:accounts/fireworks/models/qwen3-235b-a22b-instruct-2507

Updated FP8 version of Qwen3-235B-A22B non-thinking mode, with better tool use, coding, instruction following, logical reasoning and text comprehension capabilities

Qwen3 235B A22B Instruct 2507 API Features

Fine-tuning

Docs

Qwen3 235B A22B Instruct 2507 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use Qwen3 235B A22B Instruct 2507 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Qwen3 235B A22B Instruct 2507 FAQs

What is Qwen3-235B-A22B-Instruct-2507 and who developed it?

Qwen3-235B-A22B-Instruct-2507 is an instruction-tuned, non-thinking mode large language model developed by Alibaba’s Qwen team. It is a mixture-of-experts (MoE) model with 235 billion total parameters (22B active) and is optimized for reasoning, tool use, coding, and long-context tasks.

What applications and use cases does Qwen3-235B-A22B-Instruct-2507 excel at?

The model is designed for:

  • Complex reasoning (e.g., AIME25, HMMT25)
  • Instruction following and logic tasks
  • Coding (e.g., MultiPL-E, LiveCodeBench)
  • Long-context comprehension (supports up to 1M tokens)
  • Multilingual knowledge and creative writing
What is the maximum context length for Qwen3-235B-A22B-Instruct-2507?

The model supports a native context length of 262,144 tokens, and can be extended up to 1,010,000 tokens using Dual Chunk Attention and sparse attention mechanisms.

What is the usable context window for Qwen3-235B-A22B-Instruct-2507?

While the model supports up to 1M tokens, the recommended usable context for most tasks is up to 16,384 tokens, due to memory and latency considerations.

Does Qwen3-235B-A22B-Instruct-2507 support quantized formats (4-bit/8-bit)?

Yes. The model is available in FP8 quantized format, which improves inference speed and reduces memory usage.

What is the maximum output length Fireworks allows for Qwen3-235B-A22B-Instruct-2507?

The recommended maximum output length is 16,384 tokens, aligned with guidance from the Qwen team for generation quality and stability.

What are known failure modes of Qwen3-235B-A22B-Instruct-2507?

Known challenges include:

  • VRAM-related issues when attempting 1M context inference without proper configuration
  • Slight performance tradeoffs in long contexts with sparse attention
  • Some reports of alignment inconsistencies in subjective tasks
Does Qwen3-235B-A22B-Instruct-2507 support streaming responses and function-calling schemas?

Yes. The model supports streaming generation and agentic tool use via Qwen-Agent, which provides built-in support for function-calling and tool integration through configurable MCP files.

How many parameters does Qwen3-235B-A22B-Instruct-2507 have?

The model has 235 billion total parameters, with 22 billion active per token using a Mixture-of-Experts architecture.

How are tokens counted (prompt vs completion)?

Token pricing is split between input and output:

  • $0.22 per 1M input tokens
  • $0.88 per 1M output tokens
What license governs commercial use of Qwen3-235B-A22B-Instruct-2507?

The model is released under the Apache 2.0 license, permitting commercial use with attribution.

Metadata

State
Ready
Created on
7/21/2025
Kind
Base model
Provider
Qwen

Specification

Calibrated
Yes
Mixture-of-Experts
Yes
Parameters
235B

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
262k tokens
Function Calling
Supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported