Qwen3 235B A22B Instruct 2507

An updated FP8 version of Qwen3-235B-A22B in non-thinking mode, with improved tool use, coding, instruction following, logical reasoning, and text comprehension.

Try Model

Fireworks Features

Fine-tuning

Qwen3 235B A22B Instruct 2507 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
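
As a rough sketch, a LoRA fine-tuning dataset is typically prepared as chat-format JSONL before being uploaded to Fireworks. The `messages` layout and file name below are illustrative assumptions; consult Fireworks' fine-tuning documentation for the exact schema your account expects.

```python
import json

# Illustrative chat-format training examples; the "messages" schema shown
# here is an assumption -- verify the exact format in Fireworks' docs.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a support agent for Acme Corp."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Open Settings > Security and choose 'Reset password'."},
        ]
    },
]

# One JSON object per line -- the usual JSONL convention for training files.
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

The resulting file is then uploaded and a LoRA job launched through Fireworks' tooling.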

Learn More

Serverless

Immediately run the model on pre-configured GPUs and pay per token.
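
A minimal sketch of a serverless, pay-per-token call through Fireworks' OpenAI-compatible API. The model identifier follows Fireworks' usual `accounts/fireworks/models/...` naming and is an assumption; confirm it on this page before use.

```python
from openai import OpenAI

# Fireworks serves an OpenAI-compatible endpoint, so the standard client works.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",
)

response = client.chat.completions.create(
    # Assumed identifier based on Fireworks' naming convention.
    model="accounts/fireworks/models/qwen3-235b-a22b-instruct-2507",
    messages=[{"role": "user", "content": "Summarize the Qwen3 MoE design in two sentences."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```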

Learn More

On-demand Deployment

On-demand deployments give you dedicated GPUs for Qwen3 235B A22B Instruct 2507 using Fireworks' reliable, high-performance system with no rate limits.

Learn More

FAQs

What is Qwen3-235B-A22B-Instruct-2507 and who developed it?

Qwen3-235B-A22B-Instruct-2507 is an instruction-tuned large language model (non-thinking mode) developed by Alibaba’s Qwen team. It is a mixture-of-experts (MoE) model with 235 billion total parameters (22 billion active per token) and is optimized for reasoning, tool use, coding, and long-context tasks.

What applications and use cases does Qwen3-235B-A22B-Instruct-2507 excel at?

The model is designed for:

  • Complex reasoning (e.g., AIME25, HMMT25)
  • Instruction following and logic tasks
  • Coding (e.g., MultiPL-E, LiveCodeBench)
  • Long-context comprehension (supports up to 1M tokens)
  • Multilingual knowledge and creative writing

What is the maximum context length for Qwen3-235B-A22B-Instruct-2507?

The model supports a native context length of 262,144 tokens, and can be extended up to 1,010,000 tokens using Dual Chunk Attention and sparse attention mechanisms.

What is the usable context window for Qwen3-235B-A22B-Instruct-2507?

While the model supports up to 1M tokens, the recommended usable context for most tasks is up to 16,384 tokens, due to memory and latency considerations.
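
A small pre-flight check along these lines can keep prompts inside a chosen budget. This sketch assumes the publicly available Hugging Face tokenizer for the model (only the tokenizer is downloaded, not the weights).

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B-Instruct-2507")

BUDGET = 16_384  # conservative per-request context budget suggested above

messages = [{"role": "user", "content": "Explain Dual Chunk Attention briefly."}]
# Render the chat template to the exact string the server would tokenize.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
n_tokens = len(tokenizer(prompt)["input_ids"])
if n_tokens > BUDGET:
    print(f"Prompt is {n_tokens} tokens; trim context to fit the {BUDGET}-token budget.")
```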

Does Qwen3-235B-A22B-Instruct-2507 support quantized formats (4-bit/8-bit)?

Yes. The model is available in FP8 quantized format, which improves inference speed and reduces memory usage.

What is the maximum output length Fireworks allows for Qwen3-235B-A22B-Instruct-2507?

The recommended maximum output length is 16,384 tokens, aligned with guidance from the Qwen team for generation quality and stability.

What are known failure modes of Qwen3-235B-A22B-Instruct-2507?

Known challenges include:

  • VRAM-related issues when attempting 1M context inference without proper configuration
  • Slight performance tradeoffs in long contexts with sparse attention
  • Some reports of alignment inconsistencies in subjective tasks

Does Qwen3-235B-A22B-Instruct-2507 support streaming responses and function-calling schemas?

Yes. The model supports streaming generation and agentic tool use via Qwen-Agent, which provides built-in support for function-calling and tool integration through configurable MCP files.
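
On Fireworks, the same capabilities are reachable through the OpenAI-compatible API, as in this sketch; the `get_weather` tool schema is hypothetical and the model identifier is assumed.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",
)

# Hypothetical tool declared in OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# stream=True yields incremental deltas instead of one final message.
stream = client.chat.completions.create(
    model="accounts/fireworks/models/qwen3-235b-a22b-instruct-2507",  # assumed id
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:  # text deltas; tool calls arrive as delta.tool_calls
        print(delta.content, end="", flush=True)
```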

How many parameters does Qwen3-235B-A22B-Instruct-2507 have?

The model has 235 billion total parameters, with 22 billion active per token using a Mixture-of-Experts architecture.

How are tokens counted (prompt vs completion)?

Token pricing is split between input and output (a worked example follows the list):

  • $0.22 per 1M input tokens
  • $0.88 per 1M output tokens
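
As a quick worked example of this arithmetic:

```python
# Per-token pricing from the list above, in USD per 1M tokens.
INPUT_PER_M, OUTPUT_PER_M = 0.22, 0.88

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one request in USD."""
    return (prompt_tokens * INPUT_PER_M + completion_tokens * OUTPUT_PER_M) / 1_000_000

# A 50,000-token prompt with a 2,000-token completion:
print(f"${request_cost(50_000, 2_000):.4f}")  # -> $0.0128
```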

What license governs commercial use of Qwen3-235B-A22B-Instruct-2507?

The model is released under the Apache 2.0 license, permitting commercial use with attribution.

Info & Pricing

Provider: Qwen
Model Type: LLM
Context Length: 262,144 tokens
Serverless: Available
Fine-Tuning: Available
Pricing Per 1M Tokens (Input/Output): $0.22 / $0.88