Updated FP8 version of Qwen3-235B-A22B in non-thinking mode, with improved tool use, coding, instruction following, logical reasoning, and text comprehension capabilities.
Qwen3 235B A22B Instruct 2507 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
Run the model immediately on pre-configured GPUs and pay per token.
On-demand deployments give you dedicated GPUs for Qwen3 235B A22B Instruct 2507 using Fireworks' reliable, high-performance system with no rate limits.
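As a minimal sketch of calling the model over either deployment path, the example below uses the OpenAI-compatible Python client against Fireworks' inference endpoint. The model identifier and the FIREWORKS_API_KEY environment variable are assumptions and may differ from your account's configuration.

```python
import os
from openai import OpenAI  # pip install openai

# Fireworks exposes an OpenAI-compatible endpoint; the base URL and model slug
# below are assumptions and may need adjusting for your account/deployment.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen3-235b-a22b-instruct-2507",
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of MoE models in two sentences."}
    ],
    max_tokens=16384,  # matches the recommended maximum output length noted below
    temperature=0.7,
)
print(response.choices[0].message.content)
```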
Qwen3-235B-A22B-Instruct-2507 is an instruction-tuned, non-thinking mode large language model developed by Alibaba’s Qwen team. It is a mixture-of-experts (MoE) model with 235 billion total parameters (22B active) and is optimized for reasoning, tool use, coding, and long-context tasks.
The model is designed for instruction following, logical reasoning, text comprehension, coding, tool use, and long-context tasks.
The model supports a native context length of 262,144 tokens, and can be extended up to 1,010,000 tokens using Dual Chunk Attention and sparse attention mechanisms.
While the model supports up to 1M tokens, the recommended usable context for most tasks is up to 16,384 tokens, due to memory and latency considerations.
Yes. The model is available in FP8 quantized format, which improves inference speed and reduces memory usage.
The recommended maximum output length is 16,384 tokens, aligned with guidance from the Qwen team for generation quality and stability.
Known challenges include:
Yes. The model supports streaming generation and agentic tool use via Qwen-Agent, which provides built-in support for function calling and tool integration through configurable MCP files.
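A minimal sketch of a Qwen-Agent setup follows, adapted from the patterns in Qwen's documentation. The endpoint URL, model identifier, and MCP server entry are illustrative assumptions; point them at whichever OpenAI-compatible deployment serves the model.

```python
import os
from qwen_agent.agents import Assistant  # pip install -U qwen-agent

# LLM endpoint configuration -- the server URL and model slug are assumptions.
llm_cfg = {
    'model': 'accounts/fireworks/models/qwen3-235b-a22b-instruct-2507',
    'model_server': 'https://api.fireworks.ai/inference/v1',
    'api_key': os.environ.get('FIREWORKS_API_KEY', ''),
}

# Tools: one MCP server definition plus Qwen-Agent's built-in code interpreter.
tools = [
    {'mcpServers': {
        'time': {'command': 'uvx', 'args': ['mcp-server-time']},
    }},
    'code_interpreter',
]

bot = Assistant(llm=llm_cfg, function_list=tools)

messages = [{'role': 'user', 'content': 'What is the current UTC time?'}]
# bot.run streams intermediate states; each iteration yields the responses so far.
for responses in bot.run(messages=messages):
    pass
print(responses)
```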
The model has 235 billion total parameters, with 22 billion active per token using a Mixture-of-Experts architecture.
Token pricing is split between input and output: $0.22 for input tokens and $0.88 for output tokens.
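As a quick back-of-the-envelope illustration of how the split pricing adds up, the snippet below estimates the cost of a single request, assuming the listed rates apply per 1M tokens.

```python
# Hypothetical cost estimate for one request, assuming the listed rates
# ($0.22 input / $0.88 output) are per 1M tokens.
INPUT_RATE = 0.22 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.88 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of a single completion."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 50k-token prompt with a 2k-token response.
print(f"${estimate_cost(50_000, 2_000):.4f}")  # -> $0.0128
```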
The model is released under the Apache 2.0 license, permitting commercial use with attribution.
Provider: Qwen
Context length: 262,144 tokens
Serverless: Available
On-demand deployment: Available
Pricing (input / output tokens): $0.22 / $0.88