The latest Qwen3 thinking model, competitive with the best closed-source models as of July 2025.
Fine-tuning (Docs): Qwen3 235B A22B Thinking 2507 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
Serverless (Docs): Immediately run the model on pre-configured GPUs and pay per token (see the example request after this list).
On-demand Deployment (Docs): On-demand deployments give you dedicated GPUs for Qwen3 235B A22B Thinking 2507 using Fireworks' reliable, high-performance system with no rate limits.
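A minimal sketch of a serverless, pay-per-token request against Fireworks' OpenAI-compatible chat completions endpoint. The model slug and the FIREWORKS_API_KEY environment variable are assumptions for illustration; confirm the exact identifier in the Fireworks model library.

```python
# Minimal serverless request sketch (assumed model slug and API key variable).
import os
import requests

resp = requests.post(
    "https://api.fireworks.ai/inference/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",  # assumed env var
        "Content-Type": "application/json",
    },
    json={
        "model": "accounts/fireworks/models/qwen3-235b-a22b-thinking-2507",  # assumed slug
        "messages": [
            {"role": "user", "content": "Summarize the Qwen3 thinking model in two sentences."}
        ],
        "max_tokens": 1024,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Billing is per token consumed, so no deployment step is needed before sending the first request.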
Qwen3-235B-A22B-Thinking-2507 is an open-weight large language model (LLM) developed by the Qwen team. It is a reasoning-optimized variant of the Qwen3-235B-A22B Mixture-of-Experts (MoE) model, released in July 2025, with significant enhancements in long-context reasoning, tool use, and alignment.
This model is designed for complex multi-step reasoning, agentic tool use, and long-context workloads.
The native maximum context length is 262,144 tokens. With custom configuration and sufficient GPU memory, the model can support up to 1 million tokens using Dual Chunk Attention (DCA) and sparse attention techniques.
Usable context depends on deployment setup. For most use cases, 262K tokens are supported natively. For ultra-long context (approaching 1M tokens), you must reconfigure the model and allocate ≥1000 GB of GPU memory.
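A minimal sketch for checking whether a prompt fits the native 262,144-token window before sending it to a deployment that has not been reconfigured for ultra-long context. It assumes the Hugging Face tokenizer repo Qwen/Qwen3-235B-A22B-Thinking-2507 is available and reserves an output budget of 32,768 tokens.

```python
# Sketch: count prompt tokens against the native 262,144-token window.
from transformers import AutoTokenizer

NATIVE_CONTEXT = 262_144  # native maximum context length in tokens

# Assumes the tokenizer can be downloaded from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B-Thinking-2507")

def fits_native_window(prompt: str, reserved_for_output: int = 32_768) -> bool:
    """Return True if the prompt plus the reserved output budget fits natively."""
    prompt_tokens = len(tokenizer.encode(prompt))
    return prompt_tokens + reserved_for_output <= NATIVE_CONTEXT

if __name__ == "__main__":
    long_doc = "..."  # your long input here
    if not fits_native_window(long_doc):
        print("Prompt exceeds the native 262K window; trim it or use an "
              "ultra-long-context deployment (DCA + sparse attention, >=1000 GB GPU memory).")
```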
The model performs best with temperature=0.6, top_p=0.95, and top_k=20.
The recommended output length is 32,768 tokens for most queries, and up to 81,920 tokens for highly complex reasoning tasks such as math and programming competition problems.
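A minimal sketch that applies the recommended sampling settings and output budget through the OpenAI-compatible Fireworks endpoint. The model slug is an assumption, and top_k is passed via extra_body since it is not a standard OpenAI field; check that your endpoint accepts it.

```python
# Sketch: recommended sampling settings over the OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="FIREWORKS_API_KEY",  # replace with your key
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen3-235b-a22b-thinking-2507",  # assumed slug
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    temperature=0.6,            # recommended
    top_p=0.95,                 # recommended
    max_tokens=32_768,          # recommended output budget for most queries
    extra_body={"top_k": 20},   # non-standard field, passed through to the backend
)
print(response.choices[0].message.content)
```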
Known issues include:
Yes. The model supports streaming generation and tool use via Qwen-Agent, which handles function-calling templates and tool parsers.
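A minimal streaming sketch over the OpenAI-compatible endpoint; it demonstrates streaming only, since tool use is handled by Qwen-Agent's function-calling templates and parsers as noted above. The model slug and placeholder API key are assumptions.

```python
# Sketch: stream tokens as they are generated.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="FIREWORKS_API_KEY",  # replace with your key
)

stream = client.chat.completions.create(
    model="accounts/fireworks/models/qwen3-235b-a22b-thinking-2507",  # assumed slug
    messages=[{"role": "user", "content": "Walk through 17 * 24 step by step."}],
    stream=True,
)
for chunk in stream:
    # Some chunks may carry no content (e.g., role-only or final chunks).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```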
The model is released under the Apache 2.0 license, permitting commercial use with attribution.