Qwen3 235B A22B Thinking 2507

fireworks/qwen3-235b-a22b-thinking-2507

    Latest Qwen3 thinking model, competitive with the best closed-source models as of July 2025.

    Qwen3 235B A22B Thinking 2507 API Features

    Fine-tuning

    Docs

    Qwen3 235B A22B Thinking 2507 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
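
    If you want to try fine-tuning, the sketch below prepares a small chat-style JSONL training file. The "messages" schema and file name are assumptions for illustration; confirm the expected dataset format and upload workflow in the fine-tuning Docs.

    ```python
    # Hypothetical chat-style JSONL training file for LoRA fine-tuning.
    # The "messages" field layout is an assumption -- confirm the expected
    # dataset schema in the Fireworks fine-tuning Docs before uploading.
    import json

    examples = [
        {
            "messages": [
                {"role": "user", "content": "Summarize our refund policy."},
                {"role": "assistant", "content": "Refunds are issued within 14 days of purchase."},
            ]
        },
    ]

    with open("train.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
    ```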

    Serverless

    Docs

    Immediately run the model on pre-configured GPUs and pay per token.
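
    A minimal serverless query sketch, assuming the OpenAI-compatible chat completions endpoint at https://api.fireworks.ai/inference/v1 and a model path of the form accounts/fireworks/models/qwen3-235b-a22b-thinking-2507; check the Docs for the exact identifier and authentication details.

    ```python
    # Serverless query sketch (assumed model path and endpoint -- see Docs).
    import os
    import requests

    resp = requests.post(
        "https://api.fireworks.ai/inference/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"},
        json={
            "model": "accounts/fireworks/models/qwen3-235b-a22b-thinking-2507",
            "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
            "max_tokens": 32768,  # recommended output length for typical queries
        },
        timeout=600,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
    ```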

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for Qwen3 235B A22B Thinking 2507 using Fireworks' reliable, high-performance system with no rate limits.

    Available Serverless

    Run queries immediately, pay only for usage

    $0.22 / $0.88
    Per 1M Tokens (input/output)
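
    At these rates, a request that consumes 10,000 input tokens and produces 2,000 output tokens costs roughly $0.0022 + $0.00176 ≈ $0.004. A quick sketch of the arithmetic:

    ```python
    # Back-of-the-envelope serverless cost estimate from the listed per-1M-token rates.
    INPUT_RATE = 0.22 / 1_000_000   # USD per input token
    OUTPUT_RATE = 0.88 / 1_000_000  # USD per output token

    def estimate_cost(input_tokens: int, output_tokens: int) -> float:
        return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

    print(f"${estimate_cost(10_000, 2_000):.4f}")  # -> $0.0040
    ```
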
    What is Qwen3-235B-A22B-Thinking-2507 and who developed it?

    Qwen3-235B-A22B-Thinking-2507 is an open-weight large language model (LLM) developed by the Qwen team. It is a reasoning-optimized variant of the Qwen3-235B-A22B MoE (Mixture of Experts) model, released in July 2025, with significant enhancements in long-context reasoning, tool usage, and alignment capabilities.

    What applications and use cases does Qwen3-235B-A22B-Thinking-2507 excel at?

    This model is designed for:

    • Complex reasoning tasks (math, logic, science, coding)
    • Academic and professional benchmarks
    • Agentic applications requiring tool use
    • Long-context use cases, including document analysis and chain-of-thought workflows
    • Creative writing and instruction following

    What is the maximum context length for Qwen3-235B-A22B-Thinking-2507?

    The native maximum context length is 262,144 tokens. With custom configuration and memory requirements, it can support up to 1 million tokens using Dual Chunk Attention and sparse attention techniques.

    What is the usable context window for Qwen3-235B-A22B-Thinking-2507?

    Usable context depends on the deployment setup. For most use cases, the native 262K-token window is sufficient. For ultra-long context (approaching 1M tokens), you must reconfigure the model and allocate ≥1000 GB of GPU memory.

    What is the maximum output length Fireworks allows for Qwen3-235B-A22B-Thinking-2507?

    The recommended output length is:

    • 32,768 tokens for typical queries
    • Up to 81,920 tokens for high-complexity tasks (e.g., math/code reasoning)

    What are known failure modes of Qwen3-235B-A22B-Thinking-2507?

    Known issues include:

    • Memory constraints when using long contexts (OOM errors)
    • Performance degradation if best practices for prompt and output format aren't followed

    Does Qwen3-235B-A22B-Thinking-2507 support streaming responses and function-calling schemas?

    Yes. The model supports streaming generation and tool use via Qwen-Agent, which handles function-calling templates and tool parsers.
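
    A minimal streaming sketch against the same OpenAI-compatible endpoint, assuming the openai Python client and the model path used in the serverless example above; tool calling via Qwen-Agent is configured separately, as described in the Docs.

    ```python
    # Streaming sketch via the OpenAI-compatible API (assumed model path and endpoint).
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.fireworks.ai/inference/v1",
        api_key=os.environ["FIREWORKS_API_KEY"],
    )

    stream = client.chat.completions.create(
        model="accounts/fireworks/models/qwen3-235b-a22b-thinking-2507",
        messages=[{"role": "user", "content": "Walk through 17 * 24 step by step."}],
        max_tokens=32768,  # raise toward 81,920 for high-complexity reasoning tasks
        stream=True,
    )

    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    ```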

    How many parameters does Qwen3-235B-A22B-Thinking-2507 have?

    • Total parameters: 235B
    • Activated parameters: 22B per forward pass (MoE)
    • Experts: 128 total, 8 active per token

    What license governs commercial use of Qwen3-235B-A22B-Thinking-2507?

    The model is released under the Apache 2.0 license, permitting commercial use with attribution.

    Metadata

    State: Unknown
    Created on: N/A
    Kind: Unknown
    Provider: Qwen

    Specification

    Calibrated: No
    Mixture-of-Experts: Yes
    Parameters: 235B total (22B activated)

    Supported Functionality

    Fine-tuning: Supported
    Serverless: Supported
    Serverless LoRA: Not supported
    Context Length: 262.1k tokens
    Function Calling: Not supported
    Embeddings: Not supported
    Rerankers: Not supported
    Image input: Not supported