
Qwen3 235B A22B Instruct 2507

Status: Ready
Model ID: fireworks/qwen3-235b-a22b-instruct-2507

    Updated FP8 version of Qwen3-235B-A22B in non-thinking mode, with improved tool use, coding, instruction following, logical reasoning, and text comprehension capabilities.

    Qwen3 235B A22B Instruct 2507 API Features

    Fine-tuning

    Docs

    Qwen3 235B A22B Instruct 2507 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
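
    The exact dataset format and upload flow are covered in the fine-tuning Docs. As a rough illustration only, the sketch below prepares a chat-formatted JSONL training file; the "messages" layout is an assumption based on common chat fine-tuning formats, not a confirmed Fireworks schema.

```python
import json

# Minimal sketch: write a chat-formatted JSONL training file.
# The exact schema Fireworks expects is described in the fine-tuning Docs;
# the "messages" layout here is an assumption, not the confirmed format.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise support assistant."},
            {"role": "user", "content": "How do I reset my API key?"},
            {"role": "assistant", "content": "Go to Settings > API Keys and click Regenerate."},
        ]
    },
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")
```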

    Serverless

    Docs

    Immediately run the model on pre-configured GPUs and pay per token.
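
    As a minimal sketch of serverless usage, the example below calls the model through Fireworks' OpenAI-compatible chat completions endpoint. The base URL, the full model path prefix "accounts/fireworks/models/", the `openai` Python package, and the FIREWORKS_API_KEY environment variable are assumptions; check the Docs for the exact values for your account.

```python
import os

from openai import OpenAI

# Sketch: query the serverless deployment via the OpenAI-compatible API.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen3-235b-a22b-instruct-2507",
    messages=[{"role": "user", "content": "Summarize what a mixture-of-experts model is."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```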

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for Qwen3 235B A22B Instruct 2507 using Fireworks' reliable, high-performance system with no rate limits.

    Available Serverless

    Run queries immediately, pay only for usage

    $0.22 / $0.88
    Per 1M Tokens (input/output)

    Qwen3 235B A22B Instruct 2507 FAQs

    What is Qwen3-235B-A22B-Instruct-2507 and who developed it?

    Qwen3-235B-A22B-Instruct-2507 is an instruction-tuned, non-thinking mode large language model developed by Alibaba’s Qwen team. It is a mixture-of-experts (MoE) model with 235 billion total parameters (22B active) and is optimized for reasoning, tool use, coding, and long-context tasks.

    What applications and use cases does Qwen3-235B-A22B-Instruct-2507 excel at?

    The model is designed for:

    • Complex reasoning (e.g., AIME25, HMMT25)
    • Instruction following and logic tasks
    • Coding (e.g., MultiPL-E, LiveCodeBench)
    • Long-context comprehension (supports up to 1M tokens)
    • Multilingual knowledge and creative writing

    What is the maximum context length for Qwen3-235B-A22B-Instruct-2507?

    The model supports a native context length of 262,144 tokens, and can be extended up to 1,010,000 tokens using Dual Chunk Attention and sparse attention mechanisms.
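
    To check whether a prompt fits the native window before sending it, one option is to count tokens with the model's tokenizer. The sketch below assumes the tokenizer can be loaded from the Hugging Face repository referenced in the Metadata section; it does not cover the Dual Chunk Attention setup needed for 1M-token inference.

```python
from transformers import AutoTokenizer

# Sketch: check a prompt against the native 262,144-token context window,
# reserving the recommended 16,384-token output budget.
NATIVE_CONTEXT = 262_144

# Assumption: tokenizer loaded from the public Qwen FP8 repository on Hugging Face.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B-Instruct-2507-FP8")

def fits_in_context(prompt: str, reserved_for_output: int = 16_384) -> bool:
    """Return True if the prompt leaves room for the reserved output budget."""
    n_tokens = len(tokenizer.encode(prompt))
    return n_tokens + reserved_for_output <= NATIVE_CONTEXT

print(fits_in_context("Explain Dual Chunk Attention in two sentences."))
```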

    What is the usable context window for Qwen3-235B-A22B-Instruct-2507?

    While the model supports up to 1M tokens, the recommended usable context for most tasks is up to 16,384 tokens, due to memory and latency considerations.

    Does Qwen3-235B-A22B-Instruct-2507 support quantized formats (4-bit/8-bit)?

    Yes. The model is available in FP8 quantized format, which improves inference speed and reduces memory usage.

    What is the maximum output length Fireworks allows for Qwen3-235B-A22B-Instruct-2507?

    The recommended maximum output length is 16,384 tokens, aligned with guidance from the Qwen team for generation quality and stability.

    What are known failure modes of Qwen3-235B-A22B-Instruct-2507?

    Known challenges include:

    • VRAM-related issues when attempting 1M context inference without proper configuration
    • Slight performance tradeoffs in long contexts with sparse attention
    • Some reports of alignment inconsistencies in subjective tasks

    Does Qwen3-235B-A22B-Instruct-2507 support streaming responses and function-calling schemas?

    Yes. The model supports streaming generation and agentic tool use via Qwen-Agent, which provides built-in support for function-calling and tool integration through configurable MCP files.
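
    A minimal sketch of both features through the OpenAI-compatible API is shown below: one streamed completion and one request that passes an OpenAI-style `tools` schema. The weather tool name and parameters are purely illustrative, and the base URL, model path, and FIREWORKS_API_KEY environment variable are the same assumptions as in the serverless example above; Qwen-Agent's higher-level wrapper is not shown.

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)
MODEL = "accounts/fireworks/models/qwen3-235b-a22b-instruct-2507"

# 1) Streaming: print tokens as they arrive.
stream = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Write a haiku about long context windows."}],
    stream=True,
    max_tokens=128,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

# 2) Function calling: declare a hypothetical get_weather tool and let the
#    model decide whether to call it.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool, not a real integration
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "What's the weather in Osaka right now?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```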

    How many parameters does Qwen3-235B-A22B-Instruct-2507 have?

    The model has 235 billion total parameters, with 22 billion active per token using a Mixture-of-Experts architecture.

    How are tokens counted (prompt vs completion)?

    Token pricing is split between input and output:

    • $0.22 per 1M input tokens
    • $0.88 per 1M output tokens
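
    To estimate what a request costs, multiply each token count by its per-million rate. A small sketch using the serverless rates listed above:

```python
# Sketch: estimate serverless cost from the listed per-1M-token rates.
INPUT_PRICE_PER_M = 0.22   # USD per 1M input (prompt) tokens
OUTPUT_PRICE_PER_M = 0.88  # USD per 1M output (completion) tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
        + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 20,000-token prompt with a 2,000-token completion
# costs 0.02 * 0.22 + 0.002 * 0.88 ≈ $0.0062.
print(f"${estimate_cost(20_000, 2_000):.4f}")
```
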
    What license governs commercial use of Qwen3-235B-A22B-Instruct-2507?

    The model is released under the Apache 2.0 license, permitting commercial use with attribution.

    Metadata

    State: Ready
    Created on: 7/21/2025
    Kind: Base model
    Provider: Qwen
    Hugging Face: Qwen3-235B-A22B-Instruct-2507-FP8

    Specification

    Calibrated: Yes
    Mixture-of-Experts: Yes
    Parameters: 235.1B

    Supported Functionality

    Fine-tuning: Supported
    Serverless: Supported
    Serverless LoRA: Not supported
    Context Length: 262.1k tokens
    Function Calling: Supported
    Embeddings: Not supported
    Rerankers: Not supported
    Image input: Not supported