
Qwen3 235B A22B
fireworks/qwen3-235b-a22b

    Latest state-of-the-art Qwen3 model: 235B total parameters with 22B active per forward pass

    Qwen3 235B A22B API Features

    Fine-tuning


    Qwen3 235B A22B can be customized with your data to improve responses. Fireworks uses LoRA to train and deploy your personalized model efficiently.

    Serverless


    Run the model immediately on pre-configured GPUs and pay per token.
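
A serverless pay-per-token call can be sketched as follows. This is a hedged sketch: Fireworks exposes an OpenAI-compatible chat completions endpoint, and the model id below is an assumption based on this page's `fireworks/qwen3-235b-a22b` slug — adjust it for your account.

```python
import json
import os
import urllib.request

# Hedged sketch of a serverless pay-per-token request against Fireworks'
# OpenAI-compatible chat completions endpoint. The model id is assumed
# from this page's slug and may differ for your account.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
MODEL_ID = "accounts/fireworks/models/qwen3-235b-a22b"

def build_payload(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble the JSON request body for one chat completion."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt: str) -> str:
    """Send the request; expects FIREWORKS_API_KEY in the environment."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```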

    On-demand Deployment


    On-demand deployments give you dedicated GPUs for Qwen3 235B A22B using Fireworks' reliable, high-performance system with no rate limits.

    Available on Serverless

    Run queries immediately, pay only for usage

    $0.22 / $0.88
    Per 1M Tokens (input/output)
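
At these rates, per-request cost is simple arithmetic; a small sketch:

```python
# Serverless pricing at the listed $0.22 (input) / $0.88 (output)
# rates per 1M tokens.
INPUT_RATE = 0.22 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.88 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the listed serverless rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
```

For example, a request with 10,000 input tokens and 2,000 output tokens costs about $0.0040.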

    Qwen3 235B A22B FAQs

    What is Qwen3-235B-A22B and who developed it?

    Qwen3-235B-A22B is a large Mixture-of-Experts (MoE) language model developed by Qwen (Alibaba Group). It is part of the Qwen3 series and includes 235 billion total parameters, with 22 billion active at inference time. The model features dual-mode reasoning (“thinking” and “non-thinking”), agent capabilities, and multilingual instruction following.

    What applications and use cases does Qwen3-235B-A22B excel at?

    Qwen3-235B-A22B is optimized for:

    • Code assistance
    • Conversational AI
    • Agentic systems and tool integration
    • Search
    • Multimedia
    • Enterprise RAG

    What is the maximum context length for Qwen3-235B-A22B?

    The model natively supports 32,768 tokens and can be extended to 131,072 tokens using YaRN scaling on Fireworks.
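
The YaRN extension corresponds to a `rope_scaling` override like the following (a sketch based on the Qwen3 model card; a factor of 4.0 scales the native 32,768-token window to 131,072):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Note the failure-modes entry below: applying a large factor to short-context workloads can degrade quality, so enable it only when long inputs are expected.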

    What is the usable context window for Qwen3-235B-A22B?

    Fireworks supports up to 131.1K tokens of usable context with YaRN scaling enabled.

    Does Qwen3-235B-A22B support quantized formats (4-bit/8-bit)?

    Yes, it supports 43 quantized variants, including 4-bit and 8-bit formats.

    What is the maximum output length Fireworks allows for Qwen3-235B-A22B?

    Recommended output length is 32,768 tokens, with support for up to 38,912 tokens in long-form benchmarking scenarios (e.g., math, programming).

    What are known failure modes of Qwen3-235B-A22B?

    • Endless repetition when using greedy decoding
    • Performance degradation with inappropriate rope_scaling on short contexts
    • Potential language mixing at high presence_penalty values
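
The repetition failure mode is tied to greedy decoding, which is why the Qwen3 model card recommends sampled decoding instead. A hedged sketch of request parameters, with values taken from that card's recommendation:

```python
# Suggested sampling settings to avoid the greedy-decoding repetition
# failure mode (values from the Qwen3 model card; adjust to taste).
RECOMMENDED_SAMPLING = {
    "temperature": 0.6,  # avoid temperature=0, i.e. greedy decoding
    "top_p": 0.95,
    "top_k": 20,
}
```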

    Does Qwen3-235B-A22B support streaming responses and function-calling schemas?

    Yes. Both streaming responses and function calling are supported on Fireworks; the model also supports these features in serving frameworks such as vLLM.
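
As an illustration, a streaming request carrying an OpenAI-style tool schema might be assembled like this. This is a sketch: `get_weather` is a purely hypothetical tool, and the model id is assumed from this page's slug.

```python
# Hedged sketch: a streaming chat request with one OpenAI-style
# function-calling tool attached. get_weather is illustrative only.
def build_tool_payload(prompt: str) -> dict:
    return {
        "model": "accounts/fireworks/models/qwen3-235b-a22b",  # assumed id
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # tokens arrive incrementally as server-sent events
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Look up current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }
```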

    How many parameters does Qwen3-235B-A22B have?

    Qwen3-235B-A22B has 235.1 billion total parameters with 22 billion active parameters.

    Is fine-tuning supported for Qwen3-235B-A22B?

    Yes. Fireworks supports LoRA-based fine-tuning for this model.
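
Fine-tuning datasets for chat models are typically JSONL, one conversation per line in the chat-messages format; a hedged sketch of a single training record (field names follow the common OpenAI-style convention and should be checked against the fine-tuning docs):

```json
{"messages": [{"role": "user", "content": "Summarize our refund policy."}, {"role": "assistant", "content": "Refunds are available within 30 days of purchase."}]}
```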

    What rate limits apply on the shared endpoint?

    When deployed on-demand, there are no rate limits. Serverless mode is also available for pay-per-token access.

    What license governs commercial use of Qwen3-235B-A22B?

    The model is released under the Apache 2.0 license, which allows commercial use with proper attribution.

    Metadata

    State
    Ready
    Created on
    4/29/2025
    Kind
    Base model
    Provider
    Qwen
    Hugging Face
    Qwen3-235B-A22B

    Specification

    Calibrated
    No
    Mixture-of-Experts
    Yes
    Parameters
    235.1B

    Supported Functionality

    Fine-tuning
    Supported
    Serverless
    Supported
    Serverless LoRA
    Not supported
    Context Length
    131.1k tokens
    Function Calling
    Supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Not supported