

Qwen3 30B-A3B

fireworks/qwen3-30b-a3b

    Latest state-of-the-art Qwen3 model: a Mixture-of-Experts with 30B total parameters and 3B active per forward pass

    Qwen3 30B-A3B API Features

    Fine-tuning

    Docs

    Qwen3 30B-A3B can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
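
    As a sketch of what LoRA fine-tuning data typically looks like: a JSONL file with one chat-formatted example per line. The "messages" schema below is the common convention for chat fine-tuning, not a confirmed Fireworks requirement; check the exact fields against the docs linked above.

```python
import json

# One training example per line, in chat format. This schema is a common
# convention; confirm the exact fields against the Fireworks fine-tuning docs.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise support assistant."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Open Settings > Security and choose 'Reset password'."},
        ]
    },
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```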

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for Qwen3 30B-A3B using Fireworks' reliable, high-performance system with no rate limits.
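
    As a minimal sketch of querying the model, here is the OpenAI-compatible Python client pointed at Fireworks' inference endpoint. The model path follows Fireworks' usual accounts/fireworks/models/... convention; an on-demand deployment may additionally need to be referenced by its deployment ID, per the docs linked above.

```python
import os
from openai import OpenAI  # pip install openai

# Fireworks exposes an OpenAI-compatible API; point the client at it.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/qwen3-30b-a3b",
    messages=[{"role": "user", "content": "Explain Mixture-of-Experts in two sentences."}],
    max_tokens=1024,  # the recommended ceiling for this model is 32,768 tokens
)
print(resp.choices[0].message.content)
```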

    Qwen3 30B-A3B FAQs

    What is Qwen3-30B-A3B and who developed it?

    Qwen3-30B-A3B is a Mixture-of-Experts (MoE) large language model developed by Qwen (Alibaba Group). It is part of the Qwen3 family and is designed to balance high performance with efficient inference. The model has 30.5 billion total parameters, with 3.3 billion active per forward pass.

    What applications and use cases does Qwen3-30B-A3B excel at?

    Qwen3-30B-A3B is optimized for:

    • Conversational AI
    • Code assistance
    • Agentic systems
    • Search
    • Multimedia
    • Enterprise RAG

    It also supports over 100 languages, making it suitable for multilingual instruction following and translation.

    What is the maximum context length for Qwen3-30B-A3B?

    The native context length is 32,768 tokens, with extended support up to 131,072 tokens using the YaRN method on Fireworks.
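
    The Qwen3 model card documents YaRN as a rope_scaling entry in the model config; the sketch below shows that setting and the arithmetic behind the extended window (a 4.0 factor stretches the native 32,768-token window to 131,072 tokens).

```python
# YaRN rope-scaling entry as documented in the Qwen3 model card.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

# factor * native window = extended window: 4.0 * 32,768 = 131,072 tokens.
assert int(rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]) == 131072
```

    Note the caveat among the failure modes below: static YaRN scaling can degrade quality on short prompts, so it is best enabled only when long context is actually needed.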

    What is the usable context window for Qwen3-30B-A3B?

    The usable context on Fireworks AI is 131.1K tokens with YaRN rope scaling enabled.

    Does Qwen3-30B-A3B support quantized formats (4-bit/8-bit)?

    Yes. Over 100 quantized variants of Qwen3-30B-A3B have been published, including 4-bit and 8-bit formats, confirming broad quantization support.

    What is the maximum output length Fireworks allows for Qwen3-30B-A3B?

    The recommended maximum output length is 32,768 tokens, with support for up to 38,912 tokens for complex tasks such as math or programming benchmarks.

    What are known failure modes of Qwen3-30B-A3B?

    Known risks include:

    • Endless repetition when using greedy decoding (avoidable with the sampling settings sketched after this list)
    • Language mixing with high presence_penalty values
    • Performance degradation when static YaRN scaling is applied to short prompts
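
    The first two failure modes are decoding-side and avoidable. Here is a sketch of request parameters based on the Qwen3 model card's recommendations; the values are the card's suggestions, not Fireworks-specific defaults.

```python
# Decoding settings that sidestep the failure modes above: avoid greedy
# decoding (temperature=0) to prevent endless repetition, and keep
# presence_penalty well below its 2.0 maximum to limit language mixing.
# Values follow the Qwen3 model card's recommendations.
sampling_params = {
    "temperature": 0.6,       # the card suggests 0.7 for non-thinking mode
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 0.5,  # raise cautiously; high values cause language mixing
}
```
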
    Does Qwen3-30B-A3B support streaming responses and function-calling schemas?

    Streaming is supported via Fireworks and frameworks like vLLM.

    Function calling is supported on Fireworks, as reflected in the Supported Functionality table below.
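
    A minimal streaming sketch with the same OpenAI-compatible client: stream=True yields incremental deltas rather than one final message.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

# stream=True returns an iterator of chunks; print tokens as they arrive.
stream = client.chat.completions.create(
    model="accounts/fireworks/models/qwen3-30b-a3b",
    messages=[{"role": "user", "content": "Write a haiku about experts."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```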

    How many parameters does Qwen3-30B-A3B have?

    Qwen3-30B-A3B has 30.5 billion total parameters, with 3.3 billion active parameters, and uses 8 experts out of a pool of 128 during inference.
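
    To make the 8-of-128 routing concrete, here is a toy sketch of top-k expert selection. It is illustrative only: the dimensions are tiny and the gating is simplified, not the model's actual implementation.

```python
import numpy as np

def route_tokens(x, w_gate, k=8):
    """Toy top-k MoE router: each token picks k of n_experts (here 8 of 128)."""
    logits = x @ w_gate                         # (tokens, n_experts) gate scores
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts
    sel = np.take_along_axis(logits, topk, axis=-1)
    # Softmax over only the selected experts' scores -> mixture weights.
    weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return topk, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 64))    # 4 tokens, toy hidden size of 64
w_gate = rng.normal(size=(64, 128))  # 128 experts, as in Qwen3-30B-A3B
experts, weights = route_tokens(tokens, w_gate)
print(experts.shape, weights.shape)  # (4, 8) (4, 8): 8 experts per token
```

    Because only the selected experts run per token, just 3.3B of the 30.5B parameters are active on each forward pass.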

    Is fine-tuning supported for Qwen3-30B-A3B?

    Yes. Fireworks supports LoRA-based fine-tuning for this model.

    What rate limits apply on the shared endpoint?

    Qwen3 30B-A3B is not offered on Fireworks' shared serverless endpoint (see Supported Functionality below); it runs on on-demand deployments, which provide dedicated GPUs with no rate limits.

    What license governs commercial use of Qwen3-30B-A3B?

    Qwen3-30B-A3B is released under the Apache 2.0 License, which permits commercial use.

    Metadata

    State: Ready
    Created on: 4/28/2025
    Kind: Base model
    Provider: Qwen
    Hugging Face: Qwen3-30B-A3B

    Specification

    Calibrated: Yes
    Mixture-of-Experts: Yes
    Parameters: 30.5B

    Supported Functionality

    Fine-tuning: Supported
    Serverless: Not supported
    Serverless LoRA: Not supported
    Context Length: 131.1k tokens
    Function Calling: Supported
    Embeddings: Not supported
    Rerankers: Not supported
    Image Input: Not supported