GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/Qwen/Qwen3 30B-A3B
Quen Logo Mark

Qwen3 30B-A3B

Ready
model path:accounts/fireworks/models/qwen3-30b-a3b

Latest Qwen3 state of the art model, 30B with 3B active parameter model

Qwen3 30B-A3B API Features

Fine-tuning

Docs

Qwen3 30B-A3B can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use Qwen3 30B-A3B on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Qwen3 30B-A3B FAQs

What is Qwen3-30B-A3B and who developed it?

Qwen3-30B-A3B is a Mixture-of-Experts (MoE) large language model developed by Qwen (Alibaba Group). It is part of the Qwen3 family and is designed to balance high performance with efficient inference. The model has 30.5 billion total parameters, with 3.3 billion active per forward pass.

What applications and use cases does Qwen3-30B-A3B excel at?

Qwen3-30B-A3B is optimized for:

  • Conversational AI
  • Code assistance
  • Agentic systems
  • Search
  • Multimedia
  • Enterprise RAG

It also supports over 100 languages, making it suitable for multilingual instruction and translation.

What is the maximum context length for Qwen3-30B-A3B?

The native context length is 32,768 tokens, with extended support up to 131,072 tokens using the YaRN method on Fireworks.

What is the usable context window for Qwen3-30B-A3B?

The usable context on Fireworks AI is 131.1K tokens with YaRN rope scaling enabled.

Does Qwen3-30B-A3B support quantized formats (4-bit/8-bit)?

Yes: 100 quantized variants, confirming broad format support including 4-bit and 8-bit quantizations.

What is the maximum output length Fireworks allows for Qwen3-30B-A3B?

Recommended maximum output is 32,768 tokens, with support for up to 38,912 tokens for complex tasks such as math or programming benchmarks.

What are known failure modes of Qwen3-30B-A3B?

Known risks include:

  • Endless repetition when using greedy decoding
  • Language mixing with high presence_penalty values
  • Performance degradation when improperly using YaRN on short prompts
Does Qwen3-30B-A3B support streaming responses and function-calling schemas?

Streaming is supported via Fireworks and frameworks like vLLM.

Function-calling is not supported on Fireworks.

How many parameters does Qwen3-30B-A3B have?

Qwen3-30B-A3B has 30.5 billion total parameters, with 3.3 billion active parameters, and uses 8 experts out of a pool of 128 during inference.

Is fine-tuning supported for Qwen3-30B-A3B?

Yes. Fireworks supports LoRA-based fine-tuning for this model.

What rate limits apply on the shared endpoint?

This model is available both serverless with pay-per-token pricing and on-demand with no rate limits.

What license governs commercial use of Qwen3-30B-A3B?

Qwen3-30B-A3B is released under the Apache 2.0 License, which permits commercial use.

Metadata

State
Ready
Created on
4/28/2025
Kind
Base model
Provider
Qwen
Hugging Face
Qwen/Qwen3-30B-A3B

Specification

Calibrated
Yes
Mixture-of-Experts
Yes
Parameters
30.5B

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
131k tokens
Function Calling
Supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported