The latest state-of-the-art Qwen3 model: 30B total parameters with 3B active per token.
| Fine-tuning (Docs) | Qwen3 30B-A3B can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model. |
| On-demand Deployment (Docs) | On-demand deployments give you dedicated GPUs for Qwen3 30B-A3B using Fireworks' reliable, high-performance system with no rate limits. |
Qwen3-30B-A3B is a Mixture-of-Experts (MoE) large language model developed by Qwen (Alibaba Group). It is part of the Qwen3 family and is designed to balance high performance with efficient inference. The model has 30.5 billion total parameters, with 3.3 billion active per forward pass.
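As a minimal sketch of how a request to this model might look, the snippet below assembles a chat-completion body for Fireworks' OpenAI-compatible endpoint. The model identifier and endpoint path are assumptions based on Fireworks' usual naming conventions; check the model page for the exact values.

```python
import json

# Assumed endpoint and model ID -- verify against the Fireworks model page.
BASE_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
MODEL_ID = "accounts/fireworks/models/qwen3-30b-a3b"  # assumed identifier

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble the JSON body for a single-turn chat completion."""
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize the Qwen3 MoE architecture in two sentences.")
print(json.dumps(payload, indent=2))
```

Sending this body with an `Authorization: Bearer <API key>` header to the endpoint above would return a standard chat-completion response.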
Qwen3-30B-A3B is optimized for reasoning, coding, and agentic tasks, with support for switching between thinking and non-thinking modes.
It also supports over 100 languages, making it suitable for multilingual instruction following and translation.
The native context length is 32,768 tokens. On Fireworks, YaRN rope scaling extends the usable context to 131,072 tokens (about 131.1K).
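YaRN extends the context window by scaling the RoPE position range, and the scaling factor follows directly from the two context lengths quoted above. The sketch below computes that factor and shows what a Hugging Face-style `rope_scaling` config entry might look like; the field names are assumed from common `transformers` conventions, not taken from this model's actual config.

```python
# YaRN stretches the RoPE position range by a scaling factor derived from the
# native and extended context lengths quoted above.
NATIVE_CTX = 32_768
EXTENDED_CTX = 131_072

yarn_factor = EXTENDED_CTX / NATIVE_CTX  # 131072 / 32768 = 4.0

# Sketch of a Hugging Face-style rope_scaling config entry (field names
# assumed from common transformers conventions):
rope_scaling = {
    "rope_type": "yarn",
    "factor": yarn_factor,
    "original_max_position_embeddings": NATIVE_CTX,
}
print(rope_scaling)
```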
Quantized variants are broadly available: roughly 100 community quantizations exist, including 4-bit and 8-bit formats.
The recommended maximum output is 32,768 tokens; up to 38,912 tokens are supported for complex tasks such as math or programming benchmarks.
Known risks include degraded output quality and language mixing at high presence_penalty values.
Streaming is supported via Fireworks and frameworks like vLLM.
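A streamed response arrives as a sequence of chunks, each carrying a text delta. The sketch below shows the consumption pattern with a stubbed iterator standing in for the server-sent chunks; a real request would set `"stream": true` in the request body and iterate the HTTP response instead. The chunk shape follows the OpenAI-compatible streaming format.

```python
# Concatenate the text deltas from a stream of OpenAI-style chunk dicts.
def stream_text(chunks):
    out = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"].get("content", "")
        out.append(delta)
    return "".join(out)

# Stubbed chunks standing in for a live server-sent stream:
fake_chunks = [
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world"}}]},
    {"choices": [{"delta": {}}]},  # a final chunk may omit content
]
print(stream_text(fake_chunks))  # Hello, world
```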
Function-calling is not supported on Fireworks.
Qwen3-30B-A3B has 30.5 billion total parameters, with 3.3 billion active parameters, and uses 8 experts out of a pool of 128 during inference.
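The numbers above explain the efficiency of the MoE design: the router activates only 8 of 128 experts per token, so most expert weights sit idle on any single forward pass. A quick sketch of the arithmetic:

```python
# Fraction of experts active per token, from the figures quoted above.
EXPERTS_TOTAL = 128
EXPERTS_ACTIVE = 8
expert_fraction = EXPERTS_ACTIVE / EXPERTS_TOTAL
print(f"expert fraction active per token: {expert_fraction:.4f}")  # 0.0625

# Overall active-parameter share (3.3B of 30.5B). It exceeds the naive 8/128
# expert ratio because attention and embedding weights are always active.
active_share = 3.3e9 / 30.5e9
print(f"overall active-parameter share: {active_share:.3f}")
```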
Yes. Fireworks supports LoRA-based fine-tuning for this model.
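LoRA keeps fine-tuning cheap because, instead of updating a full `d_out x d_in` weight matrix, it trains two low-rank factors of shapes `(d_out, r)` and `(r, d_in)`. The dimensions below are illustrative placeholders, not this model's actual layer sizes.

```python
# Trainable parameters for one rank-r LoRA adapter on a d_out x d_in weight.
def lora_params(d_out: int, d_in: int, r: int) -> int:
    return d_out * r + r * d_in

d = 4096      # hypothetical hidden size, for illustration only
rank = 16     # a typical LoRA rank

full = d * d                        # weights in the full square matrix
adapter = lora_params(d, d, rank)   # trainable weights in the adapter
print(f"adapter is {adapter / full:.2%} the size of the full matrix")
```

At rank 16 the adapter trains well under 1% of the weights of each adapted matrix, which is what makes per-customer fine-tunes cheap to train and serve.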
Qwen3-30B-A3B is released under the Apache 2.0 License, which permits commercial use.