Qwen3 32B: the latest state-of-the-art 32B-parameter model in the Qwen3 series.
| Feature | Description |
| --- | --- |
| Fine-tuning (Docs) | Qwen3 32B can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model. |
| On-demand Deployment (Docs) | On-demand deployments give you dedicated GPUs for Qwen3 32B using Fireworks' reliable, high-performance system with no rate limits. |
Qwen3 32B is a 32.8 billion parameter base language model developed by Qwen (Alibaba Group). It is part of the third-generation Qwen series, which introduces a dual-mode architecture (thinking vs. non-thinking) for improved performance in reasoning, coding, and dialogue tasks.
The model is optimized for reasoning, coding, and multi-turn dialogue, and supports both general-purpose conversation and complex logical tasks.
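To see the dual-mode (thinking vs. non-thinking) behavior in practice, here is a minimal sketch, assuming the Hugging Face checkpoint `Qwen/Qwen3-32B` and a recent transformers release, that toggles the mode through the chat template's `enable_thinking` flag:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
messages = [{"role": "user", "content": "Summarize the Pythagorean theorem."}]

# Thinking mode (default): the template prompts the model to emit a
# <think>...</think> reasoning block before its final answer.
thinking_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-thinking mode: the reasoning block is suppressed for faster,
# dialogue-style responses.
plain_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```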
The model supports a native context length of 32,768 tokens, extendable to 131,072 tokens using YaRN (RoPE scaling).
Fireworks supports the full 131.1K token window on on-demand deployments.
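On hosted Fireworks deployments the extended window is handled server-side. When self-hosting, extending beyond the native window follows the YaRN recipe from the Qwen3 usage notes; the sketch below uses Hugging Face transformers and assumes the upstream scaling factor of 4.0 (verify against the model card):

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-32B"

config = AutoConfig.from_pretrained(model_id)
# YaRN RoPE scaling: a factor of 4.0 stretches the native 32,768-token
# window to roughly 131,072 tokens.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```

Because static YaRN applies the scaling factor to all inputs, it is best enabled only when prompts actually need to exceed the native 32K window.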
Thinking mode uses temperature=0.6, top_p=0.95, and top_k=20.
Non-thinking mode uses temperature=0.7, top_p=0.8, and top_k=20.
Greedy decoding is discouraged, as it can cause repetition and degraded performance.
The recommended output length is up to 32,768 tokens, with a maximum of 38,912 tokens for complex benchmarks (e.g., code and math reasoning).
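As an illustration, the thinking-mode settings above can be passed through Fireworks' OpenAI-compatible API. The model identifier `accounts/fireworks/models/qwen3-32b` is an assumption; check your dashboard for the exact name:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

response = client.chat.completions.create(
    # Assumed model identifier; verify the exact slug in the Fireworks console.
    model="accounts/fireworks/models/qwen3-32b",
    messages=[{"role": "user", "content": "Prove that the sum of two even integers is even."}],
    # Recommended thinking-mode sampling parameters.
    temperature=0.6,
    top_p=0.95,
    extra_body={"top_k": 20},  # top_k is not a standard OpenAI parameter, so send it as an extra field
    max_tokens=8192,
)

print(response.choices[0].message.content)
```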
Note that transformers < v4.51.0 does not recognize the Qwen3 architecture; use v4.51.0 or later when running the model locally. Fine-tuning is supported: Fireworks offers LoRA-based fine-tuning on dedicated GPU deployments.
Qwen3 32B is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution.