The latest state-of-the-art Qwen3 model: 235B total parameters with 22B active.
| Option | Description |
| --- | --- |
| Fine-tuning (Docs) | Qwen3 235B A22B can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model. |
| Serverless (Docs) | Run the model immediately on pre-configured GPUs and pay per token. |
| On-demand deployment (Docs) | On-demand deployments give you dedicated GPUs for Qwen3 235B A22B using Fireworks' reliable, high-performance system with no rate limits. |
Run queries immediately, pay only for usage
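As a quick illustration of the pay-per-token serverless workflow, here is a minimal sketch that queries the model through Fireworks' OpenAI-compatible API. The base URL, model identifier, and environment-variable name are assumptions for illustration; verify the exact values in the Fireworks docs.

```python
import os
from openai import OpenAI

# Assumed OpenAI-compatible endpoint and serverless model id -- check both
# against the Fireworks documentation for your account.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen3-235b-a22b",
    messages=[{"role": "user", "content": "Summarize Mixture-of-Experts routing in two sentences."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```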
Qwen3-235B-A22B is a large Mixture-of-Experts (MoE) language model developed by Qwen (Alibaba Group). It is part of the Qwen3 series and includes 235 billion total parameters, with 22 billion active at inference time. The model features dual-mode reasoning (“thinking” and “non-thinking”), agent capabilities, and multilingual instruction following.
Qwen3-235B-A22B is optimized for complex reasoning, agentic tool use, and multilingual instruction following.
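The dual-mode behavior can be exercised from the same chat API. The sketch below (reusing the `client` from the previous snippet) relies on Qwen3's documented `/think` and `/no_think` soft switches appended to the user turn; whether Fireworks also exposes a dedicated reasoning parameter is not covered here, so treat the in-prompt switch as an assumption about the deployed chat template.

```python
def ask(client, prompt: str, thinking: bool = True) -> str:
    """Send one user turn, appending Qwen3's soft switch to control reasoning mode."""
    switch = "/think" if thinking else "/no_think"
    resp = client.chat.completions.create(
        model="accounts/fireworks/models/qwen3-235b-a22b",  # assumed model id
        messages=[{"role": "user", "content": f"{prompt} {switch}"}],
        max_tokens=1024,
    )
    return resp.choices[0].message.content

# Non-thinking mode for a quick lookup, thinking mode for a multi-step problem.
# print(ask(client, "What is the capital of Australia?", thinking=False))
# print(ask(client, "Prove that the sum of two odd integers is even.", thinking=True))
```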
The model natively supports a 32,768-token context window, and Fireworks extends this to 131,072 tokens (about 131.1K of usable context) with YaRN scaling enabled.
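On Fireworks serverless the extended window is already configured, so no client-side change is needed beyond sending the longer prompt. For self-hosted deployments, the upstream Qwen3 recipe extends the window by adding a YaRN `rope_scaling` block to the model's `config.json`; the sketch below writes that block with the factor-4 values from the Qwen3 documentation, and the file path is a placeholder.

```python
import json
from pathlib import Path

# Placeholder path to a local copy of the model's config.json.
config_path = Path("Qwen3-235B-A22B/config.json")
config = json.loads(config_path.read_text())

# YaRN scaling per the upstream Qwen3 docs: a 4.0 factor over the native
# 32,768-token window yields roughly 131K tokens of usable context.
config["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
config_path.write_text(json.dumps(config, indent=2))
```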
The model is available in 43 quantized variants, including 4-bit and 8-bit formats.
The recommended output length is 32,768 tokens, with up to 38,912 tokens supported for long-form benchmarking scenarios such as math and programming.
Avoid enabling rope_scaling (YaRN) for short contexts, since static scaling can degrade performance on shorter inputs, and adjust presence_penalty values if you see repetitive output. Streaming and function calling are both supported on Fireworks, and the model can also be served with frameworks such as vLLM.
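A minimal streaming sketch, again through the OpenAI-compatible endpoint with the `client` from the first snippet and the same assumed model id; the `presence_penalty` value here is illustrative, chosen to discourage repetition rather than taken from Fireworks' recommendations.

```python
stream = client.chat.completions.create(
    model="accounts/fireworks/models/qwen3-235b-a22b",  # assumed model id
    messages=[{"role": "user", "content": "Write a short haiku about mixture-of-experts routing."}],
    stream=True,
    presence_penalty=1.0,  # illustrative value to reduce repetition on long generations
    max_tokens=256,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```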
Qwen3-235B-A22B has 235.1 billion total parameters with 22 billion active parameters.
Fireworks supports LoRA-based fine-tuning for this model.
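Fine-tuning details live in the Fireworks fine-tuning docs; as a sketch of the usual preparation step, the snippet below writes a chat-formatted JSONL training file, a common input format for LoRA fine-tuning jobs. The filename and record layout are assumptions, so confirm the exact schema and the upload/launch commands against the docs.

```python
import json

# Hypothetical training examples in a chat-message JSONL layout.
examples = [
    {
        "messages": [
            {"role": "user", "content": "Classify the sentiment: 'The latency was great.'"},
            {"role": "assistant", "content": "positive"},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Classify the sentiment: 'The rollout kept failing.'"},
            {"role": "assistant", "content": "negative"},
        ]
    },
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```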
The model is released under the Apache 2.0 license, which allows commercial use with proper attribution.