Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen).
| Feature | Description |
| --- | --- |
| Fine-tuning (docs) | Qwen2.5-Coder 32B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model. |
| On-demand deployment (docs) | On-demand deployments give you dedicated GPUs for Qwen2.5-Coder 32B Instruct using Fireworks' reliable, high-performance system with no rate limits. |
Qwen2.5-Coder 32B Instruct is a large, instruction-tuned code model developed by Qwen (Alibaba Group). It is part of the Qwen2.5-Coder series (formerly CodeQwen), which expands on Qwen2.5 with code-specific training and performance optimizations. The model achieves code generation performance on par with GPT-4o.
This model excels at code generation, code reasoning, and code fixing.
It also supports general text reasoning, making it suitable for assistant-style interactions in developer environments.
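For example, a minimal sketch of an assistant-style code-generation request against Fireworks' OpenAI-compatible chat completions endpoint (the model identifier below is an assumption; confirm the exact id on the Fireworks model page):

```python
# Sketch: assistant-style code generation via Fireworks' OpenAI-compatible API.
# The model id is assumed -- verify it on the Fireworks model page.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-coder-32b-instruct",  # assumed id
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
    max_tokens=512,
    temperature=0.2,
)

print(response.choices[0].message.content)
```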
The model natively supports a 32,768-token context window, which can be extended to 131,072 tokens using YaRN extrapolation.
The full 131K token context window is usable on Fireworks when configured with rope_scaling (YaRN).
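For self-hosted use with Hugging Face transformers, the extended window is enabled by adding a YaRN rope_scaling entry to the model configuration. A minimal sketch (the rope_scaling keys follow the upstream Qwen2.5-Coder documentation; the loading flags are assumptions):

```python
# Sketch: extend the context window from 32,768 to 131,072 tokens (factor 4.0)
# by attaching a YaRN rope_scaling entry before loading the model.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"

config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,                               # 32,768 * 4.0 = 131,072 tokens
    "original_max_position_embeddings": 32768,   # native window
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",   # assumed; choose a dtype that fits your hardware
    device_map="auto",
)
```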
There are over 110 quantized versions, including 4-bit and 8-bit formats.
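As one illustration of using a quantized variant, a 4-bit load with bitsandbytes through transformers might look like the sketch below (the quantization settings are assumptions, not a recommendation for any particular published quant):

```python
# Sketch: load the model in 4-bit with bitsandbytes; settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```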
No fixed output cap is published. Output length is constrained by the 131K token total context window (input + output combined).
Enabling the extended context requires rope_scaling (YaRN), and transformers versions earlier than v4.37.0 cannot load the model. Streaming responses and function calling are not supported for this model.
The model has 32.5 billion total parameters (31.0 billion non-embedding parameters) and uses a 64-layer architecture with grouped-query attention (GQA) featuring 40 query heads and 8 key-value heads.
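To make the head arithmetic concrete, here is a small, self-contained sketch of grouped-query attention (not the model's actual implementation): the 8 key-value heads are shared across the 40 query heads, so each KV head serves a group of 5 query heads and the KV cache is 5x smaller than with full multi-head attention.

```python
# Illustrative GQA sketch: 40 query heads share 8 key/value heads (groups of 5).
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 1, 16, 128
n_q_heads, n_kv_heads = 40, 8
group_size = n_q_heads // n_kv_heads  # 5 query heads per KV head

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand K and V so each group of 5 query heads reads the same KV head.
k = k.repeat_interleave(group_size, dim=1)  # -> (1, 40, seq_len, head_dim)
v = v.repeat_interleave(group_size, dim=1)

out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 40, 16, 128])
```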
Yes. Fireworks supports LoRA-based fine-tuning for this model.
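Fireworks manages the adapter training itself; purely as an illustration of what LoRA fine-tuning involves (not Fireworks' workflow or API), a typical adapter configuration with the peft library looks roughly like this, with the rank and target modules being assumed values:

```python
# General LoRA illustration (not Fireworks' fine-tuning API): small low-rank
# adapters are attached to the attention projections while the base stays frozen.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B-Instruct", torch_dtype="auto", device_map="auto"
)

lora_config = LoraConfig(
    r=16,                      # adapter rank (assumed)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```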
Billing is based on combined input and output token usage.
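As a worked example with a placeholder rate (the actual per-token prices are listed on the Fireworks pricing page):

```python
# Hypothetical cost calculation; the rate below is a placeholder, not
# Fireworks' actual price for this model.
price_per_million_tokens = 0.90          # USD per 1M tokens (placeholder)
input_tokens, output_tokens = 12_000, 800

cost = (input_tokens + output_tokens) * price_per_million_tokens / 1_000_000
print(f"${cost:.4f}")                    # billed on input + output combined
```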
The model is licensed under the Apache 2.0 license, which permits commercial use, modification, and redistribution.