GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/Qwen/Qwen2.5-Coder 32B Instruct
Quen Logo Mark

Qwen2.5-Coder 32B Instruct

Ready
model path:accounts/fireworks/models/qwen2p5-coder-32b-instruct

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen).

Qwen2.5-Coder 32B Instruct API Features

Fine-tuning

Docs

Qwen2.5-Coder 32B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use Qwen2.5-Coder 32B Instruct on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Qwen2.5-Coder 32B Instruct FAQs

What is Qwen2.5-Coder 32B Instruct and who developed it?

Qwen2.5-Coder 32B Instruct is a large, instruction-tuned code model developed by Qwen (Alibaba Group). It is part of the Qwen2.5-Coder series (formerly CodeQwen), which expands on Qwen2.5 with code-specific training and performance optimizations. The model achieves code generation performance on par with GPT-4o.

What applications and use cases does Qwen2.5-Coder 32B Instruct excel at?

This model excels at:

  • Code generation
  • Code reasoning and fixing
  • Mathematical problem solving
  • Conversational agents with coding capabilities
  • Enterprise and agentic RAG systems

It also supports general text reasoning, making it suitable for assistant-style interactions in developer environments.

What is the maximum context length for Qwen2.5-Coder 32B Instruct?

The model natively supports 32,768 tokens, which can be extended to 131,072 tokens using YaRN extrapolation.

What is the usable context window for Qwen2.5-Coder 32B Instruct?

The full 131K token context window is usable on Fireworks when configured with rope_scaling (YaRN).

Does Qwen2.5-Coder 32B Instruct support quantized formats (4-bit/8-bit)?

There are over 110 quantized versions, including 4-bit and 8-bit formats.

What is the maximum output length Fireworks allows for Qwen2.5-Coder 32B Instruct?

No fixed output cap is published. Output length is constrained by the 131K token total context window (input + output combined).

What are known failure modes of Qwen2.5-Coder 32B Instruct?
  • Performance degradation on short prompts when using static rope_scaling (YaRN)
  • Compatibility errors with transformers < v4.37.0
  • May require carefully formatted system/user prompts for optimal instruction adherence
Does Qwen2.5-Coder 32B Instruct support streaming responses and function-calling schemas?

No, streaming responses and function calling are not supported for this model.

How many parameters does Qwen2.5-Coder 32B Instruct have?

The model has 32.5 billion total parameters (31.0 billion non-embedding parameters) and uses a 64-layer architecture with grouped-query attention (GQA) featuring 40 query heads and 8 key-value heads.

Is fine-tuning supported for Qwen2.5-Coder 32B Instruct?

Yes. Fireworks supports LoRA-based fine-tuning for this model.

How are tokens counted (prompt vs completion)?

General billing is based on input + output token usage.

What rate limits apply on the shared endpoint?

On-demand deployment is available with no rate limits using dedicated GPUs. Serverless deployment is not supported.

What license governs commercial use of Qwen2.5-Coder 32B Instruct?

The model is licensed under the Apache 2.0 license, which allows unrestricted commercial use.

Metadata

State
Ready
Created on
11/12/2024
Kind
Base model
Provider
Qwen

Specification

Calibrated
No
Mixture-of-Experts
No
Parameters
32.7B

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
32.7k tokens
Function Calling
Not supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported