Qwen2.5-Coder 32B Instruct

fireworks/qwen2p5-coder-32b-instruct

    Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen).

    Qwen2.5-Coder 32B Instruct API Features

    Fine-tuning

    Qwen2.5-Coder 32B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to train and deploy your personalized model efficiently.
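
    Fireworks' fine-tuning docs define the exact dataset schema; the sketch below only illustrates the general idea, assuming an OpenAI-style JSONL file of chat "messages" (the file name and example content are placeholders, not the platform's confirmed format).

```python
import json

# Minimal sketch of a chat-style training file for LoRA fine-tuning.
# The JSONL-of-"messages" layout is an assumption modeled on OpenAI-style
# chat datasets; check the Fireworks fine-tuning docs for the exact schema.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise Python code reviewer."},
            {"role": "user", "content": "Refactor: for i in range(len(xs)): print(xs[i])"},
            {"role": "assistant", "content": "for x in xs:\n    print(x)"},
        ]
    },
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```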

    On-demand Deployment

    On-demand deployments give you dedicated GPUs for Qwen2.5-Coder 32B Instruct using Fireworks' reliable, high-performance system with no rate limits.
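
    As a quick sanity check, a deployment can be queried through Fireworks' OpenAI-compatible chat completions API. This is a minimal sketch: the base URL is Fireworks' public inference endpoint, while the model path assumes the usual accounts/fireworks/models/... naming convention, so confirm the exact identifier for your own deployment.

```python
import os
from openai import OpenAI

# Minimal sketch of querying the model via the OpenAI-compatible endpoint.
# The model path below follows Fireworks' usual naming convention and is an
# assumption; on-demand deployments may use a deployment-specific identifier.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-coder-32b-instruct",
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
    max_tokens=512,
    temperature=0.2,
)
print(resp.choices[0].message.content)
```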

    Qwen2.5-Coder 32B Instruct FAQs

    What is Qwen2.5-Coder 32B Instruct and who developed it?

    Qwen2.5-Coder 32B Instruct is a large, instruction-tuned code model developed by Qwen (Alibaba Group). It is part of the Qwen2.5-Coder series (formerly CodeQwen), which expands on Qwen2.5 with code-specific training and performance optimizations. The model achieves code generation performance on par with GPT-4o.

    What applications and use cases does Qwen2.5-Coder 32B Instruct excel at?

    This model excels at:

    • Code generation
    • Code reasoning and fixing
    • Mathematical problem solving
    • Conversational agents with coding capabilities
    • Enterprise and agentic RAG systems

    It also supports general text reasoning, making it suitable for assistant-style interactions in developer environments.

    What is the maximum context length for Qwen2.5-Coder 32B Instruct?

    The model natively supports 32,768 tokens, which can be extended to 131,072 tokens using YaRN extrapolation.

    What is the usable context window for Qwen2.5-Coder 32B Instruct?

    The full 131K token context window is usable on Fireworks when configured with rope_scaling (YaRN).
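
    For self-hosted use with Hugging Face transformers, long-context operation is enabled by adding a YaRN rope_scaling entry to the model config. The sketch below mirrors the recipe published in the Qwen2.5 model cards; treat the exact keys and factor as assumptions to verify against the current Qwen and Fireworks documentation.

```python
from transformers import AutoConfig

# Sketch: enable YaRN extrapolation for contexts up to 131,072 tokens when
# self-hosting. The keys/factor mirror the Qwen2.5 model-card recipe and
# should be verified against the current Qwen documentation.
cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
cfg.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,  # 32,768 x 4 = 131,072 tokens
    "original_max_position_embeddings": 32768,
}
cfg.save_pretrained("./qwen2p5-coder-32b-yarn")  # load the model with this config
```

    As noted in the failure modes below, static YaRN can degrade quality on short prompts, so enable it only when long inputs are expected.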

    Does Qwen2.5-Coder 32B Instruct support quantized formats (4-bit/8-bit)?

    Yes. More than 110 quantized versions are available, including 4-bit and 8-bit formats.
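
    These quantized variants are typically community builds intended for self-hosting rather than a Fireworks deployment mode. One common route is 4-bit loading through transformers with bitsandbytes, sketched below (substantial GPU memory is still required for a 32B model).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Sketch of 4-bit loading with bitsandbytes; one of many possible quantized
# setups for local use, not a Fireworks feature.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    quantization_config=bnb,
    device_map="auto",
)
```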

    What is the maximum output length Fireworks allows for Qwen2.5-Coder 32B Instruct?

    No fixed output cap is published. Output length is constrained by the 131K token total context window (input + output combined).
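
    In practice the completion budget is whatever the prompt leaves of the shared window, as in this rough sketch (the token counts are illustrative placeholders):

```python
# Input and output share one context window (131,072 tokens with YaRN,
# 32,768 natively), so the completion budget shrinks as the prompt grows.
CONTEXT_LIMIT = 131_072
prompt_tokens = 100_000                       # illustrative: a large code excerpt
max_completion = CONTEXT_LIMIT - prompt_tokens
print(max_completion)                         # 31072 tokens available for output
```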

    What are known failure modes of Qwen2.5-Coder 32B Instruct?

    • Performance degradation on short prompts when using static rope_scaling (YaRN)
    • Compatibility errors with transformers < v4.37.0
    • May require carefully formatted system/user prompts for optimal instruction adherence (see the sketch below)
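
    The last two items can be addressed up front when self-hosting with transformers: pin a recent library version and let the tokenizer's chat template produce the system/user formatting. The sketch below assumes the Hugging Face checkpoint; the prompt content is only an example.

```python
import transformers
from packaging import version
from transformers import AutoTokenizer

# Guard against the documented incompatibility with transformers < 4.37.0.
assert version.parse(transformers.__version__) >= version.parse("4.37.0")

# The tokenizer's built-in chat template is the simplest way to get correctly
# formatted system/user turns; the messages here are illustrative.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
prompt = tok.apply_chat_template(
    [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Fix the off-by-one error in: range(1, len(xs))"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```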

    Does Qwen2.5-Coder 32B Instruct support streaming responses and function-calling schemas?

    No, streaming responses and function calling are not supported for this model.

    How many parameters does Qwen2.5-Coder 32B Instruct have?

    The model has 32.5 billion total parameters (31.0 billion non-embedding parameters) and uses a 64-layer architecture with grouped-query attention (GQA) featuring 40 query heads and 8 key-value heads.
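
    A back-of-envelope calculation shows why GQA matters for serving: only the 8 key-value heads are cached per layer, not all 40 query heads. The head dimension of 128 and fp16 storage used below are assumptions for illustration, not figures stated on this page.

```python
# Rough KV-cache arithmetic for grouped-query attention (GQA).
# Layers, query heads, and KV heads come from the figures above; head_dim=128
# and 2-byte (fp16) values are illustrative assumptions.
layers, kv_heads, q_heads, head_dim, bytes_per_val = 64, 8, 40, 128, 2

gqa_bytes_per_token = layers * 2 * kv_heads * head_dim * bytes_per_val  # K and V
mha_bytes_per_token = layers * 2 * q_heads * head_dim * bytes_per_val   # if full MHA

print(gqa_bytes_per_token // 1024)                # 256 KiB cached per token with GQA
print(mha_bytes_per_token / gqa_bytes_per_token)  # 5.0x larger without GQA
```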

    Is fine-tuning supported for Qwen2.5-Coder 32B Instruct?

    Yes. Fireworks supports LoRA-based fine-tuning for this model.

    How are tokens counted (prompt vs completion)?

    Billing is based on total token usage: prompt (input) tokens plus completion (output) tokens.
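
    The OpenAI-compatible response exposes these counts in its usage object, as in this minimal sketch (the model path and prompt are placeholders, as in the deployment example above):

```python
import os
from openai import OpenAI

# Sketch: usage.prompt_tokens + usage.completion_tokens = usage.total_tokens,
# which is what input + output billing is based on.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)
resp = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-coder-32b-instruct",
    messages=[{"role": "user", "content": "Write a one-line docstring for a sort function."}],
    max_tokens=64,
)
u = resp.usage
print(u.prompt_tokens, u.completion_tokens, u.total_tokens)
```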

    What rate limits apply on the shared endpoint?

    This model is not available on Fireworks' shared serverless endpoint. On-demand deployments run on dedicated GPUs with no rate limits.

    What license governs commercial use of Qwen2.5-Coder 32B Instruct?

    The model is released under the Apache License 2.0, which permits commercial use.

    Metadata

    State: Ready
    Created on: 11/12/2024
    Kind: Base model
    Provider: Qwen
    Hugging Face: Qwen2.5-Coder-32B-Instruct

    Specification

    Calibrated: No
    Mixture-of-Experts: No
    Parameters: 32.8B

    Supported Functionality

    Fine-tuning: Supported
    Serverless: Not supported
    Serverless LoRA: Supported
    Context Length: 32.8k tokens
    Function Calling: Not supported
    Embeddings: Not supported
    Rerankers: Not supported
    Image input: Not supported