

Qwen2.5 7B Instruct

Ready
fireworks/qwen2p5-7b-instruct

    Qwen2.5 is a series of decoder-only language models developed by the Qwen team at Alibaba Cloud, available in 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B sizes, each in base and instruct variants.

    Qwen2.5 7B Instruct API Features

    Fine-tuning

    Docs

    Qwen2.5 7B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for Qwen2.5 7B Instruct using Fireworks' reliable, high-performance system with no rate limits.
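    As an illustrative sketch, a chat completion request for this model can be assembled for Fireworks' OpenAI-compatible API. The endpoint URL and fully qualified model name below follow Fireworks' documented naming conventions, but verify both against the current docs for your account:

```python
import json

# Assumed OpenAI-compatible endpoint; verify against the Fireworks docs.
FIREWORKS_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble a chat completion payload for Qwen2.5 7B Instruct."""
    return {
        "model": "accounts/fireworks/models/qwen2p5-7b-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Summarize LoRA fine-tuning in one sentence.")
print(json.dumps(payload, indent=2))
# To send: POST this JSON to FIREWORKS_URL with an
# "Authorization: Bearer <your API key>" header.
```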

    Qwen2.5 72B Instruct FAQs

    What is Qwen2.5-72B Instruct and who developed it?

    Qwen2.5-72B Instruct is a 72.7B parameter instruction-tuned language model developed by Qwen, a team under Alibaba Cloud. It is part of the Qwen2.5 series, which improves on Qwen2 by enhancing instruction following, structured data understanding, code and math reasoning, and long-text generation.

    What applications and use cases does Qwen2.5-72B Instruct excel at?

    The model is optimized for:

    • Code assistance
    • Conversational AI
    • Agentic systems
    • Search and enterprise RAG
    • Multimedia reasoning

    It supports structured output (e.g., JSON), multi-turn chat, and strong multilingual capability across 29+ languages.
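    The structured-output support mentioned above is typically exercised through an OpenAI-style `response_format` field. The sketch below only assembles the request body; the field names follow the OpenAI JSON-mode convention, which Fireworks mirrors, but confirm against the current API reference:

```python
def build_json_mode_request(prompt: str) -> dict:
    """Request payload asking the model to emit a valid JSON object."""
    return {
        "model": "accounts/fireworks/models/qwen2p5-7b-instruct",
        "messages": [
            {"role": "system", "content": "Reply only with a JSON object."},
            {"role": "user", "content": prompt},
        ],
        # JSON mode: constrains decoding to well-formed JSON.
        "response_format": {"type": "json_object"},
    }

req = build_json_mode_request('List three Qwen2.5 sizes as {"sizes": [...]}.')
```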

    What is the maximum context length for Qwen2.5-72B Instruct?

    The model supports a default context length of 32,768 tokens, which can be extended to 131,072 tokens using YaRN extrapolation.
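    Per the upstream Qwen2.5 model card, YaRN extrapolation is enabled by adding a `rope_scaling` block to the model's `config.json`; the values shown are the upstream defaults for extending 32,768 tokens to 131,072 (4.0 × 32,768):

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```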

    What is the usable context window for Qwen2.5-72B Instruct?

    The full 131K token context window is usable on Fireworks with rope_scaling (YaRN) configured for long sequences.

    Does Qwen2.5-72B Instruct support quantized formats (4-bit/8-bit)?

    Yes. There are 83 quantized versions of this model, including 4-bit and 8-bit variants.

    What is the maximum output length Fireworks allows for Qwen2.5-72B Instruct?

    The generation length is benchmarked at 8,192 tokens, and outputs are constrained by the 131K total context window (input + output).
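    To make the input/output budget concrete, a small arithmetic check using only the figures quoted above:

```python
CONTEXT_WINDOW = 131_072   # total tokens, input + output (with YaRN)
MAX_GENERATION = 8_192     # benchmarked generation length

# Largest prompt that still leaves room for a full-length generation:
max_prompt = CONTEXT_WINDOW - MAX_GENERATION
print(max_prompt)  # 122880
```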

    What are known failure modes of Qwen2.5-72B Instruct?

    • Short-prompt degradation with static rope scaling
    • Compatibility issues with transformers versions older than v4.37.0
    • Instruction variance depending on formatting and template

    Does Qwen2.5-72B Instruct support streaming responses and function-calling schemas?

    Streaming is not supported, but the model does support function calling.

    How many parameters does Qwen2.5-72B Instruct have?

    The model has 72.7 billion total parameters (70.0 billion non-embedding parameters) and uses an 80-layer architecture with grouped-query attention (GQA) featuring 64 query heads and 8 key-value heads.
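    The GQA figures above determine the KV-cache footprint. A back-of-the-envelope estimate follows; the head dimension of 128 is an assumption (consistent with 64 query heads over a typical 8,192 hidden size), and an fp16 cache is assumed:

```python
LAYERS = 80
KV_HEADS = 8        # grouped-query attention: 8 key-value heads
HEAD_DIM = 128      # assumption: hidden_size 8192 / 64 query heads
BYTES_FP16 = 2

# Per token: one key and one value vector per KV head, per layer.
kv_bytes_per_token = LAYERS * KV_HEADS * HEAD_DIM * 2 * BYTES_FP16
print(kv_bytes_per_token)            # 327680 bytes, about 320 KiB

# Cache for a full 32,768-token default context:
cache_gib = kv_bytes_per_token * 32_768 / 2**30
print(cache_gib)                     # 10.0 GiB
```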

    Is fine-tuning supported for Qwen2.5-72B Instruct?

    Yes. Fireworks supports LoRA-based fine-tuning for this model.

    How are tokens counted (prompt vs completion)?

    Token usage is billed on input + output tokens. Prompt formatting affects input size; generation defaults are user-configurable.
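    As a tiny illustration of input + output billing, given a hypothetical usage record in the OpenAI-compatible response format:

```python
# Hypothetical usage object as returned in an OpenAI-compatible response.
usage = {"prompt_tokens": 1_024, "completion_tokens": 256}

billed = usage["prompt_tokens"] + usage["completion_tokens"]
print(billed)  # 1280
```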

    What rate limits apply on the shared endpoint?

    Serverless deployment is not supported, but on-demand deployment is available with no rate limits on dedicated GPUs.

    What license governs commercial use of Qwen2.5-72B Instruct?

    The model is released under the Qianwen License, a custom license by Alibaba Group.

    Metadata

    State
    Ready
    Created on
    10/2/2024
    Kind
    Base model
    Provider
    Qwen
    Hugging Face
    Qwen2.5-7B-Instruct

    Specification

    Calibrated
    Yes
    Mixture-of-Experts
    No
    Parameters
    7.6B

    Supported Functionality

    Fine-tuning
    Supported
    Serverless
    Not supported
    Serverless LoRA
    Supported
    Context Length
    32.8k tokens
    Function Calling
    Not supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Not supported