
Qwen2 72B Instruct

fireworks/qwen2-72b-instruct

    Qwen2 72B Instruct is a 72.7 billion parameter model developed by Alibaba for instruction-tuned tasks. It excels in natural language understanding and generation tasks, including summarization, dialogue, and complex reasoning. Qwen2 is optimized for instruction following, making it ideal for applications that require detailed and structured responses across a wide range of domains.

    Qwen2 72B Instruct API Features

    Fine-tuning

    Docs

    Qwen2 72B Instruct can be customized with your data to improve responses. Fireworks uses low-rank adaptation (LoRA) to efficiently train and deploy your personalized model.
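The idea behind LoRA can be sketched numerically (a minimal illustration of the technique itself, not of Fireworks' training pipeline): the pretrained weight matrix stays frozen, and a trainable low-rank product is added on top.

```python
import numpy as np

# Minimal numerical sketch of LoRA (low-rank adaptation): the frozen
# weight W is augmented by a trainable low-rank product B @ A,
# scaled by alpha / r. Dimensions here are toy values for illustration.
rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 16                   # hidden size, LoRA rank, scaling numerator

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized

W_eff = W + (alpha / r) * (B @ A)        # effective weight at inference

# With B initialized to zero the adapter starts as a no-op, so the
# fine-tuned model initially matches the base model exactly.
print(np.allclose(W_eff, W))  # prints: True
```

Because only A and B (2 × d × r values) are trained instead of the full d × d matrix, LoRA fine-tuning is far cheaper than full fine-tuning, and the small adapter can be swapped onto the base model at deploy time.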

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for Qwen2 72B Instruct using Fireworks' reliable, high-performance system with no rate limits.
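As an illustrative sketch of calling a deployment, a request body can be assembled as below. This assumes Fireworks' OpenAI-compatible chat completions endpoint and model naming convention (`accounts/fireworks/models/...`); verify both against the current API docs before relying on them.

```python
import json

# Hedged sketch of a chat completions request to a Fireworks deployment.
# The endpoint path, model identifier, and payload shape follow the
# OpenAI-compatible API convention; treat them as assumptions to verify.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble the JSON body for a single-turn chat request."""
    return {
        "model": "accounts/fireworks/models/qwen2-72b-instruct",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_request("Summarize the Qwen2 model family in two sentences.")
print(json.dumps(body, indent=2))
```

The body can then be sent with any HTTP client, e.g. `requests.post(API_URL, headers={"Authorization": f"Bearer {API_KEY}"}, json=body)`, where `API_KEY` is your Fireworks API key.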

    Qwen2 72B Instruct FAQs

    What is Qwen2-72B Instruct and who developed it?

    Qwen2-72B Instruct is a 72.7 billion parameter instruction-tuned language model developed by Qwen (Alibaba Group). It is part of the Qwen2 series, optimized for natural language understanding, generation, and instruction following across complex domains like coding, math, and multilingual reasoning.

    What applications and use cases does Qwen2-72B Instruct excel at?

    The model is well-suited for:

    • Conversational AI
    • Enterprise RAG systems
    • Agentic systems
    • Search and multimedia tasks
    • Code generation and math reasoning

    It shows strong performance in multilingual and structured output tasks.

    What is the maximum context length for Qwen2-72B Instruct?

    The model supports:

    • Native context length: 32,768 tokens
    • Extended context: Up to 131,072 tokens using YaRN (rope scaling extrapolation)

    What is the usable context window for Qwen2-72B Instruct?

    The full 131K token context window is usable when deployed with appropriate rope_scaling via vLLM or compatible runtime.
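As a sketch of how the extended window is enabled, the Qwen2 model card suggests adding a YaRN rope-scaling block along these lines to the model's config.json (verify exact key names against your transformers/vLLM version); the factor 4.0 corresponds to 32,768 × 4 = 131,072 tokens:

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

Note that this scaling is static: it is applied to every request, including short ones, which is the source of the short-prompt degradation listed under failure modes.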

    What are known failure modes of Qwen2-72B Instruct?

    • Static YaRN scaling can degrade performance on short prompts
    • Incompatibility with transformers versions earlier than 4.37.0
    • No tool use or image input support
    • Requires apply_chat_template() for correct prompt formatting
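The ChatML layout that `apply_chat_template()` produces for Qwen2 can be sketched by hand. This is a simplified illustration; the authoritative template (including the default system prompt) lives in the tokenizer config, so in practice prefer `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.

```python
# Hedged sketch of Qwen2's ChatML prompt format: each message is wrapped
# in <|im_start|>role ... <|im_end|> markers, and a trailing assistant
# header cues the model to generate its reply.
def chatml_prompt(messages: list[dict]) -> str:
    """Render messages in ChatML and append the assistant generation header."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
]
print(chatml_prompt(msgs))
```

Skipping this formatting and sending raw text tends to produce degraded or unterminated output, which is why the chat template is listed as a requirement rather than a convenience.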

    How many parameters does Qwen2-72B Instruct have?

    The model has 72.7 billion parameters.

    Is fine-tuning supported for Qwen2-72B Instruct?

    Yes. Fireworks supports LoRA-based fine-tuning on dedicated infrastructure.

    What rate limits apply on the shared endpoint?

    • Serverless: Not supported
    • On-demand: Available with no rate limits on dedicated GPUs

    What license governs commercial use of Qwen2-72B Instruct?

    The model is licensed under Tongyi Qianwen, a custom license from Alibaba Group. It is not open-source under Apache/MIT and may have commercial restrictions.

    Metadata

    State
    Ready
    Created on
    6/6/2024
    Kind
    Base model
    Provider
    Qwen
    Hugging Face
    Qwen2-72B-Instruct

    Specification

    Calibrated
    No
    Mixture-of-Experts
    No
    Parameters
    72.7B

    Supported Functionality

    Fine-tuning
    Supported
    Serverless
    Not supported
    Serverless LoRA
    Supported
    Context Length
    32.8k tokens
    Function Calling
    Not supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Not supported