Qwen2 7B Instruct

fireworks/qwen2-7b-instruct

    Qwen2 7B Instruct is a 7-billion-parameter instruction-tuned language model developed by the Qwen team. Optimized for following instructions, it excels at tasks like question answering, dialogue generation, and summarization. The model is designed to provide accurate and contextually appropriate responses, making it suitable for a wide range of natural language processing applications.

    Qwen2 7B Instruct API Features

    Fine-tuning

    Qwen2 7B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
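
    As a minimal sketch, fine-tuning data is typically prepared as a JSONL file of chat-style examples. The field names and file name below are assumptions; consult the Fireworks fine-tuning docs for the exact schema:

        import json

        # Hypothetical training examples in chat "messages" format; the exact
        # schema expected by the fine-tuning service may differ.
        examples = [
            {
                "messages": [
                    {"role": "system", "content": "You are a concise support assistant."},
                    {"role": "user", "content": "How do I reset my password?"},
                    {"role": "assistant", "content": "Open Settings > Security and choose Reset password."},
                ]
            },
        ]

        # Write one JSON object per line (JSONL).
        with open("train.jsonl", "w") as f:
            for ex in examples:
                f.write(json.dumps(ex) + "\n")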

    On-demand Deployment

    On-demand deployments give you dedicated GPUs for Qwen2 7B Instruct using Fireworks' reliable, high-performance system with no rate limits.
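
    For illustration, querying a deployment through Fireworks' OpenAI-compatible chat completions API might look like the following sketch. The full model path and environment variable name are assumptions; substitute your own deployment's identifier:

        import os
        from openai import OpenAI

        # Fireworks exposes an OpenAI-compatible inference endpoint.
        client = OpenAI(
            base_url="https://api.fireworks.ai/inference/v1",
            api_key=os.environ["FIREWORKS_API_KEY"],  # assumed env var name
        )

        resp = client.chat.completions.create(
            model="accounts/fireworks/models/qwen2-7b-instruct",  # assumed full path
            messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
            max_tokens=128,
        )
        print(resp.choices[0].message.content)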

    Qwen2 7B Instruct FAQs

    What is Qwen2 7B Instruct and who developed it?

    Qwen2 7B Instruct is an instruction-tuned language model developed by Qwen, a team at Alibaba Cloud. It is based on the Qwen2 architecture and optimized for general-purpose tasks such as question answering, summarization, dialogue generation, and reasoning.

    What applications and use cases does Qwen2 7B Instruct excel at?

    Qwen2 7B Instruct is tuned for:

    • Instruction-following and dialogue agents
    • Text summarization and generation
    • Code generation (e.g., HumanEval, MBPP, MultiPL-E)
    • Reasoning and mathematics
    • Multilingual tasks, especially English and Chinese

    What is the maximum context length for Qwen2 7B Instruct?

    Up to 131,072 tokens when YaRN length extrapolation is enabled. The default context length is 32,768 tokens.

    What is the usable context window for Qwen2 7B Instruct?

    The supported context length for this model is 32.8K tokens on Fireworks. For longer sequences, YaRN must be explicitly configured using rope_scaling in the model config.
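
    For reference, the Qwen2 model card enables YaRN by adding a rope_scaling entry to the model's config.json. A minimal Python sketch of that edit, assuming a local checkout whose config.json sits in the working directory:

        import json

        # Load the model's config.json (path assumed).
        with open("config.json") as f:
            cfg = json.load(f)

        # Enable YaRN as described in the Qwen2 model card: a factor of 4.0
        # scales the 32,768-token window up to 131,072 tokens.
        cfg["rope_scaling"] = {
            "type": "yarn",
            "factor": 4.0,
            "original_max_position_embeddings": 32768,
        }

        with open("config.json", "w") as f:
            json.dump(cfg, f, indent=2)

    Note that this static scaling applies to all inputs, which is why enabling YaRN unnecessarily can hurt performance on short sequences (see the failure modes below).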

    What are known failure modes of Qwen2 7B Instruct?

    Without proper YaRN configuration, the model may underperform on long-context inputs; conversely, enabling YaRN unnecessarily can degrade performance on short inputs. Some benchmarks (e.g., GPQA) show slightly weaker results compared to larger models.

    Does Qwen2 7B Instruct support streaming responses and function-calling schemas?

    Streaming responses and function calling are not supported for this model.

    Is fine-tuning supported for Qwen2 7B Instruct?

    Yes. Fireworks supports LoRA-based fine-tuning for this model.

    What rate limits apply on the shared endpoint?

    Qwen2 7B Instruct is not available on Fireworks' shared serverless endpoint; it runs on on-demand deployments, which have no rate limits.

    What license governs commercial use of Qwen2 7B Instruct?

    Qwen2 7B Instruct is released under the Apache 2.0 license, which permits commercial use and modifications.

    Metadata

    State: Ready
    Created on: 6/6/2024
    Kind: Base model
    Provider: Qwen
    Hugging Face: Qwen2-7B-Instruct

    Specification

    Calibrated: No
    Mixture-of-Experts: No
    Parameters: 7.6B

    Supported Functionality

    Fine-tuning: Supported
    Serverless: Not supported
    Serverless LoRA: Supported
    Context Length: 32.8K tokens
    Function Calling: Not supported
    Embeddings: Not supported
    Rerankers: Not supported
    Image input: Not supported