

Qwen2.5-VL 72B Instruct

fireworks/qwen2p5-vl-72b-instruct

    Qwen2.5-VL is a multimodal large language model series developed by the Qwen team at Alibaba Cloud, available in 3B, 7B, 32B, and 72B sizes.

    Qwen2.5-VL 72B Instruct API Features

    Fine-tuning

    Qwen2.5-VL 72B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
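
    As a concrete illustration, the sketch below writes a small chat-format JSONL training file of the kind typically used for conversational LoRA fine-tuning. The "messages" schema and file layout are assumptions to verify against the Fireworks fine-tuning docs (in particular, how image inputs are attached for vision fine-tuning).

```python
# Minimal sketch: preparing a chat-format JSONL dataset for LoRA fine-tuning.
# The "messages" schema mirrors the chat format used at inference time; the
# exact dataset requirements (field names, image handling) are assumptions to
# check against the Fireworks fine-tuning documentation.
import json

examples = [
    {
        "messages": [
            {"role": "user", "content": "Extract the invoice total from: 'Total due: $1,240.00'"},
            {"role": "assistant", "content": "The invoice total is $1,240.00."},
        ]
    },
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```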

    On-demand Deployment

    On-demand deployments give you dedicated GPUs for Qwen2.5-VL 72B Instruct using Fireworks' reliable, high-performance system with no rate limits.

    Qwen2.5-VL 72B Instruct FAQs

    What is Qwen2.5-VL 72B Instruct and who developed it?

    Qwen2.5-VL 72B Instruct is a multimodal instruction-tuned model developed by Qwen (Alibaba Group). It is the largest model in the Qwen2.5-VL series, supporting vision-language tasks including image, video, and document understanding.
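
    As a quick illustration, the sketch below sends an image-plus-text prompt to the model through Fireworks' OpenAI-compatible Chat Completions API. The model identifier ("accounts/fireworks/models/qwen2p5-vl-72b-instruct") and the image URL are assumptions; substitute the ID shown for your account.

```python
# Minimal sketch: image + text chat completion against Fireworks'
# OpenAI-compatible endpoint. The model ID and image URL are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-vl-72b-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the key figures in this invoice."},
                {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
            ],
        }
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```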

    What applications and use cases does Qwen2.5-VL 72B Instruct excel at?

    This model is optimized for:

    • Image and document analysis (charts, forms, invoices, tables)
    • Video comprehension (event localization, temporal analysis)
    • Visual agent tasks (tool use, structured output; see the sketch after this list)
    • Multimodal RAG and interactive assistants
    • Screen and mobile UI understanding
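
    For the structured-output use case above, a minimal sketch is shown below. It assumes Fireworks' OpenAI-compatible JSON mode (response_format with "json_object") is available for this model; confirm against the structured-responses docs before relying on it. The model identifier and image URL are placeholders.

```python
# Minimal sketch: asking the model for structured JSON describing a chart.
# Assumes Fireworks' OpenAI-compatible JSON mode (response_format) is enabled
# for this model; the model ID and image URL are placeholders.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-vl-72b-instruct",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Return JSON with keys 'title', 'x_axis', 'y_axis', "
                            "and 'series' describing this chart.",
                },
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    max_tokens=512,
)
chart = json.loads(response.choices[0].message.content)
print(chart["title"])
```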

    What is the maximum context length for Qwen2.5-VL 72B Instruct?
    • Default context length: 32,768 tokens
    • Extended context: Up to 128K tokens using YaRN

    Note: YaRN is not recommended for tasks requiring precise visual localization.
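
    For context on the note above: with the open-weights checkpoint, YaRN is enabled by adding a rope_scaling entry to the model's config.json. The sketch below is illustrative only; the exact field names and values (notably "mrope_section") are taken from the Qwen2.5-VL model card and should be verified there. Fireworks on-demand deployments handle long-context configuration for you.

```python
# Illustrative sketch only: enabling YaRN rope scaling for the open-weights
# checkpoint by editing config.json. Values (especially "mrope_section") are
# assumptions to verify against the Qwen2.5-VL model card.
import json

with open("Qwen2.5-VL-72B-Instruct/config.json") as f:
    config = json.load(f)

# Extend the 32,768-token default toward ~128K (factor 4), at the cost of the
# degraded spatial/temporal localization noted above.
config["rope_scaling"] = {
    "type": "yarn",
    "mrope_section": [16, 24, 24],
    "factor": 4,
    "original_max_position_embeddings": 32768,
}

with open("Qwen2.5-VL-72B-Instruct/config.json", "w") as f:
    json.dump(config, f, indent=2)
```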

    What is the usable context window for Qwen2.5-VL 72B Instruct?

    On Fireworks, the model supports the full 128K context window on on-demand deployments.

    What are known failure modes of Qwen2.5-VL 72B Instruct?
    • Performance degradation when using YaRN on spatial/temporal tasks
    • No support for embeddings or reranking
    • No function/tool calling support, despite the model's agentic positioning
    • Memory and compute demands for high-resolution video inference

    How many parameters does Qwen2.5-VL 72B Instruct have?

    The model has 73.4 billion parameters.

    Is fine-tuning supported for Qwen2.5-VL 72B Instruct?

    Yes. Fireworks supports LoRA-based fine-tuning on dedicated GPUs for this model.

    What rate limits apply when using Qwen2.5-VL 72B Instruct on Fireworks?
    • Serverless: Not supported
    • On-demand: Supported with no rate limits on dedicated GPUs

    What license governs commercial use of Qwen2.5-VL 72B Instruct?

    The model is released under the Tongyi Qianwen license.

    Metadata

    State: Ready
    Created on: 3/31/2025
    Kind: Base model
    Provider: Qwen
    Hugging Face: Qwen2.5-VL-72B-Instruct

    Specification

    Calibrated: No
    Mixture-of-Experts: No
    Parameters: 73.4B

    Supported Functionality

    Fine-tuning: Supported
    Serverless: Not supported
    Serverless LoRA: Supported
    Context Length: 128K tokens
    Function Calling: Not supported
    Embeddings: Not supported
    Rerankers: Not supported
    Image input: Supported