
Qwen2.5-VL 32B Instruct

Ready
fireworks/qwen2p5-vl-32b-instruct

    Qwen2.5-VL is a multimodal large language model series developed by the Qwen team at Alibaba Cloud, available in 3B, 7B, 32B, and 72B sizes.

    Qwen2.5-VL 32B Instruct API Features

    Fine-tuning

    Docs

    Qwen2.5-VL 32B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.

    Serverless

    Docs

    Immediately run the model on pre-configured GPUs and pay per token.
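As a sketch of serverless usage, the request below targets the OpenAI-compatible chat completions endpoint with the model identifier derived from this page's model id (`fireworks/qwen2p5-vl-32b-instruct`); the exact endpoint URL and the `FIREWORKS_API_KEY` environment variable are assumptions to verify against the Fireworks docs:

```python
import json
import os

# Assumed endpoint and model id; confirm both in the Fireworks API docs.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
MODEL = "accounts/fireworks/models/qwen2p5-vl-32b-instruct"

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble a chat-completions payload for the serverless endpoint."""
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize what Qwen2.5-VL can do in one sentence.")
headers = {
    "Authorization": f"Bearer {os.environ.get('FIREWORKS_API_KEY', '')}",
    "Content-Type": "application/json",
}
print(json.dumps(payload, indent=2))

# To actually send the request:
# import requests
# resp = requests.post(API_URL, headers=headers, json=payload, timeout=60)
# print(resp.json()["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the `openai` Python client can also be pointed at the same base URL instead of constructing requests by hand.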

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for Qwen2.5-VL 32B Instruct using Fireworks' reliable, high-performance system with no rate limits.

    Available Serverless

    Run queries immediately, pay only for usage

    $0.90 / $0.90
    Per 1M Tokens (input/output)
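Since input and output are both billed at $0.90 per million tokens, a quick cost estimate is simple arithmetic; `estimate_cost` below is a hypothetical helper, not part of any Fireworks SDK:

```python
# Serverless pricing from this page: $0.90 per 1M tokens, input and output alike.
INPUT_PRICE = 0.90 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 0.90 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single serverless request."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Example: a 50K-token prompt with a 2K-token completion.
print(f"${estimate_cost(50_000, 2_000):.4f}")  # → $0.0468
```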

    Qwen2.5-VL 32B Instruct FAQs

    What is Qwen2.5-VL 32B Instruct and who developed it?

    Qwen2.5-VL 32B Instruct is a 33.5B parameter multimodal vision-language model developed by Qwen (Alibaba Cloud). It is part of the Qwen2.5-VL series, designed to support image-text reasoning, document understanding, video comprehension, and agentic tool use.

    What applications and use cases does Qwen2.5-VL 32B Instruct excel at?

    The model is optimized for:

    • Multimodal reasoning over images, charts, documents, and UI
    • Long video understanding and event detection
    • Visual grounding (bounding box, point-level, structured JSON output)
    • Conversational AI involving visual context
    • Enterprise RAG, agentic systems, and multimedia chat
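For the image-grounded use cases above, a multimodal request interleaves text and image parts in one user message. The sketch below builds such a message with an inline base64 data URL; the field names follow the OpenAI-compatible chat format, which is an assumption to check against the Fireworks vision docs:

```python
import base64

def image_message(image_bytes: bytes, question: str, mime: str = "image/png") -> list:
    """Build an OpenAI-style multimodal message: a text part plus an
    inline base64 data-URL image part. Schema assumed from the
    OpenAI-compatible chat format; verify against the Fireworks docs."""
    data_url = f"data:{mime};base64,{base64.b64encode(image_bytes).decode()}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]

# Dummy bytes just to show the structure; pass real image bytes in practice.
messages = image_message(b"\x89PNG...", "What objects are in this image?")
print(messages[0]["content"][1]["image_url"]["url"][:22])  # → data:image/png;base64,
```

The same message shape works for grounding prompts that ask the model to return bounding boxes or structured JSON describing the image.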

    What is the maximum context length for Qwen2.5-VL 32B Instruct?

    On Fireworks, the model supports up to 128,000 tokens of context. The model's default config.json sets 32,768 tokens, with optional YaRN extrapolation to extend further (e.g., to 64K or 131K).
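When self-hosting, the YaRN extension is enabled via the `rope_scaling` entry in config.json. The fragment below follows the convention from the Qwen2.5 model cards; the values are illustrative, and (as noted under failure modes below) the Qwen team cautions that YaRN can degrade temporal/spatial localization:

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```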

    What is the usable context window for Qwen2.5-VL 32B Instruct?

    Fireworks enables the full 128K token window on on-demand deployments.

    What is the maximum output length Fireworks allows for Qwen2.5-VL 32B Instruct?

    Output length is bounded by the 128K-token context window, which is shared between input and output tokens.

    What are known failure modes of Qwen2.5-VL 32B Instruct?
    • Performance degradation on temporal/spatial localization when using YaRN
    • Memory bottlenecks with high-resolution video input
    • Error-prone transformer integration if not built from source (e.g., KeyError: 'qwen2_5_vl')
    • Reduced accuracy in dense visual tasks under long-context settings

    Does Qwen2.5-VL 32B Instruct support streaming responses and function-calling schemas?

    No, streaming and function calling are not supported.

    How many parameters does Qwen2.5-VL 32B Instruct have?

    The model has 33.5 billion parameters.

    Is fine-tuning supported for Qwen2.5-VL 32B Instruct?

    Yes. Fireworks supports LoRA-based fine-tuning for this model.

    What rate limits apply on the shared endpoint?

    The shared serverless endpoint is billed at $0.90 per million tokens and is subject to standard serverless limits; for workloads that need no rate limits, use an on-demand deployment.

    What license governs commercial use of Qwen2.5-VL 32B Instruct?

    The model is licensed under the Apache 2.0 License, which permits commercial use, modification, and redistribution under the license's terms.

    Metadata

    State
    Ready
    Created on
    3/31/2025
    Kind
    Base model
    Provider
    Qwen
    Hugging Face
    Qwen2.5-VL-32B-Instruct

    Specification

    Calibrated
    Yes
    Mixture-of-Experts
    No
    Parameters
    33.5B

    Supported Functionality

    Fine-tuning
    Supported
    Serverless
    Supported
    Serverless LoRA
    Supported
    Context Length
    128k tokens
    Function Calling
    Not supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Supported