GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/Qwen/Qwen2.5-VL 32B Instruct
Quen Logo Mark

Qwen2.5-VL 32B Instruct

Ready
model path:accounts/fireworks/models/qwen2p5-vl-32b-instruct

Qwen2.5-VL is a multimodal large language model series developed by Qwen team, Alibaba Cloud, available in 3B, 7B, 32B, and 72B sizes

Qwen2.5-VL 32B Instruct API Features

Fine-tuning

Docs

Qwen2.5-VL 32B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use Qwen2.5-VL 32B Instruct on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Qwen2.5-VL 32B Instruct FAQs

What is Qwen2.5-VL 32B Instruct and who developed it?

Qwen2.5-VL 32B Instruct is a 33.5B parameter multimodal vision-language model developed by Qwen (Alibaba Cloud). It is part of the Qwen2.5-VL series, designed to support image-text reasoning, document understanding, video comprehension, and agentic tool use.

What applications and use cases does Qwen2.5-VL 32B Instruct excel at?

The model is optimized for:

  • Multimodal reasoning over images, charts, documents, and UI
  • Long video understanding and event detection
  • Visual grounding (bounding box, point-level, structured JSON output)
  • Conversational AI involving visual context
  • Enterprise RAG, agentic systems, and multimedia chat
What is the maximum context length for Qwen2.5-VL 32B Instruct?

Fireworks supports up to 128,000 tokens, while the default config.json supports 32,768 tokens with optional YaRN extrapolation to extend further (e.g., 64K or 131K).

What is the usable context window for Qwen2.5-VL 32B Instruct?

Fireworks enables the full 128K token window on on-demand deployments.

What is the maximum output length Fireworks allows for Qwen2.5-VL 32B Instruct?

Output length is bounded by the 128K token context window.

What are known failure modes of Qwen2.5-VL 32B Instruct?
  • Performance degradation on temporal/spatial localization when using YaRN
  • Memory bottlenecks with high-resolution video input
  • Error-prone transformer integration if not built from source (e.g., KeyError: 'qwen2_5_vl')
  • Reduced accuracy in dense visual tasks under long-context settings
Does Qwen2.5-VL 32B Instruct support streaming responses and function-calling schemas?

No, streaming and function calling are not supported.

How many parameters does Qwen2.5-VL 32B Instruct have?

The model has 33.5 billion parameters.

Is fine-tuning supported for Qwen2.5-VL 32B Instruct?

Yes. Fireworks supports LoRA-based fine-tuning for this model.

What rate limits apply on the shared endpoint?

The model is supported on serverless deployment at $0.90 per million tokens and on-demand deployment with no rate limits.

What license governs commercial use of Qwen2.5-VL 32B Instruct?

The model is licensed under the Apache 2.0 License, which allows unrestricted commercial use.

Metadata

State
Ready
Created on
3/31/2025
Kind
Base model
Provider
Qwen

Specification

Calibrated
Yes
Mixture-of-Experts
No
Parameters
35.8B

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
128k tokens
Function Calling
Not supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Supported