GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/Qwen/Qwen2.5-VL 72B Instruct
Quen Logo Mark

Qwen2.5-VL 72B Instruct

Ready
model path:accounts/fireworks/models/qwen2p5-vl-72b-instruct

Qwen2.5-VL is a multimodal large language model series developed by Qwen team, Alibaba Cloud, available in 3B, 7B, 32B, and 72B sizes

Qwen2.5-VL 72B Instruct API Features

Fine-tuning

Docs

Qwen2.5-VL 72B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use Qwen2.5-VL 72B Instruct on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Qwen2.5-VL 72B Instruct FAQs

What is Qwen2.5-VL 72B Instruct and who developed it?

Qwen2.5-VL 72B Instruct is a multimodal instruction-tuned model developed by Qwen (Alibaba Group). It is the largest model in the Qwen2.5-VL series, supporting vision-language tasks including image, video, and document understanding .

What applications and use cases does Qwen2.5-VL 72B Instruct excel at?

This model is optimized for:

  • Image and document analysis (charts, forms, invoices, tables)
  • Video comprehension (event localization, temporal analysis)
  • Visual agent tasks (tool use, structured output)
  • Multimodal RAG and interactive assistants
  • Screen and mobile UI understanding
What is the maximum context length for Qwen2.5-VL 72B Instruct?
  • Default context length: 32,768 tokens
  • Extended context: Up to 128K tokens using YaRN

Note: YaRN is not recommended for tasks requiring precise visual localization

What is the usable context window for Qwen2.5-VL 72B Instruct?

On Fireworks, the model supports the full 128K context window on on-demand deployments.

What are known failure modes of Qwen2.5-VL 72B Instruct?
  • Performance degradation when using YaRN on spatial/temporal tasks
  • No support for embeddings or reranking
  • Lack of function/tool calling integration despite agentic positioning
  • Memory and compute demands for high-resolution video inference
How many parameters does Qwen2.5-VL 72B Instruct have?

The model has 73.4 billion parameters.

Is fine-tuning supported for Qwen2.5-VL 72B Instruct?

Yes. Fireworks supports LoRA-based fine-tuning on dedicated GPUs for this model.

What rate limits apply on the shared endpoint?
  • Serverless: Not supported
  • On-demand: Supported with no rate limits on dedicated GPUs
What license governs commercial use of Qwen2.5-VL 72B Instruct?

The model is released under the Tongyi Qianwen license.

Metadata

State
Ready
Created on
3/31/2025
Kind
Base model
Provider
Qwen

Specification

Calibrated
No
Mixture-of-Experts
No
Parameters
73.4B

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
128k tokens
Function Calling
Not supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Supported