Qwen2.5-VL 72B Instruct API

What is Qwen2.5-VL 72B Instruct and who developed it?

Qwen2.5-VL 72B Instruct is a multimodal instruction-tuned model developed by Qwen (Alibaba Group). It is the largest model in the Qwen2.5-VL series, supporting vision-language tasks including image, video, and document understanding .

What applications and use cases does Qwen2.5-VL 72B Instruct excel at?

This model is optimized for:

•Image and document analysis (charts, forms, invoices, tables)
•Video comprehension (event localization, temporal analysis)
•Visual agent tasks (tool use, structured output)
•Multimodal RAG and interactive assistants
•Screen and mobile UI understanding

What is the maximum context length for Qwen2.5-VL 72B Instruct?

•Default context length: 32,768 tokens
•Extended context: Up to 128K tokens using YaRN

Note: YaRN is not recommended for tasks requiring precise visual localization

What is the usable context window for Qwen2.5-VL 72B Instruct?

On Fireworks, the model supports the full 128K context window on on-demand deployments.

What are known failure modes of Qwen2.5-VL 72B Instruct?

•Performance degradation when using YaRN on spatial/temporal tasks
•No support for embeddings or reranking
•Lack of function/tool calling integration despite agentic positioning
•Memory and compute demands for high-resolution video inference

How many parameters does Qwen2.5-VL 72B Instruct have?

The model has 73.4 billion parameters.

Is fine-tuning supported for Qwen2.5-VL 72B Instruct?

Yes. Fireworks supports LoRA-based fine-tuning on dedicated GPUs for this model.

What rate limits apply on the shared endpoint?

•Serverless: Not supported
•On-demand: Supported with no rate limits on dedicated GPUs

What license governs commercial use of Qwen2.5-VL 72B Instruct?

The model is released under the Tongyi Qianwen license.

Fine-tuning Docs	Qwen2.5-VL 72B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model
On-demand Deployment Docs	On-demand deployments give you dedicated GPUs for Qwen2.5-VL 72B Instruct using Fireworks' reliable, high-performance system with no rate limits.

Qwen2.5-VL 72B Instruct

Qwen2.5-VL 72B Instruct API Features

Fine-tuning

On-demand Deployment

Qwen2.5-VL 72B Instruct FAQs

Metadata

Specification

Supported Functionality