Qwen2.5-VL is a multimodal large language model series developed by the Qwen team at Alibaba Cloud, available in 3B, 7B, 32B, and 72B sizes.
Fine-tuning: Qwen2.5-VL 32B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
Serverless: Run the model immediately on pre-configured GPUs and pay per token.
On-demand deployment: Dedicated GPUs for Qwen2.5-VL 32B Instruct on Fireworks' reliable, high-performance system, with no rate limits.
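The serverless option can be exercised through Fireworks' OpenAI-compatible chat completions endpoint. A minimal sketch, assuming the endpoint path and the model identifier `accounts/fireworks/models/qwen2p5-vl-32b-instruct`; verify both against your Fireworks account before use:

```python
# Sketch: querying Qwen2.5-VL 32B Instruct via Fireworks' OpenAI-compatible
# chat completions API. Model id and endpoint path are assumptions based on
# Fireworks' documented conventions -- check them against your account.
import json
import os
import urllib.request

API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
MODEL_ID = "accounts/fireworks/models/qwen2p5-vl-32b-instruct"  # assumed id

def build_payload(prompt: str, image_url: str) -> dict:
    """Build a multimodal chat request: one text part plus one image part."""
    return {
        "model": MODEL_ID,
        "max_tokens": 512,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# Only sends a real request when an API key is present in the environment.
if __name__ == "__main__" and "FIREWORKS_API_KEY" in os.environ:
    payload = build_payload("Describe this chart.", "https://example.com/chart.png")
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Billing on serverless is per token, so the `max_tokens` cap above also bounds the cost of each reply.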
Qwen2.5-VL 32B Instruct is a 33.5B parameter multimodal vision-language model developed by Qwen (Alibaba Cloud). It is part of the Qwen2.5-VL series, designed to support image-text reasoning, document understanding, video comprehension, and agentic tool use.
Fireworks supports up to 128,000 tokens, while the default config.json supports 32,768 tokens, with optional YaRN extrapolation to extend further (e.g., to 64K or 131K).
Fireworks enables the full 128K token window on on-demand deployments.
Output length is bounded by the 128K token context window.
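The YaRN extrapolation mentioned above is configured through a `rope_scaling` stanza in config.json. A sketch of that stanza as Python data, assuming the commonly documented values (scaling factor 4.0 over the 32,768-token base, giving roughly 131K positions); treat the exact field values as assumptions and check the upstream Qwen2.5-VL model card:

```python
# Sketch of a YaRN rope_scaling stanza in the Hugging Face transformers
# config format. All specific values here are assumptions drawn from the
# commonly documented Qwen2.5-VL setup, not from this page.
yarn_rope_scaling = {
    "type": "yarn",
    "factor": 4.0,                              # extrapolation multiplier
    "original_max_position_embeddings": 32768,  # the default window
    "mrope_section": [16, 24, 24],              # Qwen2.5-VL multimodal RoPE split
}

# Effective window after extrapolation: 32768 * 4.0 = 131072 positions.
extended_window = int(32768 * yarn_rope_scaling["factor"])
```

Lowering `factor` (e.g., to 2.0) yields the intermediate 64K window; serving on Fireworks caps the usable window at 128K regardless.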
No, streaming and function calling are not supported.
The model has 33.5 billion parameters.
Yes. Fireworks supports LoRA-based fine-tuning for this model.
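LoRA fine-tuning jobs of this kind typically ingest a JSONL file of chat-formatted examples. A hedged sketch of preparing such a file; the exact schema Fireworks expects, especially for examples with images, may differ, so verify it against the fine-tuning docs:

```python
# Sketch: writing chat-formatted training examples as JSONL, the layout
# commonly used for LoRA fine-tuning jobs. The messages schema shown here
# is an illustrative assumption, not Fireworks' confirmed format.
import json

examples = [
    {
        "messages": [
            {"role": "user", "content": "What does this invoice total to?"},
            {"role": "assistant", "content": "The invoice totals $1,240.50."},
        ]
    },
]

def write_jsonl(rows: list[dict], path: str) -> None:
    """Write one JSON object per line, as JSONL ingestion expects."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```

Each line is an independent training example, so files can be concatenated or streamed without any enclosing array.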
The model is licensed under the Apache License 2.0, which permits commercial use.