Qwen2.5 is a series of decoder-only language models developed by the Qwen team at Alibaba Cloud, available in 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B sizes, in both base and instruct variants.
| Feature | Description |
| --- | --- |
| Fine-tuning (Docs) | Qwen2.5 7B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model. |
| On-demand Deployment (Docs) | On-demand deployments give you dedicated GPUs for Qwen2.5 7B Instruct using Fireworks' reliable, high-performance system with no rate limits. |
Qwen2.5-72B Instruct is a 72.7B parameter instruction-tuned language model developed by Qwen, a team under Alibaba Cloud. It is part of the Qwen2.5 series, which improves on Qwen2 by enhancing instruction following, structured data understanding, code and math reasoning, and long-text generation.
The model supports structured output (e.g., JSON), multi-turn chat, and strong multilingual capability across 29+ languages.
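As a rough illustration, structured JSON output can be requested through Fireworks' OpenAI-compatible endpoint. This is a minimal sketch, assuming the model identifier `accounts/fireworks/models/qwen2p5-72b-instruct` and JSON-mode support via `response_format` on this deployment; check the model page for the exact values.

```python
# Minimal sketch: requesting structured JSON output from Qwen2.5 72B Instruct
# through Fireworks' OpenAI-compatible endpoint. The model identifier and
# JSON-mode (response_format) support are assumptions, not confirmed here.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",  # replace with your key
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-72b-instruct",  # assumed identifier
    messages=[
        {"role": "system", "content": "Reply only with valid JSON."},
        {"role": "user", "content": "Give the capital of France as {\"capital\": ...}."},
    ],
    response_format={"type": "json_object"},
    max_tokens=128,
)
print(response.choices[0].message.content)  # e.g. {"capital": "Paris"}
```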
The model supports a default context length of 32,768 tokens, which can be extended to 131,072 tokens using YaRN extrapolation.
The full 131K token context window is usable on Fireworks with rope_scaling (YaRN) configured for long sequences.
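For local use of the open-weights checkpoint, the Qwen2.5 model card describes enabling YaRN by adding a `rope_scaling` entry to the model config. A minimal transformers sketch is below; the scaling values follow the published recommendation (factor 4.0 over the 32,768-token base) and should be treated as a starting point.

```python
# Sketch: enabling YaRN rope scaling when loading the open-weights checkpoint
# locally with transformers (this is separate from the Fireworks API).
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-72B-Instruct"
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,                               # 32,768 * 4 = 131,072 tokens
    "original_max_position_embeddings": 32768,
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, config=config, device_map="auto")
```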
Yes. There are 83 quantized versions of this model, including 4-bit and 8-bit variants.
The generation length is benchmarked at 8,192 tokens, and outputs are constrained by the 131K total context window (input + output).
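For a quick sense of the budget, the tokens available for generation are simply the window minus the prompt; the prompt size below is an arbitrary example.

```python
# Back-of-the-envelope output budget under the 131,072-token window.
CONTEXT_WINDOW = 131_072
prompt_tokens = 120_000                          # example prompt size (assumption)
max_completion_tokens = CONTEXT_WINDOW - prompt_tokens
print(max_completion_tokens)                     # 11072 tokens left for generation
```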
transformers versions older than v4.37.0 are not supported. Streaming is not supported, but the model does support function calling.
The model has 72.7 billion total parameters (70.0 billion non-embedding parameters) and uses an 80-layer architecture with grouped-query attention (GQA) featuring 64 query heads and 8 key-value heads.
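The GQA figures imply a fixed head-sharing ratio, sketched below as simple arithmetic.

```python
# Head-sharing arithmetic implied by the GQA configuration above.
query_heads, kv_heads = 64, 8
queries_per_kv_head = query_heads // kv_heads    # 8 query heads share each KV head
kv_cache_vs_mha = kv_heads / query_heads         # KV cache is 1/8 the size of full MHA
print(queries_per_kv_head, kv_cache_vs_mha)      # 8 0.125
```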
Yes. Fireworks supports LoRA-based fine-tuning for this model.
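For intuition about what a LoRA adapter involves, here is a minimal sketch using the open-source peft library. It illustrates the general technique only, not Fireworks' managed fine-tuning pipeline; the rank and target modules are assumptions.

```python
# Illustrative LoRA adapter setup with peft; hyperparameters are assumptions,
# and this is not Fireworks' actual fine-tuning pipeline.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-72B-Instruct", device_map="auto")
lora_config = LoraConfig(
    r=16,                        # adapter rank (assumed)
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()   # only the small adapter matrices are trained
```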
Token usage is billed on both input and output tokens. Prompt formatting affects input size, and generation defaults are user-configurable.
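Continuing the API sketch above, the billed token counts can be read from the response's OpenAI-compatible `usage` object.

```python
# Billed tokens reported on the chat completion response from the earlier sketch.
usage = response.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
```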
The model is released under the Qianwen License, a custom license by Alibaba Group.