GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/Qwen/Qwen2 72B Instruct
Quen Logo Mark

Qwen2 72B Instruct

Ready
model path:accounts/fireworks/models/qwen2-72b-instruct

Qwen2 72B Instruct is a 72 billion parameter model developed by Alibaba for instruction-tuned tasks. It excels in natural language understanding and generation tasks, including summarization, dialogue, and complex reasoning. Qwen2 is optimized for instruction-following, making it ideal for applications that require detailed and structured responses across a wide range of domains.

Qwen2 72B Instruct API Features

Fine-tuning

Docs

Qwen2 72B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use Qwen2 72B Instruct on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Qwen2 72B Instruct FAQs

What is Qwen2-72B Instruct and who developed it?

Qwen2-72B Instruct is a 72.7 billion parameter instruction-tuned language model developed by Qwen (Alibaba Group). It is part of the Qwen2 series, optimized for natural language understanding, generation, and instruction following across complex domains like coding, math, and multilingual reasoning.

What applications and use cases does Qwen2-72B Instruct excel at?

The model is well-suited for:

  • Conversational AI
  • Enterprise RAG systems
  • Agentic systems
  • Search and multimedia tasks
  • Code generation and math reasoning

It shows strong performance in multilingual and structured output tasks.

What is the maximum context length for Qwen2-72B Instruct?

The model supports:

  • Native context length: 32,768 tokens
  • Extended context: Up to 131,072 tokens using YaRN (rope scaling extrapolation)
What is the usable context window for Qwen2-72B Instruct?

The full 131K token context window is usable when deployed with appropriate rope_scaling via vLLM or compatible runtime.

What are known failure modes of Qwen2-72B Instruct?
  • Static YaRN scaling can degrade performance on short prompts
  • Transformer compatibility issues with transformers < 4.37.0
  • No tool use or image input support
  • Requires apply_chat_template() for correct prompt formatting
How many parameters does Qwen2-72B Instruct have?

The model has 72.7 billion parameters.

Is fine-tuning supported for Qwen2-72B Instruct?

Yes. Fireworks supports LoRA-based fine-tuning on dedicated infrastructure.

What rate limits apply on the shared endpoint?
  • Serverless: Not supported
  • On-demand: Available with no rate limits on dedicated GPUs
What license governs commercial use of Qwen2-72B Instruct?

The model is licensed under Tongyi Qianwen, a custom license from Alibaba Group. It is not open-source under Apache/MIT and may have commercial restrictions.

Metadata

State
Ready
Created on
6/6/2024
Kind
Base model
Provider
Qwen

Specification

Calibrated
No
Mixture-of-Experts
No
Parameters
72.7B

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
32.7k tokens
Function Calling
Not supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported