Qwen3 Next 80B A3B Thinking is a state-of-the-art mixture-of-experts (MoE) language model with 80 billion total parameters, of which roughly 3 billion are activated per token. It features a hybrid attention architecture, supports context lengths of up to 262K tokens, and offers enhanced reasoning capabilities. To ensure sufficient GPU memory capacity, we recommend deploying this model on 2 NVIDIA H200 GPUs or 4 NVIDIA H100 GPUs.
On-demand deployments give you dedicated GPUs for Qwen3 Next 80B A3B Thinking using Fireworks' reliable, high-performance system with no rate limits.
Provider: Qwen
Price: $0.9
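Once a deployment is running, it can be queried through Fireworks' OpenAI-compatible chat completions endpoint. The snippet below is a minimal sketch of such a request; the model identifier (`accounts/fireworks/models/qwen3-next-80b-a3b-thinking`) and the `FIREWORKS_API_KEY` environment variable are assumptions, so substitute the identifier of your own deployment and your actual API key.

```python
# Minimal sketch: query a Qwen3 Next 80B A3B Thinking deployment via
# Fireworks' OpenAI-compatible chat completions API.
import os
import requests

API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
API_KEY = os.environ["FIREWORKS_API_KEY"]  # assumed env var holding your key

payload = {
    # Assumed model id; an on-demand deployment may expose its own identifier.
    "model": "accounts/fireworks/models/qwen3-next-80b-a3b-thinking",
    "messages": [
        {"role": "user", "content": "Explain mixture-of-experts routing in two sentences."}
    ],
    "max_tokens": 512,
    "temperature": 0.6,
}

response = requests.post(
    API_URL,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=120,
)
response.raise_for_status()

# Print the assistant's reply from the first choice.
print(response.json()["choices"][0]["message"]["content"])
```

The same request works with any OpenAI-compatible client by pointing its base URL at `https://api.fireworks.ai/inference/v1`.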