DeepSeek R1 0528 Distill Qwen3 8B

We distilled the chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3 8B Base, obtaining DeepSeek-R1-0528-Qwen3-8B. The model achieves state-of-the-art (SOTA) performance among open-source models on AIME 2024, surpassing Qwen3 8B by 10.0% and matching the performance of Qwen3-235B-thinking.

Fireworks Features

Fine-tuning

DeepSeek R1 0528 Distill Qwen3 8B can be fine-tuned on your own data to improve its responses. Fireworks uses LoRA (low-rank adaptation) to train and deploy your customized model efficiently.
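To give a rough sense of why LoRA is efficient, the sketch below compares the number of trainable parameters when a single weight matrix is updated directly against the two low-rank factors LoRA trains in its place. The matrix size and rank are hypothetical values chosen for illustration; this page does not state the settings Fireworks actually uses.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates every entry of the d_out x d_in matrix W.
    LoRA freezes W and trains two small factors instead:
    A (rank x d_in) and B (d_out x rank), so the update is B @ A.
    """
    if min(d_in, d_out, rank) <= 0:
        raise ValueError("dimensions and rank must be positive")
    full = d_in * d_out           # parameters when updating W directly
    lora = rank * (d_in + d_out)  # parameters in A plus parameters in B
    return full, lora


# Hypothetical sizes, for illustration only (not Fireworks' configuration).
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

Because the trained factors are small relative to the base weights, they can be stored and served as a lightweight add-on to the shared base model.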

On-demand Deployment

On-demand deployments give you dedicated GPUs for DeepSeek R1 0528 Distill Qwen3 8B using Fireworks' reliable, high-performance system with no rate limits.

Info & Pricing

Provider: DeepSeek
Model Type: LLM
Context Length: 131072 tokens
Fine-Tuning: Available
Pricing Per 1M Tokens: $0.20
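
As a quick worked example of the per-1M-token pricing, the sketch below estimates the cost of a request from its token count. It assumes the single listed rate applies to input and output tokens alike, which this page does not spell out; the function and constant names are illustrative.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.20  # listed price per 1M tokens
CONTEXT_LENGTH_TOKENS = 131072       # listed context length


def estimate_cost_usd(total_tokens: int,
                      price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Estimate cost in USD for a given number of tokens.

    Assumption: one flat rate covers prompt and completion tokens.
    """
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * price_per_million


# Cost of a request that fills the entire context window once.
print(f"${estimate_cost_usd(CONTEXT_LENGTH_TOKENS):.4f}")
```

Under that assumption, a request that fills the whole 131072-token context costs under three cents.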