

DeepSeek R1 0528 Distill Qwen3 8B

We distilled the chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3 8B Base, producing DeepSeek-R1-0528-Qwen3-8B. This model achieves state-of-the-art (SOTA) performance among open-source models on AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking.
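A minimal sketch of what a request to this model could look like through an OpenAI-compatible chat-completions API. The model identifier and payload shape here are assumptions for illustration; check the Fireworks model library and API docs for the exact values:

```python
import json

# Hypothetical model identifier on Fireworks -- verify against the model library.
MODEL_ID = "accounts/fireworks/models/deepseek-r1-0528-qwen3-8b"

def build_chat_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble an OpenAI-compatible chat-completions payload (not sent here)."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Prove that the square root of 2 is irrational.")
print(json.dumps(payload, indent=2))
```

The payload would be POSTed with your API key to the provider's chat-completions endpoint.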


Fireworks Features

Fine-tuning

DeepSeek R1 0528 Distill Qwen3 8B can be customized with your data to improve responses. Fireworks uses LoRA to train and deploy your personalized model efficiently.
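Fine-tuning data is typically supplied as a JSONL file of chat-style examples. The schema below (a `messages` list per line) is an assumption based on the common OpenAI-style format; confirm the exact field names in the Fireworks fine-tuning docs:

```python
import json

# Hypothetical training examples in chat-style JSONL; the schema is assumed.
examples = [
    {"messages": [
        {"role": "user", "content": "Summarize our refund policy."},
        {"role": "assistant", "content": "Refunds are issued within 30 days of purchase."},
    ]},
]

# One JSON object per line is the JSONL convention.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The resulting file would then be uploaded as the dataset for a LoRA fine-tuning job.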


On-demand Deployment

On-demand deployments give you dedicated GPUs for DeepSeek R1 0528 Distill Qwen3 8B using Fireworks' reliable, high-performance system with no rate limits.


Info

Provider: DeepSeek
Model Type: LLM
Context Length: 131,072 tokens
Fine-Tuning: Available
Pricing: $0.20 per 1M tokens
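At $0.20 per 1M tokens, estimating spend is a one-line calculation. A small sketch (the function name is illustrative, not part of any SDK):

```python
PRICE_PER_1M_TOKENS = 0.20  # USD, from the pricing table above

def estimate_cost(total_tokens: int) -> float:
    """Estimated spend in USD for a given combined token count."""
    return total_tokens / 1_000_000 * PRICE_PER_1M_TOKENS

# e.g. 500k tokens -> $0.10
print(f"${estimate_cost(500_000):.2f}")
```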