We distilled the chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3 8B Base, obtaining DeepSeek-R1-0528-Qwen3-8B. This model achieves state-of-the-art (SOTA) performance among open-source models on the AIME 2024 benchmark, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking.
| Feature | Description |
| --- | --- |
| Fine-tuning | DeepSeek R1 0528 Distill Qwen3 8B can be customized with your own data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model. |
| On-demand deployment | On-demand deployments let you run DeepSeek R1 0528 Distill Qwen3 8B on dedicated GPUs with Fireworks' high-performance serving stack, with high reliability and no rate limits (see the query sketch below). |
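Once the model is available (serverless or on an on-demand deployment), it can be queried through Fireworks' OpenAI-compatible chat completions endpoint. The sketch below is a minimal example, not an official snippet: the model identifier `accounts/fireworks/models/deepseek-r1-0528-qwen3-8b` and the `FIREWORKS_API_KEY` environment variable name are assumptions, so confirm the exact model ID on the model page.

```python
# Minimal sketch: querying DeepSeek R1 0528 Distill Qwen3 8B via Fireworks'
# OpenAI-compatible API using the standard OpenAI Python SDK.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],  # assumed env var holding your key
)

response = client.chat.completions.create(
    # Assumed model ID -- verify the exact identifier on the Fireworks model page.
    model="accounts/fireworks/models/deepseek-r1-0528-qwen3-8b",
    messages=[
        {"role": "user", "content": "What is the sum of the first 10 primes?"}
    ],
    max_tokens=2048,  # reasoning models emit long chain-of-thought; leave headroom
    temperature=0.6,
)

print(response.choices[0].message.content)
```

A LoRA fine-tuned variant would be queried the same way, substituting the model ID that Fireworks assigns to your tuned checkpoint.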