We distilled the chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3 8B Base, obtaining DeepSeek-R1-0528-Qwen3-8B. This model achieves state-of-the-art (SOTA) performance among open-source models on the AIME 2024 benchmark, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking.
| Feature | Description |
| --- | --- |
| Fine-tuning | DeepSeek R1 0528 Distill Qwen3 8B can be customized with your own data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model. |
| On-demand deployment | On-demand deployments let you run DeepSeek R1 0528 Distill Qwen3 8B on dedicated GPUs with Fireworks' high-performance serving stack, with high reliability and no rate limits (see the query sketch below). |
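Once the model is available (serverless or on an on-demand deployment), it can be queried through Fireworks' OpenAI-compatible chat completions endpoint. The sketch below is a minimal example, not an official snippet: the model identifier `accounts/fireworks/models/deepseek-r1-0528-qwen3-8b` and the `FIREWORKS_API_KEY` environment variable name are assumptions, so confirm the exact model ID on the model page.

```python
# Minimal sketch: querying DeepSeek R1 0528 Distill Qwen3 8B via Fireworks'
# OpenAI-compatible API using the standard OpenAI Python SDK.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],  # assumed env var holding your key
)

response = client.chat.completions.create(
    # Assumed model ID -- verify the exact identifier on the Fireworks model page.
    model="accounts/fireworks/models/deepseek-r1-0528-qwen3-8b",
    messages=[
        {"role": "user", "content": "What is the sum of the first 10 primes?"}
    ],
    max_tokens=2048,  # reasoning models emit long chain-of-thought; leave headroom
    temperature=0.6,
)

print(response.choices[0].message.content)
```

A LoRA fine-tuned variant would be queried the same way, substituting the model ID that Fireworks assigns to your tuned checkpoint.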