![Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?](/_next/image?url=https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fc285f3eb-d4f2-4ce1-8c53-25d0d3a0337b%2Fb43e1917-a368-4571-949a-92b47236e51d%2FScreenshot_2025-01-31_at_3.09.58_PM.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DASIAZI2LB4662NJMC5XA%252F20250212%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20250212T194106Z%26X-Amz-Expires%3D3600%26X-Amz-Security-Token%3DIQoJb3JpZ2luX2VjENv%252F%252F%252F%252F%252F%252F%252F%252F%252F%252FwEaCXVzLXdlc3QtMiJHMEUCIQC45tsAThyK90Ew2GqOGifxQhed3UTyUmITY4ME%252BNhlIgIgbv09ppvFsgdnT1PbmDOKu%252BipJxlWHN%252BNI9YFPT%252F%252BvZMqiAQI9P%252F%252F%252F%252F%252F%252F%252F%252F%252F%252FARAAGgw2Mzc0MjMxODM4MDUiDE%252BmaGNW1rrgD2hHPCrcA%252FQe39nanULwDnPVYrKroF7HyNZs5Jo%252BpGhcp%252Bpw3OCslBtNwl5x68a3L1hf6UYQ7gI0k%252Fcezd4gudYfLYNyZ1zMbyiE7gxH3u0djU7iRp%252Fwni3oLQycRVUYc5W%252BWh%252BoO4e%252FSOoABTFb2g%252BXV1MEpB%252Fj0k0xbZjqewgCf28OdZ8j1pgUM9mvbsNITOc2RKx2Dfp4%252FHrJsszrp9qBLjuBL2Ni7kibvVXs%252FlIJWXXjZjQh7YfnGtsvdqy2pZR8RO1l5BafC%252BAxBXAmpcm3vmgZ3c%252BtFtsj74Xv1trx1eaejMXDQQ%252BBwY8jpnUtrpQKjQ07ueR1YaViW54Fq9YHa6lBoeDhjwBfVTGIZsf%252BJ7T48ao0uZlVP%252BKPlaIL1%252FtJRMMiOEivq%252FNFyzGlUp7YN1YMq3w0j15X96I2kJC8BAu40x7p8HZ7G7vcUi%252B5Mznl%252B7m0eqj4RCKKNQ1w6joyYJBh2%252BJNDQizYfbXbt87L22ghjeFYqvv0ZFA0D5L5GvWgHiO3h%252Fuzr5ZeCYg3svlW%252B8fJMUIDF%252FGvPUnQU4Oz6wi%252BrA%252F59sf30uIROrX0vBfw%252BXM5bGq5UajM64dXIF8XXyAXqnI%252BrUtFXCM8AyPOr6WrWUNerBh%252BgqE6qyu%252FYrOMNLRs70GOqUBsEaI%252B77NjD%252FjmGS79dlpyGu3mc6MPxJgEnVhdEPZI5fDbxfONEBLGXhzGaJ1Pxeq6VohlfKeB3aPaSnctWvM06Np5WT7KUso4Dbj%252F3KoIlwTZOs0ShFJ8cecweOhqMeKZLhHGtnyMQlml5CXyFgyAn902ZZNZYaJ1RYMNINWx3GES6tJEP8Ll8RK5EtimlX61%252F6sdUzAtSt%252BOCuWdtQFG2ETYO5b%26X-Amz-Signature%3Dbc4955c0d920f3426253c4fdae9948057132aad2706d531d27bb6ae9b1ad6702%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject&w=1080&q=75)
Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?
By Fireworks AI|1/31/2025
DeepSeek R1, a state-of-the-art open model, is now available. Try it now or read our DeepSeek quickstart!
By Fireworks AI|1/31/2025
The recent release of DeepSeek R1 has taken the AI community by storm, offering performance on par with leading frontier models—such as OpenAI’s o1—at a fraction of the cost. Still, R1 can be expensive for use cases with high traffic or low latency requirements.
DeepSeek R1’s strength lies in its explicit step-by-step reasoning. Before generating a final answer, it creates an internal “chain of thought” (CoT) to systematically reason through each problem. This process is a form of test-time computation, allowing the model to dynamically allocate more compute to complex problems. However, these extended reasoning sequences typically increase inference cost.
Distillation is a method for transferring knowledge from a large, more powerful teacher model to a smaller, more cost-effective student model. According to the DeepSeek R1 paper, R1 is highly effective in this teacher role. Its detailed CoT sequences guide the student model to break down complex tasks into smaller, more manageable steps.
Although fine-tuning with human-labeled data can produce specialized models, collecting both final answers and their corresponding reasoning steps is expensive. Distillation scales more easily: rather than relying on human annotations, the teacher model automatically generates the training data for the student.
The term “distillation” can refer to different methods:
In this post, we focus on the data distillation because it supports a wider variety of student–teacher pairs.
Training data is often a bottleneck in model development. In a recent post (add link), we explored how to generate labels by combining model output with a verification function. Distillation takes a different approach, using a teacher model to synthesize missing completions.
DeepSeek R1 stands out because it not only provides final answers but also reveals its step-by-step chain of thought—unlike other reasoning models that keep this internal process hidden. If your dataset includes ground truth answers, you can identify high-quality synthetic CoTs through rejection sampling, selecting only the best chains to further improve your fine-tuned model. Rejection sampling can remove incorrect data examples either by comparing the generated data against ground truth labels or by applying a user-defined validation function. From the interface point of view, the validation function resembles the verifiable reward function used by value-model-free RL methods like these described in our recent blog post.
GSM8K (Grade School Math 8K) is a dataset of 8.5K diverse grade-school math word problems. Each data point includes:
We expanded this dataset by adding:
Then, we fine-tuned three variants of the model (using LoRA on llama-3.1-8B-instruct), each with different training targets:
The table below summarizes average accuracy and reasoning length:
From this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs in boosting performance, albeit with a higher inference cost due to their longer length.
DeepSeek R1 is available on the Fireworks AI platform. A user-friendly distillation interface will soon be part of FireOptimizer. If you need earlier access, please get in touch to explore options.
By incorporating reasoning-based data through distillation, organizations can drastically improve model performance without bearing the full burden of human-annotated datasets. DeepSeek R1’s ability to produce long, high-quality reasoning chains makes it a powerful teacher model—showing that, in some cases, the machine might just out-teach the human.