Decoding DeepSeek R1's Research Abstract

Introduction
What's up, everyone? This is Dev. In this blog, we'll break down the abstract of DeepSeek-R1's research paper. If you haven't heard about DeepSeek yet, check out the following video to learn more about it.
Brief about DeepSeek R1
DeepSeek is a Chinese AI company that develops open-source large language models. They recently released their first-generation reasoning models: DeepSeek-R1-Zero and DeepSeek-R1. The latter model took the internet by storm, and everyone is still talking about it. The major reason is that DeepSeek-R1 achieves performance comparable to OpenAI's o1 model on reasoning tasks.
Understanding the Abstract
The abstract starts by introducing the models. It then explains how DeepSeek-R1-Zero was trained: via large-scale reinforcement learning (RL), without supervised fine-tuning (SFT) as a preliminary step.
Large Scale Reinforcement Learning
- Reinforcement Learning (RL) refers to learning by trial and error: the model tries things and receives feedback in the form of a reward.
- Based on that feedback, the model gradually improves.
- When a model is trained with RL at a massive scale, this is called 'Large Scale Reinforcement Learning' (a toy sketch follows this list).
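To make the trial-and-error idea concrete, here is a toy Python sketch (my illustration, not DeepSeek's training code): two candidate behaviors are tried repeatedly, a hand-written reward function scores each attempt, and the behavior that accumulates more reward wins out. The `<think>`/`<answer>` tags mirror the output format the paper's training template encourages.

```python
import random

# Toy reward: +1 if the answer follows the expected format, else 0.
# (Real reward signals for reasoning also check answer correctness,
# language consistency, and so on.)
def reward(answer: str) -> float:
    return 1.0 if answer.endswith("</answer>") else 0.0

# Two candidate behaviors the "model" can try.
behaviors = {
    "formatted": lambda q: f"<think>working on {q}...</think><answer>{q}</answer>",
    "plain": lambda q: f"the answer to {q}",
}

# Trial and error: try behaviors at random, keep score, and prefer
# whichever behavior earns more reward over many attempts.
scores = {name: 0.0 for name in behaviors}
for _ in range(100):
    name = random.choice(list(behaviors))
    scores[name] += reward(behaviors[name]("2 + 2"))

print("learned preference:", max(scores, key=scores.get), scores)
```

Real RL for LLMs (the paper uses an algorithm called GRPO) updates the model's weights rather than a score table, but the feedback loop is the same idea.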
Supervised Fine-Tuning
- When the model is taught using examples paired with correct answers, this is called 'Supervised Fine-Tuning' (SFT); a tiny sketch of what such data looks like follows.
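Here is a minimal sketch of SFT data, assuming a toy exact-match loss in place of the real token-level loss:

```python
# Each SFT example pairs a prompt with a correct answer written in advance.
sft_dataset = [
    {"prompt": "What is 2 + 2?", "answer": "4"},
    {"prompt": "Translate 'bonjour' to English.", "answer": "hello"},
]

# A stand-in for the real token-level loss: 0 when the model's output
# matches the reference answer, 1 otherwise. Training nudges the model's
# weights so this loss shrinks across the dataset.
def sft_loss(model_output: str, correct_answer: str) -> float:
    return 0.0 if model_output == correct_answer else 1.0

print(sft_loss("4", sft_dataset[0]["answer"]))  # 0.0: a perfect match
```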
In general, the usual recipe for training an AI model is SFT first, followed by large-scale RL. However, as per the abstract, DeepSeek-R1-Zero was trained directly through large-scale RL, skipping the SFT step.
Moreover, the abstract claims that through RL, DeepSeek-R1-Zero naturally developed numerous powerful and intriguing reasoning behaviors. Nonetheless, it encountered challenges such as poor readability and language mixing.
To address these issues, they introduced DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL.
Multi-Stage Training
- Instead of training the model all at once, it is trained in distinct steps or stages.
- Each stage improves the model and prepares it for the next one (a minimal sketch of such a pipeline follows this list).
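Here is a minimal sketch of the staged idea, assuming each stage is simply a function that takes a model and returns an improved one. The stage names are illustrative; the paper's actual pipeline has its own specific stages.

```python
# Each "stage" takes a model and returns an improved one. Representing
# the model as a list of completed stages keeps the sketch runnable.
def cold_start_sft(model):
    return model + ["fine-tuned on cold-start data"]

def large_scale_rl(model):
    return model + ["improved via large-scale RL"]

def pipeline(model, stages):
    for stage in stages:  # each stage builds on the previous one
        model = stage(model)
    return model

history = pipeline(["base model"], [cold_start_sft, large_scale_rl])
print(history)
# ['base model', 'fine-tuned on cold-start data', 'improved via large-scale RL']
```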
Cold-Start Data
- 'Cold start' means that before the model learns through RL, it is given some starting knowledge: in DeepSeek-R1's case, the paper describes fine-tuning the base model on a small set of curated, readable reasoning examples (a hypothetical example follows).
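What might one such record look like? A hypothetical example (the field names and tags are my illustration, not taken from the paper):

```python
# A hypothetical cold-start record: a readable, well-formatted chain of
# thought the base model is fine-tuned on before RL starts.
cold_start_example = {
    "prompt": "A train travels 60 km in 1.5 hours. What is its speed?",
    "response": (
        "<think>Speed = distance / time = 60 km / 1.5 h = 40 km/h.</think>"
        "<answer>40 km/h</answer>"
    ),
}
```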
Following this methodology, DeepSeek-R1 achieves performance comparable to the OpenAI-o1-1217 model on reasoning tasks.
To support the research community, DeepSeek-R1-Zero and DeepSeek-R1 are open-sourced. Along with them, they released six dense models distilled from DeepSeek-R1, based on Qwen and Llama: 1.5B, 7B, 8B, 14B, 32B, and 70B.
Distilled Model
- A distilled model is a smaller, faster, and more efficient version of a larger AI model, trained to imitate it (a sketch follows).
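The simplest distillation recipe, and roughly the one the paper describes for these six models, is to generate answers with the large 'teacher' model and then fine-tune a small 'student' model on them with ordinary SFT. A minimal sketch, with a stand-in teacher:

```python
# Stand-in for sampling an answer from the large teacher (DeepSeek-R1).
def teacher_generate(prompt: str) -> str:
    return f"<think>reasoning about: {prompt}</think><answer>42</answer>"

prompts = ["What is 6 * 7?", "Factor x^2 - 1."]

# Build an SFT dataset out of the teacher's own outputs...
distill_dataset = [
    {"prompt": p, "answer": teacher_generate(p)} for p in prompts
]

# ...then a small student (e.g. a Qwen or Llama base model) is
# fine-tuned on distill_dataset exactly as in the SFT sketch above.
print(distill_dataset[0])
```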
In the above-mentioned video, I ran a distilled model of DeepSeek-R1 locally using Ollama and prompted it with a complex programming question.
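If you want to try this yourself, here is a sketch using the `ollama` Python package. It assumes Ollama is running locally and that you have pulled a distilled tag such as `deepseek-r1:7b` (the exact tag may differ on your machine):

```python
import ollama  # pip install ollama; requires a running Ollama server

# Model tag assumed here; check `ollama list` for what you have pulled.
response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{
        "role": "user",
        "content": "Write a function that checks whether a string is a palindrome.",
    }],
)
print(response["message"]["content"])
```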
Conclusion
Thank you for reading the blog. Here is DeepSeek-R1's research paper, in case you want to check it out.
Moreover, I am also working on a website that lists the AI-related terms I come across while reading research papers, along with descriptions as I understand them. If you are also interested in learning AI through books and research papers, it may be helpful.