How LLMs Work: Reinforcement Learning, RLHF, DeepSeek R1, OpenAI o1, AlphaGo
Part 2 of the LLM deep dive. Originally published on Towards Data Science.

Welcome to part 2 of my LLM deep dive. If you’ve not read Part 1, I highly encourage you to check it out first.
Previously, we covered the first two major stages of training an LLM:
- Pre-training — Learning from massive datasets to form a base model.
- Supervised fine-tuning (SFT) — Refining the model with curated examples to make it useful.
Now, we’re diving into the next major stage: Reinforcement Learning (RL). While pre-training and SFT are well-established, RL is still evolving but has become a critical part of the training pipeline.
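To make the pipeline concrete, here is a minimal sketch of the three stages in order. Every function and variable name below is an illustrative placeholder of my own, not a real training API; each stage is reduced to a stub that just records what it would do.

```python
# A toy sketch of the three-stage LLM training pipeline.
# All names here are hypothetical placeholders, not real library calls.

def pretrain(corpus):
    # Stage 1: learn next-token prediction on a massive corpus,
    # producing a base model.
    return {"stage": "base", "tokens_seen": len(corpus)}

def supervised_fine_tune(model, examples):
    # Stage 2: refine the base model on curated prompt/response pairs
    # so it becomes useful as an assistant.
    return dict(model, stage="sft", sft_examples=len(examples))

def reinforcement_learn(model, reward_fn, prompts):
    # Stage 3: generate responses, score them with a reward signal,
    # and update the model to prefer high-reward outputs.
    rewards = [reward_fn(p) for p in prompts]
    return dict(model, stage="rl", mean_reward=sum(rewards) / len(rewards))

model = pretrain(["lots", "of", "web", "text"])
model = supervised_fine_tune(model, [("prompt", "ideal answer")])
model = reinforcement_learn(model, reward_fn=len, prompts=["2 + 2 ="])
print(model["stage"])  # the final model has passed through all three stages
```

The point is only the ordering: each stage consumes the model produced by the previous one, which is why RL sits at the end of the pipeline.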
I’ve drawn heavily on Andrej Karpathy’s widely popular 3.5-hour YouTube video. Andrej is a founding member of OpenAI, so his insights are gold; you get the idea.
Let’s go