I’ve released a major expansion of my open-source deep reinforcement learning course. Last year's initial release got positive feedback, so I've added a new module with advanced topics and practical productionization techniques by curating and refining materials I collected over the years. This final update includes hands-on implementations of RND, AlphaZero, RLHF, MBPO, and more. I hope it's a valuable resource for the community.
Great feedback, I didn't even think that the TODOs could be confusing! I updated the instructions in the README.md to call them out explicitly as the coding sections to be completed. Thanks again!
Thank you so much for this feedback! Indeed, this is definitely confusing in the notebook. I pushed a small commit to make it a bit clearer that the non-determinism comes from the probabilistic nature of the environment dynamics (and not because the agent chooses a different action by mistake).
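To make it concrete, here is a tiny sketch of the same idea using Gymnasium's FrozenLake (an environment I picked just for illustration - it is not the notebook's own grid world): with slippery dynamics, repeating the same action from the same state can land in different next states.

```python
import gymnasium as gym

# FrozenLake with is_slippery=True has stochastic transition dynamics:
# the intended move only happens with some probability, so the same action
# taken from the same state can lead to different next states.
env = gym.make("FrozenLake-v1", is_slippery=True)

for trial in range(5):
    obs, info = env.reset()  # always starts from state 0
    next_obs, reward, terminated, truncated, info = env.step(2)  # always action 2 ("right")
    print(f"trial {trial}: next state = {next_obs}")

# The printed next states differ across trials even though the agent always
# chose the same action from the same state: the randomness lives in the
# environment dynamics, not in the agent's action selection.
```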
As a side note, I initially meant to go through it in a video to fill the gaps in the text with my voice. But given that I haven't had time for the videos yet, I am fixing those gaps in the text first :) Thanks again!
Thank you. It is true, the material does indeed assume some prior knowledge (which I mention in the introduction). In particular: proficiency in Python, or at least in one high-level programming language; familiarity with deep learning and neural networks; and - to get into the theory and mathematics (optional) - basic calculus, algebra, statistics, and probability theory.
Nonetheless, especially for the RL foundations, I found that building a practical understanding of the algorithms at a basic level - writing them yourself and "playing" with them and their results (especially in small toy settings like the grid world) - was the best way to start developing an intuition for the field. Hence, this resource :)
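To give a flavor of what "writing them yourself" looks like at that basic level, here is a minimal tabular Q-learning sketch against a Gymnasium-style discrete environment - an illustrative example only, not the repo's actual code:

```python
import numpy as np

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Minimal tabular Q-learning for a discrete Gymnasium-style env."""
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # One-step TD update toward the bootstrapped target.
            target = reward + gamma * (0.0 if terminated else np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```

Running something like this on a small grid world and inspecting the greedy policy you get out of Q is exactly the kind of "playing" I mean.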
I took the Deep Learning course [1] by deeplearning.ai in the past, and their resources were incredibly good IMHO. Hence, I would suggest taking a look at their NLP specialization [2].
+1000 to "Neural networks: zero to hero" already mentioned as well.
Thank you so much! Unfortunately, that is a mistake in the README that I just noticed (thank you for pointing it out!) :( As I mentioned in the first post, I haven't gotten to the YouTube videos yet. But it seems the community would indeed be interested.
I will try to get to them (and in the meantime fix the README, sorry about that!)
TL;DR: If more folks feel this way, please upvote this comment: I'll be happy to take down this post, change the title, and either re-post it or simply not re-post at all - the GitHub repo is out there, and that should be more than enough. Sorry again for the confusion (I just upvoted it).
I am deeply sorry about the confusion. The last thing I intended was to grab attention away from Andrej, or to be confused with him.
I tried to find a way to edit the post title, but I couldn't find one. Is there only a limited time window to do that? If you know how to do it, I'd be happy to edit it right away.
I didn't even think this post would get any attention at all - it is indeed my first post here, and I really shared it just because I'd be happy if anybody could use this project to learn RL.
Throwing in my vote - I wasn't confused: I saw your GH link and a "Zero to Hero" course name for RL, and it seemed clear to me; "Zero to Hero" is a classic title for a first course, and it's nice that you gave props to Andrej too! Multiple people can and should make ML guides and reference each other. Thanks for putting in the time to share your learnings and make a fantastic resource out of it!
Thanks a lot. It makes me feel better to hear that the post does not come across as completely confusing or appropriating - I really didn't mean it that way, or to use the title as a trick for attention.
This is a great resource nonetheless. Even if you did use the name to get attention, how does it matter? I still see it as a net positive. Thanks for sharing this.
RL can be massively disappointing, indeed. And I agree with you (and with the amazing post I already referenced [1]) that it is hard to get it to work at all. Sorry to hear you have been so disappointed!
Nonetheless, I would personally recommend learning at least the basics and fundamentals of RL. Beyond supervised, unsupervised, and the most recent and well-deservedly hyped self-supervised learning (generative AI, LLMs, and so on), reinforcement learning models the learning problem in a very elegant way: an agent interacting with an environment and getting feedback. That is, arguably, a very intuitive and natural way of modeling learning. You could consider backward error correction / propagation as an implicit reward signal, but that would be a very limited view.
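As a concrete illustration of how little machinery that framing needs, here is the standard agent-environment loop in Gymnasium with a random policy (a generic sketch, not code from the course):

```python
import gymnasium as gym

# The core RL loop: the agent observes, acts, and receives a reward as feedback.
env = gym.make("CartPole-v1")
obs, info = env.reset()
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # a real agent would pick actions from a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print(f"episode return: {total_reward}")
```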
On a positive note, RL has very practical, successful applications today - even if in niche fields. For example, LLM fine-tuning techniques like RLHF successfully apply RL to modern AI systems, companies like Covariant are working on large robotics models which definitely use RL, and generally, as a research field, I believe (but I may be proven wrong!) there is so much more to explore. For example, check out Nvidia's Eureka, which combines LLMs with RL [2]: pretty cool stuff IMHO!
Far from attempting to convince you of the strengths and capabilities of DRL, I'm just recommending that folks not discard it right away and at least give learning the basics a chance, even just as an intellectual exercise :) Thanks again!
Thank you very much! I'd be really interested to know if your agents will eventually make progress, and if these notebooks help - even if a tiny bit!
If you just want to see whether these algorithms can even work at all, feel free to jump into the `solution` folder, pick any algorithm you think could work, and just try it out there. If it does, then you can have all the fun of rewriting it from scratch :) Thanks again!