Hacker News | alessiodm's comments

Thanks a lot, and another great suggestion for improvement. I also found that the common advice is "tweak hyperparameters until you find the right combination". That can definitely help. But issues usually hide in different "corners": the problem space and its formulation, the algorithm itself (e.g., different random seeds alone can produce large variance in performance), and more.
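
For instance (just a sketch, not code from the repo; the train_and_eval helper below is a hypothetical stand-in for a full training run), evaluating the exact same setup over a handful of seeds already exposes how noisy the results can be:

    import numpy as np

    # Hypothetical stand-in: in practice this would train an agent with the
    # given seed and return its average evaluation return.
    def train_and_eval(seed: int) -> float:
        rng = np.random.default_rng(seed)
        return float(rng.normal(loc=200.0, scale=50.0))  # placeholder number

    seeds = [0, 1, 2, 3, 4]
    returns = np.array([train_and_eval(s) for s in seeds])
    print(f"return over {len(seeds)} seeds: {returns.mean():.1f} +/- {returns.std():.1f}")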

As you mentioned, in real applications of DRL things tend to go wrong more often than right: "it doesn't work just yet" [1]. And my short tutorial definitely falls short in the areas of troubleshooting, tuning, and "productionisation". If I carve out time for an expansion, this will likely be at the top of the list. Thanks again.

[1] https://www.alexirpan.com/2018/02/14/rl-hard.html


Thanks for sharing [1], that was a great read. I'd be curious to see an updated version of that article, since it's about 6 years old now. For example, Boston Dynamics has transitioned from MPC to RL for controlling its Spot robots [2]. Davide Scaramuzza, whose team created autonomous FPV drones that beat expert human pilots, has also discussed how his team had to transition from MPC to RL [3].

[2]: https://bostondynamics.com/blog/starting-on-the-right-foot-w...

[3]: https://www.incontrolpodcast.com/1632769/13775734-ep15-david...


Thank you for the amazing links as well! You are right that the article [1] is 6 years old now, and indeed the field has evolved. But the algorithms and techniques I share in the GitHub repo are the "classic" ones (dating back to then too), for which that post is still relevant - at least from a historical perspective.

You bring up a very good point though: more recent advancements and assessments should be linked and/or mentioned in the repo (e.g., in the resources and/or an appendix). I will try to do that sometime.


While trying to learn the latest in Deep Reinforcement Learning, I was able to take advantage of many excellent resources (see credits [1]), but I couldn't find one that struck the right balance between theory and practice for my own learning. So I decided to create something myself and open-source it for the community, in case it is useful to someone else.

None of that would have been possible without all the resources listed in [1], but I rewrote all the algorithms in this series of Python notebooks from scratch, with a "pedagogical approach" in mind. It is a hands-on, step-by-step tutorial on Deep Reinforcement Learning techniques (up to ~2018/2019 SoTA) that guides you through the theory and coding exercises for the most widely used algorithms (Q-Learning, DQN, SAC, PPO, etc.).
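
To give a flavor of the exercises, here is roughly what the tabular Q-Learning part boils down to (an illustrative sketch, not code lifted from the notebooks; the hyperparameters are arbitrary):

    import numpy as np
    import gymnasium as gym

    # Tabular Q-Learning on the cliff-walking toy environment.
    env = gym.make("CliffWalking-v0")
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, eps = 0.1, 0.99, 0.1

    for episode in range(500):
        s, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(Q[s]))
            s_next, r, terminated, truncated, _ = env.step(a)
            # Q-Learning update: bootstrap from the greedy value of the next state.
            target = r + gamma * np.max(Q[s_next]) * (not terminated)
            Q[s, a] += alpha * (target - Q[s, a])
            s, done = s_next, terminated or truncated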

I shamelessly stole the title from a hero of mine, Andrej Karpathy, and his "Neural Networks: Zero to Hero" [2] work. I also meant to work on a series of YouTube videos, but I haven't had the time yet. If this post gets any interest, I might go back to it. Thank you.

P.S.: A friend of mine suggested I post here, so I followed their advice: this is my first post, and I hope it properly abides by the rules of the community.

[1] https://github.com/alessiodm/drl-zh/blob/main/00_Intro.ipynb
[2] https://karpathy.ai/zero-to-hero.html


Does it rely heavily on python, or could someone use a different language to go through the material?


Yes, the material relies heavily on Python. I intentionally chose Python and popular open-source libraries (Gymnasium for RL environments, PyTorch for deep learning) given how widespread they are in the field, so that the content and learnings are readily applicable to real-world projects.
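
To illustrate (a minimal sketch, not the repo's actual code), this is roughly how Gymnasium and PyTorch fit together in the notebooks - the environment produces observations, and a small PyTorch network turns them into actions:

    import gymnasium as gym
    import torch
    import torch.nn as nn

    # Tiny (untrained) policy network stepping through one CartPole episode.
    env = gym.make("CartPole-v1")
    policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))

    obs, _ = env.reset(seed=0)
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        action = torch.distributions.Categorical(logits=logits).sample().item()
        obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
    env.close()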

The theory and algorithms themselves are general: they can be re-implemented in any language, as long as comparable libraries are available. But the notebooks are primarily in Python, so the (attempted) "frictionless" learning experience would lose a bit with a different setup, and it would likely take a bit more effort to follow along.


I've gone through the first three notebooks today and enjoyed them a lot. First time I've tried the Atari Gymnasium environments, and that was really satisfying and fun. Thank you.


Really happy to hear you enjoyed the notebooks! And thank you very much for the patch to simulate_mdp for the cliff world!


very cool, thanks for putting this together

It would be great to see a page dedicated to SoTA techniques & results


Thank you so much! And very good advice: I have an extremely brief and not very descriptive list in the "Next" notebook, initially intended for that purpose. But it definitely falls short.

I may actually expand it into a second, "more advanced" series of notebooks to explore model-based RL, curiosity, and other recent topics: even if not comprehensive, some hands-on basic coding exercises on those topics might be of interest nonetheless.

