November 27, 2023
HuggingFace Deep RL - Unit 1 Learnings

Intro to Deep Reinforcement Learning

I recently started making my way through the HuggingFace 🤗 Deep RL Course and figured it would be worthwhile to document some personal takeaways as I work through it.

  • The "Q" in "Q-Learning" stands for "Quality". Might not be all that important to some folks, but I was curious, and I've gotten a lot less hesitant about using ChatGPT to handle the jargon-busting.

  • There's a Stable-Baselines3 (SB3) integration with Weights & Biases (wandb). I was a little hesitant to hook up monitoring tools all willy-nilly in the first unit without knowing whether the course would recommend specific tooling later. Still, since it turns out to be pretty easy out of the box, I just did it to start getting a quick look at how params affect training (there's a rough sketch of the hookup after this list). Half the fun of the tutorials is stopping to turn some knobs.

  • Gym is a thing. Pretty neat. Seeing how we describe the environment's state to our model, it's really interesting to think about RPA (robotic process automation) in terms of vectorizing the environment. I've been playing around with OpenCV a lot lately, and being able to deconstruct elements in an environment to pass over to our model (even one based on visual analysis) could be SUPER useful. Even for building a pixel bot, state recognition is a thing, innit? Are we clicking an "ok" button to begin another mission, or is it a button to spend $100 on premium currency? Lol (The basic observation/action loop is sketched below this list.)

  • Discussed some of the params with ChatGPT and it was pretty insightful. I'm used to traditional video workloads, and I tend to lean on that knowledge when considering how we might test param optimizations. Moving through test runs quickly (as one would with proxy footage) to explore relationships between params, before expressing that in a more compute-expensive run, seems like a reasonable heuristic. What we arrived at (and I knew this from the old "playing around with VQlipse" days, but haven't had to use it in a while) was that decreasing n_steps, batch_size, and n_epochs was the way to go. Playing with those values is definitely more nuanced than just swapping proxy footage in and out, but it's at least an approachable way to see the effects of other params over shorter runs. (The last sketch below shows what that looks like.)

  • The folks in the HF Discord are pretty swell, from the looks of it.
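
For reference, here's roughly what the wandb hookup looks like. This is a minimal sketch, assuming PPO on LunarLander-v2 from Unit 1; the project name is made up, and the callback options are just the basics.

    import wandb
    from wandb.integration.sb3 import WandbCallback
    from stable_baselines3 import PPO

    config = {
        "policy_type": "MlpPolicy",
        "total_timesteps": 25_000,
        "env_name": "LunarLander-v2",
    }
    run = wandb.init(
        project="deep-rl-unit1",  # hypothetical project name
        config=config,
        sync_tensorboard=True,    # mirror SB3's tensorboard metrics to wandb
    )

    model = PPO(
        config["policy_type"],
        config["env_name"],
        verbose=1,
        tensorboard_log=f"runs/{run.id}",
    )
    model.learn(
        total_timesteps=config["total_timesteps"],
        callback=WandbCallback(verbose=2),
    )
    run.finish()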
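
To make the "vectorized state" point concrete, this is the stock interaction loop for the same environment. The observation is just an 8-number vector (position, velocity, angle, angular velocity, leg-contact flags), and a random policy stands in for a trained model. This uses the newer gymnasium package; the older gym API has slightly different reset/step return values.

    import gymnasium as gym

    env = gym.make("LunarLander-v2")
    observation, info = env.reset(seed=42)

    for _ in range(1000):
        action = env.action_space.sample()  # stand-in for a trained policy
        observation, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            observation, info = env.reset()
    env.close()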
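
And here's the "proxy footage" idea in SB3 terms: shrink n_steps, batch_size, and n_epochs relative to the defaults so each run finishes fast enough to iterate on the other knobs. The numbers below are illustrative starting points, not tuned recommendations.

    from stable_baselines3 import PPO

    fast_model = PPO(
        "MlpPolicy",
        "LunarLander-v2",
        n_steps=256,    # rollout length per update (SB3 default: 2048)
        batch_size=32,  # minibatch size (SB3 default: 64)
        n_epochs=4,     # optimization passes per rollout (SB3 default: 10)
        verbose=1,
    )
    fast_model.learn(total_timesteps=50_000)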

Unit 1 Bonus:

  • Huggy is adorable. I'll probably never be able to play with him and not do "the puppy voice" (oo'zuh'guboi).

  • ml-agents seems really interesting. I haven't messed around with it beyond the initial Huggy training for the exercise, but from the livestream it sounds like there are a ton of applications beyond just setting up bespoke training spaces. I'm sure the course will get into that later.