CartPole: A somewhat deep introduction to RL and value-based learning

Takeaways

  • Trained an RL policy model from scratch, with the weights open-sourced and released on Hugging Face
  • Goes beyond a surface-level discussion of training dynamics, with specific analysis of the loss and reward curves from W&B runs
  • Discusses training failures, and how a target network can prevent training instability
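
To make the target-network idea in the last takeaway concrete, here is a minimal sketch and not the code from the full post. It assumes a linear Q-function in NumPy standing in for a neural network, a made-up batch of transitions, and hypothetical hyperparameters (`GAMMA`, `LR`, `SYNC_EVERY`). The point it illustrates: bootstrap targets are computed from a frozen copy of the weights that is only refreshed every few steps, so the regression target does not move on every gradient update.

```python
import numpy as np

GAMMA = 0.99     # discount factor (assumed value)
LR = 0.01        # learning rate (assumed value)
SYNC_EVERY = 5   # steps between target syncs (hypothetical)


def q_values(weights, states):
    """Linear Q-function: one row per state, one column per action."""
    return states @ weights


def td_targets(target_weights, rewards, next_states, dones, gamma=GAMMA):
    """Targets r + gamma * max_a Q_target(s', a); no bootstrap on terminal steps."""
    next_q = q_values(target_weights, next_states).max(axis=1)
    return rewards + gamma * (1.0 - dones) * next_q


def sync_target(online_weights):
    """Hard update: copy the online weights into the target network."""
    return online_weights.copy()


rng = np.random.default_rng(0)
online = rng.normal(size=(4, 2))  # CartPole: 4 state features, 2 actions
target = sync_target(online)

# A made-up batch of transitions, for illustration only.
states = rng.normal(size=(8, 4))
actions = rng.integers(0, 2, size=8)
rewards = np.ones(8)
next_states = rng.normal(size=(8, 4))
dones = np.zeros(8)
dones[-1] = 1.0  # mark the last transition as terminal

for step in range(1, 21):
    # Targets come from the frozen copy, not from the weights being trained.
    y = td_targets(target, rewards, next_states, dones)
    q_taken = q_values(online, states)[np.arange(len(states)), actions]
    td_error = q_taken - y

    # Gradient of 0.5 * mean(td_error ** 2) with respect to the online weights.
    one_hot = np.eye(2)[actions]
    grad = states.T @ (one_hot * td_error[:, None]) / len(states)
    online -= LR * grad

    # Refresh the frozen copy only every SYNC_EVERY steps.
    if step % SYNC_EVERY == 0:
        target = sync_target(online)
```

Setting `SYNC_EVERY = 1` makes the targets track the online weights after every update, which is effectively the no-target-network setup the takeaway associates with instability.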

Full version on Substack → CartPole - Going from 0 -> 1 in Value-based RL