CartPole: A somewhat deep introduction to RL and value-based learning
Published:
Takeaways
- Trained an RL policy model from scratch, with the weights open-sourced and released on Hugging Face
- Went beyond a surface-level discussion of the training dynamics, with specific analysis of loss curves and rewards from W&B runs
- Discussed training failures, and how a target network can prevent training instability
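
To make the target-network idea concrete, here is a minimal sketch in plain Python. It is not the code from the full post: the linear Q-function, the `GAMMA` and `SYNC_EVERY` values, and the helper names `td_target` and `maybe_sync` are all assumptions chosen for illustration. The point it shows is that the bootstrapped target is computed from a frozen copy of the weights, so an update to the online weights does not immediately shift the target it is being trained toward.

```python
import copy

GAMMA = 0.99       # discount factor (assumed value)
SYNC_EVERY = 100   # hypothetical number of steps between target syncs


def q_values(weights, state):
    """Linear Q-function: one row of weights per action."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def td_target(reward, next_state, done, target_weights, gamma=GAMMA):
    """Bootstrapped target computed from the frozen target weights."""
    if done:
        return reward
    return reward + gamma * max(q_values(target_weights, next_state))


def maybe_sync(step, online_weights, target_weights, sync_every=SYNC_EVERY):
    """Hard update: copy online weights into the target every sync_every steps."""
    if step % sync_every == 0:
        return copy.deepcopy(online_weights)
    return target_weights


# Two actions, four state features (CartPole's observation has four values).
online = [[0.1, 0.0, 0.2, 0.0], [0.0, 0.1, 0.0, 0.2]]
target = copy.deepcopy(online)

next_state = [1.0, 1.0, 1.0, 1.0]
before = td_target(1.0, next_state, False, target)

# Simulate a gradient step that changes only the online weights.
online[0][0] += 5.0
after = td_target(1.0, next_state, False, target)   # unchanged: target is frozen

target = maybe_sync(100, online, target)            # step 100 triggers a sync
synced = td_target(1.0, next_state, False, target)  # now reflects the update
```

Without the separate `target` copy, every update to `online` would also move the value being regressed toward, which is one common source of the instability mentioned above.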
Full version on Substack → CartPole - Going from 0 -> 1 in Value-based RL
