Policy Gradient Methods Reading List
Curated by Mouhssine Rifaki | Stanford Electrical Engineering | Last updated April 2026
Papers covering the core ideas behind gradient-based policy optimization in reinforcement learning (RL).
- Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
Williams. Machine Learning 1992.
- Policy Gradient Methods for Reinforcement Learning with Function Approximation
Sutton et al. NeurIPS 1999.
- Asynchronous Methods for Deep Reinforcement Learning
Mnih et al. ICML 2016.
- Trust Region Policy Optimization
Schulman et al. ICML 2015.
- Proximal Policy Optimization Algorithms
Schulman et al. arXiv 2017.
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Haarnoja et al. ICML 2018.
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
Schulman et al. ICLR 2016.
- A Natural Policy Gradient
Kakade. NeurIPS 2001.
- Deterministic Policy Gradient Algorithms
Silver et al. ICML 2014.
- Maximum a Posteriori Policy Optimisation
Abdolmaleki et al. ICLR 2018.