Policy Gradient Methods Reading List

Curated by Mouhssine Rifaki | Stanford Electrical Engineering | Last updated April 2026

Papers covering the core ideas behind gradient-based policy optimization in reinforcement learning (RL).

  1. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
    Williams. Machine Learning 1992.
  2. Policy Gradient Methods for Reinforcement Learning with Function Approximation
    Sutton et al. NeurIPS 1999.
  3. Asynchronous Methods for Deep Reinforcement Learning
    Mnih et al. ICML 2016.
  4. Trust Region Policy Optimization
    Schulman et al. ICML 2015.
  5. Proximal Policy Optimization Algorithms
    Schulman et al. arXiv 2017.
  6. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
    Haarnoja et al. ICML 2018.
  7. High-Dimensional Continuous Control Using Generalized Advantage Estimation
    Schulman et al. ICLR 2016.
  8. A Natural Policy Gradient
    Kakade. NeurIPS 2001.
  9. Deterministic Policy Gradient Algorithms
    Silver et al. ICML 2014.
  10. Maximum a Posteriori Policy Optimisation
    Abdolmaleki et al. ICLR 2018.