Sample Complexity in Reinforcement Learning Reading List

Curated by Mouhssine Rifaki | Stanford Electrical Engineering | Last updated April 2026

Foundational and modern results on how many samples an RL agent needs to learn a good policy.

  1. Minimax Regret Bounds for Reinforcement Learning
    Azar, Osband, Munos. ICML 2017.
  2. Is Q-Learning Provably Efficient?
    Jin et al. NeurIPS 2018.
  3. Reinforcement Learning: Theory and Algorithms
    Agarwal, Jiang, Kakade, Sun. Monograph 2020.
  4. Near-Optimal Reinforcement Learning with Self-Play
    Bai, Jin, Yu. NeurIPS 2020.
  5. Provably Efficient Exploration in Policy Optimization
    Cai et al. ICML 2020.
  6. Bandit Algorithms
    Lattimore and Szepesvári. Cambridge University Press 2020.
  7. On the Sample Complexity of Reinforcement Learning with a Generative Model
    Azar, Munos, Kappen. Machine Learning 2013.
  8. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
    Dann, Lattimore, Brunskill. NeurIPS 2017.
  9. Reward-Free Exploration for Reinforcement Learning
    Jin et al. ICML 2020.
  10. Provable Benefits of Representational Transfer in Reinforcement Learning
    Agarwal et al. COLT 2022.