Sample Complexity in Reinforcement Learning Reading List
Curated by Mouhssine Rifaki | Stanford Electrical Engineering | Last updated April 2026
Foundational and modern results on how many samples an RL agent needs to learn a good policy.
- Minimax Regret Bounds for Reinforcement Learning
Azar, Osband, Munos. ICML 2017.
- Is Q-Learning Provably Efficient?
Jin et al. NeurIPS 2018.
- Reinforcement Learning: Theory and Algorithms
Agarwal, Jiang, Kakade, Sun. Monograph 2020.
- Near-Optimal Reinforcement Learning with Self-Play
Bai, Jin, Yu. NeurIPS 2020.
- Provably Efficient Exploration in Policy Optimization
Cai et al. ICML 2020.
- Bandit Algorithms
Lattimore and Szepesvari. Cambridge University Press, 2020.
- On the Sample Complexity of Reinforcement Learning with a Generative Model
Azar, Munos, Kappen. Machine Learning 2013.
- Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Dann, Lattimore, Brunskill. NeurIPS 2017.
- Reward-Free Exploration for Reinforcement Learning
Jin et al. ICML 2020.
- Provable Benefits of Representational Transfer in Reinforcement Learning
Agarwal et al.. COLT 2022.