Sample Complexity in Reinforcement Learning Reading List

Curated by Mouhssine Rifaki | Stanford Electrical Engineering | Last updated April 2026

Foundational and modern results on how many samples an RL agent needs to learn a good policy.

  1. Minimax Regret Bounds for Reinforcement Learning
    Azar, Osband, Munos. ICML 2017.
  2. Is Q-Learning Provably Efficient?
    Jin et al. NeurIPS 2018.
  3. Reinforcement Learning: Theory and Algorithms
    Agarwal, Jiang, Kakade, Sun. Monograph 2020.
  4. Near-Optimal Reinforcement Learning with Self-Play
    Bai, Jin, Yu. NeurIPS 2020.
  5. Provably Efficient Exploration in Policy Optimization
    Cai et al. ICML 2020.
  6. Bandit Algorithms
    Lattimore and Szepesvári. Cambridge University Press 2020.
  7. On the Sample Complexity of Reinforcement Learning with a Generative Model
    Azar, Munos, Kappen. Machine Learning 2013.
  8. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
    Dann, Lattimore, Brunskill. NeurIPS 2017.
  9. Reward-Free Exploration for Reinforcement Learning
    Jin et al. ICML 2020.
  10. Provable Benefits of Representational Transfer in Reinforcement Learning
    Agarwal et al. COLT 2022.