Lecture Materials

Lecture materials for this course are listed below. Note that the associated "refresh your understanding" and "check your understanding" polls will be posted weekly.

Topic	Videos (on Canvas/Panopto)	Course Materials
Introduction to Reinforcement Learning	(2022 Live) Lecture 1 (2021 Recording) Part 1: Course Overview Part 2: Course Logistics Part 3: Intro to Sequential Decision Making	Lecture 1 Slides Post class version Additional Materials: High level introduction: SB (Sutton and Barto) Chp 1 Linear Algebra Review Probability Review Python Tutorial
Tabular MDP planning	(2022 Live) Lecture 2 (2021 Recording) Lecture 2, Part 1: Policy Evaluation Lecture 2, Part 2: Policy Iteration Lecture 2, Part 3: Value Iteration Lecture 2, Part 4: Bellman contraction operator and finite horizon	Lecture 2 Slides (pre-class) [Post class, annotated] Additional Materials: SB (Sutton and Barto) Chp 3, 4.1-4.4 Problem Session 1 Problem Session 1 Solution
Tabular RL policy evaluation	(2022 Live) Lecture 3 (2021 Recording) Lecture 3, Part 1: Monte Carlo Policy Evaluation Lecture 3, Part 2: Temporal Difference Learning Lecture 3, Part 3: Dynamic Programming with Certainty Equivalence Lecture 3, Part 4: Comparing Algorithms for Policy Evaluation	Lecture 3 Slides (pre-class) [Post class, with annotations] Additional Materials: SB (Sutton and Barto) Chp 5.1, 5.5, 6.1-6.3 David Silver's Lecture 4 [link] Problem Session 2 Problem Session 2 Solution
Q-learning	(2022 Live) Lecture 4 (2021 Recording) Lecture 4 Part 1: Generalized Policy Iteration and E-Greedy Lecture 4 Part 2: MC Control with e-greedy policies (tabular) Lecture 4 Part 3: TD Methods, SARSA and Q-learning for Tabular Control Lecture 4 Part 4: Maximization Bias	Lecture 4 Slides (post class with annotations) Additional Materials: SB (Sutton and Barto) Chp 5.2, 5.4, 6.4-6.5, 6.7 Problem Session 3 Problem Session 3 Solution Draft lecture 2 notes (Note: these may contain factual errors and typos) Draft lecture 3 notes (Note: these may contain factual errors and typos) Draft lecture 4 notes (Note: these may contain factual errors and typos)
RL with function approximation	(2022 Live) Lecture 5 Lecture 6 Lecture 7 (2021 Recording) Lecture 5 Part 2: Value Function Approximation Lecture 5 Part 3: MC Learning for Policy Evaluation with Linear VFA Lecture 5 Part 4: MC Learning for Policy Evaluation with Linear VFA Convergence Guarantees Lecture 5 Part 5: TD Learning for Policy Evaluation with Linear VFA Lecture 5 Part 6: MC and TD Control with Linear VFA Lecture 6 Part 2: Function Approximation with Deep Neural Networks Lecture 6 Part 3: Function Approximation with CNN Lecture 6 Part 4: Deep RL: DQN Lecture 7 Part 2: Double DQN Lecture 7 Part 3: Prioritized Replay Lecture 7 Part 4: Dueling DQN Lecture 7 Part 5: Practical Tips for DQN on Atari PyTorch Tutorial (from Winter 2021) Deep Learning Overview (from Winter 2020)	Lecture 5 Slides [Post lecture with annotations] Lecture 6 Slides [Post class annotations] Lecture 7 Slides [Post class annotations] Additional Materials: SB (Sutton and Barto) 9.3, 9.6, 9.7 Human-level control through deep reinforcement learning Playing Atari with Deep Reinforcement Learning CS231n CNN notes Problem Session 4 Problem Session 4 Solution Draft lecture 5 notes (Note: these may contain factual errors and typos) Draft lecture 6 notes (Note: these may contain factual errors and typos) Draft lecture 7 notes (Note: these may contain factual errors and typos)
Policy search	(2022 Live) Lecture 8 Lecture 9 (2021 Recording) Lecture 8 Part 2: Policy Search Methods Lecture 8 Part 3: Gradient-free Methods Lecture 8 Part 4: Finite Difference Methods Lecture 8 Part 5: Score Functions Lecture 8 Part 6: Likelihood Ratio Policy Gradient Lecture 8 Part 7: REINFORCE Lecture 9 Part 2: Policy-Based RL Recap Lecture 9 Part 3: Better Gradient Estimates Lecture 9 Part 4: Policy Gradient Algorithms and Reducing Variance Lecture 9 Part 5: Need for Automatic Step Size Tuning Lecture 9 Part 6: Local Approximation Lecture 9 Part 7: Trust Regions Lecture 9 Part 8: TRPO Algorithm	Lecture 8 Slides [Post class with annotations] Lecture 9 Slides [Post class] Additional Materials: SB (Sutton and Barto) Chp 13 Problem Session 5 Problem Session 5 Solution Problem Session 6 Problem Session 6 Solution Draft lecture 8 notes (Note: these may contain factual errors and typos) Draft lecture 9 notes (Note: these may contain factual errors and typos)
Fast Learning	(2022 Live) Lecture 10 Lecture 11 Lecture 12 (2021 Recording) Lecture 10 Part 2: Introduction to Multi-armed Bandits Lecture 10 Part 3: Multi-armed Bandit Greedy Algorithm Lecture 10 Part 4: Regret Lecture 10 Part 5: E-greedy Algorithm Lecture 10 Part 6: Optimism under Uncertainty Lecture 10 Part 7: UCB Bandit Regret Lecture 11 Part 2: Bandits and Probably Approximately Correct Lecture 11 Part 3: Bayesian Bandits Lecture 11 Part 4: Bayesian Bandits Example Lecture 11 Part 5: Thompson Sampling Lecture 11 Part 6: Bayesian Regret and Probability Matching Lecture 12 Part 3: Fast RL in MDPs Lecture 12 Part 4: Fast RL in Bayesian MDPs Lecture 12 Part 5: Generalization and Exploration	Lecture 10 Draft Slides [Post class with annotations] Lecture 11 Slides [Post class, with annotations] Lecture 12 Slides [Post class, with annotations] Additional Materials: Bandit Algorithms Book Chapter 7.1, Chapter 35 Draft lecture 11 notes (Note: these may contain factual errors and typos) Problem Session 7 Problem Session 7 Solution
Batch Reinforcement Learning	(2022 Live) Lecture 13 Lecture 14 Lecture 15 Lecture 16 Guest Lecture 17 (2021 Recording) Lecture 13 Part 2: Introduction to Batch RL Lecture 13 Part 3: Batch RL Setting Lecture 13 Part 4: Offline Batch Evaluation Using Models Lecture 13 Part 5: Offline Batch Evaluation Using Q-functions Lecture 13 Part 6: Offline Batch Evaluation Using Importance Sampling	Lecture 13 Slides [Post class, with annotations] Lecture 14 Imitation Learning Slides [Post class, with annotations] Lecture 15 Batch Policy Learning [Post class, with annotations] Lecture 16 Reinforcement Learning and Reward Additional Materials: Problem Session 8 Problem Session 8 Solution