| Module | Videos (on Canvas/Panopto) | Course Materials |
| Introduction to Reinforcement Learning |
Part 1: Course Overview
Part 2: Course Logistics
Part 3: Intro to Sequential Decision Making
|
- Lecture 1 Slides
- Additional Materials:
|
| Tabular MDP planning |
Lecture 2, Part 1: Policy Evaluation
Lecture 2, Part 2: Policy Iteration
Lecture 2, Part 3: Value Iteration
Lecture 2, Part 4: Bellman Contraction Operator and Finite Horizon
|
- Lecture 2 Slides
- Additional Materials:
- SB (Sutton and Barto) Chp 3, 4.1-4.4
|
| Tabular RL policy evaluation |
Lecture 3 Refresh Your Understanding
Lecture 3, Part 1: Monte Carlo Policy Evaluation
Lecture 3, Part 2: Temporal Difference Learning
Lecture 3, Part 3: Dynamic Programming with Certainty Equivalence
Lecture 3, Part 4: Comparing Algorithms for Policy Evaluation
|
- Lecture 3 Slides
- Additional Materials:
- SB (Sutton and Barto) Chp 5.1, 5.5, 6.1-6.3
- David Silver's Lecture 4 [link]
|
| Q-learning |
Lecture 4 Part 1: Refresh Your Understanding
Lecture 4 Part 2: Generalized Policy Iteration and ε-Greedy
Lecture 4 Part 3: MC Control with ε-Greedy Policies (Tabular)
Lecture 4 Part 4: TD Methods, SARSA and Q-learning for Tabular Control
Lecture 4 Part 5: Maximization Bias
|
- Lecture 4 Slides
- Additional Materials:
- SB (Sutton and Barto) Chp 5.2, 5.4, 6.4-6.5, 6.7
|
| RL with function approximation |
Lecture 5 Part 1: Refresh Your Understanding
Lecture 5 Part 2: Value Function Approximation
Lecture 5 Part 3: MC Learning for Policy Evaluation with Linear VFA
Lecture 5 Part 4: MC Learning for Policy Evaluation with Linear VFA Convergence Guarantees
Lecture 5 Part 5: TD Learning for Policy Evaluation with Linear VFA
Lecture 5 Part 6: MC and TD Control with Linear VFA
Lecture 6 Part 1: Refresh Your Understanding
Lecture 6 Part 2: Function Approximation with Deep Neural Networks
Lecture 6 Part 3: Function Approximation with CNN
Lecture 6 Part 4: Deep RL — DQN
Lecture 7 Part 1: Refresh Your Understanding
Lecture 7 Part 2: Double DQN
Lecture 7 Part 3: Prioritized Replay
Lecture 7 Part 4: Dueling DQN
Lecture 7 Part 5: Practical Tips for DQN on Atari
PyTorch Tutorial
Deep Learning Overview (from Winter 2020)
|
- Lecture 5 Slides
- Lecture 6 Slides
- Lecture 7 Slides
- Additional Materials:
|
| Policy search |
Lecture 8 Part 1: Refresh Your Understanding
Lecture 8 Part 2: Policy Search Methods
Lecture 8 Part 3: Gradient-free Methods
Lecture 8 Part 4: Finite Difference Methods
Lecture 8 Part 5: Score Functions
Lecture 8 Part 6: Likelihood Ratio Policy Gradient
Lecture 8 Part 7: REINFORCE
Lecture 9 Part 1: Refresh Your Understanding
Lecture 9 Part 2: Policy-Based RL Recap
Lecture 9 Part 3: Better Gradient Estimates
Lecture 9 Part 4: Policy Gradient Algorithms and Reducing Variance
Lecture 9 Part 5: Need for Automatic Step Size Tuning
Lecture 9 Part 6: Local Approximation
Lecture 9 Part 7: Trust Regions
Lecture 9 Part 8: TRPO Algorithm
|
- Lecture 8 Slides
- Lecture 9 Slides
- Additional Materials:
- SB (Sutton and Barto) Chp 13
|
| Fast Learning |
Lecture 10 Part 1: Refresh Your Understanding
Lecture 10 Part 2: Introduction to Multi-armed Bandits
Lecture 10 Part 3: Multi-armed Bandit Greedy Algorithm
Lecture 10 Part 4: Regret
Lecture 10 Part 5: ε-Greedy Algorithm
Lecture 10 Part 6: Optimism under Uncertainty
Lecture 10 Part 7: UCB Bandit Regret
Lecture 11 Part 1: Refresh Your Understanding
Lecture 11 Part 2: Bandits and Probably Approximately Correct
Lecture 11 Part 3: Bayesian Bandits
Lecture 11 Part 4: Bayesian Bandits Example
Lecture 11 Part 5: Thompson Sampling
Lecture 11 Part 6: Bayesian Regret and Probability Matching
Lecture 12 Part 1: Refresh Your Understanding
Lecture 12 Part 2: Refresh Your Understanding Solution
Lecture 12 Part 3: Fast RL in MDPs
Lecture 12 Part 4: Fast RL in Bayesian MDPs
Lecture 12 Part 5: Generalization and Exploration
|
- Lecture 10 Slides
- Lecture 11 Slides
- Lecture 12 Slides
- Additional Materials:
|
| Batch Reinforcement Learning |
Lecture 13 Part 1: Refresh Your Understanding
Lecture 13 Part 2: Introduction to Batch RL
Lecture 13 Part 3: Batch RL Setting
Lecture 13 Part 4: Offline Batch Evaluation Using Models
Lecture 13 Part 5: Offline Batch Evaluation Using Q-functions
Lecture 13 Part 6: Offline Batch Evaluation Using Importance Sampling
Lecture 14: Guest Lecture from Professor Finale Doshi-Velez
|
- Lecture 13 Slides
- Lecture 14 Guest Lecture Slides
|
| Monte Carlo Tree Search |
Lecture 15 Part 1: Refresh Your Understanding
Lecture 15 Part 2: Model-Based RL
Lecture 15 Part 3: Simulation-Based Search
Lecture 15 Part 4: Upper Confidence Tree Search
Lecture 15 Part 5: Case Study: The Game of Go
|
- Lecture 15 Slides
- Annotated Lecture 15 Slides
|
| Rewards, Value Alignment, and Wrapping Up |
Lecture 16 Recording
|
- Lecture 16 Slides
|