Lecture Materials

Lecture materials for this course are listed below. Note that the associated "refresh your understanding" and "check your understanding" polls will be posted weekly.
Topic   Videos (on Canvas/Panopto)  Course Materials  
Introduction to Reinforcement Learning (2022 Live)
  • Lecture 1
  • (2021 Recording)
  • Part 1: Course Overview
  • Part 2: Course Logistics
  • Part 3: Intro to Sequential Decision Making
    1. Lecture 1 Slides [Post class version]
    2. Additional Materials:
Tabular MDP planning (2022 Live)
  • Lecture 2
  • (2021 Recording)
  • Lecture 2, Part 1: Policy Evaluation
  • Lecture 2, Part 2: Policy Iteration
  • Lecture 2, Part 3: Value Iteration
  • Lecture 2, Part 4: Bellman contraction operator and finite horizon
    1. Lecture 2 Slides (pre-class) [Post class, annotated]
    2. Additional Materials:
Tabular RL policy evaluation (2022 Live)
  • Lecture 3
  • (2021 Recording)
  • Lecture 3, Part 1: Monte Carlo Policy Evaluation
  • Lecture 3, Part 2: Temporal Difference Learning
  • Lecture 3, Part 3: Dynamic Programming with Certainty Equivalence
  • Lecture 3, Part 4: Comparing Algorithms for Policy Evaluation
    1. Lecture 3 Slides (pre-class) [Post class, with annotations]
    2. Additional Materials:
Q-learning (2022 Live)
  • Lecture 4
  • (2021 Recording)
  • Lecture 4 Part 1: Generalized Policy Iteration and E-Greedy
  • Lecture 4 Part 2: MC Control with e-greedy policies (tabular)
  • Lecture 4 Part 3: TD Methods, SARSA and Q-learning for Tabular Control
  • Lecture 4 Part 4: Maximization Bias
    1. Lecture 4 Slides (post class with annotations)
    2. Additional Materials:
RL with function approximation (2022 Live)
  • Lecture 5
  • Lecture 6
  • Lecture 7
  • (2021 Recording)
  • Lecture 5 Part 2: Value Function Approximation
  • Lecture 5 Part 3: MC Learning for Policy Evaluation with Linear VFA
  • Lecture 5 Part 4: MC Learning for Policy Evaluation with Linear VFA Convergence Guarantees
  • Lecture 5 Part 5: TD Learning for Policy Evaluation with Linear VFA
  • Lecture 5 Part 6: MC and TD Control with Linear VFA
  • Lecture 6 Part 2: Function Approximation with Deep Neural Networks
  • Lecture 6 Part 3: Function Approximation with CNN
  • Lecture 6 Part 4: Deep RL — DQN


  • Lecture 7 Part 2: Double DQN
  • Lecture 7 Part 3: Prioritized Replay
  • Lecture 7 Part 4: Dueling DQN
  • Lecture 7 Part 5: Practical Tips for DQN on Atari

  • PyTorch Tutorial (from Winter 2021)
  • Deep Learning Overview (from Winter 2020)
    1. Lecture 5 Slides [Post lecture with annotations]
    2. Lecture 6 Slides [Post class annotations]
    3. Lecture 7 Slides [Post class annotations]
    4. Additional Materials:
Policy search (2022 Live)
  • Lecture 8
  • Lecture 9
  • (2021 Recording)
  • Lecture 8 Part 2: Policy Search Methods
  • Lecture 8 Part 3: Gradient-free Methods
  • Lecture 8 Part 4: Finite Difference Methods
  • Lecture 8 Part 5: Score Functions
  • Lecture 8 Part 6: Likelihood Ratio Policy Gradient
  • Lecture 8 Part 7: REINFORCE

  • Lecture 9 Part 2: Policy-Based RL Recap
  • Lecture 9 Part 3: Better Gradient Estimates
  • Lecture 9 Part 4: Policy Gradient Algorithms and Reducing Variance
  • Lecture 9 Part 5: Need for Automatic Step Size Tuning
  • Lecture 9 Part 6: Local Approximation
  • Lecture 9 Part 7: Trust Regions
  • Lecture 9 Part 8: TRPO Algorithm
    1. Lecture 8 Slides [Post class with annotations]
    2. Lecture 9 Slides [Post class]
    3. Additional Materials:

Fast Learning (2022 Live)
  • Lecture 10
  • Lecture 11
  • Lecture 12
  • (2021 Recording)
  • Lecture 10 Part 2: Introduction to Multi-armed Bandits
  • Lecture 10 Part 3: Multi-armed Bandit Greedy Algorithm
  • Lecture 10 Part 4: Regret
  • Lecture 10 Part 5: E-greedy Algorithm
  • Lecture 10 Part 6: Optimism under Uncertainty
  • Lecture 10 Part 7: UCB Bandit Regret

  • Lecture 11 Part 2: Bandits and Probably Approximately Correct
  • Lecture 11 Part 3: Bayesian Bandits
  • Lecture 11 Part 4: Bayesian Bandits Example
  • Lecture 11 Part 5: Thompson Sampling
  • Lecture 11 Part 6: Bayesian Regret and Probability Matching

  • Lecture 12 Part 3: Fast RL in MDPs
  • Lecture 12 Part 4: Fast RL in Bayesian MDPs
  • Lecture 12 Part 5: Generalization and Exploration
    1. Lecture 10 Draft Slides [Post class with annotations]
    2. Lecture 11 Slides [Post class, with annotations]
    3. Lecture 12 Slides [Post class, with annotations]
    4. Additional Materials:
Batch Reinforcement Learning (2022 Live)
  • Lecture 13
  • Lecture 14
  • Lecture 15
  • Lecture 16
  • Guest Lecture 17
  • (2021 Recording)
  • Lecture 13 Part 2: Introduction to Batch RL
  • Lecture 13 Part 3: Batch RL Setting
  • Lecture 13 Part 4: Offline Batch Evaluation Using Models
  • Lecture 13 Part 5: Offline Batch Evaluation Using Q-functions
  • Lecture 13 Part 6: Offline Batch Evaluation Using Importance Sampling

    1. Lecture 13 Slides [Post class, with annotations]
    2. Lecture 14 Imitation Learning Slides [Post class, with annotations]
    3. Lecture 15 Batch Policy Learning [Post class, with annotations]
    4. Lecture 16 Reinforcement Learning and Reward
    5. Additional Materials: