Modules

All modules in this course are listed below. Pre-recorded lecture videos and slides will be available by the end of the Sunday of the week before class. Note that the associated Refresh Your Understanding and Check Your Understanding polls will be posted weekly.
For each module, the lecture videos (on Canvas/Panopto) are listed first, followed by the course materials.

Introduction to Reinforcement Learning
  • Part 1: Course Overview
  • Part 2: Course Logistics
  • Part 3: Intro to Sequential Decision Making
    1. Lecture 1 Slides
    2. Additional Materials:

Tabular MDP planning
  • Lecture 2, Part 1: Policy Evaluation
  • Lecture 2, Part 2: Policy Iteration
  • Lecture 2, Part 3: Value Iteration
  • Lecture 2, Part 4: Bellman Contraction Operator and Finite Horizon
    1. Lecture 2 Slides
    2. Additional Materials:
      • SB (Sutton and Barto) Chp 3, 4.1-4.4

Tabular RL policy evaluation
  • Lecture 3 Refresh Your Understanding
  • Lecture 3, Part 1: Monte Carlo Policy Evaluation
  • Lecture 3, Part 2: Temporal Difference Learning
  • Lecture 3, Part 3: Dynamic Programming with Certainty Equivalence
  • Lecture 3, Part 4: Comparing Algorithms for Policy Evaluation
    1. Lecture 3 Slides
    2. Additional Materials:
      • SB (Sutton and Barto) Chp 5.1, 5.5, 6.1-6.3
      • David Silver's Lecture 4 [link]

Q-learning
  • Lecture 4 Part 1: Refresh Your Understanding
  • Lecture 4 Part 2: Generalized Policy Iteration and ε-Greedy
  • Lecture 4 Part 3: MC Control with ε-greedy Policies (Tabular)
  • Lecture 4 Part 4: TD Methods, SARSA and Q-learning for Tabular Control
  • Lecture 4 Part 5: Maximization Bias
    1. Lecture 4 Slides
    2. Additional Materials:
      • SB (Sutton and Barto) Chp 5.2, 5.4, 6.4-6.5, 6.7

RL with function approximation
  • Lecture 5 Part 1: Refresh Your Understanding
  • Lecture 5 Part 2: Value Function Approximation
  • Lecture 5 Part 3: MC Learning for Policy Evaluation with Linear VFA
  • Lecture 5 Part 4: MC Learning for Policy Evaluation with Linear VFA Convergence Guarantees
  • Lecture 5 Part 5: TD Learning for Policy Evaluation with Linear VFA
  • Lecture 5 Part 6: MC and TD Control with Linear VFA

  • Lecture 6 Part 1: Refresh Your Understanding
  • Lecture 6 Part 2: Function Approximation with Deep Neural Networks
  • Lecture 6 Part 3: Function Approximation with CNN
  • Lecture 6 Part 4: Deep RL — DQN

  • Lecture 7 Part 1: Refresh Your Understanding
  • Lecture 7 Part 2: Double DQN
  • Lecture 7 Part 3: Prioritized Replay
  • Lecture 7 Part 4: Dueling DQN
  • Lecture 7 Part 5: Practical Tips for DQN on Atari

  • PyTorch Tutorial
  • Deep Learning Overview (from Winter 2020)
    1. Lecture 5 Slides
    2. Lecture 6 Slides
    3. Lecture 7 Slides
    4. Additional Materials:

Policy search
  • Lecture 8 Part 1: Refresh Your Understanding
  • Lecture 8 Part 2: Policy Search Methods
  • Lecture 8 Part 3: Gradient-free Methods
  • Lecture 8 Part 4: Finite Difference Methods
  • Lecture 8 Part 5: Score Functions
  • Lecture 8 Part 6: Likelihood Ratio Policy Gradient
  • Lecture 8 Part 7: REINFORCE

  • Lecture 9 Part 1: Refresh Your Understanding
  • Lecture 9 Part 2: Policy-Based RL Recap
  • Lecture 9 Part 3: Better Gradient Estimates
  • Lecture 9 Part 4: Policy Gradient Algorithms and Reducing Variance
  • Lecture 9 Part 5: Need for Automatic Step Size Tuning
  • Lecture 9 Part 6: Local Approximation
  • Lecture 9 Part 7: Trust Regions
  • Lecture 9 Part 8: TRPO Algorithm
    1. Lecture 8 Slides
    2. Lecture 9 Slides
    3. Additional Materials:
      • SB (Sutton and Barto) Chp 13

Fast Learning
  • Lecture 10 Part 1: Refresh Your Understanding
  • Lecture 10 Part 2: Introduction to Multi-armed Bandits
  • Lecture 10 Part 3: Multi-armed Bandit Greedy Algorithm
  • Lecture 10 Part 4: Regret
  • Lecture 10 Part 5: ε-Greedy Algorithm
  • Lecture 10 Part 6: Optimism under Uncertainty
  • Lecture 10 Part 7: UCB Bandit Regret

  • Lecture 11 Part 1: Refresh Your Understanding
  • Lecture 11 Part 2: Bandits and Probably Approximately Correct
  • Lecture 11 Part 3: Bayesian Bandits
  • Lecture 11 Part 4: Bayesian Bandits Example
  • Lecture 11 Part 5: Thompson Sampling
  • Lecture 11 Part 6: Bayesian Regret and Probability Matching

  • Lecture 12 Part 1: Refresh Your Understanding
  • Lecture 12 Part 2: Refresh Your Understanding Solution
  • Lecture 12 Part 3: Fast RL in MDPs
  • Lecture 12 Part 4: Fast RL in Bayesian MDPs
  • Lecture 12 Part 5: Generalization and Exploration
    1. Lecture 10 Slides
    2. Lecture 11 Slides
    3. Lecture 12 Slides
    4. Additional Materials:

Batch Reinforcement Learning
  • Lecture 13 Part 1: Refresh Your Understanding
  • Lecture 13 Part 2: Introduction to Batch RL
  • Lecture 13 Part 3: Batch RL Setting
  • Lecture 13 Part 4: Offline Batch Evaluation Using Models
  • Lecture 13 Part 5: Offline Batch Evaluation Using Q-functions
  • Lecture 13 Part 6: Offline Batch Evaluation Using Importance Sampling

  • Lecture 14: Guest Lecture from Professor Finale Doshi-Velez
    1. Lecture 13 Slides
    2. Lecture 14 Guest Lecture Slides

Monte Carlo Tree Search
  • Lecture 15 Part 1: Refresh Your Understanding
  • Lecture 15 Part 2: Model-Based RL
  • Lecture 15 Part 3: Simulation-Based Search
  • Lecture 15 Part 4: Upper Confidence Tree Search
  • Lecture 15 Part 5: Case Study: the Game of Go
    1. Lecture 15 Slides
    2. Annotated Lecture 15 Slides

Rewards, Value Alignment, and Wrapping Up
  • Lecture 16 Recording
    1. Lecture 16 Slides