| Topic | Videos (on Canvas/Panopto) | Course Materials |
| Introduction to Reinforcement Learning |
(2022 Live)
Lecture 1
(2021 Recording)
Part 1: Course Overview
Part 2: Course Logistics
Part 3: Intro to Sequential Decision Making
|
- Lecture 1 Slides Post class version
- Additional Materials:
|
| Tabular MDP planning |
(2022 Live)
Lecture 2
(2021 Recording)
Lecture 2, Part 1: Policy Evaluation
Lecture 2, Part 2: Policy Iteration
Lecture 2, Part 3: Value Iteration
Lecture 2, Part 4: Bellman contraction operator and finite horizon
|
- Lecture 2 Slides (pre-class) [Post class, annotated]
- Additional Materials:
|
| Tabular RL policy evaluation |
(2022 Live)
Lecture 3
(2021 Recording)
Lecture 3, Part 1: Monte Carlo Policy Evaluation
Lecture 3, Part 2: Temporal Difference Learning
Lecture 3, Part 3: Dynamic Programming with Certainty Equivalence
Lecture 3, Part 4: Comparing Algorithms for Policy Evaluation
|
- Lecture 3 Slides (pre-class) [Post class, with annotations]
- Additional Materials:
|
| Q-learning |
(2022 Live)
Lecture 4
(2021 Recording)
Lecture 4 Part 1: Generalized Policy Iteration and E-Greedy
Lecture 4 Part 2: MC Control with e-greedy policies (tabular)
Lecture 4 Part 3: TD Methods, SARSA and Q-learning for Tabular Control
Lecture 4 Part 4: Maximization Bias
|
- Lecture 4 Slides (post class with annotations)
- Additional Materials:
|
| RL with function approximation |
(2022 Live)
Lecture 5
Lecture 6
Lecture 7
(2021 Recording)
Lecture 5 Part 2: Value Function Approximation
Lecture 5 Part 3: MC Learning for Policy Evaluation with Linear VFA
Lecture 5 Part 4: MC Learning for Policy Evaluation with Linear VFA Convergence Guarantees
Lecture 5 Part 5: TD Learning for Policy Evaluation with Linear VFA
Lecture 5 Part 6: MC and TD Control with Linear VFA
Lecture 6 Part 2: Function Approximation with Deep Neural Networks
Lecture 6 Part 3: Function Approximation with CNN
Lecture 6 Part 4: Deep RL — DQN
Lecture 7 Part 2: Double DQN
Lecture 7 Part 3: Prioritized Replay
Lecture 7 Part 4: Dueling DQN
Lecture 7 Part 5: Practical Tips for DQN on Atari
PyTorch Tutorial (from Winter 2021)
Deep Learning Overview (from Winter 2020)
|
- Lecture 5 Slides [Post lecture with annotations]
- Lecture 6 Slides [Post class annotations]
- Lecture 7 Slides [Post class annotations]
- Additional Materials:
|
| Policy search |
(2022 Live)
Lecture 8
Lecture 9
(2021 Recording)
Lecture 8 Part 2: Policy Search Methods
Lecture 8 Part 3: Gradient-free Methods
Lecture 8 Part 4: Finite Difference Methods
Lecture 8 Part 5: Score Functions
Lecture 8 Part 6: Likelihood Ratio Policy Gradient
Lecture 8 Part 7: REINFORCE
Lecture 9 Part 2: Policy-Based RL Recap
Lecture 9 Part 3: Better Gradient Estimates
Lecture 9 Part 4: Policy Gradient Algorithms and Reducing Variance
Lecture 9 Part 5: Need for Automatic Step Size Tuning
Lecture 9 Part 6: Local Approximation
Lecture 9 Part 7: Trust Regions
Lecture 9 Part 8: TRPO Algorithm
|
- Lecture 8 Slides [Post class with annotations]
- Lecture 9 Slides [Post class]
- Additional Materials:
|
| Fast Learning |
(2022 Live)
Lecture 10
Lecture 11
Lecture 12
(2021 Recording)
Lecture 10 Part 2: Introduction to Multi-armed Bandits
Lecture 10 Part 3: Multi-armed Bandit Greedy Algorithm
Lecture 10 Part 4: Regret
Lecture 10 Part 5: E-greedy Algorithm
Lecture 10 Part 6: Optimism under Uncertainty
Lecture 10 Part 7: UCB Bandit Regret
Lecture 11 Part 2: Bandits and Probably Approximately Correct
Lecture 11 Part 3: Bayesian Bandits
Lecture 11 Part 4: Bayesian Bandits Example
Lecture 11 Part 5: Thompson Sampling
Lecture 11 Part 6: Bayesian Regret and Probability Matching
Lecture 12 Part 3: Fast RL in MDPs
Lecture 12 Part 4: Fast RL in Bayesian MDPs
Lecture 12 Part 5: Generalization and Exploration
|
- Lecture 10 Draft Slides [Post class with annotations]
- Lecture 11 Slides [Post class, with annotations]
- Lecture 12 Slides [Post class, with annotations]
- Additional Materials:
|
| Batch Reinforcement Learning |
(2022 Live)
Lecture 13
Lecture 14
Lecture 15
Lecture 16
Guest Lecture 17
(2021 Recording)
Lecture 13 Part 2: Introduction to Batch RL
Lecture 13 Part 3: Batch RL Setting
Lecture 13 Part 4: Offline Batch Evaluation Using Models
Lecture 13 Part 5: Offline Batch Evaluation Using Q-functions
Lecture 13 Part 6: Offline Batch Evaluation Using Importance Sampling
|
- Lecture 13 Slides [Post class, with annotations]
- Lecture 14 Imitation Learning Slides [Post class, with annotations]
- Lecture 15 Batch Policy Learning [Post class, with annotations]
- Lecture 16 Reinforcement Learning and Reward
- Additional Materials:
|