| Module | Videos (on Canvas/Panopto) | Course Materials |
| Introduction to Reinforcement Learning |
Part 1: Course Overview
Part 2: Course Logistics
Part 3: Intro to Sequential Decision Making
|
- Lecture 1 Slides
- Additional Materials:
|
| Tabular MDP planning |
Lecture 2, Part 1: Policy Evaluation
Lecture 2, Part 2: Policy Iteration
Lecture 2, Part 3: Value Iteration
Lecture 2, Part 4: Bellman Contraction Operator and Finite Horizon
|
- Lecture 2 Slides
- Additional Materials:
- SB (Sutton and Barto) Chp 3, 4.1-4.4
|
| Tabular RL policy evaluation |
Lecture 3 Refresh Your Understanding
Lecture 3, Part 1: Monte Carlo Policy Evaluation
Lecture 3, Part 2: Temporal Difference Learning
Lecture 3, Part 3: Dynamic Programming with Certainty Equivalence
Lecture 3, Part 4: Comparing Algorithms for Policy Evaluation
|
- Lecture 3 Slides
- Additional Materials:
- SB (Sutton and Barto) Chp 5.1, 5.5, 6.1-6.3
- David Silver's Lecture 4 [link]
|
| Q-learning |
Lecture 4 Part 1: Refresh Your Understanding
Lecture 4 Part 2: Generalized Policy Iteration and ε-Greedy
Lecture 4 Part 3: MC Control with ε-Greedy Policies (Tabular)
Lecture 4 Part 4: TD Methods, SARSA and Q-learning for Tabular Control
Lecture 4 Part 5: Maximization Bias
|
- Lecture 4 Slides
- Additional Materials:
- SB (Sutton and Barto) Chp 5.2, 5.4, 6.4-6.5, 6.7
|
| RL with function approximation |
Lecture 5 Part 1: Refresh Your Understanding
Lecture 5 Part 2: Value Function Approximation
Lecture 5 Part 3: MC Learning for Policy Evaluation with Linear VFA
Lecture 5 Part 4: MC Learning for Policy Evaluation with Linear VFA Convergence Guarantees
Lecture 5 Part 5: TD Learning for Policy Evaluation with Linear VFA
Lecture 5 Part 6: MC and TD Control with Linear VFA
Lecture 6 Part 1: Refresh Your Understanding
Lecture 6 Part 2: Function Approximation with Deep Neural Networks
Lecture 6 Part 3: Function Approximation with CNN
Lecture 6 Part 4: Deep RL — DQN
Lecture 7 Part 1: Refresh Your Understanding
Lecture 7 Part 2: Double DQN
Lecture 7 Part 3: Prioritized Replay
Lecture 7 Part 4: Dueling DQN
Lecture 7 Part 5: Practical Tips for DQN on Atari
PyTorch Tutorial
Deep Learning Overview (from Winter 2020)
|
- Lecture 5 Slides
- Lecture 6 Slides
- Lecture 7 Slides
- Additional Materials:
|
| Policy search |
Lecture 8 Part 1: Refresh Your Understanding
Lecture 8 Part 2: Policy Search Methods
Lecture 8 Part 3: Gradient-free Methods
Lecture 8 Part 4: Finite Difference Methods
Lecture 8 Part 5: Score Functions
Lecture 8 Part 6: Likelihood Ratio Policy Gradient
Lecture 8 Part 7: REINFORCE
Lecture 9 Part 1: Refresh Your Understanding
Lecture 9 Part 2: Policy-Based RL Recap
Lecture 9 Part 3: Better Gradient Estimates
Lecture 9 Part 4: Policy Gradient Algorithms and Reducing Variance
Lecture 9 Part 5: Need for Automatic Step Size Tuning
Lecture 9 Part 6: Local Approximation
Lecture 9 Part 7: Trust Regions
Lecture 9 Part 8: TRPO Algorithm
|
- Lecture 8 Slides
- Lecture 9 Slides
- Additional Materials:
- SB (Sutton and Barto) Chp 13
|
| Fast Learning |
Lecture 10 Part 1: Refresh Your Understanding
Lecture 10 Part 2: Introduction to Multi-armed Bandits
Lecture 10 Part 3: Multi-armed Bandit Greedy Algorithm
Lecture 10 Part 4: Regret
Lecture 10 Part 5: ε-Greedy Algorithm
Lecture 10 Part 6: Optimism under Uncertainty
Lecture 10 Part 7: UCB Bandit Regret
Lecture 11 Part 1: Refresh Your Understanding
Lecture 11 Part 2: Bandits and Probably Approximately Correct
Lecture 11 Part 3: Bayesian Bandits
Lecture 11 Part 4: Bayesian Bandits Example
Lecture 11 Part 5: Thompson Sampling
Lecture 11 Part 6: Bayesian Regret and Probability Matching
Lecture 12 Part 1: Refresh Your Understanding
Lecture 12 Part 2: Refresh Your Understanding Solution
Lecture 12 Part 3: Fast RL in MDPs
Lecture 12 Part 4: Fast RL in Bayesian MDPs
Lecture 12 Part 5: Generalization and Exploration
|
- Lecture 10 Slides
- Lecture 11 Slides
- Lecture 12 Slides
- Additional Materials:
|
| Batch Reinforcement Learning |
Lecture 13 Part 1: Refresh Your Understanding
Lecture 13 Part 2: Introduction to Batch RL
Lecture 13 Part 3: Batch RL Setting
Lecture 13 Part 4: Offline Batch Evaluation Using Models
Lecture 13 Part 5: Offline Batch Evaluation Using Q-functions
Lecture 13 Part 6: Offline Batch Evaluation Using Importance Sampling
Lecture 14: Guest Lecture from Professor Finale Doshi-Velez
|
- Lecture 13 Slides
- Lecture 14 Guest Lecture Slides
|
| Monte Carlo Tree Search |
Lecture 15 Part 1: Refresh Your Understanding
Lecture 15 Part 2: Model-Based RL
Lecture 15 Part 3: Simulation-Based Search
Lecture 15 Part 4: Upper Confidence Tree Search
Lecture 15 Part 5: Case Study: The Game of Go
|
- Lecture 15 Slides
- Annotated Lecture 15 Slides
|
| Rewards, Value Alignment, and Wrapping Up |
Lecture 16 Recording
|
- Lecture 16 Slides
|