| Topic |
Videos (on Canvas/Panopto) |
Course Materials |
|
| Introduction to Reinforcement Learning |
Lecture 1
|
- Lecture 1 Draft Slides [Post class version]
|
| Tabular MDP planning |
Lecture 2
|
- Lecture 2 Slides (pre-class) [Post class, annotated]
- Additional Materials:
- SB (Sutton and Barto) Chp 3, 4.1-4.4
|
| Tabular RL policy evaluation |
Lecture 3
|
- Lecture 3 Slides (pre-class) [Post class, with annotations]
- Additional Materials:
- SB (Sutton and Barto) Chp 5.1, 5.5, 6.1-6.3
- David Silver's Lecture 4 [link]
|
| Q-learning |
Lecture 4
|
- Lecture 4 Slides (pre-class) [Post class, with annotations]
- Additional Materials:
- SB (Sutton and Barto) Chp 5.2, 5.4, 6.4-6.5, 6.7
|
| Policy Gradient |
Lecture 5
Lecture 6
Lecture 7
|
- Lecture 5 Slides [Post lecture with annotations]
- Lecture 6 Slides [Post class annotations]
- Lecture 7 Slides [Post class annotations]
- Additional Materials:
- SB (Sutton and Barto) Chp 13
|
| Imitation Learning and Learning from Human Input |
Lecture 8
Lecture 9 (including DPO guest lecture by Rafael Rafailov, Archit Sharma, Eric Mitchell)
Lecture 10
|
- Lecture 7 Slides [Post class annotations]
- Lecture 8 Slides (pre-class) [Post class, with annotations]
- Lecture 9 Slides [Post class]
- Lecture 9 DPO Slides
- Lecture 10 Slides [Post class]
|
| Fast Learning / Data Efficient RL |
Lecture 11
Lecture 12
Lecture 13
|
- Lecture 11 Slides [Post class, with annotations]
- Lecture 12 Slides [Post class, with annotations]
- Lecture 13 Slides [Post class, with annotations]
|
| MCTS |
Lecture 14
|
- Lecture 14 Slides [Post class, with annotations]
|
| Rewards in Reinforcement Learning |
Lecture 15
|
- Lecture 15 Slides (pre-class) [Post class, with annotations]
- Lecture 15 (Value Alignment)
|
| Review and Looking Forward |
|
- Lecture 16 Slides [Post class]
|
|