Course Description & Logistics

Realizing the dreams and impact of AI requires autonomous systems that learn to make good decisions. Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling, and healthcare. This class will provide a solid introduction to the field of reinforcement learning, and students will learn about its core challenges and approaches, including generalization and exploration. Through a combination of lectures and written and coding assignments, students will become well versed in key ideas and techniques for RL. Assignments will cover the basics of reinforcement learning as well as deep reinforcement learning, an extremely promising new area that combines deep learning techniques with reinforcement learning.

Communication: We will use Piazza for all communications. We encourage all students to use Piazza and you may submit public or private posts.

Learning remotely: We know that, with COVID-19 and other world events, this quarter will be difficult. To better support your learning during this unprecedented time, we have changed the course format. The new format is experimental and based on the experience of other instructors who have taught since the pandemic began. Please let us know if you have ideas or feedback about how to further improve the class and your learning; we will take them into consideration for this and future offerings.
  • Modules (videos and slides): All standard lecture materials will be delivered through modules with pre-recorded course videos that you can watch on your own time. Each week's modules are listed in the schedule and can be accessed here; they will be posted by the end of the Sunday before that week's class. Guest lectures will be presented live and recorded for later viewing. Recordings will be available to enrolled students through Canvas.
  • Lecture Watch Parties: During our class time each Monday, we will have an optional CA-facilitated watch party in breakout rooms for the first lecture of that week. A CA will pause the video at periodic intervals to check your understanding and answer questions. Links to the relevant Zoom or Nooks session for each watch party will be provided in the schedule.
  • Problem sessions: During our class time each Wednesday, we will have an optional CA-led session to work through exercises on the material and practice in breakout rooms with CA support.
  • Group office hours: In addition to 1:1 office hours, we will hold group office hours on Nooks for assignments. Group office hours are 5-8 PM on Wednesday, Thursday, and Friday each week.
  • 1:1 office hours: Students can sign up for 1:1 office hours with faculty and CAs. These are all appointment-based, so students do not need to wait in a queue. See our calendar for times and sign-up links. You can also find the sign-up link here. Please contact us through Piazza if you want to schedule one but cannot find an available slot. Video conference links will be provided during sign-up.
  • Quizzes: Instead of a large, high-stakes midterm, there will be four quizzes over the course. We will drop the lowest score of Quizzes 1-3.

Platforms: Besides Piazza, we will use a combination of Zoom and Nooks to hold class activities and office hours. All assignments and quizzes will be handled through Gradescope, where you will also find your grades. We will send out links and access codes to enrolled students through Canvas.

Time / Location: All class activities and office hours are in our calendar. Note: Office hour links will be posted by the end of Monday, January 11. All times are in Stanford local time (Pacific Time, PT).

Materials from previous years (Winter 2020, Winter 2019, Winter 2018) are available here.

Prerequisites for This Class

  • Proficiency in Python
    All class assignments will be in Python. There is a tutorial here for those who aren't as familiar with Python. If you have a lot of programming experience but in a different language (e.g. C/C++/Matlab/JavaScript), you will probably be fine.
  • College Calculus, Linear Algebra (e.g. MATH 51, CME 100)
    You should be comfortable taking derivatives and understanding matrix vector operations and notation.
  • Basic Probability and Statistics (e.g. CS 109 or other stats course)
    You should know the basics of probability, Gaussian distributions, mean, standard deviation, etc.
  • Foundations of Machine Learning
    We will be formulating cost functions, taking derivatives and performing optimization with gradient descent. Either CS 221 or CS 229 cover this background. Some optimization tricks will be more intuitive with some knowledge of convex optimization.

Learning Outcomes

By the end of the class students should be able to:

  • Define the key features of reinforcement learning that distinguish it from AI and non-interactive machine learning (as assessed by the exam).
  • Given an application problem (e.g. from computer vision, robotics, etc.), decide whether it should be formulated as an RL problem; if so, define it formally (in terms of the state space, action space, dynamics, and reward model), state which algorithm (from class) is best suited to address it, and justify your answer (as assessed by the exam).
  • Implement in code common RL algorithms (as assessed by the assignments).
  • Describe (list and define) multiple criteria for analyzing RL algorithms and evaluate algorithms on these metrics: e.g. regret, sample complexity, computational complexity, empirical performance, convergence, etc (as assessed by assignments and the exam).
  • Describe the exploration vs exploitation challenge and compare and contrast at least two approaches for addressing this challenge (in terms of performance, scalability, complexity of implementation, and theoretical guarantees) (as assessed by an assignment and the exam).

Course Modules (Videos and Slides)

See the Modules page.

Course Schedule


Week 1 (Jan 11-17)
  • Modules: Introduction to Reinforcement Learning; Tabular MDP Planning; [Quiz 0]
  • Mon Jan 11, 11:30am-12:50pm: Live Lecture (Brunskill) on Introduction to Reinforcement Learning
  • Wed Jan 13, 11:30am-12:50pm: (Optional) Problem Session 1 (Recording, [Problems], [Solutions])
  • [Assignment 1 Released]
  • Week 1 Check Your Understanding: due at 6 pm
  • Quiz 0: due at 6 pm ([Quiz 0], [Solution])

Week 2 (Jan 18-24)
  • Modules: Tabular RL Policy Evaluation; [Assignment 1]
  • Mon Jan 18: Holiday (no session)
  • Wed Jan 20, 11:30am-12:50pm: (Optional) Problem Session 2 (Recording, [Problems], [Solutions])
  • Assignment 1 due
  • Week 2 Check Your Understanding: due at 6 pm

Week 3 (Jan 25-31)
  • Modules: Q-Learning; RL with function approximation; [Quiz 1]
  • Mon Jan 25, 11:30am-12:50pm: (Optional) Watch Party on Q-Learning
  • [Assignment 2 Released]
  • Wed Jan 27, 11:30am-12:50pm: (Optional) Problem Session 3 ([Problems], [Solutions])
  • Quiz 1 ([Solution])
  • Week 3 Check Your Understanding: due at 6 pm

Week 4 (Feb 1-7)
  • Modules: RL with function approximation; [Assignment 2]
  • Mon Feb 1, 11:30am-12:50pm: (Optional) Watch Party on RL with Function Approximation
  • Wed Feb 3, 11:30am-12:50pm: (Optional) Problem Session 4 (Recording, [Problems], [Solutions])
  • Assignment 2 Part 1 due
  • Week 4 Check Your Understanding: due at 6 pm

Week 5 (Feb 8-14)
  • Modules: Policy Search; [Assignment 2]
  • Mon Feb 8, 11:30am-12:50pm: (Optional) Watch Party on Policy Search
  • Wed Feb 10, 11:30am-12:50pm: (Optional) Problem Session 5 (Recording, [Problems], [Solutions])
  • Assignment 2 Part 2 due
  • Week 5 Check Your Understanding: due at 6 pm
  • [Assignment 3 Released]

Week 6 (Feb 15-21)
  • Modules: Fast Learning; [Quiz 2]
  • Mon Feb 15: Holiday (no session)
  • Wed Feb 17, 11:30am-12:50pm: (Optional) Problem Session 6 (Recording, [Problems], [Solutions])
  • Quiz 2 ([Solution])
  • Week 6 Check Your Understanding: due at 6 pm

Week 7 (Feb 22-28)
  • Modules: Fast Learning; [Assignment 3]
  • Mon Feb 22, 11:30am-12:50pm: (Optional) Watch Party on Fast Learning
  • Wed Feb 24, 11:30am-12:50pm: (Optional) Problem Session 7 (Recording, [Problems], [Solutions])
  • Week 7 Check Your Understanding: due at 6 pm
  • [Assignment 4 Released]
  • Assignment 3: due at 6 pm

Week 8 (Mar 1-7)
  • Modules: Batch Reinforcement Learning; [N/A]
  • Mon Mar 1, 11:30am-12:50pm: (Optional) Watch Party on Batch Reinforcement Learning
  • Wed Mar 3, 11:30am-12:50pm: Live Guest Lecture by Professor Finale Doshi-Velez on batch RL ([Slides], Recording)
  • Week 8 Check Your Understanding: due at 6 pm

Week 9 (Mar 8-14)
  • Modules: Monte Carlo Tree Search; [Quiz 3]
  • Mon Mar 8, 11:30am-12:50pm: (Optional) Watch Party on Monte Carlo Tree Search
  • Wed Mar 10, 11:30am-12:50pm: (Optional) Problem Session 8 (Recording, [Problems], [Solutions])
  • Quiz 3
  • Week 9 Check Your Understanding: due at 6 pm

Week 10 (Mar 15-21)
  • Modules: N/A; [Assignment 4]
  • Mon Mar 15, 11:30am-12:50pm: Live Lecture on Rewards, Value Alignment, and Wrap up (Recording)
  • Wed Mar 17: No session
  • Assignment 4: due at 6 pm

Textbooks

There is no official textbook for the class, but a number of the supporting readings will come from:

Other references that may be useful are listed below:

Grade Breakdown

  • Assignment 1: 10%
  • Assignment 2: 20%
  • Assignment 3: 16%
  • Assignment 4: 20%
  • Quiz 0: 1%
  • Quizzes 1, 2, 3: 16% each (only your top 2 of the 3 quiz scores count, for 16 + 16 = 32% of the grade)
  • Exercises: 1% (to receive 1%, complete 80% or more of the check/refresh your understanding polls)
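To illustrate how the weights above combine, here is a minimal Python sketch. It is not an official grading script: the names `final_grade`, `WEIGHTS`, `QUIZ_WEIGHT`, and the other constants are hypothetical, every score is assumed to be a fraction between 0 and 1, and the exercise credit is assumed to be all-or-nothing at the 80% poll-completion threshold.

```python
from __future__ import annotations

# Hypothetical constants copied from the grade breakdown (percent of final grade).
WEIGHTS = {
    "assignment1": 10,
    "assignment2": 20,
    "assignment3": 16,
    "assignment4": 20,
    "quiz0": 1,
}
QUIZ_WEIGHT = 16          # weight of each of the two best scores among Quizzes 1-3
EXERCISE_WEIGHT = 1       # awarded in full, or not at all (assumption)
EXERCISE_THRESHOLD = 0.8  # complete 80% or more of the polls


def final_grade(scores: dict[str, float], quizzes_1_to_3: list[float],
                poll_completion: float) -> float:
    """Return a course percentage (0-100); all inputs are fractions in [0, 1]."""
    if len(quizzes_1_to_3) != 3:
        raise ValueError("expected exactly three scores for Quizzes 1-3")
    # Assignments and Quiz 0 are weighted directly.
    total = sum(weight * scores[name] for name, weight in WEIGHTS.items())
    # Keep the two highest quiz scores, i.e. drop the lowest of Quizzes 1-3.
    best_two = sorted(quizzes_1_to_3, reverse=True)[:2]
    total += QUIZ_WEIGHT * sum(best_two)
    # Exercise credit only if the completion threshold is met.
    if poll_completion >= EXERCISE_THRESHOLD:
        total += EXERCISE_WEIGHT
    return total
```

For example, with full marks on the assignments and Quiz 0, quiz scores of 0.5, 1.0, and 0.0, and 50% poll completion, the 0.0 quiz is dropped and the result is 67 + 16 × 1.5 = 91.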

Late Day Policy

Quizzes

  • There will be 4 quizzes. See the schedule for the dates.
  • Quizzes will be handled through Gradescope. All quizzes must be submitted by 6 pm PST on Friday.
  • Within a fixed time window, you may pick any 2-hour interval in which to complete each quiz.
  • Quizzes are open book and open internet, but you should not discuss your answers with anyone else.

Assignments and Submission Process


Communication

We believe students often learn an enormous amount from each other as well as from us, the course staff. Therefore to facilitate discussion and peer learning, we request that you please use Piazza for all questions related to lectures and assignments.

SCPD students with general SCPD questions should email scpdsupport@stanford.edu or call 650-741-1542. If you have questions specific to being an SCPD student in this particular class, please contact us at cs234-win2021-staff@lists.stanford.edu.

For exceptional circumstances that require special arrangements, please email us at cs234-win2021-staff@lists.stanford.edu. For example, such a situation may arise if a student requires extra days to submit a homework due to a medical emergency, or needs to schedule an alternative midterm date due to events such as conference travel. These requests will be considered and approved on a case-by-case basis.

Academic Collaboration and Misconduct

I care about academic collaboration and misconduct both because it is important that we are able to evaluate your own work (independent of your peers’) and because not claiming others’ work as your own is an important part of integrity in your future career. I understand that different institutions and locations can have different definitions of which forms of collaborative behavior are considered acceptable. In this class, for written homework problems, you are welcome to discuss ideas with others, but you are expected to write up your own solutions independently (without referring to another’s solutions). For coding, you are allowed to do projects in groups of 2, but for any other collaborations, you may only share the input-output behavior of your programs. This encourages you to work separately but share ideas on how to test your implementation. Please remember that if you share your solution with another student, you are violating the honor code even if you did not copy from anyone else.

We periodically run similarity-detection software over all submitted student programs, including programs from past quarters and any solutions found online on public websites. Anyone violating the Stanford University Honor Code will be referred to the Office of Judicial Affairs. If you think you made a mistake (it can happen, especially under stress or when time is short!), please reach out to Emma or the head CA; the consequences will be much less severe than if we approach you.

Academic Accommodation

Students who may need an academic accommodation based on the impact of a disability should initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is being made. Students should contact the OAE as soon as possible, since timely notice is needed to coordinate accommodations. The OAE is located at 563 Salvatierra Walk (650-723-1066, http://studentaffairs.stanford.edu/oae).

Credit/No Credit Enrollment

If you're enrolled in the class on credit/no credit status, your work will be graded as usual per standard Stanford rules. The only difference from taking the class for a letter grade is that you must earn a C- (C minus) or higher in the class to be marked as CR.