Course Description & Logistics
Realizing the dreams and impact of AI requires autonomous systems that learn to make good decisions.
Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range
of tasks, including robotics, game playing, consumer modeling and healthcare. This class will provide
a solid introduction to the field of reinforcement learning, and students will learn about the core
challenges and approaches, including generalization and exploration. Through a combination of lectures
and written and coding assignments, students will become well versed in key ideas and techniques for RL.
Assignments will include the basics of reinforcement learning as well as deep reinforcement learning —
an extremely promising new area that combines deep learning techniques with reinforcement learning.
We will use Ed discussion forums. We encourage all students to use Ed for the fastest response to their questions.
- Lectures will be live every Tuesday and Thursday. Videos of the lecture content will also be made available to enrolled students through Canvas.
- Office hours: Will be announced in the first week of class
All assignments and quizzes will be handled through Gradescope, where you will also find your
grades. We will send out links and access codes to enrolled students through Canvas.
You can find previous years (Winter 2022, Winter 2021, Winter 2020, Winter 2019).
Prerequisites for This Class
Proficiency in Python
All class assignments will be in Python.
There is a tutorial here for
those who aren't as familiar with Python. If you have a lot of programming experience but
College Calculus, Linear Algebra (e.g. MATH 51, CME 100)
You should be comfortable taking derivatives and understanding matrix vector operations and
Basic Probability and Statistics (e.g. CS 109 or other stats course)
You should know the basics of probability, Gaussian distributions, mean, standard deviation, etc.
Foundations of Machine Learning
We will be formulating cost functions, taking derivatives and performing optimization with
gradient descent. Either CS 221 or CS 229 covers this background. Some optimization tricks will be
more intuitive with some knowledge of convex optimization.
By the end of the class students should be able to:
Define the key features of reinforcement learning that distinguish it from AI
and non-interactive machine learning (as assessed by the exam).
Given an application problem (e.g. from computer vision, robotics, etc.), decide
if it should be formulated as an RL problem; if yes, be able to define it formally
(in terms of the state space, action space, dynamics and reward model), state what
algorithm (from class) is best suited for addressing it and justify your answer
(as assessed by the exam).
Implement in code common RL algorithms (as assessed by the assignments).
Describe (list and define) multiple criteria for analyzing RL algorithms and evaluate
algorithms on these metrics: e.g. regret, sample complexity, computational complexity,
empirical performance, convergence, etc. (as assessed by assignments and the exam).
Describe the exploration vs exploitation challenge and compare and contrast at least
two approaches for addressing this challenge (in terms of performance, scalability,
complexity of implementation, and theoretical guarantees) (as assessed by an assignment
and the exam).
There is no official textbook for the class but a number of the supporting readings will come from:
Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition. This is available for
free here and references will
refer to the final pdf version available here.
Some additional references that may be useful are listed below:
- Reinforcement Learning: State-of-the-Art, Marco Wiering and Martijn van Otterlo, Eds.
- Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig. [link]
- Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [link]
- David Silver's course on Reinforcement Learning [link]
- Assignment 1: 10%
- Assignment 2: 18%
- Assignment 3: 18%
- Midterm: 25%
- Quiz: 5%
- Course Project: 24%
  - Proposal: 1%
  - Milestone: 2%
  - Poster Presentation: 5%
  - Paper: 16%
  - If you choose to do the default project/4th assignment, your breakdown will instead be:
    - Poster presentation: 5%
    - Paper/assignment write up: 19%
- 0.5% bonus for participation: answer the lecture polls for 80% of the lecture days that have polls. You may participate in these remotely as well. Polls are due by Sunday at 6pm for the week of lecture. Log in with your Stanford sunid when completing them so that your participation counts.
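As a quick sanity check on the breakdown above, here is a minimal sketch of how the weights combine into a final percentage. The names `WEIGHTS` and `course_percentage` are hypothetical and only for illustration; the sketch assumes each component score is a fraction between 0 and 1 and that the participation bonus is simply added on top. It is not the staff's grading code.

```python
# Weights (percent of the final grade) copied from the breakdown above.
# The four project_* entries together make up the 24% course project.
WEIGHTS = {
    "assignment_1": 10,
    "assignment_2": 18,
    "assignment_3": 18,
    "midterm": 25,
    "quiz": 5,
    "project_proposal": 1,
    "project_milestone": 2,
    "project_poster": 5,
    "project_paper": 16,
}


def course_percentage(scores, poll_bonus=False):
    """Combine per-component scores (fractions in [0, 1]) into a percentage.

    Components missing from `scores` count as 0. Adding the 0.5% bonus
    directly to the total is an assumption of this sketch.
    """
    total = sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())
    return total + (0.5 if poll_bonus else 0.0)


# The listed weights should account for the whole grade.
assert sum(WEIGHTS.values()) == 100
```

For the default project/4th assignment option, the same sketch would apply with the project entries replaced by the proposal (1%), milestone (2%), poster (5%), and write up (19%) weights, assuming the proposal and milestone weights stay the same since the project still totals 24%.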
Late Day Policy
You can use 5 late days total.
A late day extends the deadline by 24 hours.
You are allowed up to 2 late days each for assignments 1, 2, 3, the project proposal, and the project milestone, not to exceed 5 late days total. You may not use any late days for the project poster presentation or the final project paper. For group submissions such as the project proposal and milestone, every group member must use the corresponding number of late days on the assignment. If one or more members do not have enough late days left, all group members will incur a grade penalty of 50% within 24 hours and 100% after 24 hours, as explained below.
- If you use two late days and hand an assignment in after 48 hours, it will be worth at most 50%. If you do not have enough late days left, an assignment handed in within 1 day after it was due (adjusting for the late days used) will be worth at most 50%. No credit will be given for assignments handed in more than 24 hours after they were due (adjusting for any late days used). For example, if you use 2 late days, no credit is given after 72 hours. Please contact us if you think you have an extremely rare circumstance for which we should make an exception. This policy is to ensure that feedback can be given in a timely manner.
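The late day arithmetic above can be sketched as a small function. This is one reading of the policy, not an official calculator: the name `max_credit` is hypothetical, and the sketch assumes that late days are applied automatically in whole 24-hour units, that `late_days_left` is a non-negative count out of the 5 total, and that the item is one of those eligible for late days (assignments 1 to 3, project proposal, project milestone). The group-submission rule is not modeled.

```python
import math

MAX_LATE_DAYS_PER_ITEM = 2  # per-item cap stated in the policy above


def max_credit(hours_late, late_days_left):
    """Return (maximum credit fraction, late days consumed) for one submission.

    hours_late: hours past the original deadline (<= 0 means on time).
    late_days_left: late days the student still has available (assumed >= 0).
    """
    if hours_late <= 0:
        return 1.0, 0
    # Whole late days needed to cover the delay, limited by the per-item cap
    # and by what the student has left.
    days_needed = math.ceil(hours_late / 24)
    days_used = min(days_needed, MAX_LATE_DAYS_PER_ITEM, late_days_left)
    hours_past_extended_deadline = hours_late - 24 * days_used
    if hours_past_extended_deadline <= 0:
        return 1.0, days_used   # fully covered by late days
    if hours_past_extended_deadline <= 24:
        return 0.5, days_used   # within 24 hours of the extended deadline
    return 0.0, days_used       # more than 24 hours past it: no credit
```

Under these assumptions, `max_credit(60, 5)` uses 2 late days and caps the submission at 50%, matching the first case in the policy, while `max_credit(73, 5)` falls past the 72-hour mark and receives no credit.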
- There will be one midterm and one quiz. See the schedule for the dates.
- Exams will be held in class for on-campus students.
Conflicts: If you are not able to attend the in-class
midterm or quiz for an official reason, please email us at
firstname.lastname@example.org as soon as you can
so that an accommodation can be scheduled. (Historically this has meant either asking you
to take the exam remotely at the same time or scheduling an alternate exam time.)
Notes for the exams: You are welcome to bring one single-sided
(letter-sized) page of handwritten notes to the midterm. For the quiz you
are welcome to bring one double-sided (letter-sized) page of handwritten notes. No calculators, laptops, cell phones, tablets or other resources will be allowed.
Assignments and Submission Process
Assignments: See the Assignments page
where all the assignments will be posted.
Computing Resources: We will have some cloud resources available for later assignments.
Submission Process: The submission instructions for the assignments
can also be found on the Assignments page.
We believe students often learn an enormous amount from each other as well as from us, the course staff.
To facilitate discussion and peer learning, we request that you please use Ed
for all questions related to lectures and
For SCPD students, if you have general SCPD-specific questions, please email email@example.com
or call 650-741-1542. If you have specific questions related to being an SCPD student for this particular
class, please contact us at firstname.lastname@example.org
For exceptional circumstances that require us to make special arrangements, please email us.
For example, such a situation may arise if a student requires extra days to submit a homework due to a
medical emergency, or if a student needs to schedule an alternative midterm date due to events such as
conference travel. Requests will be considered and approved on a case-by-case basis.
If you think that the course staff made a quantifiable error in grading your assignment
or exam, then you are welcome to submit a regrade request. Regrade requests should be made on Gradescope and will be accepted
for three days after assignments or exams are returned.
Note that while doing a regrade we may review your entire assignment, not just the part you
bring to our attention (i.e. we may find errors in your work that we missed before).
Academic Collaboration and Misconduct
I care about academic collaboration and misconduct both because it is important that we are able to evaluate
your own work (independent of your peers’) and because not claiming others’ work as your own is an important
part of integrity in your future career. I understand that different institutions and locations can have
different definitions of what forms of collaborative behavior are acceptable. In this class, for written
homework problems, you are welcome to discuss ideas with others, but you are expected to write up your own
solutions independently (without referring to another’s solutions). For coding, you may only share the
input-output behavior of your programs. This encourages you to work separately but share ideas on how to
test your implementation. Please remember that if you share your solution with another student, even if you
did not copy from another, you are still violating the honor code.
We periodically run similarity-detection software over all submitted student programs, including programs from
past quarters and any solutions found online on public websites. Anyone violating the Stanford University
Honor Code will be referred to the Office of Judicial Affairs.
If you think you made a mistake (it can happen, especially under stress or when time is short!), please reach
out to Emma or the head CA;
the consequences will be much less severe than if we approach you.
If you need an academic accommodation based on the impact of a disability, please share your Office of Accessible Education (OAE) letter with us via an email to our course staff list as soon as it is convenient for you. This helps us ensure
the course materials and staff support can meet your needs.
The OAE is located at 563 Salvatierra Walk.
Credit/No Credit Enrollment
If you're enrolled in the class on credit/no credit status, you will be graded on work as usual
per standard Stanford rules. The only difference from those taking the class for a letter grade is that you
must obtain a grade of C- (C minus) or higher in the class to be marked as CR.