Prerequisites for This Class
Proficiency in Python
All class assignments will be in Python (using NumPy and TensorFlow, and optionally Keras).
There is a tutorial here for
those who aren't as familiar with Python. If you have a lot of programming experience but
College Calculus, Linear Algebra (e.g. MATH 51, CME 100)
You should be comfortable taking derivatives and understanding matrix vector operations and
Basic Probability and Statistics (e.g. CS 109 or other stats course)
You should know basics of probabilities, Gaussian distributions, mean, standard deviation, etc.
Foundations of Machine Learning
We will be formulating cost functions, taking derivatives and performing optimization with
gradient descent. Either CS 221 or CS 229 covers this background. Some optimization tricks will be
more intuitive with some knowledge of convex optimization.
To realize the dreams and impact of AI requires autonomous systems that learn to make good decisions.
Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range
of tasks, including robotics, game playing, consumer modeling and healthcare. This class will provide
a solid introduction to the field of reinforcement learning and students will learn about the core
challenges and approaches, including generalization and exploration. Through a combination of lectures
and written and coding assignments, students will become well versed in key ideas and techniques for RL.
Assignments will include the basics of reinforcement learning as well as deep reinforcement learning —
an extremely promising new area that combines deep learning techniques with reinforcement learning.
In addition, students will advance their understanding and the field of RL through an open ended project.
By the end of the class students should be able to:
Define the key features of reinforcement learning that distinguish it from AI
and non-interactive machine learning (as assessed by the exam).
Given an application problem (e.g. from computer vision, robotics, etc.), decide
if it should be formulated as an RL problem; if so, be able to define it formally
(in terms of the state space, action space, dynamics and reward model), state what
algorithm (from class) is best suited for addressing it and justify your answer
(as assessed by the project and the exam).
Implement in code common RL algorithms such as a deep RL algorithm, including
imitation learning (as assessed by the homeworks).
Describe (list and define) multiple criteria for analyzing RL algorithms and evaluate
algorithms on these metrics: e.g. regret, sample complexity, computational complexity,
empirical performance, convergence, etc. (as assessed by homeworks and the exam).
Describe the exploration vs exploitation challenge and compare and contrast at least
two approaches for addressing this challenge (in terms of performance, scalability,
complexity of implementation, and theoretical guarantees) (as assessed by an assignment
and the exam).
Class Time and Location
Winter quarter (January 08 - March 16, 2018)
Lecture: Monday, Wednesday 11:30 AM - 12:50 PM
Location: NVIDIA Auditorium
Course Schedule / Syllabus (Including Due Dates)
See the Course Schedule
There is no official textbook for the class but a number of the supporting readings will come from:
Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition. This is available for
free here and references will
refer to the January 1, 2018 draft available here.
Some other additional references that may be useful are listed below:
- Reinforcement Learning: State-of-the-Art, Marco Wiering and Martijn van Otterlo, Eds.
- Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig.
- Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
- Assignment 1: 10%
- Assignment 2: 20%
- Assignment 3: 15%
- Midterm: 25%
- Quiz: 5%
  - Individual: 4.5%
  - Group: 0.5%
- Course Project: 25%
  - Proposal: 1%
  - Milestone: 3%
  - Poster Presentation: 5%
  - Paper: 16%
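As a rough illustration of how the weights above combine, the following sketch computes a weighted course score. The function name, the dictionary keys, and the convention that each component score is a fraction between 0 and 1 are assumptions made for this example; they are not an official grading tool.

```python
import math

# Top-level weights copied from the grading breakdown above.
WEIGHTS = {
    "assignment_1": 0.10,
    "assignment_2": 0.20,
    "assignment_3": 0.15,
    "midterm": 0.25,
    "quiz": 0.05,
    "project": 0.25,
}

# Sub-weights for the two components that are split further.
QUIZ_PARTS = {"individual": 0.045, "group": 0.005}
PROJECT_PARTS = {"proposal": 0.01, "milestone": 0.03, "poster": 0.05, "paper": 0.16}


def course_score(scores):
    """Return the weighted course score for per-component fractions in [0, 1].

    `scores` is a hypothetical mapping from the keys of WEIGHTS to the
    fraction of credit earned on that component.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Sanity checks: the listed weights and sub-weights are internally consistent.
assert math.isclose(sum(WEIGHTS.values()), 1.0)
assert math.isclose(sum(QUIZ_PARTS.values()), WEIGHTS["quiz"])
assert math.isclose(sum(PROJECT_PARTS.values()), WEIGHTS["project"])
```

For example, `course_score({name: 0.8 for name in WEIGHTS})` gives a score of about 0.8, since the weights sum to 1.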
Late Day Policy
You can use 6 late days.
A late day extends the deadline by 24 hours.
You are allowed up to 2 late days per assignment. If you hand an assignment in after 48 hours,
it will be worth at most 50%. No credit will be given to assignments handed in after 72 hours
— contact us if you think you have an extremely rare circumstance for which we should make an
exception. This policy is to ensure that feedback can be given in a timely manner.
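The assignment late policy above can be sketched as a small calculation. This is only one reading of the rules: the function names are hypothetical, the boundaries at exactly 48 and 72 hours are treated as inclusive (an assumption), and the sketch does not track the overall budget of 6 late days.

```python
import math


def late_days_needed(hours_late):
    """Late days consumed by a submission that is `hours_late` hours late.

    Each late day extends the deadline by 24 hours, and at most 2 late days
    may be applied to a single assignment.
    """
    if hours_late <= 0:
        return 0
    return min(2, math.ceil(hours_late / 24))


def max_credit(hours_late):
    """Maximum fraction of credit available for an assignment.

    Assumes the student has enough late days left to cover the first 48 hours.
    """
    if hours_late <= 48:
        return 1.0  # on time, or covered by up to 2 late days
    if hours_late <= 72:
        return 0.5  # after 48 hours: worth at most 50%
    return 0.0      # after 72 hours: no credit


# A submission 30 hours late uses 2 late days and can still earn full credit.
assert late_days_needed(30) == 2
assert max_credit(30) == 1.0
```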
You can use late days on the project proposal (up to 2) and milestone (up to 2). No late days are
allowed for the poster presentation and final report. Any late days on the project writeup will
decrease the potential score on the project by 25%. To use a late day on the project proposal or
milestone, it is allowable to pool late days between team members: in other words, one can use any
single team member’s late day (e.g. team member A can use her late day, and team member B can use
his late day, and that yields 2 total late days for the project proposal).
If you think that the course staff made a quantifiable error in grading your assignment
or exam, then you are welcome to submit a regrade request. If you wish to do so, you
must come in person to one of the owners for the assignment or exam
question -- the owners will be clearly stated on an assignment webpage or in the exam
feedback. For SCPD students you can do this during one of the relevant CA’s office hours
in Zoom. Unfortunately due to the class size, requests that are submitted over email or
on Piazza will be ignored. In considering whether to make a request, we encourage you to
consider that even if the grading may seem overly strict to you, we are applying the same
rubric to all students for fairness, so the strictness of the grading is not a suitable
justification for a regrade request. If your regrade request is valid, then the CA will
add your request to a list of regrade requests which will be processed accordingly and
you will be informed of the result of the regrade. Regrade requests will only be accepted
for three days after assignments are returned.
Note that while doing a regrade we may review your entire assignment, not just the part you
bring to our attention (i.e. we may find errors in your work that we missed before).
Midterm (in class): February 14
Quiz (in class): March 12
Conflicts: If you are not able to attend the in class
midterm and quizzes with an official reason, please email the course CA Anchit at
firstname.lastname@example.org, as soon as you can
so that an accommodation can be scheduled. (Historically this is either to ask you
to take the exam remotely at the same time, or to schedule an alternate exam time).
Notes for the exams: You are welcome to bring a one-sided
(letter-sized) page of handwritten or typed notes to the midterm. For the quiz, you
are welcome to bring a double-sided (letter-sized) page of handwritten or typed notes.
No calculators, laptops, cell phones, tablets or other resources will be allowed.
Emma's office hours will be held in Gates 218
Alex and Xinkun will hold office hours in the Lathrop Learning Hub
Other CA office hours will be held in the Huang Basement
For both in-person and online SCPD office hours, you will need to register an account on QueueStatus.
When you wish to join the queue, click "Sign Up" at the CS234 queue. Be sure to enter your email when you "Sign Up"; this is a way for the
CA to contact you. Look for announcements on the left panel for more information. For online office hours, you will need to install Zoom (instructions below) to
video call with the CA: the CA will contact you via Zoom when he/she reaches you in the queue.
Instructions for installing Zoom:
Go to the Zoom Client for Linux page and download the correct Linux package for your Linux
distribution type, OS architecture and version.
Follow the Linux installation instructions here.
Download Zoom installer here.
Installation instructions can be found here.
- Go to Stanford Zoom and select 'Launch Zoom'.
- Click 'Host a Meeting'; nothing will launch but this will give a link to 'download & run Zoom'.
- Click on 'download & run Zoom' to obtain and download 'Zoom_launcher.exe'.
- Run 'Zoom_launcher.exe' to install.
Assignments, Course Project and Submission Process
Assignments: See Assignments page
where all the assignments will be posted.
Course Project: See the Course Project page
for more details on the course project.
Computing Resources: We will have a limited number of Azure credits
for use in Assignment 2 and the project. Instructions for how to access these will be announced
for Assignment 2.
Submission Process: The submission instructions for the assignments and
the project can also be found on the Assignments page.
Attendance is not required but is encouraged. Sometimes we may do in-class exercises or discussions, and these are harder to do,
and to benefit from, by yourself. However, if you are not able to attend class, the class is recorded. It has previously been shown that watching lecture
videos in small groups with one person pausing to facilitate discussion can yield student performance as high as attending lectures live,
and we have heard of students getting together to watch videos in small groups in the past, so we encourage you to
consider this if you are unable to attend a particular lecture or if you’re participating in the class as an SCPD student. I am always
excited to hear about new ways students find to effectively learn the material, so sharing such tips is always appreciated.
We believe students often learn an enormous amount from each other as well as from us, the course staff. Therefore to facilitate
discussion and peer learning, we request that you please use Piazza
for all questions related to lectures, homeworks and projects.
For SCPD students, if you have generic SCPD specific questions, please email email@example.com
or call 650-741-1542. In case you have specific questions related to being an SCPD student for this particular class, please contact
us at firstname.lastname@example.org
For exceptional circumstances that require us to make special arrangements, please email the course CA Anchit.
For example, such a situation may arise if a student requires extra days
to submit a homework due to a medical emergency, or if a student needs to schedule an alternative midterm date due to events such as
conference travel etc. They will be considered and approved on a case by case basis.
You will be awarded up to 2% extra credit if you answer other students' questions in a substantial and helpful way on
Piazza. NOTE: If you enrolled in this class on Axess, you should be added to the
Piazza group automatically within a few hours. You can also register independently — there is no access code required to join the group.
Academic Collaboration and Misconduct
I care about academic collaboration and misconduct both because it is important that we are able to evaluate your own work (independent of your peers’)
and because not claiming others’ work as your own is an important part of integrity in your future career. I understand that different
institutions and locations can have different definitions of what forms of collaborative behavior are considered acceptable. In this class,
for written homework problems, you are welcome to discuss ideas with others, but you are expected to write up your own solutions
independently (without referring to another’s solutions). For coding, you are allowed to do projects in groups of 2, but for any other
collaborations, you may only share the input-output behavior of your programs. This encourages you to work separately but share ideas
on how to test your implementation. Please remember that if you share your solution with another student, even if you did not copy from
another, you are still violating the honor code. In terms of the final project, you are welcome to combine this project with another class
assuming that the project is relevant to both classes, provided that you obtain prior permission from the class instructors. If your project is
an extension of a previous class project, you are expected to make significant additional contributions to the project.
We periodically run similarity-detection software over all submitted student programs, including programs from past quarters and any
solutions found online on public websites. Anyone violating the Stanford University Honor Code
will be referred to the Office of Judicial Affairs.
If you think you made a mistake (it can happen, especially under stress or when time is short!), please reach out to Emma or the head CA;
the consequences will be much less severe than if we approach you.
Students with Documented Disabilities
Students who may need an academic accommodation based on the impact of a disability must initiate
the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request
with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter
for faculty dated in the current quarter in which the request is being made. Students should contact
the OAE as soon as possible since timely notice is needed to coordinate accommodations. The OAE is located
at 563 Salvatierra Walk
Credit/No Credit Enrollment
If you're enrolled in the class on credit/no credit status, you will be graded on work as usual
per standard Stanford rules. The only distinction from those taking the class for a letter grade is that you
must obtain a C- (C minus) grade or higher in the class to be marked as CR.