Prerequisites for This Class

• Proficiency in Python
All class assignments will be in Python (using NumPy and TensorFlow, and optionally Keras). There is a tutorial here for those who aren't as familiar with Python. If you have a lot of programming experience but in a different language (e.g. C/C++/MATLAB/JavaScript), you will probably be fine.
• College Calculus, Linear Algebra (e.g. MATH 51, CME 100)
You should be comfortable taking derivatives and understanding matrix-vector operations and notation.
• Basic Probability and Statistics (e.g. CS 109 or other stats course)
You should know the basics of probability, Gaussian distributions, mean, standard deviation, etc.
• Foundations of Machine Learning
We will be formulating cost functions, taking derivatives and performing optimization with gradient descent. Either CS 221 or CS 229 covers this background. Some optimization tricks will be more intuitive with some knowledge of convex optimization.
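
As a rough self-check for the machine learning prerequisite, the sketch below formulates a mean-squared-error cost for a linear model, writes out its gradient by hand, and minimizes it with plain gradient descent in NumPy. The synthetic data, learning rate, step count and function names are illustrative assumptions, not course material.

```python
import numpy as np


def mse_cost(w, X, y):
    """Cost J(w) = (1 / 2n) * ||Xw - y||^2 for a linear model."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)


def mse_gradient(w, X, y):
    """Gradient of J with respect to w: (1 / n) * X^T (Xw - y)."""
    return X.T @ (X @ w - y) / len(y)


def gradient_descent(X, y, lr=0.1, steps=500):
    """Repeatedly step against the gradient, starting from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * mse_gradient(w, X, y)
    return w


# Synthetic, noiseless data generated from a known weight vector (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w

w_hat = gradient_descent(X, y)
print("estimated weights:", w_hat)
print("final cost:", mse_cost(w_hat, X, y))
```

If each line of this sketch (the cost, its derivative, and the update rule) reads naturally, the background described above is probably in place.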

Course Description

Realizing the dreams and impact of AI requires autonomous systems that learn to make good decisions. Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling and healthcare. This class will provide a solid introduction to the field of reinforcement learning, and students will learn about the core challenges and approaches, including generalization and exploration. Through a combination of lectures and written and coding assignments, students will become well versed in key ideas and techniques for RL. Assignments will include the basics of reinforcement learning as well as deep reinforcement learning, an extremely promising new area that combines deep learning techniques with reinforcement learning. In addition, students will advance their understanding and the field of RL through an open-ended project.

Learning Outcomes

By the end of the class students should be able to:

• Define the key features of reinforcement learning that distinguish it from AI and non-interactive machine learning (as assessed by the exam).
• Given an application problem (e.g. from computer vision, robotics, etc.), decide if it should be formulated as an RL problem; if yes, be able to define it formally (in terms of the state space, action space, dynamics and reward model), state which algorithm (from class) is best suited to addressing it, and justify your answer (as assessed by the project and the exam).
• Implement common RL algorithms in code, such as a deep RL algorithm, including imitation learning (as assessed by the homeworks).
• Describe (list and define) multiple criteria for analyzing RL algorithms and evaluate algorithms on these metrics: e.g. regret, sample complexity, computational complexity, empirical performance, convergence, etc (as assessed by homeworks and the exam).
• Describe the exploration vs exploitation challenge and compare and contrast at least two approaches for addressing this challenge (in terms of performance, scalability, complexity of implementation, and theoretical guarantees) (as assessed by an assignment and the exam).

Class Time and Location

Winter quarter (January 08 - March 16, 2018)
Lecture: Monday, Wednesday 11:30 AM - 12:50 PM
Location: NVIDIA Auditorium

Course Schedule / Syllabus (Including Due Dates)

See the Course Schedule page.

Textbooks

There is no official textbook for the class but a number of the supporting readings will come from:
• Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition. This is available for free here and references will refer to the January 1 2018 draft available here.
Some other additional references that may be useful are listed below:
• Reinforcement Learning: State-of-the-Art, Marco Wiering and Martijn van Otterlo, Eds. [link]
• Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig. [link]
• Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [link]

Grading

• Assignment 1: 10%
• Assignment 2: 20%
• Assignment 3: 15%
• Midterm: 25%
• Quiz: 5% (Individual: 4.5%, Group: 0.5%)
• Course Project: 25% (Proposal: 1%, Milestone: 3%, Poster Presentation: 5%, Paper: 16%)
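
The arithmetic behind these weights can be checked with a short sketch. The percentages are copied from the list above; reading Individual/Group as parts of the Quiz and Proposal/Milestone/Poster/Paper as parts of the Course Project is an inference from the fact that they sum to 5% and 25% respectively. The `weighted_total` helper and the sample scores are hypothetical.

```python
import math

# Top-level weights in percent, copied from the list above.
WEIGHTS = {
    "Assignment 1": 10.0,
    "Assignment 2": 20.0,
    "Assignment 3": 15.0,
    "Midterm": 25.0,
    "Quiz": 5.0,
    "Course Project": 25.0,
}

# Sub-components (assumed nesting, see the note above).
QUIZ_PARTS = {"Individual": 4.5, "Group": 0.5}
PROJECT_PARTS = {
    "Proposal": 1.0,
    "Milestone": 3.0,
    "Poster Presentation": 5.0,
    "Paper": 16.0,
}


def weighted_total(scores):
    """Combine per-component scores (fractions in [0, 1]) into a percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Consistency checks on the published weights.
assert math.isclose(sum(WEIGHTS.values()), 100.0)
assert math.isclose(sum(QUIZ_PARTS.values()), WEIGHTS["Quiz"])
assert math.isclose(sum(PROJECT_PARTS.values()), WEIGHTS["Course Project"])

# Hypothetical student: 80% on every component except a perfect midterm.
sample_scores = {name: 0.8 for name in WEIGHTS}
sample_scores["Midterm"] = 1.0
print("overall percentage:", weighted_total(sample_scores))
```

Extra credit and late penalties are deliberately left out of this sketch.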

Late Day Policy

• You can use 6 late days.
• A late day extends the deadline by 24 hours.
• You are allowed up to 2 late days per assignment. If you hand an assignment in after 48 hours, it will be worth at most 50%. No credit will be given for assignments handed in after 72 hours. Contact us if you think you have an extremely rare circumstance for which we should make an exception. This policy is to ensure that feedback can be given in a timely manner.
• You can use late days on the project proposal (up to 2) and milestone (up to 2). No late days are allowed for the poster presentation and final report. Any late days on the project writeup will decrease the potential score on the project by 25%. To use a late day on the project proposal or milestone, it is allowable to pool late days between team members: in other words, one can use any single team member’s late day (e.g. team member A can use her late day, and team member B can use his late day, and that yields 2 total late days for the project proposal).
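
One possible reading of the per-assignment rules above, sketched in code. It assumes the student still has enough of the 6 late days left to cover the delay, treats "after 48 hours" and "after 72 hours" as strict inequalities, and ignores the separate project rules. The function names are hypothetical, and the course staff's interpretation takes precedence over this sketch.

```python
import math

TOTAL_LATE_DAYS = 6               # late days available over the quarter
MAX_LATE_DAYS_PER_ASSIGNMENT = 2  # cap per assignment
HOURS_PER_LATE_DAY = 24


def late_days_needed(hours_late):
    """Number of 24-hour late days needed to cover a delay (0 if on time)."""
    return max(0, math.ceil(hours_late / HOURS_PER_LATE_DAY))


def max_credit(hours_late):
    """Upper bound on the credit fraction for an assignment.

    Assumes enough late days remain to cover up to 48 hours of delay.
    """
    if hours_late > 72:
        return 0.0  # no credit after 72 hours
    if hours_late > MAX_LATE_DAYS_PER_ASSIGNMENT * HOURS_PER_LATE_DAY:
        return 0.5  # worth at most 50% after 48 hours
    return 1.0      # on time, or covered by at most 2 late days


for hours in (0, 10, 30, 60, 80):
    print(hours, late_days_needed(hours), max_credit(hours))
```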

• Note that while doing a regrade we may review your entire assignment, not just the part you bring to our attention (i.e. we may find errors in your work that we missed before).

Exams

• Dates:
• Midterm (in class): February 14
• Quiz (in class): March 12

• Conflicts: If you have an official reason for being unable to attend the in-class midterm or quiz, please email the course CA Anchit at anchitg@stanford.edu as soon as you can so that an accommodation can be scheduled. (Historically, this has meant either taking the exam remotely at the same time or scheduling an alternate exam time.)

• Notes for the exams: You are welcome to bring one single-sided (letter-sized) page of handwritten or typed notes to the midterm. For the quiz, you are welcome to bring one double-sided (letter-sized) page of handwritten or typed notes. No calculators, laptops, cell phones, tablets or other resources will be allowed.

Office Hours

Emma's office hours will be held in Gates 218. Alex and Xinkun will hold office hours in the Lathrop Learning Hub. Other CA office hours will be held in the Huang Basement. See Calendar for times.

For both in-person and online SCPD office hours, you will need to register an account on QueueStatus. When you wish to join the queue, click "Sign Up" at the CS234 queue. Be sure to enter your email when you "Sign Up"; this is a way for the CA to contact you. Look for announcements on the left panel for more information. For online office hours, you will need to install Zoom (instructions below) to video call with the CA: the CA will contact you via Zoom when he/she reaches you in the queue.

Instructions for installing Zoom:
• Linux: Go to the Zoom Client for Linux page and download the correct Linux package for your Linux distribution type, OS architecture and version. Then follow the Linux installation instructions here.
• Mac: Installation instructions can be found here.
• Windows: Go to Stanford Zoom and select 'Launch Zoom'. Click 'Host a Meeting'; nothing will launch, but this will give a link to 'download & run Zoom'. Run 'Zoom_launcher.exe' to install.

Assignments, Course Project and Submission Process

• Assignments: See the Assignments page, where all the assignments will be posted.

• Course Project: See the Course Project page for more details on the course project.

• Computing Resources: We will have a limited number of Azure credits for use in Assignment 2 and the project. Instructions for how to access these will be announced for Assignment 2.

• Submission Process: The submission instructions for the assignments and the project can also be found on the Assignments page.

Attendance

Attendance is not required but is encouraged. Sometimes we may do in-class exercises or discussions, and these are harder to do, and to benefit from, by yourself. However, if you are not able to attend class, the class is recorded. It has previously been shown that watching lecture videos in small groups, with one person pausing to facilitate discussion, can yield student performance as high as attending lectures live, and we have heard of students getting together to watch videos in small groups in the past. We therefore encourage you to consider this if you are unable to attend a particular lecture or if you’re participating in the class as an SCPD student. I am always excited to hear about new ways students find to effectively learn the material, so sharing such tips is always appreciated.

Communication

We believe students often learn an enormous amount from each other as well as from us, the course staff. Therefore, to facilitate discussion and peer learning, we request that you please use Piazza for all questions related to lectures, homeworks and projects.

For SCPD students, if you have general SCPD-specific questions, please email scpdsupport@stanford.edu or call 650-741-1542. If you have specific questions related to being an SCPD student for this particular class, please contact us at cs234-win1718-scpd@lists.stanford.edu.

For exceptional circumstances that require us to make special arrangements, please email the course CA Anchit at anchitg@stanford.edu. For example, such a situation may arise if a student requires extra days to submit a homework due to a medical emergency, or if a student needs to schedule an alternative midterm date due to events such as conference travel. Such requests will be considered and approved on a case-by-case basis.

You will be awarded up to 2% extra credit if you answer other students' questions in a substantial and helpful way on Piazza.

Announcements

See Piazza. NOTE: If you enrolled in this class on Axess, you should be added to the Piazza group automatically within a few hours. You can also register independently — there is no access code required to join the group.