Course Description & Logistics

To realize the dreams and impact of AI requires autonomous systems that learn to make good decisions. Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling and healthcare. This class will provide a solid introduction to the field of reinforcement learning, and students will learn about the core challenges and approaches, including generalization and exploration. Through a combination of lectures and written and coding assignments, students will become well versed in key ideas and techniques for RL. Assignments will cover the basics of reinforcement learning, deep reinforcement learning, and the basics of reinforcement learning from human feedback (RLHF).

Communication: We will use a forum (to be added in week 1). For urgent questions before then, please use cs234-win2526-staff@lists.stanford.edu.

  • Lectures will be live every Monday and Wednesday 3-4:20pm.
  • Office hours will be announced in the first week of class.



Prerequisites for This Class

Learning Outcomes

By the end of the class students should be able to:

Course Lecture Materials (Videos and Slides)

See the Lecture Materials page.

Draft Course Schedule


Monday Tuesday Wednesday Thursday Friday Saturday Sunday
Week 1 Jan 5 Jan 6 Jan 7 Jan 8 Jan 9 Jan 10 Jan 11
Lecture Materials
Introduction to RL
Tabular MDP Planning
[Assignment 1 Released]
Week 2 Jan 12 Jan 13 Jan 14 Jan 15 Jan 16 Jan 17 Jan 18
Lecture Materials
Policy Evaluation
Q-learning and function approximation
Assignment 1 Due at 6pm
[Assignment 2 Released]
Week 3 Jan 19 Jan 20 Jan 21 Jan 22 Jan 23 Jan 24 Jan 25
Lecture Materials
Holiday
Policy Search
Week 4 Jan 26 Jan 27 Jan 28 Jan 29 Jan 30 Jan 31 Feb 1
Lecture Materials
Policy Search
Offline RL, Imitation Learning
Assignment 2 Due at 6pm
Week 5 Feb 2 Feb 3 Feb 4 Feb 5 Feb 6 Feb 7 Feb 8
Lecture Materials
Offline RL, RLHF
Midterm (in class)
[Assignment 3 Released]
Project Proposal Due
Week 6 Feb 9 Feb 10 Feb 11 Feb 12 Feb 13 Feb 14 Feb 15
Lecture Materials
Offline RL / Bandits
Strategic data gathering / Exploration
Week 7 Feb 16 Feb 17 Feb 18 Feb 19 Feb 20 Feb 21 Feb 22
Lecture Materials
Holiday
Exploration
Assignment 3 Due at 6pm
Week 8 Feb 23 Feb 24 Feb 25 Feb 26 Feb 27 Feb 28 Mar 1
Lecture Materials
Exploration
RL and MCTS
Project Milestone Due
Week 9 Mar 2 Mar 3 Mar 4 Mar 5 Mar 6 Mar 7 Mar 8
Lecture Materials
Guest Lecture
In Class Quiz
Week 10 Mar 9 Mar 10 Mar 11 Mar 12 Mar 13 Mar 14 Mar 15
Lecture Materials
Alignment, Impacts
Poster Session
Week 11 Mar 16 Mar 17 Mar 18 Mar 19 Mar 20 Mar 21 Mar 22
Lecture Materials
Final Project Writeup Due at 6pm

Textbooks

There is no official textbook for the class, but a number of the supporting readings will come from:

Some additional references that may be useful are listed below:

Grade Breakdown (Will be Updated First Week of Class)

Late Day Policy

Exams

Assignments and Submission Process


Communication

We believe students often learn an enormous amount from each other as well as from us, the course staff. Therefore, to facilitate discussion and peer learning, please use Ed for all questions related to lectures and assignments.

SCPD students with general SCPD-specific questions should email scpdsupport@stanford.edu or call 650-741-1542. If you have questions about being an SCPD student in this particular class, please contact us at cs234-win2526-staff@lists.stanford.edu.

For exceptional circumstances that require us to make special arrangements, please email us at cs234-win2526-staff@lists.stanford.edu. For example, such a situation may arise if a student requires extra days to submit a homework assignment due to a medical emergency, or if a student needs to schedule an alternative midterm date due to an event such as conference travel. Such requests will be considered and approved on a case-by-case basis.

Regrading Requests

Academic Collaboration, AI Tools Usage and Misconduct

I care about academic collaboration and misconduct both because it is important that we are able to evaluate your own work (independent of your peers') and because not claiming others' work as your own is an important part of integrity in your future career. I understand that different institutions and locations can have different definitions of what forms of collaborative behavior are considered acceptable. In this class, for written homework problems, you are welcome to discuss ideas with others, but you are expected to write up your own solutions independently (without referring to another's solutions). For coding, you may only share the input-output behavior of your programs. This encourages you to work separately but share ideas on how to test your implementation. Please remember that if you share your solution with another student, even if you did not copy from anyone else, you are still violating the honor code. Consistent with this, it is also considered an honor code violation if you make your assignment solutions publicly available, such as by posting them online or in a public git repo.

We may run similarity-detection software over all submitted student programs, including programs from past quarters and any solutions found online on public websites. Anyone violating the Stanford University Honor Code will be referred to the Office of Judicial Affairs. If you think you made a mistake (it can happen, especially under stress or when time is short!), please reach out to Emma or the head CA; the consequences will be much less severe than if we approach you. We expect all students to submit their own work for CS234 homework, exams, quizzes and projects. You are permitted to use generative AI tools such as Gemini, GPT-4 and Copilot in the same way that human collaboration is considered acceptable: you are not allowed to directly ask for solutions or copy code, and you should indicate if you have used generative AI tools. As with help from human collaborators, you are ultimately responsible and accountable for your own work. We may check students' homework, exams and projects to enforce this policy.

Note that it is not acceptable to list an LLM as a collaborator on the project milestone or final report: as things stand, generative AI cannot accept fault or responsibility, and thus cannot be a collaborator in a final project.

Academic Accommodation

If you need an academic accommodation based on the impact of a disability, please share your Office of Accessible Education (OAE) letter with us via an email to our course staff list, cs234-win2526-staff@lists.stanford.edu, as soon as it is convenient for you. This helps us ensure that the course materials and staff support can meet your needs. The OAE is located at 563 Salvatierra Walk (650-723-1066, http://studentaffairs.stanford.edu/oae).

Credit/No Credit Enrollment

If you're enrolled in the class on credit/no credit status, you will be graded on work as usual per standard Stanford rules. The only difference from those taking the class for a letter grade is that you must obtain a C- (C minus) grade or higher in the class to be marked as CR. In past years, the threshold for obtaining a C- has typically been 70%.