CS329H: Machine Learning from Human Preferences

Content

Machine learning from human preferences investigates mechanisms for capturing human and societal preferences and values in artificial intelligence (AI) systems and applications, e.g., for socio-technical applications such as algorithmic fairness and many language and robotics tasks when reward functions are otherwise challenging to specify quantitatively. While learning from human preferences has emerged as an increasingly important component of modern AI, e.g., credited with advancing the state of the art in language modeling and reinforcement learning, existing approaches are largely reinvented independently in each subfield, with limited connections drawn among them.

This course will cover the foundations of learning from human preferences from first principles and outline connections to the growing literature on the topic. This includes but is not limited to:

Inverse reinforcement learning, which uses human preferences to specify the reinforcement learning reward function
Metric elicitation, which uses human preferences to specify tradeoffs for cost-sensitive classification
Reinforcement learning from human feedback, where human preferences are used to align a pre-trained language model

This is a graduate-level course. By the end of the course, students should be able to understand and implement state-of-the-art learning from human feedback and be ready to research these topics. Given how fast this area is growing, this course will consist of weekly lectures, presentations, and discussions of papers led by students. Students will also complete a final course project. If you are a CS PhD student at Stanford, this course is counted toward the breadth requirement for "Learning and Modeling" or "Human and Society".

Logistics

Lectures are on Monday and Wednesday from 1:30 PM - 2:50 PM PT in Packard 101. This course is open to SCPD and it will be recorded.
Contact: Please post on Ed for any questions. For private matters that can not be handled via Ed, email sanmi [AT] cs [DOT] stanford [DOT] edu, sttruong [AT] cs [DOT] stanford [DOT] edu, or kenanhas [AT] stanford [DOT] edu.
Office Hours:
- Sanmi: 9:00 AM - 10:00 AM on Wednesday in Gates 332.
- Sang: 1:00 PM - 2:00 PM on Friday in Gates 234.
Auditor Policy: We welcome auditors in the class if space is available.
Slides: Link

Schedule

The current class schedule is below (subject to change). A tentative reading list can be found here.

Date	Theme	Topics	Deadline
Sep 23	-	Introduction
Sep 25	Preference Models	Lecture: Individual Preference Models
Sep 30		Lecture: Individual Preference Models
Oct 2		Student-led Discussion: Preference Models
Oct 7		Lecture: Aggregated Preference Models via Social Choice Theory & Game Theory
Oct 9		(Tentative) Guest lecture, Tan Zhi Xuan	Homework 1 release
Oct 14	Preference Measurement & Optimization	Lecture: Model-based Preferences Optimization via Active Learning
Oct 16		Lecture: Model-based Preferences Optimization via Preference Elicitation
Oct 21		Lecture: Model-free Preference Optimization via Dueling Bandits
Oct 23		Lecture: Model-free Preference Optimization via Bayesian Optimization
Oct 28		Lecture: Aggregated Preference Optimization via Mechanism Design	Homework 1 due Homework 2 release
Oct 30		Guest lecture: Joseph Jay Williams (U of T)
Nov 4		Guest lecture: Eytan Bashy
Nov 6		Student-led Discussion: Model-based and Model-free Preference Optimization
Nov 11	Value, Alignment, & Human-centered Design	Lecture: Value & Alignment	Homework 2 due Homework 3 release
Nov 13		Lecture: Human-centered Design
Nov 18		Guest lecture: Colin Megill
Nov 20		Student-led Discussion: Value, Alignment, & Human-centered Design
Nov 25		Thanksgiving	Homework 3 due Homework 4 release
Nov 27		Thanksgiving
Dec 2		Catch-up day (details TBD)
Dec 4		Project Expo
Dec 9		-	Homework 4 due
Dec 14		-	Final project due

Grading

Homework (60%)
Project (40%): Complete in groups (max 5 students). If working in a group, include a contribution statement for each deliverable. Aim for a project covering learning from human preferences suitable for submission to a machine learning conference.
Class participation (Extra credit up to 10%): Pull request to edit the book on GitHub, QA on Ed, and in-class participation.

CS329H: Machine Learning from Human Preferences

Autumn 2024

Previous Years: Autumn 2023

Content

Instructor

Teaching Fellow

Course Assistant

Logistics

Schedule

Grading