
Human preference data has become crucial to the success of Machine Learning (ML) systems in many application domains, from personalization to post-training of language models. As ML systems are increasingly widely deployed, understanding models, methods, and algorithms for learning from preference data becomes important for both scientists and practitioners. This course covers learning from preferences in supervised, active, and reinforcement/assistance settings, along with aspects specific to preference data, such as preference heterogeneity and aggregation, interpretation of human feedback, and privacy. In coding tasks, students implement supervised reward modeling and assistance games. Prerequisites: CS 221 and CS 229 recommended.
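As a taste of the supervised reward modeling mentioned above, the sketch below fits a Bradley-Terry model to pairwise comparisons by gradient ascent on the log-likelihood. The data and item count are made up for illustration; this is not course-provided code.

```python
import numpy as np

# Bradley-Terry: P(i beats j) = exp(u_i) / (exp(u_i) + exp(u_j)).
# Illustrative sketch only; the comparison data below is hypothetical.

def fit_bradley_terry(wins, n_items, lr=0.1, steps=2000):
    """wins: list of (winner, loser) index pairs. Returns utilities u."""
    u = np.zeros(n_items)
    for _ in range(steps):
        grad = np.zeros(n_items)
        for w, l in wins:
            p_w = 1.0 / (1.0 + np.exp(u[l] - u[w]))  # P(w beats l)
            grad[w] += 1.0 - p_w  # log-likelihood gradient for the winner
            grad[l] -= 1.0 - p_w  # and for the loser
        u += lr * grad
        u -= u.mean()  # utilities are identified only up to a constant
    return u

# Hypothetical comparisons among 3 items: item 0 wins all its matchups.
data = [(0, 1), (0, 1), (0, 2), (1, 2), (0, 2), (2, 1)]
u = fit_bradley_terry(data, n_items=3)
```

Since utilities enter only through differences, the mean-centering step pins down the otherwise unidentified additive constant.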

If you are a CS PhD student at Stanford, this course counts toward the breadth requirement for "Learning and Modeling" or "Human and Society".

Instructors

Andreas Haupt
Sanmi Koyejo

Contact

Logistics

Computing

We will announce computing resources soon.

Attendance (waived for CGOE/HCP students)

This class is discussion-based and requires attendance. The attendance policy is in effect after the first two weeks of class (10/6) until the end of substantive classes (11/17).

Schedule

The current class schedule is below (subject to change).


| Date | Topic | Materials | Notes |
|---|---|---|---|
| 09/22 | Intro to Preference Modeling, and logistics | slides_01, 📕 Introduction | |
| 09/24 | Background I: Choice model theory, IIA, RUM, BT, Luce | slides_02, handout_02, 📕 Chapter 1 | |
| 09/29 | Background II: Implementing Choice Models | slides_03, notebook_03 | |
| 10/01 | Learning I: structure of utilities (Rasch, Thurstone, Bradley-Terry) | slides_04, torch-choice.ipynb, torch-choice-recording, 📕 Chapter 2 | |
| 10/06 | Learning II: full-information MLE, Bayesian inference, online vs. offline | slides_05 | Attendance policy begins; Coding 1 released |
| 10/08 | Learning III: parametric families | slides_06 | Quiz guidance released |
| 10/13 | Elicitation I: measurement objective (Rasch vs. Bradley-Terry), information gathering (Bayes vs. frequentist) | slides_07, notebook_07, 📕 Notes | Quiz 1 |
| 10/15 | Elicitation II: sequential optimal design (Rasch vs. Bradley-Terry) | slides_08 | |
| 10/20 | Elicitation III: parametric families | slides_09 | Coding 1 due |
| 10/22 | Decision I: stateless dueling, parametric vs. nonparametric | slides_10, 📕 Chapter 4 | Coding 2 released; Quiz guidance updated |
| 10/27 | Decision II: stateful dueling, RLHF, LMArena | slides_11 | Quiz 2 |
| 10/29 | Decision III: asymptotic optimality (Rasch vs. Bradley-Terry, Bayes vs. frequentist) | slides_12 | |
| 11/03 | Guest lecture: Cooperative Inverse Reinforcement Learning (with D. Hadfield-Menell) | gslides | |
| 11/05 | Aggregation I: The Paradox of Liberalism, non-nosy preferences, personalization, recommendation systems | slides_13, 📕 Chapter 5 | Pre-analysis plan due; Coding 2 due |
| 11/10 | Aggregation II: nosy preferences, median voters, community notes, and Borda | slides_14 | Coding 3 released |
| 11/12 | Aggregation III: A statistical perspective on aggregation | slides_15 | Quiz 3 |
| 11/17 | Guest lecture: Craig Boutilier | slides_16 | |
| 11/19 | Privacy and the Inversion Problem | slides_17, 📕 Chapter 7 | Attendance policy ends |
| 11/25 | Thanksgiving (no class) | | Coding 3 due |
| 11/28 | Thanksgiving (no class) | | |
| 12/01 | NeurIPS (no class) | | |
| 12/03 | NeurIPS (no class) | | Project due |