Human preference data has become crucial to the success of Machine Learning (ML) systems in many application domains, from personalization to the post-training of language models. As ML systems are deployed more widely, understanding the models, methods, and algorithms for learning from preference data has become important for scientists and practitioners alike. This course covers learning from preferences in supervised, active, and reinforcement/assistance settings. It also covers aspects specific to preference data, such as preference heterogeneity and aggregation, the interpretation of human feedback, and privacy. In the coding assignments, students implement supervised reward modeling and assistance games. Prerequisites: CS 221 and CS 229 are recommended.
If you are a Stanford CS PhD student, this course counts toward the breadth requirement for "Learning and Modeling" or "Human and Society".
We will announce computing resources soon.
This class is discussion-based and requires attendance. After the first two weeks of class (10/6) until the end of substantive classes (11/17),
The current class schedule is below (subject to change).
| Date | Topic | Materials | Deadlines and events |
|---|---|---|---|
| 09/22 | Intro to Preference Modeling and logistics | slides_01, 📕Introduction | |
| 09/24 | Background I: Choice model theory, IIA, RUM, BT, Luce | slides_02, handout_02, 📕Chapter 1 | |
| 09/29 | Background II: Implementing Choice Models | slides_03, notebook_03 | |
| 10/01 | Learning I: structure of utilities (Rasch, Thurstone, Bradley-Terry) | slides_04, torch-choice.ipynb, torch-choice-recording, 📕Chapter 2 | |
| 10/06 | Learning II: full-information MLE, Bayesian inference, online vs offline | slides_05 | Begin attendance policy; Coding 1 released |
| 10/08 | Learning III: parametric families | slides_06 | Quiz guidance released |
| 10/13 | Elicitation I: measurement objective (Rasch vs Bradley-Terry), information gathering (Bayes vs Frequentist) | slides_07, notebook_07, 📕Notes | Quiz 1 |
| 10/15 | Elicitation II: sequential optimal design (Rasch vs Bradley-Terry) | slides_08 | |
| 10/20 | Elicitation III: parametric family | slides_09 | Coding 1 due |
| 10/22 | Decision I: stateless dueling, parametric vs nonparametric | slides_10, 📕Chapter 4 | Coding 2 released; Quiz guidance updated |
| 10/27 | Decision II: stateful dueling, RLHF, LMArena | slides_11 | Quiz 2 |
| 10/29 | Decision III: asymptotic optimality (Rasch vs Bradley-Terry, Bayes vs Frequentist) | slides_12 | |
| 11/03 | Guest lecture: Cooperative Inverse Reinforcement Learning (with D. Hadfield-Menell) | gslides | |
| 11/05 | Aggregation I: The Paradox of Liberalism, non-nosy preferences, personalization, recommendation systems | slides_13, 📕Chapter 5 | Pre-analysis plan due; Coding 2 due |
| 11/10 | Aggregation II: nosy preferences (median voters, community notes, Borda) | slides_14 | Coding 3 released |
| 11/12 | Aggregation III: A statistical perspective on aggregation | slides_15 | Quiz 3 |
| 11/17 | Guest lecture: Craig Boutilier | slides_16 | |
| 11/19 | Privacy and the Inversion Problem | slides_17, 📕Chapter 7 | End attendance policy |
| 11/25 | Thanksgiving (no class) | | Coding 3 due |
| 11/28 | Thanksgiving (no class) | | |
| 12/01 | NeurIPS (no class) | | |
| 12/03 | NeurIPS (no class) | | Project due |