Human Preference Models:
Choice models
Sanmi Koyejo
Today: Choice Modeling
Tools to predict the choice behavior of a group of decision-makers in a specific choice context.
Application: Marketing
What features affect a car purchase?
Application: Transportation
- How pricing affects route choice
- How much is a driver willing to pay
Image source: https://www.supplychain247.com/article/8_factors_to_consider_when_choosing_route_optimization_software/locus
Application: Energy Economics
Del Granado, Pedro Crespo, Renger H. Van Nieuwkoop, Evangelos G. Kardakos, and Christian Schaffner. "Modelling the energy transition: A nexus of energy system and economic models." Energy strategy reviews, 20 (2018): 229-235.
Example: Daily activity-travel pattern of an individual
Source: Chandra Bhat, “ General introduction to choice modeling”
Application: RL and Language
https://openai.com/research/learning-to-summarize-with-human-feedback
History
- Thurstone research into food preferences in the 1920s
- Microeconomics: Random Utility Theory (1970s)
- McFadden: Nobel prize in 2000 for the theoretical basis for discrete choice.
- Psychology: Duncan Luce and Anthony Marley
- Luce, R. Duncan (1959). “Conditional logit analysis of qualitative choice behavior”
- Early use in marketing
- Predict demand for new products that are potentially expensive to produce
- Early use in transportation
- Predict usage of transportation resources, e.g., used by McFadden to predict the demand for the Bay Area Rapid Transit (BART) before it was built
Why are we studying choice models?
- Human preferences are often gathered by asking for choices across alternatives
- Basic choice models are the workhorse for ML from preferences (Bradley-Terry, Plackett Luce)
- Our discussion will highlight some of the key assumptions, e.g., utility and rationality
- We will cover models originally built for discrete/finite choices, which have been extended to ML applications (conditional choices)
(Discrete) choice models
- Models designed to capture decision-process of individuals
- True utility is not observable, but perhaps can measure via preferences over choices
- Main assumption: utility (benefit, or value) that an individual derives from item A over item B is a function of the frequency that they choose item A over item B in repeated choices.
- Useful Note: “Utility” in choice models <=> “Reward” in RL
Modeling: Discrete choice
- Choices are collectively exhaustive, mutually exclusive, and finite
$$y_{ni} =
\begin{cases}
1, & \text{if } U_{ni} > U_{nj} \ \forall j \neq i \\
0, & \text{otherwise}
\end{cases}$$
$$U_{ni} = H_{ni}(z_{ni})$$
- $z_{n,i}$ are variables describing the individual attributes and the alternative choices
- $H_{ni}(z_{ni})$ is a stochastic function, e.g., linear $H_{ni}(z_{ni}) = \beta z_{ni} + \epsilon_{ni}$, where $\epsilon_{ni}$ are unobserved individual factors
Implications of the choice model
- Only the utility differences matter
$$\begin{aligned}
P_{ni} &= Pr(y_{ni} = 1) \\
&= Pr(U_{ni} > U_{nj}, \forall j \neq i) \\
&= Pr(U_{ni} - U_{nj} > 0, \forall j \neq i)
\end{aligned}$$
- Note that utility here is scale-free
- May be invariant to monotonic transformations
- Ok within a single context, but will need to normalize for comparing across datasets
- Common approach: normalize scale by standardizing the variance
Example: Binary choice with individual attributes
- Benefit of action depends on $s_n$ = individual characteristics
$$\begin{cases}
U_n = \beta s_n + \epsilon_n \\
y_n =
\begin{cases}
1 & U_n > 0 \\
0 & U_n \leq 0
\end{cases}
\end{cases} \quad \Rightarrow \quad P_{n1} = \frac{1}{1 + \exp(-\beta s_n)}$$
$$P_{n1} = \Phi(\beta s_n)$$
- Where $\Phi(.)$ is the normal CDF
Example: Utility is linear function of variables that vary over alternatives (Bradley-Terry Model)
- The utility of each alternative depends on the attributes of the alternatives (which may include individual attributes)
- Unobserved terms are assumed to have an extreme value distribution
$$\begin{cases}
U_{n1} = \beta z_{n1} + \epsilon_{n1} \\
U_{n2} = \beta z_{n2} + \epsilon_{n2} \\
\epsilon_{n1}, \epsilon_{n2} \sim \text{iid extreme value}
\end{cases} \quad \Rightarrow \quad P_{n1} = \frac{\exp(\beta z_{n1})}{\exp(\beta z_{n1}) + \exp(\beta z_{n2})}$$
Example: Utility for each alternative depends on attributes of that alternative
- Unobserved terms are assumed to have an extreme value distribution
- With $J$ alternatives
$$\begin{cases}
U_{ni} = \beta z_{ni} + \epsilon_{ni} \\
\epsilon_{ni} \sim \text{iid extreme value}
\end{cases} \quad \Rightarrow \quad P_{ni} = \frac{\exp(\beta z_{ni})}{\sum_{j=1}^{J} \exp(\beta z_{nj})}$$
- Compare to standard model for multiclass classification (multiclass logistic)
- Can also replace noise model with Gaussians
Capturing correlations across alternatives
- All the prior models use the logistic model which does not capture correlations in noise.
- This can be fixed using a joint distribution over the noise e.g.,
$$\begin{cases}
U_{ni} = \beta z_{ni} + \epsilon_{ni} \\
\epsilon_n \equiv (\epsilon_{n1}, \cdots, \epsilon_{nJ}) \sim N(0, \Omega)
\end{cases}$$
Estimation
- Linear case: maximum likelihood estimators
- Logistic model: use (binary or multinomial) logistic regression
- Gaussian Model: use probit regression
- More complex function classes: use standard ML fitting tools for (regularized) maximum likelihood, e.g., stochastic gradient descent (SGD)
- Standard tradeoffs, e.g., bias-variance tradeoff
- More complex utility models generally require more data
- Most ML applications pool the model across individuals, individual differences may matter (more on this in future class)
What of measuring ordered preferences?
- Example: On a 1-5 scale where 1 means disagree completely and 5 means agree completely, how much do you agree with the following statement: “I am enjoying this class so far”
- Use ordinal regression, e.g.,
$$U_n = H_n(z_n)
\quad \quad \quad
y_n =
\begin{cases}
1, & \text{if } U_n < a \\
2, & \text{if } a < U_n < b \\
3, & \text{if } b < U_n < c \\
4, & \text{if } c < U_n < d \\
5, & \text{if } U_n > d
\end{cases}$$
- For some real numbers $a, b, c, d$ (parameters)
Ordered Logit
- For linear utility: $U_n = \beta z_n + \epsilon$, $\epsilon \sim$ Logistic
$Pr(\text{choosing 1}) = Pr(U_n < a) = Pr(\epsilon < a - \beta z_n) = \frac{1}{1 + \exp(-(a - \beta z_n))}$
$$\begin{aligned}
Pr(\text{choosing 2}) &= Pr(a < U_n < b) = Pr(a - \beta z_n < \epsilon < b - \beta z_n) \\
&= \frac{1}{1 + \exp(-(b - \beta z_n))} - \frac{1}{1 + \exp(-(a - \beta z_n))}
\end{aligned}$$
$$...$$
$Pr(\text{choosing 5}) = Pr(U_n > d) = Pr(\epsilon > d - \beta z_n) = 1 - \frac{1}{1 + \exp(-(d - \beta z_n))}$
- Can also replace with Gaussian for ordered probit regression
Plackett-Luce Model
- Ranking models the sequence of choices (Plackett and Luce in 1970s)
- Probability of choice 1, 2, …, J is
$Pr(\text{ranking } 1, 2, \dots, J) = \frac{\exp(\beta z_1)}{\sum_{j=1}^{J} \exp(\beta z_{nj})} \cdot \frac{\exp(\beta z_2)}{\sum_{j=2}^{J} \exp(\beta z_{nj})} \cdots \frac{\exp(\beta z_{J-1})}{\sum_{j=J-1}^{J} \exp(\beta z_{nj})}$
- PL is common in biomedical literature
- aka rank ordered logit (econometrics ~1980s), or exploded logit model
- All the extensions mentioned also apply (nonlinear utility, correlated noise, etc.)
Modeling and estimation summary
- Choose the utility model, i.e., how the attributes and alternatives define the utility e.g., linear function of attributes with logistic noise
- Choose the response/observation model, e.g., binary, multiple choice, ordered choice.
- Fit the model using (regularized) maximum likelihood
Aside: “Revealed preference” vs “stated preference”
- Revealed preference: Use observed data about the choices to estimate value ascribed to items.
- Generally offline observational data about real choices
- Stated Preference: Use the choices made by individuals under experimental conditions to estimate these values
- Generally online experimental data (can include controlled experiments)
- Revealed preference is considered a “real” choice, so can be more accurate
- In simulated situations, participants may not respond well to hypotheticals
- OTOH: observed data may not cover the space, hence the appeal of experiments
Exercise (inclass): choice model for class(es)
- “Should you take CS 329H or not?”
- What are the attributes/features (describe what to measure about a class)?
- What utility model?
- What is the observation/response model?
- Revealed preference (observed choices) or stated preference (hypothetical)?
- “Should you take CS 329H or CS 221 or CS 229?”
- What are the attributes/features?
- What utility model?
- What is the observation/response model?
- Revealed preference or stated preference?
Exercise (inclass): choice model for language
Design a choice model to evaluate the quality of a language model?
- What utility model?
- What are the attributes/features?
- What is the observation/response model?
- Revealed preference or stated preference?
- Who should you query?
- Individual or pooled responses: why or why not?
- What are some pro/cons of your design?
References
- Train, K. (1986). Qualitative Choice Analysis: Theory, Econometrics, and an Application to Automobile Demand. MIT Press. ISBN 9780262200554. Chapter 8.
- McFadden, D.; Train, K. (2000). "Mixed MNL Models for Discrete Response" (PDF). Journal of Applied Econometrics. 15 (5): 447–470.
- Luce, R. D. (1959). Individual Choice Behavior: A Theoretical Analysis. Wiley.
- Additional:
- Ben-Akiva, M.; Lerman, S. (1985). Discrete Choice Analysis: Theory and Application to Travel Demand. Transportation Studies. Massachusetts: MIT Press.
- Park, Byeong U.; Simar, Léopold; Zelenyuk, Valentin (2017). "Nonparametric estimation of dynamic discrete choice models for time series data" (PDF). Computational Statistics & Data Analysis. 108: 97–120. doi:10.1016/j.csda.2016.10.024.
- Rafailov, Rafael, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. "Direct preference optimization: Your language model is secretly a reward model." arXiv preprint arXiv:2305.18290 (2023).