Chapter 2: Choice Models

Human Preference Models:

Choice models

Sanmi Koyejo

Today: Choice Modeling

Tools to predict the choice behavior of a group of decision-makers in a specific choice context.

Application: Marketing

What features affect a car purchase?

Application: Transportation

How pricing affects route choice
How much is a driver willing to pay

Image source: https://www.supplychain247.com/article/8_factors_to_consider_when_choosing_route_optimization_software/locus

Application: Energy Economics

Del Granado, Pedro Crespo, Renger H. Van Nieuwkoop, Evangelos G. Kardakos, and Christian Schaffner. "Modelling the energy transition: A nexus of energy system and economic models." Energy strategy reviews, 20 (2018): 229-235.

Example: Daily activity-travel pattern of an individual

Source: Chandra Bhat, “General introduction to choice modeling”

Application: RL and Language

https://openai.com/research/learning-to-summarize-with-human-feedback

History

Thurstone's research into food preferences in the 1920s
Microeconomics: Random Utility Theory (1970s)
- McFadden: Nobel prize in 2000 for the theoretical basis for discrete choice.
Psychology: Duncan Luce and Anthony Marley
- Luce, R. Duncan (1959). “Individual Choice Behavior: A Theoretical Analysis”
Early use in marketing
- Predict demand for new products that are potentially expensive to produce
Early use in transportation
- Predict usage of transportation resources, e.g., used by McFadden to predict the demand for the Bay Area Rapid Transit (BART) before it was built

Why are we studying choice models?

Human preferences are often gathered by asking for choices across alternatives
Basic choice models are the workhorse for ML from preferences (Bradley-Terry, Plackett-Luce)
Our discussion will highlight some of the key assumptions, e.g., utility and rationality
- We will cover models originally built for discrete/finite choices, which have been extended to ML applications (conditional choices)

(Discrete) choice models

Models designed to capture the decision process of individuals
True utility is not observable, but can perhaps be measured via preferences over choices
Main assumption: the utility (benefit, or value) that an individual derives from item A over item B is a function of how often they choose item A over item B in repeated choices.
Useful Note: “Utility” in choice models <=> “Reward” in RL

Modeling: Discrete choice

Choices are collectively exhaustive, mutually exclusive, and finite

$$y_{ni} = \begin{cases} 1, & \text{if } U_{ni} > U_{nj} \ \forall j \neq i \\ 0, & \text{otherwise} \end{cases}$$

$$U_{ni} = H_{ni}(z_{ni})$$

$z_{ni}$ are variables describing the attributes of individual $n$ and of alternative $i$
$H_{ni}(z_{ni})$ is a stochastic function, e.g., linear $H_{ni}(z_{ni}) = \beta z_{ni} + \epsilon_{ni}$, where $\epsilon_{ni}$ are unobserved individual factors
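
As a minimal sketch of this setup, the following simulates repeated choices by drawing a utility for each alternative and picking the largest. The attribute values, the value of `beta`, and the Gaussian noise are illustrative assumptions, not part of the lecture.

```python
import random

def simulate_choice(beta, z, draw_noise):
    """Return the index i of the alternative with the largest utility U_i.

    z holds one attribute value per alternative; draw_noise() returns one
    sample of the unobserved term epsilon.
    """
    utilities = [beta * z_i + draw_noise() for z_i in z]
    return max(range(len(z)), key=lambda i: utilities[i])

rng = random.Random(0)            # fixed seed so the run is repeatable
z = [0.0, 1.0, 3.0]               # hypothetical attributes of 3 alternatives
n_trials = 10_000
counts = [0] * len(z)
for _ in range(n_trials):
    counts[simulate_choice(1.0, z, lambda: rng.gauss(0.0, 1.0))] += 1

shares = [c / n_trials for c in counts]
print(shares)  # alternatives with larger beta * z are chosen more often
```

The observed choice shares are the only data a choice model sees; the utilities themselves stay hidden.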

Implications of the choice model

Only the utility differences matter

$$\begin{aligned} P_{ni} &= Pr(y_{ni} = 1) \\ &= Pr(U_{ni} > U_{nj}, \forall j \neq i) \\ &= Pr(U_{ni} - U_{nj} > 0, \forall j \neq i) \end{aligned}$$

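Note that utility here is scale-free
- Choices are invariant to strictly increasing transformations of the utility, e.g., adding a constant or multiplying by a positive scale
- This is fine within a single context, but utilities need to be normalized before comparing across datasets
- Common approach: normalize the scale by standardizing the noise variance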
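
A short worked sketch of why the scale must be fixed (the constants $a > 0$, $c$, and the noise standard deviation $\sigma$ are notation introduced here for illustration). For any $a > 0$ and any $c$,

$$Pr(aU_{ni} + c > aU_{nj} + c, \ \forall j \neq i) = Pr(U_{ni} > U_{nj}, \ \forall j \neq i) = P_{ni}$$

so neither the level nor the scale of utility can be recovered from choices alone. If $U_{ni} = \beta z_{ni} + \epsilon_{ni}$ with $Var(\epsilon_{ni}) = \sigma^2$, dividing through by $\sigma$ gives an equivalent model

$$\frac{U_{ni}}{\sigma} = \frac{\beta}{\sigma} z_{ni} + \frac{\epsilon_{ni}}{\sigma}, \quad Var\left(\frac{\epsilon_{ni}}{\sigma}\right) = 1$$

Only the ratio $\beta / \sigma$ is identified, which is why the noise variance is fixed by convention.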

Example: Binary choice with individual attributes

Benefit of action depends on $s_n$ = individual characteristics

$$\begin{cases} U_n = \beta s_n + \epsilon_n \\ y_n = \begin{cases} 1 & U_n > 0 \\ 0 & U_n \leq 0 \end{cases} \end{cases} \quad \Rightarrow \quad P_{n1} = \frac{1}{1 + \exp(-\beta s_n)}$$

With $\epsilon_n \sim$ Logistic, this is the logit model (logistic regression)
Replacing with $\epsilon_n \sim$ Standard Normal gives the probit model

$$P_{n1} = \Phi(\beta s_n)$$

where $\Phi(\cdot)$ is the standard normal CDF
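
The two binary models can be compared numerically with a small sketch like the one below. The function names and the input values are hypothetical; the standard normal CDF is computed from `math.erf`.

```python
import math

def logit_prob(beta, s):
    """P(y = 1) under logistic noise: 1 / (1 + exp(-beta * s))."""
    return 1.0 / (1.0 + math.exp(-beta * s))

def probit_prob(beta, s):
    """P(y = 1) under standard normal noise: Phi(beta * s)."""
    return 0.5 * (1.0 + math.erf(beta * s / math.sqrt(2.0)))

# Hypothetical coefficient and individual characteristics.
beta = 1.0
for s in (-2.0, 0.0, 2.0):
    print(s, logit_prob(beta, s), probit_prob(beta, s))
```

Both curves pass through 0.5 at $\beta s_n = 0$; the logistic has heavier tails, so for the same $\beta$ the probit probabilities move toward 0 or 1 faster.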

Example: Utility is linear function of variables that vary over alternatives (Bradley-Terry Model)

The utility of each alternative depends on the attributes of the alternatives (which may include individual attributes)
Unobserved terms are assumed to have an extreme value distribution

$$\begin{cases} U_{n1} = \beta z_{n1} + \epsilon_{n1} \\ U_{n2} = \beta z_{n2} + \epsilon_{n2} \\ \epsilon_{n1}, \epsilon_{n2} \sim \text{iid extreme value} \end{cases} \quad \Rightarrow \quad P_{n1} = \frac{\exp(\beta z_{n1})}{\exp(\beta z_{n1}) + \exp(\beta z_{n2})}$$

Equivalently $P_{n1} = \frac{1}{1 + \exp(-\beta (z_{n1} - z_{n2}))}$
Can replace the noise with Gaussian noise: $P_{n1} = \Phi(\beta (z_{n1} - z_{n2}))$, with the noise scale absorbed into $\beta$
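
A quick numerical check that the two Bradley-Terry forms agree, as a sketch with hypothetical attribute values:

```python
import math

def bt_prob(beta, z1, z2):
    """P(alternative 1 chosen) in the ratio-of-exponentials form."""
    e1, e2 = math.exp(beta * z1), math.exp(beta * z2)
    return e1 / (e1 + e2)

def bt_prob_diff(beta, z1, z2):
    """Same probability written as a logistic function of the difference."""
    return 1.0 / (1.0 + math.exp(-beta * (z1 - z2)))

beta, z1, z2 = 0.8, 1.5, 0.3      # hypothetical values
p_ratio = bt_prob(beta, z1, z2)
p_diff = bt_prob_diff(beta, z1, z2)
print(p_ratio, p_diff)            # the two forms agree up to rounding
```

The difference form makes it explicit that only $z_{n1} - z_{n2}$ enters the probability.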

Example: Utility for each alternative depends on attributes of that alternative

Unobserved terms are assumed to have an extreme value distribution
With $J$ alternatives

$$\begin{cases} U_{ni} = \beta z_{ni} + \epsilon_{ni} \\ \epsilon_{ni} \sim \text{iid extreme value} \end{cases} \quad \Rightarrow \quad P_{ni} = \frac{\exp(\beta z_{ni})}{\sum_{j=1}^{J} \exp(\beta z_{nj})}$$

Compare to standard model for multiclass classification (multiclass logistic)
Can also replace noise model with Gaussians
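
The link between extreme value noise and the softmax formula can be checked by simulation. The sketch below assumes standard Gumbel (type-I extreme value) noise and hypothetical attribute values; it compares the closed-form probabilities with simulated choice frequencies.

```python
import math
import random

def mnl_probs(beta, z):
    """Multinomial logit probabilities exp(beta z_i) / sum_j exp(beta z_j)."""
    scores = [beta * z_j for z_j in z]
    m = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel(rng):
    """One standard Gumbel draw via the inverse CDF."""
    u = max(rng.random(), 1e-300)         # guard against log(0)
    return -math.log(-math.log(u))

rng = random.Random(0)
beta, z = 1.0, [0.0, 1.0, 2.0]            # hypothetical values
n_trials = 20_000
counts = [0] * len(z)
for _ in range(n_trials):
    u = [beta * z_j + gumbel(rng) for z_j in z]
    counts[max(range(len(z)), key=u.__getitem__)] += 1

probs = mnl_probs(beta, z)
freqs = [c / n_trials for c in counts]
print(probs)
print(freqs)  # should be close to probs
```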

Capturing correlations across alternatives

The logit models above assume noise that is independent across alternatives, so they do not capture correlations in the noise.
This can be fixed using a joint distribution over the noise, e.g.,

$$\begin{cases} U_{ni} = \beta z_{ni} + \epsilon_{ni} \\ \epsilon_n \equiv (\epsilon_{n1}, \cdots, \epsilon_{nJ}) \sim N(0, \Omega) \end{cases}$$
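
With correlated Gaussian noise the choice probabilities have no simple closed form, so one option is to estimate them by Monte Carlo. The sketch below is one such estimate under stated assumptions: the covariance matrices, the attribute values, and the helper names are hypothetical, and the correlated draws come from a hand-written Cholesky factor.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a positive definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def probit_choice_probs(beta, z, omega, n_draws=20_000, seed=0):
    """Monte Carlo estimate of P_ni when epsilon_n ~ N(0, omega)."""
    rng = random.Random(seed)
    low = cholesky(omega)
    n_alt = len(z)
    counts = [0] * n_alt
    for _ in range(n_draws):
        g = [rng.gauss(0.0, 1.0) for _ in range(n_alt)]
        eps = [sum(low[i][k] * g[k] for k in range(i + 1)) for i in range(n_alt)]
        u = [beta * z[i] + eps[i] for i in range(n_alt)]
        counts[max(range(n_alt), key=u.__getitem__)] += 1
    return [c / n_draws for c in counts]

z = [0.0, 0.0, 0.0]                       # equally attractive alternatives
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
correlated = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
p_indep = probit_choice_probs(1.0, z, identity)
p_corr = probit_choice_probs(1.0, z, correlated)
print(p_indep)  # roughly equal shares
print(p_corr)   # alternatives 0 and 1 behave like near-duplicates
```

When two alternatives have highly correlated noise they compete mostly with each other, so the third alternative gains share. A model with independent noise cannot reproduce this pattern.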

Estimation

Linear case: maximum likelihood estimators
- Logistic model: use (binary or multinomial) logistic regression
- Gaussian Model: use probit regression
More complex function classes: use standard ML fitting tools for (regularized) maximum likelihood, e.g., stochastic gradient descent (SGD)
Standard tradeoffs, e.g., bias-variance tradeoff
- More complex utility models generally require more data
- Most ML applications pool the model across individuals, but individual differences may matter (more on this in a future class)
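
As a minimal sketch of maximum likelihood in the linear logistic case, the code below simulates pairwise choices from a Bradley-Terry model and recovers $\beta$ by plain gradient ascent on the average log-likelihood. The data, the true coefficient, the learning rate, and the step count are all illustrative assumptions; in practice a library logistic regression routine would be the usual choice.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_beta(diffs, ys, lr=0.5, n_steps=300):
    """Gradient ascent on the average log-likelihood of
    P(y = 1) = sigmoid(beta * diff), where diff = z_1 - z_2."""
    beta = 0.0
    n = len(diffs)
    for _ in range(n_steps):
        # Gradient of the average log-likelihood with respect to beta.
        grad = sum((y - sigmoid(beta * d)) * d for d, y in zip(diffs, ys)) / n
        beta += lr * grad
    return beta

rng = random.Random(0)
true_beta = 1.5                                    # hypothetical ground truth
diffs = [rng.uniform(-2.0, 2.0) for _ in range(2000)]
ys = [1 if rng.random() < sigmoid(true_beta * d) else 0 for d in diffs]

beta_hat = fit_beta(diffs, ys)
print(beta_hat)  # should land near true_beta
```

The same loop applies to richer utility models by replacing `beta * d` with any differentiable function and its gradient, which is where SGD and regularization come in.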

What of measuring ordered preferences?

Example: On a 1-5 scale where 1 means disagree completely and 5 means agree completely, how much do you agree with the following statement: “I am enjoying this class so far”
Use ordinal regression, e.g.,

$$U_n = H_n(z_n) \quad \quad \quad y_n = \begin{cases} 1, & \text{if } U_n < a \\ 2, & \text{if } a < U_n < b \\ 3, & \text{if } b < U_n < c \\ 4, & \text{if } c < U_n < d \\ 5, & \text{if } U_n > d \end{cases}$$

For some real numbers $a, b, c, d$ (parameters)

Ordered Logit

For linear utility: $U_n = \beta z_n + \epsilon$, $\epsilon \sim$ Logistic

$Pr(\text{choosing 1}) = Pr(U_n < a) = Pr(\epsilon < a - \beta z_n) = \frac{1}{1 + \exp(-(a - \beta z_n))}$

$$\begin{aligned} Pr(\text{choosing 2}) &= Pr(a < U_n < b) = Pr(a - \beta z_n < \epsilon < b - \beta z_n) \\ &= \frac{1}{1 + \exp(-(b - \beta z_n))} - \frac{1}{1 + \exp(-(a - \beta z_n))} \end{aligned}$$

$$...$$

$Pr(\text{choosing 5}) = Pr(U_n > d) = Pr(\epsilon > d - \beta z_n) = 1 - \frac{1}{1 + \exp(-(d - \beta z_n))}$

Can also replace with Gaussian for ordered probit regression
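
The category probabilities above are differences of the logistic CDF at consecutive cutpoints, which can be sketched as follows. The cutpoints and inputs are hypothetical values chosen for illustration.

```python
import math

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def ordered_logit_probs(beta, z, cutpoints):
    """Probabilities of responses 1..K for sorted cutpoints (a, b, c, ...).

    Response k has probability F(cut_k - beta z) - F(cut_{k-1} - beta z),
    with F = 0 below the first cutpoint and F = 1 above the last.
    """
    cdf = [logistic_cdf(c - beta * z) for c in cutpoints]
    bounds = [0.0] + cdf + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]

cutpoints = [-2.0, -1.0, 1.0, 2.0]        # hypothetical a, b, c, d
probs = ordered_logit_probs(1.0, 0.5, cutpoints)
print(probs)  # five probabilities, one per response 1..5
```

Increasing $\beta z_n$ shifts probability mass from low responses to high ones while the cutpoints stay fixed.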

Plackett-Luce Model

A ranking is modeled as a sequence of choices (Luce, 1959; Plackett, 1975)
The probability of the ranking 1, 2, …, J is

$Pr(\text{ranking } 1, 2, \dots, J) = \frac{\exp(\beta z_{n1})}{\sum_{j=1}^{J} \exp(\beta z_{nj})} \cdot \frac{\exp(\beta z_{n2})}{\sum_{j=2}^{J} \exp(\beta z_{nj})} \cdots \frac{\exp(\beta z_{n,J-1})}{\sum_{j=J-1}^{J} \exp(\beta z_{nj})}$

PL is common in biomedical literature
aka rank ordered logit (econometrics ~1980s), or exploded logit model
All the extensions mentioned also apply (nonlinear utility, correlated noise, etc.)
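
The ranking probability is a product of multinomial logit choices over the items that remain at each step. A minimal sketch, with hypothetical attribute values, that also checks the probabilities of all rankings sum to one:

```python
import math
from itertools import permutations

def plackett_luce_prob(beta, z_ranked):
    """Probability of a ranking; z_ranked lists attributes best-first."""
    weights = [math.exp(beta * z) for z in z_ranked]
    prob = 1.0
    # At step k, the k-th ranked item is chosen among items k, k+1, ..., J.
    for k in range(len(weights) - 1):
        prob *= weights[k] / sum(weights[k:])
    return prob

beta = 1.0
z = [2.0, 1.0, 0.0]                        # hypothetical attributes
p_best_first = plackett_luce_prob(beta, z)
total = sum(plackett_luce_prob(beta, list(p)) for p in permutations(z))
print(p_best_first, total)                 # total is 1 up to rounding
```

With two items the product has a single factor and reduces to the Bradley-Terry probability.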

Modeling and estimation summary

Choose the utility model, i.e., how the attributes and alternatives define the utility, e.g., a linear function of attributes with logistic noise
Choose the response/observation model, e.g., binary, multiple choice, ordered choice.
Fit the model using (regularized) maximum likelihood

Aside: “Revealed preference” vs “stated preference”

Revealed preference: Use observed data about the choices to estimate value ascribed to items.
- Generally offline observational data about real choices
Stated Preference: Use the choices made by individuals under experimental conditions to estimate these values
- Generally online experimental data (can include controlled experiments)
Revealed preference is considered a “real” choice, so it can be more accurate
- In simulated situations, participants may not respond well to hypotheticals
- OTOH: observed data may not cover the space, hence the appeal of experiments

Exercise (in-class): choice model for class(es)

“Should you take CS 329H or not?”
- What are the attributes/features (describe what to measure about a class)?
- What utility model?
- What is the observation/response model?
- Revealed preference (observed choices) or stated preference (hypothetical)?
“Should you take CS 329H or CS 221 or CS 229?”
- What are the attributes/features?
- What utility model?
- What is the observation/response model?
- Revealed preference or stated preference?

Exercise (in-class): choice model for language

Design a choice model to evaluate the quality of a language model.

What utility model?
- What are the attributes/features?
What is the observation/response model?
Revealed preference or stated preference?
Who should you query?
- Individual or pooled responses: why or why not?
What are some pros/cons of your design?

References

Train, K. (1986). Qualitative Choice Analysis: Theory, Econometrics, and an Application to Automobile Demand. MIT Press. ISBN 9780262200554. Chapter 8.
McFadden, D.; Train, K. (2000). "Mixed MNL Models for Discrete Response". Journal of Applied Econometrics. 15 (5): 447–470.
Luce, R. D. (1959). Individual Choice Behavior: A Theoretical Analysis. Wiley.
Additional:
- Ben-Akiva, M.; Lerman, S. (1985). Discrete Choice Analysis: Theory and Application to Travel Demand. Transportation Studies. Massachusetts: MIT Press.
- Park, Byeong U.; Simar, Léopold; Zelenyuk, Valentin (2017). "Nonparametric estimation of dynamic discrete choice models for time series data". Computational Statistics & Data Analysis. 108: 97–120. doi:10.1016/j.csda.2016.10.024.
- Rafailov, Rafael, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. "Direct preference optimization: Your language model is secretly a reward model." arXiv preprint arXiv:2305.18290 (2023).