We’ve spent a lot of time on binary / binomial regression models.
Sometimes the response has more than two categories: we need a model for multinomial responses.
Solution? Why not view the multinomial as an exponential family?
Mass function of \(Y \sim \text{Multinomial}(N,\pi)\) for \(\pi \in S_K\), the simplex in \(\mathbb{R}^K\): \[ f(y_1, \dots, y_K|\pi) = \binom{N}{y_1,\dots,y_K} \prod_{j=1}^K \pi_j^{y_j}, \qquad \sum_{j=1}^Ky_j=N. \]
This is a density with respect to counting measure on \[ \left\{(z_1, \dots, z_K) \in \mathbb{Z}^K: \sum_{j=1}^K z_j = N,\ z_j \geq 0 \right\}. \]
We fix \(K\) as a baseline category.
Setting \(\pi_K = 1 - \sum_{j=1}^{K-1} \pi_j\), \(y_K = N - \sum_{j=1}^{K-1} y_j\) yields the equivalent representation
\[ f(y_1, \dots, y_{K-1}|\pi) = \binom{N}{y_1,\dots,y_K} \left(\prod_{j=1}^{K-1} \pi_j^{y_j} \right) \left(1 - \sum_{j=1}^{K-1} \pi_j\right)^{N - \sum_{j=1}^{K-1}y_j} \]
with the constraint \(\sum_{j=1}^{K-1}y_j\leq N.\)
\[ \begin{aligned} \log L(\pi|Y) &= \sum_{j=1}^{K-1} y_j \cdot \log \left(\frac{\pi_j}{1 - \sum_{l=1}^{K-1}\pi_l}\right) + \\ & \qquad N \cdot \log\left(1 - \sum_{l=1}^{K-1}\pi_l \right) \end{aligned} \]
\[ \begin{aligned} \eta_j &= \log \left(\frac{\pi_j}{1 - \sum_{l=1}^{K-1}\pi_l}\right), \qquad 1 \leq j \leq K-1 \\ \pi_j &= \frac{e^{\eta_j}}{1 + \sum_{l=1}^{K-1} e^{\eta_l}}, \qquad 1 \leq j \leq K-1 \end{aligned} \]
Log-likelihood in natural parameters \[ \begin{aligned} \log L(\eta|Y) &= \sum_{j=1}^{K-1} y_j \cdot \eta_j - N \cdot \log\left(1 + \sum_{l=1}^{K-1} e^{\eta_l} \right) \end{aligned} \]
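The two maps above (probabilities to natural parameters and back) are inverse to each other; a minimal R sketch with illustrative values:

```r
# Map natural parameters eta (length K-1, baseline category K) to
# probabilities, and back; a sketch with illustrative numbers.
eta_to_pi <- function(eta) {
  e <- exp(c(eta, 0))          # baseline category has eta_K = 0
  e / sum(e)                   # pi_j = e^{eta_j} / (1 + sum_l e^{eta_l})
}
pi_to_eta <- function(p) {
  K <- length(p)
  log(p[-K] / p[K])            # eta_j = log(pi_j / pi_K)
}
eta <- c(0.5, -1, 2)
p <- eta_to_pi(eta)
stopifnot(abs(sum(p) - 1) < 1e-12,
          max(abs(pi_to_eta(p) - eta)) < 1e-12)
```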
Note: parametrization here depends on baseline category which we chose to be level \(K\).
Suppose we now observe \((Y_i, X_i)\) where each \(Y_i|X_i \sim \text{Multinomial}(N_i, \pi_i)\).
Natural parametrization above leads to the baseline category logit model
\[ \eta_{ij} = X[i,]^T\beta[,j] \]
or
\[ \eta = X\beta \]
with \(\beta \in \mathbb{R}^{p \times (K-1)}\) and \(\eta \in \mathbb{R}^{n \times (K-1)}\).
(Intercept is baked into \(X\).)\[ \pi_{i,j} = \pi_{i,j}(\beta) = \frac{\exp(X[i,]^T\beta[,j])}{1 + \sum_{l=1}^{K-1} \exp(X[i,]^T\beta[,l])}. \]
Data are represented as \(Y \in \mathbb{Z}_{\geq 0}^{n \times (K-1)}\) (dropping the baseline column \(K\)).
Log-likelihood for \(\beta\) is
\[ \begin{aligned} \log L(\beta|Y) &= \text{Tr}((X\beta)^TY) \\ & \qquad - \sum_{i=1}^n N_i \log\left(1 + \sum_{l=1}^{K-1} \exp(X[i,]^T\beta[,l]) \right) \end{aligned} \]
\[ \nabla \log L(\beta|Y) = X^T(Y - E_{\beta}(Y)) \in \mathbb{R}^{p \times (K-1)} \]
\[ E_{\beta}(Y)[i,j] = N_i \cdot \frac{\exp(X[i,]^T\beta[,j])}{1 + \sum_{l=1}^{K-1} \exp(X[i,]^T\beta[,l])} \]
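A small simulated sketch of the score formula \(X^T(Y - E_\beta(Y))\); all data and parameter values are illustrative:

```r
# Sketch: score of the baseline-category logit log-likelihood,
# grad = X^T (Y - E_beta(Y)); all values illustrative.
set.seed(1)
n <- 5; p <- 2; K <- 3
X <- cbind(1, rnorm(n))                      # intercept baked into X
beta <- matrix(rnorm(p * (K - 1)), p)        # p x (K-1)
N <- rep(10, n)
eta <- X %*% beta                            # n x (K-1)
probs <- exp(eta) / (1 + rowSums(exp(eta)))  # pi_{ij}; baseline prob implicit
EY <- N * probs                              # E_beta(Y)[i, j] = N_i * pi_{ij}
Y <- t(sapply(1:n, function(i)
  rmultinom(1, N[i], c(probs[i, ], 1 - sum(probs[i, ])))[1:(K - 1)]))
score <- t(X) %*% (Y - EY)                   # p x (K-1) gradient matrix
stopifnot(all(dim(score) == c(p, K - 1)))
```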
\[ \frac{\partial^2}{\partial \beta_{ij} \partial \beta_{kl}} \log L(\beta|Y) = -\sum_{c=1}^n X_{ci}X_{ck} \text{Cov}_{\beta}(Y_c)_{jl} \]
where
\[ \text{Cov}_{\beta}(Y_c)_{jl} = N_c \left(\text{diag}(\pi_c(\beta)) - \pi_c(\beta)\pi_c(\beta)^T\right)_{jl} \]
Standard errors are derived from the inverse observed information \[ -\nabla^2 \log L(\hat{\beta}|Y)^{-1}. \]
Example: model prog (in "academic", "general", "vocational") from write (scores on a writing test) and ses.

## 'data.frame': 200 obs. of 13 variables:
## $ id : num 45 108 15 67 153 51 164 133 2 53 ...
## $ female : Factor w/ 2 levels "male","female": 2 1 1 1 1 2 1 1 2 1 ...
## $ ses : Factor w/ 3 levels "low","middle",..: 1 2 3 1 2 3 2 2 2 2 ...
## $ schtyp : Factor w/ 2 levels "public","private": 1 1 1 1 1 1 1 1 1 1 ...
## $ prog : Factor w/ 3 levels "general","academic",..: 3 1 3 3 3 1 3 3 3 3 ...
## $ read : num 34 34 39 37 39 42 31 50 39 34 ...
## $ write : num 35 33 39 37 31 36 36 31 41 37 ...
## $ math : num 41 41 44 42 40 42 46 40 33 46 ...
## $ science: num 29 36 26 33 39 31 39 34 42 39 ...
## $ socst : num 26 36 42 32 51 39 46 31 41 31 ...
## $ honors : Factor w/ 2 levels "not enrolled",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ awards : num 0 0 0 0 0 0 0 0 0 0 ...
## $ cid : int 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "datalabel")= chr "highschool and beyond (200 cases)"
## - attr(*, "time.stamp")= chr "30 Oct 2009 14:13"
## - attr(*, "formats")= chr [1:13] "%9.0g" "%9.0g" "%9.0g" "%9.0g" ...
## - attr(*, "types")= int [1:13] 254 254 254 254 254 254 254 254 254 254 ...
## - attr(*, "val.labels")= chr [1:13] "" "fl" "sl" "scl" ...
## - attr(*, "var.labels")= chr [1:13] "" "" "" "type of school" ...
## - attr(*, "version")= int 8
## - attr(*, "label.table")=List of 5
## ..$ sl : Named int [1:3] 1 2 3
## .. ..- attr(*, "names")= chr [1:3] "low" "middle" "high"
## ..$ scl : Named int [1:2] 1 2
## .. ..- attr(*, "names")= chr [1:2] "public" "private"
## ..$ sel : Named int [1:3] 1 2 3
## .. ..- attr(*, "names")= chr [1:3] "general" "academic" "vocation"
## ..$ fl : Named int [1:2] 0 1
## .. ..- attr(*, "names")= chr [1:2] "male" "female"
## ..$ honlab: Named int [1:2] 0 1
## .. ..- attr(*, "names")= chr [1:2] "not enrolled" "enrolled"
## # weights: 15 (8 variable)
## initial value 219.722458
## iter 10 value 179.985215
## final value 179.981726
## converged
## Call:
## nnet::multinom(formula = prog ~ ses + write, data = ml)
##
## Coefficients:
## (Intercept) sesmiddle seshigh write
## academic -2.851973 0.5332914 1.1628257 0.05792480
## vocation 2.366097 0.8246384 0.1802176 -0.05567514
##
## Std. Errors:
## (Intercept) sesmiddle seshigh write
## academic 1.166437 0.4437319 0.5142215 0.02141092
## vocation 1.174251 0.4901237 0.6484508 0.02333135
##
## Residual Deviance: 359.9635
## AIC: 375.9635
## [1] 2 4
## academic:(Intercept) academic:sesmiddle academic:seshigh
## academic:(Intercept) 1.360576234 -0.0939555923 -0.0598081985
## academic:sesmiddle -0.093955592 0.1968980069 0.1225345572
## academic:seshigh -0.059808198 0.1225345572 0.2644237520
## academic:write -0.023832283 -0.0005216217 -0.0011823607
## vocation:(Intercept) 0.692815375 -0.0494419508 -0.0217374274
## vocation:sesmiddle -0.027178350 0.1050952297 0.0589096999
## vocation:seshigh 0.007910847 0.0584068716 0.1651121953
## vocation:write -0.012833466 -0.0001497438 -0.0007211086
## academic:write vocation:(Intercept) vocation:sesmiddle
## academic:(Intercept) -0.0238322833 0.69281537 -0.0271783502
## academic:sesmiddle -0.0005216217 -0.04944195 0.1050952297
## academic:seshigh -0.0011823607 -0.02173743 0.0589096999
## academic:write 0.0004584275 -0.01258505 -0.0005960548
## vocation:(Intercept) -0.0125850549 1.37886605 -0.0905237580
## vocation:sesmiddle -0.0005960548 -0.09052376 0.2402212037
## vocation:seshigh -0.0012667250 -0.08265037 0.1549794008
## vocation:write 0.0002545341 -0.02583910 -0.0012868585
## vocation:seshigh vocation:write
## academic:(Intercept) 0.007910847 -0.0128334660
## academic:sesmiddle 0.058406872 -0.0001497438
## academic:seshigh 0.165112195 -0.0007211086
## academic:write -0.001266725 0.0002545341
## vocation:(Intercept) -0.082650366 -0.0258391029
## vocation:sesmiddle 0.154979401 -0.0012868585
## vocation:seshigh 0.420488501 -0.0014339382
## vocation:write -0.001433938 0.0005443519
## (Intercept) sesmiddle seshigh write
## academic -2.445028 1.201832 2.2613324 2.705386
## vocation 2.014984 1.682511 0.2779203 -2.386280
## (Intercept) sesmiddle seshigh write
## academic 0.01448407 0.22942845 0.02373868 0.00682251
## vocation 0.04390634 0.09246981 0.78107356 0.01701977
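The two tables above appear to be Wald z statistics (coefficient divided by standard error) and two-sided normal p-values; a sketch reproducing them from the printed coefficients and standard errors:

```r
# Reproduce the Wald z statistics and p-values above from the printed
# summary output (values copied from the fit shown earlier).
coefs <- rbind(academic = c(-2.851973, 0.5332914, 1.1628257,  0.05792480),
               vocation = c( 2.366097, 0.8246384, 0.1802176, -0.05567514))
ses   <- rbind(academic = c(1.166437, 0.4437319, 0.5142215, 0.02141092),
               vocation = c(1.174251, 0.4901237, 0.6484508, 0.02333135))
z <- coefs / ses                  # Wald statistics
pvals <- 2 * pnorm(-abs(z))       # two-sided normal p-values
stopifnot(abs(z["academic", 1] - (-2.445028)) < 1e-5,
          abs(pvals["academic", 1] - 0.01448407) < 1e-5)
```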
## # weights: 9 (4 variable)
## initial value 219.722458
## iter 10 value 185.510839
## iter 10 value 185.510838
## final value 185.510838
## converged
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 write 396 371.0217 NA NA NA
## 2 ses + write 392 359.9635 1 vs 2 4 11.05822 0.02591741
## # weights: 12 (6 variable)
## initial value 219.722458
## iter 10 value 195.705517
## final value 195.705188
## converged
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 ses 394 391.4104 NA NA NA
## 2 ses + write 392 359.9635 1 vs 2 2 31.44692 1.483841e-07
\[ U_{ij} = X[i,]^T{\beta}[,j] + \epsilon_{ij} \]
for some noise \(\epsilon_{ij}\), independent across \(i\).
\[ Y_i = \text{argmax}_{1 \leq j \leq K} U_{ij} \]
If the \(\epsilon_{ij}\) are independent extreme value (Gumbel, the distribution underlying the cloglog link), then the resulting likelihood is equivalent to a baseline multinomial logit, essentially with parameters \(\beta[,j] - \beta[,K]\).
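A Monte Carlo sketch of this equivalence (utilities and sample size illustrative): with iid standard Gumbel noise, argmax choice frequencies match softmax probabilities.

```r
# Check: argmax of utilities with iid standard Gumbel noise has
# multinomial-logit (softmax) choice probabilities; mu is illustrative.
set.seed(42)
mu <- c(1, 0, -0.5)                          # systematic utilities X^T beta[, j]
rgumbel <- function(n) -log(-log(runif(n)))  # inverse-CDF Gumbel sampler
B <- 2e5
U <- matrix(mu, B, 3, byrow = TRUE) + matrix(rgumbel(3 * B), B, 3)
freq <- tabulate(max.col(U), 3) / B          # empirical choice frequencies
softmax <- exp(mu) / sum(exp(mu))
stopifnot(max(abs(freq - softmax)) < 0.01)
```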
A more “natural” model is to assume \(\epsilon_{ij}\)’s are independent \(N(0,1)\) (or perhaps correlated within a subject?).
Leads to a multinomial probit model, though likelihood is rather complicated:
\[ \begin{aligned} \pi_j(X_i) &= P_{\beta}(Y_i=j|X_i) \\ &= P_{\beta}\left(X[i,]^T\beta[,j] + \epsilon_{ij} \geq \max_{l \neq j} X[i,]^T\beta[,l] + \epsilon_{il} \right) \end{aligned} \]
When levels of \(Y\) are ordinal, we might model things differently.
Cumulative model considers probabilities
\[ P(Y\leq j | X) = \sum_{l=1}^j \pi_l(X) \]
and models these as
\[ \text{logit} \left( P(Y \leq j | X) \right) = \alpha_j + \beta^TX, \qquad 1 \leq j \leq K-1. \]
\[ \sum_{i=1}^n \sum_{j=1}^{K} y_{ij} \log\left( \frac{\exp(\alpha_j + X[i,]^T\beta)}{1 + \exp(\alpha_j + X[i,]^T\beta)} - \frac{\exp(\alpha_{j-1} + X[i,]^T\beta)}{1 + \exp(\alpha_{j-1} + X[i,]^T\beta)} \right) \]
with \(\alpha_{0}=-\infty,\ \alpha_K=\infty\).
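Each term in this sum is a category probability, a difference of logistic CDF values at adjacent cutpoints; a minimal sketch (cutpoints and linear predictor illustrative):

```r
# pi_j(X) = G(alpha_j + X^T beta) - G(alpha_{j-1} + X^T beta) with
# logistic G; alpha and lp (the linear predictor) are illustrative.
cumlogit_probs <- function(alpha, lp) {
  cdf <- c(plogis(alpha + lp), 1)  # P(Y <= j), j = 1..K (alpha_K = Inf)
  diff(c(0, cdf))                  # leading 0 plays the role of P(Y <= 0)
}
probs <- cumlogit_probs(alpha = c(-1, 1), lp = 0.3)
stopifnot(abs(sum(probs) - 1) < 1e-12, all(probs > 0))
```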
Latent variable representation: with \(Y^* = \epsilon - X^T\beta\) for logistic \(\epsilon\), \[ Y = \sum_{j=1}^K j \cdot 1_{[\alpha_{j-1}, \alpha_j]}(Y^*) \]
Obvious analogs when errors are not logistic, but have CDF \(G\).
Leads to log-likelihood
\[ \sum_{i=1}^n \sum_{j=1}^{K} y_{ij} \log\left( G(\alpha_j + X[i,]^T\beta) - G(\alpha_{j-1} + X[i,]^T\beta) \right) \]
Example: a questionnaire asks students to rate the “likelihood” of applying to graduate school: apply.
We also collect gpa, as well as pared (whether or not at least one parent has a graduate degree) and public (is the undergraduate institution public or private?).
More involved analysis here.
## 'data.frame': 400 obs. of 4 variables:
## $ apply : Factor w/ 3 levels "unlikely","somewhat likely",..: 3 2 1 2 2 1 2 2 1 2 ...
## $ pared : int 0 1 1 0 0 0 0 0 0 1 ...
## $ public: int 0 0 1 0 0 1 0 0 0 0 ...
## $ gpa : num 3.26 3.21 3.94 2.81 2.53 ...
## - attr(*, "datalabel")= chr ""
## - attr(*, "time.stamp")= chr " 3 Aug 2010 10:45"
## - attr(*, "formats")= chr [1:4] "%15.0g" "%9.0g" "%9.0g" "%9.0g"
## - attr(*, "types")= int [1:4] 251 251 251 254
## - attr(*, "val.labels")= chr [1:4] "apply" "" "" ""
## - attr(*, "var.labels")= chr [1:4] "" "" "" ""
## - attr(*, "version")= int 12
## - attr(*, "label.table")=List of 1
## ..$ apply: Named int [1:3] 0 1 2
## .. ..- attr(*, "names")= chr [1:3] "unlikely" "somewhat likely" "very likely"
library(MASS)
graduate_model = MASS::polr(apply ~ public + gpa + pared,
data=graduate_data,
Hess=TRUE)
summary(graduate_model)
## Call:
## MASS::polr(formula = apply ~ public + gpa + pared, data = graduate_data,
## Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## public -0.05879 0.2979 -0.1974
## gpa 0.61594 0.2606 2.3632
## pared 1.04769 0.2658 3.9418
##
## Intercepts:
## Value Std. Error t value
## unlikely|somewhat likely 2.2039 0.7795 2.8272
## somewhat likely|very likely 4.2994 0.8043 5.3453
##
## Residual Deviance: 717.0249
## AIC: 727.0249
## # weights: 15 (8 variable)
## initial value 439.444915
## iter 10 value 357.012275
## final value 356.996982
## converged
## [1] 729.9940 727.0249
## public gpa pared
## public 0.088721414 -0.01737218 -0.003060743
## gpa -0.017372183 0.06793006 -0.007018330
## pared -0.003060743 -0.00701833 0.070644021
## unlikely|somewhat likely -0.040918689 0.20088564 -0.010390883
## somewhat likely|very likely -0.041211341 0.20389232 -0.002951814
## unlikely|somewhat likely
## public -0.04091869
## gpa 0.20088564
## pared -0.01039088
## unlikely|somewhat likely 0.60769114
## somewhat likely|very likely 0.61323301
## somewhat likely|very likely
## public -0.041211341
## gpa 0.203892325
## pared -0.002951814
## unlikely|somewhat likely 0.613233012
## somewhat likely|very likely 0.646941420
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## public -0.6522060 0.5191384
## gpa 0.1076202 1.1309148
## pared 0.5281768 1.5721750
## Likelihood ratio tests of ordinal regression models
##
## Response: apply
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 gpa 397 732.6000
## 2 public + gpa + pared 395 717.0249 1 vs 2 2 15.57509 0.0004148703
The adjacent categories model considers the logits \[ \log \frac{\pi_j(X)}{\pi_{j+1}(X)}, \qquad 1 \leq j \leq K-1 \]
and models these as \[ \log \frac{\pi_j(X)}{\pi_{j+1}(X)} = \alpha_j + X^T\beta. \]
Equivalent to a linear reparametrization of the baseline logits \[ \log \frac{\pi_j(X)}{\pi_{K}(X)} = \alpha_j^* + (K-j) \cdot X^T\beta, \qquad 1 \leq j \leq K-1 \] where \[ \alpha_j^* = \sum_{l=j}^{K-1} \alpha_l. \]
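A numeric check of this reparametrization (the value of \(K\), the cutpoints, and the scalar linear predictor are illustrative):

```r
# Verify: baseline logits log(pi_j/pi_K) equal alpha_j^* + (K-j)*lp
# when adjacent-category logits are alpha_j + lp; values illustrative.
K <- 4
alpha <- c(0.5, -0.2, 0.8)
lp <- 0.3                                    # plays the role of X^T beta
# log pi_j up to a constant, taking pi_K as reference:
logp <- rev(cumsum(rev(c(alpha + lp, 0))))   # sum_{l=j}^{K-1} (alpha_l + lp)
probs <- exp(logp) / sum(exp(logp))
alpha_star <- rev(cumsum(rev(alpha)))        # alpha_j^* = sum_{l=j}^{K-1} alpha_l
baseline <- log(probs[1:(K - 1)] / probs[K])
stopifnot(max(abs(baseline - (alpha_star + (K - 1:(K - 1)) * lp))) < 1e-12)
```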
Negative log-likelihood is convex in \((\alpha, \beta)\)…
For cumulative logit models, it is conceptually simple to sample \((Y^*,\alpha,\beta)\) given \((Y, X)\).
Constraints on \((Y^*,\alpha,\beta)\) when \(Y_i=j\):
\[ \alpha_{j-1} + X[i,]^T\beta \leq Y_i^* \leq \alpha_j + X[i,]^T\beta \]
Each draw is from \(G\) truncated to some interval determined by \((\alpha, \beta)\).
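A sketch of the truncated draw by inverse CDF when \(G\) is logistic (the bounds are illustrative):

```r
# Draw from a logistic distribution truncated to (lo, hi) via
# inverse CDF: map uniforms on (G(lo), G(hi)) through G^{-1}.
rtrunc_logis <- function(n, lo, hi) {
  qlogis(runif(n, plogis(lo), plogis(hi)))
}
set.seed(7)
x <- rtrunc_logis(1000, lo = -0.5, hi = 1.2)
stopifnot(all(x > -0.5), all(x < 1.2))
```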
For baseline logit models, if we model using \(Y=\text{argmax}(U)\) with latent \(U\), then a similar augmented data model is possible with variables \((\epsilon,\beta)\)
\[ X[i,]^T\beta[,j] + \epsilon_{ij} \geq \max_{l \neq j }X[i,]^T\beta[,l] + \epsilon_{il} \]
See here
This is close to the baseline model, except it uses the overparameterized form
\[ P(Y=j|X) = \frac{\exp(X^T\beta[,j])}{\sum_l \exp(X^T\beta[,l])} \]
while baseline model sets \(\beta[,K]=0\).
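The overparameterization: shifting every column of \(\beta\) by the same vector leaves the probabilities unchanged, which is why the baseline model can pin \(\beta[,K]=0\). A quick check (values illustrative):

```r
# Softmax probabilities are invariant to adding a common shift to
# every column of beta, so one column can be fixed at zero.
set.seed(3)
softmax <- function(z) exp(z) / sum(exp(z))
x <- c(1, 0.5)
beta <- matrix(rnorm(6), 2, 3)
shift <- c(2, -1)                             # same shift for every column
p1 <- softmax(drop(t(beta) %*% x))
p2 <- softmax(drop(t(beta + shift) %*% x))    # shift recycles down columns
stopifnot(max(abs(p1 - p2)) < 1e-12)
```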
# pretty slow!
X_graduate = model.matrix(graduate_model)
graduate_stan = list(X=X_graduate,
N=nrow(X_graduate),
P=ncol(X_graduate),
K=3,
y=as.integer(graduate_data$apply))
system.time(grad_samples <- rstan::stan('multinom.stan',
data=graduate_stan,
iter=100,
chains=1))
##
## SAMPLING FOR MODEL 'multinom' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.002443 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 24.43 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1: three stages of adaptation as currently configured.
## Chain 1: Reducing each adaptation stage to 15%/75%/10% of
## Chain 1: the given number of warmup iterations:
## Chain 1: init_buffer = 7
## Chain 1: adapt_window = 38
## Chain 1: term_buffer = 5
## Chain 1:
## Chain 1: Iteration: 1 / 100 [ 1%] (Warmup)
## Chain 1: Iteration: 10 / 100 [ 10%] (Warmup)
## Chain 1: Iteration: 20 / 100 [ 20%] (Warmup)
## Chain 1: Iteration: 30 / 100 [ 30%] (Warmup)
## Chain 1: Iteration: 40 / 100 [ 40%] (Warmup)
## Chain 1: Iteration: 50 / 100 [ 50%] (Warmup)
## Chain 1: Iteration: 51 / 100 [ 51%] (Sampling)
## Chain 1: Iteration: 60 / 100 [ 60%] (Sampling)
## Chain 1: Iteration: 70 / 100 [ 70%] (Sampling)
## Chain 1: Iteration: 80 / 100 [ 80%] (Sampling)
## Chain 1: Iteration: 90 / 100 [ 90%] (Sampling)
## Chain 1: Iteration: 100 / 100 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 25.8162 seconds (Warm-up)
## Chain 1: 79.5667 seconds (Sampling)
## Chain 1: 105.383 seconds (Total)
## Chain 1:
## Warning: There were 41 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## https://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: The largest R-hat is 1.3, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#r-hat
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#tail-ess
## user system elapsed
## 131.405 2.664 680.143
## Inference for Stan model: multinom.
## 1 chains, each with iter=100; warmup=50; thin=1;
## post-warmup draws per chain=50, total post-warmup draws=50.
##
## mean se_mean sd 2.5% 25% 50% 75%
## beta[1,1] -1.95 0.44 1.08 -3.43 -2.86 -2.09 -0.76
## beta[1,2] -2.40 0.45 1.08 -3.89 -3.35 -2.55 -1.20
## beta[1,3] -3.66 0.44 1.08 -5.09 -4.57 -3.82 -2.44
## beta[2,1] 3.92 0.45 1.05 2.33 3.40 3.83 4.29
## beta[2,2] 3.92 0.45 1.05 2.34 3.39 3.85 4.29
## beta[2,3] 3.92 0.45 1.05 2.31 3.37 3.84 4.31
## beta[3,1] -3.27 0.30 0.84 -5.53 -3.73 -3.14 -2.64
## beta[3,2] -3.27 0.30 0.84 -5.52 -3.73 -3.16 -2.64
## beta[3,3] -3.26 0.30 0.84 -5.51 -3.74 -3.14 -2.65
## beta[4,1] -2.06 0.35 1.11 -4.99 -2.32 -1.96 -1.31
## beta[4,2] -2.06 0.35 1.11 -4.98 -2.34 -1.97 -1.30
## beta[4,3] -2.06 0.35 1.11 -4.98 -2.34 -1.96 -1.32
## lp__ -148245.20 0.42 2.03 -148249.62 -148246.20 -148244.83 -148243.54
## 97.5% n_eff Rhat
## beta[1,1] -0.14 6 0.98
## beta[1,2] -0.61 6 0.98
## beta[1,3] -1.90 6 0.98
## beta[2,1] 6.32 5 1.34
## beta[2,2] 6.34 6 1.33
## beta[2,3] 6.29 6 1.33
## beta[3,1] -2.29 8 1.27
## beta[3,2] -2.30 8 1.27
## beta[3,3] -2.26 8 1.27
## beta[4,1] -0.40 10 0.98
## beta[4,2] -0.40 10 0.98
## beta[4,3] -0.40 10 0.98
## lp__ -148242.60 24 1.13
##
## Samples were drawn using NUTS(diag_e) at Mon Jan 31 11:21:25 2022.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
# also a bit slow!
system.time(ordered_samples <- rstan::stan('ordered_logit.stan',
data=graduate_stan,
iter=100,
chains=1))
##
## SAMPLING FOR MODEL 'ordered_logit' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000552 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 5.52 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1: three stages of adaptation as currently configured.
## Chain 1: Reducing each adaptation stage to 15%/75%/10% of
## Chain 1: the given number of warmup iterations:
## Chain 1: init_buffer = 7
## Chain 1: adapt_window = 38
## Chain 1: term_buffer = 5
## Chain 1:
## Chain 1: Iteration: 1 / 100 [ 1%] (Warmup)
## Chain 1: Iteration: 10 / 100 [ 10%] (Warmup)
## Chain 1: Iteration: 20 / 100 [ 20%] (Warmup)
## Chain 1: Iteration: 30 / 100 [ 30%] (Warmup)
## Chain 1: Iteration: 40 / 100 [ 40%] (Warmup)
## Chain 1: Iteration: 50 / 100 [ 50%] (Warmup)
## Chain 1: Iteration: 51 / 100 [ 51%] (Sampling)
## Chain 1: Iteration: 60 / 100 [ 60%] (Sampling)
## Chain 1: Iteration: 70 / 100 [ 70%] (Sampling)
## Chain 1: Iteration: 80 / 100 [ 80%] (Sampling)
## Chain 1: Iteration: 90 / 100 [ 90%] (Sampling)
## Chain 1: Iteration: 100 / 100 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 6.00502 seconds (Warm-up)
## Chain 1: 7.5355 seconds (Sampling)
## Chain 1: 13.5405 seconds (Total)
## Chain 1:
## Warning: There were 12 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## https://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: The largest R-hat is 1.63, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#r-hat
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#tail-ess
## user system elapsed
## 36.309 1.592 40.185
## Inference for Stan model: ordered_logit.
## 1 chains, each with iter=100; warmup=50; thin=1;
## post-warmup draws per chain=50, total post-warmup draws=50.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff
## beta[1] 46.23 15.10 31.00 -22.24 29.08 50.68 65.30 89.18 4
## beta[2] -0.08 0.04 0.33 -0.75 -0.24 -0.07 0.12 0.54 54
## beta[3] 0.67 0.05 0.29 0.12 0.46 0.70 0.88 1.12 28
## beta[4] 1.07 0.03 0.29 0.46 0.88 1.15 1.24 1.61 85
## cutpts[1] 48.60 15.14 30.97 -19.49 31.86 52.31 67.50 91.54 4
## cutpts[2] 50.73 15.13 30.91 -17.22 33.92 54.45 69.69 93.47 4
## lp__ -360.57 0.25 1.41 -363.08 -361.61 -360.31 -359.57 -358.57 31
## Rhat
## beta[1] 1.49
## beta[2] 1.01
## beta[3] 0.99
## beta[4] 1.01
## cutpts[1] 1.50
## cutpts[2] 1.50
## lp__ 1.01
##
## Samples were drawn using NUTS(diag_e) at Mon Jan 31 11:22:06 2022.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).