Logistic regression

web.stanford.edu/class/stats305b

Jonathan Taylor

Winter 2022

Models for binary outcomes

options(repr.plot.width=5, repr.plot.height=4)

Binary outcomes

Logit transform

g = function(p) {
  # logit transform: log odds of p
  return(log(p / (1 - p)))
}
p = seq(0.01, 0.99, length=200)
plot(p, g(p), lwd=2, type='l', col='red')

g.inv = function(x) {
  # inverse logit (the logistic function)
  return(exp(x) / (1 + exp(x)))
}

x = seq(g(0.01), g(0.99), length=200)
plot(x, g.inv(x), lwd=2, type='l', col='red')

Logistic regression model

\[ \eta_i = \text{logit}(\pi_i) \]

\[ - \log L(\eta|Y) = \sum_{i=1}^n - Y_i \eta_i + \log(1 + e^{\eta_i}) \]

\[ - \log L(\beta|Y) = -(X\beta)^TY + \sum_{i=1}^n \log(1 + \exp(X_i^T\beta)) \]
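As a sanity check on the formula above, the negative log-likelihood can be evaluated directly and compared with `glm()`. A minimal sketch on simulated data (all names here, `beta_true` etc., are illustrative, not from the lecture):

```r
set.seed(1)
n = 100
X = cbind(1, rnorm(n))                    # design matrix with intercept
beta_true = c(-0.5, 1)                    # illustrative true coefficients
Y = rbinom(n, 1, plogis(X %*% beta_true))

neg_loglik = function(beta, X, Y) {
  eta = X %*% beta
  sum(-Y * eta + log(1 + exp(eta)))       # the formula above, term by term
}

fit = glm(Y ~ X - 1, family=binomial())
# at the MLE this should agree with -logLik(fit)
c(by_hand = neg_loglik(coef(fit), X, Y), glm = -as.numeric(logLik(fit)))
```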

Logistic regression

\[ -\log L(\eta|Y) = \sum_{i=1}^n \Lambda_B(\eta_i) -\eta_i Y_i \]

with \(\Lambda_B(\eta)\) the cumulant generating function (CGF) of Bernoulli

\[ \exp(\Lambda(\eta)) = \mathbb{E}_0\left(\exp(\eta^TW) \right) \]

where \(\mathbb{P}_0\) is the law under \(\eta=0\). In our case

\[ \Lambda(\eta) = \sum_{i=1}^n \Lambda_B(\eta_i). \]
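Here \(\Lambda_B(\eta) = \log(1 + e^\eta)\), and the usual CGF facts hold: its first derivative is the Bernoulli mean (the inverse logit) and its second derivative is the variance \(\pi(1-\pi)\). A quick numerical check (choice of `eta` and step `h` is arbitrary):

```r
Lambda_B = function(eta) log(1 + exp(eta))   # Bernoulli CGF
eta = 0.7
h = 1e-4
d1 = (Lambda_B(eta + h) - Lambda_B(eta - h)) / (2 * h)                   # CGF' = mean
d2 = (Lambda_B(eta + h) - 2 * Lambda_B(eta) + Lambda_B(eta - h)) / h^2   # CGF'' = variance
pi_eta = plogis(eta)
rbind(numeric = c(d1, d2), exact = c(pi_eta, pi_eta * (1 - pi_eta)))
```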

Logistic regression

Interpretation of coefficients: Odds Ratios

\[OR_{X_j} = \frac{ODDS(Y=1|\dots, X_j=x_j+1, \dots)}{ODDS(Y=1|\dots, X_j=x_j, \dots)} = e^{\beta_j} \]
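In R, exponentiating the fitted coefficients gives estimated odds ratios; exponentiating a Wald interval gives an interval on the odds-ratio scale. A hedged sketch on simulated data (names illustrative):

```r
set.seed(8)
x = rnorm(400)
y = rbinom(400, 1, plogis(-1 + 0.7 * x))
fit = glm(y ~ x, family=binomial())
exp(coef(fit)['x'])               # estimated odds ratio for a one-unit increase in x
exp(confint.default(fit)['x', ])  # Wald interval, mapped to the odds-ratio scale
```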

Rare disease hypothesis

Other (linear) binary models

\[ Y_i = \begin{cases} 1 & T_i \leq X_i^T\beta \\ 0 & \text{otherwise.} \end{cases} \]

\[ P(Y_i=1 | X_i) = P(T_i \leq X_i^T\beta|X_i) = F(X_i^T\beta). \]
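Different choices of the CDF \(F\) give different link functions: logistic \(T_i\) gives the logit link, Gaussian \(T_i\) the probit. A sketch comparing the two fits on data simulated from a probit model (names illustrative):

```r
set.seed(2)
x = rnorm(200)
y = rbinom(200, 1, pnorm(0.8 * x))   # latent T_i ~ N(0,1) gives the probit model
logit_fit  = glm(y ~ x, family=binomial(link="logit"))
probit_fit = glm(y ~ x, family=binomial(link="probit"))
# the fitted probabilities are similar; logit slopes are typically
# about 1.6-1.8 times the probit slopes (scale of logistic vs normal)
c(logit = coef(logit_fit)['x'], probit = coef(probit_fit)['x'])
```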

Modelling probabilities

Horseshoe crabs

library(glmbb) # for `crabs` data
data(crabs)
head(crabs)
##    color spine width satell weight y
## 1 medium   bad  28.3      8   3050 1
## 2   dark   bad  22.5      0   1550 0
## 3  light  good  26.0      9   2300 1
## 4   dark   bad  24.8      0   2100 0
## 5   dark   bad  26.0      4   2600 1
## 6 medium   bad  23.8      0   2100 0
width_glm = glm(y ~ width, family=binomial(), data=crabs)
summary(width_glm)
## 
## Call:
## glm(formula = y ~ width, family = binomial(), data = crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0281  -1.0458   0.5480   0.9066   1.6942  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -12.3508     2.6287  -4.698 2.62e-06 ***
## width         0.4972     0.1017   4.887 1.02e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 194.45  on 171  degrees of freedom
## AIC: 198.45
## 
## Number of Fisher Scoring iterations: 4
anova(glm(y ~ 1, family=binomial(), data=crabs), width_glm)
## Analysis of Deviance Table
## 
## Model 1: y ~ 1
## Model 2: y ~ width
##   Resid. Df Resid. Dev Df Deviance
## 1       172     225.76            
## 2       171     194.45  1   31.306
c(225.76-194.45, 4.887^2)
## [1] 31.31000 23.88277
summary(glm(y ~ width, data=crabs)) # same fit as `lm`
## 
## Call:
## glm(formula = y ~ width, data = crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8614  -0.4495   0.1569   0.3674   0.7061  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.76553    0.42136  -4.190 4.46e-05 ***
## width        0.09153    0.01597   5.731 4.42e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.1951497)
## 
##     Null deviance: 39.780  on 172  degrees of freedom
## Residual deviance: 33.371  on 171  degrees of freedom
## AIC: 212.26
## 
## Number of Fisher Scoring iterations: 2
summary(lm(y ~ width, data=crabs))  # dispersion is \hat{\sigma}^2
## 
## Call:
## lm(formula = y ~ width, data = crabs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8614 -0.4495  0.1569  0.3674  0.7061 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.76553    0.42136  -4.190 4.46e-05 ***
## width        0.09153    0.01597   5.731 4.42e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4418 on 171 degrees of freedom
## Multiple R-squared:  0.1611, Adjusted R-squared:  0.1562 
## F-statistic: 32.85 on 1 and 171 DF,  p-value: 4.415e-08
0.4418^2
## [1] 0.1951872

Logistic regression and \(I \times 2\) tables

\[ Y_i = N_{i1}, \qquad 1 \leq i \leq I \]

\[ Y_i \sim \text{Binomial}(N_{i+}, \pi_i) \]

\[ \text{logit}(\pi_i) = \alpha + \beta_i \]

# Table 3.8
absent = c(17066, 14464, 788, 126, 37)
present = c(48, 38, 5, 1, 1)
total = absent + present
scores = c(0, 0.5, 1.5, 4.0, 7.0)
proportion = present / total
alcohol = rbind(absent, present)
chisq.test(alcohol)
## Warning in chisq.test(alcohol): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  alcohol
## X-squared = 12.082, df = 4, p-value = 0.01675
lr_stat = function(data_table) {
    chisq_test = chisq.test(data_table)
    return(2 * sum(data_table * log(data_table / chisq_test$expected)))
}
lr_stat(alcohol)
## Warning in chisq.test(data_table): Chi-squared approximation may be incorrect
## [1] 6.201998
alcohol_scores = glm(proportion ~ factor(scores), family=binomial(), 
                     weights=total)
summary(alcohol_scores)
## 
## Call:
## glm(formula = proportion ~ factor(scores), family = binomial(), 
##     weights = total)
## 
## Deviance Residuals: 
## [1]  0  0  0  0  0
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -5.87364    0.14454 -40.637   <2e-16 ***
## factor(scores)0.5 -0.06819    0.21743  -0.314   0.7538    
## factor(scores)1.5  0.81358    0.47134   1.726   0.0843 .  
## factor(scores)4    1.03736    1.01431   1.023   0.3064    
## factor(scores)7    2.26272    1.02368   2.210   0.0271 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6.2020e+00  on 4  degrees of freedom
## Residual deviance: 1.8163e-13  on 0  degrees of freedom
## AIC: 28.627
## 
## Number of Fisher Scoring iterations: 4
anova(glm(proportion ~ 1, family=binomial(), weights=total), 
      glm(proportion ~ factor(scores), family=binomial(), weights=total))
## Analysis of Deviance Table
## 
## Model 1: proportion ~ 1
## Model 2: proportion ~ factor(scores)
##   Resid. Df Resid. Dev Df Deviance
## 1         4      6.202            
## 2         0      0.000  4    6.202

A 1 df test

summary(glm(proportion ~ scores, family=binomial(), weights=total))
## 
## Call:
## glm(formula = proportion ~ scores, family = binomial(), weights = total)
## 
## Deviance Residuals: 
##       1        2        3        4        5  
##  0.5921  -0.8801   0.8865  -0.1449   0.1291  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -5.9605     0.1154 -51.637   <2e-16 ***
## scores        0.3166     0.1254   2.523   0.0116 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6.2020  on 4  degrees of freedom
## Residual deviance: 1.9487  on 3  degrees of freedom
## AIC: 24.576
## 
## Number of Fisher Scoring iterations: 4
anova(glm(proportion ~ 1, family=binomial(), weights=total), 
      glm(proportion ~ scores, family=binomial(), weights=total)) # LR test
## Analysis of Deviance Table
## 
## Model 1: proportion ~ 1
## Model 2: proportion ~ scores
##   Resid. Df Resid. Dev Df Deviance
## 1         4     6.2020            
## 2         3     1.9487  1   4.2533
data.frame(df.1=pchisq(4.25, 1, lower.tail=FALSE),
           df.4=pchisq(6.2, 4, lower.tail=FALSE)) 
##         df.1      df.4
## 1 0.03925033 0.1847017

Deviance

Instead of the residual sum of squares, logistic regression uses the deviance:

\[DEV(\pi| Y) = -2 \log L(\pi| Y) + 2 \log L(\hat{\pi}_S| Y) \]

where \(\pi\) is a location estimator for \(Y\) and \(\hat{\pi}_S\) is the saturated fit (i.e. no constraints on the \(\pi_i\)).

\[ \hat{\pi}_{S,i} = \frac{Y_i}{{\tt total}_i} \]

\[DEV(\pi| Y) = -2 \sum_{i=1}^n \left( Y_i \log(\pi_i) + (1-Y_i) \log(1-\pi_i) \right) \]

\[ \begin{aligned} DEV(\pi| Y) &= -2 \sum_{i=1}^n \biggl( Y_i \log(\pi_i / \hat{\pi}_{S,i}) + \\ & \qquad ({\tt total}_i -Y_i) \log\bigl((1-\pi_i)/(1-\hat{\pi}_{S,i})\bigr) \biggr) \\ &= -2 \sum_{i=1}^n {\tt total}_i \biggl( \hat{\pi}_{S,i} \log(\pi_i/\hat{\pi}_{S,i}) + \\ & \qquad (1-\hat{\pi}_{S,i}) \log\bigl((1-\pi_i)/(1-\hat{\pi}_{S,i})\bigr) \biggr) \end{aligned} \]

where \(\hat{\pi}_{S,i} = Y_i / {\tt total}_i\) is the MLE under the saturated model.
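For 0/1 responses (\({\tt total}_i = 1\)) the saturated log-likelihood is 0, so the deviance is just \(-2\log L\) at the fit. A hedged check on simulated data (names illustrative):

```r
set.seed(3)
x = rnorm(100)
y = rbinom(100, 1, plogis(x))
fit = glm(y ~ x, family=binomial())
pihat = fitted(fit)
# the binary-case deviance formula, computed by hand
dev_by_hand = -2 * sum(y * log(pihat) + (1 - y) * log(1 - pihat))
c(by_hand = dev_by_hand, glm = deviance(fit))
```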

Deviance for logistic regression

\[\begin{aligned} DEV(\beta| Y) &= -2 \sum_{i=1}^n \left( Y_i \cdot {\text{logit}}(\pi_i(\beta)) + \log(1-\pi_i(\beta)) \right) \end{aligned} \]

\[ \begin{aligned} -2 \left[ (X\beta)^Ty - \sum_{i=1}^n\log \left(1 + \exp \left(\sum_{j=1}^p X_{ij} \beta_j\right) \right)\right] \end{aligned} \]

Fitting the logistic model

Derivatives of log-likelihood

\[ \begin{aligned} \nabla(-\log L(\beta|Y)) &= - \left[X^T\left(y - \frac{\exp(X\beta)}{1 + \exp(X\beta)} \right) \right] \\ &= - \left[X^T\left(y - E_{\beta}[Y|X]\right) \right] \end{aligned} \]

\[ \begin{aligned} \nabla^2 (-\log L(\beta|Y)) &= X^T\frac{\exp(X\beta)}{(1 + \exp(X\beta))^2}X \\ &= X^T \text{Var}_{\beta}(Y|X) X. \end{aligned} \]

\[ \begin{aligned} E_{\beta}[Y|X] &= \frac{\exp(X\beta)}{1 + \exp(X\beta)} = \pi_{\beta}(X) \in \mathbb{R}^n \\ \text{Var}_{\beta}[Y|X] &= \text{diag}\left(E_{\beta}[Y|X] (1 - E_{\beta}[Y|X]) \right) \\ &= \text{diag}(\pi_{\beta}(X)(1 - \pi_{\beta}(X)) \\ &= W_{\beta}(X). \end{aligned} \]

\[ \Lambda(\beta) = \sum_{i=1}^n\log \left(1 + \exp \left(\sum_{j=1}^p X_{ij} \beta_j\right) \right) \]

Newton-Raphson

  1. Given initial \(\hat{\beta}^{(0)}\)

  2. Until convergence \[ \hat{\beta}^{(i+1)} = \hat{\beta}^{(i)} + (X^T W_{\hat{\beta}^{(i)}}(X)X)^{-1}X^T(Y - \pi_{\hat{\beta}^{(i)}}(X)) \] (note the sign: since \(\nabla(-\log L(\beta|Y)) = -X^T(Y - \pi_{\beta}(X))\), the Newton step *adds* the inverse Hessian times the score)
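A minimal Newton-Raphson sketch on simulated data (all names are illustrative); it should agree with `glm()`, which uses the equivalent Fisher scoring iterations, to numerical precision:

```r
set.seed(4)
n = 500
X = cbind(1, rnorm(n))
Y = rbinom(n, 1, plogis(X %*% c(-1, 2)))

beta = rep(0, ncol(X))
for (it in 1:25) {
  pi_hat = as.vector(plogis(X %*% beta))
  W = diag(pi_hat * (1 - pi_hat))            # Var_beta(Y|X)
  score = t(X) %*% (Y - pi_hat)              # -gradient of -log L
  beta_new = beta + solve(t(X) %*% W %*% X, score)   # Newton step
  if (max(abs(beta_new - beta)) < 1e-10) { beta = beta_new; break }
  beta = beta_new
}
cbind(newton = as.vector(beta),
      glm = coef(glm(Y ~ X - 1, family=binomial())))
```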

Fitting the logistic model

Stationarity

Inference

\[ \hat{\beta}-\beta^* \approx N\left(0, \left(X^TW_{\beta^*}(X) X\right)^{-1}\right) \]

(The plug-in estimator of the variance appeals to Slutsky's theorem and consistency of \(\hat{\beta}\).)
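A sketch of the plug-in variance on simulated data (names illustrative): form \((X^TW_{\hat{\beta}}(X)X)^{-1}\) at the fitted coefficients and compare its diagonal with the standard errors from `vcov()`:

```r
set.seed(5)
X = cbind(1, rnorm(300))
Y = rbinom(300, 1, plogis(X %*% c(0.5, -1)))
fit = glm(Y ~ X - 1, family=binomial())
pi_hat = fitted(fit)
W = diag(pi_hat * (1 - pi_hat))     # W_{beta-hat}(X)
V = solve(t(X) %*% W %*% X)         # plug-in covariance estimate
cbind(by_hand = sqrt(diag(V)), glm = sqrt(diag(vcov(fit))))
```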

Where do we really use the model?

\[ \hat{\beta} \approx N\left(\beta^*, \left(E_{\beta^*}[\nabla^2 (-\log L(\beta^*|Y))]\right)^{-1} \right) \]

center = coef(width_glm)['width']
SE = sqrt(vcov(width_glm)['width', 'width'])
U = center + SE * qnorm(0.975)
L = center + SE * qnorm(0.025)
data.frame(L, center, U)
##               L    center         U
## width 0.2978326 0.4972306 0.6966286

These are slightly different from what R will give you if you ask it for confidence intervals:

confint(width_glm)
## Waiting for profiling to be done...
##                   2.5 %     97.5 %
## (Intercept) -17.8100090 -7.4572470
## width         0.3083806  0.7090167

What is logistic regression even estimating?

\[ \beta \mapsto E_F\left[-(X^T\beta) Y + \log\left(1 + \exp(X^T\beta) \right) \right] \]

\[ \hat{\beta}-\beta^*(F) \approx \left(X^TW_{\beta^*(F)}(X) X\right)^{-1}X^T(Y - \pi_{\beta^*(F)}(X)) \]

\[ \begin{aligned} n^{-1/2} X^T(Y - \pi_{\beta^*(F)}(X)) & \overset{D}{\to} N(0, \Sigma(F)) \\ n^{-1}X^TW_{\beta^*(F)}(X) X &\overset{P}{\to} E_F[XX^T \cdot W_{\beta^*(F)}(X)] \\ & \overset{\text{def}}{=} Q(F) \end{aligned} \]

Testing in logistic regression

What about comparing full and reduced model?

\[\frac{1}{\sigma^2}\left( SSE({\cal M}_R) - SSE({\cal M}_F) \right) \sim \chi^2_{df_R-df_F} \]

\[DEV({\cal M}_R) - DEV({\cal M}_F) \overset{n \rightarrow \infty}{\sim} \chi^2_{df_R-df_F} \]

bigger_glm = glm(y ~ width + weight, data=crabs, family=binomial())
anova(width_glm, bigger_glm)
## Analysis of Deviance Table
## 
## Model 1: y ~ width
## Model 2: y ~ width + weight
##   Resid. Df Resid. Dev Df Deviance
## 1       171     194.45            
## 2       170     192.89  1   1.5608

We should compare this difference in deviance with a \(\chi^2_1\) random variable.

1 - pchisq(1.56,1)
## [1] 0.2116652

Let’s compare this with the Wald test:

summary(bigger_glm)
## 
## Call:
## glm(formula = y ~ width + weight, family = binomial(), data = crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1127  -1.0344   0.5304   0.9006   1.7207  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -9.3547261  3.5280465  -2.652  0.00801 **
## width        0.3067892  0.1819473   1.686  0.09177 . 
## weight       0.0008338  0.0006716   1.241  0.21445   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 192.89  on 170  degrees of freedom
## AIC: 198.89
## 
## Number of Fisher Scoring iterations: 4

Grouped goodness of fit tests (5.2.5)

library(glmbb) # for `crabs` data
data(crabs)
width_glm = glm(y ~ width, family=binomial(), data=crabs)

Is the logistic model a good fit?

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
crabs$groups = cut(crabs$width, breaks=c(0, seq(23.25, 29.25, by=1), max(crabs$width)))
crabs$fits = fitted(width_glm)
results = crabs %>%
          group_by(groups) %>%
          summarize(N=n(), numY = sum(y), numN = N - numY, 
                    fittedY=sum(fits), fittedN=N-fittedY)
## `summarise()` ungrouping output (override with `.groups` argument)
results
## # A tibble: 8 x 6
##   groups          N  numY  numN fittedY fittedN
##   <fct>       <int> <int> <int>   <dbl>   <dbl>
## 1 (0,23.2]       14     5     9    3.64  10.4  
## 2 (23.2,24.2]    14     4    10    5.31   8.69 
## 3 (24.2,25.2]    28    17    11   13.8   14.2  
## 4 (25.2,26.2]    39    21    18   24.2   14.8  
## 5 (26.2,27.2]    22    15     7   15.9    6.06 
## 6 (27.2,28.2]    24    20     4   19.4    4.62 
## 7 (28.2,29.2]    18    15     3   15.7    2.35 
## 8 (29.2,33.5]    14    14     0   13.1    0.918
G2 = 2 * (sum(results$numY * log(results$numY / results$fittedY)) +
          sum(results$numN * log((results$numN + 1.e-10) / results$fittedN))) # +1e-10 guards against log(0): the last group has numN = 0
c(G2, pchisq(G2, 8 - 2, lower.tail=FALSE)) # why df=6?
## [1] 6.1793623 0.4034009
X2 = (sum((results$numY - results$fittedY)^2 / results$fittedY) +
      sum((results$numN - results$fittedN)^2 / results$fittedN))
c(X2, pchisq(X2, 8 - 2, lower.tail=FALSE)) 
## [1] 5.320099 0.503461

Hosmer-Lemeshow

library(ResourceSelection)
## ResourceSelection 0.3-5   2019-07-22
hoslem.test(crabs$y, crabs$fits, g=10)
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  crabs$y, crabs$fits
## X-squared = 4.3855, df = 8, p-value = 0.8208

Diagnostics in logistic regression (6.2)

Residuals

Hat matrix and leverage


\[ H_{\beta^*} = W_{\beta^*}^{1/2}X(X^TW_{\beta^*}X)^{-1}X^TW_{\beta^*} ^{1/2}, \]

approximated by

\[ H_{\hat{\beta}} = W_{\hat{\beta}}^{1/2}X(X^TW_{\hat{\beta}}X)^{-1}X^TW_{\hat{\beta}}^{1/2} \]

Standardized residuals

\[ r_i = \frac{e_i}{ \sqrt{1 - H_{\hat{\beta},ii}}}. \]
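This standardization is what `rstandard()` computes (for a binomial GLM the dispersion is 1). A hedged check on simulated data (names illustrative), using `hatvalues()` for the diagonal \(H_{\hat{\beta},ii}\):

```r
set.seed(6)
x = rnorm(150)
y = rbinom(150, 1, plogis(x))
fit = glm(y ~ x, family=binomial())
e = resid(fit, type="pearson")       # Pearson residuals e_i
r = e / sqrt(1 - hatvalues(fit))     # the formula above, H_ii from hatvalues()
max(abs(r - rstandard(fit, type="pearson")))   # essentially zero
```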

library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
outlierTest(width_glm)
## No Studentized residuals with Bonferroni p < 0.05
## Largest |rstudent|:
##    rstudent unadjusted p-value Bonferroni p
## 94 -2.05045           0.040321           NA
pearson_resid = (crabs$y - crabs$fits) / sqrt(crabs$fits * (1 - crabs$fits))
plot(resid(width_glm, type='pearson'), pearson_resid, pch=23, col='red')

dev_resid = sign(crabs$y - crabs$fits) * 
     sqrt(- 2 * (crabs$y * log(crabs$fits) + (1 - crabs$y) * log(1 - crabs$fits)))
plot(resid(width_glm), dev_resid, pch=23, col='red')

par(mfrow=c(2,2))
plot(width_glm)

Analog of \(R^2\) (6.3)

anova(width_glm)
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: y
## 
## Terms added sequentially (first to last)
## 
## 
##       Df Deviance Resid. Df Resid. Dev
## NULL                    172     225.76
## width  1   31.306       171     194.45
c((225.76 - 194.45) / 225.76, 
  1 - width_glm$deviance / width_glm$null.deviance)
## [1] 0.1386871 0.1386697

Influence measures

library(car)
head(influence.measures(width_glm)$infmat)
##         dfb.1_   dfb.wdth       dffit    cov.r      cook.d         hat
## 1 -0.047610662 0.04996645  0.06040203 1.020937 0.001132128 0.012344511
## 2 -0.104054950 0.10079960 -0.11366303 1.032553 0.004233338 0.025717205
## 3 -0.004535225 0.00948021  0.07499977 1.009632 0.002017305 0.007084208
## 4 -0.061618758 0.05535952 -0.11116748 1.007620 0.005034932 0.010062537
## 5 -0.004535225 0.00948021  0.07499977 1.009632 0.002017305 0.007084208
## 6 -0.094872323 0.08993360 -0.11878283 1.018836 0.005119353 0.016600380

Confusion matrix

crabs$predicted.label = fitted(width_glm) >= 0.5
table(crabs$predicted.label, crabs$y)
##        
##          0  1
##   FALSE 27 16
##   TRUE  35 95
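From a 2x2 confusion table like this one, the usual summaries are accuracy, sensitivity (true positive rate), and specificity (true negative rate). A sketch at threshold 0.5 on simulated data (names illustrative):

```r
set.seed(7)
x = rnorm(200)
y = rbinom(200, 1, plogis(1.5 * x))
fit = glm(y ~ x, family=binomial())
pred = fitted(fit) >= 0.5                        # classify at threshold 0.5
accuracy    = mean(pred == (y == 1))
sensitivity = sum(pred & y == 1) / sum(y == 1)   # true positive rate
specificity = sum(!pred & y == 0) / sum(y == 0)  # true negative rate
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```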

ROC curve

library(ROCR)
crabs_perf_data = prediction(crabs$fits, crabs$y)
crabs_perf = performance(crabs_perf_data, 'tpr', 'fpr')
plot(crabs_perf)
performance(crabs_perf_data, 'auc')@y.values # AUC
## [[1]]
## [1] 0.7424441
abline(0, 1, lty=2)

Model selection (6.1)

\[ \begin{aligned} AIC(M) &= -2 \log L(M) + 2 \cdot p(M)\\ &= DEV(M) + 2 \cdot p(M) + g(Y) \\ \end{aligned} \]

or \(BIC(M)\) (2 replaced by \(\log n\)).

library(bestglm)          # for `bestglm`
crabs_factor = crabs_orig # makes a copy; `crabs_orig` is the crabs data arranged
                          # in bestglm's Xy format (predictors first, response `y` last)
crabs_factor$color = factor(crabs_factor$color)
crabs_factor$spine = factor(crabs_factor$spine)
factorAIC_model = bestglm(crabs_factor, family=binomial(), IC="AIC")
## Morgan-Tatar search since family is non-gaussian.
## Note: factors present with more than 2 levels.
factorAIC_model
## AIC
## Best Model:
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## color         3   3.24   1.079   5.685 0.000988 ***
## width         1   4.66   4.659  24.548 1.76e-06 ***
## Residuals   168  31.88   0.190                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC_model = bestglm(crabs_orig, family=binomial(), IC="AIC") # AIC search without factor coding
AIC_model
## AIC
## BICq equivalent for q in (2.09431094677637e-06, 0.847412646439335)
## Best Model:
##                Estimate Std. Error   z value     Pr(>|z|)
## (Intercept) -12.3508177  2.6287179 -4.698419 2.621832e-06
## width         0.4972306  0.1017355  4.887482 1.021340e-06
tryCatch(bestglm(crabs_factor, family=binomial(), IC="CV"), error=function(e) { geterrmessage()})
## [1] "Cross-validation not available when there are categorical variables with more than 2 levels!"
CV_model = bestglm(crabs_orig, family=binomial(), IC="CV") # note that this is not usual cross-validation -- read the help
## Morgan-Tatar search since family is non-gaussian.
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
CV_model
## CVd(d = 132, REP = 1000)
## BICq equivalent for q in (2.09431094677637e-06, 0.847412646439335)
## Best Model:
##                Estimate Std. Error   z value     Pr(>|z|)
## (Intercept) -12.3508177  2.6287179 -4.698419 2.621832e-06
## width         0.4972306  0.1017355  4.887482 1.021340e-06