Models for binary outcomes¶

Data $(X_i, Y_i), 1 \leq i \leq n$.
Examples:
1. medical: presence or absence of cancer
2. financial: bankrupt or solvent
3. industrial: passes a quality control test or not
For $0-1$ responses we need to model $$\pi(x_1, \dots, x_p) = P(Y=1|X_1=x_1,\dots, X_p=x_p)$$

options(repr.plot.width=5, repr.plot.height=4)

Binary outcomes¶

Likelihood ($X$ considered fixed or likelihood considered conditional) $$ -\log L(\pi|Y) = \sum_{i=1}^n - Y_i \log\left(\frac{\pi_i}{1-\pi_i}\right) - \log(1 -\pi_i) $$
The function $$ \pi \mapsto \log\left(\frac{\pi}{1-\pi}\right) \overset{\text{def}}{=} \text{logit}(\pi) $$

Logit transform¶

Note $$ \text{logit}\left(\frac{e^{\eta}}{1+e^{\eta}}\right) = \eta. $$ where $$ F_{\text{logistic}}(\eta) = \frac{e^{\eta}}{1+e^{\eta}} $$ is the logistic distribution function.
So, $$ \text{logit} = F_{\text{logistic}}^{-1} $$

g = function(p) {
  return(log(p / (1 - p)))
}
p = seq(0.01,0.99,length=200)
plot(p, g(p), lwd=2, type='l', col='red')

g.inv <- function(x) {
  return(exp(x) / (1 + exp(x)))
}

x = seq(g(0.01), g(0.99), length=200)
plot(x, g.inv(x), lwd=2, type='l', col='red')

Logistic regression model¶

Write likelihood in natural parameters

\[ \eta_i = \text{logit}(\pi_i) \]

Then,

\[ - \log L(\eta|Y) = \sum_{i=1}^n - Y_i \eta_i + \log(1 + e^{\eta_i}) \]

Logistic regression model: $\eta_i=X_i^T\beta$ and

\[ - \log L(\beta|y) = -(X\beta)^TY + \sum_{i=1}^n \log(1 + \exp(X_i^T\beta)) \]

Logistic regression¶

In other words,$$ \text{logit}(\pi(x_1,\dots,x_p)) = x^T\beta. $$
Negative log-likelihood is convex in $\beta$ $\implies$ easy to add regularization penalties (and solve such problems…)
Default in most software for GLMs because(?) $\eta_i=X_i^T\beta$ (i.e. canonical link)

Logistic regression¶

How do we know it’s convex?
Negative log-likelihood of saturated model with parameters $\eta=(\eta_1, \dots, \eta_n)$

\[ -\log L(\eta|Y) = \sum_{i=1}^n \Lambda_B(\eta_i) -\eta_i Y_i \]

with $\Lambda_B(\eta)$ the cumulant generating function (CGF) of Bernoulli

For exponential family with sufficient statistic $W$ and natural parameters $\eta$

\[ \exp(\Lambda(\eta)) = \mathbb{E}_0\left(\exp(\eta^TW) \right) \]

where $\mathbb{P}_0$ is the law under $\eta=0$. In our case

\[ \Lambda(\eta) = \sum_{i=1}^n \Lambda_B(\eta_i). \]

Negative log-likelihood is $$ \eta \mapsto -\eta^TY + \Lambda(\eta) $$
Logistic regression imposes linear constraint on $\eta$: $\eta \in \text{col}(X)$.

Logistic regression¶

Logistic is “natural” because it is a linear model for natural parameters of Binomial data.
It can appear in other places (c.f. 5.1.5 of Agresti)
Suppose we observe observations from $$ \begin{aligned} Y &\sim \text{Bernoulli}(\pi) \\ X | Y \sim & \begin{cases} N(\mu_0, \Sigma) & Y = 0\\ N(\mu_1, \Sigma) & Y=1 \end{cases} \end{aligned} $$
Then, logistic model for $Y|X$ holds.

Another reason this model is popular relates to the odds ratio (c.f. 5.1.4 of Agresti)
If the logistic model is correct along with an assumption about sampling, it is possible to estimate parameters of $Y|X$ distribution in case-control studies where the actual randomness is $X|Y$.
This is similar to (2.2.6) of Agresti (Lung Cancer and Smoking) which we discussed earlier.

Interpretation of coefficients: Odds Ratios¶

Logistic model:

\[OR_{X_j} = \frac{ODDS(Y=1|\dots, X_j=x_j+1, \dots)}{ODDS(Y=1|\dots, X_j=x_j, \dots)} = e^{\beta_j}\]

If $X_j \in \{0, 1\}$ is dichotomous, then odds for group with $X_j = 1$ are $e^{\beta_j}$ higher, other parameters being equal.

Rare disease hypothesis¶

When incidence is rare, $P(Y=0)\approxeq 1$ no matter what the covariates $X_j$’s are.
In this case, odds ratios are almost ratios of probabilities: $$OR_{X_j} \approxeq \frac{{\mathbb{P}}(Y=1|\dots, X_j=x_j+1, \dots)}{{\mathbb{P}}(Y=1|\dots, X_j=x_j, \dots)}$$
Hypothetical example: in a lung cancer study, if $X_j$ is an indicator of smoking or not, a $\beta_j$ of 5 means for smoking vs. non-smoking means smokers are $e^5 \approx 150$ times more likely to develop lung cancer

Other (linear) binary models¶

Suppose there is some latent variable $T_i | X_i \sim F$ such that

\[\begin{split} Y_i = \begin{cases} 1 & T_i \leq X_i^T\beta \\ 0 & \text{otherwise.} \end{cases} \end{split}\]

Then,

\[ P(Y_i=1 | X_i) = P(T_i \leq X_i^T\beta|X_i) = F(X_i^T\beta). \]

Example: probit $F=\Phi$ (dbn of $N(0,1)$)
Setting $$ \eta_i = \text{logit}(F(X_i^T\beta)) $$ the likelihood is then $$ -\log L(\beta|Y) = \sum_{i=1}^n -Y_i \cdot \text{logit}(F(X_i^T\beta)) - \log(1 - F(X_i^T\beta)). $$

Parameters no longer interpretable in odds ratios but problem can still be convex (e.g. probit)

Modelling probabilities¶

Note: $\text{Var}(Y) = \pi ( 1 - \pi) = E(Y) \cdot ( 1- E(Y))$
The binary nature forces a relation between mean and variance of $Y$.
This makes logistic regression a Generalized Linear Model, use glm in R.

Horseshoe crabs¶

library(glmbb) # for `crabs` data
data(crabs)
head(crabs)

A data.frame: 6 × 6
	color	spine	width	satell	weight	y
	<fct>	<fct>	<dbl>	<int>	<int>	<int>
1	medium	bad	28.3	8	3050	1
2	dark	bad	22.5	0	1550	0
3	light	good	26.0	9	2300	1
4	dark	bad	24.8	0	2100	0
5	dark	bad	26.0	4	2600	1
6	medium	bad	23.8	0	2100	0

width_glm = glm(y ~ width, family=binomial(), data=crabs)
summary(width_glm)
anova(glm(y ~ 1, family=binomial(), data=crabs), width_glm)
c(225.76-194.45, 4.887^2)

Call:
glm(formula = y ~ width, family = binomial(), data = crabs)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.0281  -1.0458   0.5480   0.9066   1.6942  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -12.3508     2.6287  -4.698 2.62e-06 ***
width         0.4972     0.1017   4.887 1.02e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 225.76  on 172  degrees of freedom
Residual deviance: 194.45  on 171  degrees of freedom
AIC: 198.45

Number of Fisher Scoring iterations: 4

A anova: 2 × 4
	Resid. Df	Resid. Dev	Df	Deviance
	<dbl>	<dbl>	<dbl>	<dbl>
1	172	225.7585	NA	NA
2	171	194.4527	1	31.30586

31.31
23.882769

summary(glm(y ~ width, data=crabs)) # same fit as `lm`

Call:
glm(formula = y ~ width, data = crabs)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.8614  -0.4495   0.1569   0.3674   0.7061  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.76553    0.42136  -4.190 4.46e-05 ***
width        0.09153    0.01597   5.731 4.42e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.1951497)

    Null deviance: 39.780  on 172  degrees of freedom
Residual deviance: 33.371  on 171  degrees of freedom
AIC: 212.26

Number of Fisher Scoring iterations: 2

summary(lm(y ~ width, data=crabs))  # dispersion is \hat{\sigma}^2
0.4418^2

Call:
lm(formula = y ~ width, data = crabs)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.8614 -0.4495  0.1569  0.3674  0.7061 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.76553    0.42136  -4.190 4.46e-05 ***
width        0.09153    0.01597   5.731 4.42e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4418 on 171 degrees of freedom
Multiple R-squared:  0.1611,	Adjusted R-squared:  0.1562 
F-statistic: 32.85 on 1 and 171 DF,  p-value: 4.415e-08

0.19518724

Logistic regression and $I \times 2$ tables¶

For $I \times 2$ tables $N=(N_{ij})$, set

\[ Y_i = N_{i1}, \qquad 1 \leq i \leq I \]

Under independent rows sampling,

\[ Y_i \sim \text{Binomial}(N_{i+}, \pi_i) \]

Under Poisson sampling, same result holds if we condition on $N_{i+}$.
Model

\[ \text{logit}(\pi_i) = \alpha + \beta_i \]

Like a one-way ANOVA regression model
Need some identifiability constraint (R often sets one $\beta$ to 0).
Not really using a logistic assumption here – just parameterizing $\pi_i$ a certain way: model is saturated.

# Table 3.8
absent = c(17066, 14464, 788, 126, 37)
present = c(48, 38, 5, 1, 1)
total = absent + present
scores = c(0, 0.5, 1.5, 4.0, 7.0)
proportion = present / total
alcohol = rbind(absent, present)
chisq.test(alcohol)

Warning message in chisq.test(alcohol):
“Chi-squared approximation may be incorrect”

	Pearson's Chi-squared test

data:  alcohol
X-squared = 12.082, df = 4, p-value = 0.01675

lr_stat = function(data_table) {
    chisq_test = chisq.test(data_table)
    return(2 * sum(data_table * log(data_table / chisq_test$expected)))
}
lr_stat(alcohol)

Warning message in chisq.test(data_table):
“Chi-squared approximation may be incorrect”

6.20199790716322

alcohol_scores = glm(proportion ~ factor(scores), family=binomial(), 
                     weights=total)
summary(alcohol_scores)

Call:
glm(formula = proportion ~ factor(scores), family = binomial(), 
    weights = total)

Deviance Residuals: 
[1]  0  0  0  0  0

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -5.87364    0.14454 -40.637   <2e-16 ***
factor(scores)0.5 -0.06819    0.21743  -0.314   0.7538    
factor(scores)1.5  0.81358    0.47134   1.726   0.0843 .  
factor(scores)4    1.03736    1.01431   1.023   0.3064    
factor(scores)7    2.26272    1.02368   2.210   0.0271 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6.2020e+00  on 4  degrees of freedom
Residual deviance: 1.8163e-13  on 0  degrees of freedom
AIC: 28.627

Number of Fisher Scoring iterations: 4

anova(glm(proportion ~ 1, family=binomial(), weights=total), 
      glm(proportion ~ factor(scores), family=binomial(), weights=total))

A anova: 2 × 4
	Resid. Df	Resid. Dev	Df	Deviance
	<dbl>	<dbl>	<dbl>	<dbl>
1	4	6.201998e+00	NA	NA
2	0	1.816325e-13	4	6.201998

A 1 df test¶

In this case, scores is ordered, so we might fit model $$ \text{logit}(\pi_i) = \alpha + \beta \cdot \text{scores}_i. $$
Perhaps more power because direction of deviation from null is specified.

summary(glm(proportion ~ scores, family=binomial(), weights=total))

Call:
glm(formula = proportion ~ scores, family = binomial(), weights = total)

Deviance Residuals: 
      1        2        3        4        5  
 0.5921  -0.8801   0.8865  -0.1449   0.1291  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.9605     0.1154 -51.637   <2e-16 ***
scores        0.3166     0.1254   2.523   0.0116 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6.2020  on 4  degrees of freedom
Residual deviance: 1.9487  on 3  degrees of freedom
AIC: 24.576

Number of Fisher Scoring iterations: 4

anova(glm(proportion ~ 1, family=binomial(), weights=total), 
      glm(proportion ~ scores, family=binomial(), weights=total)) # LR test
data.frame(df.1=pchisq(4.25, 1, lower.tail=FALSE),
           df.4=pchisq(6.2, 4, lower.tail=FALSE)) 

A anova: 2 × 4
	Resid. Df	Resid. Dev	Df	Deviance
	<dbl>	<dbl>	<dbl>	<dbl>
1	4	6.201998	NA	NA
2	3	1.948721	1	4.253277

A data.frame: 1 × 2
df.1	df.4
<dbl>	<dbl>
0.03925033	0.1847017

Deviance¶

Instead of sum of squares, logistic regression uses deviance:

Defined as

\[DEV(\pi| Y) = -2 \log L(\pi| Y) + 2 \log L(\hat{\pi}_S| Y)\]

where $\pi$ is a location estimator for $Y$ with $\hat{\pi}_S$ the saturated fit (i.e. no contraints on $\pi$’s).

For Binomial data $Y_i \sim \text{Binom}({\tt total}_i,\pi_i)$

\[ \hat{\pi}_{S,i} = \frac{Y_i}{{\tt total}_i} \]

If $Y$ is a binary vector, with mean vector $\pi$ then

\[DEV(\pi| Y) = -2 \sum_{i=1}^n \left( Y_i \log(\pi_i) + (1-Y_i) \log(1-\pi_i) \right)\]

If $Y_i \sim \text{Binom}({\tt total}_i,\pi_i)$ then

\[\begin{split} \begin{aligned} DEV(\pi| Y) &= -2 \sum_{i=1}^n \biggl( Y_i \log(\pi_i / \hat{\pi}_{S,i}) + \\ & \qquad ({\tt total}_i -Y_i) \log((1-\pi_i)/(1-\hat{\pi}_{S,i}) \biggr) \\ &= -2 \sum_{i=1}^n {\tt total}_i \biggl( \hat{\pi}_i \log(\pi_i/\hat{\pi}_{S,i}) + \\ & \qquad (1-\hat{\pi}_{S,i}) \log((1-\pi_i)/(1-\hat{\pi}_{S,i}) \biggr) \end{aligned} \end{split}\]

with $\hat{\pi}_{S,i} = Y_i / {\tt total}_i$ is the MLE under the saturated model.

Fit using weights=total in glm.

Deviance for logistic regression¶

For any binary regression model, the deviance is:

\[\begin{aligned} DEV(\beta| Y) &= -2 \sum_{i=1}^n \left( Y_i \cdot {\text{logit}}(\pi_i(\beta)) + \log(1-\pi_i(\beta)) \right) \end{aligned}\]

For the logistic model, the RHS is:

\[ \begin{aligned} -2 \left[ (X\beta)^Ty - \sum_{i=1}^n\log \left(1 + \exp \left(\sum_{j=1}^p X_{ij} \beta_j\right) \right)\right] \end{aligned}\]

The logistic model is special in that $\text{logit}(\pi_i(\beta))=X_i^T\beta$.

Fitting the logistic model¶

Derivatives of log-likelihood¶

Let $Y \in \{0,1\}^n, X \in \mathbb{R}^{n \times p}$.
Gradient / Score

\[\begin{split} \begin{aligned} \nabla(-\log L(\beta|Y)) &= - \left[X^T\left(y - \frac{\exp(X\beta)}{1 + \exp(X\beta)} \right) \right] \\ &= - \left[X^T\left(y - E_{\beta}[Y|X]\right) \right] \end{aligned} \end{split}\]

Hessian

\[\begin{split} \begin{aligned} \nabla^2 (-\log L(\beta|Y)) &= X^T\frac{\exp(X\beta)}{(1 + \exp(X\beta))^2}X \\ &= X^T \text{Var}_{\beta}(Y|X) X. \end{aligned} \end{split}\]

Above,

\[\begin{split} \begin{aligned} E_{\beta}[Y|X] &= \frac{\exp(X\beta)}{1 + \exp(X\beta)} = \pi_{\beta}(X) \in \mathbb{R}^n \\ \text{Var}_{\beta}[Y|X] &= \text{diag}\left(E_{\beta}[Y|X] (1 - E_{\beta}[Y|X]) \right) \\ &= \text{diag}(\pi_{\beta}(X)(1 - \pi_{\beta}(X)) \\ &= W_{\beta}(X). \end{aligned} \end{split}\]

Derivatives related to moments as this is an exponential family (conditional on $X$) with natural parameter $\beta$, sufficient statistic $X^TY$ and cumulant generating function

\[ \Lambda(\beta) = \sum_{i=1}^n\log \left(1 + \exp \left(\sum_{j=1}^p X_{ij} \beta_j\right) \right) \]

Newton-Raphson¶

Given initial $\hat{\beta}^{(0)}$
Until convergence $$ \hat{\beta}^{(i+1)} = \hat{\beta}^{(i)} - (X^T W_{\hat{\beta}^{(i)}}(X)X)^{-1}X^T(Y - \pi_{\hat{\beta}^{(i)}}(X)) $$

Fitting the logistic model¶

Stationarity¶

MLE satisfies $$ X^T(Y - \pi_{\hat{\beta}}(X))) = 0 $$

Inference¶

Suppose model is true, with true parameter $\beta^*$. Then, $$ E_{\beta^*}[X^T(Y - \pi_{\beta^*}(X))] = 0 $$
Taylor series $$ \pi_{\hat{\beta}}(X) \approx \pi_{\beta^*}(X) + W_{\beta^*}(X) X(\hat{\beta}-\beta^*) $$
How big is remainder?

Inference¶

Stationarity conditions then read $$ X^T(Y - E_{\beta^*}[Y|X]) \approx X^TW_{\beta^*}(X) X(\hat{\beta}-\beta^*) $$
Or, $$ \begin{aligned} \hat{\beta}-\beta^* &\approx \left(X^TW_{\beta^*}(X) X\right)^{-1}X^T(Y - E_{\beta^*}[Y|X]) \\ &= (X^TW_{\beta^*}(X)X)^{-1}X^T(Y-\pi_{\beta^*}(X)) \end{aligned} $$
Implies $$ \begin{aligned} \text{Var}_{\beta^*}(\hat{\beta}-\beta^*) &\approx \left(X^TW_{\beta}^*(X) X\right)^{-1} \text{Var}_{\beta^*}\left(X^T(Y - \pi_{\beta^*}(X))\right) \left(X^TW_{\beta^*}(X) X\right)^{-1} \\ &\approx \left(X^T W_{\beta^*}(X) X\right)^{-1} \\ \end{aligned} $$ (Last line assumes the model is true.)

Leads to

\[ \hat{\beta}-\beta^* \approx N\left(0, \left(X^TW_{\beta^*}(X) X\right)^{-1}\right) \]

(Plugin estimator of variance appeals to Slutsky, consistency of $\hat{\beta}$.)

Where do we really use the model?¶

Under the logistic model$$ \begin{aligned} \text{Var}_{\beta^*}(X^T(Y-\pi_{\beta}^*(X)) & \approx X^TW_{\beta^*}(X)X. \end{aligned} $$
Really just using the standard asymptotic distribution of the MLE:

\[ \hat{\beta} \approx N\left(\beta^*, \left(E_{\beta^*}[\nabla^2 (-\log L(\beta^*|Y))]\right)^{-1} \right) \]

For logistic regression, note that (conditional on $X$) $$ E_{\beta^*}[\nabla^2 (-\log L(\beta^*|Y))] = \nabla^2 (-\log L(\beta^*|Y)). $$

center = coef(width_glm)['width']
SE = sqrt(vcov(width_glm)['width', 'width'])
U = center + SE * qnorm(0.975)
L = center + SE * qnorm(0.025)
data.frame(L, center, U)

A data.frame: 1 × 3
	L	center	U
	<dbl>	<dbl>	<dbl>
width	0.2978326	0.4972306	0.6966286

These are slightly different from what R will give you if you ask it for confidence intervals:

confint(width_glm)

Waiting for profiling to be done...

A matrix: 2 × 2 of type dbl
	2.5 %	97.5 %
(Intercept)	-17.8100090	-7.4572470
width	0.3083806	0.7090167

What is logistic regression even estimating?¶

Before appealing to general properties of MLE, we had only really used the assumption the model was true once.
What can we say if the model is not true?
Suppose $(Y_i,X_i) \overset{IID}{\sim} F, 1 \leq i \leq n$.

Define the population parameter $\beta^*=\beta^*(F)$ by $$ E_F[X(Y - \pi_{\beta^*(F)}(X))] = 0. $$
This is minimizer of

\[ \beta \mapsto E_F\left[-(X^T\beta) Y + \log\left(1 + \exp(X^T\beta) \right) \right] \]

Taylor expansion yields

\[ \hat{\beta}-\beta^*(F) \approx \left(X^TW_{\beta^*(F)}(X) X\right)^{-1}X^T(Y - \pi_{\beta^*(F)}(X)) \]

Note that (under conditions…)

\[\begin{split} \begin{aligned} n^{-1/2} X^T(Y - \pi_{\beta^*(F)}(X)) & \overset{D}{\to} N(0, \Sigma(F)) \\ n^{-1}X^TW_{\beta^*(F)}(X) X &\overset{P}{\to} E_F[XX^T \cdot W_{\beta^*(F)}(X)] \\ & \overset{\text{def}}{=} Q(F) \end{aligned} \end{split}\]

Implies $$ \hat{\beta}-\beta^*(F) \overset{D}{\to} N(0, n^{-1} Q(F)^{-1}\Sigma(F)Q(F)^{-1}). $$
Variance above has so-called sandwich form.

Applying Slutsky $$ \hat{\beta}-\beta^*(F) \approx N\left(0, (X^TW_{\hat{\beta}}(X) X)^{-1} (n\Sigma(F)) (X^TW_{\hat{\beta}}(X) X)^{-1}\right) $$
We know everything in the approximation above (or can estimate it) except $\Sigma(F)$
Covariance $\Sigma(F)$ can be estimated, say, by bootstrap or sandwich estimator. (Approachable description by D. Freedman here).
Alternatively, entire logistic regression could be bootstrapped.

Testing in logistic regression¶

What about comparing full and reduced model?

For a model ${\cal M}$, $DEV({\cal M})$ replaces $SSE({\cal M})$.
In least squares regression, we use

\[\frac{1}{\sigma^2}\left( SSE({\cal M}_R) - SSE({\cal M}_F) \right) \sim \chi^2_{df_R-df_F}\]

This is replaced with (when model true)

\[DEV({\cal M}_R) - DEV({\cal M}_F) \overset{n \rightarrow \infty}{\sim} \chi^2_{df_R-df_F}\]

These tests assume model ${\cal M}_F$ is correctly specified. Otherwise, Wald-type test with sandwich or other covariance estimate would have to be used.
Deviance tests do not agree numerically with Wald tests even when model is correct. Both are often used. Wald can fail!

bigger_glm = glm(y ~ width + weight, data=crabs, family=binomial())
anova(width_glm, bigger_glm)

A anova: 2 × 4
	Resid. Df	Resid. Dev	Df	Deviance
	<dbl>	<dbl>	<dbl>	<dbl>
1	171	194.4527	NA	NA
2	170	192.8919	1	1.560777

We should compare this difference in deviance with a $\chi^2_1$ random variable.

1 - pchisq(1.56,1)

0.211665220269218

Let’s compare this with the Wald test:

summary(bigger_glm)

Call:
glm(formula = y ~ width + weight, family = binomial(), data = crabs)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1127  -1.0344   0.5304   0.9006   1.7207  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)   
(Intercept) -9.3547261  3.5280465  -2.652  0.00801 **
width        0.3067892  0.1819473   1.686  0.09177 . 
weight       0.0008338  0.0006716   1.241  0.21445   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 225.76  on 172  degrees of freedom
Residual deviance: 192.89  on 170  degrees of freedom
AIC: 198.89

Number of Fisher Scoring iterations: 4

Ordinal association

Grouped goodness of fit tests (5.2.5)

STATS 305B

Models for binary outcomes¶

Binary outcomes¶

Logit transform¶

Logistic regression model¶

Logistic regression¶

Logistic regression¶

Logistic regression¶

Interpretation of coefficients: Odds Ratios¶

Rare disease hypothesis¶

Other (linear) binary models¶

Modelling probabilities¶

Horseshoe crabs¶

Logistic regression and \(I \times 2\) tables¶

A 1 df test¶

Deviance¶

Deviance for logistic regression¶

Fitting the logistic model¶

Derivatives of log-likelihood¶

Newton-Raphson¶

Fitting the logistic model¶

Stationarity¶

Inference¶

Inference¶

Where do we really use the model?¶

What is logistic regression even estimating?¶

Testing in logistic regression¶