Following up $\chi^2$¶

Belief in God (response) vs. level of education (predictor)¶

belief = matrix(c(9, 8, 27, 8, 47, 236, 
                  23, 39, 88, 49, 179, 706,
                  28, 48, 89, 19, 104, 293), 3, 6, byrow=TRUE) # Table 3.2
belief

A matrix: 3 × 6 of type dbl
9	8	27	8	47	236
23	39	88	49	179	706
28	48	89	19	104	293

Pearson’s $X^2$¶

chisq.test(belief)

	Pearson's Chi-squared test

data:  belief
X-squared = 76.148, df = 10, p-value = 2.843e-12
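As a check on what chisq.test reports, the statistic can be recomputed from the expected counts under independence. This is a minimal sketch; the names expected, X2, df and p_value are introduced here for illustration and are not part of the original notes.

```r
# Counts from Table 3.2, re-entered so the block is self-contained
belief = matrix(c(9, 8, 27, 8, 47, 236,
                  23, 39, 88, 49, 179, 706,
                  28, 48, 89, 19, 104, 293), 3, 6, byrow=TRUE)

# Expected counts under independence: (row total * column total) / n
expected = outer(rowSums(belief), colSums(belief)) / sum(belief)

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected
X2 = sum((belief - expected)^2 / expected)

# Degrees of freedom for an I x J table: (I - 1)(J - 1)
df = (nrow(belief) - 1) * (ncol(belief) - 1)

# Upper-tail probability of the chi-squared reference distribution
p_value = pchisq(X2, df, lower.tail=FALSE)
c(X2=X2, df=df, p_value=p_value)
```

The three values should agree with the X-squared, df and p-value lines printed by chisq.test(belief).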

Likelihood ratio test statistic¶

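# Likelihood ratio statistic G^2 = 2 * sum(observed * log(observed / expected)),
# with expected counts taken from the independence fit in chisq.test()
lr_stat = function(data_table) {
    expected = chisq.test(data_table)$expected
    # note: a zero cell would give 0 * log(0) = NaN; the tables used here have none
    return(2 * sum(data_table * log(data_table / expected)))
}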
lr_stat(belief)

73.1879088897061
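The function above returns only the statistic. A p-value can be attached by referring it to the same chi-squared distribution used for Pearson's test, with $(I-1)(J-1)$ degrees of freedom. The sketch below recomputes the statistic directly rather than calling lr_stat; G2, df and p_value are illustrative names.

```r
# Counts from Table 3.2, re-entered so the block is self-contained
belief = matrix(c(9, 8, 27, 8, 47, 236,
                  23, 39, 88, 49, 179, 706,
                  28, 48, 89, 19, 104, 293), 3, 6, byrow=TRUE)

# Expected counts under independence
expected = outer(rowSums(belief), colSums(belief)) / sum(belief)

# G^2 = 2 * sum(observed * log(observed / expected)); assumes no zero cells
G2 = 2 * sum(belief * log(belief / expected))

# Same degrees of freedom as Pearson's test: (I - 1)(J - 1)
df = (nrow(belief) - 1) * (ncol(belief) - 1)

# Upper-tail chi-squared p-value
p_value = pchisq(G2, df, lower.tail=FALSE)
c(G2=G2, df=df, p_value=p_value)
```

With a sample this large the two statistics are close, and both lead to the same conclusion about independence.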

Visual representation of residuals¶

library(vcd)
mosaic(belief, shade=TRUE)

Loading required package: grid
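The residuals behind a plot like this can also be inspected numerically. The sketch below pulls the Pearson and standardized residuals from the chisq.test fit; it assumes (not confirmed in these notes) that the mosaic shading is based on Pearson residuals from the independence model. The names fit, pearson_res and std_res are illustrative.

```r
# Counts from Table 3.2, re-entered so the block is self-contained
belief = matrix(c(9, 8, 27, 8, 47, 236,
                  23, 39, 88, 49, 179, 706,
                  28, 48, 89, 19, 104, 293), 3, 6, byrow=TRUE)

fit = chisq.test(belief)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_res = fit$residuals

# Standardized residuals: Pearson residuals rescaled to have
# approximately unit variance under independence
std_res = fit$stdres

round(pearson_res, 2)
round(std_res, 2)
```

The squared Pearson residuals sum to the $X^2$ statistic, so large entries show which cells drive the lack of fit.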

Decomposition of LR stat¶

schizophrenia = matrix(c(90, 12, 78, 13, 1, 6, 19, 13, 50), 3, 3, byrow=TRUE) # Table 3.3
colnames(schizophrenia) = c('biogenic', 'environmental', 'combination')
rownames(schizophrenia) = c('eclectic', 'medical', 'psychoanalytic')
schizophrenia
lr_stat(schizophrenia)

A matrix: 3 × 3 of type dbl
	biogenic	environmental	combination
eclectic	90	12	78
medical	13	1	6
psychoanalytic	19	13	50

Warning message in chisq.test(data_table):
“Chi-squared approximation may be incorrect”

23.0361921040592
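The warning comes from chisq.test, which lr_stat calls to get the expected counts; R raises it when some expected count is below 5. A short sketch to see which cells are responsible (expected and small_cells are illustrative names):

```r
# Counts from Table 3.3, re-entered so the block is self-contained
schizophrenia = matrix(c(90, 12, 78, 13, 1, 6, 19, 13, 50), 3, 3, byrow=TRUE)
colnames(schizophrenia) = c('biogenic', 'environmental', 'combination')
rownames(schizophrenia) = c('eclectic', 'medical', 'psychoanalytic')

# Expected counts under independence, computed directly to avoid the warning
expected = outer(rowSums(schizophrenia), colSums(schizophrenia)) / sum(schizophrenia)
round(expected, 2)

# Cells whose expected count falls below 5
small_cells = which(expected < 5, arr.ind=TRUE)
small_cells
```

Only the expected counts are taken from chisq.test, so the warning does not change the value of the LR statistic; it only flags that the chi-squared approximation may be rough for this table.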

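# Collapse data_table into the 2 x 2 table used at step (i, j) of the decomposition:
# rows before i and columns before j are pooled, row i and column j are kept as-is.
# Intended for i >= 2 and j >= 2.
subtable = function(data_table, i, j) {
    rows = seq_len(i - 1)   # rows 1, ..., i-1 (pooled)
    cols = seq_len(j - 1)   # columns 1, ..., j-1 (pooled)
    new_table = matrix(0, 2, 2)
    new_table[1, 1] = sum(data_table[rows, cols])
    new_table[1, 2] = sum(data_table[rows, j])
    new_table[2, 1] = sum(data_table[i, cols])
    new_table[2, 2] = data_table[i, j]
    return(new_table)
}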
subtable(schizophrenia, 3, 3)

A matrix: 2 × 2 of type dbl
116	84
32	50

G_total = lr_stat(schizophrenia)
G_total

Warning message in chisq.test(data_table):
“Chi-squared approximation may be incorrect”

23.0361921040592

G_decomp = 0
for (i in 2:nrow(schizophrenia)) {
    for (j in 2:ncol(schizophrenia)) {
        increment = lr_stat(subtable(schizophrenia, i, j))
        print(increment)
        G_decomp = G_decomp + increment
    }
}
G_decomp

Warning message in chisq.test(data_table):
“Chi-squared approximation may be incorrect”

[1] 0.2941939
[1] 1.358793
[1] 12.95288
[1] 8.43033

23.0361921040592
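Each 2 × 2 subtable contributes one degree of freedom, and the four of them add up to the $(3-1)(3-1) = 4$ degrees of freedom of the full table. The sketch below refers each increment to a chi-squared distribution with 1 df, using the rounded values printed above. Treating the components as separate 1-df tests is an assumption based on the standard partitioning result, and the components depend on the ordering of rows and columns. All variable names are illustrative.

```r
# Increments as printed above (rounded), in loop order
# (i, j) = (2,2), (2,3), (3,2), (3,3)
increments = c(0.2941939, 1.358793, 12.95288, 8.43033)
G_total = 23.0361921040592

# The increments add up to the overall statistic (up to rounding)
sum(increments)

# Degrees of freedom: 1 per 2 x 2 subtable, (I - 1)(J - 1) overall
df_total = (3 - 1) * (3 - 1)

# Upper-tail chi-squared p-values for each component and for the total
p_increments = pchisq(increments, df=1, lower.tail=FALSE)
p_total = pchisq(G_total, df=df_total, lower.tail=FALSE)
p_increments
p_total
```

Most of the total comes from the last two components, the ones that bring in the psychoanalytic row.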

What’s going on here?¶

Each increment equals the LR test statistic comparing a simpler model to a richer one (this is not obvious at this point).
Important: the models are nested: the richer model at one stage becomes the simpler model at the next stage.
Each increment can be written as $$ DEV(M_s) - DEV(M_r) $$ where $M_s$ is the simpler model, $M_r$ is the richer model, and $DEV$ is analogous to $SSE$ in lm (we’ll see more of this in the next few weeks).
Summing over the sequence yields a telescoping sum, which implies G_decomp = G_total.
For nested models $M_0 \subset M_1 \subset \dots \subset M_k$ there is a decomposition of deviance analogous to the decomposition of SS (i.e. ANOVA).
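The telescoping argument can be written out as follows. This is a sketch under the assumption that the sequence starts at the independence model $M_0$ and ends at the saturated model $M_k$; the indexing of the intermediate models is introduced here for illustration.

$$ \sum_{s=1}^{k} \left[ DEV(M_{s-1}) - DEV(M_s) \right] = DEV(M_0) - DEV(M_k) $$

All intermediate terms cancel. If $M_k$ is the saturated model then $DEV(M_k) = 0$, and the right-hand side is the deviance of the independence model, which is the overall LR statistic. In the code above the left-hand side is G_decomp and the right-hand side is G_total.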

	guess_milk	guess_tea
truth_milk	3	1
truth_tea	1	3

STATS 305B
