Alternatives to T

Alternatives to T#

Download#

Outline#

Case studies:
1. Shuttle O-ring incidents
2. Cognitive load in math problems
Robustness and resistance of two-sample \(t\)-tests
Transformations

require(ggplot2)
set.seed(0)

Loading required package: ggplot2

Case study A: O-rings failing to seal fuel#

Is there a difference between Cool(<65F) and Warm (>65F)?
Temperature on launch day: 29F.

orings = read.csv('https://raw.githubusercontent.com/StanfordStatistics/stats191-data/main/Sleuth3/orings.csv', header=TRUE)
orings$Incidents[orings$Launch == 'Cool']
orings$Incidents[orings$Launch == 'Warm']

1
1
1
3

0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
2

What test to use?#

Samples are really small
No sense in which they could be normally distributed…

Permutation test#

Decide a test statistic \(T\) comparing Cool(<65F) to Warm (>65F). Could be the two-sample \(t\) test statistic
Under \(H_0:\) Cool(<65F) has the same distribution Warm (>65F) suppose we:
1. Shuffle the response in the two samples by a random ordering \(\sigma\) (called a permutation)
2. Recompute the statistic yielding \(T(\sigma)\)

Permutation test in practice#

Build up a reference distribution under \(H_0\)#

B = 10000 # n_permutations
null_stats = rep(NA, B)
N = nrow(orings)
for (i in 1:B) {
    idx = sample(N, N, replace=FALSE)
    orings_star = data.frame(Incidents=orings$Incidents[idx],
                             Launch=orings$Launch)
    null_stats[i] = t.test(Incidents ~ Launch, data=orings_star)$stat
}

Histogram of \(t\)-statistic under \(H_0\)#

Doesn’t look \(t\)-shaped at all… but test is valid. Why?
Under \(H_0\): shuffling the response doesn’t change the distribution!

hist(null_stats)

../../_images/a0987d4ce16cde8cfd43a1294edd4438a7ad05ea867c066ed82a19ca437964e4.png

Compute a \(p\)-value#

observed = t.test(Incidents ~ Launch, data=orings)$stat
observed
mean(null_stats > observed)

t: 2.53163615804613

0.0023

Rank Sum Test#

One can use any test statistic for permutation test…
Only tests null \(H_0\): distributions are the same.
What should we use for \(H_a\)?

Case study B: difference in cognitive load#

Time to solve math problems under two different presentations
Small sample size, not normal?

cognitive_load = read.csv('https://raw.githubusercontent.com/StanfordStatistics/stats191-data/main/Sleuth3/cognitive_load.csv', header=TRUE)
cognitive_load

A data.frame: 28 × 3
Time	Treatment	Censored
<int>	<chr>	<int>
68	Modified	0
70	Modified	0
73	Modified	0
75	Modified	0
77	Modified	0
80	Modified	0
80	Modified	0
132	Modified	0
148	Modified	0
155	Modified	0
183	Modified	0
197	Modified	0
206	Modified	0
210	Modified	0
130	Conventional	0
139	Conventional	0
146	Conventional	0
150	Conventional	0
161	Conventional	0
177	Conventional	0
228	Conventional	0
242	Conventional	0
265	Conventional	0
300	Conventional	1
300	Conventional	1
300	Conventional	1
300	Conventional	1
300	Conventional	1

Rank sum test#

Rank the outcome
\(T\) = Sum the ranks in one of the groups (14 of 28 are Modified)
Under \(H_0:\) distributions are identical, \(T\) has distribution the sum of 14 ranks in a random permutation…

cognitive_load$R = rank(cognitive_load$Time)
rank_sum = with(cognitive_load, sum(R[Treatment == 'Modified']))
rank_sum
W = 14^2 + 14*15/2 - rank_sum
c(W, 14*14-W)

137

164
32

Rank sum test#

Using a builtin test

wilcox.test(Time ~ Treatment,
            data=cognitive_load,
            alternative='greater')

Warning message in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...):
“cannot compute exact p-value with ties”

	Wilcoxon rank sum test with continuity correction

data:  Time by Treatment
W = 164, p-value = 0.001271
alternative hypothesis: true location shift is greater than 0

Rank sum test#

Null distribution (ignoring ties)#

null_sample = runif(28)
null_treat = c(rep('M', 14), rep('U', 14))
null_df = data.frame(S=null_sample, T=null_treat)
null_df$R = rank(null_df$S)
with(null_df, sum(R[T=='M']))

202

R uses a distributional approximation…

Confidence interval#

wilcox.test(Time ~ Treatment,
            data=cognitive_load,
            alternative='two.sided',
            conf.int=TRUE)

Warning message in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...):
“cannot compute exact p-value with ties”

Warning message in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...):
“cannot compute exact confidence intervals with ties”

	Wilcoxon rank sum test with continuity correction

data:  Time by Treatment
W = 164, p-value = 0.002542
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
  57.00007 159.99998
sample estimates:
difference in location 
                    94 

What parameter is this estimating?

Shift alternative#

{height=400 fig-align=”center”}

\(H_a:\) the difference between groups is a simple shift
Other interpretations: estimating median of difference…

Paired data#

There are plenty of other non-parametric tests out there

Signed rank test for paired data (i.e. `schizophrenia`)#

\(H_0:\) distributions are the same in each group..
\(\implies\) differences are symmetrically distributed around 0!

Symmetric null#

{height=400 fig-align=”center”}

\(H_0:\) distribution is symmetric around 0.

Symmetric alternative#

{height=400 fig-align=”center”}

\(H_a:\) distribution is symmetric around \(\delta\).

Example of signed rank `schizophrenia`#

schizophrenia = read.csv('https://raw.githubusercontent.com/StanfordStatistics/stats191-data/main/Sleuth3/schizophrenia.csv', header=TRUE)
with(schizophrenia, wilcox.test(Affected, Unaffected), paired=TRUE)

Warning message in wilcox.test.default(Affected, Unaffected):
“cannot compute exact p-value with ties”

	Wilcoxon rank sum test with continuity correction

data:  Affected and Unaffected
W = 71.5, p-value = 0.09295
alternative hypothesis: true location shift is not equal to 0

Compare to `t.test`#

with(schizophrenia, t.test(Affected - Unaffected))

	One Sample t-test

data:  Affected - Unaffected
t = -3.2289, df = 14, p-value = 0.006062
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.3306292 -0.0667041
sample estimates:
 mean of x 
-0.1986667 

Alternatives to T

Contents

Alternatives to T#

Download#

Outline#

Case study A: O-rings failing to seal fuel#

What test to use?#

Permutation test#

Permutation test in practice#

Build up a reference distribution under \(H_0\)#

Histogram of \(t\)-statistic under \(H_0\)#

Compute a \(p\)-value#

Rank Sum Test#

Case study B: difference in cognitive load#

Rank sum test#

Rank sum test#

Rank sum test#

Null distribution (ignoring ties)#

Confidence interval#

Shift alternative#

Paired data#

Signed rank test for paired data (i.e. schizophrenia)#

Symmetric null#

Symmetric alternative#

Example of signed rank schizophrenia#

Compare to t.test#

Signed rank test for paired data (i.e. `schizophrenia`)#

Example of signed rank `schizophrenia`#

Compare to `t.test`#