Drawing Statistical Conclusions#

Download#

Case Study A: Motivation for creative writers#

  • Creative writing students randomly assigned to intrinsic vs. extrinsic priming questionnaires.

require(ggplot2)
creativity = read.csv('https://raw.githubusercontent.com/StanfordStatistics/stats191-data/main/Sleuth3/creativity.csv', header=TRUE)
salaries = read.csv('https://raw.githubusercontent.com/StanfordStatistics/stats191-data/main/Sleuth3/salaries.csv', header=TRUE)
set.seed(0)
Loading required package: ggplot2
head(creativity)
A data.frame: 6 × 2
ScoreTreatment
<dbl><chr>
1 5.0Extrinsic
2 5.4Extrinsic
3 6.1Extrinsic
410.9Extrinsic
511.8Extrinsic
612.0Extrinsic

Summarizing the groups#

Extrinsic Group#

extrinsic = creativity$Score[creativity$Treatment == 'Extrinsic']
summary(extrinsic)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   5.00   12.15   17.20   15.74   18.95   24.00 

Intrinsic Group#

intrinsic = creativity$Score[creativity$Treatment == 'Intrinsic']
summary(intrinsic)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  12.00   17.43   20.40   19.88   22.30   29.70 

Histogram of Score stratified by Sex#

# this plot is for visualization only
# students not expected to reproduce
fig <- (ggplot(creativity, aes(x=Score, fill=Treatment)) +
        geom_histogram(aes(y=after_stat(density)),
	color="#e9ecef",
	alpha=0.6,
	position='identity',
	bins=10) +
        labs(fill=""))
fig
../../_images/4361dbbfd59a1e3357dd7dab3f6087ec903cb93daf61e934bb34d88eccd019f6.png

Case Study B: Difference in salaries between male and female employees#

  • Salaries from Harris Trust and Bank over years 1969-1977

head(salaries)
A data.frame: 6 × 2
SalarySex
<int><chr>
13900Female
24020Female
34290Female
44380Female
54380Female
64380Female

Females#

female = salaries$Salary[salaries$Sex == 'Female']
summary(female)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   3900    4800    5220    5139    5400    6300 

Males#

male = salaries$Salary[salaries$Sex == 'Male']
summary(male)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   4620    5400    6000    5957    6075    8100 

Histogram of Salary stratified by Sex#

# this plot is for visualization only
# students not expected to reproduce

fig <- (ggplot(salaries, aes(x=Salary, fill=Sex)) +
        geom_histogram(aes(y=after_stat(density)),
                       color="#e9ecef",
                       alpha=0.6,
                       position='identity',
                       bins=10) +
        labs(fill=""))
fig
../../_images/e2f3a033b311a2e753496a121007e9da593cf3d1a720d6e3b385b5d8c309a7e5.png

Boxplot of Salary stratified by Sex#

boxplot(Salary ~ Sex, data=salaries, col='orange')
../../_images/29ab0d5bc6e2104f0700a43fe39dd694a420b952309335e625bdadaca951836e.png

Key differences between the studies#

  • Creative writing study was a randomized experiment.

  • Salary dataset was an observational study.

Implications#

  • Differences in strength of conclusions: randomized experiments like creativity can admit causal conclusions

  • Generalizability: what population are the data from?

    1. If we consider Harris a typical bank, then salaries represents a sample of starting salaries.

Making statistical inferences#

{width=600 fig-align=”center”}

  • What sort of conclusions are we entitled to make?

Modelling uncertainty in creativity#

  • The observed difference (i.e. treatment_effect) is not 0. Is there a real difference?

  • We need a (statistical) model to draw statistical inferences!

Mental model: the world before randomization#

{width=600 fig-align=”center”}

  • Potential outcomes before randomization

Mental model: the null hypothesis#

{width=400 fig-align=”center”}

  • \(H_0\): green outcome identical to red

Mental model: the world after randomization#

{width=800 fig-align=”center”}

  • Observed outcomes after randomization

Computing difference via t.test#

estimates = t.test(Score ~ Treatment, data=creativity)$estimate
estimates
mean in group Extrinsic
15.7391304347826
mean in group Intrinsic
19.8833333333333

Effect#

treatment_effect = treatment_effect=estimates[2] - estimates[1]
treatment_effect
mean in group Intrinsic: 4.14420289855072

Null hypothesis: no difference in Score between the groups#

null_treatment = sample(creativity$Treatment, 47, replace=FALSE)
null_data = data.frame(Score=creativity$Score,
                       Treatment=null_treatment)
null_estimates = t.test(Score ~ Treatment, data=null_data)$estimate
null_estimates
mean in group Extrinsic
17.3782608695652
mean in group Intrinsic
18.3125
null_treatment_effect = null_estimates[2] - null_estimates[1]
null_treatment_effect
mean in group Intrinsic: 0.934239130434783

Repeated 10000 times#

# this code/plot is for visualization only
# students not expected to reproduce

estimates = t.test(Score ~ Treatment, data=creativity)$estimate
treatment_effect = treatment_effect=estimates[2] - estimates[1]

null_treatment_effect = rep(NA, 10000)
for (i in 1:10000) {
    null_treatment = sample(creativity$Treatment, 47, replace=FALSE)
    null_data = data.frame(Score=creativity$Score,
                           Treatment=null_treatment)
    null_estimates = t.test(Score ~ Treatment, data=null_data)$estimate
    null_treatment_effect[i] = null_estimates[2] - null_estimates[1]
}

treatment_effect = data.frame(treatment_effect=treatment_effect)
fig <- (ggplot(data.frame(null_treatment_effect),
               aes(x=null_treatment_effect)) +
    geom_histogram(aes(y=after_stat(density)),
                   color="#e9ecef",
                   alpha=0.6, bins=30) +
    geom_vline(aes(xintercept=treatment_effect), treatment_effect,
               color='red', linewidth=2) +
    geom_vline(aes(xintercept=-treatment_effect), treatment_effect,
               color='red', alpha=0.5, linewidth=1) +
    geom_density() +  
    labs(fill=""))
fig
treatment_effect = treatment_effect[1,]
../../_images/b689521fb8e67d4e132298ca0260aa1c05da7a6f6fad9141b7d04cb84a8ba1fd.png
length(null_treatment_effect)
p_value = mean(abs(null_treatment_effect) > treatment_effect)
p_value
10000
0.006

Modelling uncertainty in salaries#

  • The difference is not 0. Is the difference real?

  • We need a model to draw statistical inferences!

Mental model: Male and Female salaries#

{width=600 fig-align=”center”}

  • There are two populations of salaries

Mental model: Male and Female salaries#

{width=600 fig-align=”center”}

  • \(H_0\): distribution of orange box identical to purple

Computing difference via t.test#

sex_estimates = t.test(Salary ~ Sex, data=salaries)$estimate
sex_effect = sex_estimates[2] - sex_estimates[1]
sex_effect
mean in group Male: 818.022540983607

Repeated 10000 times#

# this code/plot is for visualization only
# students not expected to reproduce

sex_estimates = t.test(Salary ~ Sex, data=salaries)$estimate
sex_effect = sex_estimates[2] - sex_estimates[1]

null_sex_effect = rep(NA, 10000)
for (i in 1:10000) {
    null_sex = sample(salaries$Sex, length(salaries$Sex), replace=FALSE)
    null_data = data.frame(Salary=salaries$Salary,
                           Sex=null_sex)
    null_estimates = t.test(Salary ~ Sex, data=null_data)$estimate
    null_sex_effect[i] = null_estimates[2] - null_estimates[1]
}

sex_effect = data.frame(sex_effect=sex_effect)
fig <- (ggplot(data.frame(null_sex_effect),
               aes(x=null_sex_effect)) +
        geom_histogram(aes(y=after_stat(density)),
                       color="#e9ecef",
	               alpha=0.6, bins=30) +
    geom_vline(aes(xintercept=sex_effect), sex_effect,
               color='red', linewidth=2) +
    geom_vline(aes(xintercept=-sex_effect), sex_effect,
               color='red', alpha=0.5, linewidth=1) +
    geom_density() +  
    labs(fill=""))
fig
sex_effect = sex_effect[1,]
../../_images/7b5723b62abd6cf1f4d19e0799bdd7e37b3134f01a01106a04abb037ac2cc03a.png
length(null_sex_effect)
p_value = mean(abs(null_sex_effect) > sex_effect)
p_value
10000
0

Other issues#

  • We used the same method even for these different studies… does this make sense?

  • Terminology:

    1. Parameter: a property of the probability model (often written \(\theta\))

    2. Estimate: a function of the sample data (often written \(\hat{\theta}\))

    3. Goal of statistical inference is to learn about the parameter \(\theta\) from the estimate \(\hat{\theta}\)

Other issues#

  • Experimental design:

    1. Randomization: individuals were randomly assigned Treatment in creative study

    2. Simple random sample: a way of sampling \(n\) from a population such that every \(n\) points are equally likely.

    3. Other sampling mechanisms: systematic sampling, cluster sampling.