Lecture 20: The Potential Outcomes Model#
STATS 60 / STATS 160 / PSYCH 10
Concepts and Learning Goals:
Analyze the results of a randomized experiment using the potential outcomes model.
p-values by simulation
Critiquing experimental designs:
Does eating breakfast help to lose weight?
Review#
The best way to determine causality is to run a randomized experiment.
Last class, we started a randomized experiment to examine whether retrieval practice causes better learning.
Today, we will finish this experiment and analyze the results!
Quiz#
Take 5 minutes to do this 6-question quiz.
Try your best, but don’t stress out—it doesn’t count towards your grade!
Answers#
B : \(A\) increases the chance of \(B\) in a causal manner.
A : \(B\) can’t happen unless \(A\) does.
C : \(A\) causes \(B\) every time.
B : Mike Pence
D : 1/10
B : He says there is not a sufficient causal relationship between smoking and smoking-related deaths, ignoring the strong probabilistic causal relationship.
Results#
We’ll use Colab to calculate the mean score for the treatment and control groups.
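As a rough sketch of what that calculation looks like (the scores below are made up for illustration; they are not the class data, and the variable names are assumptions):

```python
# Hypothetical quiz scores out of 6; NOT the real class results.
treatment_scores = [4, 5, 3, 6, 2]
control_scores = [3, 4, 2, 5, 1]

def mean(values):
    """Average of a non-empty list of numbers."""
    return sum(values) / len(values)

treatment_mean = mean(treatment_scores)
control_mean = mean(control_scores)

# The statistic we care about: treatment mean minus control mean.
diff_in_means = treatment_mean - control_mean
print(treatment_mean, control_mean, diff_in_means)
```

A positive difference means the treatment group scored higher on average in this sample.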
Inference for a Randomized Experiment#
Maybe the difference is just noise.
Even if the treatment had no effect, the difference wouldn’t be exactly zero.
How do we quantify the noise in a randomized experiment?
Potential Outcomes Model#
In the potential outcomes model, each subject \(i\) has two “potential outcomes”.
\(Y_i(0)\) if they receive the control
\(Y_i(1)\) if they receive the treatment
\(Y_i(1) - Y_i(0)\) represents the treatment effect for subject \(i\).
Fundamental Problem of Causal Inference: We can only observe one of the potential outcomes, \(Y_i(1)\) or \(Y_i(0)\), because each subject is either assigned to treatment or control, not both.
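One common way to write the link between the potential outcomes and what we actually observe uses an assignment indicator. The symbols \(Z_i\) and \(Y_i^{\text{obs}}\) are notation assumed here for illustration: \(Z_i = 1\) if subject \(i\) is assigned to treatment and \(Z_i = 0\) if assigned to control.

```latex
Y_i^{\text{obs}} = Z_i \, Y_i(1) + (1 - Z_i) \, Y_i(0)
```

When \(Z_i = 1\) the formula returns \(Y_i(1)\), and when \(Z_i = 0\) it returns \(Y_i(0)\). The other potential outcome never appears in the data.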
Potential Outcomes Table#
It is easiest to visualize this model as a table.
Here \(Y_i(0)\) is the score the \(i\)th student would receive if they end up in the control group, and \(Y_i(1)\) is the score they’d receive in the treatment group.
| $i$ | $Y_i(0)$ | $Y_i(1)$ |
|---|---|---|
| $1$ | $2$ | |
| $2$ | $3$ | |
| $3$ | $1$ | |
| $4$ | $0$ | |
| ... | ... | ... |
Notice that we only observe one potential outcome per row.
The Null Hypothesis#
The null hypothesis is that the treatment has no effect.
Under this null hypothesis, we can fill in the missing potential outcomes.
| $i$ | $Y_i(0)$ | $Y_i(1)$ |
|---|---|---|
| $1$ | $2$ | $2$ |
| $2$ | $3$ | $3$ |
| $3$ | $1$ | $1$ |
| $4$ | $0$ | $0$ |
| ... | ... | ... |
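A minimal sketch of this fill-in step, reading "no effect" as \(Y_i(1) = Y_i(0)\) for every subject. The rows copy the four subjects shown in the table; the function name and the use of `None` for an unobserved outcome are assumptions made for illustration.

```python
# Each row is (subject, Y_i(0), Y_i(1)); None marks the outcome we did not observe.
observed_rows = [(1, 2, None), (2, 3, None), (3, 1, None), (4, 0, None)]

def fill_under_null(rows):
    """Copy the observed outcome into the missing column (no treatment effect)."""
    filled = []
    for subject, y0, y1 in rows:
        if y0 is None:
            y0 = y1
        if y1 is None:
            y1 = y0
        filled.append((subject, y0, y1))
    return filled

filled_rows = fill_under_null(observed_rows)
for row in filled_rows:
    print(row)
```

The same function works if a subject's observed outcome sits in the \(Y_i(1)\) column instead.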
Randomness in a Randomized Experiment#
In a randomized experiment, the randomness is in the assignment of subjects to treatments. (Remember, we flipped a coin to assign each of you.)
Depending on the treatment assignments, the difference in means will vary, even under the null hypothesis of no treatment effect!
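To see this variation concretely, here is a small sketch that keeps each subject's score fixed (as the null hypothesis says) and only re-flips the coins. The scores are hypothetical, and redrawing when a group comes up empty is a simplification assumed here so that both means exist.

```python
import random

def diff_for_random_assignment(scores, rng):
    """Flip a fair coin per subject; return mean(treatment) - mean(control)."""
    while True:
        flips = [rng.random() < 0.5 for _ in scores]
        treated = [s for s, z in zip(scores, flips) if z]
        control = [s for s, z in zip(scores, flips) if not z]
        if treated and control:  # redraw if either group is empty
            return sum(treated) / len(treated) - sum(control) / len(control)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
scores = [2, 3, 1, 0, 4, 5, 3, 2]  # hypothetical scores, unaffected by treatment

diffs = [diff_for_random_assignment(scores, rng) for _ in range(5)]
print(diffs)
```

The scores never change between draws, so any spread in the printed differences comes from the assignment alone.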
p-value from simulation#
We’ll use Colab to simulate random assignments of subjects to treatment and control.
You might also like to use this applet, which is more visual:
potential-outcomes.github.io
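The simulation might look roughly like the sketch below. It is not the course notebook: the scores are hypothetical, the function name is made up, and two choices are assumptions made here. First, it reshuffles subjects into groups of the original sizes rather than re-flipping coins. Second, it counts differences at least as large in absolute value (a two-sided comparison).

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulated_p_value(treatment, control, n_sims=10_000, seed=0):
    """Share of random re-assignments with |diff in means| >= the observed one."""
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)  # under the null, scores are fixed
    n_treat = len(treatment)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        rng.shuffle(pooled)  # a new random assignment with the same group sizes
        diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims

# Hypothetical scores, not the class data.
treatment_scores = [4, 5, 3, 6, 2]
control_scores = [3, 4, 2, 5, 1]
p = simulated_p_value(treatment_scores, control_scores)
print(p)
```

A small value says that few random assignments produce a gap as large as the one observed when the treatment truly does nothing.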
Summary#
Even if the difference in means between the two groups is not zero, we still have to consider the possibility that it arose by chance alone; this possibility is the “null hypothesis.”
We consider this possibility using the potential outcomes model.
Every subject has a potential outcome under control, \(Y_i(0)\), and treatment, \(Y_i(1)\).
We only observe one of these potential outcomes.
But under the null hypothesis that the treatment has no effect, we can fill in the missing potential outcomes.
Now we can simulate (under the null hypothesis) the difference in means for alternative random assignments.
The fraction of alternative universes (simulated random assignments) in which the difference in means is at least as large as the one we observed in our universe is the \(P\)-value.
“The most important meal”#
Breakfast#
Anecdotally, people who eat breakfast are healthier.
Here we will see a couple of different experiments with a common goal.
GOAL: determine whether eating breakfast helps to maintain a healthy body weight.
We’ll consider a few experimental designs, and critique each. Try to answer the following questions:
What are the strengths of the experimental design?
What are the weaknesses?
Does the experiment have reliable conclusions?
Consider the following questions:
Is there a sensible control group?
Are you worried about confounding variables?
How large is the sample size?
Are you worried about sample bias?
Are the measurements relevant to the hypothesis?
Easy-Peasy#
An overweight podcaster wants to determine if eating breakfast will help him lose weight.
The podcaster normally does not eat breakfast.
He weighs himself on January 1, eats breakfast every day for 6 months, then weighs himself again on July 1 to check whether he lost weight.
The podcaster lost 7 lbs. He releases an episode about the benefits of breakfast.
What are the pros of this experimental design?
What are the cons?
Does the experiment have reliable conclusions?
Pros: the experiment is easy to do. The analysis is not complicated and the conclusions are easy to summarize.
Cons:
The sample size is small.
Selection bias: the podcaster chose to try this himself, and that choice might be correlated with it working (for example, he may already have been motivated to lose weight).
Confounding variables: The control group is the podcaster in the past; this control is probably not comparable due to many possible confounding variables (e.g. could the podcaster be subconsciously changing his behavior in other ways because of the experiment?).
Conclusions: This is at least a proof of concept that it is possible to lose weight while beginning to eat breakfast.
That is, if we let \(E\) be the event of eating breakfast and \(L\) be the event of losing weight, \(P(L \mid E) > 0\). But we cannot conclude much about how large this probability is, just that it is nonzero.
Ask around#
A health insurance provider conducts a survey of its \(10,000\) customers. The survey includes the following questions:
How much do you weigh?
How many days per week do you typically eat breakfast?
A data scientist looks at the standardized scatterplot of body weight vs. days of breakfast per week.
The correlation coefficient is \(R = -0.1\), with \(p\)-value \(0.03\).
The average weight among those who eat breakfast 4+ days per week is 15 lbs lower than the average weight among those who eat breakfast < 4 days per week (sadly, the average in both categories is still overweight).
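For reference, a correlation coefficient like the one reported can be computed as in the sketch below. The eight data points are invented for illustration and have nothing to do with the insurer's survey; the function name is also an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical respondents: breakfast days per week and body weight in lbs.
breakfast_days = [0, 1, 2, 3, 4, 5, 6, 7]
weights_lbs = [190, 185, 200, 170, 180, 175, 160, 172]

r = pearson_r(breakfast_days, weights_lbs)
print(round(r, 2))
```

A negative value means heavier respondents tend to report fewer breakfast days. By itself it says nothing about why.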
What are the pros of this experimental design?
What are the cons?
Does the experiment have reliable conclusions?
Pros:
The sample size is large.
The experiment is easy to do.
The sample is likely representative (of the population of people with health insurance in the location served by the company).
Cons:
The data is self-reported and might be inaccurate in a systematic way:
people might under-report their weight
over-report days of breakfast
Selection bias: responding to the survey is a choice! Maybe survey responders are more likely to have time on their hands, and thus be older/wealthier/etc?
This is an observational study and there is no randomized control.
Confounding variables: breakfast might be correlated with sleep quality, time pressure / stress, organization, or family structure, all of which might affect body weight.
It is not clear that the correlation coefficient is a great measurement of the association, as days of breakfast and body weight are probably not linearly associated; maybe it makes more sense to look only at people above a certain body mass index?
Conclusions: We can pretty confidently conclude that breakfast is correlated with a lower body weight in this population, even if we allow for some inaccuracies due to reporting.
We cannot conclude a causal relationship.
Randomized Controlled Trial#
A medical researcher at a university hospital recruits 50 female participants between the ages of 25 and 50 who are habitual non-breakfast eaters, regularly sleep at least 6 hours per night, and have had a stable body weight for the last 3 months.
They are randomized into two groups: The treatment group is assigned to eat breakfast every day for a month, consuming at least 15% of recommended daily calorie intake within 90 minutes of waking. The control group does not change their behavior.
At the end of the month, the researcher weighs the women and analyzes the change in weight using the potential outcomes model. The mean difference in weight gain between the breakfast eaters and non-eaters is 1 lb (breakfast eaters gained more), with \(p\)-value 0.03.
What are the pros of this experimental design?
What are the cons?
Does the experiment have reliable conclusions?
Pros:
This is an RCT, and the design is such that we could infer a causal relationship.
Cons:
The study is more difficult to do.
The sample size is small.
Selection bias: the population was limited to women in a given age range and probably in a specific geographic area.
The design of the study might be a bit “rigged”: if you add extra calories to someone’s day, they will probably gain weight.
Conclusions: The study suggests a causal relationship between eating breakfast and weight gain, in women who normally do not eat breakfast.
We can be reasonably confident that within the specific population recruited for the study, adding breakfast without compensating by eating fewer calories later in the day will, on average, result in weight gain.
The small sample size and restricted population might limit the applicability of the results.