Discussion 5: Conditional Probability#
STATS 60 / STATS 160 / PSYCH 10
Today’s section
Recap of lecture material.
Week 5 practice quiz 3.
Conditional probability game show!
Recap#
Conditional Probability#
Conditional probability lets us update probabilities based on new information.
For events \(A\) and \(B\), we say that the probability of \(A\) given \(B\) is \[\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]}.\]
When all outcomes have the same probability, \[\Pr[A \mid B] = \frac{\#\text{ outcomes simultaneously in } A \text{ and } B}{\#\text{ outcomes in } B}.\]
\(A\) and \(B\) are independent when knowing that \(B\) happened doesn’t change the probability that \(A\) happened: \(\Pr[A \mid B] = \Pr[A]\).
In class, we spent a lot of time practicing translating between English and probabilistic language, and comparing \(\Pr[A \mid B]\) and \(\Pr[B \mid A]\).
Bayes’ rule lets us compute \(\Pr[A \mid B]\) from \(\Pr[B\mid A]\). We didn’t focus on it much, just pointed out that this tool exists.
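As a quick check on the definitions above, here is a minimal sketch that computes conditional probabilities by counting equally likely outcomes. The two-dice example and the helper names (`prob`, `sum_is_8`, `first_is_6`) are our own illustration, not something from lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))


def sum_is_8(outcome):
    # Event A: the two dice sum to 8.
    return outcome[0] + outcome[1] == 8


def first_is_6(outcome):
    # Event B: the first die shows a 6.
    return outcome[0] == 6


def everything(outcome):
    # Conditioning on nothing: every outcome is allowed.
    return True


def prob(event, given=everything):
    """Pr[event | given], computed by counting equally likely outcomes."""
    pool = [o for o in OUTCOMES if given(o)]
    hits = [o for o in pool if event(o)]
    return Fraction(len(hits), len(pool))


p_a = prob(sum_is_8)                      # Pr[A]
p_b = prob(first_is_6)                    # Pr[B]
p_a_given_b = prob(sum_is_8, first_is_6)  # Pr[A | B]
p_b_given_a = prob(first_is_6, sum_is_8)  # Pr[B | A]

# Bayes' rule recovers Pr[A | B] from Pr[B | A] and the two base rates.
p_a_given_b_bayes = p_b_given_a * p_a / p_b

print(p_a, p_a_given_b, p_b_given_a, p_a_given_b_bayes)  # 5/36 1/6 1/5 1/6
```

Here \(\Pr[A \mid B] = 1/6\) differs from \(\Pr[A] = 5/36\), so these two events are not independent, and \(\Pr[A \mid B]\) and \(\Pr[B \mid A]\) are different numbers.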
Fallacies in reasoning about conditional probability#
The base rate fallacy is the mistake of weighting specific information too heavily, without remembering the bigger picture.
Knowing that \(\Pr[B \mid A]\) is large, you assume \(\Pr[A \mid B]\) is also large, even though the “base rate” \(\Pr[A]\) might be really small.
Examples: conflating test accuracy with low false positive rate, librarians vs. farmers
Conditioning out of context: a large (or small) conditional probability is reported, but it is not clear if conditioning increased or decreased the probability.
Example: Scottish hiking deaths, NFL players from California
The prosecutor’s fallacy: confusing \(\Pr[A \mid B]\) with \(\Pr[B \mid A]\)
In the courtroom, \(A\) is the evidence and \(B\) is guilt.
Example: Sally Clark case
The defense attorney’s fallacy: failing to condition on all of the available information.
“Not conditioning hard enough”
Examples: OJ Simpson, bad study habits
Generalizing from a biased sample: Confusing \(\Pr[A \mid B]\) with \(\Pr[A]\).
Example: hot guys are jerks!
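To make the base rate fallacy from the list above concrete, here is a small sketch of the test-accuracy example using counts in a population. All of the numbers (1% prevalence, 95% sensitivity, 5% false positive rate, 10,000 people) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
population = 10_000
prevalence = Fraction(1, 100)           # Pr[sick], the base rate
sensitivity = Fraction(95, 100)         # Pr[positive | sick]
false_positive_rate = Fraction(5, 100)  # Pr[positive | healthy]

# Expected counts in each group.
sick = population * prevalence
healthy = population - sick
true_positives = sick * sensitivity
false_positives = healthy * false_positive_rate

# Among everyone who tests positive, what fraction is actually sick?
p_sick_given_positive = true_positives / (true_positives + false_positives)
print(p_sick_given_positive, float(p_sick_given_positive))
```

Under these assumptions, \(\Pr[\text{positive} \mid \text{sick}]\) is 95%, yet only 19 out of every 118 positive tests (about 16%) come from sick people, because healthy people vastly outnumber sick ones.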
Practice Quiz 3#
Question 1#
A classroom of 28 students is evenly split between seniors, juniors, sophomores and first-years. There are four English majors in the class; two are juniors and two are first-years.
Choose a student uniformly at random from the class; let \(E\) be the event that the student is an English major, and let \(F\) be the event that the student is a first year.
Describe \(\Pr[E \mid F]\) in plain English.
Answer 1#
This is the chance that if you choose a first-year uniformly at random, they will be an English major.
Question 2#
A classroom of 28 students is evenly split between seniors, juniors, sophomores and first-years. There are four English majors in the class; two are juniors and two are first-years.
Choose a student uniformly at random from the class; let \(E\) be the event that the student is an English major, and let \(F\) be the event that the student is a first year.
What is larger, \(\Pr[E \mid F]\) or \(\Pr[E \mid \overline{F}]\)?
Answer 2#
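There are 7 first-years, 2 of whom are English majors, so
\[\Pr[E \mid F] = \frac{2}{7},\]
while there are 21 students who are not first-years, 2 of whom are English majors, so
\[\Pr[E \mid \overline{F}] = \frac{2}{21},\]
so \(\Pr[E \mid F]\) is larger.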
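One way to double-check this answer is to build the class roster and count. The sketch below assumes only what the question states (7 students per year, two English majors among the juniors and two among the first-years); the names `roster` and `cond_prob` are our own.

```python
from fractions import Fraction

YEARS = ["senior", "junior", "sophomore", "first-year"]
ENGLISH_MAJORS = {"senior": 0, "junior": 2, "sophomore": 0, "first-year": 2}

# One (year, is_english_major) pair per student, 7 students per year.
roster = [
    (year, seat < ENGLISH_MAJORS[year])
    for year in YEARS
    for seat in range(7)
]


def is_english(student):
    return student[1]


def is_first_year(student):
    return student[0] == "first-year"


def is_not_first_year(student):
    return not is_first_year(student)


def cond_prob(event, given):
    """Pr[event | given] when each student is equally likely to be chosen."""
    pool = [s for s in roster if given(s)]
    return Fraction(sum(1 for s in pool if event(s)), len(pool))


p_e_given_f = cond_prob(is_english, is_first_year)
p_e_given_not_f = cond_prob(is_english, is_not_first_year)
print(p_e_given_f, p_e_given_not_f)  # 2/7 2/21
```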
Question 3#
A classroom of 28 students is evenly split between seniors, juniors, sophomores and first-years. There are four English majors in the class; two are juniors and two are first-years.
Choose a student uniformly at random from the class; let \(E\) be the event that the student is an English major, and let \(F\) be the event that the student is a first year.
The class takes an “anonymized” survey. One of the questions on the survey is “what is your major?” and another question is “what is your class year?” Explain the flaw in the following statement by the course instructor using the language of conditional probability:
“The survey is anonymous because there are 7 of you in each year, so even if I know your class year, I only have a 1/7 chance of guessing who you are.”
Answer 3#
The flaw is that the instructor will also know the major. For example, there are only two English majors among the first-years, so conditioned on all of the available information (both major and class year), the instructor might have a 1/2 chance of guessing who the student is.
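The effect of conditioning on more information can be sketched by counting how many students match the survey answers. In this illustration, every non-English major is lumped into a single `"other"` category, which is an assumption on our part; with real majors the matching groups could be even smaller.

```python
from fractions import Fraction

YEARS = ["senior", "junior", "sophomore", "first-year"]
ENGLISH_MAJORS = {"senior": 0, "junior": 2, "sophomore": 0, "first-year": 2}

# Each student is a (student_id, year, major) record.
# "other" stands in for every non-English major (an assumption).
students = []
for year in YEARS:
    for seat in range(7):
        major = "English" if seat < ENGLISH_MAJORS[year] else "other"
        students.append((len(students), year, major))


def guess_chance(year, major=None):
    """Chance that a uniform guess among the matching students is correct."""
    matches = [
        s for s in students
        if s[1] == year and (major is None or s[2] == major)
    ]
    return Fraction(1, len(matches))


print(guess_chance("first-year"))             # conditioning on year only: 1/7
print(guess_chance("first-year", "English"))  # year and major: 1/2
```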
Conditional logic game show!#
Game show rules#
Form teams of 2, and choose an ordering of all of the teams.
Each of the following slides will have a statement which contains a mistake in conditional logic.
Before I reveal the slide, the next two teams will face off.
I will read the example on the slide, then the teams race to:
Model the scenario using the language of conditional probabilities: what are the events in question, what is the information/statistic phrased in the language of conditional probability, and what is the mistaken conclusion or implication?
Identify the logical mistake or fallacy.
The first team to finish raises their hands. If they are correct, they are awarded a bonus point. If they are wrong, the other team gets a bonus point.
We’ll then talk through the scenario as a class.
Scenario 1:#
9 out of 10 students who got a B+ or above on the assignment used AI to write the essay. If I want a good grade I had better use AI to write my own essay.
\(A\) is the event of using AI to write the essay, and \(B\) is the event of getting a B+ or better.
This is saying \(\Pr[A \mid B] = 0.9\).
The statement suggests that the chance of doing well on the essay increases if you use AI.
But we cannot conclude that from this information!
This could be seen as either:
The prosecutor’s fallacy (confusing \(\Pr[A \mid B]\) with \(\Pr[B \mid A]\)), or
The base rate fallacy (maybe \(\Pr[A]\) is very large to begin with, and \(\Pr[A \mid B]\) is about the same as \(\Pr[A]\); they could basically be independent).
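To see how \(\Pr[A \mid B] = 0.9\) can carry no information about whether AI helps, here is a sketch with a made-up class of 100 students. The counts are hypothetical, picked so that \(A\) and \(B\) come out exactly independent.

```python
from fractions import Fraction

# Hypothetical class of 100 students: (used_ai, got_b_plus) -> number of students.
counts = {
    (True, True): 18,
    (True, False): 72,
    (False, True): 2,
    (False, False): 8,
}
total = sum(counts.values())

used_ai = counts[(True, True)] + counts[(True, False)]
good_grade = counts[(True, True)] + counts[(False, True)]

p_a = Fraction(used_ai, total)                                      # Pr[A]
p_a_given_b = Fraction(counts[(True, True)], good_grade)            # Pr[A | B]
p_b_given_a = Fraction(counts[(True, True)], used_ai)               # Pr[B | A]
p_b_given_not_a = Fraction(counts[(False, True)], total - used_ai)  # Pr[B | not A]

print(p_a, p_a_given_b)              # 9/10 9/10
print(p_b_given_a, p_b_given_not_a)  # 1/5 1/5
```

In this made-up class, 9 out of 10 students with a B+ or better used AI, but that is only because 9 out of 10 students overall used AI. The chance of a good grade is 1/5 with or without AI.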
Scenario 2:#
A Stanford professor interacts with their students, and concludes that young people these days are much more likely to be stressed out than the professor remembers from back in their day.
This is the fallacy of generalizing from a biased sample.
Let \(U\) be the event that a person is a Stanford student. Let \(S\) be the event that a person is stressed out.
The professor has observed \(\Pr[S \mid U]\) to be high, and based on this assumes that \(\Pr[S]\) is high (in the sample space of young people).
But this could be sample bias, as it could be that young people who are not Stanford students are not as stressed and \(\Pr[S]\) is much smaller than \(\Pr[S \mid U]\).
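The gap between \(\Pr[S \mid U]\) and \(\Pr[S]\) can be sketched with the law of total probability. The three input values below are hypothetical, not data; they only show how a small, unusual subgroup can look very different from the whole population.

```python
from fractions import Fraction


def total_probability(p_u, p_s_given_u, p_s_given_not_u):
    """Pr[S] = Pr[S | U] Pr[U] + Pr[S | not U] Pr[not U]."""
    return p_s_given_u * p_u + p_s_given_not_u * (1 - p_u)


# Hypothetical values, not data.
p_u = Fraction(1, 1000)            # Pr[U]: fraction of young people at Stanford
p_s_given_u = Fraction(8, 10)      # Pr[S | U]: what the professor observes
p_s_given_not_u = Fraction(3, 10)  # Pr[S | not U]: everyone else

p_s = total_probability(p_u, p_s_given_u, p_s_given_not_u)
print(p_s, float(p_s))
```

With these inputs, \(\Pr[S]\) is almost exactly \(\Pr[S \mid \overline{U}]\), because the professor's sample makes up such a small part of the sample space.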
Scenario 3:#
A crime scene has footprints from size 14 men’s shoes; less than 5% of men have size 14 shoes.
The primary suspect wears size 14 shoes, but in the city where the crime took place there are at least 25,000 men with size 14 shoes, so there is only a 1/25000 chance that he committed the crime.
This is an example of the defense attorney’s fallacy.
Let \(G\) be the event of having committed the crime, let \(L\) be the event of having size 14 shoes.
The argument here is that, within the sample space of the city where the crime was committed,
\(\Pr[G \mid L] = 1/25000\).
But presumably the primary suspect was not just chosen uniformly among the people who wear size 14 shoes; there is likely other evidence against him as well.
When conditioning on that additional evidence, the likelihood of guilt increases.
Scenario 4:#
Only 7% of people have type O-negative blood. At the scene of a murder, the blood of the victim (type A positive) was found to be mixed with type O-negative blood.
The chief suspect has type O-negative blood. So the chance that the suspect is the perpetrator is 93%.
Let \(O\) be the event of the defendant having type O-negative blood. Let \(G\) be the event of the defendant being guilty.
The argument is that if the man were innocent, the chance of having type O-negative blood is \(\Pr[O \mid \overline{G}] = 0.07\).
The fallacy here is to assume that \(\Pr[\overline{G} \mid O] = 0.07\), and therefore by the law of complements \(\Pr[G \mid O] = 0.93\).
This is the prosecutor’s fallacy; we are actually interested in the chance the defendant is innocent given the blood evidence, \(\Pr[\overline{G} \mid O]\), which is not the same as \(\Pr[O \mid \overline{G}]\).
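To see how far \(\Pr[G \mid O]\) can be from 0.93, here is a sketch using Bayes' rule. It assumes, hypothetically, that before the blood evidence each of `pool_size` people was equally likely to be the perpetrator, and that the perpetrator certainly has type O-negative blood; neither assumption comes from the scenario.

```python
from fractions import Fraction


def posterior_guilt(pool_size, p_o_given_innocent=Fraction(7, 100)):
    """Pr[G | O] by Bayes' rule under the assumptions stated above."""
    prior_guilt = Fraction(1, pool_size)  # Pr[G] before the blood evidence (assumed)
    p_o_given_guilty = Fraction(1)        # Pr[O | G] (assumed)
    joint_guilty = p_o_given_guilty * prior_guilt
    joint_innocent = p_o_given_innocent * (1 - prior_guilt)
    return joint_guilty / (joint_guilty + joint_innocent)


for pool_size in (2, 100, 1000):
    p = posterior_guilt(pool_size)
    print(pool_size, p, float(p))
```

With 1,000 equally plausible people, the posterior is \(100/7093\), or about 1.4%. The 7% figure only gets close to implying 93% guilt when there is just one other plausible person.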
Scenario 5:#
A recruiter is reviewing applications for a job, for which only 10% of the applicant pool is actually qualified.
The recruiter automatically marks any applicant with a GPA \(\ge 3.7\) as “likely qualified”, reasoning that 3.7 is the GPA associated with an A- average or above, and an A- is a good grade.
This is an example of the base rate fallacy.
Let \(Q\) be the event that the candidate is qualified, and let \(A\) be the event of having a GPA at least 3.7.
The recruiter reasons that \(\Pr[A \mid Q]\) is high. But the recruiter then goes on to assume that \(\Pr[Q \mid A]\) must be similarly high.
This neglects the base rate for \(Q\) and also the base rate for \(A\)!
If almost everyone has a high GPA because of grade inflation, and only 10% of the applicant pool is qualified, then \(\Pr[Q \mid A]\) would be small.
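The last point can be checked with Bayes' rule. In the sketch below, only the 10% base rate comes from the scenario; the values for \(\Pr[A \mid Q]\) and \(\Pr[A \mid \overline{Q}]\) are hypothetical.

```python
from fractions import Fraction


def posterior_qualified(p_q, p_a_given_q, p_a_given_not_q):
    """Bayes' rule: Pr[Q | A] from Pr[A | Q], Pr[A | not Q], and the base rate Pr[Q]."""
    p_a = p_a_given_q * p_q + p_a_given_not_q * (1 - p_q)  # base rate of a high GPA
    return p_a_given_q * p_q / p_a


p_q = Fraction(1, 10)  # from the scenario: 10% of the pool is qualified

# Hypothetical: with grade inflation, 80% of unqualified applicants also have a high GPA.
inflated = posterior_qualified(p_q, Fraction(9, 10), Fraction(8, 10))

# Hypothetical: a high GPA is rare (10%) among unqualified applicants.
selective = posterior_qualified(p_q, Fraction(9, 10), Fraction(1, 10))

print(inflated, selective)  # 1/9 1/2
```

Even when a high GPA is much rarer among unqualified applicants, the low base rate keeps \(\Pr[Q \mid A]\) at only 1/2 in this sketch, well below the assumed \(\Pr[A \mid Q] = 0.9\).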