Lecture 13: Conditional Probability#

STATS 60 / STATS 160 / PSYCH 10

Concepts and Learning Goals:

Conditional probability
Independence
Bayes’ rule

Probability and partial information#

Stock market: we made an investment based on a prediction that a company will do well. How should we update our forecasted profits based on news stories?
- Example: any company whose costs depend on imports or exports is sensitive to tariffs.
- Example: any company that makes foodstuffs from grains is sensitive to the weather and to the war in Ukraine.
Weather: yesterday you read that there was a 50% chance of rain today. When you woke up there were no clouds. Now you think rain is less likely.
Texas Hold ‘Em: in this variant of poker, each player’s hand is made up of two private “hole” cards and five shared “community” cards, which are revealed in stages. As the community cards are revealed, you learn more about how strong your hand is.

Conditioning is the act of updating probabilities based on partial information.

Conditional probability#

For events $A,B$, the conditional probability of $A$ given $B$ is the probability that $A$ happens, given that we know $B$ happens. We write

\[ \text{the conditional probability of $A$ given $B$ = } \Pr[A \mid B] \]

Sometimes we simply say “the probability of $A$ given $B$.”

Example: poker hands#

Example: In poker, your own hand gives you information about other players’ hands!

Let $A$ be the event that your rival has at least one ace.

Let $B$ be the event that your $5$-card hand has all $4$ aces in it.

What is the sample space? How many outcomes are there?

The sample space is the set of all pairs of 5-card hands: yours and your rival’s.

By the multiplication rule, if we keep track of the order in which the 10 cards are dealt, there are $52 \times 51 \times \cdots \times 43$ outcomes.
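
As a quick check of the multiplication rule, the count can be computed directly. This is a minimal Python sketch; it assumes, as the product above does, that the order in which the 10 cards are dealt matters. The variable names are chosen here for illustration only.

```python
import math

# Number of ways to deal 10 cards in order from a 52-card deck:
# 52 * 51 * ... * 43 (ten factors).
num_outcomes = math.perm(52, 10)

# The same number, written out as the explicit product 43 * 44 * ... * 52.
explicit_product = math.prod(range(43, 53))

print(num_outcomes == explicit_product)  # True
```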

What is the probability of $A$, $\Pr[A]$?

It seems easier to calculate the probability of the complement, so we use the rule of complements:

\[\Pr[A] = 1-\Pr[\text{rival has 0 aces}]\]

\[ = 1 - \frac{\text{\# outcomes without any aces}}{\text{total \# outcomes}} \]

\[ = 1 - \frac{48 \times 47 \times 46 \times 45 \times 44}{52 \times 51 \times 50 \times 49 \times 48} \approx 0.34.\]

Suppose now we look at our cards and see that $B$ happened. Given this new information, is it still reasonable to estimate the chance of $A$ as about $0.34$?

No! If $B$ happens, then for sure your opponent cannot have any aces! So

\[\Pr[A\mid B] = 0.\]
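
The two probabilities in this example can be checked by counting hands. The sketch below is one way to do it in Python, assuming the rival's 5 cards are a uniformly random subset of whatever cards are available; the variable names are illustrative and not part of the lecture.

```python
from fractions import Fraction
from math import comb

# Unconditional: the rival's 5 cards are a random 5-card subset of the 52-card deck,
# and 48 of those cards are not aces.
p_no_ace = Fraction(comb(48, 5), comb(52, 5))
p_a = 1 - p_no_ace

# Conditional on B: your hand holds all 4 aces, so the rival's 5 cards come from
# the 47 remaining cards, none of which is an ace.
aces_left = 0
non_aces_left = 47
p_no_ace_given_b = Fraction(comb(non_aces_left, 5), comb(aces_left + non_aces_left, 5))
p_a_given_b = 1 - p_no_ace_given_b

print(round(float(p_a), 3))  # 0.341
print(p_a_given_b)           # 0
```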

Conditioning can dramatically change probabilities#

Conditioning on the information you know can dramatically change probabilities!

Question: Take a moment to think of an example of an uncertain situation from real life, in which you learned information $B$ that dramatically changed your estimate of whether some $A$ was going to happen.

Conditioning as “zooming in”#

Conditioning on $B$ is like “zooming in” to the set of outcomes in event $B$.

Basically, we are taking all outcomes that are not in $B$, and eliminating them from the sample space.

Zooming in on $B$. Image credit to Blitzstein and Hwang, chapter 2.

Consequently, we have the rule:

\[ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} \]

Can you justify this formula?
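
One way to justify the formula is a short counting argument. It assumes that all outcomes in the sample space are equally likely, and it writes $\Omega$ for the sample space and $|E|$ for the number of outcomes in an event $E$ (notation introduced only for this sketch). After zooming in on $B$, the outcomes where $A$ happens are exactly those in $A \cap B$, so

\[ \Pr[A \mid B] = \frac{|A \cap B|}{|B|} = \frac{|A \cap B| / |\Omega|}{|B| / |\Omega|} = \frac{\Pr[A \cap B]}{\Pr[B]}. \]

The formula only makes sense when $\Pr[B] > 0$.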

Example: Two coinflips#

Suppose I flip a fair coin twice.

Two coin flips.

Let $A$ be the event that the first coinflip comes up heads, and let $B$ be the event that at least one of the coinflips comes up heads.

What is $\Pr[A \mid B]$?
What is $\Pr[B \mid A]$?

Remember how we zoom in: $$\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]}$$

Answers:

\[\Pr[A \cap B] = \frac{1}{2}, \qquad \Pr[A] = \frac{1}{2},\qquad \Pr[B] = \frac{3}{4}\]

Plugging in,

\[\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{\frac{1}{2}}{\frac{3}{4}} = \frac{2}{3}, \quad \text{ and } \Pr[B \mid A] = \frac{\Pr[A \cap B]}{\Pr[A]} = \frac{\frac{1}{2}}{\frac{1}{2}} = 1.\]

Notice that:

\[\Pr[A \mid B] \neq \Pr[B \mid A]\]

People confuse $\Pr[A \mid B]$ with $\Pr[B \mid A]$ all the time. The notation is confusingly similar, but the two quantities are often very different.
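
The coin-flip answers can also be found by listing the sample space and zooming in. The following is a minimal Python sketch under the assumption that the four outcomes are equally likely; the helper names `prob` and `cond_prob` are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: the four equally likely outcomes of two fair coin flips.
outcomes = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

def cond_prob(a, b):
    """Pr[a | b] = Pr[a and b] / Pr[b], assuming Pr[b] > 0."""
    return prob(a & b) / prob(b)

A = {o for o in outcomes if o[0] == "H"}  # first flip is heads
B = {o for o in outcomes if "H" in o}     # at least one flip is heads

print(cond_prob(A, B))  # 2/3
print(cond_prob(B, A))  # 1
```

Swapping the arguments of `cond_prob` changes the denominator, which is why the two conditional probabilities differ.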

Example: Distracted driving#

Let $A$ be the event that you are driving distracted, and let $B$ be the event that you get in a car accident.

According to these stats from the National Highway Traffic Safety Administration, in recent years about $13\%$ of car accidents involve distracted driving.

How would you phrase this in the language of conditional probability?
What is $\Pr[B \mid A]$, in plain English?
What do you think happens more frequently: distracted driving, or car accidents?
Do you think $\Pr[B \mid A]$ is smaller, larger, or no different than $\Pr[A \mid B]$?

The gray box represents all instances of driving. The orange circle is the instances of distracted driving, the red is drives that result in accidents.

Answers:

\[\Pr[ A \mid B] = 0.13\]
The chance that you have a crash if you are driving distracted.
Most likely distracted driving.
Probably smaller. People drive distracted all the time. If they crashed every time, the roads would be insanely dangerous.

Example: the gateway drug#

Sometimes you’ll hear:

“Marijuana is a gateway drug: 9 out of 10 hard drug addicts tried marijuana first.”

$B$ is the event of being a hard drug addict. $A$ is the event of trying marijuana before trying hard drugs.

How would you phrase this in the language of conditional probability?
What is $\Pr[B \mid A]$, in plain English?
Which do you think is more common: people trying marijuana, or hard drug addiction?
Do you think $\Pr[B \mid A]$ is smaller, larger, or no different than $\Pr[A \mid B]$?

Answers:

\[\Pr[ A \mid B] = 9/10\]
The chance of becoming a hard drug addict if you try marijuana.
Most likely trying marijuana.
Probably smaller. Trying marijuana is very common. Hard drug addiction is less common.

Example: OJ Simpson’s trial#

Recall the OJ Simpson trial: the prosecution gave evidence that OJ had abused his wife, and the defense argued that only 1 in 2,500 abused women are murdered by their husbands.

Let’s see how to put this in the language of conditional probabilities.

Let $A$ be the event that a woman is abused, let $M$ be the event that the woman is murdered, and let $G$ be the event that her abuser is guilty of murdering her.

The gray rectangle represents all abused women. The small purple circle is the set of abused women who are murdered, and the red circle is the set of abused women murdered by their partners.

The prosecution gave convincing evidence that in the case of OJ’s wife, the event $A$ had occurred.
The defense argued that only $1/2500$ abused women are murdered by their husbands. How would you phrase this using conditional probabilities?

\[\Pr[G \mid A] \le \frac{1}{2500}.\]

But we know that not only did the event $A$ occur, but $M$ occurred as well! When we condition on both $M$ and $A$, we have that

\[ \Pr[G \mid A \cap M] = \frac{8}{9}. \]

This is an example where we dramatically underestimate a probability by failing to condition on available information! More on this in the coming lectures.

When conditioning has no impact#

Question: Can you think of events $A,B$, where conditioning on $B$ has no impact on the probability of $A$?

Independent events#

We say $A,B$ are independent events if $\Pr[A \mid B] = \Pr[A]$.

Example: You toss a fair coin twice. Let $A$ be the event that the first toss comes up heads, and let $B$ be the event that the second toss comes up heads.

Intuitively, these events are independent; the fact that $B$ happened gives us no information about whether $A$ happened.

We can also verify that the calculation of conditional probability comes out as we would expect:

\[\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{\frac{1}{4}}{\frac{1}{2}} = \frac{1}{2} = \Pr[A].\]
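
The independence check can be repeated by enumeration. This is a small Python sketch, assuming equally likely outcomes; the event $C$ ("at least one heads") is borrowed from the earlier two-coinflip example for contrast, and the product form $\Pr[A \cap B] = \Pr[A]\Pr[B]$ is the standard equivalent condition when $\Pr[B] > 0$, which is not stated in these slides.

```python
from fractions import Fraction
from itertools import product

# Sample space: the four equally likely outcomes of two fair coin tosses.
outcomes = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[0] == "H"}  # first toss is heads
B = {o for o in outcomes if o[1] == "H"}  # second toss is heads
C = {o for o in outcomes if "H" in o}     # at least one toss is heads

# A and B are independent: conditioning on B leaves Pr[A] unchanged.
print(prob(A & B) / prob(B) == prob(A))   # True
print(prob(A & B) == prob(A) * prob(B))   # True (equivalent product form)

# A and C are not independent: conditioning on C changes Pr[A].
print(prob(A & C) / prob(C) == prob(A))   # False
```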

Bayes’ rule#

Testing for disease#

Bayes’ rule gives us a way to figure out $\Pr[B \mid A]$ from $\Pr[A \mid B]$.

Example: A doctor orders a test for a patient to detect a rare disease. The test is 95% accurate. The disease affects 1% of the population.

The test comes back positive.

Question: How confident should the doctor be that the patient has the disease, given that the test came back positive?

Confidence vs. test accuracy#

Example: A doctor orders a test for a patient to detect a rare disease. The test is 95% accurate. The disease affects 1% of the population. The test comes back positive. How confident should the doctor be that the patient has the disease?

Answer: For many people, their first impulse is to say the doctor should be 95% confident.

But as we will soon see, this is the common mistake of confusing $\Pr[A \mid B]$ with $\Pr[B \mid A]$.

The blue-red region is true positives. The blue-only region is false positives.

In the language of conditional probability#

Let $A$ be the event that the patient has the rare disease. Let $B$ be the event that the test is positive. The test is 95% accurate.

How can we express the accuracy of the test in the language of conditional probabilities?
How can we express our confidence that the patient has the disease, given that the test is positive, in the language of conditional probabilities?
How would you express $\Pr[\overline{A} \mid B]$ in plain English?

We can express the test accuracy by saying that

\[\Pr[B \mid A] = 0.95 \quad \text{and} \quad \Pr[B \mid \overline{A}] = 0.05\]

This is a property of the test, determined previously in clinical trials.

Our confidence that the patient has the disease given that the test was positive is $$\Pr[A \mid B].$$

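$\Pr[\overline{A} \mid B]$ is the chance that the patient does not have the disease given that the test is positive, that is, the chance that a positive result is a false positive.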

Why is test accuracy not the same as confidence?#

We know that the test is 95% accurate, in the sense that $\Pr[B \mid A] = 0.95$.

But, taking the test accuracy for our confidence $\Pr[A \mid B]$ ignores the fact that the disease is very rare, affecting only $1\%$ of the population.

In the language of probability, $\Pr[A] = 0.01$.

Consider the following picture:

The disease#

The red region represents the 0.01 fraction of people with the disease.

The test#

The dotted region represents the 0.05 fraction of time that the test is wrong.

The positive test#

The blue region represents the event that the test is positive. The blue-red region is true positives. The blue-only region is false positives.

Even though the test is 95% accurate, the disease is so rare that most of the time when the test is positive (when $B$ occurs), it is actually a false positive.
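
The picture can be turned into head counts. The sketch below uses a hypothetical population of 10,000 people, chosen only so that the counts come out as whole numbers; the rates are the ones from the example ($1\%$ have the disease, $95\%$ of sick people and $5\%$ of healthy people test positive).

```python
# Hypothetical population size, chosen only to make the counts whole numbers.
population = 10_000

sick = population * 1 // 100           # 1% have the disease: 100 people
healthy = population - sick            # the other 9,900 people

true_positives = sick * 95 // 100      # 95% of sick people test positive: 95
false_positives = healthy * 5 // 100   # 5% of healthy people test positive: 495

positives = true_positives + false_positives
print(true_positives, false_positives)       # 95 495
print(round(true_positives / positives, 2))  # 0.16
```

Most of the 590 positive tests come from the large healthy group, even though each healthy person is unlikely to test positive.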

Bayes’ Rule#

Bayes’ Rule is the following rule for computing conditional probabilities:

\[ \Pr[A \mid B] = \Pr[B \mid A] \cdot \frac{\Pr[A]}{\Pr[B]}. \]

A Venn diagram illustrating Bayes’ rule. If we know $\Pr[B \mid A]$ (how large $A \cap B$ is as a fraction of $A$), and we know how large $A$ is relative to $B$, then we can figure out $\Pr[A \mid B]$. Image credit: Wikipedia.

So if we know the test accuracy, and we know how rare the disease is, we can decide how confident to be in the positive test result.

Computing the chance of a true positive#

The disease affects $1\%$ of the population, so $\Pr[A] = 0.01$.

The test is $95\%$ accurate, so $\Pr[B \mid A] = 0.95$.

Bayes’ rule gives us that

\[ \Pr[A \mid B] = \Pr[B \mid A] \frac{\Pr[A]}{\Pr[B]} = 0.95 \cdot \frac{0.01}{\Pr[B]}. \]

Figuring out $\Pr[B]$#

We can figure this out from the information we already have. We can use the law of total probability:

\[ \Pr[B] = \Pr[B \cap A] + \Pr[B \cap \overline{A}] \]

And then the definition of conditional probability:

\[ = \Pr[B\mid A]\cdot \Pr[A] + \Pr[B \mid \overline{A}] \cdot \Pr[\overline{A}] \]

And the law of complements:

\[ = \Pr[B\mid A]\cdot \Pr[A] + \Pr[B \mid \overline{A}] \cdot (1-\Pr[A]) \]

And finally, since the test is 95% accurate, $\Pr[B \mid A] = 0.95$ and the probability of a positive test for a patient without the disease is $\Pr[B \mid \overline{A}] = 0.05$, so we can plug in

\[ = 0.95\cdot 0.01 + 0.05 \cdot 0.99 = 0.059. \]

Final answer#

Knowing that $\Pr[B] = 0.059$, we return to Bayes’ rule, $$ \Pr[A \mid B] = \Pr[B \mid A] \cdot \frac{\Pr[A]}{\Pr[B]} = 0.95 \cdot \frac{0.01}{0.059} \approx 0.16. $$
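
The whole calculation fits in a few lines of code. This is a minimal Python sketch of the steps above: the law of total probability for $\Pr[B]$, then Bayes’ rule. The function name `posterior` and its parameter names are made up for this illustration.

```python
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """Pr[A | B] via Bayes' rule, with Pr[B] from the law of total probability."""
    # Pr[B] = Pr[B | A] * Pr[A] + Pr[B | not A] * (1 - Pr[A])
    p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    # Pr[A | B] = Pr[B | A] * Pr[A] / Pr[B]
    return p_pos_given_disease * prior / p_pos

p = posterior(prior=0.01, p_pos_given_disease=0.95, p_pos_given_healthy=0.05)
print(round(p, 2))  # 0.16
```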

Takeaway #1: The base rate for the disease matters!#

Because the disease is rare, even though the test is 95% accurate, we can only be 16% confident that the patient actually has the disease!

Given a positive test, the chance that it is a false positive is $84\%$!
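
To see how much the base rate matters, the same calculation can be repeated for several values of $\Pr[A]$. In this Python sketch only the base rate $0.01$ comes from the example; the other base rates are hypothetical, and the function name `posterior` is made up for this illustration.

```python
def posterior(prior, p_pos_given_disease=0.95, p_pos_given_healthy=0.05):
    """Pr[A | B] for a test with the accuracy from the example above."""
    p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_disease * prior / p_pos

# Hypothetical base rates; only 0.01 comes from the example.
for prior in [0.001, 0.01, 0.1, 0.5]:
    print(f"Pr[A] = {prior}: Pr[A | B] = {posterior(prior):.2f}")
```

With the same test, the confidence after a positive result ranges from about $2\%$ for a base rate of $0.001$ to $95\%$ for a base rate of $0.5$.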

Takeaway #2: Bayes’ rule#

Bayes’ rule lets us compute $\Pr[A \mid B]$ from $\Pr[B \mid A]$ without having to think hard about the sample space!

A second example: Steve#

This example is based on a lesson on Bayes’ theorem by 3Blue1Brown.
Steve is very shy and withdrawn, invariably helpful but with very little interest in people or in the world of reality. A polite and tidy soul, he has a need for order and structure, and a passion for detail.
Is it more likely that Steve is a librarian or a farmer?

Baseline#

There are roughly 20 times as many farmers as librarians.

The sample space of librarians and farmers

\[\mathrm{Pr}[\text{librarian}] = \frac{1}{21}\]

Updating#

The description of Steve would match a higher proportion of librarians than farmers.

\[\mathrm{Pr}[\text{description} \mid \text{librarian}] = \frac{4}{10},\quad \mathrm{Pr}[\text{description}\mid \text{farmer}] = \frac{1}{10}\]

Librarians and farmers

Question: how many librarians match the description? How many farmers match the description?

Bayes’ Rule#

Bayes Rule

\[\Pr[\text{librarian} \mid \text{description}] = \frac{4}{24} = 16.7\% \]

Bayes’ Rule#

\[\begin{align*} \Pr[\text{librarian} \mid \text{description}] &=\Pr[\text{description} \mid \text{librarian}] \frac{\Pr[\text{librarian}]}{\Pr[\text{description}]}\\ & = \frac{4}{10} \cdot \frac{\frac{1}{21}}{\frac{4}{10}\cdot \frac{1}{21} + \frac{1}{10} \cdot \frac{20}{21} } \\ &= \frac{4}{24} \end{align*}\]
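
The arithmetic for Steve can be checked with exact fractions. This is a small Python sketch using the numbers from the slides; the variable names are illustrative.

```python
from fractions import Fraction

p_librarian = Fraction(1, 21)
p_farmer = 1 - p_librarian                # 20/21
p_desc_given_librarian = Fraction(4, 10)
p_desc_given_farmer = Fraction(1, 10)

# Law of total probability: Pr[description].
p_desc = p_desc_given_librarian * p_librarian + p_desc_given_farmer * p_farmer

# Bayes' rule: Pr[librarian | description].
p_librarian_given_desc = p_desc_given_librarian * p_librarian / p_desc

print(p_librarian_given_desc)                     # 1/6
print(p_librarian_given_desc == Fraction(4, 24))  # True
```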

Bayes’ Rule#

The "heart of Bayes' rule"

\[\Pr[A \mid B] = \frac{\Pr[A]\Pr[B \mid A]}{\Pr[B]}= \frac{\Pr[A]\Pr[B \mid A]}{\Pr[A]\Pr[B \mid A]+\Pr[\overline{A}]\Pr[B \mid \overline{A}]}\]

Recap#

Conditional probability
- updating probability to account for new information
- formulas for calculating
$\Pr[A \mid B]$ is not the same thing as $\Pr[B \mid A]$!
Bayes’ Rule
