Frequency as probability
Rules of probability
Reading: Chapters 13 & 14 of Freedman, Pisani and Purves
If you flip a fair coin many times, the long-run proportion of heads will be 50%.
Rolling a fair 6-sided die will result in a long-run proportion of 1's of
\[1/6=16 \frac{2}{3}\%.\]
Range of values: Chances are between 0 % and 100 % (i.e. between 0 and 1).
Opposites: The chance of something equals 100 % minus the chance of the opposite thing.
The chance of not getting a 1 when rolling a die is
\[ 1 - \frac{1}{6} = 83.33\% \]
Box #1 (large) :30 blue, 20 red
Box #2 (small): 3 blue, 2 red
If you have to draw a blue marble to win, which box would you choose?
\[\frac{\# \ \text{blue marbles}}{\# \ \text{marbles}} = 60\%\]
What if you draw 5 marbles with replacement?
Without replacement?
\[ 1 - \left(\frac{2}{5}\right)^5 \]
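The formula above is the chance of winning (drawing at least one blue) in 5 draws with replacement. A quick exact check, with our own variable names:

```python
from fractions import Fraction

# Chance of at least one blue in 5 draws WITH replacement from the
# small box (3 blue, 2 red). Variable names are our own.
p_red = Fraction(2, 5)        # a single draw is red
p_lose = p_red ** 5           # all 5 draws red (draws are independent)
p_win = 1 - p_lose            # at least one draw is blue
print(float(p_win))           # 0.98976
```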
Without replacement, the small box will be easier to win with. Why?

Suppose our experiment consists of drawing a ticket out of a hat with 20 tickets numbered 1 to 20 in it. We are going to draw 3 tickets.
Describe the hat after each draw if we draw with replacement. What are the possible outcomes of the experiment?
Before and after each draw, the hat has 20 tickets in it. The possible outcomes are triples of numbers from 1 to 20: (1,1,4),(2,3,4), etc.
If instead we draw without replacement, the hat has 19 tickets after the first draw, 18 after the second, and 17 after the third. The possible outcomes are all triples of distinct numbers from 1 to 20; ties such as (1,1,4) are impossible.
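The two outcome counts (with and without replacement) can be checked by enumeration; a sketch:

```python
from itertools import product, permutations

tickets = range(1, 21)  # tickets numbered 1 to 20

# With replacement: ordered triples, repeats allowed
with_repl = list(product(tickets, repeat=3))
# Without replacement: ordered triples, no ties
without_repl = list(permutations(tickets, 3))

print(len(with_repl))     # 8000 = 20**3
print(len(without_repl))  # 6840 = 20*19*18
```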
Observing some information can change the chances of something.
We already saw this in the marble example. If drawing without replacement, suppose the first draw was red. What are the chances a blue marble is drawn on the second draw?
What if we draw with replacement?
In this example, we are given that the first draw was red. These chances are conditioned on knowing the first draw was red.
The chance that two things will both happen equals the chance that the first will happen, multiplied by the chance that the second will happen given the first has happened.
Winning is the opposite of losing, so let's compute the chance of losing when drawing twice, starting with the small box.
The chances of losing when drawing twice from the small box are
\[ \frac{2}{5} \cdot \frac{1}{4} = 0.1 \]
and from the large box are
\[ \frac{20}{50} \cdot \frac{19}{49} \approx 0.155 \]
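These two losing chances can be computed exactly; a sketch:

```python
from fractions import Fraction

# Chance of losing (no blue) in two draws without replacement.
# Small box: 2 red out of 5; large box: 20 red out of 50.
lose_small = Fraction(2, 5) * Fraction(1, 4)     # = 1/10
lose_large = Fraction(20, 50) * Fraction(19, 49)
print(float(lose_small))            # 0.1
print(round(float(lose_large), 3))  # 0.155
```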
"winning when drawing twice from the small box" is called an event;
"first draw from the small box is red" and "second draw with replacement from the small box is blue" are also events;
We usually write \(P\) for "chances". For an event \(E\)
\[ P(E) = \text{chances $E$ occurs}. \]
We write the chances of \(A\) given that \(B\) has occurred as
\[ P(A \vert B). \]
\[P(A \, \text{and} \, B) = P(A \cap B) = P(A \vert B) \times P(B). \]
The chance that something occurs is 100%.
Example: when we drew marbles, the chances we draw a marble whose color is blue or red is 100 %.
In mathematical notation, we often use \(S\) for "something" or the sample space
\[P(S) = 100\% \qquad (= 1)\]
What is the sample space if the experiment is to draw 3 balls without replacement from the small box?
What if we draw them with replacement?
If we draw from the small box without replacement, we will draw a blue ball within the first three draws.
\[P(\text{one of the first three balls is blue}) = 100 \%\]
\[\begin{aligned} P(\text{first blue ball is on 1st draw }) &= \frac{3}{5} \\ P(\text{first blue ball is on 2nd draw}) &= \frac{2}{5} \times \frac{3}{4} = \frac{3}{10} \\ P(\text{first blue ball is on 3rd draw}) &= \frac{2}{5} \times \frac{1}{4} = \frac{1}{10} \\ \end{aligned}\]
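As a sanity check, these three chances add up to 100%:

```python
from fractions import Fraction

# Small box without replacement: 3 blue, 2 red.
p_1st = Fraction(3, 5)                   # first blue on draw 1
p_2nd = Fraction(2, 5) * Fraction(3, 4)  # red, then blue
p_3rd = Fraction(2, 5) * Fraction(1, 4)  # red, red, then blue for sure
print(p_1st + p_2nd + p_3rd)             # 1
```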
When can we add probabilities of different events?
When rolling a die, the events \(E_1= \text{roll is 4}\) , \(E_2=\text{roll is 3}\) are mutually exclusive because the result of the roll cannot be 4 and 3 simultaneously.
Therefore
\[ P(\text{roll is 3 or 4}) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}. \]


If \(E_1\) and \(E_2\) are mutually exclusive, then
\[ P(E_1 \ \text{or} \ E_2 \ \text{occurs}) = P(E_1) + P(E_2).\]
More generally, if \(E_1, \dots, E_n\) are pairwise mutually exclusive,
\[ P(E_1 \ \text{or} \ E_2 \ \text{or} \ \dots \ \text{or} \ E_n) = \sum_{i=1}^nP(E_i).\]
We often write â\(E_1\) or \(E_2\)â as \(E_1 \cup E_2\) and â\(E_1\) and \(E_2\)â as \(E_1 \cap E_2\).
The events \(E_1, E_2\) are mutually exclusive if \(E_1 \cap E_2\) is empty.
From a Venn diagram, we can deduce the general form of the addition rule
\[P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2).\]

For three events, the addition rule becomes
\[ \begin{aligned} P(E_1 \cup E_2 \cup E_3) &= P(E_1) + P(E_2) + P(E_3) \\ & - P(E_1 \cap E_2) - P(E_1 \cap E_3) - P(E_2 \cap E_3) \\ & + P(E_1 \cap E_2 \cap E_3) \end{aligned} \]

Dropping the subtracted terms gives an upper bound, the union bound:
\[ P(E_1 \cup E_2) \leq P(E_1) + P(E_2) \]
\[ P\left(\cup_{i=1}^n E_i \right) \leq \sum_{i=1}^n P(E_i). \]
Google says there is a 40% chance of rain on Friday, a 10% chance of rain on Saturday and a 0% chance of rain on Sunday.
Google may be wrong about these chances! BUT, if Google's weather predictions follow the rules of probability, we should expect it to say
\[ P(\text{rain on at least one of Friday, Saturday or Sunday}) \leq 50\%. \]
Google predicts 100% chance of precipitation on Tuesday, 10% on Wednesday and 80% on Thursday.
The union bound says there is at most a 190% chance of rain on at least one of Tuesday, Wednesday or Thursday.
BUT, we know there is at most a 100% chance of rain on at least one of Tuesday, Wednesday or Thursday.
Our previous example used Google's estimates of the chance of rain.
Their forecast team may or may not be good; they are likely somewhat incorrect.
BUT, their team probably has its own way of computing probabilities.
We used the rules of probability to say something about chances of rain on the weekend.
In tossing a coin once, the sample space is \(\{\text{head, tail}\}\).
We will typically assume \(P(\text{head})=P(\text{tail})=50\%\).
There are weighted coins out there, for which \(Q(\text{head})\) might be 55% (so \(Q(\text{tail})\) is 45%). We use the letter \(Q\) because it is a different way of computing chances than \(P\).
What if we didnât know whether a coin is fair or weighted? How might we decide?
Two events \(A\) and \(B\) are independent if
\[P(A \vert B)=P(A)\]
\[P(A \cap B) = P(A \vert B) \times P(B) = P(A) \times P(B).\]
Let's go back to drawing marbles from a box.
When drawing with replacement, the events
\[\begin{aligned} A &= \text{first draw is red} \\ B &= \text{second draw is blue} \end{aligned}\]
are independent.
We can even conclude that all the draws are independent in this case.
When drawing without replacement the events \(A\) and \(B\) are dependent. Show this.
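One way to tabulate the comparison (a sketch; here \(P(B) = 3/5\) in both cases):

```python
from fractions import Fraction

# Small box: 3 blue, 2 red.  A = first draw red, B = second draw blue.
p_B = Fraction(3, 5)

# With replacement the box is restored before the second draw,
# so P(B | A) = 3/5 = P(B): the events are independent.
p_B_given_A_repl = Fraction(3, 5)
print(p_B_given_A_repl == p_B)    # True

# Without replacement, removing a red leaves 3 blue of 4 marbles,
# so P(B | A) = 3/4 != P(B): the events are dependent.
p_B_given_A_norepl = Fraction(3, 4)
print(p_B_given_A_norepl == p_B)  # False
```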
We now know that when drawing with replacement, the probability of drawing 5 red balls in a row is
\[ \left(\frac{2}{5}\right)^5 \]
When performing an experiment (drawing a sample) where each outcome is equally likely, we can compute probabilities by counting.
Example: when rolling two dice, what is the probability of obtaining a sum of 9?
For such experiments \[P(E) = \frac{\# E}{\# S}\] where \(S\) is the set of all possible outcomes (our sample space).
| Die 1 \ Die 2 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | (1,1) | (1,2) | (1,3) | (1,4) | (1,5) | (1,6) |
| 2 | (2,1) | (2,2) | (2,3) | (2,4) | (2,5) | (2,6) |
| 3 | (3,1) | (3,2) | (3,3) | (3,4) | (3,5) | (3,6) |
| 4 | (4,1) | (4,2) | (4,3) | (4,4) | (4,5) | (4,6) |
| 5 | (5,1) | (5,2) | (5,3) | (5,4) | (5,5) | (5,6) |
| 6 | (6,1) | (6,2) | (6,3) | (6,4) | (6,5) | (6,6) |
| Die 1 \ Die 2 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |
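The tables above can also be generated in code and used for counting, e.g. to answer the sum-of-9 question:

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))
nines = [o for o in outcomes if sum(o) == 9]  # (3,6),(4,5),(5,4),(6,3)
print(Fraction(len(nines), len(outcomes)))    # 1/9
```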
\[1-\frac{7}{12}=\frac{5}{12}.\]
Formally, the "opposite" rule is the rule of complements.
We write the complement of an event \(E\) as \(E^c\)
\[P(\text{not} \, E) = P(E^c).\]
\[P(E^c) = 1 - P(E)\]

For any events \(A\) and \(B\),
\[\begin{aligned} B &= B \cap S \\ &= B \cap (A \cup A^c) \\ &= (B \cap A) \cup (B \cap A^c) \end{aligned}\]
where \(B \cap A\) and \(B \cap A^c\) are mutually exclusive (i.e. disjoint).
\[ P(B) = P(B \cap A) + P(B \cap A^c). \]
For the small box, if we draw with replacement, what are the chances it will take less than 6 draws to draw 1st blue marble?
If \(E\)={takes less than 6 draws to draw 1st blue marble}, then
\[\begin{aligned} E^c &=\{\text{takes 6 or more draws to draw 1st blue marble}\} \\ &=\{\text{first 5 draws are red}\} \\ \end{aligned}\]
\[ P(\text{first 5 draws are red}) = \left(\frac{2}{5}\right)^5 \]
\[P(\text{takes less than 6 draws to draw 1st blue marble}) = 1 - \left(\frac{2}{5}\right)^5 \approx 99\%\]
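A quick Monte Carlo check of this chance (the simulation is our own sketch, not part of the notes):

```python
import random

random.seed(0)  # fixed seed so the check is reproducible
# Draws with replacement from the small box (3 blue, 2 red).
box = ["blue"] * 3 + ["red"] * 2
n_trials = 100_000
wins = 0
for _ in range(n_trials):
    draws = [random.choice(box) for _ in range(5)]
    if "blue" in draws:   # blue appears within the first 5 draws
        wins += 1
print(wins / n_trials)    # close to 1 - (2/5)**5 = 0.98976
```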
For the small box, suppose we draw without replacement and want to compute
\[ P(\text{2nd marble is blue}) \]
\[ \begin{aligned} P(\text{1st marble is red, 2nd marble is blue}) &= \frac{2}{5} \times \frac{3}{4} \\ P(\text{1st marble is not red, 2nd marble is blue}) &= P(\text{1st marble is blue, 2nd marble is blue}) \\ &= \frac{3}{5} \times \frac{2}{4} \end{aligned} \]
\[ P(\text{2nd marble is blue}) = \frac{2}{5} \times \frac{3}{4} + \frac{3}{5} \times \frac{2}{4} = \frac{3}{5} \]
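The same total-probability computation in exact arithmetic:

```python
from fractions import Fraction

# P(2nd marble blue) via the law of total probability,
# small box (3 blue, 2 red) without replacement.
p_red_then_blue = Fraction(2, 5) * Fraction(3, 4)
p_blue_then_blue = Fraction(3, 5) * Fraction(2, 4)
print(p_red_then_blue + p_blue_then_blue)  # 3/5
```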
Credited to Reverend Thomas Bayes
Given two events \(A\) and \(B\)
\[\begin{aligned} P(A \vert B) &= \frac{P(B \, \text{and} \, A)}{P(B)} \\ &= \frac{P(A \cap B)}{P(B)} \\ &= \frac{P(B \vert A)\times P(A)}{P(B)} \end{aligned}\]
\[\begin{aligned} P(B) &= P(B \cap A) + P(B \cap A^c) \\ &= P(B \vert A) \times P(A) + P(B \vert A^c) \times P(A^c) \end{aligned}\]
\[\begin{aligned} P(A \vert B) &= \frac{P(B \vert A) \times P(A)}{P(B \vert A) \times P(A) + P(B \vert A^c) \times P(A^c) } \\ &= \frac{P(B \cap A)}{P(B \cap A) + P(B \cap A^c)} \\ \end{aligned}\]
For the small box, let us apply Bayes' rule with the events
\[\begin{aligned} A&=\{\text{draw a red marble on first draw}\} \\ B&=\{\text{draw a blue marble on second draw}\} \\ \end{aligned}\]
Compute \(P(A \vert B)\).
What do we know?
\[\begin{aligned} P(A) &= \frac{2}{5} \\ P(B \vert A) &= \frac{3}{4} \\ \end{aligned}\]
We need \[P(B) = P(B \vert A) \times P(A) + P(B \vert A^c) \times P(A^c).\]
Note that \(A^c=\{\text{draw a blue marble on first draw}\}\).
We know
\[\begin{aligned} P(A^c) &= \frac{3}{5} \\ P(B \vert A^c) &= \frac{1}{2} \\ \end{aligned}\]
\[\begin{aligned} P(B) &= \frac{3}{4} \times \frac{2}{5} + \frac{1}{2} \times \frac{3}{5} = \frac{3}{5} \\ P(A \vert B) &= \frac{ \frac{3}{4} \times \frac{2}{5}}{\frac{3}{4} \times \frac{2}{5} + \frac{1}{2} \times \frac{3}{5}} = \frac{1}{2} \end{aligned}\]
\[ P(B \cap A) + P(B \cap A^c) = \frac{2}{5} \times \frac{3}{4} + \frac{3}{5} \times \frac{2}{4} = \frac{3}{5} \]
\[ P(A|B) = \frac{\frac{2}{5} \times \frac{3}{4}}{\frac{2}{5} \times \frac{3}{4} + \frac{3}{5} \times \frac{2}{4}} = \frac{1}{2} \]
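The whole Bayes computation, carried out in exact arithmetic:

```python
from fractions import Fraction

# Bayes' rule for the small box without replacement:
# A = first draw is red, B = second draw is blue.
p_A = Fraction(2, 5)
p_Ac = Fraction(3, 5)
p_B_given_A = Fraction(3, 4)
p_B_given_Ac = Fraction(1, 2)

p_B = p_B_given_A * p_A + p_B_given_Ac * p_Ac  # total probability
p_A_given_B = p_B_given_A * p_A / p_B          # Bayes' rule
print(p_B)          # 3/5
print(p_A_given_B)  # 1/2
```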
Suppose a patient from some population is tested for a disease based on some diagnostic test.
The prevalence of the disease is 0.1% in the population.
If a patient has the disease, the test result is positive with probability 95% (the true positive rate).
If a patient does not have the disease, the test result is positive with probability 1% (the false positive rate).
What is the probability a patient has the disease given a positive test result? What if the false positive rate were 0.1%?
\[\begin{aligned} D &= \{\text{patient has disease}\} \\ T^+ &= \{\text{test result is positive}\} \\ \end{aligned}\]
\[\begin{aligned} P(D) &= 0.001 \\ P(T^+ \vert D) &= 0.95 \\ P(T^+ \vert D^c) &= 0.01 \\ \end{aligned}\]
We want to compute \(P(D \vert T^+)\).
By Bayes' theorem
\[\begin{aligned} P(D \vert T^+) &= \frac{P(T^+ \vert D) \times P(D)}{P(T^+ \vert D) \times P(D) + P(T^+ \vert D^c) \times P(D^c)} \\ &= \frac{0.95 \times 0.001}{0.95 \times 0.001 + 0.01 \times 0.999} \\ &= 8.7 \% \end{aligned}\]
\[\begin{aligned} P(D \vert T^+) &= \frac{0.95 \times 0.001}{0.95 \times 0.001 + 0.001 \times 0.999} \\ &= 48.7 \% \end{aligned}\]
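Both answers come from the same formula, so they can be computed with one small helper (the function name and signature are our own):

```python
def posterior(prevalence, true_pos, false_pos):
    """P(disease | positive test) by Bayes' theorem."""
    p_pos = true_pos * prevalence + false_pos * (1 - prevalence)
    return true_pos * prevalence / p_pos

# Numbers from the example above.
print(round(posterior(0.001, 0.95, 0.01), 3))   # 0.087
print(round(posterior(0.001, 0.95, 0.001), 3))  # 0.487
```

Note how sensitive the answer is to the false positive rate: lowering it from 1% to 0.1% raises the posterior from about 8.7% to about 48.7%.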