Lecture 10: Probability

STATS60 so far

  • Unit 1: Thinking About Scale.

    • Putting numbers in context.

    • Fermi estimates.

    • Cost benefit analysis.

  • Unit 2: Exploratory Data Analysis.

    • Data terminology.

    • Data visualization.

    • Summaries of center (mean and median).

    • Summaries of variability (standard deviation, quantiles).

    • Summaries of association (correlation coefficient).

Looking ahead

  • Unit 3: Probability.

    • The mathematics of uncertainty.

    • One of the foundations of statistics.

    • Probability will help us:

      • Assess how likely/unlikely coincidences are.

      • Make decisions when we don’t know all the relevant information.

      • Generalize findings from data to a broader group of people.

Probability

Games of chance

  • Probability theory started with people trying to understand games of chance.

  • The Book on Games of Chance is perhaps the first mathematical text on probability, and it contains a section on how to use probability to cheat!

  • The book’s author, Cardano, had trouble holding down an academic position and made money by gambling and playing chess.

Coins

  • Tossing a coin is a simple game of chance.

  • What is the probability that the coin lands with heads facing up? Why?

Coins

  • The probability is \(1/2\) or \(50\%\).

  • Two possible reasons why:

    1. There are two equally likely outcomes, and “heads facing up” is one of them, so its probability is \(\frac{1}{2}\).

    2. If the coin were flipped many, many times, it would land with heads facing up roughly \(50\%\) of the time.

  • We can ask a computer to simulate flipping many coins and count the number of heads.

Number of tosses    Number of heads    Fraction of heads
10                  7                  0.7
100                 46                 0.46
1 000               534                0.534
10 000              5003               0.5003
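The simulation behind this table could look something like the following minimal Python sketch, which uses the standard `random` module. It is not the code that produced the numbers above; the helper name `fraction_of_heads` and the seed 60 are illustrative assumptions, so the printed fractions will not match the table exactly.

```python
import random

def fraction_of_heads(n_tosses, rng):
    """Flip a fair coin n_tosses times and return the fraction of heads."""
    heads = sum(rng.choice(["Heads", "Tails"]) == "Heads" for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(60)  # fixed (arbitrary) seed so the run is reproducible
for n in [10, 100, 1_000, 10_000]:
    print(n, fraction_of_heads(n, rng))
```

Re-running with a different seed changes the fraction much more for 10 tosses than for 10 000 tosses, which is the settling-down behavior seen in the table.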

Flipping many coins

  • As more and more coins are flipped, the fraction of heads “settles down” near \(1/2\).

Dice

  • A die (plural, “dice”) is a cube with six faces labelled 1, 2, 3, 4, 5, 6.

  • When a die is rolled, the number on the top face “shows”.

  • What is the probability that when a die is rolled it shows a 1? Why?

Dice

  • The probability is \(1/6\) or approximately \(16.7\%\).

  • Again there are two reasons why:

    1. There are six possible outcomes (one for each face) and they are all equally likely. The probability the die shows a 1 is therefore \(1/6\).

    2. If the die were rolled over and over again, then a 1 would show about \(1/6\) of the time.

  • Again, we can get the computer to throw many dice and count the number of 1s.

Number of throws    Number of 1s    Fraction of 1s
10                  1               0.1
100                 16              0.16
1 000               189             0.189
10 000              1759            0.1759
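A similar hedged sketch for dice: the hypothetical helper `fraction_showing` rolls a fair six-sided die with `random.randint(1, 6)` (both endpoints included) and reports the fraction of rolls that land in a given set of faces. The seed is arbitrary, so the output will differ from the table.

```python
import random

def fraction_showing(faces, n_throws, rng):
    """Roll a fair die n_throws times; return the fraction of rolls whose top face is in `faces`."""
    hits = sum(rng.randint(1, 6) in faces for _ in range(n_throws))
    return hits / n_throws

rng = random.Random(60)  # arbitrary seed for reproducibility
for n in [10, 100, 1_000, 10_000]:
    print(n, fraction_showing({1}, n, rng))
```

Passing `{1, 2}` instead of `{1}` simulates the event “the die shows a 1 or a 2”.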

Throwing many dice

  • As more dice are thrown, the fraction of 1s settles down near \(1/6 \approx 0.167\).

Dice again

  • What is the probability that the die shows a 1 or shows a 2?

  • The probability is \(1/3\) or roughly \(33.3\%\).

    1. There are six possible outcomes (one for each face) and they are all equally likely. There is one outcome where the die shows a 1 and one outcome where the die shows a 2. The probability is therefore \(2/6=1/3\).

    2. If the die were rolled over and over again, then a 1 would be on the top face about \(1/6\) of the time and a 2 would be on the top face about \(1/6\) of the time. This means that a 1 or a 2 would be on the top face about \(1/6+1/6=1/3\) of the time.

Number of throws    Number of 1s or 2s    Fraction of 1s or 2s
10                  4                     0.4
100                 38                    0.38
1 000               335                   0.335
10 000              3335                  0.3335

Throwing many dice

  • As more dice are thrown, the fraction of 1s or 2s settles down near \(1/3\).

Definition of probability

Random process and sample space

  • A random process is something that results in a random outcome.

  • The set of possible outcomes is called the sample space.

  • Examples:

    • Flipping a coin is a random process. The sample space contains two possible outcomes “Heads” and “Tails”.

    • Rolling a die is a random process. The sample space contains six possible outcomes: 1, 2, 3, 4, 5, 6.

Events

  • A random process is something that results in a random outcome.

  • The set of all possible outcomes is called the sample space.

  • An event is a collection of some of the possible outcomes.

  • Examples:

    • Rolling a die: “the die shows 1 or 2” is an event (it contains two possible outcomes).

    • Tossing a coin: “the coin lands on heads” is an event (it contains just a single outcome).

Probability

  • A random process is something that results in a random outcome.

  • The set of all possible outcomes is called the sample space.

  • An event is a collection of some of the possible outcomes.

  • If all outcomes are equally likely, then the probability of an event is equal to the number of outcomes in the event divided by the total number of possible outcomes.

  • The probability of an event is written as \(\mathrm{Pr}[\text{event}]\).

\[\mathrm{Pr}[\text{event}] = \frac{\text{number of outcomes in event}}{\text{total number of possible outcomes}}\]

Diagram - sample space

  • The sample space is the set of all possible outcomes.

Diagram - event

  • An event is a collection of some of the possible outcomes.

Die example

  • Rolling a die example:

    • There are 6 total possible outcomes.

    • The event “die shows 1 or 2” contains two outcomes.

    • The probability is \(2/6=1/3\).

\[\mathrm{Pr}[\text{event}] = \frac{\text{number of outcomes in event}}{\text{total number of possible outcomes}}\]
\[\mathrm{Pr}[\text{die shows 1 or 2}] = \frac{2}{6} = \frac{1}{3} \]
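The direct method can be written in a few lines of Python, assuming all outcomes are equally likely. In this sketch the sample space and the event are Python sets, `probability` is a hypothetical helper name, and `fractions.Fraction` keeps the answer as an exact fraction.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Pr[event] for equally likely outcomes: number of outcomes in event / total number of outcomes."""
    return Fraction(len(event), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
print(probability({1, 2}, die))  # → 1/3
```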

Calculating probabilities

There are two main methods for calculating the probability of an event:

  1. Direct: count the number of outcomes in the event and divide by the total number of outcomes.

  2. Simulation: repeat the random process many times and compute the fraction of times when the event occurs.

  • With enough repetitions, these two methods give approximately the same answer, and the agreement improves as the number of repetitions grows.

  • We will use both methods to calculate probabilities.

  • For simulations, there are a few options:

    • You can use websites that do probability simulations.

    • You can do a tactile simulation (flip real coins/roll real dice).

    • We will give you the output of a computer simulation.
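As a rough check that the two methods agree, this Python sketch computes \(\mathrm{Pr}[\text{die shows 1 or 2}]\) directly and by simulation. The function names `direct` and `simulate` and the seed are hypothetical choices; the simulated values vary with the seed and typically get closer to \(1/3\) as `n_trials` grows.

```python
import random
from fractions import Fraction

def direct(event, sample_space):
    """Method 1: count outcomes in the event and divide by the total (equally likely outcomes)."""
    return Fraction(len(event), len(sample_space))

def simulate(event, sample_space, n_trials, rng):
    """Method 2: repeat the random process n_trials times; return the fraction of times the event occurs."""
    outcomes = sorted(sample_space)  # rng.choice needs a sequence, not a set
    hits = sum(rng.choice(outcomes) in event for _ in range(n_trials))
    return hits / n_trials

die = {1, 2, 3, 4, 5, 6}
event = {1, 2}
rng = random.Random(60)  # arbitrary seed
print("direct:", float(direct(event, die)))
for n in [100, 10_000, 100_000]:
    print("simulated:", n, simulate(event, die, n, rng))
```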

Examples

Balls in a bag

  • You are playing a game where you draw a single ball from a bag that contains a mix of white and red balls.

  • You win a dollar if you draw a red ball.

  • Which of the two bags would you prefer to use? Why?

    • Bag A: 2 white balls and 3 red balls.

    • Bag B: 20 white balls and 30 red balls.

Balls in a bag

  • Both bags have the same chance of winning.

  • For bag A:

\[\mathrm{Pr}[\text{draw a red ball}] = \frac{3}{2+3} =\frac{3}{5}= 0.6 \]
  • For bag B:

\[\mathrm{Pr}[\text{draw a red ball}] = \frac{30}{20+30} =\frac{30}{50}= 0.6 \]
  • In probability, the relative number of outcomes is what matters!
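Both calculations, plus a simulation of the game, fit in a short Python sketch. The helper `simulate_draws` is hypothetical and assumes each game starts from a full bag (the drawn ball goes back before the next game).

```python
import random
from fractions import Fraction

# Direct method: the fraction of red balls is the same for both bags.
bag_a = Fraction(3, 2 + 3)
bag_b = Fraction(30, 20 + 30)
print(bag_a, bag_b, bag_a == bag_b)  # → 3/5 3/5 True

def simulate_draws(n_white, n_red, n_draws, rng):
    """Play n_draws games, each drawing one ball from a full bag; return the fraction of red draws."""
    bag = ["white"] * n_white + ["red"] * n_red
    return sum(rng.choice(bag) == "red" for _ in range(n_draws)) / n_draws

rng = random.Random(60)  # arbitrary seed
print(simulate_draws(2, 3, 10_000, rng), simulate_draws(20, 30, 10_000, rng))
```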

Random outfits

  • Suppose that each day you pick a random outfit by:

    • First picking one of 3 pairs of shoes.

    • Then picking one of 2 pairs of pants.

  • What is the total number of possible outfits?

Random outfits

  • The total number of possible outfits is \(3 \times 2 = 6\).

  • This can be visualized with a “decision tree.”

From The Art of Chance

Multiplication rule

  • Consider a compound random process that consists of two smaller random processes: first process \(A\) and then process \(B\).

  • Suppose that process \(A\) has \(a\) possible outcomes, and for each of these outcomes, process \(B\) has \(b\) possible outcomes.

  • Then the compound process has \(a \times b\) possible outcomes.

  • For example:

    • There were \(3\) possible outcomes for the choice of shoes (random process \(A\)).

    • For each of these outcomes, there were \(2\) possible outcomes for the choice of pants (random process \(B\)).

    • The total number of shoes and pants combinations is \(3 \times 2=6\) (compound process).
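The multiplication rule can be checked by listing every combination with `itertools.product`. The shoe and pants names below are made up for illustration; only the counts (3 and 2) come from the slide.

```python
from itertools import product

# Hypothetical labels: the lecture only says there are 3 pairs of shoes and 2 pairs of pants.
shoes = ["flip-flops", "sneakers", "boots"]
pants = ["jeans", "shorts"]

outfits = list(product(shoes, pants))  # every (shoes, pants) pair, like the leaves of the decision tree
print(len(outfits))                    # → 6
print(len(shoes) * len(pants))         # multiplication rule: 3 × 2 → 6
```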

Random outfits again

  • The multiplication rule can also be used when there are more than two processes.

  • Suppose that each day you pick a random outfit by:

    • First picking one of 3 pairs of shoes.

    • Then picking one of 2 pairs of pants.

    • Then picking one of 4 possible shirts.

  • What is the total number of possible outfits?

Random outfits

  • The multiplication rule says that the total number of outfits is \(3 \times 2 \times 4 = 24\).

From The Art of Chance

Random outfits

  • What is the probability that the person wears flip-flops and a long sleeved shirt?

From The Art of Chance
  • There are \(1 \times 2 \times 2 = 4\) outcomes that correspond to flip-flops and a long sleeved shirt. The probability is \(\mathrm{Pr}[\text{flip-flops and long sleeved shirt}] = \frac{4}{24} = \frac{1}{6}\).
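Listing all 24 outfits makes the count of 4 easy to verify. This Python sketch uses hypothetical item names; it assumes, as the count \(1 \times 2 \times 2\) implies, that exactly 2 of the 4 shirts are long sleeved.

```python
from fractions import Fraction
from itertools import product

# Hypothetical labels. Counts match the slides: 3 shoes, 2 pants, 4 shirts,
# of which 2 shirts are long sleeved.
shoes = ["flip-flops", "sneakers", "boots"]
pants = ["jeans", "shorts"]
shirts = ["long-sleeved A", "long-sleeved B", "short-sleeved A", "short-sleeved B"]

outfits = list(product(shoes, pants, shirts))  # 3 × 2 × 4 outfits
event = [o for o in outfits if o[0] == "flip-flops" and o[2].startswith("long-sleeved")]

print(len(outfits))                        # → 24
print(len(event))                          # → 4
print(Fraction(len(event), len(outfits)))  # → 1/6
```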

Polling

  • In the course survey there were:

    • 45 students who think we don’t live in a simulation.

    • 15 students who think we might live in a simulation.

    • 8 students who think we do live in a simulation.

  • Suppose I select one of those students at random. What is the probability that they think we live in a simulation?

\[\mathrm{Pr}[\text{Student thinks we live in a simulation}] = \frac{8}{45+15+8} = \frac{8}{68} = 0.118 \]

Polling two students

  • Now suppose that I pick two different students. What is the probability that they both think we live in a simulation?

  • By the multiplication rule, the total number of possible outcomes is \(68 \times 67\): there are 68 choices for the first student and, since the second student must be different, 67 choices for the second.

  • There are \(8 \times 7\) outcomes where both students think we live in a simulation.

\[\mathrm{Pr}[\text{Both think we live in a simulation}] = \frac{8 \times 7}{68 \times 67} = 0.012\]
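The counting for one and for two students can be checked by brute force in Python. Each of the 68 respondents is one list entry (the labels “don't”, “might” and “do” are shorthand invented for this sketch), and `itertools.permutations(students, 2)` lists every ordered pair of two different students, which matches the \(68 \times 67\) count.

```python
from fractions import Fraction
from itertools import permutations

# One entry per survey respondent, using the counts from the course survey.
students = ["don't"] * 45 + ["might"] * 15 + ["do"] * 8

# One student: outcomes in the event / total outcomes.
p_one = Fraction(students.count("do"), len(students))
print(p_one, float(p_one))

# Two different students, in order: permutations never pick the same position twice.
pairs = list(permutations(students, 2))
both = [pair for pair in pairs if pair == ("do", "do")]
print(len(pairs), len(both))             # → 4556 56
print(round(len(both) / len(pairs), 3))  # → 0.012
```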

Birthdays

  • On Wednesday, we will compute the probability that at least two people in the class have the same birthday.

  • Today, we will compute the probability of some simpler events related to birthdays.

  • What is the probability that a randomly selected person has their birthday today (April 20)?

  • What assumptions do you need to make?

Birthdays

  • Assuming that all birthdays are equally likely and that we can ignore leap years, the probability is:

\[\mathrm{Pr}[\text{Birthday on April 20}] = \frac{1}{365} = 0.002740 \]

  • If we account for leap years and still assume that all birthdays are equally likely, then a four-year cycle has \(4 \times 365 + 1 = 1461\) days, four of which are April 20:

\[\mathrm{Pr}[\text{Birthday on April 20}] = \frac{4}{4\times 365 + 1} = 0.002738 \]
  • Is it reasonable to assume that all birthdays are equally likely? How could we make our answer more realistic?
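A quick Python check of the two answers above, under the same assumptions: every day is equally likely, and the leap-year version uses the simple four-year cycle (this sketch ignores the Gregorian rule that skips leap years in most century years).

```python
from fractions import Fraction

no_leap = Fraction(1, 365)            # one April 20 among 365 equally likely days
with_leap = Fraction(4, 4 * 365 + 1)  # four April 20s in a four-year cycle of 1461 days
print(round(float(no_leap), 6), round(float(with_leap), 6))  # → 0.00274 0.002738
```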

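Birthdays

  • What is the probability that a randomly chosen person has their birthday on the 20th day of the month?

  • Ignoring leap years and assuming all birthdays are equally likely, 12 of the 365 days in a year (one in each month) fall on the 20th:

\[\mathrm{Pr}[\text{Birthday on the 20th}]=\frac{12}{365} = 0.0329\]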
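One way to confirm the count of 12 is to walk through every date in a non-leap year with Python's `datetime` module. The year 2023 is an arbitrary choice of non-leap year for this sketch.

```python
from datetime import date, timedelta
from fractions import Fraction

# Every date in 2023 (a non-leap year, so 365 days starting January 1).
start = date(2023, 1, 1)
days = [start + timedelta(days=i) for i in range(365)]

on_20th = sum(d.day == 20 for d in days)  # one 20th per month
print(on_20th, Fraction(on_20th, 365), round(on_20th / 365, 4))  # → 12 12/365 0.0329
```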

Complements

  • Remember: an event is a collection of possible outcomes.

  • The complement of an event is the collection of all outcomes not in the original event.

  • For the event “the die shows 1 or 2”, the complement is “the die shows 3, 4, 5 or 6”.

  • You can think of the complement event as the opposite event.

Sample space and complements

Probability of complements

  • Calculate the probabilities of the following events. What do you notice?

    • The die shows 1 or 2.

    • The die shows 3, 4, 5, or 6.

\[\mathrm{Pr}[\text{die shows 1 or 2}] = \frac{2}{6}=\frac{1}{3}\]
\[\mathrm{Pr}[\text{die shows 3, 4, 5 or 6}] = \frac{4}{6}=\frac{2}{3}\]
\[\mathrm{Pr}[\text{die shows 3, 4, 5 or 6}] = 1-\mathrm{Pr}[\text{die shows 1 or 2}]\]
  • The probability of the complement is always 1 minus the probability of the original event.
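With Python sets, the complement is the set difference between the sample space and the event, and the “1 minus” relationship can be checked exactly with `Fraction`. A small sketch:

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}
event = {1, 2}
complement = die - event  # all outcomes not in the event: {3, 4, 5, 6}

p_event = Fraction(len(event), len(die))
p_complement = Fraction(len(complement), len(die))
print(p_event, p_complement)        # → 1/3 2/3
print(p_complement == 1 - p_event)  # → True
```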

Birthdays again

  • What is the complement of the event “a randomly chosen person has their birthday on April 20”?

  • What is the probability of this event?

  • The complement is “a randomly chosen person has their birthday on a day other than April 20.”

  • The probability is

\[\mathrm{Pr}[\text{birthday not on April 20}] = \frac{364}{365} = 0.99726\]

Birthdays again

  • Suppose now that we randomly choose three people.

  • What is the complement of the event “at least one person has their birthday on April 20”?

  • Both of these are correct:

    1. None of the three people have their birthday on April 20.

    2. All three people have their birthday on a day other than April 20.

\[\mathrm{Pr}[\text{at least one April 20 birthday}] = 1-\mathrm{Pr}[\text{all 3 birthdays not on April 20}]\]
\[\mathrm{Pr}[\text{all 3 birthdays not on April 20}] = \frac{364 \times 364 \times 364}{365 \times 365 \times 365} = \left(\frac{364}{365}\right)^3 \approx 0.9918 \]
\[\mathrm{Pr}[\text{at least one April 20 birthday}] \approx 1-0.9918=0.0082 \]

Birthdays again

  • What is the complement of “at least one person in this class has their birthday today”?

  • Both of these are correct:

    1. No one in the class has their birthday on April 20.

    2. Everyone in the class has their birthday on a day other than April 20.

Roughly 120 people come to class. Treating their birthdays as independent and equally likely:

\[\mathrm{Pr}[\text{all birthdays in class not on April 20}] = \left(\frac{364}{365}\right)^{120} \approx 0.72 \]

\[\mathrm{Pr}[\text{at least one birthday on April 20 in class}] = 1-0.72=0.28 \]
  • Take-away: although it is unlikely that a specific person has their birthday today, there is a decent chance that in a large group it is someone’s birthday.
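The same complement calculation works for any group size. This hedged Python sketch assumes 365 equally likely birthdays (no leap years) and that different people's birthdays are independent; the function name is hypothetical.

```python
def prob_at_least_one_april_20(n_people):
    """1 - Pr[nobody in the group has an April 20 birthday].

    Assumes 365 equally likely birthdays and independent birthdays across people.
    """
    p_not = 364 / 365  # one person's birthday is not April 20
    return 1 - p_not ** n_people

for n in [1, 3, 120]:
    print(n, round(prob_at_least_one_april_20(n), 4))
```

For 1, 3 and 120 people this reproduces the answers computed on these slides, up to rounding.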

Probability recap

  • A random process is something that results in a random outcome.

  • The set of possible outcomes is called the sample space.

  • An event is a collection of some of the possible outcomes.

  • If all outcomes are equally likely, then the probability of an event is equal to the number of outcomes in the event divided by the total number of possible outcomes.

  • You can use the multiplication rule to calculate the number of outcomes in the sample space or event.

  • The complement of an event is the collection of all outcomes not in the original event.

  • The probability of a complement is one minus the probability of the original event.