Lecture 11: Coincidences#

Announcements

  • Practice quizzes are online.

  • Guest lecture on Friday from Will Hartog on STATS100: Mathematics of Sport.

Recap#

Probability definitions#

  • A random process is something that results in a random outcome.

  • The set of possible outcomes is called the sample space.

  • An event is a collection of some of the possible outcomes.

  • If all outcomes are equally likely, then the probability of an event is equal to the number of outcomes in the event divided by the total number of possible outcomes.

Sample space and an event#

Calculating probabilities#

  • You can use the multiplication rule to calculate the number of outcomes in the sample space or in an event.

  • The complement of an event is the collection of all outcomes not in the original event.

  • The probability of a complement is one minus the probability of the original event.

Multiplication rule#

  • By the multiplication rule, the total number of outfits is \(3 \times 2 \times 4 = 24\).

From The Art of Chance

Complements#

  • The probability of a complement is one minus the probability of the original event.

Sampling with and without replacement#

With and without replacement#

Suppose a bag contains ten balls with labels 1 through 10.

a. If you take three balls out of the bag and replace each one before the next draw, what is the size of the sample space? b. If you take three balls out of the bag and do not replace them, what is the size of the sample space?

a. For drawing with replacement the size of the sample space is \(10 \times 10 \times 10 = 1,000\). b. For drawing without replacement the size of the sample space is \(10 \times 9 \times 8=720\).
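
As a quick check of these counts, the sketch below enumerates both sample spaces with Python's `itertools`. It assumes the order of the draws matters, which is what the multiplication rule counts; the variable names are illustrative and not from the lecture.

```python
from itertools import permutations, product

balls = range(1, 11)  # labels 1 through 10

# With replacement: any label can appear on every draw.
with_replacement = list(product(balls, repeat=3))

# Without replacement: no label can repeat within one outcome.
without_replacement = list(permutations(balls, 3))

print(len(with_replacement))     # should agree with 10 * 10 * 10
print(len(without_replacement))  # should agree with 10 * 9 * 8
```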

With and without replacement#

Suppose a bag contains ten balls with labels 1 through 10. You randomly take out three of the balls.

a. If you replace the balls each time, what is the probability that you don’t draw a 1? b. If you do not replace the balls each time, what is the probability that you don’t draw a 1?

a. The probability is \(9^3/10^3 = 0.729\). b. The probability is \(9 \times 8 \times 7 / (10 \times 9 \times 8 ) = 7/10 = 0.7\).

With and without replacement#

Suppose a bag contains ten balls with labels 1 through 10. You randomly take out ten of the balls.

a. If you replace the balls each time, what is the probability that you don’t draw a 1? b. If you do not replace the balls each time, what is the probability that you don’t draw a 1?

a. The probability is \(9^{10}/10^{10} \approx 0.35\). b. The probability is \(0\), because all ten balls are drawn and one of them must be the 1.
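
The two "don't draw a 1" calculations can be reproduced for any number of draws. This is a minimal sketch using exact fractions; the function names are hypothetical, and it assumes every ball is equally likely on each draw.

```python
from fractions import Fraction
from math import perm

def prob_no_one_with_replacement(num_balls, draws):
    """Pr[never draw ball 1] when each ball is put back before the next draw."""
    return Fraction(num_balls - 1, num_balls) ** draws

def prob_no_one_without_replacement(num_balls, draws):
    """Pr[never draw ball 1] when drawn balls stay out of the bag."""
    # math.perm(n, k) counts ordered draws of k distinct items from n.
    # It returns 0 when k > n, which covers the ten-draw case.
    return Fraction(perm(num_balls - 1, draws), perm(num_balls, draws))

for draws in (3, 10):
    print(draws,
          float(prob_no_one_with_replacement(10, draws)),
          float(prob_no_one_without_replacement(10, draws)))
```

Trying other values of `draws` shows the two answers drifting apart as the number of draws approaches the number of balls.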

With and without replacement#

  • If you select a small fraction of the total items, then sampling with replacement and sampling without replacement give similar probabilities.

  • But if you select a large fraction of the total items, then the two kinds of sampling can give very different probabilities.

  • This applies to drawing balls from a bag, polling, survey sampling, and dealing playing cards.

Exclusive events#

Exclusive events#

  • Two events, A and B, are exclusive if there are no outcomes that are in both A and B.

  • “the die shows 1 or 3” and “the die shows 2” are exclusive.

  • People also say that the events are “mutually exclusive” or “disjoint”.

Exclusive events#

Not exclusive events#

  • Two events, A and B, are not exclusive if there is at least one outcome in both A and B.

  • “the die shows 1 or 2” and “the die shows 2” are not exclusive.

Not exclusive events#

Probability of exclusive events#

  • If the events A and B are exclusive, then probability of “A or B” is equal to the probability of A plus the probability of B.

  • In symbols: \( \mathrm{Pr}[\text{A or B}] = \mathrm{Pr}[A] + \mathrm{Pr}[B] \)

  • Some people write “A \(\cup\) B” as shorthand for “A or B”.

  • Important: “or” is inclusive and so “A or B” means A or B or both.
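
The die examples can be checked directly by treating events as sets of outcomes. This is a small sketch assuming a fair six-sided die; the helper `pr` and the event names are hypothetical.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def pr(event):
    """Probability of an event (a set of outcomes) with equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

a = {1, 3}  # "the die shows 1 or 3"
b = {2}     # "the die shows 2"
c = {1, 2}  # "the die shows 1 or 2"

# a and b are exclusive (no shared outcomes), so the probabilities add.
print(pr(a | b), pr(a) + pr(b))

# c and b share the outcome 2, so adding the probabilities overcounts.
print(pr(c | b), pr(c) + pr(b))
```

Here `a | b` is the set union, which matches the inclusive "or".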

Exclusive events#

  • A and B are two events with

\[ \mathrm{Pr}[\text{A}]=0.7 \text{ and } \mathrm{Pr}[\text{B}] = 0.5 \]
  • Can the events A and B be exclusive?

  • No! If they were exclusive, then \(\mathrm{Pr}[\text{A or B}]\) would be bigger than 1:

  • \(\mathrm{Pr}[\text{A or B}] = \mathrm{Pr}[A] + \mathrm{Pr}[B] = 1.2\)

  • Only add probabilities if events are exclusive!

The Monty Hall Problem#

Let’s Make a Deal#

  • Let’s Make a Deal was a popular game show in the 1960s and 1970s.

  • The show was hosted by Monty Hall.

Let’s Make a Deal#

  • Monty offers the contestant a choice between three different doors.

  • The contestant gets to keep what is behind their chosen door.

  • Behind one door is a car; behind the other two doors are goats.

Monty Hall Problem#

  • After the contestant picks a door, Monty opens one of the other doors and shows that it contains a goat.

  • Monty then offers the contestant the option to switch to the other door.

  • Should the contestant switch or stay with the original door?

Monty Hall: solution#

  • It seems like the contestant has a \(1/2\) chance of winning whether they switch or stay (there are only two doors left).

  • But they actually have a \(2/3\) chance of winning if they switch.

Monty Hall: solution#

  • By symmetry, we can assume that the contestant always picks door 1.

  • The sample space then contains three possible outcomes, each equally likely: the car is behind door 1, door 2, or door 3.

Monty Hall: switch#

  • If the car is behind door 1, then Monty will open one of the other doors and switching will lose.

  • If a goat is behind door 1, then Monty will have to open the other door with a goat and switching will win.

  • \(\mathrm{Pr}[\text{win car}] = 2/3\).

Monty Hall: stay#

  • If the car is behind door 1, then Monty will open one of the other doors and staying will win.

  • If a goat is behind door 1, then Monty will have to open the other door with a goat and staying will lose.

  • \(\mathrm{Pr}[\text{win car}] = 1/3\).
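
The switch and stay probabilities can also be estimated by simulation. The sketch below follows the setup above (the contestant always picks door 1) and additionally assumes that Monty chooses at random when both unpicked doors hide goats; the function names are hypothetical.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = 1  # by symmetry, the contestant always picks door 1
    # Monty opens a goat door that is not the contestant's pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Fraction of simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("switch:", win_rate(True))
print("stay:  ", win_rate(False))
```

The estimates should land close to \(2/3\) and \(1/3\), up to simulation noise.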

Coincidences#

Birthday problem#

  • Question: What do you think is the probability that two people in this room share a birthday?

  • A calendar is coming around the room. When the calendar gets to you, circle your birthday.

  • If your birthday is already circled, interrupt whatever we are doing, stand up, and announce that we have a match!

  • Meanwhile, let’s calculate the probability.

Birthday problem: calculation#

  • We want to calculate the probability:

    \[\mathrm{Pr}[\text{2 or more people share a birthday}]\]
  • What is the complement of “at least two people share a birthday”?

    • “no one shares a birthday”.

    • Or equivalently “everyone has a different birthday”.

Birthday problem: calculation#

  • What is the probability that two people have different birthdays?

  • \(\mathrm{Pr}[\text{2 people have different birthdays}] = \frac{365 \times 364}{365 \times 365}\)

  • What is the probability that \(n\) people have different birthdays?

\[\frac{365 \times 364 \times 363 \times \cdots \times (365-n+1)}{365^n}\]

Birthday problem#

  • Using a calculator with \(n=85\) (approximate class size):

\[1-\frac{365 \times 364 \times 363 \times \cdots \times 281}{365^{85}} = 0.999976\ldots \]
  • Even for a smaller class (\(n=40\)) this is still close to 1:

\[1-\frac{365 \times 364 \times 363 \times \cdots \times 326}{365^{40}} = 0.891\]
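
The "calculator" step can be written as a short loop. This is a sketch under the same assumptions as the formula (365 equally likely, independent birthdays); the function name `prob_shared_birthday` is hypothetical.

```python
def prob_shared_birthday(n, days=365):
    """Pr[at least two of n people share a birthday], assuming equally
    likely, independent birthdays over `days` days."""
    prob_all_different = 1.0
    for i in range(n):
        # The (i + 1)-th person must avoid the i birthdays already taken.
        prob_all_different *= (days - i) / days
    return 1.0 - prob_all_different

for n in (2, 40, 85):
    print(n, round(prob_shared_birthday(n), 6))
```

Multiplying one factor at a time avoids forming the very large numbers \(365^{85}\) and \(365 \times 364 \times \cdots\) separately.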

What Assumptions Did We Make?#

  • Birthdays are equally likely to fall on any of the 365 days.

    • This is not true. More babies are born in August than in February.

    • But the probability of a match is even higher if the birthdays are not equally likely.

  • Birthdays are independent, meaning that no person’s birthday gives you information about another person’s birthday.

Intuition for the Birthday Problem#

  • In a room with 40 people, there are \(\frac{40 \times 39}{2} = 780\) pairs.

  • The probability that a particular pair has a match is \(1/365\).

  • The probability of each coincidence may be small (\(1/365\)), but there are many opportunities for a coincidence (the \(780\) pairs).
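
A rough version of this intuition can be checked numerically. The sketch below counts the pairs and then treats them as if they were independent, which is an approximation and not the exact calculation; the function name is hypothetical.

```python
from math import comb

def approx_prob_match(n, days=365):
    """Approximate Pr[some pair matches] by pretending pairs are independent."""
    pairs = comb(n, 2)                   # number of pairs among n people
    return 1 - (1 - 1 / days) ** pairs   # 1 - Pr[no pair matches]

print(comb(40, 2))
print(round(approx_prob_match(40), 3))
```

Treating the pairs as independent is not exactly right, but for \(n=40\) the result comes out near \(0.88\), close to the exact value of \(0.891\).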

What is probability that someone has your birthday?#

  • The complement of “someone shares your birthday” is “everyone has a different birthday than yours.”

  • The probability of “everyone has a different birthday than yours” is \((364/365)^{n-1}\) (why \(n-1\)?)

  • So the probability someone has your birthday is

    \[\mathrm{Pr}[\text{someone has your birthday}]=1-\left(\frac{364}{365}\right)^{n-1} \]

What is the probability that someone has your birthday?#

\[\mathrm{Pr}[\text{someone has your birthday}]=1-\left(\frac{364}{365}\right)^{n-1} \]
  • For \(n=85\) this is around \(0.21\).

  • Conclusion: If someone else has your birthday, that is a somewhat surprising coincidence. But if some pair of people happens to share a birthday, it’s not that surprising.
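
The formula above is easy to evaluate for different class sizes. This is a minimal sketch with a hypothetical function name, under the same assumptions of equally likely, independent birthdays.

```python
def prob_someone_has_your_birthday(n, days=365):
    """Pr[at least one of the other n - 1 people shares your birthday]."""
    return 1 - ((days - 1) / days) ** (n - 1)

for n in (40, 85):
    print(n, round(prob_someone_has_your_birthday(n), 3))
```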

Coincidences#

  • Diaconis and Mosteller (1989) define a coincidence as “a surprising concurrence of events, perceived as meaningfully related, with no apparent causal connection”.

  • Example: People in this room sharing a birthday.

Probability and coincidences#

  • Probability can help us judge whether a coincidence is unsurprising or potentially meaningful.

  • Example: The probability of two people sharing a birthday is quite high, so we should not be surprised by this coincidence.

Streaks#

Streakiness#

  • In sports, streakiness refers to the cases when players or teams seem to experience long runs of success (or failure).

    • Should we be surprised if a team wins many matches in a row?

    • Should we be surprised if a player makes many shots in a row?

Streaks in basketball#

  • Suppose every NBA player is equally good.

  • Assume that the probability that a player makes a shot is \(p=0.47\) (the 2023 average scoring probability).

  • Let \(n\) be the number of basketball players.

  • What is the probability that at least one basketball player scores all \(k\) of their next shots?

Streaks in basketball#

  • Use the complement rule.

  • The complement of “at least one basketball player scores all \(k\) of their next shots” is “every basketball player misses at least one of their next \(k\) shots.”

  • The probability that a single player misses at least one of their next \(k\) shots is \(\mathrm{Pr}[\text{misses 1 or more shots}] = 1-p^k\).

  • The probability that every basketball player misses at least one of their next \(k\) shots is \((1-p^k)^n\).

Streaks in basketball#

  • The probability that at least one basketball player gets a length \(k\) streak is

    \[\mathrm{Pr}[\text{at least one length $k$ streak}]=1-(1-p^k)^n \]
  • Using a calculator with \(n=450\) and \(p=0.47\), we get:

    • For \(k=8\), the probability is \(0.66\).

    • For \(k=9\), the probability is about \(0.40\).

  • We expect at least one player to get a pretty long streak!
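
The streak formula can be evaluated for a range of streak lengths. This sketch uses the slide's values \(n=450\) and \(p=0.47\) as defaults and assumes, as the formula does, that all shots are independent; the function name is hypothetical.

```python
def prob_some_streak(k, n=450, p=0.47):
    """Pr[at least one of n players makes all k of their next shots],
    assuming independent shots with the same success probability p."""
    prob_player_misses_some = 1 - p ** k
    return 1 - prob_player_misses_some ** n

for k in range(6, 13):
    print(k, round(prob_some_streak(k), 2))
```

Printing several values of \(k\) shows how quickly the probability falls once the streak length gets large.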

Phone numbers#

Phone numbers#

  • Suppose phone numbers are chosen by choosing a random sequence of \(7\) digits drawn from the collection 0,1,…,8,9.

  • Question: Which of the following phone numbers are you more likely to be assigned? a. 358-6049 b. 111-1111

  • Answer: Each outcome has probability \(\frac{1}{10^7}\), so they are equally likely.

Phone numbers#

  • Why does the number 111-1111 feel like a bigger coincidence than the number 358-6049?

  • We see these numbers and identify them immediately with patterns.

    • 111-1111 represents the pattern “all digits are the same”

    • 358-6049 represents the pattern “all digits are different”

Digits#

  • What is the probability of getting a number like 111-1111 with all digits the same?

  • Answer: \(\frac{10}{10^7}=\frac{1}{10^6}\) (one in a million).

  • What is the probability of getting a number like 358-6049 with all digits different?

  • The number of phone numbers with all digits different is:

    \[10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4\]
  • The probability is this number divided by \(10^7\) (about 0.06).
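
Both pattern probabilities are short counting calculations. The sketch below assumes, as above, that every 7-digit sequence is equally likely; the variable names are illustrative.

```python
from math import perm

total = 10 ** 7              # all 7-digit sequences
all_same = 10                # 000-0000, 111-1111, ..., 999-9999
all_different = perm(10, 7)  # 10 * 9 * 8 * 7 * 6 * 5 * 4

print(all_same / total)
print(all_different, all_different / total)
```

The pattern "all digits different" is tens of thousands of times more likely than "all digits the same", even though any single phone number is equally likely.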

Roger Federer#

Roger Federer’s Commencement speech#

  • In his commencement speech at Dartmouth, Roger Federer made the following claim:

    • He won 80% of his tennis matches.

    • But only 54% of his points.

Roger Federer’s claim#

  • Two interpretations of Federer’s stats:

    1. He was winning the points he really had to win.

    2. Having a small, consistent edge (4 percentage points) can really pay off.

  • We can use probability and simulation to study the second explanation.

Tennis scoring#

  • A game is won by the first player to have at least 4 points and a margin of 2 points.

  • A set is won by the first player to win at least 6 games with a margin of 2 games. If both players win 6 games, then a tie-breaker decides the winner of the set.

  • A match is won by the first player to win 2 sets (or 3 sets in a Grand Slam).

Tennis simulation#

  • Suppose that in every point Roger Federer’s win probability is \(p=0.54\).

  • Then the probability Roger Federer wins a game is 60%.

  • The probability Roger Federer wins a set is 77%.

  • The probability Roger Federer wins a match is 87%.

  • This is even higher than his actual match win percentage!

  • What is the simulation missing?
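
The lecture's simulation code is not shown, so the following is only one possible sketch of it. It assumes every point is won independently with probability \(p\), that a tie-breaker goes to the first player with at least 7 points and a 2-point margin, and that matches are best of three sets; all function names are hypothetical.

```python
import random

def race(p, rng, target):
    """Play points until one side has at least `target` with a 2-point margin."""
    won = lost = 0
    while max(won, lost) < target or abs(won - lost) < 2:
        if rng.random() < p:
            won += 1
        else:
            lost += 1
    return won > lost

def play_game(p, rng):
    return race(p, rng, target=4)

def play_tiebreak(p, rng):
    return race(p, rng, target=7)  # assumed tie-break rule

def play_set(p, rng):
    won = lost = 0
    while max(won, lost) < 6 or abs(won - lost) < 2:
        if won == 6 and lost == 6:
            return play_tiebreak(p, rng)
        if play_game(p, rng):
            won += 1
        else:
            lost += 1
    return won > lost

def play_match(p, rng, sets_to_win=2):
    won = lost = 0
    while won < sets_to_win and lost < sets_to_win:
        if play_set(p, rng):
            won += 1
        else:
            lost += 1
    return won > lost

def win_rate(play, p=0.54, trials=5_000, seed=0):
    """Fraction of simulated games, sets, or matches won."""
    rng = random.Random(seed)
    return sum(play(p, rng) for _ in range(trials)) / trials

print("game: ", win_rate(play_game))
print("set:  ", win_rate(play_set))
print("match:", win_rate(play_match))
```

With enough trials the estimates should land near the 60%, 77%, and 87% quoted above, up to simulation noise and the assumed tie-break rule.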

Tennis serving#

  • Tennis players are much more likely to win a point when they are serving.

  • Roger Federer won around 70% of the points he served but only around 40% of the points he returned (source).

  • In a match, the two players alternate serving for each game.
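
To see how much serving matters within a single game, the sketch below computes the exact probability of winning a game from a fixed per-point probability, using the game rule from the scoring slide and assuming points are independent. The function name `prob_win_game` is hypothetical, and 0.70 and 0.40 are the rounded figures quoted above.

```python
def prob_win_game(p):
    """Pr[win a game] if each point is won independently with probability p."""
    q = 1 - p
    # Win 4-0, 4-1, or 4-2: the last point is a win, the losses come earlier.
    win_before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach 3-3 (deuce) in C(6, 3) = 20 ways, then get two points ahead
    # before the opponent does: p^2 / (p^2 + q^2).
    win_from_deuce = 20 * p**3 * q**3 * p**2 / (p**2 + q**2)
    return win_before_deuce + win_from_deuce

for p in (0.54, 0.70, 0.40):
    print(p, round(prob_win_game(p), 3))
```

Under these assumptions, a player who wins 70% of points on serve wins about 90% of service games, while winning 40% of return points wins only about 26% of return games.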

Independence#

  • The simulation assumed that the result of the current point has no effect on the result of the next point.

  • This is an assumption of independence, which is not true of real tennis matches.

  • You will learn more about independence next week when we discuss conditional probability.