Practice Quizzes#


Week 1 - Ballpark estimates#

There are three practice quizzes below, each with three questions. The prompt is the same for all three of them:

Ballpark estimates: For each of the quantities below, decide whether the statement is reasonable by coming up with a ballpark estimate. You do not need to compute the final number; instead, set up the calculation, explain your reasoning, and estimate the order of magnitude for comparison. You will be graded on how you broke up the problem and whether your reasoning made sense, rather than the accuracy of your estimate.

Practice Quiz #1#

  1. One million cups of coffee are consumed on Stanford Campus on an average Monday.

Solution

I know there are roughly 8,000 undergraduate students, so say there are 10,000 people in total (nearest factor of 10).

Say on average each person consumes about 2 cups of coffee per day.

(10,000 people) x (2 cups / person / day)

Rounding to the nearest factors of 10,

(10,000 people ) x (1 cup / person / day) ≈ 10,000 cups.

This is not a reasonable statement.

  2. More than a billion hours of human labor are spent on styling hair in the US each year.

Solution

There are roughly 330 million people in the U.S.; rounded to a power of 10, that is 10^8 people.

Say about 40% are women old enough to style their hair, and say 40% are men old enough to style their hair. Suppose women spend 20 minutes per day on average and men spend 1 minute per day on average.

Then, to the nearest factor of 10, the average person spends 10 minutes per day styling their hair.

There are 365 ~ 100 = 10^2 days per year, and 1/60 ~ .01 = 10^{-2} hours per minute

(10^8 people)×(10 min/day) x (1 hr/60 min)×(365 day/yr)

~ (10^8 people)×(10^1 min/person/day) x (10^{-2} hr/min )×(10^2 day/yr)

~ 10^9 = 1 billion hours/year

This statement is not unreasonable.

  3. You could fit billions of dice into the STATS 60 classroom.

Solution

Say the classroom is about 40m by 40m by 7m, volume = (40x40x7 m^3).

That’s about 10^4 m^3.

A standard die is about 2cm per side, volume = (2 cm x 2cm x 2cm) * (.01 m/cm)^3.

That’s about 10^{-5} m^3.

(10^4 m^3) / ( 10^{-5} m^3 )

≈ 10^9 dice

The statement is not unreasonable.
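As a rough cross-check, the same estimate can be run without rounding each factor. This is a minimal sketch that reuses the dimensions guessed above (a 40 m by 40 m by 7 m room and 2 cm dice); the variable names are illustrative and the dimensions are assumptions, not measurements.

```python
import math

# Dimensions guessed in the solution above (assumptions, not measurements).
room_volume_m3 = 40 * 40 * 7       # classroom volume in cubic meters
die_side_m = 2 * 0.01              # 2 cm converted to meters
die_volume_m3 = die_side_m ** 3    # volume of one die in cubic meters

dice_that_fit = room_volume_m3 / die_volume_m3

# Order of magnitude: round log10 of the estimate to the nearest integer.
order_of_magnitude = round(math.log10(dice_that_fit))

print(f"about {dice_that_fit:.1e} dice, i.e. roughly 10^{order_of_magnitude}")
```

The unrounded count lands in the low billions, which agrees with the order-of-magnitude answer.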

Practice Quiz #2#

  1. You could park at least 10,000 Honda Civics on the quad.

Solution

Say the quad is about 500ft by 500ft, and we’ll round up to ~ (10^3)^2 ft^2

A Honda Civic is roughly 15ft by 6ft, ~ 10^2 ft^2

(10^6 ft^2 in the quad )/(10^2 ft^2 per Civic) ~ 10^4 Honda Civics

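Since we rounded the quad’s area up by a factor of 4, the true count is probably below 10^4, so the claimed number looks a little high.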
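To see how much the rounding matters here, the estimate can be repeated without rounding. The sketch below uses the same guessed dimensions as the solution (they are assumptions, not measurements) and ignores the space needed for driving lanes.

```python
# Guesses from the solution above (assumptions, not measurements).
quad_area_ft2 = 500 * 500    # quad taken as 500 ft by 500 ft
civic_area_ft2 = 15 * 6      # footprint of one Honda Civic

civics_that_fit = quad_area_ft2 / civic_area_ft2
print(f"about {civics_that_fit:,.0f} Civics")
```

The unrounded count is a few thousand, which is below the 10,000 in the statement.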

  2. A billion hours of human labor each year are spent on correcting typos.

Solution

There are roughly 8 billion people in the world. Say about 5 billion type regularly. That is still about 10^9 people typing.

Say each corrects about 10 typos per day, and each typo takes about 10 seconds to notice and fix.

There are 365 ~ 100 days/year and (1/60)^2 ~ 10^{-4} hours/second.

( 10^9 people)×(10 typos/day)x(10^2 days/ year)×(10 sec/typo)×(10^{-4} hours/sec)

≈ 10^9 hr/yr

The statement looks reasonable.
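Multiplying powers of ten amounts to adding their exponents, which makes the bookkeeping in an estimate like this easy to check. The following sketch simply tallies the exponents used above; the dictionary keys are illustrative labels.

```python
# Power-of-ten exponents for each factor in the estimate above.
exponents = {
    "people who type": 9,            # 10^9 people
    "typos per person per day": 1,   # 10 typos/day
    "days per year": 2,              # 365 rounded to 10^2
    "seconds per typo": 1,           # 10 s/typo
    "hours per second": -4,          # (1/60)^2 rounded to 10^-4
}

# Multiplying powers of ten is the same as adding their exponents.
total_exponent = sum(exponents.values())
print(f"about 10^{total_exponent} hours per year")
```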

  3. No more than 1 million people in the U.S. have the letter “z” in their first name.

Solution

There are about 330 million ~ 10^8 people in the U.S. It’s reasonable to assume there are about 5 ~ 10^1 letters per name. The letter z is one of 26 letters and is not that common, but say it appears at least 1/100 of the time.

(10^8 names) x (10 letters/name) x (10^{-2} z’s per letter) ≈ 10^{7} people

The statement doesn’t seem reasonable; the claimed number looks a bit low.

Practice Quiz #3#

  1. More than 10 million gallons of milk are consumed in the Bay Area each year.

Solution

Say there are about 10 million = 10^7 people in the Bay Area. Each probably consumes about 1 cup of milk per week.

There are closer to 100 than 10 weeks per year, and about 10 cups/gallon.

(10^7 people) x (1 cup / person / week) x (10^2 weeks / year) x (10^{-1} gallons/cup) ≈ 10^8 gallons/year

The statement looks reasonable.

  2. I could empty the fountain in White Plaza in one day using only a teaspoon.

Solution

The fountain is about 4m x 4m x .5 m, so (4 x 4 x .5 m^3) ~ 10 m^3 of water.

A teaspoon is about 5 mL = 5/(10^2)^3 meters^3 ~ 10^{-6} m^3 of water.

Say I can scoop and dump 1 teaspoon every second, there are 60 x 60 x 24 ~ 10^{5} seconds/day.

(10 m^3 of water) / ((10^{-6} m^3/ teaspoon) x (1 teaspoon / second) x (10^{5} seconds/day)) = 10^2 days.

The statement doesn’t seem plausible; one day looks far too short compared with our estimate of about 100 days.

  3. Stanford students collectively buy many thousands of textbooks each year.

Solution

There are roughly 8,000 undergraduate students, so say there are ~10^4 undergrads + grad students in total.

Say each student takes about 10 courses per year on average, about 1 in 2 courses requires a textbook, and only 1 in 10 required textbooks cannot be found for free online.

(10^4 students) × (10 courses/student/yr) × (1/2 books required/course) × (10^{-1} books purchased/book required) ≈ 5 × 10^3 books / year.

The statement seems a bit unreasonable; our estimate suggests only a few thousand textbooks.

Week 2 - Exploratory data analysis#

Practice quiz #1#

Practice quiz #1 is about a 2013 study that found that bird nests that contained cigarette butts typically contained fewer parasites. This was done by measuring the number of parasites and the weight of cigarette butts in different bird nests.

  1. What are the observational units for this study? What are some relevant variables?

    Solution

    The observational units are individual bird nests.

    The weight of cigarette butts and number of parasites are relevant variables.

  2. What type of visualization would be best to see the relationship between the weight of cigarette butts and the number of nest parasites?

Solution

A scatter plot would be best. This is because the weight of cigarette butts and the number of parasites are both quantitative variables, and we want to see the relationship between these two variables.

  3. Below is a histogram of the weight of cigarette butts found in different bird nests. Based on the histogram, is the mean or median weight larger? Explain why.

Solution

Based on the histogram, the mean weight will be larger than the median weight. This is because the mean is more sensitive to outliers and so the small number of nests with a large weight will increase the mean more than the median.
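A small numeric example can make the effect of a long right tail concrete. The weights below are made up for illustration (they are not from the study): most nests have small values and a few have large ones.

```python
from statistics import mean, median

# Hypothetical cigarette-butt weights (grams) for ten nests: most are small,
# a few are large, which gives a right-skewed distribution.
weights = [0, 0, 1, 1, 2, 2, 3, 4, 10, 25]

print("mean:", mean(weights))
print("median:", median(weights))
```

The two largest values pull the mean well above the median, while the median only depends on the middle of the sorted list.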

Practice quiz #2#

Practice quiz #2 is about a 2021 study that found that switching a smartphone to grayscale can reduce self-reported problematic screen use. Half the study participants had their phones put on grayscale for a week. The researchers measured the phone use of participants, and recorded the participants’ responses to questions about their phone use and mental health.

  1. What are the observational units for this study? What are some relevant variables?

Solution

The observational units are the people who participated in the study.

Some relevant variables would be whether the phone was in grayscale, the time spent on their phone during the week and their level of anxiety.

  2. What type of visualization could be used to see the relationship between the participants’ use of grayscale and the time they spent on their phones?

Solution

There are two correct answers:

  • A pair of histograms with one for the group who put their phones in grayscale and one for the group who did not.

  • A bar chart showing the mean amount of time for each group (or the median amount of time for each group).

  3. Below is a histogram of the participants’ anxiety scores at the start of the experiment. The mean anxiety score value is 4.47. Does the mean represent the typical value?

Solution

No, the mean does not represent the typical anxiety score. The most common anxiety scores are around 0 and 1. Relatively few had scores that were around 4 and 5.

Practice quiz #3#

Practice quiz #3 is about a blog post on the heights of the winners of Wimbledon tennis matches over time. They want to know if the heights of the winners have increased over time.

  1. What are the observational units for this study? What are some relevant variables?

Solution

The observational units are the different Wimbledon tennis tournaments.

Some relevant variables are the height of the tournament champion and the year of the tournament.

  2. What type of visualization could you use to see the trend over time of the height of the Wimbledon champions?

Solution

A line chart would be best. This is because we are plotting a quantitative variable (height) over time.

  3. What are some issues with the following visualization showing the average height of the Wimbledon champions per decade?

Solution

The visualization has the following issues:

  • The 3D perspective distorts the numbers.

  • The y-axis does not start at zero.

  • A line chart would be better than a bar graph for seeing the trend over time.

Week 3 - Variability#

Practice Quiz #1#

  1. I hand you a dataset which shows the weekly section attendance numbers for each of the 5 discussion sections of STATS 60 last quarter. What would you do in an exploratory analysis of this data? What kind of visualizations would you make, which summary statistics would you compute, and why?

There is no one correct answer; you will be graded on whether your approach is reasonable given this sort of data, and whether your approach is likely to reveal interesting trends.

  2. In the dataset represented by the top histogram below, is the mean a. About the same as the median, b. Larger than the median, or c. Smaller than the median?

Explain why you think your answer is correct. (Even if your answer is technically incorrect, you’ll get full credit if your reasoning is sound—we don’t expect you to compute the mean and median).

Histogram of age of heart failure patients in Lanssnig et al, "A novel hybrid modeling approach for the evaluation of integrated care and economic outcome in heart failure treatment"

  3. Which of the two distributions exhibits greater variability? Justify your answer with an appropriate quantitative measure of variability.


Practice Quiz 1 Solutions
  1. Make a line plot with the week on the x-axis and attendance on the y-axis with one line for each section to see how different section attendances change over time. Make a histogram for each section to see the distribution of how many students attended each section. For summary statistics, compute the mean and median attendance for each section. Also compute the min, max, and mean of all sections combined to detect anomalous weeks.

  2. It looks like the mean is smaller than the median; the distribution looks heavy-tailed on the left, with the highest frequencies on high numbers, and then many entries spread out over low numbers, which usually decreases the mean relative to the median.

  3. The cholesterol level among female patients exhibits higher variability. The means are pretty close, but the standard deviation is a factor of 2/3 larger, and the distance between the 90th and 10th percentiles is also larger.
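The two measures of variability mentioned in the last solution can be computed with Python's standard library. The sketch below uses made-up numbers for two groups with the same mean (they are not the data behind the figure), and `spread_summary` is a hypothetical helper name.

```python
from statistics import quantiles, stdev

def spread_summary(values):
    """Return (standard deviation, 90th minus 10th percentile)."""
    # n=10 gives 9 cut points: the 10th, 20th, ..., 90th percentiles.
    deciles = quantiles(values, n=10)
    return stdev(values), deciles[-1] - deciles[0]

# Hypothetical measurements for two groups with similar centers.
group_a = [185, 190, 195, 198, 200, 202, 205, 210, 215, 200]
group_b = [150, 170, 185, 195, 200, 205, 215, 230, 250, 200]

sd_a, gap_a = spread_summary(group_a)
sd_b, gap_b = spread_summary(group_b)
print(f"group A: sd = {sd_a:.1f}, 10th-90th gap = {gap_a:.1f}")
print(f"group B: sd = {sd_b:.1f}, 10th-90th gap = {gap_b:.1f}")
```

Both measures come out larger for group B, so either one would support the claim that group B is more variable.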

Practice Quiz #2#

  1. I hand you a dataset which contains the number of points scored by each player in the Golden State Warriors (the local NBA basketball team) in each game of the 2025 season. What would you do in an exploratory analysis of this data? What kind of visualizations would you make, which summary statistics would you compute, and why?

There is no one correct answer; you will be graded on whether your approach is reasonable given this sort of data, and whether your approach is likely to reveal interesting trends.

  2. In the dataset represented by the histogram below, is the mean a. About the same as the median, b. Larger than the median, or c. Smaller than the median?

Explain why you think your answer is correct. (Even if your answer is technically incorrect, you’ll get full credit if your reasoning is sound—we don’t expect you to compute the mean and median).

From Wikipedia.

  3. Which of the two distributions exhibits greater variability? Justify your answer with an appropriate quantitative measure of variability.

Practice Quiz 2 Solutions
  1. Make a histogram of points per game for each player to compare the scoring distributions and variability across players. Make a line plot to see points per game for key players to spot trends over the course of the season such as injuries. Make a pie chart to see what fraction of all points were scored by each player. For summary statistics, compute the mean and variance of points scored per player.

  2. It looks like the mean is larger than the median; the distribution looks heavy-tailed on the right, with the highest frequencies on low numbers, and then many entries spread out over high numbers, which usually pulls up the mean relative to the median.

  3. The Warriors points distribution exhibits greater variability; the distance between the median and 10th percentile is almost 30 points, whereas the Thunder’s is less than 20. Also, there are more outliers, the distance between the 10th and 90th percentiles is slightly larger, and the standard deviation is slightly larger.

Practice Quiz #3#

  1. I hand you a dataset which shows the maximum and minimum temperature each day in San Francisco over the past year. What would you do in an exploratory analysis of this data? What kind of visualizations would you make, which summary statistics would you compute, and why?

There is no one correct answer; you will be graded on whether your approach is reasonable given this sort of data, and whether your approach is likely to reveal interesting trends.

  2. In the dataset represented by the histogram below, is the mean a. About the same as the median, b. Larger than the median, or c. Smaller than the median?

Explain why you think your answer is correct. (Even if your answer is technically incorrect, you’ll get full credit if your reasoning is sound—we don’t expect you to compute the mean and median).

From Ngwira and Stanley, "Determinants of Low Birth Weight in Malawi: Bayesian Geo-Additive Modelling"

  3. Which of the two distributions exhibits greater variability? Justify your answer with an appropriate quantitative measure of variability.

Comparison of global GDP distribution by country in 2020 vs. 2000

Practice Quiz 3 Solutions
  1. Make a line plot of the temperature vs day with one line for the daily maximum and one line for the daily minimum to look for seasonal patterns. Make a histogram of the daily maximum temperatures to get a sense of the variability. For summary statistics, compute the max, min, mean, and median high and low temperatures for each month as well as the year. Then make a line plot with the mean temps for each month with one line for high temps and one line for low temps to see a smoothed plot of seasonal trends.

  2. The mean and median look about the same; the histogram looks like it is roughly balanced around the center.

  3. The variability of the 2020 data is higher. The standard deviation is almost twice as large. The gap between the mean and median is also almost twice as large, which reflects the presence of more outliers.

Week 4 - Probability#

For this week’s quiz you do not need to simplify your answers, but you must explain your reasoning.

Practice Quiz #1#

  1. There is a class with 30 students. Suppose a professor randomly selects one student and then randomly selects a second, different student.

    • What is the size of the sample space? In other words, what is the number of possible outcomes?

    • Give an example of an event for this sample space.

  2. Suppose that I flip 5 coins. Which of the following sequences of heads (H) and tails (T) is more likely? Why?

    a. HHHHH

    b. HTTHT

  3. Suppose I have a bag with 10 balls labeled 1,2,3,…,10. I draw three balls from the bag without replacement. What is the probability that the labels on the balls are increasing by one (for example the first ball could be 1, the second ball 2, and the third ball 3)? Justify your answer.

Practice Quiz 1 Solutions
  1. The number of possible outcomes is \(30 \times 29\). An example of an event is that both the students are first year students.

  2. The two sequences have the same probability. By the multiplication rule, the total number of possible outcomes is \(2^{5}\).

This means that any particular outcome has probability \(\frac{1}{2^5}\). In particular, these two outcomes are equally likely.

  3. The probability is \(\frac{8}{10 \times 9 \times 8}\).

This is because there are 8 outcomes in the event that the labels on the balls are increasing by one. These outcomes correspond to the starting number which could be any number between 1 and 8.

By the multiplication rule, the total number of possible outcomes is \(10 \times 9 \times 8\). This means that the probability of the event is

\[\mathrm{Pr}[\text{labels increasing by one}] = \frac{8}{10 \times 9 \times 8}\]
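For a sample space this small, the count can be checked by listing every outcome. The following is a brute-force sketch of that idea (not something you would be expected to do on a quiz); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Every ordered draw of three distinct balls from a bag labeled 1..10.
draws = list(permutations(range(1, 11), 3))

# Keep the draws whose labels go up by exactly one each time, e.g. (4, 5, 6).
consecutive = [d for d in draws if d[1] == d[0] + 1 and d[2] == d[1] + 1]

probability = Fraction(len(consecutive), len(draws))
print(len(consecutive), "of", len(draws), "outcomes ->", probability)
```

The enumeration finds 8 favorable outcomes out of \(10 \times 9 \times 8 = 720\), matching the multiplication-rule argument.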

Practice Quiz #2#

  1. Suppose that you roll two six sided dice.

    • What is the size of the sample space? In other words, what is the number of possible outcomes?

    • Give an example of an event for this sample space.

  2. Suppose I flip 5 coins. What is the probability that I get at least one heads? Justify your answer.

  3. Suppose you create a new pin by selecting a random number between 0000 and 9999. What is the probability that all the digits are distinct? Justify your answer.

Practice Quiz 2 Solutions
  1. By the multiplication rule, the number of possible outcomes is \(6 \times 6 = 36\). An example of an event is that both dice land on 6.

  2. The total number of outcomes is \(2^5\) (by the multiplication rule). The complement of the event “getting at least one head” is “getting no heads”. The event “no heads” corresponds to exactly one outcome (TTTTT). This means that

    \[\mathrm{Pr}[\text{no heads}] = \frac{1}{2^5}\]

    And by the complement rule

    \[\mathrm{Pr}[\text{at least one head}] = 1-\frac{1}{2^5}\]
  3. The total number of possible outcomes is \(10,000 = 10^4\). By the multiplication rule, the number of pins with distinct digits is

    \[ 10 \times 9 \times 8 \times 7\]

    The probability that the pin has all distinct digits is therefore

    \[\mathrm{Pr}[\text{all digits distinct}] = \frac{10 \times 9 \times 8 \times 7}{10^4} \]
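The count of PINs with distinct digits can also be verified by brute force, since there are only 10,000 PINs. A minimal sketch, with illustrative variable names:

```python
# All PINs from 0000 to 9999, written as four-character strings.
pins = [f"{n:04d}" for n in range(10_000)]

# A PIN has all-distinct digits when its set of characters has size 4.
distinct = [p for p in pins if len(set(p)) == 4]

probability = len(distinct) / len(pins)
print(len(distinct), probability)
```

The enumeration agrees with \(10 \times 9 \times 8 \times 7 = 5040\), so about half of all PINs have four distinct digits.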

Practice Quiz #3#

  1. Suppose that you flip three coins.

    • What is the size of the sample space? In other words, what is the number of possible outcomes?

    • Give an example of an event for this sample space.

  2. In a class with 100 people, 60 people think that a hot dog is a type of sandwich. If you randomly picked two people from the class without replacement, what is the probability that neither of them think that a hot dog is a type of sandwich? Justify your answer.

  3. Suppose that I roll a red die and a blue die. What is the probability that the two dice show the same number? What is the probability that the red die shows a bigger number than the blue die? Justify your answer.

Practice Quiz 3 Solutions
  1. By the multiplication rule, the number of possible outcomes is \(2 \times 2 \times 2 = 8\). An example of an event is that all the coins land on tails.

  2. By the multiplication rule, the number of possible outcomes is \(100 \times 99\). The number of outcomes where both people think that a hot dog is not a type of sandwich is \(40 \times 39\). The probability that neither of them think that a hot dog is a type of sandwich is

    \[\mathrm{Pr}[\text{neither think a hot dog is a type of sandwich}] = \frac{40 \times 39}{100 \times 99}\]
  3. By the multiplication rule, the total number of possible outcomes is \(6 \times 6 = 36\). There are \(6\) outcomes where the two dice show the same number. Therefore

    \[\mathrm{Pr}[\text{both dice show the same number}] =\frac{6}{6 \times 6}=\frac{1}{6}\]

    There are \(5+4+3+2+1=15\) outcomes where the red die shows a bigger number. So

    \[\mathrm{Pr}[\text{red die shows a bigger number}] =\frac{15}{6 \times 6}=\frac{15}{36}\]
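Both counts can be confirmed by listing all 36 outcomes. The sketch below represents each outcome as a hypothetical `(red, blue)` pair.

```python
from fractions import Fraction
from itertools import product

# Each outcome is a pair (red, blue) with both values between 1 and 6.
outcomes = list(product(range(1, 7), repeat=2))

same = [o for o in outcomes if o[0] == o[1]]
red_bigger = [o for o in outcomes if o[0] > o[1]]

print(Fraction(len(same), len(outcomes)))
print(Fraction(len(red_bigger), len(outcomes)))
```

`Fraction` reduces automatically, so the second probability is reported as 5/12, which equals 15/36.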

Week 5 - Conditional probability#

Practice Quiz #1#

11% of the U.S. population lives in California.

7% of people incarcerated in the United States are Californian.

Let \(C\) be the event that an individual is Californian, and let \(I\) be the event that an individual is incarcerated in the U.S.

  1. Phrase the above statistic in the language of conditional probabilities.

  2. What would you expect to be higher: \(\Pr[C \mid I]\), or \(\Pr[I \mid C]\)? Why?

  3. Identify the flaw in the following statement, and explain the flaw using the language of conditional probabilities:

“Since 7% of incarcerated individuals are Californians, and there are 50 states, Californians are more likely to be incarcerated than citizens of other states!”

Practice Quiz 1 Solutions
  1. \(\Pr[C \mid I] = 0.07\).

  2. The fraction of Californians incarcerated \(\Pr[I \mid C]\) should be much smaller than the fraction of incarcerated people who are Californian, \(\Pr[C \mid I]\). We’d expect \(\Pr[C \mid I]\) to be on the same order as \(\Pr[C]\), and \(\Pr[I \mid C]\) to be on the same order as \(\Pr[I]\); the fraction of people incarcerated, \(\Pr[I]\), should be much smaller than the fraction of Californians.

  3. Even though California is only 1 of 50 states, it actually contains 11% of the US population, so this argument ignores the base rate of being Californian.

In fact, though we wouldn’t expect you to reproduce the following math on a quiz,

\[\Pr[C] = 0.11 > 0.07= \Pr[C \mid I] = \frac{\Pr[C \cap I]}{\Pr[I]}\]

Which, multiplying both sides by \(\Pr[I]\) and dividing by \(\Pr[C]\), gives

\[\Pr[I] > \frac{\Pr[C \cap I]}{\Pr[C]} = \Pr[I \mid C].\]

So actually, the probability of being incarcerated is lower, conditioned on being Californian.
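The same conclusion can be checked numerically. Dividing the two conditional-probability formulas gives \(\Pr[I \mid C] / \Pr[I] = \Pr[C \mid I] / \Pr[C]\), so the comparison does not depend on the actual incarceration rate. In the sketch below, `pr_i` is a made-up value used only to show concrete numbers.

```python
pr_c = 0.11          # Pr[C]: fraction of the U.S. population living in California
pr_c_given_i = 0.07  # Pr[C | I]: fraction of incarcerated people who are Californian

# Pr[I | C] / Pr[I] = Pr[C | I] / Pr[C], whatever the value of Pr[I] is.
relative_rate = pr_c_given_i / pr_c

# Hypothetical overall incarceration rate (an assumption, not a statistic).
pr_i = 0.005
pr_i_given_c = relative_rate * pr_i

print(f"Pr[I | C] is about {relative_rate:.2f} times Pr[I]")
```

Because the ratio is below 1, conditioning on being Californian lowers the probability of being incarcerated.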

Practice Quiz #2#

17% of NBA players are at least 7 ft.

  1. Phrase the statistic above in the language of conditional probabilities.

  2. What do you think is larger, the number of NBA players or the number of people more than 7ft tall?

  3. Identify the flaw in the following statement, and explain the flaw using the language of conditional probabilities:

“Wow, you’re more than 7ft tall! Are you a professional basketball player?”

Practice Quiz 2 Solutions
  1. Let \(H\) be the event of being at least 7ft tall, and let \(B\) be the event of being in the NBA. The statistic above says that \(\Pr[H \mid B] = 0.17\).

  2. The number of people over 7ft tall is small, but it is probably much larger than the number of NBA players (a Fermi estimate indicates that there are probably around 20 x 30 = 600 NBA players).

  3. This confuses \(\Pr[H \mid B]\) with \(\Pr[B \mid H]\). Even though \(\Pr[H \mid B]\) is large, \(\Pr[B \mid H]\) is still very small, as there are so few professional basketball players.

Practice Quiz #3#

A classroom of 28 students is evenly split between seniors, juniors, sophomores and first-years. There are four English majors in the class; two are juniors and two are first-years.

Choose a student uniformly at random from the class; let \(E\) be the event that the student is an English major, and let \(F\) be the event that the student is a first year.

  1. Describe \(\Pr[E \mid F]\) in plain English.

  2. Compute \(\Pr[E \mid F]\) and \(\Pr[F \mid E]\). You may leave your answer as an unsimplified fraction. Are these quantities equal?

  3. The class takes an “anonymized” survey. One of the questions on the survey is “what is your major?” and another question is “what is your class year?” Explain the flaw in the following statement by the course instructor using the language of conditional probability:

“The survey is anonymous because there are 7 of you in each year, so even if I know your class year, I only have a 1/7 chance of guessing who you are.”

Practice Quiz 3 Solutions
  1. This is the chance that if you choose a first-year uniformly at random, they will be an English major.

  2. \(\Pr[E \mid F] = \frac{\Pr[E \cap F]}{\Pr[F]} = \frac{2}{7}\), while \(\Pr[F \mid E] = \frac{\Pr[E \cap F]}{\Pr[E]} = \frac{2}{4}\), so \(\Pr[E \mid F]\) is not the same as \(\Pr[F \mid E]\).

  3. The flaw is that the instructor will also know the major; there are only two English majors in a year, so conditioned on all available information (both major and class year), the instructor might have a 1/2 chance of guessing who the student is.
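The two conditional probabilities in this quiz can be computed by building the class roster and counting. This is a sketch under the setup described above; which particular seats are the English majors is an arbitrary choice made for illustration.

```python
from fractions import Fraction

# Build the class described above: 7 students per year, 28 in total.
# The four English majors are two juniors and two first-years.
roster = []
for year in ["senior", "junior", "sophomore", "first-year"]:
    for seat in range(7):
        is_english = year in ("junior", "first-year") and seat < 2
        roster.append({"year": year, "english": is_english})

first_years = [s for s in roster if s["year"] == "first-year"]
english = [s for s in roster if s["english"]]
both = [s for s in roster if s["english"] and s["year"] == "first-year"]

pr_e_given_f = Fraction(len(both), len(first_years))
pr_f_given_e = Fraction(len(both), len(english))
print(pr_e_given_f, pr_f_given_e)
```

Conditioning on a different event changes the denominator (7 first-years versus 4 English majors), which is why the two answers differ.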

Week 6 - Hypothesis testing#

Practice Quiz #1#

Suppose that the league average for a soccer player scoring a penalty is 78%. A new player just scored 18 out of their last 20 penalty kicks.

You will investigate whether the number of goals scored by this new player is significantly different from the league average.

  1. What are the null and alternative hypotheses? Describe them both in English and in mathematical symbols.

  2. Describe how you would do a simulation to compute a p-value. If the null was true, what would be the “probability of success”? What would be the “number of trials”? What value would you compare the simulated data to?

  3. The p-value for the observed results (18 out of 20 goals) is 0.15. What do you conclude about the null hypothesis?

Practice Quiz 1 Solutions
  1. The null hypothesis is that the new player is just as good at scoring penalties as the league average. The alternative hypothesis is that the new player is better at scoring penalties.

    In symbols, let \(\pi\) be the long run probability of the new player scoring on a penalty. The null hypothesis is \(H_0 : \pi = 0.78\) and the alternative hypothesis is \(H_A : \pi > 0.78\).

    You could also do a two-sided alternative hypothesis \(H_A : \pi \neq 0.78\). In words, the new player has a probability of scoring a penalty that is different from the league average.

  2. A “success” would correspond to the goal being scored. If the new player was just as good as the league average then they would score with probability 0.78. Therefore, the “probability of success” is 0.78.

    The number of trials is 20 (the number of penalties taken).

    The value to compare to is 18 (the number of goals scored in the sample).

  3. Since the p-value is greater than 0.05, we do not have evidence against the null hypothesis that the player is just as good as the league average.
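The simulation described in the second solution could be sketched as follows, assuming the one-sided alternative \(\pi > 0.78\). The function name, the number of simulations, and the seed are all illustrative choices rather than part of the quiz.

```python
import random

def simulate_p_value(n_trials, p_success, observed, n_sims=20_000, seed=0):
    """Estimate Pr[successes >= observed] when the null hypothesis is true."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # One simulated set of penalties: count how many of n_trials succeed.
        successes = sum(rng.random() < p_success for _ in range(n_trials))
        if successes >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims

p_value = simulate_p_value(n_trials=20, p_success=0.78, observed=18)
print(f"simulated one-sided p-value: {p_value:.3f}")
```

The simulated value should land close to the 0.15 quoted in the question, with small differences from run to run if the seed is changed.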

Practice Quiz #2#

In a parking garage, there are three elevators and someone suspects that elevator 3 might be broken and hence less likely to be the elevator that comes down when called. The next 10 times an elevator is called, elevator 3 comes down 0 times.

You will investigate whether the number of times the elevator did not come down is statistically significant.

  1. What are the null and alternative hypotheses? Describe them both in English and in mathematical symbols.

  2. Describe how you would do a simulation to compute a p-value. If the null was true, what would be the “probability of success”? What would be the “number of trials”? What value would you compare the simulated data to?

  3. The p-value for the observed results (elevator 3 comes down 0 out of 10 times) is 0.017. What do you conclude about the null hypothesis?

Practice Quiz 2 Solutions
  1. The null hypothesis is that elevator three is not broken and is as likely to come down as either of the other two elevators. The alternative hypothesis is that elevator three is broken and less likely than the other two elevators to come down.

    In symbols, let \(\pi\) be the long run proportion of times elevator three comes down. The null hypothesis is \(H_0 : \pi = 1/3\) and the alternative hypothesis is \(H_A : \pi < 1/3\).

    You could also do a two-sided alternative hypothesis \(H_A : \pi \neq 1/3\). In words, elevator three comes down more or less often than would be expected by chance.

  2. A “success” would correspond to elevator 3 coming down. If elevator 3 was not broken then the probability that it would come down is \(1/3\). The “probability of success” is therefore \(1/3\).

    The number of trials is 10 (the number of times an elevator was called).

    The value to compare to is 0 (the number of times elevator 3 came down in the sample).

  3. Since the p-value is less than 0.05, we have evidence against the null hypothesis that elevator three is not broken.
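For this example the p-value can also be computed exactly instead of by simulation, because the number of successes under the null hypothesis follows a binomial distribution. The sketch below swaps the simulation for that exact calculation, assuming the one-sided alternative \(\pi < 1/3\); `binomial_lower_tail` is a hypothetical helper name.

```python
from math import comb

def binomial_lower_tail(n_trials, p_success, observed):
    """Exact Pr[successes <= observed] for a binomial count."""
    return sum(
        comb(n_trials, k) * p_success**k * (1 - p_success) ** (n_trials - k)
        for k in range(observed + 1)
    )

p_value = binomial_lower_tail(n_trials=10, p_success=1 / 3, observed=0)
print(f"exact one-sided p-value: {p_value:.3f}")
```

With zero observed successes the sum has a single term, \((2/3)^{10}\), which rounds to the 0.017 quoted in the question.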

Practice Quiz #3#

A candy company promises that at least 30% of their chocolate eggs contain a figurine of the fictional character Elsa from Frozen; the rest contain other toys. Suppose you buy 40 chocolate eggs, and only 9 of them contain an Elsa figurine.

You will investigate whether the low number of Elsa figurines is statistically significant.

  1. What are the null and alternative hypotheses? Describe them both in English and in mathematical symbols.

  2. Describe how you would do a simulation to compute a p-value. If the null was true, what would be the “probability of success”? What would be the “number of trials”? What value would you compare the simulated data to?

  3. The p-value for the observed results (an Elsa figurine in 9 of the 40 chocolate eggs) is 0.04. What do you conclude about the null hypothesis?

Practice Quiz 3 Solutions
  1. The null hypothesis is that the company is telling the truth and the chocolate eggs have a 30% chance of containing an Elsa figurine. The alternative hypothesis is that the chocolate eggs have a smaller than 30% chance of containing an Elsa figurine.

    In symbols, let \(\pi\) be the long run proportion of chocolate eggs that contain an Elsa figurine. The null hypothesis is \(H_0 : \pi = 0.3\) and the alternative hypothesis is \(H_A : \pi < 0.3\).

    You could also do a two-sided alternative hypothesis \(H_A : \pi \neq 0.3\). In words, the probability of a chocolate egg containing an Elsa figurine is more or less than the 30% advertised by the company.

  2. A “success” would correspond to the chocolate egg containing an Elsa figurine. If the company is telling the truth, then 30% of the chocolate eggs would contain an Elsa figurine. The “probability of success” is therefore 0.3.

    The number of trials is 40 (the number of chocolate eggs bought).

    The value to compare to is 9 (the number of chocolate eggs that contain an Elsa figurine).

  3. Since the p-value is less than 0.05, we have evidence against the null hypothesis that 30% of chocolate eggs contain an Elsa figurine.