Practice Quizzes#
Week 1 - Ballpark estimates#
There are three practice quizzes below, each with three questions. The prompt is the same for all three of them:
Ballpark estimates: For each of the quantities below, decide whether the statement is reasonable by coming up with a ballpark estimate. You do not need to compute the final number; instead, set up the calculation, explain your reasoning, and estimate the order of magnitude for comparison. You will be graded on how you broke up the problem and whether your reasoning made sense, rather than on the accuracy of your estimate.
Practice Quiz #1#
One million cups of coffee are consumed on Stanford Campus on an average Monday.
Solution
I know there are roughly 8,000 undergraduate students, so say there are 10,000 people in total (nearest factor of 10).
Say on average each person consumes about 2 cups of coffee per day.
(10,000 people) x (2 cups / person / day)
Rounding to the nearest factors of 10,
(10,000 people ) x (1 cup / person / day) ≈ 10,000 cups.
This is not a reasonable statement.
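As a quick check, the comparison can be sketched in Python. The helper `order_of_magnitude` is our own name, not part of the course materials, and it rounds on a log scale, which is one of several reasonable conventions for "nearest factor of 10".

```python
import math

def order_of_magnitude(x):
    """Return the exponent of the power of ten nearest to x on a log scale."""
    return round(math.log10(x))

people = 10_000           # rounded campus population from the solution
cups_per_person_day = 2   # assumed average from the solution

estimate = people * cups_per_person_day   # cups per day
claim = 1_000_000                         # cups per day in the statement

gap = order_of_magnitude(claim) - order_of_magnitude(estimate)
print(f"estimate ~ 10^{order_of_magnitude(estimate)} cups per day")
print(f"claim    = 10^{order_of_magnitude(claim)} cups per day")
print(f"the claim is about {gap} orders of magnitude above the estimate")
```

A gap of two orders of magnitude is far larger than anything the rough inputs could explain, which is why the statement is judged unreasonable.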
More than a billion hours of human labor are spent on styling hair in the US each year.
Solution
There are roughly 330 million people in the U.S.; rounding to a factor of 10, that is about 10^8 people.
Say about 40% are women old enough to style their hair, and say 40% are men old enough to style their hair. Suppose women spend 20 minutes per day on average and men spend 1 minute per day on average.
Then, to the nearest factor of 10, the average person spends 10 minutes per day styling their hair.
There are 365 ~ 100 = 10^2 days per year, and 1/60 ~ .01 = 10^{-2} hours per minute
(10^8 people)×(10 min/person/day) x (1 hr/60 min)×(365 day/yr)
~ (10^8 people)×(10^1 min/person/day) x (10^{-2} hr/min )×(10^2 day/yr)
~ 10^9 = 1 billion hours/year
This statement is not unreasonable.
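For comparison, here is a minimal sketch of the same product without rounding each factor. The shares and minutes are the assumptions made in the solution, not measured data.

```python
population = 330_000_000   # U.S. population used in the solution

# Assumed shares of the population and daily styling minutes, from the solution.
share_women, minutes_women = 0.40, 20
share_men, minutes_men = 0.40, 1

# Average minutes per person per day across the whole population
# (the remaining 20% are assumed to spend no time styling).
avg_minutes = share_women * minutes_women + share_men * minutes_men

hours_per_year = population * avg_minutes * 365 / 60
print(f"average styling time: {avg_minutes:.1f} min/person/day")
print(f"total: {hours_per_year:.2e} hours/year")
print("exceeds one billion hours:", hours_per_year > 1e9)
```

The unrounded total comes out larger than the rounded one because the population and the number of days per year were both rounded down, so the conclusion does not change.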
You could fit billions of dice into the STATS 60 classroom.
Solution
Say the classroom is about 40m by 40m by 7m, volume = (40x40x7 m^3).
That’s about 10^4 m^3.
A standard die is about 2cm per side, volume = (2 cm x 2cm x 2cm) * (.01 m/cm)^3.
That’s about 10^{-5} m^3.
(10^4 m^3) / ( 10^{-5} m^3 )
≈ 10^9 dice
The statement is not unreasonable.
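The volume ratio can be checked with a few lines of Python. This is only a sketch using the dimensions assumed in the solution, and it ignores gaps between dice and furniture in the room.

```python
room_m3 = 40 * 40 * 7      # classroom dimensions assumed in the solution, in meters
die_side_m = 2 * 0.01      # 2 cm converted to meters
die_m3 = die_side_m ** 3   # volume of one die in cubic meters

dice = room_m3 / die_m3
print(f"room volume: {room_m3} m^3")
print(f"die volume:  {die_m3:.1e} m^3")
print(f"dice that fit: {dice:.1e}")
```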
Practice Quiz #2#
You could park at least 10,000 Honda Civics on the quad.
Solution
Say the quad is about 500ft by 500ft, and we’ll round up to ~ (10^3)^2 ft^2
A Honda Civic is roughly 15ft by 6ft, ~ 10^2 ft^2
(10^6 ft^2 in the quad )/(10^2 ft^2 per Civic) ~ 10^4 Honda Civics
The claimed number looks a little high, since we rounded the quad’s area up to reach 10^4.
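To see how much the rounding matters, here is a short sketch that uses the unrounded dimensions from the solution. It assumes cars are packed edge to edge with no driving lanes, so it is an upper bound under those assumptions.

```python
quad_ft2 = 500 * 500   # quad dimensions assumed in the solution, in feet
civic_ft2 = 15 * 6     # footprint of one car, ignoring driving lanes

cars = quad_ft2 // civic_ft2
print(f"about {cars} Civics fit without rounding")
print("at least 10,000 fit:", cars >= 10_000)
```

Rounding 500 ft up to 1,000 ft per side inflates the area by a factor of 4, which accounts for most of the difference between this count and 10^4.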
A billion hours of human labor each year are spent on correcting typos.
Solution
There are roughly 8 billion people in the world. Say about 5 billion type regularly. That is still about 10^9 people typing.
Say each corrects about 10 typos per day, and each typo takes about 10 seconds to notice and fix.
There are 365 ~ 100 days/year and (1/60)^2 ~ 10^{-4} hours/second
( 10^9 people)×(10 typos/day)x(10^2 days/ year)×(10 sec/typo)×(10^{-4} hours/sec)
≈ 10^9 hr/yr
The statement looks reasonable.
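Because every factor is a power of ten, the product reduces to adding exponents. The sketch below does that bookkeeping; the dictionary labels are our own, and the values are the rounded factors from the solution.

```python
import math

# Powers of ten used for each factor in the solution above.
exponents = {
    "people who type": 9,
    "typos per person per day": 1,
    "days per year": 2,
    "seconds per typo": 1,
    # 1 second = 1/3600 hour, which is closest to 10^-4 on a log scale.
    "hours per second": round(math.log10(1 / 3600)),
}

total_exponent = sum(exponents.values())
print(f"about 10^{total_exponent} hours per year")
```

Tracking the unit of each factor this way also makes it easy to spot a conversion written upside down, such as seconds per hour in place of hours per second.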
No more than 1 million people in the U.S. have the letter “z” in their first name.
Solution
There are about 330 million ~ 10^8 people in the U.S. The letter z is one of 26 letters, and it’s reasonable to assume there are about 5 ~ 10^1 letters per name. The letter z is not that common, but say it appears at least 1/100 of the time.
(10^8 names) x (10 letters/name) x (10^{-2} z’s per letter) ≈ 10^{7} people
The statement doesn’t seem reasonable; 1 million looks a bit low.
Practice Quiz #3#
More than 10 million gallons of milk are consumed in the Bay Area each year.
Solution
Say there are about 10 million = 10^7 people in the Bay Area. Each probably consumes about 1 cup of milk per week.
There are closer to 100 than 10 weeks per year, and about 10 cups/gallon.
(10^7 people) x (1 cup / person / week) x (10^2 weeks / year) x (10^{-1} gallons/cup) ≈ 10^8 gallons/year
The statement looks reasonable.
I could empty the fountain in White Plaza in one day using only a teaspoon.
Solution
The fountain is about 4m x 4m x .5 m, so (4 x 4 x .5 m^3) ~ 10 m^3 of water.
A teaspoon is about 5 mL = 5 cm^3 = 5/(10^2)^3 m^3 ~ 10^{-6} m^3 of water.
Say I can scoop and dump 1 teaspoon every second, there are 60 x 60 x 24 ~ 10^{5} seconds/day.
(10 m^3 of water) / ((10^{-6} m^3/ teaspoon) x (1 teaspoon / second) x (10^{5} seconds/day)) = 10^2 days.
The statement doesn’t seem plausible; one day looks much too low.
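Here is a sketch of the same calculation with unrounded inputs. The fountain dimensions and the one-scoop-per-second pace are the assumptions from the solution, and scooping is assumed to continue around the clock.

```python
fountain_m3 = 4 * 4 * 0.5       # fountain dimensions assumed in the solution, in meters
teaspoon_m3 = 5e-6              # 5 mL expressed in cubic meters
scoops_per_day = 60 * 60 * 24   # one scoop per second, 24 hours a day

scoops_needed = fountain_m3 / teaspoon_m3
days = scoops_needed / scoops_per_day
print(f"{scoops_needed:.1e} scoops, about {days:.0f} days")
```

The unrounded answer is smaller than 10^2 days because the teaspoon volume was rounded down by a factor of 5, but it is still well over two weeks of nonstop scooping.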
Stanford students collectively buy many thousands of textbooks each year.
Solution
There are roughly 8,000 undergraduate students, so say there are ~10^4 undergrads + grad students in total. Say each student takes about 10 courses per year on average, and about 1 in 2 courses requires a textbook, but only 1 in 10 required textbooks cannot be found for free online.
(10^4 students )×(10 courses/student/yr)×(1/2 books required/course) x (10^{-1} books purchased/book required) ≈ 5 x 10^3 books / year.
The statement seems a bit unreasonable; our estimate makes it seem like there would only be a few thousand.
Week 2 - Exploratory data analysis#
Practice quiz #1#
Practice quiz #1 is about a 2013 study that found that bird nests that contained cigarette butts typically contained fewer parasites. This was done by measuring the number of parasites and the weight of cigarette butts in different bird nests.
What are the observational units for this study? What are some relevant variables?
Solution
The observational units are individual bird nests.
The weight of cigarette butts and number of parasites are relevant variables.
What type of visualization would be best to see the relationship between the weight of cigarette butts and the number of nest parasites?
Solution
A scatter plot would be best. This is because the weight of cigarette butts and the number of parasites are both quantitative variables, and we want to see the relationship between these two variables.
Below is a histogram of the weight of cigarette butts found in different bird nests. Based on the histogram, is the mean or median weight larger? Explain why.

Solution
Based on the histogram, the mean weight will be larger than the median weight. This is because the mean is more sensitive to outliers and so the small number of nests with a large weight will increase the mean more than the median.
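The effect of a few large values on the mean can be seen in a small sketch. The weights below are made up for illustration and are not the study's data.

```python
from statistics import mean, median

# Hypothetical cigarette-butt weights in grams: most nests contain little,
# and a few contain a lot (a right-skewed distribution).
weights = [0, 0, 1, 1, 2, 2, 3, 3, 4, 20, 35]

print("mean:  ", round(mean(weights), 2))
print("median:", median(weights))
```

The two heaviest nests pull the mean well above the median, while the median stays at the middle nest's weight.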
Practice quiz #2#
Practice quiz #2 is about a 2021 study that found that switching a smartphone to grayscale can reduce self-reported problematic screen use. Half the study participants had their phones put on grayscale for a week. The researchers measured the phone use of participants, and recorded the participants’ responses to questions about their phone use and mental health.
What are the observational units for this study? What are some relevant variables?
Solution
The observational units are the people who participated in the study.
Some relevant variables would be whether the participant’s phone was in grayscale, the time they spent on their phone during the week, and their level of anxiety.
What type of visualization could be used to see the relationship between the participants’ use of grayscale and the time they spent on their phones?
Solution
There are two correct answers:
A pair of histograms with one for the group who put their phones in grayscale and one for the group who did not.
A bar chart showing the mean amount of time for each group (or the median amount of time for each group).
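For the bar chart option, the bar heights are per-group summaries. Here is a minimal sketch of how they could be computed; the records are invented for illustration and the group labels are our own.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (phone in grayscale?, hours of phone use that week).
records = [
    (True, 18.5), (True, 22.0), (True, 15.0), (True, 20.5),
    (False, 27.5), (False, 24.0), (False, 30.0), (False, 21.0),
]

# Split the quantitative variable by the categorical one.
hours_by_group = defaultdict(list)
for grayscale, hours in records:
    hours_by_group[grayscale].append(hours)

# One bar per group: its height would be the mean (or median) computed here.
summary = {
    ("grayscale" if grayscale else "color"): (mean(hours), median(hours))
    for grayscale, hours in hours_by_group.items()
}
for group, (group_mean, group_median) in summary.items():
    print(f"{group}: mean {group_mean:.1f} h, median {group_median:.1f} h")
```

The same split by group is what the pair of histograms shows, only with the full distribution instead of a single summary number.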
Below is a histogram of the participants’ anxiety scores at the start of the experiment. The mean anxiety score is 4.47. Does the mean represent the typical value?

Solution
No, the mean does not represent the typical anxiety score. The most common anxiety scores are around 0 and 1. Relatively few participants had scores around 4 and 5.
Practice quiz #3#
Practice quiz #3 is about a blog post about the heights of the winners of Wimbledon tennis matches over time. The post asks whether the heights of the winners have increased over time.
What are the observational units for this study? What are some relevant variables?
Solution
The observational units are the different Wimbledon tennis tournaments.
Some relevant variables are the height of the tournament champion and the year of the tournament.
What type of visualization could you use to see the trend over time of the height of the Wimbledon champions?
Solution
A line chart would be best. This is because we are plotting a quantitative variable (height) over time.
What are some issues with the following visualization showing the average height of the Wimbledon champions per decade?

Solution
The visualization has the following issues:
The 3D perspective distorts the numbers.
The y-axis does not start at zero.
A line chart would be better than a bar graph for seeing the trend over time.
Week 3 - Variability#
Practice Quiz #1#
I hand you a dataset which shows the weekly section attendance numbers for each of the 5 discussion sections of STATS 60 last quarter. What would you do in an exploratory analysis of this data? What kind of visualizations would you make, which summary statistics would you compute, and why?
There is no one correct answer; you will be graded on whether your approach is reasonable given this sort of data, and whether your approach is likely to reveal interesting trends.
In the dataset represented by the top histogram below, is the mean a. About the same as the median, b. Larger than the median, or c. Smaller than the median?
Explain why you think your answer is correct. (Even if your answer is technically incorrect, you’ll get full credit if your reasoning is sound—we don’t expect you to compute the mean and median).

Which of the two distributions exhibits greater variability? Justify your answer with an appropriate quantitative measure of variability.

Practice Quiz 1 Solutions
Make a line plot with the week on the x-axis and attendance on the y-axis with one line for each section to see how different section attendances change over time. Make a histogram for each section to see the distribution of how many students attended each section. For summary statistics, compute the mean and median attendance for each section. Also compute the min, max, and mean of all sections combined to detect anomalous weeks.
It looks like the mean is smaller than the median; the distribution looks heavy-tailed on the left, with the highest frequencies on high numbers, and then many entries spread out over low numbers, which usually decreases the mean relative to the median.
The cholesterol level among female patients exhibits higher variability. The means are pretty close, the standard deviation is a factor of 2/3 larger, and the distance between the 90th and 10th percentiles is also larger.
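The two measures of variability used in the last answer can be computed as in the sketch below. The readings are invented for illustration, and `spread` is our own helper name.

```python
from statistics import quantiles, stdev

def spread(values):
    """Return (standard deviation, 90th percentile minus 10th percentile)."""
    deciles = quantiles(values, n=10)   # 9 cut points: 10th, 20th, ..., 90th percentiles
    return stdev(values), deciles[-1] - deciles[0]

# Hypothetical cholesterol readings (mg/dL) for two groups with similar centers.
group_a = [170, 180, 185, 190, 195, 200, 205, 210, 215, 230]
group_b = [140, 160, 175, 190, 195, 200, 210, 225, 245, 270]

for name, values in [("group A", group_a), ("group B", group_b)]:
    sd, p10_to_p90 = spread(values)
    print(f"{name}: standard deviation {sd:.1f}, 10th-90th percentile range {p10_to_p90:.1f}")
```

When both measures point the same way, as they do for these made-up groups, the comparison is easy to justify with either one.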
Practice Quiz #2#
I hand you a dataset which contains the number of points scored by each player in the Golden State Warriors (the local NBA basketball team) in each game of the 2025 season. What would you do in an exploratory analysis of this data? What kind of visualizations would you make, which summary statistics would you compute, and why?
There is no one correct answer; you will be graded on whether your approach is reasonable given this sort of data, and whether your approach is likely to reveal interesting trends.
In the dataset represented by the histogram below, is the mean a. About the same as the median, b. Larger than the median, or c. Smaller than the median?
Explain why you think your answer is correct. (Even if your answer is technically incorrect, you’ll get full credit if your reasoning is sound—we don’t expect you to compute the mean and median).
Which of the two distributions exhibits greater variability? Justify your answer with an appropriate quantitative measure of variability.

Practice Quiz 2 Solutions
Make a histogram of points per game for each player to compare the scoring distributions and variability across players. Make a line plot to see points per game for key players to spot trends over the course of the season such as injuries. Make a pie chart to see what fraction of all points were scored by each player. For summary statistics, compute the mean and variance of points scored per player.
It looks like the mean is larger than the median; the distribution looks heavy-tailed on the right, with the highest frequencies on low numbers, and then many entries spread out over high numbers, which usually pulls up the mean relative to the median.
The Warriors’ points distribution exhibits greater variability; the distance between the median and 10th percentile is almost 30 points, whereas the Thunder’s is less than 20. There are also more outliers, the distance between the 10th and 90th percentiles is slightly larger, and the standard deviation is slightly larger.
Practice Quiz #3#
I hand you a dataset which shows the maximum and minimum temperature each day in San Francisco over the past year. What would you do in an exploratory analysis of this data? What kind of visualizations would you make, which summary statistics would you compute, and why?
There is no one correct answer; you will be graded on whether your approach is reasonable given this sort of data, and whether your approach is likely to reveal interesting trends.
In the dataset represented by the histogram below, is the mean a. About the same as the median, b. Larger than the median, or c. Smaller than the median?
Explain why you think your answer is correct. (Even if your answer is technically incorrect, you’ll get full credit if your reasoning is sound—we don’t expect you to compute the mean and median).

Which of the two distributions exhibits greater variability? Justify your answer with an appropriate quantitative measure of variability.

Practice Quiz 3 Solutions
Make a line plot of the temperature vs day with one line for the daily maximum and one line for the daily minimum to look for seasonal patterns. Make a histogram of the daily maximum temperatures to get a sense of the variability. For summary statistics, compute the max, min, mean, and median high and low temperatures for each month as well as the year. Then make a line plot with the mean temps for each month with one line for high temps and one line for low temps to see a smoothed plot of seasonal trends.
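The monthly summaries described in the first answer can be sketched as follows. The daily records are invented, and the tuple layout (month, high, low) is an assumption about how the dataset might be stored.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily records: (month, daily high, daily low) in degrees Fahrenheit.
daily = [
    (1, 57, 46), (1, 59, 47), (1, 55, 44),
    (7, 67, 54), (7, 66, 55), (7, 69, 56),
]

# Group the daily highs and lows by month.
by_month = defaultdict(lambda: {"high": [], "low": []})
for month, high, low in daily:
    by_month[month]["high"].append(high)
    by_month[month]["low"].append(low)

# Mean high and mean low for each month: the points on the smoothed line plot.
monthly = {
    month: (mean(temps["high"]), mean(temps["low"]))
    for month, temps in sorted(by_month.items())
}
for month, (mean_high, mean_low) in monthly.items():
    print(f"month {month}: mean high {mean_high:.1f}, mean low {mean_low:.1f}")
```

Averaging within each month removes the day-to-day noise, which is why the monthly line plot shows the seasonal trend more clearly than the daily one.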
The mean and median look about the same; the histogram looks like it is roughly balanced around the center.
The variability of the 2020 data is higher. The standard deviation is almost twice as large. The gap between the mean and median is also almost twice as large, which reflects the presence of more outliers.