Discussion 9: Review Session#
STATS 60 / STATS 160 / PSYCH 10
Today’s section
Review of material from the course.
Overview of units 1-4 and the topics covered in each.
Feel free to ask questions as we go.
Practice Quiz 1
Unit 1: Thinking about scale#
Putting numbers in context: three questions
a. What type of number is this?
- Is it an average? A percentage? A rate?
- How was this number calculated?
- Who is reporting it?
b. What can I compare this number to?
- Is it large or small compared to other similar values?
c. What would I have expected this number to be?
- Is the number surprising? Does it seem plausible?
Ballpark estimates and Fermi problems
Cost-benefit analysis
Unit 2: Exploratory data analysis#
Data visualization
Pie chart, bar chart, histogram, time series, scatter plot, and what each figure is suitable for
Misleading and confusing graphics
Fundamental summary statistics:
mean
median
variance
standard deviation
quantiles
correlation and correlation coefficient
Multi-modal data
Outliers
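The summary statistics listed in this unit can be computed with Python's standard library. The sketch below is only an illustration on made-up numbers: the `values` list and the `correlation` helper are hypothetical, not course material. It assumes the sample versions of variance and standard deviation (dividing by n - 1); if the course divides by n instead, `statistics.pvariance` and `statistics.pstdev` are the counterparts.

```python
import statistics

# Hypothetical data, chosen only for illustration; 30 acts as an outlier.
values = [2, 4, 4, 5, 7, 9, 30]

mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)         # sample variance (divides by n - 1)
std_dev = statistics.stdev(values)             # square root of the sample variance
quartiles = statistics.quantiles(values, n=4)  # three cut points: Q1, Q2, Q3


def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


print("mean:", mean, "median:", median)
print("variance:", variance, "standard deviation:", std_dev)
print("quartiles:", quartiles)
```

On this data the outlier pulls the mean well above the median, which is one reason to report both.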
Unit 3: Probability#
Sample spaces, outcomes, and events
Calculating probability of events
Conditional probability
Bayes’ rule
Common mistakes and fallacies in conditional probability
Expectation
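As a quick refresher on Bayes' rule, here is a minimal sketch that computes \(P(A \mid B)\) from \(P(A)\), \(P(B \mid A)\), and \(P(B \mid \text{not } A)\). The function name and the screening-test numbers are hypothetical, chosen only to show how a small prior can keep the posterior low even when the test is fairly accurate.

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) using Bayes' rule.

    P(B) is expanded as P(B | A) P(A) + P(B | not A) P(not A).
    """
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b


# Hypothetical numbers: 1% of people have a condition (A), the test is
# positive (B) for 90% of people with it and for 5% of people without it.
posterior = bayes_posterior(prior=0.01, p_b_given_a=0.90, p_b_given_not_a=0.05)
print(round(posterior, 3))
```

Confusing \(P(A \mid B)\) with \(P(B \mid A)\) here would mean reading the 90% as the chance of having the condition given a positive test, which is the kind of mistake listed above.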
Unit 4: Hypothesis tests#
Hypothesis testing
Null and alternative hypotheses
\(p\)-values
multiple testing, family-wise error rate, Bonferroni correction
Using a simulation to calculate a \(p\)-value:
A \(p\)-value is the probability of finding a result at least as extreme (surprising) as the one observed, if outcomes happened by random chance alone.
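The definition above translates directly into a simulation: generate many outcomes under "random chance alone" and count how often they are at least as extreme as what was observed. The sketch below assumes a hypothetical example (60 heads in 100 flips, with a fair coin as the null hypothesis) and a one-sided notion of "extreme"; the function name and parameters are illustrative.

```python
import random


def simulated_p_value(observed_heads, n_flips, n_sims=10_000, seed=0):
    """Estimate a one-sided p-value under a fair-coin null hypothesis."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # One simulated experiment: each flip is heads with probability 0.5.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:
            at_least_as_extreme += 1
    # Fraction of simulated experiments at least as extreme as the observation.
    return at_least_as_extreme / n_sims


# Hypothetical observation: 60 heads in 100 flips.
p_value = simulated_p_value(observed_heads=60, n_flips=100)
print(p_value)
```

A two-sided version would instead count simulations whose head count is at least as far from 50 as the observed count, in either direction.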
Unit 4: Experiments#
\(p\)-values for correlation coefficient from simulation
Experimental design
Randomized controlled trials vs. observational studies
Potential outcomes model
\(p\)-values from simulation
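One common way to simulate a \(p\)-value for a correlation coefficient is to shuffle one variable, which breaks any real association, and see how often the shuffled data gives a correlation at least as strong as the observed one. The sketch below assumes this shuffling approach (the course's exact procedure may differ) and uses made-up data; `correlation` and `correlation_p_value` are hypothetical helper names.

```python
import random
import statistics


def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def correlation_p_value(xs, ys, n_sims=5_000, seed=0):
    """Two-sided p-value for the correlation, estimated by shuffling ys."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(correlation(xs, ys))
    shuffled = list(ys)
    count = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)  # under the null, the pairing of x and y is arbitrary
        if abs(correlation(xs, shuffled)) >= observed:
            count += 1
    return count / n_sims


# Hypothetical data with a strong positive relationship.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1.2, 1.9, 3.4, 3.1, 5.6, 5.2, 7.9, 7.1, 9.3, 10.4]
p_value = correlation_p_value(x, y)
print(p_value)
```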
Unit 4: Confidence intervals#
The sample mean as an estimate
Sample size and its effect on the standard deviation of the sample mean
Normal approximation for the sample mean
Confidence intervals
68-95-99 rule
Selection bias
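The confidence-interval topics in this unit fit together as follows: the sample mean is the estimate, its standard deviation shrinks like \(1/\sqrt{n}\), and the normal approximation with the 68-95-99 rule turns that into an interval. The sketch below is a rough illustration on made-up data; the function name, the `SDS_FOR_CONFIDENCE` table, and the sample values are all hypothetical.

```python
import math
import statistics

# Multipliers from the 68-95-99 rule: about 68%, 95%, and 99% of a normal
# distribution lies within 1, 2, and 3 standard deviations of its mean.
SDS_FOR_CONFIDENCE = {68: 1, 95: 2, 99: 3}


def confidence_interval(sample, confidence=95):
    """Approximate confidence interval for the mean via the normal approximation."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard deviation of the sample mean: sample SD divided by sqrt(n).
    sd_of_mean = statistics.stdev(sample) / math.sqrt(n)
    half_width = SDS_FOR_CONFIDENCE[confidence] * sd_of_mean
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical measurements, invented for illustration only.
sample = [328, 335, 331, 340, 326, 333, 337, 329, 334, 332]
low, high = confidence_interval(sample, confidence=95)
print(low, high)
```

Note that this interval only accounts for random sampling variation; it does nothing to correct for selection bias in how the sample was collected.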
Student Questions#
Practice Quiz 2, week 9#
Question 1#
Below is a linear model that, given a mother Chinstrap penguin’s body mass, tries to predict how early in the season she will lay her egg. Based on this model, on which day of the year would you predict that a 3000 g penguin will lay her egg?

Answer 1#
Around day 330.
Question 2#
The model was trained on Chinstrap penguins. Gentoo penguins are a distinct species from Chinstrap penguins. Would you use the same model to make predictions for Gentoo penguins? Explain why or why not.
Answer 2#
The model might not make good predictions for Gentoo penguins. Different species can have very different body masses and breeding seasons, so a relationship learned from Chinstrap data may not carry over to Gentoo penguins.
Question 3#
Suppose you want to determine the average day of the year \(\mu\) in which a 3000g-3500g Chinstrap penguin will lay her egg. You sample penguins in this weight range at random and see when they lay their eggs. You’ll take your estimate \(\hat{\mu}\) to be the average of the days. How many penguins \(n\) would you have to observe to be 99% confident that your estimate is within one day of the truth? The standard deviation for the date of egg laying is 6 days.
There is no need to solve for \(n\); you can leave your answer as an unsimplified equation.
Answer 3#
Using the 68-95-99 rule, we need 3 standard deviations of the sample mean to be at most 1 day. The standard deviation of the sample mean is \(\frac{6}{\sqrt{n}}\), so we solve \(1 \ge 3\cdot \frac{6}{\sqrt{n}}\) for \(n\) and get that we need \(n \ge (18)^2\).
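For anyone who wants to check the arithmetic, the inequality rearranges to \(n \ge \left(\frac{3 \cdot 6}{1}\right)^2\). The short sketch below evaluates it; the function name and parameter names are hypothetical, and the 3 comes from the 68-95-99 rule for 99% confidence.

```python
import math


def required_sample_size(sd, margin, num_sds):
    """Smallest n with num_sds * sd / sqrt(n) <= margin."""
    return math.ceil((num_sds * sd / margin) ** 2)


# Values from the question: SD of 6 days, within 1 day, 3 SDs for 99%.
n = required_sample_size(sd=6, margin=1, num_sds=3)
print(n)  # 324
```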