{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Discussion 9: Review Session\n",
    "\n",
    "STATS 60 / STATS 160 / PSYCH 10\n",
    "\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; align-items: center; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\">\n",
    "\n",
    "**Today's section**\n",
    "\n",
    "\n",
    "\n",
    "- Review of material from the course. \n",
    "\n",
    "- Overview of units 1-4 and the topics covered in each.\n",
    "\n",
    "- Feel free to ask questions as we go. \n",
    "\n",
    "- Practice Quiz 1 \n",
    "\n",
    "\n",
    "</div>\n",
    "<div style=\"flex: 1;\">\n",
    "\n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "## Unit 1: Thinking about scale \n",
    "\n",
    "- Putting numbers in context: three questions\n",
    "    a. What type of number is this?\n",
    "        - Is it an average? A percentage? A rate? \n",
    "        - How was this number calculated?  \n",
    "        - Who is reporting it?\n",
    "    b. What can I compare this number to? Is it large or small compared to other similar values?\n",
    "    c. What would I have expected this number to be? \n",
    "        -  Is the number surprising? Does it seem plausible?\n",
    "- Ballpark estimates and Fermi problems\n",
    "- Cost-benefit analysis\n",
    "\n",
    "## Unit 2: Exploratory data analysis\n",
    "\n",
    "- Data visualization\n",
    "    - Pie chart, bar chart, histogram, time series, scatter plot, and what each figure is suitable for\n",
    "    - Misleading and confusing graphics\n",
    "- Fundamental summary statistics:\n",
    "    - mean\n",
    "    - median\n",
    "    - variance\n",
    "    - standard deviation\n",
    "    - quantiles\n",
    "    - correlation and correlation coefficient\n",
    "- Mutli-modal data\n",
    "- Outliers\n",
    "\n",
    "## Unit 3: Probability\n",
    "\n",
    "- Sample spaces, outcomes, and events\n",
    "- Calculating probability of events\n",
    "- Conditional probability\n",
    "- Bayes' rule\n",
    "- Common mistakes and fallacies in conditional probability\n",
    "- Expectation\n",
    "\n",
    "\n",
    "\n",
    "## Unit 4: Hypothesis tests\n",
    "\n",
    "- Hypothesis testing\n",
    "    - Null and alternative hypothesis\n",
    "    - $p$-values\n",
    "    - multiple testing, family-wise error rate, Bonferroni correction\n",
    "- Using a simulation to calculate a p-value:\n",
    "    - A **p-value** is the probability of finding a result *at least* as extreme/surprising, if outcomes happened by random chance alone.\n",
    "\n",
    "## Unit 4: Experiments\n",
    "\n",
    "- $p$-values for correlation coefficient from simulation\n",
    "- Experimental design\n",
    "    - Randomized controlled trials vs. observational studies\n",
    "- Potential outcomes model\n",
    "    - $p$-values from simulation\n",
    "\n",
    "## Unit 4: Confidence intervals\n",
    "\n",
    "- The sample mean as an estimate\n",
    "- Sample size and the effect of sample size on standard deviation\n",
    "- Normal Approximation for the sample mean\n",
    "    - Confidence intervals\n",
    "    - 68-95-99 rule\n",
    "- Selection bias\n",
    "\n",
    "# Student Questions\n",
    "\n",
    "\n",
    "# Practice Quiz 2, week 9\n",
    "\n",
    "## Question 1\n",
    "\n",
    "Below is a linear model that, given a mother Chinstrap penguin's body mass, tries to predict how early in the season she will lay her egg. Based on this model, on which day of the year would you predict that a 3000 g penguin will lay her egg?\n",
    "\n",
    "![](../figures/penguin-egg-weight.png)\n",
    "\n",
    "## Answer 1\n",
    "\n",
    "Around day 330.\n",
    "\n",
    "\n",
    "## Question 2\n",
    "\n",
    "\n",
    "The model was trained on Chinstrap penguins. Gentoo penguins are a distinct species from Chinstrap penguins. Would you use the same model to make predictions for Gentoo penguins? Explain why or why not.\n",
    "\n",
    "## Answer 2\n",
    "\n",
    "The model might not make good predictions for Gentoo penguins. In general, different species might have radically different body mass and breeding seasons, so the training data being Chinstrap might mean the conclusions are not relevant for Gentoo.\n",
    "\n",
    "## Question 3 \n",
    "\n",
    "Suppose you want to determine the average day of the year $\\mu$ in which a 3000g-3500g Chinstrap penguin will lay her egg. You sample penguins in this weight range at random and see when they lay their eggs. \n",
    "You'll take your estimate $\\hat{\\mu}$ to be the average of the days. How many penguins $n$ would you have to observe to be 99\\% confident that your estimate is within one day of the truth? The standard deviation for the date of egg laying is 6 days.\n",
    "\n",
    "*There is no need to solve for n, you can leave your answer as an unsimplified equation.*\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "## Answer 3 \n",
    "\n",
    " Using the 69-95-99 rule, we need that 1 day is more than 3 standard deviations of the sample mean. We solve for $n$: $1 \\ge 3\\cdot \\frac{6}{\\sqrt{n}}$ and get that we need $n \\ge (18)^2$.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Phython (JB)",
   "language": "python",
   "name": "jb-python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
