{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Discussion 8: Estimation\n",
    "\n",
    "STATS 60 / STATS 160 / PSYCH 10\n",
    "\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; align-items: center; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\">\n",
    "\n",
    "**Today's section**\n",
    "\n",
    "\n",
    "- Recap of lecture material\n",
    "- Week 8 practice quiz 1\n",
    "- Ranking baseball batters with confidence intervals\n",
    "- <a href = \"https://colab.research.google.com/drive/1YM3aMBv428zTROy5PljxOsGXqpU9GsGq?usp=sharing\">Notebook link</a>\n",
    "\n",
    "\n",
    "</div>\n",
    "<div style=\"flex: 1;\">\n",
    "\n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "# Recap\n",
    "\n",
    "## Population and sample\n",
    "\n",
    "- There is a variable $x$\n",
    "which we want to measure on observation units in a **population**.\n",
    "- Our goal is to estimate the population mean $\\mu$ which is a **parameter**.\n",
    "- We take independent $n$ **samples** from the population and record the variable on the sample.\n",
    "- This gives measurements $x_1,\\ldots,x_n$.\n",
    "- The sample mean $\\hat{\\mu}_n = \\frac{x_1+\\cdots+x_n}{n}$ is an **estimate** of $\\mu$.\n",
    "\n",
    "\n",
    "## Standard deviation of $\\hat{\\mu}_n$\n",
    "\n",
    "- The **standard deviation** of $\\hat{\\mu}_n$ is\n",
    "\n",
    "    $$\\frac{\\sigma_x}{\\sqrt{n}}$$\n",
    "\n",
    "- $n$ is the sample size and $\\sigma_x$ is the standard deviation of the variable $x$.\n",
    "- We don't know $\\sigma_x$ and so instead we use $\\hat{\\sigma}_x$ which is the sample standard deviation $x_1,\\ldots,x_n$.\n",
    "\n",
    "    $$\\text{standard deviation of } \\hat{\\mu}_n \\approx \\frac{\\hat{\\sigma}_x}{\\sqrt{n}}$$\n",
    "\n",
    "## Confidence intervals\n",
    "\n",
    "- A **confidence interval** for $\\mu$ is a collection of plausible values of $\\mu$.\n",
    "- The estimate $\\hat{\\mu}_n$ can be used to make confidence intervals of the form\n",
    "\n",
    "  $$ \\hat{\\mu}_n \\pm 2 \\frac{\\hat{\\sigma}_x}{\\sqrt{n}}$$\n",
    "\n",
    "- This produces a 95% confidence interval (2 standard deviations).\n",
    "- For 68% use 1 standard deviation and for 99% use 3 standard deviations.\n",
    "\n",
    "## Proportions and means\n",
    "\n",
    "- A special case is when $x$ is binary (yes/no).\n",
    "- In this case $\\pi$ is the **population proportion** and $\\hat{\\pi}_n$ is the **sample proportion**.\n",
    "- The standard deviation of $\\hat{\\pi}_n$ is\n",
    "\n",
    "    $$\\text{standard deviation of } \\hat{\\pi}_n = \\sqrt{\\frac{\\pi(1-\\pi)}{n}} $$\n",
    "\n",
    "    which is approximately\n",
    "\n",
    "    $$\\text{standard deviation of } \\hat{\\pi}_n \\approx \\sqrt{\\frac{\\hat{\\pi}_n(1-\\hat{\\pi}_n)}{n}} $$\n",
    "\n",
    "\n",
    "# Practice Quiz 1\n",
    "\n",
    "## Question 1\n",
    "\n",
    "Decide whether the following statement is True or False, and justify your answer:\n",
    "\n",
    "\"The Normal Approximation of the sample mean applies, no matter how the samples are collected\"\n",
    "\n",
    "\n",
    "**Answer:** False. It is important that the data points are collected independently and uniformly.\n",
    "\n",
    "\n",
    "## Question 2\n",
    "\n",
    "\n",
    "In the following scenario, explain \n",
    "\n",
    "a. What is the population\n",
    "\n",
    "b. What is the variable $x$ being measured\n",
    "\n",
    "c. What is the sample $x_1,\\ldots,x_n$\n",
    "\n",
    "An economist wants to estimate what percent of the average San Francisco restaurant servers' earned income comes from tips. The economist looks at the municipal tax records, picks 100 random people who list their occupation as \"restaurant server\", and calls them to ask them what percent of their earned income last year was from tips.\n",
    "\n",
    "\n",
    "## Question 2 - answer\n",
    "\n",
    "In this scenario:\n",
    "\n",
    "a. The population is the San Francisco restaurant servers.\n",
    "\n",
    "b. The variable $x$ is the percent of the server's income that came from tips last year. \n",
    "\n",
    "c. The sample is, for each contacted server, the percent $x_i$ that this specific server made from tips in the last year.\n",
    "\n",
    "## Question 3\n",
    "\n",
    "You are trying to estimate the proportion of left-handed people on Stanford campus (about 10\\% of people in the population overall are left-handed). \n",
    "\n",
    "You collect a sample of size $n=50$ and record whether they are left-handed. Will the normal approximation apply to $\\hat{\\pi}_n$ (the sample proportion of people who are left-handed)?\n",
    "\n",
    "\n",
    "**Answer:** No. The normal approximation does not apply. This is because we expect to have $\\hat{\\pi}_n \\approx \\frac{1}{10}$ and so $n\\hat{\\pi}_n \\approx 50 \\times \\frac{1}{10} = 5$ which is less than $10$. This means that $n$ is too small for the normal approximation to be used.\n",
    "\n",
    "\n",
    "# Ranking baseball batters\n",
    "\n",
    "## Notebook activity\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; align-items: center; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\">\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "- Open <a href = \"https://colab.research.google.com/drive/1YM3aMBv428zTROy5PljxOsGXqpU9GsGq?usp=sharing\">this notebook</a> (the link is also at the top of Discussion 8 on the course website).\n",
    "- Follow the instructions to see how sample size effects variability.\n",
    "\n",
    "\n",
    "</div>\n",
    "<div style=\"flex: 1;\">\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/Rogers_Hornsby.jpg\" alt=\"\" style=\"width:70%;\"><figcaption></figcaption></figure>\n",
    "\n",
    "</div>\n",
    "</div>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Phython (JB)",
   "language": "python",
   "name": "jb-python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}