{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lecture 18: Multiple hypotheses\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=#1 alt=\"\" style=\"width:70%; display:block; margin:0 auto;\" align=\"right\"><figcaption></figcaption></figure>\n",
    "\n",
    "# Recap\n",
    "\n",
    "\n",
    "## Hypotheses and p-values\n",
    "\n",
    "\n",
    "- A **p-value** is the probability of finding a result *at least* as extreme/surprising, if outcomes happened by random chance alone.\n",
    "- The **null hypothesis** corresponds to \"just chance\" or \"no effect.\"\n",
    "- The **alternative hypothesis** corresponds to \"better than chance\" or \"an effect.\"\n",
    "- Small p-value (less than 0.05) \u2192 evidence against the null hypothesis.\n",
    "\n",
    "\n",
    "## One-sided and two-sided hypotheses\n",
    "\n",
    "- A **one-sided** hypothesis is an alternative hypothesis with a direction ($>$ or $<$).\n",
    "- A **two-sided** hypothesis is an alternative hypothesis without a direction ($\\neq$).\n",
    "- The type of hypothesis determines what is \"more extreme\" in the p-value calculation.\n",
    "- Two-sided is a good default.\n",
    "\n",
    "## Practice quiz #3\n",
    "\n",
    "- A candy company promises that at least 30% of their chocolate eggs contain a figurine of the fictional character Elsa from Frozen; the rest contain other toys. \n",
    "- Suppose you buy 40 chocolate eggs, and only 9 of them contain an Elsa figurine.\n",
    "- You will investigate whether the low number of Elsa figurines is statistically significant.\n",
    "\n",
    "## Question 1\n",
    "\n",
    "- What are the null and alternative hypotheses? Describe them both in English and in mathematical symbols.\n",
    "\n",
    "- **Answer:** The null hypothesis is that the company is telling the truth and the chocolate eggs have a 30% chance of containing an Elsa figurine. The alternative hypothesis is that the chocolate eggs have a smaller than 30% chance of containing an Elsa figurine.\n",
    "\n",
    "- In symbols, let $\\pi$ be the long run proportion chocolate eggs that contain an Elsa figurine. The null hypothesis is $H_0 : \\pi = 0.3$ and the alternative hypothesis is $H_A : \\pi < 0.3$.\n",
    "\n",
    "- You could also do a two-sided alternative hypothesis $H_A : \\pi \\neq 0.3$. In words, the probability of a chocolate egg containing an Elsa figurine is more or less than the 30% advertised by the company.\n",
    "\n",
    "## Question 2\n",
    "\n",
    "- Describe how you would do a simulation to compute a p-value. If the null was true, what would be the \"probability of success\"? What would be the \"number of trials\"? What value would you compare the simulated data to?\n",
    "\n",
    "- **Answer:** A \"success\" would correspond to the chocolate egg containing an Elsa figurine. If the company is telling the truth, then the 30% of the chocolate eggs would contain an Else figurine. The \"probability of success\" is therefore 0.3\n",
    "\n",
    "- The number of trails is 40 (the number chocolate eggs bought). \n",
    "\n",
    "- The value to compare to is 9 (the number of chocolate eggs that contain an Elsa figurine).\n",
    "\n",
    "## Question 3\n",
    "\n",
    "- The p-value for the observed results (an Elsa figurine in 9 of the 40 chocolate eggs) is 0.04. What do you conclude about the null hypothesis?\n",
    "\n",
    "- **Answer:** Since the p-value is less than 0.05, we have evidence against the null hypothesis that 30% of chocolate eggs contain an Elsa figurine.\n",
    "\n",
    "# Type 1 and type 2 errors\n",
    "\n",
    "## Type 1 and type 2 errors\n",
    "\n",
    "|        |                | **Truth** |        |\n",
    "|--------|----------------|----------:|-------:|\n",
    "|        |                | $H_0$     | $H_A$  |\n",
    "| **Decision** | reject $H_0$       | Type I error |      |\n",
    "|        | don't reject $H_0$ |            | Type II error |\n",
    "\n",
    "\n",
    "- From last lecture:\n",
    "  - **Type 1 error:** a \"false alarm\" or false positive.\n",
    "  - **Type 2 error:** a \"missed opportunity\" or a false negative.\n",
    "\n",
    "## Type 1 error rate\n",
    "\n",
    "- The **Type 1 error rate** is:\n",
    "\n",
    "  $$\\frac{\\text{Number of times a type 1 error is made}}{\\text{Number of times the null hypothesis is true}}$$\n",
    "\n",
    "- Rejecting the null hypothesis when the p-value is less than 0.05 means that the type 1 error rate is less than 0.05.\n",
    "- In general, if we reject the null hypothesis when the p-value is less than a threshold $\\alpha$, then the type 1 error rate is less than $\\alpha$.\n",
    "\n",
    "## False positives and multiple experiments\n",
    "\n",
    "\n",
    "This comic comes from [xkcd](https://xkcd.com/882/).\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/xkcd1.png\" alt=\"\" style=\"width:70%\"><figcaption></figcaption></figure>\n",
    "\n",
    "\n",
    "##\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/xkcd2.png\" alt=\"\" style=\"width:70%\"><figcaption></figcaption></figure>\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/xkcd3.png\" alt=\"\" style=\"width:70%\"><figcaption></figcaption></figure>\n",
    "\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/xkcd4.png\" alt=\"\" style=\"width:70%\"><figcaption></figcaption></figure>\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/xkcd5.png\" alt=\"\" style=\"width:70%\"><figcaption></figcaption></figure>\n",
    "\n",
    "##\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/xkcd6.png\" alt=\"\" style=\"width:60%\"><figcaption></figcaption></figure>\n",
    "\n",
    "# Multiple hypothesis\n",
    "\n",
    "## Multiple testing\n",
    "\n",
    "- In the comic, the scientists investigated _multiple hypotheses_. \n",
    "- There were twenty null/alternative hypotheses pairs (one for each Jellybean color).\n",
    "- If there are $m$ hypothesis tests, then the probability of having at least one false positive goes up.\n",
    "- This can lead to accidental \"p-hacking\" where reported p-values are artificially small and do not accurately measure the evidence against a null hypothesis.\n",
    "\n",
    "## Family wise error rate\n",
    "\n",
    "- A collection of multiple hypotheses is called a family.\n",
    "- The **family wise error rate** (FWER) is the probability that there is at least one false positive (type 1 error) in the family.\n",
    "- In symbols:\n",
    "\n",
    "  $$\\text{FWER} = \\mathrm{Pr}[\\text{at least one false positive}] $$\n",
    "\n",
    "\n",
    "\n",
    "## Example: AI faces\n",
    "\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/real-or-ai-1-sol.png\" alt=\"Image 1: Which face is real?\" style=\"width:100%; display:block; margin:0 auto;\"><figcaption>The left face is real.</figcaption></figure>\n",
    "\n",
    "Did anyone identify the real face 7 out of 7 times?\n",
    "\n",
    "\n",
    "\n",
    "## AI faces: FWER\n",
    "\n",
    "- The probability of a specific person guessing and getting 7 out of 7 faces correct is $2^{-7} \\approx 0.0078$.\n",
    "- There are about 90 students who go to section each week.\n",
    "- If everyone was guessing, then the probability that *somebody* got 7 out of 7 is\n",
    "\n",
    "  $$ 1-(1-0.0078)^{90} \\approx 0.51$$\n",
    "\n",
    "- The family wise error rate can be much higher than the type 1 error rate for a single hypothesis.\n",
    "\n",
    "## Bonferroni correction\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 0.25;\" >\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/bonferroni.jpeg\" alt=\"\" style=\"width:100%\"><figcaption></figcaption></figure>\n",
    "\n",
    "</div>\n",
    "\n",
    "<div style=\"flex: 0.75;\" >\n",
    "\n",
    "- Suppose there are $m$ null/alternative hypothesis pairs in the family.\n",
    "- This means that we would compute $m$ p-values.\n",
    "- Instead of rejecting each null hypothesis when its p-value is less than $\\alpha$, we will only reject when its p-value is less than $\\alpha /m$.\n",
    "- Changing $\\alpha$ to $\\alpha/m$ is called a **Bonferroni correction**.\n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "## Bonferroni correction\n",
    "\n",
    "- The Bonferroni correction makes sure that the family wise error rate is at most $\\alpha$.\n",
    "- This is because:\n",
    "  \n",
    "    $$\\begin{align*}\n",
    "    \\text{FWER} &= \\mathrm{Pr}[\\text{at least one false positive}] \\\\\n",
    "    &\\leq m \\times \\mathrm{Pr}[\\text{false positive for one hypothesis}] \\\\\n",
    "    &\\leq m \\times \\frac{\\alpha}{m} \\\\\n",
    "    &= \\alpha\n",
    "    \\end{align*}$$\n",
    "\n",
    "## Bonferroni example\n",
    "\n",
    "- In the xkcd comic, what would be the Bonferroni correction?\n",
    "- Since there are 20 hypotheses, the new threshold should be $\\frac{0.05}{20}=0.0025$\n",
    "- **Interpretation**: Since the scientists did 20 tests, a p-value less than 0.05 is not strong evidence that green jelly beans are linked to acne.\n",
    "- The p-value needs to be less than 0.0025 for there to be evidence that green jelly beans are linked to acne.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "# Dream again\n",
    "\n",
    "## Dream recap\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "- In a speedrun attempt, Dream received 42 Ender pearls in 262 trades.\n",
    "- Last lecture, we saw that is very unlikely he would get that many Ender pearls without cheating.\n",
    "- But, do we need to do a multiple hypothesis correction?\n",
    "    \n",
    "</div>\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/dream.png\" alt=\"\" style=\"width:100%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "    \n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "## Dream and multiple hypotheses\n",
    "\n",
    "- We should account for the fact that there are many people who play Minecraft and therefore there have been many speed running attempts.\n",
    "- Maybe it is reasonable that *someone* would get as lucky as Dream, and we are unfairly focusing on him.\n",
    "\n",
    "## Bonferroni correction for Dream\n",
    "\n",
    "- How could we do a Bonferroni correction for Dream? What is $m$ the number of hypotheses?\n",
    "- $m$ should be the number of Minecraft speed run attempts. We do not know $m$ exactly but we could do a Fermi estimate.\n",
    "\n",
    "\n",
    "\n",
    "$$\\begin{align*}\n",
    "&\\# \\text{speed run attempts} \\\\\n",
    "&= \\#\\text{number of speed runners} \\\\\n",
    "&\\times \\#\\text{number of attempts per runner per year}\\\\ & \\times \\#\\text{number of years of speed running}\\\\\n",
    "&\\approx 10^{5} \\times 10 \\times 10\\\\\n",
    "&=10^{7}\n",
    "\\end{align*}$$\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "## Bonferroni correction for Dream\n",
    "\n",
    "- Giving Dream the benefit of the doubt, we can use $m=10^8$ (one hundred million).\n",
    "- A calculation gives that the p-value for Dream's trades is roughly $\\frac{6}{10^{12}}$ (less than 1 in a hundred billion).\n",
    "- The Bonferroni correction is\n",
    "\n",
    "  $$\\frac{0.05}{10^8} = \\frac{5}{10^{10}} = \\text{1 in 20 billion}$$\n",
    "\n",
    "- Dream's p-value is still much smaller than the corrected threshold so we still have evidence for cheating.\n",
    "\n",
    "## The dark side of Bonferroni\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 0.75;\" >\n",
    "\n",
    "- The Bonferroni correction makes it much harder to reject the null hypothesis.\n",
    "- This keeps the false positive (type 1 error) rate under control.\n",
    "- But if we don't reject the null hypothesis, we risk having a lot of false negatives (type 2 errors).\n",
    "- The Bonferroni correction increases the false negative (type 2 error) rate.\n",
    "\n",
    "</div>\n",
    "\n",
    "<div style=\"flex: 0.25;\" >\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/bonferroni_dark.jpeg\" alt=\"\" style=\"width:100%\"><figcaption></figcaption></figure>\n",
    "\n",
    "\n",
    "</div>\n",
    "\n",
    "\n",
    "</div>\n",
    "\n",
    "# Genomic studies\n",
    "\n",
    "## Genome wide association study\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "- The human body has roughly 20,000 genes.\n",
    "- The expression levels of each person's genes can vary widely.\n",
    "- A genome wide association (GWA) study looks at whether there are any genes that are associated with a particular disease.\n",
    "\n",
    "</div>\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/gwas.png\" alt=\"\" style=\"width:100%\"><figcaption>An illustration of GWA study (<a href=\"https://en.wikipedia.org/wiki/Genome-wide_association_study\">source</a>).</figcaption></figure>\n",
    "\n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "\n",
    "## Genome wide association study\n",
    "\n",
    "- Scientists conducting GWA studies have to be very careful about false positives due to testing multiple hypotheses (one for each gene).\n",
    "- This has led to a lot of new statistical methods (many developed at Stanford) that are designed specifically for GWA studies that have lower false negative rates than the Bonferroni correction.\n",
    "- GWA studies have successfully identified genes that are associated with a variety of diseases including heart disease, diabetes, and Crohn's disease.\n",
    "\n",
    "## Recap\n",
    "\n",
    "- Using p-values controls the false positive rate:\n",
    "    - If we reject a null hypothesis when the p-value is less than $\\alpha$, then the false positive rate will be $\\alpha$.\n",
    "    - If we make the $\\alpha$ smaller, there will be more false negatives.\n",
    "\n",
    "- Multiple testing:\n",
    "    - The *family-wise error rate* is the chance of at least one false positive.\n",
    "    - p-hacking.\n",
    "    - Bonferroni correction for multiple testing.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Phython (JB)",
   "language": "python",
   "name": "jb-python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}