{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lecture 16: Statistical Significance\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "## STATS60 so far\n",
    "\n",
    "\n",
    "- Unit 1 -- Thinking About Scale:\n",
    "    - Putting numbers in context, Fermi estimates, cost benefit analysis.\n",
    "- Unit 2 -- Exploratory Data Analysis:\n",
    "    - Terminology, data visualization, data summaries.\n",
    "- Unit 3 -- Probability:\n",
    "    - Computing probabilities, conditional probability, probability fallacies, expectation.\n",
    "\n",
    "\n",
    "## Looking ahead\n",
    "\n",
    "\n",
    "- Unit 4 -- Estimates, hypothesis testing and experiments:\n",
    "    - Generalizing from data to a larger group.\n",
    "- Today:\n",
    "    - Significance: how strong is the evidence?\n",
    "    - Chimpanzees problem-solving.\n",
    "\n",
    "\n",
    "# Significance\n",
    "\n",
    "## Organ donations\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; align-items: center; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\">\n",
    "\n",
    "- The wording of the question seems to have an impact on the proportion of people who sign-up to be organ donors.\n",
    "- Is the difference between groups large enough to <b>statistically significant</b>?\n",
    "\n",
    "</div>\n",
    "<div style=\"flex: 1; text-align: center;\">\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/organ-donation-results.gif\" alt=\"Sign up rates across different question types\" style=\"width:80%;\"><figcaption><a href = https://www.science.org/doi/10.1126/science.1091721>Do defaults save lives?</a> </figcaption></figure>\n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "## Statistical significance\n",
    "\n",
    "\n",
    "- **Statistically significant** means that the results are unlikely to have occurred by random chance alone. \n",
    "    - Example: it is possible that the different sign-up rates are due to chance, but this seems unlikely based on the study results.\n",
    "- Statistical significance asks \"is our result unlikely to happen by random chance?\"\n",
    "- To answer this, we will investigate what the results would like if any differences were due to random chance.\n",
    "\n",
    "\n",
    "\n",
    "# Chimpanzees and problem-solving\n",
    "\n",
    "## Can chimpanzees solve problems?\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; align-items: center; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\">\n",
    "\n",
    "- Chimpanzees are known to use tools and demonstrate simple problem-solving.\n",
    "- <a href = \"https://doi.org/10.1126/science.705342\"> Premack and Woodruff (1978)</a> wanted to know if chimpanzees could understand problems faced by people.\n",
    "\n",
    "</div>\n",
    "<div style=\"flex: 1;\">\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/chimp.jpg\"  style=\"width:80%;\"><figcaption><a href = https://en.wikipedia.org/wiki/Chimp_Haven>Marcus at Chimp Haven.</a> </figcaption></figure>\n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "## Sarah\n",
    "\n",
    "\n",
    "- Sarah (an adult chimpanzee) was shown 8 videos of a human facing a problem.\n",
    "- Sarah was shown two photos where one photo has the correct solution.\n",
    "     <figure style=\"text-align:center;\"><img src=\"../figures/sarah_test.png\" style=\"width:40%;\"><figcaption>One of the problems and its solution. </figcaption></figure>\n",
    "\n",
    "- Sarah picked the correct photo 7 out of 8 times.\n",
    "\n",
    "   \n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "## Observational units and variables\n",
    "\n",
    "\n",
    "\n",
    "- In the <a href = https://doi.org/10.1126/science.705342> chimpanzee problem-solving study</a>:\n",
    "\n",
    "    a. What are the observational units?\n",
    "\n",
    "    b. What are the relevant variables?\n",
    "\n",
    "a. The observational units are the problems.\n",
    "\n",
    "b. A relevant variable is whether Sarah chose the correct picture.\n",
    "\n",
    "\n",
    "\n",
    "## Samples and statistics\n",
    "\n",
    "\n",
    "### Definitions\n",
    "\n",
    "\n",
    "\n",
    "- The set of observational units on which data is collected is called the **sample**.\n",
    "- The number of observational units in the sample is the **sample size**.\n",
    "- A **statistic** is a number summarizing the data in the sample.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "## Samples and statistics\n",
    "\n",
    "- In the <a href = https://doi.org/10.1126/science.705342> chimpanzee problem-solving study</a>:\n",
    "\n",
    "    a. What is the sample size?\n",
    "\n",
    "    b. What would be a relevant statistic?\n",
    "\n",
    "a. The sample size is 8 (the number of problems).\n",
    "\n",
    "b. A relevant statistic is the proportion of times Sarah picked the correct picture (7 out of 8 times).\n",
    "\n",
    "# Beyond the sample\n",
    "\n",
    "## Sample as a snapshot\n",
    "\n",
    "- The 8 problems that were shown to Sarah are just a snapshot of Sarah's ability to solve problems.\n",
    "- Sarah's problem-solving can be thought of as a *random process*. \n",
    "- We want to know the long run probability of Sarah correctly answering a question.\n",
    "\n",
    "## Parameters\n",
    "\n",
    "\n",
    "### Definition\n",
    "\n",
    "- For a random process, a **parameter** is a long-run numerical property of the process.\n",
    "\n",
    "- Parameters are often written using Greek letters.\n",
    "\n",
    "\n",
    "\n",
    "- Example: the long-run frequency of Sarah correctly solving a problem. We will call this number $\\pi$ (a Greek p).\n",
    "- We won't know the exact value of $\\pi$, but we can use a sample to make conclusions about the likely values $\\pi$.\n",
    "\n",
    "\n",
    "## Two explanations\n",
    "\n",
    "- Sarah selected the correct photo in 7/8 attempts.\n",
    "- What are two possible explanations for why Sarah got 7 out of 8 correct?\n",
    "\n",
    "    a. Sarah knows how to solve problems and is using this skill to select the photo (the probability of correctly selecting the correct photo is larger than 0.5).\n",
    "\n",
    "    b.  Sarah is just guessing (the probability of correctly selecting the correct photo is 0.50) and she got lucky in these 8 problems.\n",
    "\n",
    "\n",
    "- Which of these two explanations do you think is more a reasonable explanation? How would you convince a skeptic?\n",
    "\n",
    "## Tactile simulation\n",
    "\n",
    "- How can we model what the study would have looked like if Sarah was just guessing?\n",
    "- Flip your coin 8 times and <a href = \"https://docs.google.com/forms/d/e/1FAIpQLSfqZHbeKSNeSDwFfk3f_oi09gaHjBnUVqe3AeZjS-GMhgdxKw/viewform\">record </a> the number of \"heads\" which we will call a success.\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/qr_code_chimp.svg\" alt=\"\" style=\"width:25%;\"><figcaption>You can view the results   <a href = \"https://docs.google.com/spreadsheets/d/1R__uug9g_2IEJCX4CHqYa6dyfwjtIsGyD2ZlcepHJEw/edit?usp=sharing\">here</a>. </figcaption></figure>\n",
    "\n",
    "## Questions\n",
    "\n",
    "Based on the histogram of results:\n",
    "\n",
    "- What does each square represent?\n",
    "\n",
    "- What was the most common outcome for number of heads in 8 coin tosses? Does that make sense?\n",
    "\n",
    "- Why did we need everyone to toss their coin 8 times? Why couldn't we just ask one person and look at their results?\n",
    "\n",
    "- Where does Sarah's observed result of 7 correct out of 8 fall in this histogram?  Does \"just guessing\" seem to be a good explanation for 7 correct?\n",
    "\n",
    "\n",
    "\n",
    "## One proportion applet\n",
    "\n",
    "- Instead of flipping more and more coins, we will use the <a href = https://www.rossmanchance.com/applets/2021/oneprop/OneProp.htm> One Proportion applet</a>.\n",
    "- What value should we use for \"probability of heads\"?\n",
    "- What about \"Number of tosses\"?\n",
    "- If we draw many samples, what does each dot in the dotplot represent?\n",
    "\n",
    "## One proportion applet\n",
    "\n",
    "- Based on the dotplot, how would you describe the result of 7 out of eight heads? Why?\n",
    "\n",
    "    a. Very surprising.\n",
    "    b. Somewhat surprising.\n",
    "    c. Not surprising.\n",
    "\n",
    "- It seems somewhat surprising because the 7 heads out of 8 tosses seems far out in the tail, and therefore unusual to happen by chance alone. \n",
    "\n",
    "## Quantifying surprise\n",
    "\n",
    "- To quantify \"how surprising\" or \"how unlikely\" the observed result is, we need to calculate what proportion of times the simulation results were at least as surprising as the observed result.\n",
    "- In the One Proportion applet, we can use \"Count samples\" to find the number of times we got a result as extreme as 7 out of 8 heads.\n",
    "- What is the proportion of repetitions as extreme as our observed result?\n",
    "\n",
    "# p-values\n",
    "\n",
    "## p-value\n",
    "\n",
    "- The previous quantity (\"proportion of as extreme repetitions\") is called a **p-value**.\n",
    "\n",
    "\n",
    "\n",
    "### Definition (p-value)\n",
    "\n",
    "- A **p-value** is the probability of finding a result *at least* as extreme/surprising, in settings identical to the actual study, if outcomes happened by random chance alone.\n",
    "- A p-value is always between 0 and 1.\n",
    "\n",
    "\n",
    "## p-value example\n",
    "\n",
    "- In the Chimpanzee study:\n",
    "    - Settings identical to the actual study \u2192 8 trials.\n",
    "    - Random chance alone \u2192 probability of success is 0.5.\n",
    "\n",
    "## p-value visualization\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; align-items: center; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/applet-chimps.png\" alt=\"\" style=\"width:120%;\"><figcaption> </figcaption></figure>\n",
    "</div>\n",
    "<div style=\"flex: 1;\">\n",
    "\n",
    "- The p-value is represented by the red area in the dotplot.\n",
    "- The p-value is the tail area of the dotplot starting at 7, the number of correct answers in the study.\n",
    "- Simulations that resulted in 8 heads are also included.\n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "## p-value interpretation\n",
    "\n",
    "- If Sarah was just randomly guessing, we would observe at least seven correct problems about 4.3% of the time.\n",
    "\n",
    "- If you used the applet on your own, would you get the same p-value? Will the p-value be close?\n",
    "\n",
    "# Null and alternative hypotheses\n",
    "\n",
    "## Two possible explanations\n",
    "\n",
    "- There were two potential explanation for why Sarah picked the correct photo in 7 out of 8 scenarios:\n",
    "\n",
    "    a. For any problem, Sarah tends to randomly pick one of the two photos. \n",
    "\n",
    "    b. For any problem, Sarah tends to pick the correct photo.\n",
    "\n",
    "- These potential explanations are called **hypotheses**.\n",
    "\n",
    "## Null hypothesis\n",
    "\n",
    "- The first explanation (Sarah tends to pick randomly) is called the **null hypothesis** and is abbreviated as $H_0$.\n",
    "- The null hypothesis corresponds to \"just chance\" or \"no effect.\"\n",
    "- Recall that $\\pi$ is the long-run frequency of Sarah selecting the correct photo.\n",
    "- Write the null hypothesis in terms of the parameter $\\pi$:\n",
    "- $H_0 : \\pi = 0.5$\n",
    "\n",
    "## Alternative hypothesis\n",
    "\n",
    "- The second explanation (Sarah tends to pick the correct photo) is called the **alternative hypothesis** and is abbreviated as $H_A$.\n",
    "- The alternative hypothesis corresponds to \"better than chance\" or \"an effect.\"\n",
    "- Write the alternative hypothesis in terms of the parameter $\\pi$:\n",
    "- $H_A : \\pi > 0.5$\n",
    "\n",
    "## p-values and hypothesis\n",
    "\n",
    "- p-value is \"small\" \u2192 Evidence against $H_0$ \u2192 Evidence for $H_A$.\n",
    "- p-value is \"not small\" \u2192 No evidence against $H_0$ \u2192 No evidence for $H_A$.\n",
    "- Smaller p-value \u2192 Stronger evidence against $H_0$ \u2192 Stronger evidence for $H_A$.\n",
    "\n",
    "## p-value thresholds\n",
    "\n",
    "\n",
    "- 0.10 < p-value \u2192 Not much evidence against $H_0$.\n",
    "- 0.05 < p-value < 0.10 \u2192 Moderate evidence against $H_0$.\n",
    "- 0.01 < p-value < 0.05 \u2192 Strong evidence against $H_0$.\n",
    "- p-value < 0.01 \u2192 Very strong evidence against $H_0$.\n",
    "\n",
    "- You do not need to memorize these thresholds. You do need to remember:\n",
    "    - Smaller p-value \u2192 Stronger evidence against $H_0$. \n",
    "\n",
    "## Study conclusions\n",
    "\n",
    "- The conclusions from the study can be stated in terms of the p-value and the null and alternative hypothesis:\n",
    "- The p-value (which is about 0.043) provides evidence against the null hypothesis, and evidence for the alternative hypothesis.\n",
    "- Thus, Sarah's data provide evidence that, in general, for any similar problems, Sarah would tend to select the correct photo.\n",
    "\n",
    "## Going beyond the study\n",
    "\n",
    "- Based on <a href = \"https://doi.org/10.1126/science.705342\"> Premack and Woodruff (1978)</a> do you have any questions related to chimpanzees' abilities to solve problems?\n",
    "\n",
    "\n",
    "\n",
    "## Summary - samples and parameters\n",
    "\n",
    "- **Statistically significant** means that the results are unlikely to have occurred by random chance alone.\n",
    "- The set of observational units on which data is collected is called the **sample**. \n",
    "- A **statistic** is a number summarizing the data in the sample. \n",
    "- For a random process, a **parameter** is a long-run numerical property of the process.\n",
    "\n",
    "\n",
    "## Summary - p-values and hypotheses\n",
    "\n",
    "\n",
    "- A **p-value** is the probability of finding a result *at least* as extreme/surprising, if outcomes happened by random chance alone.\n",
    "- The **null hypothesis** corresponds to \"just chance\" or \"no effect.\"\n",
    "- The **alternative hypothesis** corresponds to \"better than chance\" or \"an effect.\"\n",
    "- Small p-value \u2192 evidence against the null hypothesis \u2192 evidence for the alternative hypothesis \u2192 data is statistically significant.\n",
    "\n",
    "## Computing p-values\n",
    "\n",
    "- To compute a p-value, we need a model for what the results would have looked like if there was no effect.\n",
    "    - Example: flipping a fair coin, using an applet.\n",
    "- Next, we repeat or *simulate* the results many times (1,000 is usually enough).\n",
    "- Finally, we compute the number of times the simulation was at least as extreme as the results we actually observed.\n",
    "\n",
    "## Computing p-values\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/applet-chimps.png\" alt=\"\" style=\"width:80%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "## Looking ahead\n",
    "\n",
    "- In different studies the details will be different:\n",
    "    - Different setting (e.g. different number of trails).\n",
    "    - Different statistic (e.g. a mean instead of a proportion).\n",
    "    - Different model for \"no effect\" (e.g. 1/3 instead of 1/2).\n",
    "- But the core idea is the same:\n",
    "    - To determine statistical significance: compare the observed data to what we would happen if there was no effect.\n",
    "\n",
    "\n",
    "# Can dogs detect COVID?\n",
    "\n",
    "## Study background\n",
    "\n",
    "- Dogs have a <a href = \"https://www.youtube.com/watch?v=5LpaVYxTedE\"> remarkable sense of smell</a>.\n",
    "- Dogs can detect drugs, help with search and rescue and identify explosives.\n",
    "- A <a href = \"https://doi.org/10.1371/journal.pone.0243122\" > 2020 study </a> investigated whether dogs can detect COVID-19.\n",
    "\n",
    "\n",
    "\n",
    "## Study background\n",
    "\n",
    "\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "- The dog would smell 4 sweat samples. \n",
    "- One sample was from a COVID positive person.\n",
    "\n",
    "    <figure style=\"text-align:center;\"><img src=\"../figures/samples_cones.png\" alt=\"\" style=\"width:100%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "</div>\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "- The dog was trained to mark the positive sample.\n",
    "- The dog marked the sample by sitting in front of it.\n",
    "\n",
    "    <figure style=\"text-align:center;\"><img src=\"../figures/dog_marking.png\" alt=\"\" style=\"width:100%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "\n",
    "\n",
    "## Study results\n",
    "\n",
    "- One of the dogs was a Belgian Malinois\n",
    "Shepherd named Maika.\n",
    "- Maika correctly marked the covid positive sample in 32 out 38 trials.\n",
    "- What are two explanations for Maika's results?\n",
    "\n",
    "## One proportion applet\n",
    "\n",
    "- We can compute a p-value based on Maika's result.\n",
    "- This time we will go straight to the <a href = https://www.rossmanchance.com/applets/2021/oneprop/OneProp.htm> One Proportion applet</a>.\n",
    "- What value should we use for \"probability of heads\"?\n",
    "- What about \"Number of tosses\"?\n",
    "\n",
    "## p-value\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/one-prop-dogs.png\" alt=\"\" style=\"width:80%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "- None of the simulated experiments results in a result as extreme as 32 out 38 heads.\n",
    "- We have very strong evidence against the null hypothesis."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Phython (JB)",
   "language": "python",
   "name": "jb-python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
