{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lecture 17: Hypothesis tests\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "# Recap\n",
    "\n",
    "## Statistical significance\n",
    "\n",
    "- **Statistically significant** means that the results are unlikely to have occurred by random chance alone.\n",
    "- The set of observational units on which data is collected is called the **sample**. A **statistic** is a number summarizing the data in the sample. \n",
    "- For a random process, a **parameter** is a long-run numerical property of the process.\n",
    "- Example: $\\pi$ is the long run probability that Sarah the Chimpanzee selects the correct photo (not 3.14...).\n",
    "\n",
    "## Hypotheses and p-values\n",
    "\n",
    "\n",
    "\n",
    "- A **p-value** is the probability of finding a result *at least* as extreme/surprising, if outcomes happened by random chance alone.\n",
    "- The **null hypothesis** corresponds to \"just chance\" or \"no effect.\"\n",
    "- The **alternative hypothesis** corresponds to \"better than chance\" or \"an effect.\"\n",
    "- Small p-value \u2192 evidence against the null hypothesis.\n",
    "- \"Small\" means less than 0.05.\n",
    "\n",
    "\n",
    "\n",
    "## p-value visualization\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/applet-chimps.png\" alt=\"\" style=\"width:75%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "The p-value is represented by the red area in the dotplot.\n",
    "\n",
    "## What a p-value is not\n",
    "\n",
    "- A **p-value** is the probability of finding a result *at least* as extreme/surprising, if outcomes happened by random chance alone.\n",
    "- Can be thought of as \n",
    "\n",
    "  $$\\mathrm{Pr}[\\text{result } | \\text{ null hypothesis is true}]$$\n",
    "\n",
    "  which is *not* the same as \n",
    "\n",
    "  $$\\mathrm{Pr}[\\text{null hypothesis is true } | \\text{ result}]$$\n",
    "\n",
    "## Computing p-values\n",
    "\n",
    "- To compute a p-value, we need a model for what the results would have looked like if there was no effect.\n",
    "- Three questions:\n",
    "    1. If the null was true, what would be the \"probability of success\"?\n",
    "    2. What should be the \"number of trials\" (also called the sample size)?\n",
    "    3. What value will you compare the simulated data to?\n",
    "\n",
    "\n",
    "# Did Dream cheat?\n",
    "\n",
    "## Dream\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "- Dream is a Minecraft speedrunner.\n",
    "- A _speedrunner_ tries to beat a video game as quickly as possible.\n",
    "- In October 2020, Dream was accused of cheating during a Minecraft speedrun.\n",
    "    \n",
    "</div>\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/dream.png\" alt=\"\" style=\"width:100%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "    \n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "## The evidence for cheating\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "\n",
    "- To beat Minecraft, you have to trade gold ingots with a piglin for ender pearls.\n",
    "- Each time you trade, there is a $\\frac{20}{423} \\approx .0473$ probability that the piglin will give you an ender pearl.\n",
    "- In 262 trades, Dream got ender pearls 42 times.\n",
    "\n",
    "</div>\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/piglin.png\" alt=\"\" style=\"width:100%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "    \n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "## Hypotheses\n",
    "\n",
    "  \n",
    "- What are the null and alternative hypotheses for Dream's results?\n",
    "\n",
    "- The null hypothesis is that Dream just got lucky.\n",
    "- The alternative hypothesis is that Dream was cheating had a higher probability of receiving ender pearls.\n",
    "- If $\\pi$ is the parameter representing the probability Dream receives an Ender pearl, then the null hypothesis is $H_0 : \\pi = \\frac{20}{423}$ and the alternative hypothesis is $H_A : \\pi > \\frac{20}{423}$.\n",
    "\n",
    "## Computing the p-value\n",
    "\n",
    "a. If the null was true, what would be the \"probability of success\"?\n",
    "\n",
    "b. What should be the \"number of trials\"?\n",
    "\n",
    "c. What value will we compare the simulated data to?\n",
    "\n",
    "\n",
    "a. $\\frac{20}{423}$ (the probability of getting an under pearl if the null is true). \n",
    "\n",
    "b. 262 (the number of times Dream traded).\n",
    "\n",
    "c. 42 (the number of times he received an Ender pearl).\n",
    "\n",
    "## Computing the p-value\n",
    "\n",
    "\n",
    "\n",
    "- Let's use the <a href = https://www.rossmanchance.com/applets/2021/oneprop/OneProp.htm> One Proportion applet</a>.\n",
    "\n",
    "- The p-value is essentially 0. There is very strong evidence that Dream was cheating.\n",
    "\n",
    "  <figure style=\"text-align:center;\"><img src=\"../figures/dream_p_value.png\" alt=\"\" style=\"width:75%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "## Aftermath\n",
    "\n",
    "- In May 2021, Dream admitted that he had cheated but claimed it was an accident.\n",
    "- For more information about this scandal, watch this [video](https://www.youtube.com/watch?v=8Ko3TdPy0TU).\n",
    "\n",
    "## More minecraft\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "\n",
    "- Suppose you did your own speedrun attempt.\n",
    "- In 400 piglin trades, you only got 15 ender pearls.\n",
    "- Is the game unfair against you?\n",
    "\n",
    "</div>\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/piglin.png\" alt=\"\" style=\"width:100%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "    \n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "## Hypothesis\n",
    "\n",
    "\n",
    "- What are the null and alternative hypotheses for your results?\n",
    "\n",
    "- The null hypothesis is that the game is fair and the probability of receiving an ender pearl is $\\frac{20}{423}$.\n",
    "- The alternative hypothesis is that the game is unfair and the probability of receiving an ender pearl is less than $\\frac{20}{423}$.\n",
    "- In symbols: $H_0: \\pi=\\frac{20}{423}$ and $H_A : \\pi < \\frac{20}{423}$ where $\\pi$ is the probability of receiving an ender pearl.\n",
    "\n",
    "## Computing the p-value\n",
    "\n",
    "a. If the null was true, what would be the \"probability of success\"?\n",
    "\n",
    "b. What should be the \"number of trials\"?\n",
    "\n",
    "c. What value will we compare the simulated data to?\n",
    "\n",
    "\n",
    "a. $\\frac{20}{423}$ (the probability of getting an ender pearl if the game is fair). \n",
    "\n",
    "b. 400 (the number of times you traded).\n",
    "\n",
    "c. 15 (the number of times you received an Ender pearl).\n",
    "\n",
    "## Computing the p-value\n",
    "\n",
    "\n",
    "\n",
    "- In the <a href = https://www.rossmanchance.com/applets/2021/oneprop/OneProp.htm> One Proportion applet</a> we have to change $\\ge$ to $\\le$ under **Count Samples**.\n",
    "\n",
    "- The p-value is around 0.21, we do not have evidence that the game is unfair.\n",
    "\n",
    "  <figure style=\"text-align:center;\"><img src=\"../figures/p-value-minecraft.png\" alt=\"\" style=\"width:75%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "# Test directions\n",
    "\n",
    "## Different alternative hypotheses\n",
    "\n",
    "- For Dream's results, the hypotheses were:\n",
    "  - $H_0 : \\pi = \\frac{20}{423}$\n",
    "  - $H_A : \\pi > \\frac{20}{423}$\n",
    "- For your results, the hypotheses were:\n",
    "  - $H_0 : \\pi = \\frac{20}{423}$\n",
    "  - $H_A : \\pi < \\frac{20}{423}$\n",
    "- The alternative hypotheses point in different directions!\n",
    "\n",
    "## Different comparisons\n",
    "\n",
    "The direction of the hypothesis changes what is \"extreme\".\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "- For Dream, \"more extreme\" meant \"42 or *bigger*\"\n",
    "\n",
    "  <figure style=\"text-align:center;\"><img src=\"../figures/dream_p_value.png\" alt=\"\" style=\"width:100%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "</div>\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "- For you, \"more extreme\" meant \"15 or *smaller*\"\n",
    "\n",
    "  <figure style=\"text-align:center;\"><img src=\"../figures/p-value-minecraft.png\" alt=\"\" style=\"width:100%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "## Determining the direction\n",
    "\n",
    "- Ask yourself: \"If the results weren't caused by random chance, what would the results look like?\"\n",
    "  - Dream: the alternative hypothesis is that Dream is cheating. We would expect him to get *more* Ender pearls than just by chance.\n",
    "  - You: the alternative hypothesis is that the game is unfair against you. We would expect you to get *fewer* Ender pearls than just by chance.\n",
    "- These are both examples of *directional* or *one-sided* hypotheses.\n",
    "\n",
    "## Non-directional hypotheses\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "- Sometimes, the alternative hypothesis does not have a clear direction.\n",
    "- *MythBusters* wanted to see if toast was more likely to land \"butter side up\" or \"butter side down.\"\n",
    "- They built a toast dropping rig and dropped 48 pieces of buttered toast.\n",
    "\n",
    "\n",
    "</div>\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/Buttered_cat.png\" alt=\"\" style=\"width:40%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "## Toast experiment\n",
    "\n",
    "\n",
    "- What is the null hypothesis?\n",
    "- Let $\\pi$ be the probability the toast lands butter side down. The null is $H_0 : \\pi = 0.5$.\n",
    "- What are some reasons why the null hypothesis could be false?\n",
    "\n",
    "  - The butter side is heavier and more likely to land butter side down $H_A : \\pi > 0.5$.\n",
    "  - Putting the butter on the toast makes it curved and more likely to land butter side up $H_A  : \\pi < 0.5$.\n",
    "- We can combine both of these reasons into a non-directional hypothesis: $H_A : \\pi \\neq 0.5$\n",
    "\n",
    "## Computing the p-value\n",
    "\n",
    "- The MythBusters dropped 48 pieces of toast and 19 landed butter side down.\n",
    "- To compute the p-value:\n",
    "  a. If the null was true, what would be the \"probability of success\"?\n",
    "  b. What should be the \"number of trials\"?\n",
    "  c. What value will we compare the simulated data to?\n",
    "\n",
    "- The probability of success is 0.5, the number of trails is 48 and the number to compare to is 19.\n",
    "\n",
    "## Computing the p-value\n",
    "\n",
    "- Since we have a non-directional hypothesis, \"more extreme\" means \"more successes than expected\" *and* \"fewer successes than expected\".\n",
    "- When we compute the p-value we need to count simulations in both tails.\n",
    "- In the <a href = https://www.rossmanchance.com/applets/2021/oneprop/OneProp.htm> One Proportion applet</a> we can select \"two-sided\"\n",
    "\n",
    "## p-value results\n",
    "\n",
    "\n",
    "<div class=\"layout\" style=\"display: flex; justify-content: space-around;\">\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "\n",
    "- The p-value is around 0.19.\n",
    "-  We do not have evidence against the null hypothesis that toast is equally likely to land butter side up or butter side down.\n",
    "\n",
    "</div>\n",
    "\n",
    "<div style=\"flex: 1;\" >\n",
    "\n",
    "<figure style=\"text-align:center;\"><img src=\"../figures/butter-p-value.png\" alt=\"\" style=\"width:100%;\"><figcaption> </figcaption></figure>\n",
    "\n",
    "</div>\n",
    "</div>\n",
    "\n",
    "## One-sided hypothesis summary\n",
    "\n",
    "- Sometimes the alternative hypothesis has a direction:\n",
    "  - The number of successes is expected to be *bigger* ($>$) than what would happen by chance.\n",
    "  - The number of successes is expected to be *smaller* ($<$) than what would happen by chance.\n",
    "  - These called are **directional** or **one-sided** hypotheses.\n",
    "\n",
    "## Two-sided hypothesis summary\n",
    "\n",
    "\n",
    "- Sometimes the alternative hypothesis does not have direction:\n",
    "  - The number of success is expected to be **different** ($\\neq$) than what would happen by chance.\n",
    "  - These are **non-directional** or **two-sided** hypotheses.\n",
    "- The type of alternative hypothesis ($>$, $<$, $\\neq$) determines what is \"more extreme\" in the p-value calculation.\n",
    "\n",
    "## Two-sided as a default\n",
    "\n",
    "- If you are not sure whether to do a one-sided or two-sided test, a two-sided test is a good default.\n",
    "- You don't want to miss an interesting result just because the direction was different from what you were expecting.\n",
    "- The glue in [Post-It Notes](https://en.wikipedia.org/wiki/Post-it_note#History) was discovered by accident. A scientist at 3M was trying to create a strong glue but accidentally made a weak one!\n",
    "\n",
    "# Type 1 and type 2 errors\n",
    "\n",
    "## Types of errors\n",
    "\n",
    "\n",
    "- In hypothesis testing there are two potential errors that could be made:\n",
    "  \n",
    "  1. The null hypothesis is true, and we reject the null hypothesis.\n",
    "\n",
    "  2. The alternative hypothesis is true, and we do not reject the null hypothesis.\n",
    "\n",
    "- The first type of error is called a **type 1 error** and the second is called a **type 2 error**.\n",
    "\n",
    "\n",
    "## Types of errors visualized\n",
    "\n",
    "|        |                | **Truth** |        |\n",
    "|--------|----------------|----------:|-------:|\n",
    "|        |                | $H_0$     | $H_A$  |\n",
    "| **Decision** | reject $H_0$       | Type I error |      |\n",
    "|        | don't reject $H_0$ |            | Type II error |\n",
    "\n",
    "\n",
    "\n",
    "- **Type 1 error:** the null hypothesis is true, and we reject the null hypothesis. This is a \"false alarm\" or false positive.\n",
    "- **Type 2 error:** the null hypothesis is false, and we do not reject the null hypothesis. This is a \"missed opportunity\" or a false negative.\n",
    "\n",
    "\n",
    "\n",
    "## Types of errors for dream\n",
    "\n",
    "- In the Dream cheating example:\n",
    "\n",
    "    a. How would you describe a type 1 error in English?\n",
    "\n",
    "    b. How would you describe a type 2 error in English?\n",
    "\n",
    "a. A type 1 error would be accusing Dream of cheating when he is actually innocent.\n",
    "\n",
    "b. A type 2 error would be letting Dream get away with cheating.\n",
    "\n",
    "## Questions about errors\n",
    "\n",
    "- Can we know for sure if we have made a type 1 or type 2 error? Why or why not?\n",
    "- We cannot be certain that we haven't made a type 1 or type 2 error because we can't definitively know whether the null or alternative hypothesis is true.\n",
    "- How could we make sure that we *never* make a type 1 error? Would this be a good idea?\n",
    "- The only way to make sure we never make a type 1 error would be to never reject the null hypothesis. If we did this, we would make lots of type 2 errors.\n",
    "\n",
    "## Error rates\n",
    "\n",
    "- Since we cannot determine whether we have made an error, statisticians instead work with **error rates**.\n",
    "- The Type 1 error rate is:\n",
    "\n",
    "  $$\\frac{\\text{Number of times a type 1 error is made}}{\\text{Number of times the null hypothesis is true}}$$\n",
    "\n",
    "- The Type 2 error rate is:\n",
    "\n",
    "  $$\\frac{\\text{Number of times a type 2 error is made}}{\\text{Number of times the alternative hypothesis is true}}$$\n",
    "\n",
    "## p-values and type 1 error rate\n",
    "\n",
    "- Our rule: if the p-value is small (less than 0.05), then we reject the null hypothesis.\n",
    "- These rule makes the type 1 error rate at most 0.05.\n",
    "- If we made the threshold smaller (0.01 instead of 0.05), then what would happen to the type 1 and type 2 error rates?\n",
    "- The type 1 error rate would go down but the type 2 error rate would go up.\n",
    "\n",
    "## Type 2 error rate and power\n",
    "\n",
    "- The type 2 error rate is a bit more complicated.\n",
    "- The type 2 error rate depends on:\n",
    "  - The alternative value of $\\pi$.\n",
    "  - The sample size.\n",
    "- For a specific alternative $\\pi$ and sample size $n$, the power is the probability of (correctly) rejecting the null.\n",
    "\n",
    "  $$\\text{Power} = 1 - \\text{Type 2 error rate} $$\n",
    "\n",
    "## Example\n",
    "\n",
    "- On Monday, we saw an experiment where Sarah the chimpanzee solved 7 out of 8 problems. \n",
    "- We can use [an applet](https://www.rossmanchance.com/applets/2021/power/power.html) to study the power under different scenarios.\n",
    "- When researchers prepare to do a study, they often do a *power analysis* to determine the sample size.\n",
    "- For example: for a given alternative (maybe $\\pi=0.75$), how big does the sample size need to be so that the power is at least 80% or 90%?\n",
    "\n",
    "## Type 1 and 2 errors summary\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "|        |                | **Truth** |        |\n",
    "|--------|----------------|----------:|-------:|\n",
    "|        |                | $H_0$     | $H_A$  |\n",
    "| **Decision** | reject $H_0$       | Type I error |      |\n",
    "|        | don't reject $H_0$ |            | Type II error |\n",
    "\n",
    "- The type 1 error rate is the probability of incorrectly rejecting the null hypothesis. The type 1 error rate is equal to the p-value threshold (often 0.05).\n",
    "- The type 2 error rate is the probability of incorrectly not rejecting the null hypothesis. The type 2 error rate depends on the specific alternative and the sample size.\n",
    "\n",
    "## Hypothesis testing summary\n",
    "\n",
    "- For this week's quiz, you need to be able to:\n",
    "  - Describe null and alternative hypotheses in English and in terms of a parameter.\n",
    "  - Explain how you would use a simulation to compute a p-value.\n",
    "  - Interpret a given p-value in terms of the null hypothesis.\n",
    "- Other important topics:\n",
    "  - One-sided vs two-sided hypotheses.\n",
    "  - Type 1 and type 2 error rates."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Phython (JB)",
   "language": "python",
   "name": "jb-python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}