{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lecture 21: Testing for Correlation\n",
    "\n",
    "STATS 60 / STATS 160 / PSYCH 10\n",
    "\n",
    "\n",
    "**Concepts and Learning Goals:**\n",
    "\n",
    "- Hypothesis test for correlation coefficients\n",
    "    - testing via simulation\n",
    "    - permutation test \n",
    "- Variability of the correlation coefficient\n",
    "    - via simulation! \"the bootstrap\"\n",
    "\n",
    "\n",
    "\n",
    "## Taking inventory\n",
    "\n",
    "Suppose we have conducted an experiment and we have used our data to compute a summary statistic, $\\hat{T}$. \n",
    "\n",
    "We suspect that $\\hat{T}$ is indicative of a trend. But it could just be random noise...\n",
    "\n",
    "- Example 1: a student takes a 10-question True/False test, and $\\hat{T} = 8/10$ is the fraction answered correctly. \n",
    "    - It seems like the student knows the material!\n",
    "    - Is it possible they were just guessing randomly?\n",
    "\n",
    "- Example 2: we run a randomized controlled trial to see if retrieval practice helps students study, and $\\hat{T} = -1$ is the difference in mean scores between the treatment and control group.\n",
    "    - It seems like retrieval practice is *worse* for remembering the material!\n",
    "    - Is a $-1$ difference in average score really a lot? Could it just be noise?\n",
    "\n",
    "\n",
    "\n",
    "**Question:** \n",
    "\n",
    "1. Explain what a hypothesis test is, and why we would do it, in plain English, using 25 words or fewer.\n",
    "\n",
    "2. Explain what a null hypothesis is in plain English, using 25 words or fewer.\n",
    "\n",
    "3. Explain what a $p$-value is in plain English, using 25 words or fewer.\n",
    "\n",
    "\n",
    "## Hypothesis testing recap\n",
    "\n",
    "Suppose we have conducted an experiment and we have used our data to compute a summary statistic, $\\hat{T}$.\n",
    "\n",
    "We suspect that $\\hat{T}$ is indicative of a trend. But it could just be random noise...\n",
    "\n",
    "\n",
    "A **hypothesis test** is a thought experiment to help us figure out whether it is likely that our observation $\\hat{T}$ is just random noise.\n",
    "\n",
    "\n",
    "The **null hypothesis** is that our data is just random, with no trend. \n",
    "\n",
    "<font color=\"gray\">The specifics of the null hypothesis depend on the experiment we ran.</font>\n",
    "\n",
    "\n",
    "The **$p$-value** is the chance of observing a value at least as extreme as $\\hat{T}$ (an equally strong or even stronger trend) if the null hypothesis were true, i.e., if the data were just random.\n",
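    "\n",
    "As a quick check of this definition, here is a minimal Python sketch for Example 1 above. It assumes the null hypothesis is that the student guesses every True/False question independently, with a $1/2$ chance of being right; `guessing_p_value` is a hypothetical helper name, not something from the course materials.\n",
    "\n",
    "```python\n",
    "from math import comb\n",
    "\n",
    "def guessing_p_value(n_questions, n_correct):\n",
    "    # Chance of getting n_correct or more answers right out of n_questions\n",
    "    # when every answer is an independent 50/50 guess (the assumed null).\n",
    "    favorable = sum(comb(n_questions, k) for k in range(n_correct, n_questions + 1))\n",
    "    return favorable / 2 ** n_questions\n",
    "\n",
    "p = guessing_p_value(10, 8)  # (45 + 10 + 1) / 1024\n",
    "print(round(p, 4))  # 0.0547\n",
    "```\n",
    "\n",
    "So, under this assumed null, a student who is purely guessing would score 8/10 or better about 5.5% of the time.\n",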
    "\n",
    "## Correlation\n",
    "\n",
    "Suppose that I have sampled $n$ individuals from my population, and for each I have measured the values $(x_i,y_i)$.\n",
    "\n",
    "\n",
    "For example:\n",
    "\n",
    "- Penguins: $x =$ body mass, $y=$ beak length\n",
    "\n",
    "- Health: $x=$ weight, $y=$ breakfast days/week\n",
    "\n",
    "- Economics: $x=$ years of education, $y=$ salary\n",
    "\n",
    "- College admissions: $x=$ SAT score, $y=$ sophomore-year GPA\n",
    "\n",
    "\n",
    "The *correlation coefficient* ($R$) of $x$ and $y$ is the slope of the best-fit line for the *standardized* datasets $x_1,\\ldots,x_n$ and $y_1,\\ldots,y_n$.\n",
    "\n",
    "![Correlation coefficient for Penguin body mass vs. beak length](../figures/mass-beak-standard.png)\n",
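    "\n",
    "One standard way to compute $R$ is to convert both lists to standard units and average the products. The Python sketch below, using NumPy and made-up data, checks that this average, the slope of the least-squares line through the standardized points (via `np.polyfit`), and NumPy's `np.corrcoef` all agree. `standardize` and `correlation` are illustrative helper names; the sketch uses the population standard deviation (NumPy's default `.std()`, i.e. `ddof=0`).\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def standardize(v):\n",
    "    # Standard units: subtract the mean, then divide by the (population) standard deviation.\n",
    "    return (v - v.mean()) / v.std()\n",
    "\n",
    "def correlation(x, y):\n",
    "    # R = average product of the standardized values.\n",
    "    return np.mean(standardize(x) * standardize(y))\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "x = rng.normal(size=200)\n",
    "y = 0.5 * x + rng.normal(size=200)  # made-up data, for illustration only\n",
    "\n",
    "r = correlation(x, y)\n",
    "slope = np.polyfit(standardize(x), standardize(y), 1)[0]  # slope of the best-fit line\n",
    "print(round(r, 3), round(slope, 3), round(np.corrcoef(x, y)[0, 1], 3))  # the three values agree\n",
    "```\n",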
    "\n",
    "## Do students who study more sleep more?\n",
    "\n",
    "Let's look at some data from the course survey.\n",
    "Here is a scatterplot of your self-reported \"hours of sleep\" vs. \"hours of studying\":\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/scatter-sleep-study.png\" style=\"width:700px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "Does it look to you like there is a positive association, negative association, or neither?\n",
    "\n",
    "## Correlation coefficient for sleep vs. study\n",
    "\n",
    "Here is the best-fit line. $R = -.19$.\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/R-sleep-study.png\" style=\"width:700px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "Is this a real trend or is it just noise?\n",
    "\n",
    "## Is this a significant correlation?\n",
    "\n",
    "How can we decide if the correlation coefficient $R$ is **large**? How can we decide if it is **significant**?\n",
    "\n",
    "\n",
    "To decide if $R$ is **large**: compare it to its maximum and minimum values, $1$ and $-1$.\n",
    "\n",
    "\n",
    "But $R$ can be **significant** (a real linear association) even when $|R|$ is far from 1.\n",
    "\n",
    "![Penguin beak depth vs. flipper length](../figures/beak-depth-flipper-length.png)\n",
    "\n",
    "\n",
    "## Testing for correlation\n",
    "\n",
    "Suppose that we compute the correlation coefficient, and see that it has value $R \\neq 0$.\n",
    "\n",
    "\n",
    "Is this just a coincidence? Or is the correlation a real pattern?\n",
    "\n",
    "![](../figures/mass-beak-standard.png)\n",
    "\n",
    "\n",
    "Let's formulate this as a hypothesis testing problem:\n",
    "\n",
    "1. Null Hypothesis: there is no correlation.\n",
    "\n",
    "2. How can we compute the $p$-value?\n",
    "\n",
    "    - We'll use simulation! \n",
    "\n",
    "## Permuting the datapoints\n",
    "\n",
    "Let's assume, from now on, that our $x_i$ and $y_i$ are standardized.\n",
    "\n",
    "Suppose there really is a positive association between $(x_i,y_i)$. \n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/scatter-perf.png\" style=\"width:700px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "\n",
    "Now, what if we randomly shuffle or *permute* the $y_i$, so that they are matched to a random $x_j$?\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/permute.jpg\" style=\"width:300px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "\n",
    "There's almost certainly no correlation now!\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/scatter-perm.png\" style=\"width:700px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
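    "\n",
    "Here is a small Python sketch of this idea on synthetic data (not the data in the figures): we build $y$ values that are genuinely associated with $x$, then shuffle them with NumPy's `rng.permutation` and recompute $R$.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(1)\n",
    "n = 300\n",
    "x = rng.normal(size=n)\n",
    "y = 0.8 * x + 0.6 * rng.normal(size=n)  # synthetic data with a real positive association\n",
    "\n",
    "r_original = np.corrcoef(x, y)[0, 1]\n",
    "\n",
    "# Shuffle the y values so that each one is matched with a random x value.\n",
    "y_shuffled = rng.permutation(y)\n",
    "r_shuffled = np.corrcoef(x, y_shuffled)[0, 1]\n",
    "\n",
    "print(round(r_original, 2), round(r_shuffled, 2))  # strong positive R, then R near 0\n",
    "```\n",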
    "\n",
    "## Null hypothesis based on shuffling\n",
    "\n",
    "Suppose we have computed $R$ from our data. \n",
    "\n",
    "We want to know if $R$ reflects an actual trend, or if it is just noise.\n",
    "\n",
    "\n",
    "**Null hypothesis:** the $x_i$ and $y_i$ are paired up totally at random.\n",
    "\n",
    "\n",
    "1. **Question:** Why is this null hypothesis saying that there is no real correlation?\n",
    "\n",
    "2. **Question:** In plain English, what is the $p$-value of $R$ for this null hypothesis?\n",
    "\n",
    "- The null is saying there is no real correlation because if the pairing of $x_i$ and $y_i$ is arbitrary/random, there would almost certainly not be a linear relationship (as long as $n > 2$; with only two points, $R = \\pm 1$, since any two points lie on a line!).\n",
    "\n",
    "- The $p$-value is the chance that you'd get this value of $R$, or a more extreme one, if the data were paired up by randomly shuffling.\n",
    "\n",
    "## Permutation test for correlation\n",
    "\n",
    "Suppose we have computed $R$ from our data. \n",
    "\n",
    "We want to know if $R$ reflects an actual trend, or if it is just noise.\n",
    "\n",
    "\n",
    "**Null hypothesis:** the $x_i$ and $y_i$ are paired up totally at random.\n",
    "\n",
    "**$p$-value:** the chance that you'd get this value of $R$, or a more extreme one, if the data were paired up by randomly shuffling.\n",
    "\n",
    "\n",
    "We'll compute the $p$-value using a **permutation test**, a test based on simulation:\n",
    "\n",
    "1. Do some large number $T$ of repetitions of the following experiment:\n",
    "\n",
    "    a. Randomly permute/shuffle the $y_i$ so that each is paired with some random $x_j$\n",
    "\n",
    "    b. Compute and record the correlation coefficient for the shuffled dataset\n",
    "\n",
    "2. Make a histogram of the correlation coefficient values for all $T$ trials.\n",
    "\n",
    "3. Decide the $p$-value for $R$ based on how extreme it is relative to the histogram:\n",
    "\n",
    "    - If $R$ is positive, the $p$-value for $R$ is the fraction of histogram values greater than or equal to $R$ (or $1/T$, if there are none).\n",
    "\n",
    "    - If $R$ is negative, the $p$-value for $R$ is the fraction of histogram values less than or equal to $R$ (or $1/T$, if there are none).\n",
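    "\n",
    "The steps above can be sketched in Python with NumPy. This is a minimal one-sided version that follows the tail rule and the $1/T$ floor just described; `permutation_test` is a hypothetical name, and `T` and `seed` are illustrative parameters.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def permutation_test(x, y, T=10_000, seed=0):\n",
    "    # One-sided permutation test for the correlation coefficient R.\n",
    "    rng = np.random.default_rng(seed)\n",
    "    x = np.asarray(x, dtype=float)\n",
    "    y = np.asarray(y, dtype=float)\n",
    "    r = np.corrcoef(x, y)[0, 1]\n",
    "\n",
    "    # Step 1: shuffle the y values T times and record R for each shuffled dataset.\n",
    "    shuffled_rs = np.empty(T)\n",
    "    for t in range(T):\n",
    "        shuffled_rs[t] = np.corrcoef(x, rng.permutation(y))[0, 1]\n",
    "\n",
    "    # Step 3: count shuffled values at least as extreme as R, in R's direction.\n",
    "    if r >= 0:\n",
    "        count = np.sum(shuffled_rs >= r)\n",
    "    else:\n",
    "        count = np.sum(shuffled_rs <= r)\n",
    "    p_value = max(count, 1) / T  # report 1/T if no shuffle is as extreme as R\n",
    "    return r, p_value, shuffled_rs\n",
    "\n",
    "# Usage on made-up data with a real association.\n",
    "rng = np.random.default_rng(42)\n",
    "x = rng.normal(size=200)\n",
    "y = 0.5 * x + rng.normal(size=200)\n",
    "r, p, shuffled_rs = permutation_test(x, y, T=2_000)\n",
    "print(round(r, 2), p)\n",
    "```\n",
    "\n",
    "Step 2, the histogram, could be drawn from `shuffled_rs`, for example with Matplotlib's `plt.hist(shuffled_rs, bins=50)`.\n",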
    "\n",
    "## The penguins\n",
    "\n",
    "In our correlation lecture, we computed a correlation coefficient of $R = 0.67$ for Gentoo Penguin body mass vs. beak length.\n",
    "\n",
    "![](../figures/mass-beak-standard.png)\n",
    "\n",
    "\n",
    "What is the $p$-value?\n",
    "\n",
    "## Permutation test for penguin correlation\n",
    "\n",
    "I ran a simulation with $T = 10{,}000$ trials.\n",
    "\n",
    "In each trial, I chose a new random permutation of the $y_i$, and computed the correlation coefficient.\n",
    "\n",
    "\n",
    "Here is trial 1:\n",
    "\n",
    "\n",
    "![](../figures/penguin-random-perm.png)\n",
    "\n",
    "\n",
    "Here is trial 2:\n",
    "\n",
    "![](../figures/penguin-random-perm-2.png)\n",
    "\n",
    "\n",
    "Etc.\n",
    "\n",
    "## Aggregating the trials\n",
    "\n",
    "Below is a histogram of the correlation coefficients from all $T$ trials.\n",
    "\n",
    "\n",
    "![](../figures/perm-test-penguin.png)\n",
    "\n",
    "\n",
    "Using the permutation test, the $p$-value is at most $1/T = 0.0001$.\n",
    "\n",
    "We conclude that $R$ is statistically significant at level $\\alpha = 0.05$ (and at much smaller levels as well).\n",
    "\n",
    "## GDP in 1960 vs. 2000\n",
    "\n",
    "In the correlation lecture we also computed the correlation coefficient for the GDP of countries in 1960 vs. 2000. \n",
    "\n",
    "![](../figures/GDP-1960-2000.png)\n",
    "\n",
    "We can do a permutation test to check if this value of $R$ is statistically significant.\n",
    "\n",
    "## P-value for GDP correlation\n",
    "\n",
    "For $T = 10{,}000$ permutations, the $p$-value for the correlation coefficient of 1960 vs. 2000 GDP is $1/10{,}000$:\n",
    "\n",
    "![](../figures/perm-test-gdp.png)\n",
    "\n",
    "## Back to sleep vs. study\n",
    "\n",
    "Let's return to our course survey data, about hours of sleep vs. hours of study.\n",
    "\n",
    "We calculated $R = -.19$ for this data. Is it significant?\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/R-sleep-study.png\" style=\"width:700px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "## P-value for sleep vs. study\n",
    "\n",
    "Via simulation, for a $T = 10{,}000$-trial permutation test, we find that the $p$-value is $\\approx 0.13$, so the trend is *not* statistically significant at level $\\alpha = 0.05$.\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/perm-test-sleep-study.png\" style=\"width:700px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "## SAT score vs. study\n",
    "\n",
    "Let's look at a different correlation in our course survey dataset: SAT score vs. average number of hours a week spent studying.\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/scatter-sat-study.png\" style=\"width:700px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "## Correlation coefficient for SAT score vs. study\n",
    "\n",
    "The correlation coefficient is positive (unsurprisingly?): $R = 0.3$.\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/R-sat-study.png\" style=\"width:700px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "## Statistically significant?\n",
    "\n",
    "Again we run a permutation test with $T = 10{,}000$ permutations:\n",
    "\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/permtest-sat-study.png\" style=\"width:700px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "Statistically significant at level $\\alpha = 0.05$!\n",
    "\n",
    "## Variability\n",
    "\n",
    "We computed $R$ for our data.\n",
    "But what about its variability?\n",
    "\n",
    "- Is $R$ being influenced by outliers?\n",
    "\n",
    "- If our data had been sampled a bit differently, would our value of $R$ be dramatically different?\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/R-sat-study.png\" style=\"width:700px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "\n",
    "The permutation-test $p$-values only give us a sense of the statistical significance of a correlation:\n",
    "- we can see how extreme $R$ is relative to a random shuffling of the data (no correlation)\n",
    "- we cannot see how much $R$ would vary if we had drawn a different sample from a population with the *same* type of associative relationship.\n",
    "\n",
    "## How much variability in SAT score vs. study?\n",
    "\n",
    "The SAT score vs. study trend:\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/R-sat-study.png\" />\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "- Would the value of $R$ change (become smaller or larger) if we had sampled differently? There appear to be some outliers.\n",
    "\n",
    "- How much smaller or larger?\n",
    "\n",
    "## Estimating variability with simulation\n",
    "\n",
    "We can use simulation to estimate *variability* too. \n",
    "\n",
    "- In the best case, if we have access to *new* samples from our population, we can just collect more samples and compute a fresh correlation coefficient many times.\n",
    "\n",
    "- But what if we don't have access to new samples?\n",
    "\n",
    "\n",
    "The following approach is called the *bootstrap*:\n",
    "\n",
    "1. Start with our dataset $(x_1,y_1),\\ldots,(x_n,y_n)$.\n",
    "\n",
    "2. For some large number of trials $T$:\n",
    "\n",
    "    a. Sample $n$ pairs *independently with replacement* from the dataset: $$(x_{i_1},y_{i_1}),\\ldots,(x_{i_n},y_{i_n})$$\n",
    "\n",
    "    b. Compute and record the correlation coefficient of these pairs.\n",
    "    \n",
    "3. Form a histogram of the correlation coefficients from all $T$ trials.\n",
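    "\n",
    "A minimal Python sketch of these steps, using NumPy. Note the contrast with the permutation test: here each $x_i$ stays paired with its own $y_i$, and we resample whole pairs. `bootstrap_correlations` is a hypothetical helper name, and the data in the usage lines are made up.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def bootstrap_correlations(x, y, T=10_000, seed=0):\n",
    "    # Resample the n (x, y) pairs with replacement T times and record R each time.\n",
    "    rng = np.random.default_rng(seed)\n",
    "    x = np.asarray(x, dtype=float)\n",
    "    y = np.asarray(y, dtype=float)\n",
    "    n = len(x)\n",
    "    rs = np.empty(T)\n",
    "    for t in range(T):\n",
    "        idx = rng.integers(0, n, size=n)  # indices i_1, ..., i_n, drawn with replacement\n",
    "        rs[t] = np.corrcoef(x[idx], y[idx])[0, 1]  # pairs stay together\n",
    "    return rs\n",
    "\n",
    "# Usage on made-up data.\n",
    "rng = np.random.default_rng(7)\n",
    "x = rng.normal(size=100)\n",
    "y = 0.3 * x + rng.normal(size=100)\n",
    "rs = bootstrap_correlations(x, y, T=5_000)\n",
    "print(round(np.corrcoef(x, y)[0, 1], 2), round(rs.std(), 2))  # R and the spread of its bootstrap values\n",
    "```\n",
    "\n",
    "For very small $n$, a resample can have all of its $x$ values (or all of its $y$ values) equal, for example one point repeated $n$ times; then $R$ is undefined and `np.corrcoef` returns `nan` with a warning. For moderate $n$ this is very unlikely.\n",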
    "\n",
    "## The Bootstrap\n",
    "\n",
    "\n",
    "To get a sense of the variability of $R$:\n",
    "\n",
    "1. Start with our dataset $(x_1,y_1),\\ldots,(x_n,y_n)$.\n",
    "\n",
    "2. For some large number of trials $T$:\n",
    "\n",
    "    a. Sample $n$ pairs *independently with replacement* from the dataset: $$(x_{i_1},y_{i_1}),\\ldots,(x_{i_n},y_{i_n})$$\n",
    "\n",
    "    b. Compute and record the correlation coefficient of these pairs.\n",
    "    \n",
    "3. Form a histogram of the correlation coefficients from all $T$ trials.\n",
    "\n",
    "\n",
    "**Question:** why could this simulation give us a good sense of variability? Will it account for outliers?\n",
    "    \n",
    "- There's a reasonable chance that we'll avoid any specific outlier in a trial when we sample with replacement. Each of the $n$ draws misses point $i$ with probability $1-\\frac{1}{n}$, independently, so $$\\Pr[\\text{avoid } i] = \\left(1-\\frac{1}{n}\\right)^n \\approx e^{-1} \\approx 0.37 \\quad \\text{when } n \\text{ is large.}$$\n",
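    "\n",
    "A quick numerical check of this approximation in Python; `prob_never_drawn` is an illustrative name.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def prob_never_drawn(n):\n",
    "    # Each of the n draws misses point i with probability 1 - 1/n, independently.\n",
    "    return (1 - 1 / n) ** n\n",
    "\n",
    "for n in [10, 100, 1000]:\n",
    "    print(n, round(prob_never_drawn(n), 4))\n",
    "print('e^(-1) =', round(math.exp(-1), 4))\n",
    "```\n",
    "\n",
    "The printed values approach $e^{-1} \\approx 0.368$ as $n$ grows; even at $n = 10$ the value is within about 0.02 of the limit.\n",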
    "\n",
    "\n",
    "**Question:** will this always give us a good idea of the variability of $R$?\n",
    "\n",
    "- Not necessarily; our sample could just be really weird.\n",
    "\n",
    "## Simulation for variability in SAT score vs. study\n",
    "\n",
    "\n",
    "If we do a bootstrap simulation with $T = 10{,}000$ trials, we can see that the confidence interval around $R$ is actually quite small:\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/bootstrap-sat-study.png\" style=\"width:700px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "This gives us some sense of the variability of $R$.\n",
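    "\n",
    "One common way to summarize the bootstrap histogram is by the mean and standard deviation of the bootstrap values, together with a *percentile interval* containing the middle 95% of them. The lecture does not say exactly how its interval was computed, so this Python sketch is one reasonable choice, not necessarily the one behind the figure; it runs on made-up data, and `bootstrap_summary` is a hypothetical helper.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def bootstrap_summary(rs, level=0.95):\n",
    "    # Mean and standard deviation of the bootstrapped R values, plus the\n",
    "    # interval holding the middle `level` fraction of them (percentile interval).\n",
    "    tail = (1 - level) / 2\n",
    "    low, high = np.quantile(rs, [tail, 1 - tail])\n",
    "    return rs.mean(), rs.std(), (low, high)\n",
    "\n",
    "# Bootstrap made-up data, standing in for the survey responses.\n",
    "rng = np.random.default_rng(3)\n",
    "x = rng.normal(size=150)\n",
    "y = 0.3 * x + rng.normal(size=150)\n",
    "rs = np.empty(10_000)\n",
    "for t in range(rs.size):\n",
    "    idx = rng.integers(0, x.size, size=x.size)  # resample pairs with replacement\n",
    "    rs[t] = np.corrcoef(x[idx], y[idx])[0, 1]\n",
    "\n",
    "mean_r, sd_r, (low, high) = bootstrap_summary(rs)\n",
    "print(round(mean_r, 2), round(sd_r, 2), round(low, 2), round(high, 2))\n",
    "```\n",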
    "\n",
    "## Variability of the best-fit line\n",
    "\n",
    "Here you can see the original scatterplot and best-fit line, together with the lines corresponding to the mean of the bootstrapped $R$ values plus or minus one standard deviation.\n",
    "\n",
    "The line doesn't change much! The variability of $R$ is reasonably low.\n",
    "\n",
    "<div style=\"display: flex; justify-content: center; flex-direction: column; align-items: center;\">\n",
    "  <div>\n",
    "    <img src=\"../figures/R-std-sat-study.png\" style=\"width:700px;\"/>\n",
    "    <p style=\"font-size: smaller; text-align: center; margin-top: 4px;\"></p>\n",
    "  </div>\n",
    "</div>\n",
    "\n",
    "\n",
    "## Recap\n",
    "\n",
    "- Testing for correlation\n",
    "    - Using simulation: permutation tests\n",
    "- Variability of correlation\n",
    "    - Using simulation: \"the bootstrap\"\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (JB)",
   "language": "python",
   "name": "jb-python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}