{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "---\n",
    "title: 'Classification'\n",
    "author: \"Sergio Bacallado, Jonathan Taylor\"\n",
    "subtitle: \"[web.stanford.edu/class/stats202](http://web.stanford.edu/class/stats202)\"\n",
    "date: \"Autumn 2020\"\n",
    "output:\n",
    "  slidy_presentation:\n",
    "    css: styles.css\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Supervised learning with a **qualitative or categorical** response.\n",
    "\n",
    "# Basic approach\n",
    "\n",
     "Just as common as regression, if not more so:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "1. *Medical diagnosis:* Given the symptoms a patient shows, predict which of 3 conditions is causing them.\n",
    "\n",
    "2. *Online banking:* Determine whether a transaction is fraudulent or not, on the basis of the IP address, client's history, etc.\n",
    "\n",
    "3. *Web searching:* Based on a user's history, location, and the string of a web search, predict which link a person is likely to click. \n",
    "\n",
    "4. *Online advertising:* Predict whether a user will click on an ad or not. \n",
    "\n",
    "## Review: Bayes classifier\n",
    "\n",
    "- Suppose $P(Y\\mid X)$ is known. Then, given an input $x_0$, we predict the response\n",
    "\n",
    "$$\\hat y_0 = \\text{argmax}_{\\;y}\\; P(Y=y \\mid X=x_0).$$  \n",
    "\n",
    "- The Bayes classifier minimizes the expected 0-1 loss:\n",
    "\n",
    "$$ E\\left[ \\frac{1}{m} \\sum_{i=1}^m \\mathbf{1}(\\hat y_i \\neq y_i) \\right]$$\n",
    "\n",
    "- This minimum 0-1 loss (the best we can hope for) is the  **Bayes error rate**."
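     ,
     "\n",
     "As a toy illustration (with made-up probabilities, not from the text), the Bayes prediction at a fixed input $x_0$ is simply the most probable class:\n",
     "\n",
     "```r\n",
     "# Hypothetical conditional probabilities P(Y = y | X = x0) for 3 classes\n",
     "p = c(a = 0.2, b = 0.5, c = 0.3)\n",
     "names(which.max(p))  # the Bayes prediction at x0\n",
     "```"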
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic strategy: estimate $P(Y\\mid X)$\n",
    "\n",
    "<ul>\n",
    "<li>If we have a good estimate for the conditional probability $\\hat P(Y\\mid X)$, we can use the classifier:\n",
    "\n",
    "$$\\hat y_0 = \\text{argmax}_{\\;y}\\; \\hat P(Y=y \\mid X=x_0).$$  \n",
    "\n",
    "<li> Suppose $Y$ is a binary variable. Could we use a linear model?\n",
    "\n",
     "$$P(Y=1 \\mid X) = \\beta_0 + \\beta_1X_1 + \\dots+ \\beta_pX_p $$\n",
    "\n",
    "<li> \n",
    "Problems:\n",
    "<ul>\n",
    "<li> This would allow probabilities $<0$ and $>1$. \n",
    "<li> Difficult to extend to more than 2 categories.\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Logistic regression\n",
    "\n",
    "- We model the joint probability as:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
     "P(Y=1 \\mid X) &= \\frac{e^{\\beta_0 + \\beta_1 X_1 +\\dots+\\beta_p X_p}}{1+e^{\\beta_0 + \\beta_1 X_1 +\\dots+\\beta_p X_p}} \\\\\n",
    "P(Y=0 \\mid X) &= \\frac{1}{1+e^{\\beta_0 + \\beta_1 X_1 +\\dots+\\beta_p X_p}}.\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "This is the same as using a linear model for the log odds:\n",
    "\n",
    "$$\\log\\left[\\frac{P(Y=1 \\mid X)}{P(Y=0 \\mid X)}\\right] = \\beta_0 + \\beta_1 X_1 +\\dots+\\beta_p X_p.$$\n",
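     "\n",
     "As a quick numerical check (with made-up coefficients for a single predictor), the probability is recovered from the log odds by inverting the logit:\n",
     "\n",
     "```r\n",
     "# Hypothetical coefficients, single predictor\n",
     "beta0 = -3; beta1 = 0.05\n",
     "x = 60\n",
     "log.odds = beta0 + beta1 * x             # linear in x\n",
     "p = exp(log.odds) / (1 + exp(log.odds))\n",
     "p                                        # P(Y = 1 | X = 60), here exactly 0.5\n",
     "```\n",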
    "\n",
    "## Fitting logistic regression\n",
    "\n",
    "- The training data is a list of pairs $(y_1,x_1), (y_2,x_2), \\dots, (y_n,x_n)$. \n",
    "\n",
     "- We don't observe the left-hand side in the model\n",
    "\n",
    "$$\\log\\left[\\frac{P(Y=1 \\mid X)}{P(Y=0 \\mid X)}\\right] = \\beta_0 + \\beta_1 X_1 +\\dots+\\beta_p X_p,$$\n",
    "\n",
    "- <font color=\"red\">$\\implies$</font> We cannot use a least squares fit."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Likelihood\n",
    "\n",
    "- **Solution:**\n",
    "The likelihood is the probability of the training data, for a fixed set of coefficients $\\beta_0,\\dots,\\beta_p$:\n",
    "\n",
    "$$\\prod_{i=1}^n P(Y=y_i \\mid X=x_i) $$\n",
    "\n",
     "- We can rewrite this as:\n",
    "\n",
    "$$\n",
    "\\underbrace{\\prod_{i; y_i=1}\\frac{e^{\\beta_0 + \\beta_1 x_{i1} +\\dots+\\beta_p x_{ip}}}{1+e^{\\beta_0 + \\beta_1 x_{i1} +\\dots+\\beta_p x_{ip}}}}_\\text{Probability of $Y$ = 1 given $X$'s}\n",
    "\\underbrace{\\prod_{j; y_j=0}\\frac{1}{1+e^{\\beta_0 + \\beta_1 x_{j1} +\\dots+\\beta_p x_{jp}}}}_\\text{Probability of $Y$ = 0 given $X$'s}\n",
    "$$\n",
    "\n",
    "- Choose estimates $\\hat \\beta_0, \\dots,\\hat \\beta_p$ which maximize the likelihood.\n",
    "\n",
    "- Solved with numerical methods (e.g. Newton's algorithm).\n",
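     "\n",
     "As a sketch of what maximizing the likelihood means (simulated data; this is not how `glm` is implemented internally), the negative log-likelihood can be minimized with base R's `optim`:\n",
     "\n",
     "```r\n",
     "set.seed(1)\n",
     "# Simulate from a logistic model with true coefficients (0.5, 2)\n",
     "n = 500\n",
     "x = rnorm(n)\n",
     "y = rbinom(n, 1, 1 / (1 + exp(-(0.5 + 2 * x))))\n",
     "# Negative log-likelihood of beta = (beta0, beta1)\n",
     "nll = function(b) {\n",
     "  eta = b[1] + b[2] * x\n",
     "  -sum(y * eta - log(1 + exp(eta)))\n",
     "}\n",
     "fit = optim(c(0, 0), nll)\n",
     "fit$par  # close to the true (0.5, 2)\n",
     "```\n",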
    " \n",
    "## Logistic regression in R"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "library(ISLR)\n",
    "glm.fit = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,\n",
    "              family=binomial, data=Smarket)\n",
    "summary(glm.fit)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inference for logistic regression \n",
    "\n",
     "1. We can estimate the standard error of each coefficient.\n",
    "\n",
    "2. The $z$-statistic is the equivalent of the $t$-statistic in linear regression:\n",
    "\n",
    "$$z = \\frac{\\hat \\beta_j}{\\text{SE}(\\hat\\beta_j)}.$$\n",
    "\n",
     "3. The $p$-values are tests of the null hypothesis $\\beta_j=0$ (Wald's test).\n",
    "\n",
    "4. Other possible hypothesis tests: likelihood ratio test (chi-square distribution). \n",
    "\n",
    "## Example: Predicting credit card `default`\n",
    "\n",
    "Predictors:\n",
    "\n",
    "- `student`: 1 if student, 0 otherwise\n",
    "\n",
    "- `balance`: credit card balance\n",
    "\n",
    "- `income`: person's income.\n",
    "\n",
    "## Confounding\n",
    "\n",
    "In this dataset, there is *confounding*, but little collinearity.\n",
    "\n",
     "- Students tend to have higher balances, so `balance` is partly explained by `student` (though not very well).\n",
    "\n",
    "- People with a high `balance` are more likely to default.\n",
    "\n",
    "- Among people with a given `balance`, students are less likely to default. \n",
    "\n",
    "## Results: predicting credit card `default`\n",
    " \n",
    "<div align=\"center\">\n",
    "<img src=\"figs/Chapter4/4.3.png\" height=\"600\">\n",
    "</div>\n",
    " \n",
    "## Using only `balance`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "library(ISLR) # where Default data is stored\n",
    "summary(glm(default ~ balance,\n",
    "        family=binomial, data=Default))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using only `student`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "summary(glm(default ~ student,\n",
    "        family=binomial, data=Default))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using both `balance` and `student`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "summary(glm(default ~ balance + student,\n",
    "        family=binomial, data=Default))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using all 3 predictors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "summary(glm(default ~ balance + income + student,\n",
    "        family=binomial, data=Default))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multinomial logistic regression\n",
    "\n",
    "- Extension of logistic regression to more than 2 categories\n",
    "\n",
    "- Suppose $Y$ takes values in $\\{1,2,\\dots,K\\}$, then we can use a linear model for the log odds against a baseline category (e.g. 1): for $j \\neq 1$\n",
    "\n",
    "$$\\log\\left[\\frac{P(Y=j \\mid X)}{P(Y=1 \\mid X)}\\right] = \\beta_{0,j} + \\beta_{1,j} X_1 +\\dots+\\beta_{p,j} X_p$$\n",
    "\n",
    "- In this case $\\beta \\in \\mathbb{R}^{p \\times (K-1)}$ is a *matrix* of coefficients.\n",
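     "\n",
     "In R, one way to fit this model is `multinom` from the `nnet` package (shown here on the built-in `iris` data rather than a dataset from the text):\n",
     "\n",
     "```r\n",
     "library(nnet)  # multinom() fits multinomial logistic regression\n",
     "fit = multinom(Species ~ Sepal.Length + Sepal.Width, data = iris, trace = FALSE)\n",
     "coef(fit)      # a (K - 1) x (p + 1) matrix: one row per non-baseline class\n",
     "```\n",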
    "\n",
    "## Some potential problems\n",
    "\n",
    "-  The coefficients become unstable when there is collinearity. Furthermore, this affects the convergence of the fitting algorithm.\n",
    "\n",
     "-  When the classes are well separated, the coefficients become unstable; this is always the case when $p\\geq n-1$.\n",
     "Prediction error is then low, but $\\hat{\\beta}$ is very variable.\n",
    "\n",
    "# Linear Discriminant Analysis (LDA)\n",
    "\n",
    "<ul>\n",
    "<li>**Strategy:** Instead of estimating $P(Y\\mid X)$ directly, we could estimate:\n",
    "<ol>\n",
    "<li>$\\hat P(X \\mid Y)$: Given the response, what is the distribution of the inputs.\n",
    "<li>$\\hat P(Y)$: How likely are each of the categories.\n",
    "</ol>\n",
     "<li>Then, we use *Bayes rule* to obtain the estimate:\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\hat P(Y = k \\mid X = x) &= \\frac{\\hat P(X = x \\mid Y = k) \\hat P(Y = k)}{\\hat P(X=x)} \\\\\n",
    "&= \\frac{\\hat P(X = x \\mid Y = k) \\hat P(Y = k)}{\\sum_{j=1}^K\\hat P(X = x \\mid Y=j) \\hat P(Y=j)}\n",
    "\\end{aligned}$$\n",
    "</ul>\n",
    "\n",
    "## LDA: multivariate normal with equal covariance\n",
    "\n",
    "- LDA is the special case of the above strategy when $P(X \\mid Y=k) = N(\\mu_k, \\mathbf\\Sigma)$.\n",
    "\n",
     "- That is, within each class the features have a multivariate normal distribution with center depending on the class\n",
    "and **common covariance $\\mathbf\\Sigma$.**\n",
    "\n",
    "- The probabilities $P(Y=k)$ are estimated by the fraction of training samples of class $k$.\n",
    "\n",
    "## Decision boundaries\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"figs/Chapter4/4.6.png\" height=\"600\">\n",
    "</div>\n",
    "\n",
    "## LDA has (piecewise) linear decision boundaries\n",
    "\n",
    "Suppose that:\n",
    "\n",
    "1. We know $P(Y=k) = \\pi_k$ exactly.\n",
    "\n",
     "2. $P(X=x \\mid Y=k)$ is multivariate normal with density:\n",
    "\n",
    "$$f_k(x) = \\frac{1}{(2\\pi)^{p/2}|\\mathbf\\Sigma|^{1/2}} e^{-\\frac{1}{2}(x-\\mu_k)^T \\mathbf{\\Sigma}^{-1}(x-\\mu_k)}$$\n",
    "\n",
     "3. Above, $\\mu_k$ is the mean of the inputs for category $k$,\n",
     "and $\\mathbf\\Sigma$ is the covariance matrix <font color=\"red\">(common to all categories)</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, what is the Bayes classifier?\n",
    "\n",
    "## LDA has linear decision boundaries\n",
    "\n",
    "- By Bayes rule, the probability of category $k$, given the input $x$ is:\n",
    "\n",
    "$$P(Y=k \\mid X=x) = \\frac{f_k(x) \\pi_k}{P(X=x)}$$\n",
    "\n",
    "- The denominator does not depend on the response $k$, so we can write it as a constant:\n",
    "\n",
    "$$P(Y=k \\mid X=x) = C \\times f_k(x) \\pi_k$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Now, expanding $f_k(x)$:\n",
    "\n",
    "$$P(Y=k \\mid X=x) =  \\frac{C\\pi_k}{(2\\pi)^{p/2}|\\mathbf\\Sigma|^{1/2}} e^{-\\frac{1}{2}(x-\\mu_k)^T \\mathbf{\\Sigma}^{-1}(x-\\mu_k)}$$\n",
    "\n",
    "- Let's absorb everything that does not depend on $k$ into a constant $C'$:\n",
    "\n",
    "$$P(Y=k \\mid X=x) =  C'\\pi_k e^{-\\frac{1}{2}(x-\\mu_k)^T \\mathbf{\\Sigma}^{-1}(x-\\mu_k)}$$\n",
    "\n",
    "- Take the logarithm of both sides:\n",
    "$$\\log P(Y=k \\mid X=x) =  \\color{Red}{\\log C'} + \\color{blue}{\\log \\pi_k  - \\frac{1}{2}(x-\\mu_k)^T \\mathbf{\\Sigma}^{-1}(x-\\mu_k)}.$$\n",
    "\n",
    "- <font color=\"red\">This is the same for every category, $k$.</font>\n",
    "\n",
    "- <font color=\"blue\">We want to find the maximum of this expression over $k$.</font>\n",
    "\n",
    "## LDA has linear decision boundaries\n",
    "\n",
    "- Goal is to maximize the following over $k$:\n",
    "$$\n",
    "\\begin{aligned}\n",
    "&\\log \\pi_k  - \\frac{1}{2}(x-\\mu_k)^T \\mathbf{\\Sigma}^{-1}(x-\\mu_k).\\\\\n",
    "=&\\log \\pi_k  - \\frac{1}{2}\\left[x^T \\mathbf{\\Sigma}^{-1}x + \\mu_k^T \\mathbf{\\Sigma}^{-1}\\mu_k\\right] + x^T \\mathbf{\\Sigma}^{-1}\\mu_k  \\\\\n",
    "=& C'' +  \\log \\pi_k  - \\frac{1}{2}\\mu_k^T \\mathbf{\\Sigma}^{-1}\\mu_k + x^T \\mathbf{\\Sigma}^{-1}\\mu_k\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "- We define the objectives (called *discriminant functions*):\n",
    "$$\\delta_k(x) = \\log \\pi_k  - \\frac{1}{2}\\mu_k^T \\mathbf{\\Sigma}^{-1}\\mu_k + x^T \\mathbf{\\Sigma}^{-1}\\mu_k$$\n",
    "At an input $x$, we predict the response with the highest $\\delta_k(x)$.\n",
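     "\n",
     "A minimal 1-D sketch with two classes and assumed (not estimated) parameters, showing that the larger discriminant picks the class:\n",
     "\n",
     "```r\n",
     "# Assumed parameters: equal priors, means 0 and 2, common variance 1\n",
     "pi.k = c(0.5, 0.5); mu.k = c(0, 2); sigma2 = 1\n",
     "delta = function(x, k) log(pi.k[k]) - mu.k[k]^2 / (2 * sigma2) + x * mu.k[k] / sigma2\n",
     "x0 = 0.5\n",
     "c(delta(x0, 1), delta(x0, 2))  # class 1 wins: the boundary is at (0 + 2)/2 = 1\n",
     "```\n",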
    "\n",
    "## LDA has linear decision boundaries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- What is the decision boundary? It is the set of points $x$ in which 2 classes do just as well (i.e. the discriminant\n",
    "functions of the two classes agree at $x$):\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\delta_k(x) &= \\delta_\\ell(x) \\\\\n",
    "\\log \\pi_k  - \\frac{1}{2}\\mu_k^T \\mathbf{\\Sigma}^{-1}\\mu_k + x^T \\mathbf{\\Sigma}^{-1}\\mu_k \n",
    "& =\n",
    "\\log \\pi_\\ell  - \\frac{1}{2}\\mu_\\ell^T \\mathbf{\\Sigma}^{-1}\\mu_\\ell + x^T \\mathbf{\\Sigma}^{-1}\\mu_\\ell\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "- This is a linear equation in $x$.\n",
    "\n",
    "## Decision boundaries revisited\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"figs/Chapter4/4.6.png\" height=\"600\">\n",
    "</div>\n",
    "\n",
    "## Estimating $\\pi_k$\n",
    "\n",
    "$$\\hat \\pi_k = \\frac{\\#\\{i\\;;\\;y_i=k\\}}{n}$$\n",
    "\n",
    "In English: the fraction of training samples of class $k$.\n",
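     "\n",
     "In R (using the built-in `iris` data for illustration):\n",
     "\n",
     "```r\n",
     "y = iris$Species\n",
     "pi.hat = table(y) / length(y)  # fraction of training samples in each class\n",
     "pi.hat                         # 1/3 each: iris has 50 flowers per species\n",
     "```\n",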
    "\n",
    "## Estimating the parameters of $f_k(x)$\n",
    "\n",
    "<ul>\n",
    "<li>Estimate the center of each class $\\mu_k$:\n",
    "\n",
    "$$\\hat\\mu_k = \\frac{1}{\\#\\{i\\;;\\;y_i=k\\}}\\sum_{i\\;;\\; y_i=k} x_i$$\n",
    "\n",
    "<li>Estimate the common covariance matrix $\\mathbf{\\Sigma}$:\n",
    "<ul>\n",
    "<li> One predictor ($p=1$):\n",
    "\n",
    "$$\\hat \\sigma^2 = \\frac{1}{n-K}\\sum_{k=1}^K \\;\\sum_{i:y_i=k} (x_i-\\hat\\mu_k)^2.$$\n",
    "\n",
     "<li> Many predictors ($p>1$): Compute the vectors of deviations $(x_1 -\\hat \\mu_{y_1}),(x_2 -\\hat \\mu_{y_2}),\\dots,(x_n -\\hat \\mu_{y_n})$ and\n",
     "use their sample covariance (dividing by $n-K$) as the unbiased estimate $\\hat{\\mathbf\\Sigma}$.\n",
    "</ul>\n",
    "</ul>\n",
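     "\n",
     "A sketch of the pooled ($p>1$) estimate, again on the built-in `iris` data (two predictors, `Species` as the class):\n",
     "\n",
     "```r\n",
     "X = as.matrix(iris[, c(\"Sepal.Length\", \"Sepal.Width\")])\n",
     "y = iris$Species\n",
     "K = nlevels(y); n = nrow(X)\n",
     "# Subtract the class mean from each observation, then pool\n",
     "class.means = apply(X, 2, function(col) ave(col, y))\n",
     "deviations = X - class.means\n",
     "Sigma.hat = crossprod(deviations) / (n - K)  # unbiased pooled covariance\n",
     "Sigma.hat\n",
     "```\n",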
    "\n",
     "## The LDA prediction rule\n",
    "\n",
    "- For an input $x$, predict the class with the largest:\n",
    "\n",
    "$$\\hat\\delta_k(x) = \\log \\hat\\pi_k  - \\frac{1}{2}\\hat\\mu_k^T \\mathbf{\\hat\\Sigma}^{-1}\\hat\\mu_k + x^T \\mathbf{\\hat\\Sigma}^{-1}\\hat\\mu_k$$\n",
    "\n",
     "- The decision boundaries are defined by $\\left\\{x: \\hat\\delta_k(x) = \\hat\\delta_{\\ell}(x) \\right\\}$ for $1 \\leq k,\\ell \\leq K$.\n",
    "\n",
    "# Quadratic discriminant analysis (QDA)\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"figs/Chapter4/4.9.png\" height=\"600\">\n",
    "</div>\n",
    "\n",
     "- The assumption that the inputs of every class have the same covariance $\\mathbf{\\Sigma}$ can be quite restrictive."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## QDA: multivariate normal with differing covariance\n",
    "\n",
    "- In **quadratic discriminant analysis** we estimate a mean $\\hat\\mu_k$ and a covariance matrix $\\hat{\\mathbf \\Sigma}_k$ for each class separately.\n",
    "\n",
    "- Given an input, it is easy to derive an objective function:\n",
    "$$\\delta_k(x) = \\log \\pi_k  - \\frac{1}{2}\\mu_k^T \\mathbf{\\Sigma}_k^{-1}\\mu_k + x^T \\mathbf{\\Sigma}_k^{-1}\\mu_k -\n",
    "\\frac{1}{2}x^T \\mathbf{\\Sigma}_k^{-1}x -\\frac{1}{2}\\log |\\mathbf{\\Sigma}_k|$$\n",
    "\n",
    "- This objective is now quadratic in $x$ and so the decision boundaries are 0s of quadratic functions. \n",
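     "\n",
     "In R, QDA is fit with `qda` from `MASS`, mirroring the `lda` call (shown on the built-in `iris` data for brevity):\n",
     "\n",
     "```r\n",
     "library(MASS)  # qda() lives here, next to lda()\n",
     "qda.fit = qda(Species ~ Sepal.Length + Sepal.Width, data = iris)\n",
     "pred = predict(qda.fit)$class\n",
     "table(pred, iris$Species)  # training confusion matrix\n",
     "```\n",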
    "\n",
     "## QDA decision boundaries\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"figs/Chapter4/4.9.png\" height=\"600\">\n",
    "</div>\n",
    "\n",
    "Bayes boundary (<font color=\"#800080\">-- -- --</font>), LDA ($\\cdot\\cdot\\cdot$), QDA (<font color=\"#008080\">--------</font>).\n",
    "\n",
    "## Evaluating a classification method\n",
    "\n",
    "- We have talked about the 0-1 loss:\n",
    "\n",
    "$$\\frac{1}{m}\\sum_{i=1}^m \\mathbf{1}(y_i \\neq \\hat y_i).$$\n",
    "\n",
     "- A classifier may err more often on some classes than on others; the 0-1 loss tells you nothing about this.\n",
    "\n",
    "- A much more informative summary of the error is a **confusion matrix**:\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"figs/confusion/confusion-abstract.png\" height=\"250\">\n",
    "</div>\n",
    "\n",
    "## Evaluating a classification method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "library(MASS) # where the `lda` function lives\n",
    "lda.fit = predict(lda(default ~ balance + student, data=Default))\n",
    "table(lda.fit$class, Default$default)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.  The error rate among people who do **not** default (false positive rate) is very low.\n",
    "\n",
    "2. However, the rate of false negatives is 76%. \n",
    "\n",
    "3. It is possible that false negatives are a bigger source of concern!\n",
    "\n",
    "4. One possible solution: Change the <font color=\"red\">threshold</font>\n",
    "\n",
    "## Evaluating a classification method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new.class = rep(\"No\", length(Default$default))\n",
    "new.class[lda.fit$posterior[,\"Yes\"] > 0.2] = \"Yes\"\n",
    "table(new.class, Default$default)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Predicted `Yes` if $P(\\mathtt{default}=\\text{yes} | X) > \\color{Red}{0.2}$.\n",
    "\n",
     "- Changing the threshold to 0.2 makes it easier to predict `Yes`.\n",
    "\n",
    "- Note that the rate of false positives became higher! That is the price to pay for fewer false negatives. \n",
    "\n",
    "## Evaluating a classification method\n",
    "\n",
    "Let's visualize the dependence of the error on the threshold:\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"figs/Chapter4/4.7.png\" height=\"600\">\n",
    "</div>\n",
    "\n",
    "<font color=\"#0000FF\">-- -- --  False negative rate </font> (error for defaulting customers),\n",
    "<font color=\"#FFA500\">$\\cdot\\cdot\\cdot$ False positive rate </font> (error for non-defaulting customers),\n",
    "<font color=\"#000000\">-------- Overall error rate</font>.\n",
    "\n",
    "## The ROC curve\n",
    "\n",
    "<div align=\"center\">\n",
    "<table>\n",
    "<tr>\n",
    "<td>\n",
    "<img src=\"figs/Chapter4/4.8.png\" height=\"600\">\n",
    "</td>\n",
    "<td style=\"vertical-align:top\">\n",
    "<ul>\n",
    "<li>Displays the performance of the method for any choice of threshold.\n",
    "<li>The area under the curve (AUC) measures the quality of the classifier:\n",
    "<ul>\n",
    "<li>0.5 is the AUC for a random classifier\n",
    "<li>The closer the AUC is to 1, the better.\n",
    "</ul>\n",
    "</ul>\n",
    "</td>\n",
    "</tr>\n",
    "</table>\n",
    "</div>"
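     ,
     "\n",
     "The AUC has a direct interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A base-R sketch on simulated scores (not the book's example):\n",
     "\n",
     "```r\n",
     "set.seed(42)\n",
     "# Simulated scores: class-1 scores are shifted up by one standard deviation\n",
     "labels = rep(c(0, 1), each = 100)\n",
     "scores = rnorm(200, mean = labels)\n",
     "pos = scores[labels == 1]; neg = scores[labels == 0]\n",
     "# AUC = P(random positive outscores random negative); ties count half\n",
     "auc = mean(outer(pos, neg, \">\") + 0.5 * outer(pos, neg, \"==\"))\n",
     "auc  # near the theoretical pnorm(1 / sqrt(2)), about 0.76\n",
     "```"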
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Comparing classification methods through simulation\n",
    "\n",
    "<ul>\n",
    "<li>Simulate data from several different known distributions with $2$ predictors and a binary response variable.\n",
    "<li>Compare the test error (0-1 loss) for the following methods:\n",
    "<ul>\n",
    "<li>KNN-1\n",
    "<li>KNN-CV (\"optimally tuned\" KNN)\n",
    "<li>Logistic regression\n",
    "<li>Linear discriminant analysis (LDA)\n",
    "<li>Quadratic discriminant analysis (QDA)\n",
    "</ul>\n",
    "</ul>\n",
    "\n",
    "## Scenario 1\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"figs/classification_simulation/scenario1.png\" height=\"600\">\n",
    "</div>\n",
    "\n",
    "- $X_1,X_2$ normal with identical variance.\n",
    "\n",
    "- No correlation in either class.\n",
    "\n",
    "## Scenario 2\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"figs/classification_simulation/scenario2.png\" height=\"600\">\n",
    "</div>\n",
    "\n",
    "- $X_1,X_2$ normal with identical variance.\n",
    "\n",
    "-  Correlation is -0.5 in both classes.\n",
    "\n",
    "## Scenario 3\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"figs/classification_simulation/scenario3.png\" height=\"600\">\n",
    "</div>\n",
    "\n",
     "- $X_1,X_2$ Student's $t$-distributed.\n",
    "\n",
    "- No correlation in either class.\n",
    "\n",
    "## Results for first 3 scenarios\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"figs/Chapter4/4.10.png\" height=\"600\">\n",
    "</div>\n",
    "\n",
    "## Scenario 4\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"figs/classification_simulation/scenario4.png\" height=\"600\">\n",
    "</div>\n",
    "\n",
    "- $X_1, X_2$ normal with identical variance.\n",
    "\n",
    "- First class has correlation 0.5, second class has correlation -0.5.\n",
    "\n",
    "## Scenario 5\n",
    "\n",
    "- $X_1, X_2$ normal with identical variance.\n",
    "\n",
    "-  Response $Y$ was sampled from:\n",
    "$$\n",
    "\\begin{aligned}\n",
    "P(Y=1 \\mid X) &=\n",
    "  \\frac{e^{\\beta_0+\\beta_1 X_1^2+\\beta_2X_2^2+\\beta_3X_1X_2}}{1+e^{\\beta_0+\\beta_1X_1^2+\\beta_2X_2^2+\\beta_3X_1X_2}}.\n",
    "  \\end{aligned}\n",
    "$$\n",
    "\n",
     "- The true decision boundary is quadratic, but this is not a QDA model. (Why?)\n",
    "\n",
    "## Scenario 6\n",
    "\n",
    "- $X_1, X_2$ normal with identical variance.\n",
    "\n",
    "- Response $Y$ was sampled from:\n",
    "$$\n",
    "\\begin{aligned}\n",
    "P(Y=1 \\mid X) &=\n",
    "  \\frac{e^{f_\\text{nonlinear}(X_1,X_2)}}{1+e^{f_\\text{nonlinear}(X_1,X_2)}}.\n",
    "  \\end{aligned}\n",
    "$$\n",
    "\n",
    "- The true decision boundary is very rough.\n",
    "\n",
    "## Results for scenarios 4-6\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"figs/Chapter4/4.11.png\" height=\"600\">\n",
    "</div>"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "jupytext": {
   "cell_metadata_filter": "all,-slideshow",
   "formats": "ipynb,md:myst,Rmd"
  },
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
