Evaluating a classification method#

  • We have talked about the 0-1 loss:

\[\frac{1}{m}\sum_{i=1}^m \mathbf{1}(y_i \neq \hat y_i).\]
  • A classifier can make wrong predictions for some classes much more often than for others; the 0-1 loss tells you nothing about this.

  • A much more informative summary of the error is a confusion matrix:

Fig. 20 Confusion matrix for a 2-class problem (Table 4.6).#


Confusion matrix for Default example#

library(MASS) # where the `lda` function lives
library(ISLR) # where the `Default` data lives
# Fit LDA and compute in-sample predicted classes and posterior probabilities
lda.fit = predict(lda(default ~ balance + student, data=Default))
# Rows: predicted class; columns: observed class
table(lda.fit$class, Default$default)
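
From these predictions we can recover both the overall 0-1 loss and the class-specific error rates discussed below. A minimal sketch using the objects defined above:

mean(lda.fit$class != Default$default)                  # overall 0-1 loss
mean(lda.fit$class[Default$default == "No"] == "Yes")   # error rate among non-defaulters (false positives)
mean(lda.fit$class[Default$default == "Yes"] == "No")   # error rate among defaulters (false negatives)
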
  1. The error rate among people who do not default (false positive rate) is very low.

  2. However, among customers who do default, the error rate (the false negative rate) is about 76%.

  3. It is possible that false negatives are a bigger source of concern!

  4. One possible solution: Change the threshold


Changing the decision rule#

# Reclassify using a lower threshold of 0.2 on the posterior probability of default
new.class = rep("No", length(Default$default))
new.class[lda.fit$posterior[,"Yes"] > 0.2] = "Yes"
table(new.class, Default$default)
  • We now predict Yes if \(P(\mathtt{default}=\text{Yes} \mid X) > \color{Red}{0.2}\).

  • Lowering the threshold to 0.2 makes it easier to classify an observation as Yes.

  • Note that the false positive rate becomes higher! That is the price to pay for fewer false negatives.


Let’s visualize the dependence of the error on the threshold:

Fig. 21 Error rates for LDA classifier on Default dataset (Figure 4.7).#

Dashed line: false negative rate (error for defaulting customers); dotted line: false positive rate (error for non-defaulting customers); solid line: overall error rate.
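
One way to produce a plot like this is to sweep a grid of thresholds and recompute the three error rates at each value. A minimal R sketch using the objects defined above (the grid of thresholds is an arbitrary choice):

thresholds = seq(0.01, 0.5, by = 0.01)
err = t(sapply(thresholds, function(thr) {
  pred = ifelse(lda.fit$posterior[, "Yes"] > thr, "Yes", "No")
  c(fnr = mean(pred[Default$default == "Yes"] == "No"),   # false negative rate
    fpr = mean(pred[Default$default == "No"] == "Yes"),   # false positive rate
    overall = mean(pred != Default$default))              # overall 0-1 loss
}))
matplot(thresholds, err, type = "l", lty = c(2, 3, 1), col = 1,
        xlab = "Threshold", ylab = "Error rate")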


The ROC curve#

Fig. 22 ROC curve for LDA classifier on Default dataset (Figure 4.8).#


  • The ROC (receiver operating characteristic) curve displays the true positive rate against the false positive rate for every choice of threshold (a small computation sketch follows this list).

  • The area under the curve (AUC) measures the quality of the classifier:

    1. 0.5 is the AUC for a random classifier

    2. The closer the AUC is to 1, the better.
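
As a sketch of how the curve and its AUC could be computed by hand from the LDA posterior probabilities (packages such as pROC or ROCR provide the same functionality):

p.yes = lda.fit$posterior[, "Yes"]
grid = sort(unique(c(0, p.yes, 1)), decreasing = TRUE)   # thresholds from 1 down to 0
tpr = sapply(grid, function(thr) mean(p.yes[Default$default == "Yes"] >= thr))
fpr = sapply(grid, function(thr) mean(p.yes[Default$default == "No"] >= thr))
plot(fpr, tpr, type = "l", xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 3)   # diagonal: a random classifier (AUC = 0.5)
sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)     # AUC by the trapezoid rule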


Comparing classification methods through simulation#

  • Simulate data from several different known distributions with \(2\) predictors and a binary response variable.

  • Compare the test error (0-1 loss) for the following methods:

    1. KNN-1

    2. KNN-CV (“optimally tuned” KNN)

    3. Logistic regression

    4. Linear discriminant analysis (LDA)

    5. Quadratic discriminant analysis (QDA)


Scenario 1#


Fig. 23 Instance for simulation scenario #1.#

  • \(X_1,X_2\) normal with identical variance.

  • No correlation in either class.
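
As a rough illustration, here is a minimal R sketch of one replicate of such a comparison for Scenario 1; the class means, sample sizes, and the use of class::knn for KNN-1 are assumptions for illustration, not the settings used to produce the figures, and KNN-CV (tuning \(k\) by cross-validation) is omitted for brevity.

library(MASS)   # lda, qda, mvrnorm
library(class)  # knn
set.seed(1)
gen = function(n) {
  # two uncorrelated normal predictors with identical variance; class means differ
  y = factor(rep(c(0, 1), each = n))
  x = rbind(mvrnorm(n, c(0, 0), diag(2)), mvrnorm(n, c(1, 1), diag(2)))
  data.frame(X1 = x[, 1], X2 = x[, 2], y = y)
}
train = gen(50)
test  = gen(500)
err = function(pred) mean(pred != test$y)  # test 0-1 loss
c(LDA      = err(predict(lda(y ~ X1 + X2, data = train), test)$class),
  QDA      = err(predict(qda(y ~ X1 + X2, data = train), test)$class),
  logistic = err(ifelse(predict(glm(y ~ X1 + X2, data = train, family = binomial),
                                test, type = "response") > 0.5, "1", "0")),
  KNN1     = err(knn(train[, c("X1", "X2")], test[, c("X1", "X2")], train$y, k = 1)))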


Scenario 2#


Fig. 24 Instance for simulation scenario #2.#

  • \(X_1,X_2\) normal with identical variance.

  • Correlation is -0.5 in both classes.
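
A correlated pair like this can be drawn with MASS::mvrnorm by supplying a covariance matrix with off-diagonal \(-0.5\) (unit variances are assumed for illustration):

Sigma = matrix(c(1, -0.5, -0.5, 1), 2, 2)   # correlation -0.5, unit variances
x = mvrnorm(100, mu = c(0, 0), Sigma = Sigma)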


Scenario 3#


Fig. 25 Instance for simulation scenario #3.#

  • \(X_1,X_2\) follow a Student’s \(t\)-distribution.

  • No correlation in either class.


Results for first 3 scenarios#

Fig. 26 Simulation results for linear scenarios #1-3 (Figure 4.10).#


Scenario 4#


Fig. 27 Instance for simulation scenario #4.#

  • \(X_1, X_2\) normal with identical variance.

  • First class has correlation 0.5, second class has correlation -0.5.


Scenario 5#

  • \(X_1, X_2\) normal with identical variance.

  • Response \(Y\) was sampled from:

\[P(Y=1 \mid X) = \frac{e^{\beta_0+\beta_1 X_1^2+\beta_2X_2^2+\beta_3X_1X_2}}{1+e^{\beta_0+\beta_1X_1^2+\beta_2X_2^2+\beta_3X_1X_2}}.\]

  • The true decision boundary is quadratic, but this is not a QDA model. (Why?) A sketch of sampling \(Y\) from such a model is given below.
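
For concreteness, a minimal R sketch of how such a response could be sampled; the coefficient values and the sample size are arbitrary assumptions for illustration.

set.seed(2)
n  = 200
X1 = rnorm(n); X2 = rnorm(n)
beta = c(0, 2, -2, 1)   # assumed values for beta0, beta1, beta2, beta3
eta  = beta[1] + beta[2] * X1^2 + beta[3] * X2^2 + beta[4] * X1 * X2
Y = rbinom(n, 1, plogis(eta))   # plogis(eta) = exp(eta) / (1 + exp(eta))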


Scenario 6#

  • \(X_1, X_2\) normal with identical variance.

  • Response \(Y\) was sampled from:

\[P(Y=1 \mid X) = \frac{e^{f_\text{nonlinear}(X_1,X_2)}}{1+e^{f_\text{nonlinear}(X_1,X_2)}}.\]

  • The true decision boundary is very rough.


Results for scenarios 4-6#

Fig. 28 Simulation results for nonlinear scenarios #4-6 (Figure 4.11).#