Evaluating a classification method#

  • We have talked about the 0-1 loss:

\[\frac{1}{m}\sum_{i=1}^m \mathbf{1}(y_i \neq \hat y_i).\]
  • It is possible to make the wrong prediction for some classes more often than others. The 0-1 loss doesn’t tell you anything about this.

  • A much more informative summary of the error is a confusion matrix:

Fig. 20 Confusion matrix for a 2 class problem#
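
To make the contrast between the 0-1 loss and the confusion matrix concrete, here is a minimal sketch in base R. The label vectors `y` and `y_hat` are hypothetical toy data chosen only for illustration.

```r
# Toy labels (hypothetical, not taken from any data set in these notes).
y     <- c("No", "No", "No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes")
y_hat <- c("No", "No", "No", "No", "No", "Yes", "No", "No", "No", "Yes")

# 0-1 loss: fraction of observations with y_i != y_hat_i.
zero_one_loss <- mean(y != y_hat)

# Confusion matrix: rows are predictions, columns are true classes.
confusion <- table(predicted = y_hat, actual = y)

# Error rate within each true class, which the 0-1 loss averages away.
per_class_error <- tapply(y != y_hat, y, mean)

print(zero_one_loss)
print(confusion)
print(per_class_error)
```

Here the overall loss hides the fact that most of the errors fall on the "Yes" class.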

Confusion matrix for Default example#

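library(MASS) # provides `lda`
library(ISLR) # provides the `Default` data set
# Fit LDA, then predict on the training data; `lda.fit` holds the predictions
# (`$class` and `$posterior`), not the fitted model itself.
lda.fit = predict(lda(default ~ balance + student, data=Default))
# Rows are predicted classes, columns are true classes.
table(lda.fit$class, Default$default)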
  1. The error rate among people who do not default (false positive rate) is very low.

  2. However, the false negative rate (the error rate among people who do default) is 76%.

  3. It is possible that false negatives are a bigger source of concern!

  4. One possible solution: lower the posterior probability threshold for predicting Yes below the default of 0.5.
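
The two error rates discussed above can be read off a pair of label vectors directly. The sketch below uses a hypothetical helper, `error_rates`, and hypothetical toy labels; it assumes a binary problem in which "Yes" is the positive class.

```r
# Hypothetical helper: error rates for a binary problem.
error_rates <- function(predicted, actual, positive = "Yes") {
  is_pos   <- actual == positive
  pred_pos <- predicted == positive
  c(
    false_positive_rate = mean(pred_pos[!is_pos]),  # errors among true negatives
    false_negative_rate = mean(!pred_pos[is_pos]),  # errors among true positives
    overall_error       = mean(pred_pos != is_pos)  # 0-1 loss
  )
}

# Toy example (hypothetical labels, not the Default data).
actual    <- c(rep("No", 8), rep("Yes", 4))
predicted <- c(rep("No", 7), "Yes", "No", "No", "No", "Yes")
rates <- error_rates(predicted, actual)
print(rates)
```

With the objects from the Default example, the analogous call would be `error_rates(lda.fit$class, Default$default)`.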

Changing decision rule#

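# Start with everyone classified as "No", then switch to "Yes"
# wherever the posterior probability of default exceeds 0.2.
new.class = rep("No", length(Default$default))
new.class[lda.fit$posterior[,"Yes"] > 0.2] = "Yes"
table(new.class, Default$default)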
  • Predict Yes if \(P(\mathtt{default}=\text{yes} | X) > \color{Red}{0.2}\).

  • Lowering the threshold from 0.5 to 0.2 makes it easier to classify an observation as Yes.

  • Note that the rate of false positives became higher! That is the price to pay for fewer false negatives.

Let’s visualize the dependence of the error on the threshold:

Fig. 21 Error rates for LDA classifier on Default dataset.#

Dashed line: false negative rate (error for defaulting customers). Dotted line: false positive rate (error for non-defaulting customers). Solid line: overall error rate.
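
Curves of this kind can be computed by sweeping the threshold over a grid and recomputing the three error rates each time. The sketch below is only illustrative: `score` is a simulated stand-in for posterior probabilities, not the LDA fit on `Default`, and all names are hypothetical.

```r
set.seed(1)

# Simulated stand-in for posterior probabilities (hypothetical data).
n      <- 2000
actual <- rbinom(n, 1, 0.1) == 1                          # TRUE marks the "Yes" class
score  <- plogis(rnorm(n, mean = ifelse(actual, 0, -3)))  # values in (0, 1)

# Recompute the error rates for each threshold.
thresholds <- seq(0, 0.5, by = 0.01)
rates <- t(sapply(thresholds, function(th) {
  pred <- score > th
  c(fnr     = mean(!pred[actual]),   # error among true "Yes"
    fpr     = mean(pred[!actual]),   # error among true "No"
    overall = mean(pred != actual))  # 0-1 loss
}))

# One line per error rate: dashed FNR, dotted FPR, solid overall.
matplot(thresholds, rates, type = "l", lty = c(2, 3, 1), col = 1,
        xlab = "Threshold", ylab = "Error rate")
legend("top", lty = c(2, 3, 1),
       legend = c("False negative rate", "False positive rate", "Overall error"))
```

For the Default example one would pass `lda.fit$posterior[, "Yes"]` as the score and `Default$default == "Yes"` as the true labels.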

The ROC curve#

Fig. 22 ROC curve for LDA classifier on Default dataset.#

  • Plots the true positive rate against the false positive rate as the threshold varies, so it displays the performance of the method for any choice of threshold.

  • The area under the curve (AUC) measures the quality of the classifier:

    1. 0.5 is the AUC for a random classifier

    2. The closer the AUC is to 1, the better.
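
As a rough illustration of the two AUC reference points, here is a base R sketch. The helpers `roc_curve` and `auc` are hypothetical, and the scores are simulated; the AUC is computed with the rank (Mann-Whitney) formula, which is one standard way to obtain it.

```r
# Hypothetical helper: ROC points obtained by thresholding at each observed score.
roc_curve <- function(score, is_pos) {
  ths <- c(Inf, sort(unique(score), decreasing = TRUE))
  tpr <- sapply(ths, function(th) mean(score[is_pos] >= th))
  fpr <- sapply(ths, function(th) mean(score[!is_pos] >= th))
  data.frame(threshold = ths, fpr = fpr, tpr = tpr)
}

# Hypothetical helper: AUC via the rank formula; ties receive average ranks.
auc <- function(score, is_pos) {
  r  <- rank(score)
  n1 <- sum(is_pos)
  n0 <- sum(!is_pos)
  (sum(r[is_pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
is_pos       <- rep(c(TRUE, FALSE), c(200, 800))
good_score   <- rnorm(1000, mean = ifelse(is_pos, 2, 0))  # informative scores
random_score <- rnorm(1000)                               # unrelated to the class

roc <- roc_curve(good_score, is_pos)
plot(roc$fpr, roc$tpr, type = "s",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 3)  # diagonal: expected ROC of a random classifier

auc_good   <- auc(good_score, is_pos)
auc_random <- auc(random_score, is_pos)
```

The informative scores should give an AUC well above 0.5, while the random scores should land close to 0.5.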

Comparing classification methods through simulation#

  • Simulate data from several different known distributions with \(2\) predictors and a binary response variable.

  • Compare the test error (0-1 loss) for the following methods:

    1. KNN-1

    2. KNN-CV (“optimally tuned” KNN)

    3. Logistic regression

    4. Linear discriminant analysis (LDA)

    5. Quadratic discriminant analysis (QDA)
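
A single replicate of such a comparison might look like the sketch below. The data generator, sample sizes, class means, and the grid of `k` values are all assumptions made for illustration, not the settings used for the figures in these notes. KNN-CV is implemented here with leave-one-out cross-validation via `class::knn.cv`.

```r
library(MASS)   # lda, qda, mvrnorm
library(class)  # knn, knn.cv

set.seed(1)

# Hypothetical generator: two Gaussian classes, class 1 shifted by (1, 1).
simulate <- function(n) {
  y <- rbinom(n, 1, 0.5)
  X <- mvrnorm(n, mu = c(0, 0), Sigma = diag(2)) + cbind(y, y)
  data.frame(x1 = X[, 1], x2 = X[, 2], y = factor(y))
}
train <- simulate(200)
test  <- simulate(5000)
Xtr <- train[, c("x1", "x2")]
Xte <- test[, c("x1", "x2")]

# KNN-CV: pick k by leave-one-out cross-validation on the training set.
ks     <- 1:30
cv_err <- sapply(ks, function(k) mean(knn.cv(Xtr, train$y, k = k) != train$y))
k_best <- ks[which.min(cv_err)]

glm_fit <- glm(y ~ x1 + x2, data = train, family = binomial)
pred <- list(
  "KNN-1"    = knn(Xtr, Xte, train$y, k = 1),
  "KNN-CV"   = knn(Xtr, Xte, train$y, k = k_best),
  "Logistic" = as.integer(predict(glm_fit, test, type = "response") > 0.5),
  "LDA"      = predict(lda(y ~ x1 + x2, data = train), test)$class,
  "QDA"      = predict(qda(y ~ x1 + x2, data = train), test)$class
)

# Test error (0-1 loss) for each method.
test_error <- sapply(pred, function(p) mean(as.character(p) != as.character(test$y)))
print(round(test_error, 3))
```

A full study would repeat this over many simulated training sets and compare the distributions of test error rather than a single number per method.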

Scenario 1#

Fig. 23 Instance for simulation scenario #1.#

  • \(X_1,X_2\) normal with identical variance.

  • No correlation in either class.

Scenario 2#

Fig. 24 Instance for simulation scenario #2.#

  • \(X_1,X_2\) normal with identical variance.

  • Correlation is -0.5 in both classes.
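
Data with a prescribed within-class correlation can be drawn with `MASS::mvrnorm`. The sketch below uses a hypothetical helper, `simulate_classes`; the class means and sample size are assumptions, since the notes do not state them.

```r
library(MASS)  # mvrnorm

# Hypothetical generator: the class means `mu0` and `mu1` are assumptions.
simulate_classes <- function(n, rho0, rho1, mu0 = c(0, 0), mu1 = c(1, 1)) {
  cov_mat <- function(rho) matrix(c(1, rho, rho, 1), nrow = 2)  # unit variances
  X0 <- mvrnorm(n, mu = mu0, Sigma = cov_mat(rho0))
  X1 <- mvrnorm(n, mu = mu1, Sigma = cov_mat(rho1))
  data.frame(x1 = c(X0[, 1], X1[, 1]),
             x2 = c(X0[, 2], X1[, 2]),
             y  = factor(rep(c(0, 1), each = n)))
}

set.seed(1)
sim <- simulate_classes(5000, rho0 = -0.5, rho1 = -0.5)

# Within-class sample correlations; both should be near -0.5.
class_cor <- sapply(split(sim[, c("x1", "x2")], sim$y),
                    function(d) cor(d$x1, d$x2))
print(class_cor)
```

Passing different values for `rho0` and `rho1` gives classes with different covariance matrices.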

Scenario 3#

Fig. 25 Instance for simulation scenario #3.#

  • \(X_1, X_2\) follow a Student \(t\) distribution.

  • No correlation in either class.

Results for first 3 scenarios#

Fig. 26 Simulation results for linear scenarios #1-3.#

Scenario 4#

Fig. 27 Instance for simulation scenario #4.#

  • \(X_1, X_2\) normal with identical variance.

  • First class has correlation 0.5, second class has correlation -0.5.

Scenario 5#

  • \(X_1, X_2\) normal with identical variance.

  • Response \(Y\) was sampled from:

\[P(Y=1 \mid X) = \frac{e^{\beta_0+\beta_1 X_1^2+\beta_2X_2^2+\beta_3X_1X_2}}{1+e^{\beta_0+\beta_1X_1^2+\beta_2X_2^2+\beta_3X_1X_2}}.\]

  • The true decision boundary is quadratic, but this is not a QDA model. (Why?)
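
Sampling from this model amounts to computing the linear predictor in the quadratic terms and drawing \(Y\) as a Bernoulli variable. In the sketch below the coefficients `beta` are hypothetical, and the predictors are additionally assumed to be uncorrelated standard normals.

```r
set.seed(1)
n  <- 5000
x1 <- rnorm(n)  # assumed uncorrelated standard normals
x2 <- rnorm(n)

# Hypothetical coefficients (beta_0, beta_1, beta_2, beta_3).
beta <- c(-1, 1, 1, 0.5)

# Linear predictor in the quadratic terms, then the logistic transform.
eta <- beta[1] + beta[2] * x1^2 + beta[3] * x2^2 + beta[4] * x1 * x2
p   <- plogis(eta)  # equals exp(eta) / (1 + exp(eta))

# Draw the binary response with success probability p.
y <- rbinom(n, 1, p)
```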

Scenario 6#

  • \(X_1, X_2\) normal with identical variance.

  • Response \(Y\) was sampled from:

\[P(Y=1 \mid X) = \frac{e^{f_\text{nonlinear}(X_1,X_2)}}{1+e^{f_\text{nonlinear}(X_1,X_2)}}.\]

  • The true decision boundary is very rough.

Results for scenarios 4-6#

Fig. 28 Simulation results for nonlinear scenarios #4-6.#