Evaluating a classification method#

  • We have talked about the 0-1 loss:

\[\frac{1}{m}\sum_{i=1}^m \mathbf{1}(y_i \neq \hat y_i).\]
  • A classifier can make wrong predictions for some classes much more often than for others; the 0-1 loss tells you nothing about this.

  • A much more informative summary of the error is a confusion matrix:

Fig. 20 Confusion matrix for a 2-class problem (Table 4.6).#


Confusion matrix for Default example#

library(MASS) # where the `lda` function lives
library(ISLR) # where the `Default` data lives
# Fit LDA and compute in-sample predicted classes and posterior probabilities
lda.fit = predict(lda(default ~ balance + student, data=Default))
# Rows: predicted class; columns: observed class
table(lda.fit$class, Default$default)
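
From these predictions we can recover both the overall 0-1 loss and the class-specific error rates discussed below. A minimal sketch using the objects defined above:

mean(lda.fit$class != Default$default)                  # overall 0-1 loss
mean(lda.fit$class[Default$default == "No"] == "Yes")   # error rate among non-defaulters (false positives)
mean(lda.fit$class[Default$default == "Yes"] == "No")   # error rate among defaulters (false negatives)
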
  1. The error rate among people who do not default (false positive rate) is very low.

  2. However, among customers who do default, the error rate (the false negative rate) is about 76%.

  3. It is possible that false negatives are a bigger source of concern!

  4. One possible solution: Change the threshold


Changing the decision rule#

# Reclassify using a lower threshold of 0.2 on the posterior probability of default
new.class = rep("No", length(Default$default))
new.class[lda.fit$posterior[,"Yes"] > 0.2] = "Yes"
table(new.class, Default$default)
  • We now predict Yes if \(P(\mathtt{default}=\text{Yes} \mid X) > \color{Red}{0.2}\).

  • Lowering the threshold to 0.2 makes it easier to classify an observation as Yes.

  • Note that the false positive rate becomes higher! That is the price to pay for fewer false negatives.


Let’s visualize the dependence of the error on the threshold:

Fig. 21 Error rates for LDA classifier on Default dataset (Figure 4.7).#

Dashed line: false negative rate (error for defaulting customers); dotted line: false positive rate (error for non-defaulting customers); solid line: overall error rate.
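
One way to produce a plot like this is to sweep a grid of thresholds and recompute the three error rates at each value. A minimal R sketch using the objects defined above (the grid of thresholds is an arbitrary choice):

thresholds = seq(0.01, 0.5, by = 0.01)
err = t(sapply(thresholds, function(thr) {
  pred = ifelse(lda.fit$posterior[, "Yes"] > thr, "Yes", "No")
  c(fnr = mean(pred[Default$default == "Yes"] == "No"),   # false negative rate
    fpr = mean(pred[Default$default == "No"] == "Yes"),   # false positive rate
    overall = mean(pred != Default$default))              # overall 0-1 loss
}))
matplot(thresholds, err, type = "l", lty = c(2, 3, 1), col = 1,
        xlab = "Threshold", ylab = "Error rate")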


The ROC curve#

Fig. 22 ROC curve for LDA classifier on Default dataset (Figure 4.8).#


  • The ROC (receiver operating characteristic) curve displays the true positive rate against the false positive rate for every choice of threshold (a small computation sketch follows this list).

  • The area under the curve (AUC) measures the quality of the classifier:

    1. 0.5 is the AUC for a random classifier

    2. The closer the AUC is to 1, the better.
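
As a sketch of how the curve and its AUC could be computed by hand from the LDA posterior probabilities (packages such as pROC or ROCR provide the same functionality):

p.yes = lda.fit$posterior[, "Yes"]
grid = sort(unique(c(0, p.yes, 1)), decreasing = TRUE)   # thresholds from 1 down to 0
tpr = sapply(grid, function(thr) mean(p.yes[Default$default == "Yes"] >= thr))
fpr = sapply(grid, function(thr) mean(p.yes[Default$default == "No"] >= thr))
plot(fpr, tpr, type = "l", xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 3)   # diagonal: a random classifier (AUC = 0.5)
sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)     # AUC by the trapezoid rule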


Comparing classification methods through simulation#

  • Simulate data from several different known distributions with \(2\) predictors and a binary response variable.

  • Compare the test error (0-1 loss) for the following methods:

    1. KNN-1

    2. KNN-CV (“optimally tuned” KNN)

    3. Logistic regression

    4. Linear discriminant analysis (LDA)

    5. Quadratic discriminant analysis (QDA)


Scenario 1#


Fig. 23 Instance for simulation scenario #1.#

  • \(X_1,X_2\) normal with identical variance.

  • No correlation in either class.
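
As a rough illustration, here is a minimal R sketch of one replicate of such a comparison for Scenario 1; the class means, sample sizes, and the use of class::knn for KNN-1 are assumptions for illustration, not the settings used to produce the figures, and KNN-CV (tuning \(k\) by cross-validation) is omitted for brevity.

library(MASS)   # lda, qda, mvrnorm
library(class)  # knn
set.seed(1)
gen = function(n) {
  # two uncorrelated normal predictors with identical variance; class means differ
  y = factor(rep(c(0, 1), each = n))
  x = rbind(mvrnorm(n, c(0, 0), diag(2)), mvrnorm(n, c(1, 1), diag(2)))
  data.frame(X1 = x[, 1], X2 = x[, 2], y = y)
}
train = gen(50)
test  = gen(500)
err = function(pred) mean(pred != test$y)  # test 0-1 loss
c(LDA      = err(predict(lda(y ~ X1 + X2, data = train), test)$class),
  QDA      = err(predict(qda(y ~ X1 + X2, data = train), test)$class),
  logistic = err(ifelse(predict(glm(y ~ X1 + X2, data = train, family = binomial),
                                test, type = "response") > 0.5, "1", "0")),
  KNN1     = err(knn(train[, c("X1", "X2")], test[, c("X1", "X2")], train$y, k = 1)))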


Scenario 2#


Fig. 24 Instance for simulation scenario #2.#

  • \(X_1,X_2\) normal with identical variance.

  • Correlation is -0.5 in both classes.
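
A correlated pair like this can be drawn with MASS::mvrnorm by supplying a covariance matrix with off-diagonal \(-0.5\) (unit variances are assumed for illustration):

Sigma = matrix(c(1, -0.5, -0.5, 1), 2, 2)   # correlation -0.5, unit variances
x = mvrnorm(100, mu = c(0, 0), Sigma = Sigma)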


Scenario 3#


Fig. 25 Instance for simulation scenario #3.#

  • \(X_1,X_2\) follow a Student’s \(t\)-distribution.

  • No correlation in either class.


Results for first 3 scenarios#

Fig. 26 Simulation results for linear scenarios #1-3 (Figure 4.10).#


Scenario 4#


Fig. 27 Instance for simulation scenario #4.#

  • \(X_1, X_2\) normal with identical variance.

  • First class has correlation 0.5, second class has correlation -0.5.


Scenario 5#

  • \(X_1, X_2\) normal with identical variance.

  • Response \(Y\) was sampled from:

\[P(Y=1 \mid X) = \frac{e^{\beta_0+\beta_1 X_1^2+\beta_2X_2^2+\beta_3X_1X_2}}{1+e^{\beta_0+\beta_1X_1^2+\beta_2X_2^2+\beta_3X_1X_2}}.\]

  • The true decision boundary is quadratic, but this is not a QDA model. (Why?) A sketch of sampling \(Y\) from such a model is given below.
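
For concreteness, a minimal R sketch of how such a response could be sampled; the coefficient values and the sample size are arbitrary assumptions for illustration.

set.seed(2)
n  = 200
X1 = rnorm(n); X2 = rnorm(n)
beta = c(0, 2, -2, 1)   # assumed values for beta0, beta1, beta2, beta3
eta  = beta[1] + beta[2] * X1^2 + beta[3] * X2^2 + beta[4] * X1 * X2
Y = rbinom(n, 1, plogis(eta))   # plogis(eta) = exp(eta) / (1 + exp(eta))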


Scenario 6#

  • \(X_1, X_2\) normal with identical variance.

  • Response \(Y\) was sampled from:

\[P(Y=1 \mid X) = \frac{e^{f_\text{nonlinear}(X_1,X_2)}}{1+e^{f_\text{nonlinear}(X_1,X_2)}}.\]

  • The true decision boundary is very rough.


Results for scenarios 4-6#

Fig. 28 Simulation results for nonlinear scenarios #4-6 (Figure 4.11).#