# Maximal margin classifier
Suppose we have a classification problem with response \(Y=-1\) or \(Y=1\).
If the classes can be separated at all, there will most likely be an infinite number of hyperplanes separating them.
Draw the largest possible empty margin around the hyperplane.
Out of all possible hyperplanes that separate the 2 classes, choose the one with the widest margin.
## Optimization problem
This can be written as an optimization problem:

$$
\max_{\beta_0,\beta_1,\dots,\beta_p,\,M}\; M
\quad\text{subject to}\quad
\sum_{j=1}^p \beta_j^2 = 1,
\qquad
y_i(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}) \ge M
\quad\text{for all } i=1,\dots,n.
$$

Because the coefficients are normalized to unit length, the quantity \(y_i(\beta_0 +\beta_1x_{i1}+\dots+\beta_p x_{ip})\) is the signed distance from \(x_i\) to the hyperplane, and \(M\) is simply the width of the margin in either direction.
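As a quick numerical sketch of what this optimization produces, we can approximate the maximal margin classifier with scikit-learn's `SVC` using a linear kernel and a very large `C` (the synthetic data, seed, and `C` value here are illustrative choices, not from the notes):

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters, labeled -1 and +1 (illustrative data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, size=(20, 2)),
               rng.normal(+2, 0.5, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# A very large C approximates the hard-margin (maximal margin) classifier
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
margin = 1 / np.linalg.norm(w)  # M = 1 / ||w||_2, the margin half-width
print("margin half-width:", margin)
```

Every training point then satisfies \(y_i(\beta_0 + \beta\cdot x_i) \ge M\), i.e. no point lands inside the empty margin.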
Ultimately, we will see that this problem can be transformed to something similar to logistic regression…
## Reformulation
We can reformulate the problem by defining a vector \(w=(w_1,\dots,w_p) = \beta/M\):

$$
\min_{\beta_0,\,w}\; \tfrac{1}{2}\|w\|_2^2
\quad\text{subject to}\quad
y_i(\beta_0 + w\cdot x_i) \ge 1
\quad\text{for all } i=1,\dots,n.
$$
This is a quadratic optimization problem. Having found \((\hat{\beta}_0, \hat{w})\), we can recover \(\hat{\beta}=\hat{w}/\|\hat{w}\|_2\) and \(M=1/\|\hat{w}\|_2\).
Introducing Lagrange multipliers \(\alpha_1,\dots,\alpha_n \ge 0\), which satisfy the Karush-Kuhn-Tucker (KKT) conditions, this is equivalent to:

$$
\min_{\beta_0,\,w}\;\max_{\alpha \ge 0}\; \tfrac{1}{2}\|w\|_2^2 - \sum_{i=1}^n \alpha_i\big[y_i(\beta_0 + w\cdot x_i) - 1\big].
$$
Setting the partial derivatives with respect to \(w\) and \(\beta_0\) to 0, we get:

$$
\hat w = \sum_{i=1}^n \alpha_i y_i x_i, \qquad \sum_{i=1}^n \alpha_i y_i = 0.
$$
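Spelling out the differentiation of the Lagrangian \(\tfrac12\|w\|_2^2 - \sum_i \alpha_i\big[y_i(\beta_0 + w\cdot x_i) - 1\big]\) term by term:

```latex
\frac{\partial}{\partial w}\left[\frac{1}{2}\|w\|_2^2
    - \sum_{i=1}^n \alpha_i\big(y_i(\beta_0 + w\cdot x_i) - 1\big)\right]
  = w - \sum_{i=1}^n \alpha_i y_i x_i = 0
  \;\Longrightarrow\; \hat w = \sum_{i=1}^n \alpha_i y_i x_i,

\frac{\partial}{\partial \beta_0}\left[\frac{1}{2}\|w\|_2^2
    - \sum_{i=1}^n \alpha_i\big(y_i(\beta_0 + w\cdot x_i) - 1\big)\right]
  = -\sum_{i=1}^n \alpha_i y_i = 0.
```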
Furthermore, the KKT complementary slackness condition yields \(\alpha_i>0\) only if \(y_i(\beta_0 +w\cdot x_i)=1\), that is, only if \(x_i\) falls exactly on the margin.
## Support vectors
The vectors that fall on the margin and determine the solution are called support vectors.
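These facts can be checked numerically. In scikit-learn's linear `SVC`, the attribute `dual_coef_` stores the products \(\alpha_i y_i\) for the support vectors, so the relations above become one-line checks (the synthetic data below is an illustrative choice):

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative separable data (not from the notes)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, size=(15, 2)),
               rng.normal(+2, 0.5, size=(15, 2))])
y = np.array([-1] * 15 + [1] * 15)

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]

# w-hat = sum_i alpha_i y_i x_i: since dual_coef_ holds alpha_i * y_i,
# one matrix product over the support vectors reconstructs w
w_rebuilt = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w, w_rebuilt))

# sum_i alpha_i y_i = 0, and every support vector sits on the margin:
# y_i (w . x_i + b) = 1
print(abs(clf.dual_coef_.sum()) < 1e-6)
print(np.allclose(y[clf.support_] * (clf.support_vectors_ @ w + b), 1, atol=1e-2))
```

Points with \(\alpha_i = 0\) contribute nothing to the sum, which is why only the support vectors determine the solution.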
## Summary
Substituting the solution \(\hat w = \sum_{i=1}^n \alpha_i y_i x_i\) and the constraint \(\sum_{i=1}^n \alpha_iy_i=0\) back into the Lagrangian yields the dual problem:

$$
\max_{\alpha}\; \sum_{i=1}^n \alpha_i - \tfrac{1}{2}\sum_{i=1}^n\sum_{j=1}^n \alpha_i \alpha_j y_i y_j\, x_i\cdot x_j
\quad\text{subject to}\quad
\alpha_i \ge 0,\qquad \sum_{i=1}^n \alpha_i y_i = 0.
$$

We’ve reduced the problem of finding \(w\), which describes the hyperplane and the size of the margin, to finding a set of coefficients \(\alpha_1,\dots,\alpha_n\) through this dual problem.
This only depends on the training sample inputs through the inner products \(x_i\cdot x_j\) for every pair \(i,j\).
Whatever \(p\) is, this is an optimization problem in \(n\) variables, so it remains manageable even when \(p \gg n\).
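As a sketch, the dual can be handed to a generic constrained solver; the data, dimensions, and solver choice below are illustrative assumptions. Note that even with \(p=100\) features, the solver only sees \(n=6\) variables and the \(n \times n\) matrix of inner products:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative separable data with p >> n: 6 points in 100 dimensions
rng = np.random.default_rng(0)
n, p = 6, 100
X = rng.normal(size=(n, p))
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
X[y == 1] += 2.0  # shift one class to guarantee separability

# The dual depends on the inputs only through y_i y_j (x_i . x_j)
G = (y[:, None] * X) @ (y[:, None] * X).T

# Maximize sum(alpha) - 1/2 alpha' G alpha  ==  minimize its negative
res = minimize(
    lambda a: 0.5 * a @ G @ a - a.sum(),
    x0=np.zeros(n),
    bounds=[(0.0, None)] * n,                            # alpha_i >= 0
    constraints={"type": "eq", "fun": lambda a: a @ y},  # sum_i alpha_i y_i = 0
)
alpha = res.x

# Recover the hyperplane: w = sum_i alpha_i y_i x_i, and b from a support vector
w = (alpha * y) @ X
sv = np.argmax(alpha)
b = y[sv] - X[sv] @ w
print("smallest margin value:", (y * (X @ w + b)).min())
```

At the optimum every point satisfies \(y_i(w\cdot x_i + b) \ge 1\), with equality on the support vectors, up to solver tolerance.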