Local linear regression#

Fig 7.9
  • Sample points nearer \(x\) are weighted more heavily in the corresponding regression.


Algorithm#

To predict the regression function \(f\) at an input \(x\):

  • Assign a weight \(K_i(x)\) to the training point \(x_i\), such that:

    • \(K_i(x)=0\) unless \(x_i\) is one of the \(k\) nearest neighbors of \(x\) (this hard cutoff is not strictly necessary).

    • \(K_i(x)\) decreases as the distance \(d(x,x_i)\) increases.

  • Perform a weighted least squares regression; i.e. find \((\beta_0,\beta_1)\) which minimize

\[\hat{\beta}(x) = \text{argmin}_{(\beta_0, \beta_1)} \sum_{i=1}^n K_i(x) (y_i - \beta_0 - \beta_1 x_i)^2.\]
  • Predict \(\hat f(x) = \hat \beta_0(x) + \hat \beta_1(x) x\).
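
A minimal sketch of this procedure for scalar inputs (the helper names `knn_weights` and `local_linear_predict` are illustrative, not from any library):

```python
import numpy as np

def knn_weights(x, x_train, k):
    """Indicator weights: K_i(x) = 1 for the k nearest neighbors of x, else 0."""
    dist = np.abs(x_train - x)
    weights = np.zeros(len(x_train))
    weights[np.argsort(dist)[:k]] = 1.0
    return weights

def local_linear_predict(x, x_train, y_train, weights):
    """Solve the weighted least squares problem and return beta0 + beta1 * x."""
    X = np.column_stack([np.ones_like(x_train), x_train])  # rows (1, x_i)
    W = np.diag(weights)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_train)  # weighted normal equations
    return beta[0] + beta[1] * x

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 100))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, 100)
yhat = local_linear_predict(0.5, x_train, y_train, knn_weights(0.5, x_train, k=20))
```

Here the weighted normal equations \((X^\top W X)\beta = X^\top W y\) are solved directly; any kernel producing the weights \(K_i(x)\) can be substituted for the nearest-neighbor indicators.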


Generalized nearest neighbors#

  • Set \(K_i(x)=1\) if \(x_i\) is one of \(x\)’s \(k\) nearest neighbors.

  • Perform a regression with only an intercept; i.e. find \(\beta_0\) which minimizes

\[\hat{\beta}_0(x) = \text{argmin}_{\beta_0} \sum_{i=1}^n K_i(x)(y_i - \beta_0)^2.\]
  • Predict \(\hat f(x) = \hat \beta_0(x)\).
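
The intercept-only objective has a closed-form minimizer, the weighted mean \(\hat\beta_0(x) = \sum_i K_i(x)\, y_i / \sum_i K_i(x)\); with indicator weights this is just the average of \(y_i\) over the \(k\) nearest neighbors. A sketch (the name `knn_regression` is illustrative):

```python
import numpy as np

def knn_regression(x, x_train, y_train, k):
    """Intercept-only local fit: average y over the k nearest neighbors of x."""
    nearest = np.argsort(np.abs(x_train - x))[:k]  # indices where K_i(x) = 1
    return y_train[nearest].mean()
```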

Gaussian (radial basis function) kernel#

  • A common choice that gives smoother fits than nearest neighbors, since the weights decay continuously with distance rather than dropping to zero:

\[K_i(x) = \exp\left(-\frac{\|x-x_i\|^2}{2\lambda}\right)\]
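
A sketch of these weights, which can replace the nearest-neighbor indicators in the weighted least squares step above (`gaussian_weights` is an illustrative name; \(\lambda\) plays the role of a bandwidth):

```python
import numpy as np

def gaussian_weights(x, x_train, lam):
    """K_i(x) = exp(-||x - x_i||^2 / (2 * lam)); lam sets the effective bandwidth."""
    return np.exp(-(x_train - x) ** 2 / (2 * lam))
```

Every training point now receives a nonzero weight, so the fitted \(\hat f\) varies smoothly in \(x\).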

Choosing the span#

Fig 7.10
  • The span \(k/n\) is chosen by cross-validation.
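
A minimal leave-one-out cross-validation sketch for selecting \(k\) (and hence the span \(k/n\)), assuming the `knn_regression` helper and the `x_train`, `y_train` data from the sketches above:

```python
import numpy as np

def loo_cv_error(x_train, y_train, k):
    """Mean leave-one-out squared error of k-nearest-neighbor regression."""
    n = len(x_train)
    errors = []
    for i in range(n):
        mask = np.arange(n) != i  # hold out point i
        pred = knn_regression(x_train[i], x_train[mask], y_train[mask], k)
        errors.append((y_train[i] - pred) ** 2)
    return np.mean(errors)

# Pick the k with the smallest estimated prediction error over a grid.
best_k = min(range(2, 31), key=lambda k: loo_cv_error(x_train, y_train, k))
```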