Boosting#

  • Another ensemble method (i.e. it combines a collection of learners)

  • Instead of randomizing each learner, each learner is fit to the current residuals (not unlike backfitting)


Boosting regression trees#

  1. Set \(\hat f(x) = 0\), and \(r_i=y_i\) for \(i=1,\dots,n\).

  2. For \(b=1,\dots,B\), iterate:

    1. Fit a regression tree \(\hat f^b\) with \(d\) splits, using the residuals \(r_1,\dots,r_n\) as the response.

    2. Update the prediction to:

    \[\hat f(x) \leftarrow \hat f(x) + \lambda \hat f^b(x).\]
    3. Update the residuals:

    \[ r_i \leftarrow r_i - \lambda \hat f^b(x_i).\]
  3. Output the boosted model (a code sketch follows the algorithm):

\[\hat f(x) = \sum_{b=1}^B \lambda \hat f^b(x).\]
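A minimal sketch of this procedure, assuming numpy arrays `X`, `y` and scikit-learn's `DecisionTreeRegressor` as the base learner (here `max_leaf_nodes = d + 1` stands in for "\(d\) splits"):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_regression_trees(X, y, B=1000, d=1, lam=0.01):
    """Boosted regression trees, following the algorithm above."""
    f_hat = np.zeros(len(y))      # step 1: f_hat(x) = 0
    r = y.astype(float)           # step 1: r_i = y_i
    trees = []
    for b in range(B):            # step 2
        # 2.1 fit a tree with d splits (d + 1 terminal nodes) to the residuals
        tree = DecisionTreeRegressor(max_leaf_nodes=d + 1).fit(X, r)
        # 2.2 update the prediction; 2.3 update the residuals
        step = lam * tree.predict(X)
        f_hat += step
        r -= step
        trees.append(tree)
    return trees

def boosted_predict(trees, X, lam=0.01):
    """Step 3: f_hat(x) = sum_b lam * f_hat^b(x)."""
    return lam * sum(tree.predict(X) for tree in trees)
```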

Boosting classification trees#

  • Can be done with an appropriately defined residual for classification, based on an offset in the log-odds (sketched below).
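
A simplified sketch of that idea for binary labels \(y_i \in \{0,1\}\): keep a running offset \(F(x)\) on the log-odds scale, fit each tree to the residual \(y_i - p_i\), where \(p_i\) is the current predicted probability, and shrink by \(\lambda\). Real implementations (e.g. gradient boosting with logistic loss) refine the leaf values, but the structure is the same.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boost_classification_trees(X, y, B=1000, d=1, lam=0.01):
    """Boosted trees for binary y in {0, 1}, accumulating an offset in the log-odds."""
    F = np.zeros(len(y))                  # current log-odds offset
    trees = []
    for _ in range(B):
        p = sigmoid(F)                    # current predicted probabilities
        r = y - p                         # classification "residual"
        tree = DecisionTreeRegressor(max_leaf_nodes=d + 1).fit(X, r)
        F += lam * tree.predict(X)        # shift the log-odds
        trees.append(tree)
    return trees
```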

Some intuition#

  • Boosting learns slowly

  • We first fit the samples that are easiest to predict, then slowly down-weight these cases and move on to harder samples.


Fig. 8.11 (ISLR)

  • The shrinkage parameter is \(\lambda=0.01\) in each case.

  • We can tune the model by cross-validation over \(\lambda\), \(d\), and \(B\) (see the sketch below).
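
For example, with scikit-learn (assuming training data `X_train`, `y_train` are available; the parameter names map as `learning_rate` \(=\lambda\), `max_depth` \(=d\), `n_estimators` \(=B\)):

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "learning_rate": [0.001, 0.01, 0.1],  # lambda
    "max_depth": [1, 2, 4],               # d
    "n_estimators": [100, 500, 1000],     # B
}
search = GridSearchCV(GradientBoostingRegressor(), param_grid,
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)              # X_train, y_train: your training data
print(search.best_params_, search.best_score_)
```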