Boosting#

  • Another ensemble method (i.e. it combines a collection of learners)

  • Instead of randomizing each learner, each learner is fit to the current residuals (not unlike backfitting)


Boosting regression trees#

  1. Set \(\hat f(x) = 0\), and \(r_i=y_i\) for \(i=1,\dots,n\).

  2. For \(b=1,\dots,B\), iterate:

    1. Fit a regression tree \(\hat f^b\) with \(d\) splits, using the residuals \(r_1,\dots,r_n\) as the response.

    2. Update the prediction to:

    \[\hat f(x) \leftarrow \hat f(x) + \lambda \hat f^b(x).\]
    3. Update the residuals:

    \[ r_i \leftarrow r_i - \lambda \hat f^b(x_i).\]
  3. Output the boosted model (a code sketch follows the algorithm):

\[\hat f(x) = \sum_{b=1}^B \lambda \hat f^b(x).\]
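A minimal sketch of this procedure, assuming numpy arrays `X`, `y` and scikit-learn's `DecisionTreeRegressor` as the base learner (here `max_leaf_nodes = d + 1` stands in for "\(d\) splits"):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_regression_trees(X, y, B=1000, d=1, lam=0.01):
    """Boosted regression trees, following the algorithm above."""
    f_hat = np.zeros(len(y))      # step 1: f_hat(x) = 0
    r = y.astype(float)           # step 1: r_i = y_i
    trees = []
    for b in range(B):            # step 2
        # 2.1 fit a tree with d splits (d + 1 terminal nodes) to the residuals
        tree = DecisionTreeRegressor(max_leaf_nodes=d + 1).fit(X, r)
        # 2.2 update the prediction; 2.3 update the residuals
        step = lam * tree.predict(X)
        f_hat += step
        r -= step
        trees.append(tree)
    return trees

def boosted_predict(trees, X, lam=0.01):
    """Step 3: f_hat(x) = sum_b lam * f_hat^b(x)."""
    return lam * sum(tree.predict(X) for tree in trees)
```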

Boosting classification trees#

  • Can be done with an appropriately defined residual for classification, based on an offset in the log-odds (sketched below).
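
A simplified sketch of that idea for binary labels \(y_i \in \{0,1\}\): keep a running offset \(F(x)\) on the log-odds scale, fit each tree to the residual \(y_i - p_i\), where \(p_i\) is the current predicted probability, and shrink by \(\lambda\). Real implementations (e.g. gradient boosting with logistic loss) refine the leaf values, but the structure is the same.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boost_classification_trees(X, y, B=1000, d=1, lam=0.01):
    """Boosted trees for binary y in {0, 1}, accumulating an offset in the log-odds."""
    F = np.zeros(len(y))                  # current log-odds offset
    trees = []
    for _ in range(B):
        p = sigmoid(F)                    # current predicted probabilities
        r = y - p                         # classification "residual"
        tree = DecisionTreeRegressor(max_leaf_nodes=d + 1).fit(X, r)
        F += lam * tree.predict(X)        # shift the log-odds
        trees.append(tree)
    return trees
```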

Some intuition#

  • Boosting learns slowly

  • We first fit the samples that are easiest to predict, then slowly down-weight these cases and move on to harder samples.


Fig. 8.11 (ISLR)

  • The shrinkage parameter is \(\lambda=0.01\) in each case.

  • We can tune the model by cross-validation over \(\lambda\), \(d\), and \(B\) (see the sketch below).
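
For example, with scikit-learn (assuming training data `X_train`, `y_train` are available; the parameter names map as `learning_rate` \(=\lambda\), `max_depth` \(=d\), `n_estimators` \(=B\)):

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "learning_rate": [0.001, 0.01, 0.1],  # lambda
    "max_depth": [1, 2, 4],               # d
    "n_estimators": [100, 500, 1000],     # B
}
search = GridSearchCV(GradientBoostingRegressor(), param_grid,
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)              # X_train, y_train: your training data
print(search.best_params_, search.best_score_)
```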