Some details#

How do we deal with categorical predictors?#

  • If there are only 2 categories, the split is obvious: there is only one way to partition them, so we don’t have to choose a splitting point \(s\) as we would for a numerical variable.

  • If there are more than 2 categories:

    • Order the categories according to the average of the response in each category, e.g. \(\mathtt{ChestPain:a} > \mathtt{ChestPain:c} > \mathtt{ChestPain:b}\).

    • Treat the predictor as a numerical variable with this ordering, and choose a splitting point \(s\) as usual.

  • One can show that this ordering yields the optimal binary partition of the categories (for squared-error regression and two-class classification); see the sketch below.
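
A minimal sketch of this procedure in Python, under the squared-error (RSS) criterion; the helper `best_categorical_split` is illustrative, not any particular library’s API:

```python
import numpy as np
import pandas as pd

def best_categorical_split(x, y):
    """Hypothetical helper: best binary split of categorical predictor x
    for numerical response y, under the squared-error (RSS) criterion."""
    x, y = pd.Series(x), np.asarray(y, dtype=float)
    # Step 1: order the categories by their average response.
    order = list(pd.Series(y).groupby(x).mean().sort_values().index)
    # Step 2: replace each category by its rank in that ordering.
    rank = x.map({c: i for i, c in enumerate(order)}).to_numpy()
    # Step 3: scan splitting points s as for a numerical variable.
    best_rss, best_left = np.inf, None
    for s in range(1, len(order)):
        left, right = y[rank < s], y[rank >= s]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_rss, best_left = rss, set(order[:s])
    return best_left, best_rss  # categories sent to the left child, and the RSS

x = ["a", "b", "a", "c", "b", "c"]
y = [5.0, 1.0, 6.0, 3.0, 0.5, 2.5]
print(best_categorical_split(x, y))  # -> ({'b', 'c'}, 4.75)
```

The point of the ordering is efficiency: with \(q\) categories we only scan \(q-1\) splitting points instead of all \(2^{q-1}-1\) possible partitions.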


How do we deal with missing data?#

  • The goal is to be able to assign every sample to a leaf \(R_i\) despite the missing data.

  • When choosing a new split on variable \(X_j\) (while growing the tree):

    • Only consider the samples for which the variable \(X_j\) is observed.

    • In addition to choosing the best split, record a second-best split using a different variable, then a third-best, and so on.

  • To propagate a sample down the tree, if the variable needed to make a decision is missing, try the second-best decision, then the third-best, and so on. This is called surrogate splitting (see the sketch after this list).
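
A minimal sketch of the prediction step with surrogate splits; the `Node` class and its fields are made up for illustration (CART-style implementations such as R’s rpart realize this idea in full):

```python
import numpy as np

class Node:
    """Hypothetical tree node: internal nodes store the primary split
    plus surrogate splits on other variables, best first."""
    def __init__(self, splits=None, left=None, right=None, prediction=None):
        self.splits = splits or []    # [(feature_index, threshold), ...]
        self.left, self.right = left, right
        self.prediction = prediction  # set only at leaves

def predict_one(node, x):
    """Propagate one sample x (NaN = missing) down to a leaf."""
    while node.splits:                     # stop at a leaf
        for j, s in node.splits:           # primary split, then surrogates
            if not np.isnan(x[j]):         # first split whose variable is observed
                node = node.left if x[j] < s else node.right
                break
        else:                              # all split variables missing:
            node = node.left               # fall back to one branch (e.g. left)
    return node.prediction

# The primary split uses X0; the surrogate on X1 takes over when X0 is missing.
leaf_lo, leaf_hi = Node(prediction=0.0), Node(prediction=1.0)
root = Node(splits=[(0, 2.5), (1, 10.0)], left=leaf_lo, right=leaf_hi)
print(predict_one(root, np.array([np.nan, 12.0])))  # -> 1.0 via the surrogate
```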


Some advantages of trees#

  • Very easy to interpret!

  • Closer to human decision-making.

  • Easy to visualize graphically, at least for shallow trees (see the example after this list).

  • They easily handle qualitative predictors and missing data.
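
For instance, a shallow tree fit with scikit-learn can be drawn in a couple of lines (the dataset and depth here are chosen arbitrarily for illustration):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)
plot_tree(clf, feature_names=iris.feature_names, filled=True)  # shallow -> readable
plt.show()
```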

Downside: their predictive accuracy is often not competitive with other regression and classification methods!