Some details#

How do we deal with categorical predictors?#

  • If there are only 2 categories, the split is obvious: there is only one way to partition them, so we don’t have to choose a splitting point \(s\) as we would for a numerical variable.

  • If there are more than 2 categories:

    • Order the categories according to the average of the response in each category, e.g. \(\mathtt{ChestPain:a} > \mathtt{ChestPain:c} > \mathtt{ChestPain:b}\).

    • Treat the predictor as a numerical variable with this ordering, and choose a splitting point \(s\) as usual.

  • One can show that this ordering yields the optimal binary partition of the categories (for squared-error regression and two-class classification); see the sketch below.
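
A minimal sketch of this procedure in Python, under the squared-error (RSS) criterion; the helper `best_categorical_split` is illustrative, not any particular library’s API:

```python
import numpy as np
import pandas as pd

def best_categorical_split(x, y):
    """Hypothetical helper: best binary split of categorical predictor x
    for numerical response y, under the squared-error (RSS) criterion."""
    x, y = pd.Series(x), np.asarray(y, dtype=float)
    # Step 1: order the categories by their average response.
    order = list(pd.Series(y).groupby(x).mean().sort_values().index)
    # Step 2: replace each category by its rank in that ordering.
    rank = x.map({c: i for i, c in enumerate(order)}).to_numpy()
    # Step 3: scan splitting points s as for a numerical variable.
    best_rss, best_left = np.inf, None
    for s in range(1, len(order)):
        left, right = y[rank < s], y[rank >= s]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_rss, best_left = rss, set(order[:s])
    return best_left, best_rss  # categories sent to the left child, and the RSS

x = ["a", "b", "a", "c", "b", "c"]
y = [5.0, 1.0, 6.0, 3.0, 0.5, 2.5]
print(best_categorical_split(x, y))  # -> ({'b', 'c'}, 4.75)
```

The point of the ordering is efficiency: with \(q\) categories we only scan \(q-1\) splitting points instead of all \(2^{q-1}-1\) possible partitions.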


How do we deal with missing data?#

  • The goal is to be able to assign every sample to a leaf \(R_i\) despite the missing data.

  • When choosing a new split on variable \(X_j\) (while growing the tree):

    • Only consider the samples for which the variable \(X_j\) is observed.

    • In addition to choosing the best split, record a second-best split using a different variable, then a third-best, and so on.

  • To propagate a sample down the tree, if the variable needed to make a decision is missing, try the second-best decision, then the third-best, and so on. This is called surrogate splitting (see the sketch after this list).
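
A minimal sketch of the prediction step with surrogate splits; the `Node` class and its fields are made up for illustration (CART-style implementations such as R’s rpart realize this idea in full):

```python
import numpy as np

class Node:
    """Hypothetical tree node: internal nodes store the primary split
    plus surrogate splits on other variables, best first."""
    def __init__(self, splits=None, left=None, right=None, prediction=None):
        self.splits = splits or []    # [(feature_index, threshold), ...]
        self.left, self.right = left, right
        self.prediction = prediction  # set only at leaves

def predict_one(node, x):
    """Propagate one sample x (NaN = missing) down to a leaf."""
    while node.splits:                     # stop at a leaf
        for j, s in node.splits:           # primary split, then surrogates
            if not np.isnan(x[j]):         # first split whose variable is observed
                node = node.left if x[j] < s else node.right
                break
        else:                              # all split variables missing:
            node = node.left               # fall back to one branch (e.g. left)
    return node.prediction

# The primary split uses X0; the surrogate on X1 takes over when X0 is missing.
leaf_lo, leaf_hi = Node(prediction=0.0), Node(prediction=1.0)
root = Node(splits=[(0, 2.5), (1, 10.0)], left=leaf_lo, right=leaf_hi)
print(predict_one(root, np.array([np.nan, 12.0])))  # -> 1.0 via the surrogate
```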


Some advantages of trees#

  • Very easy to interpret!

  • Closer to human decision-making.

  • Easy to visualize graphically, at least for shallow trees (see the example after this list).

  • They easily handle qualitative predictors and missing data.
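
For instance, a shallow tree fit with scikit-learn can be drawn in a couple of lines (the dataset and depth here are chosen arbitrarily for illustration):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)
plot_tree(clf, feature_names=iris.feature_names, filled=True)  # shallow -> readable
plt.show()
```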

Downside: their predictive accuracy is often not competitive with other regression and classification methods!