{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Some details\n",
    "\n",
    "### How do we deal with categorical predictors?\n",
    "\n",
    "- If there are only 2 categories, the split is obvious: one category goes to each side. Unlike for a numerical variable, there is no splitting point $s$ to choose.\n",
    "- If there are more than 2 categories:\n",
    "    - Order the categories by the average response within each category, for example:\n",
    "    $$\\mathtt{ChestPain:a} > \\mathtt{ChestPain:c} > \\mathtt{ChestPain:b}$$\n",
    "    - Treat the predictor as a numerical variable with this ordering, and choose a splitting point $s$.\n",
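    "- One can show that, for a quantitative response with squared-error loss or for a binary response, this gives the optimal split among all ways of dividing the categories into two groups.\n",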
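    "\n",
    "The ordering trick can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `best_categorical_split`, the toy data, and the use of the residual sum of squares (RSS) as the split criterion are all made up for this example.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "\n",
    "def best_categorical_split(categories, y):\n",
    "    # Group the responses by category.\n",
    "    groups = defaultdict(list)\n",
    "    for c, v in zip(categories, y):\n",
    "        groups[c].append(v)\n",
    "    # Order the categories by the average response within each one.\n",
    "    ordered = sorted(groups, key=lambda c: sum(groups[c]) / len(groups[c]))\n",
    "\n",
    "    def rss(values):\n",
    "        # Residual sum of squares around the mean of the values.\n",
    "        m = sum(values) / len(values)\n",
    "        return sum((v - m) ** 2 for v in values)\n",
    "\n",
    "    best = None\n",
    "    # A cut at position k sends the first k ordered categories to the left.\n",
    "    for k in range(1, len(ordered)):\n",
    "        left = [v for c in ordered[:k] for v in groups[c]]\n",
    "        right = [v for c in ordered[k:] for v in groups[c]]\n",
    "        total = rss(left) + rss(right)\n",
    "        if best is None or total < best[0]:\n",
    "            best = (total, set(ordered[:k]), set(ordered[k:]))\n",
    "    return ordered, best\n",
    "\n",
    "# Toy data (made up): three levels of a ChestPain-like predictor.\n",
    "levels = ['a', 'a', 'b', 'b', 'c', 'c']\n",
    "response = [9.0, 11.0, 1.0, 3.0, 8.0, 10.0]\n",
    "ordered, (total_rss, left, right) = best_categorical_split(levels, response)\n",
    "print(ordered, left, right, total_rss)\n",
    "```\n",
    "\n",
    "With $q$ categories, only the $q - 1$ cuts along the ordering are scanned, rather than every way of dividing the categories into two groups.\n",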
    "\n",
    "---\n",
    "\n",
    "### How do we deal with missing data?\n",
    "\n",
    "- Suppose we can assign every sample to a leaf $R_i$ despite the missing data.\n",
    "- When choosing a new split with variable $X_j$ (growing the tree):\n",
    "    - Only consider the samples for which $X_j$ is observed.\n",
    "    - In addition to choosing the best split, choose backup splits on different variables that mimic it as closely as possible: a second best, a third best, ...\n",
    "- To propagate a sample down the tree when it is missing the variable needed for a decision,\n",
    "use the second best split, or the third best, and so on. This is called **surrogate splitting**.\n",
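    "\n",
    "A minimal sketch of how a sample might be routed at one node when some variables are missing. The function `route`, the variable names, the thresholds, and the fallback direction are hypothetical, and the sketch simplifies by assuming every split sends values below its threshold to the left.\n",
    "\n",
    "```python\n",
    "def route(sample, splits, default='left'):\n",
    "    # splits: the best split first, then the backups in order of preference.\n",
    "    # Each split is (variable, threshold): go left when value < threshold.\n",
    "    for variable, threshold in splits:\n",
    "        value = sample.get(variable)\n",
    "        if value is not None:\n",
    "            # Use the first split whose variable is observed.\n",
    "            return 'left' if value < threshold else 'right'\n",
    "    # Assumed fallback when every split variable is missing.\n",
    "    return default\n",
    "\n",
    "# Hypothetical node: best split on Age, backups on MaxHR and Chol.\n",
    "node_splits = [('Age', 55), ('MaxHR', 150), ('Chol', 240)]\n",
    "\n",
    "print(route({'Age': 60, 'MaxHR': 120}, node_splits))    # decided by Age\n",
    "print(route({'Age': None, 'MaxHR': 120}, node_splits))  # decided by MaxHR\n",
    "print(route({}, node_splits))                           # uses the fallback\n",
    "```\n",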
    "\n",
    "---\n",
    "\n",
    "### Some advantages of trees\n",
    "\n",
    "- Very easy to interpret!\n",
    "- Arguably closer to human decision-making than other regression and classification methods.\n",
    "- Easy to visualize graphically (at least when the tree is shallow).\n",
    "- They easily handle qualitative predictors and missing data.\n",
    "\n",
    "<font color=\"red\">Downside: a single tree often predicts less accurately than other methods!</font>"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "jupytext": {
   "cell_metadata_filter": "all,-slideshow",
   "formats": "ipynb,Rmd,md:myst"
  },
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
