Bootstrap#

  • Another resampling technique often seen in practice.

Cross-validation vs. the Bootstrap#

  • Cross-validation: provides estimates of the (test) error

  • The Bootstrap: provides the (standard) error of estimates


Bootstrap#

Prof. Brad Efron
  • One of the most important techniques in all of Statistics.

  • Computer intensive method.

  • Popularized by Brad Efron \(\leftarrow\) Stanford pride!


Standard errors in linear regression from a sample of size \(n\)#


Classical way to compute Standard Errors#

  • Example: Estimate the variance of a sample \(x_1,x_2,\dots,x_n\):

  • Unbiased estimate of \(\sigma^2\):

\[\hat \sigma^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\overline x)^2.\]

  • What is the Standard Error of \(\hat \sigma^2\)?

    • Assume that \(x_1,\dots,x_n\) are normally distributed with common mean \(\mu\) and variance \(\sigma^2\).

    • Then \((n-1)\hat \sigma^2/\sigma^2\) has a \(\chi^2\) distribution with \(n-1\) degrees of freedom.

    • For large \(n\), \(\hat{\sigma}^2\) is approximately normally distributed around \(\sigma^2\).

    • The SD of this sampling distribution is the Standard Error.
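Under the normality assumption above, a \(\chi^2\) variable with \(n-1\) degrees of freedom has variance \(2(n-1)\), so the Standard Error of \(\hat\sigma^2\) is \(\sigma^2\sqrt{2/(n-1)}\). The sketch below plugs \(\hat\sigma^2\) into that formula and compares it with a Monte Carlo estimate. It assumes NumPy; the function name `classical_se_of_variance` and the values of `n` and `sigma` are illustrative choices, not part of the original notes.

```python
import numpy as np

def classical_se_of_variance(x):
    """Plug-in standard error of the unbiased sample variance.

    Valid only under the assumption that the data are normally distributed.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma2_hat = x.var(ddof=1)  # unbiased estimate: divides by n - 1
    # Under normality, Var(sigma2_hat) = 2 * sigma^4 / (n - 1)
    return sigma2_hat * np.sqrt(2.0 / (n - 1))

# Monte Carlo check with illustrative (assumed) settings
rng = np.random.default_rng(0)
n, sigma = 50, 2.0
draws = rng.normal(loc=1.0, scale=sigma, size=(20000, n))
mc_se = draws.var(axis=1, ddof=1).std(ddof=1)   # SD of sigma2_hat across samples
theory_se = sigma**2 * np.sqrt(2.0 / (n - 1))   # formula with the true sigma
print(mc_se, theory_se)
```

The two printed numbers should be close when the data really are normal; the formula gives no such guarantee for other distributions.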


Limitations of the classical approach#

  • This approach has served statisticians well for many years; however, what happens if:

    • The distributional assumption (for example, that \(x_1,\dots,x_n\) are normal) breaks down?

    • The estimator does not have a simple form and its sampling distribution cannot be derived analytically?

  • Bootstrap can handle these departures from the usual assumptions!


Example: Investing in two assets#

Fig 5.9
  • Suppose that \(X\) and \(Y\) are the returns of two assets.

  • These returns are observed every day: \((x_1,y_1),\dots,(x_n,y_n)\).


  • We have a fixed amount of money to invest and we will invest a fraction \(\alpha\) in \(X\) and a fraction \((1-\alpha)\) in \(Y\).

  • Therefore, our return will be

\[\alpha X + (1-\alpha) Y.\]
  • Our goal will be to minimize the variance of our return as a function of \(\alpha\).

  • One can show that the optimal \(\alpha\) is:

\[\alpha = \frac{\sigma_Y^2 - \text{Cov}(X,Y)}{\sigma_X^2 + \sigma_Y^2 -2\text{Cov}(X,Y)}.\]
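A short sketch of why this holds, writing \(\sigma_{XY}\) as shorthand for \(\text{Cov}(X,Y)\) (a notation introduced here only for brevity). The variance of the return is

\[\text{Var}(\alpha X + (1-\alpha) Y) = \alpha^2\sigma_X^2 + (1-\alpha)^2\sigma_Y^2 + 2\alpha(1-\alpha)\sigma_{XY}.\]

Setting the derivative with respect to \(\alpha\) to zero gives

\[2\alpha\sigma_X^2 - 2(1-\alpha)\sigma_Y^2 + 2(1-2\alpha)\sigma_{XY} = 0 \quad\Longrightarrow\quad \alpha\left(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}\right) = \sigma_Y^2 - \sigma_{XY}.\]

The second derivative is \(2(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}) = 2\,\text{Var}(X-Y) \geq 0\), so this stationary point is a minimum whenever \(X - Y\) is not constant.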
  • Proposal: Use an estimate:

\[\widehat \alpha = \frac{\widehat \sigma_Y^2 - \widehat{ \text{Cov}}(X,Y)}{\widehat \sigma_X^2 + \widehat \sigma_Y^2 -2\widehat{ \text{Cov}}(X,Y)}.\]

  • Suppose we compute the estimate \(\widehat\alpha = 0.6\) using the samples \((x_1,y_1),\dots,(x_n,y_n)\).

  • How sure can we be of this value? (A somewhat vague question.)

  • If we had observed the returns on a different set of 100 days, would we get a wildly different \(\widehat \alpha\)? (A more precise question.)
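The plug-in estimate \(\widehat\alpha\) defined above can be computed directly from the sample variances and covariance. The following is a minimal sketch assuming NumPy; the function name `alpha_hat` and the toy data are hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
import numpy as np

def alpha_hat(x, y):
    """Plug-in estimate of the variance-minimizing fraction alpha."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.cov(x, y, ddof=1)  # 2x2 sample covariance matrix of (x, y)
    var_x, var_y, cov_xy = cov[0, 0], cov[1, 1], cov[0, 1]
    return (var_y - cov_xy) / (var_x + var_y - 2.0 * cov_xy)

# Toy data: both assets have the same sample variance, so the split is even
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
print(alpha_hat(x, y))
```

Because the same divisor appears in every variance and covariance term, the ratio does not depend on whether one divides by \(n\) or \(n-1\).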


Resampling the data from the true distribution#

Fig 5.10
  • In this thought experiment, we know the actual joint distribution \(P(X,Y)\), so we can resample the \(n\) observations to our hearts’ content.


Computing the standard error of \(\widehat \alpha\)#

  • We will use \(S\) independent data sets, each of size \(n\), to estimate the standard error of \(\widehat{\alpha}\).

  • For each sampled data set \(s\), with \(1 \leq s \leq S\),

\[(x_1^{(s)},y_1^{(s)}),\dots,(x_n^{(s)},y_n^{(s)}),\]

we can compute a value of the estimate, giving \(\widehat \alpha^{(1)},\widehat \alpha^{(2)},\dots,\widehat \alpha^{(S)}\).

  • The Standard Error of \(\widehat \alpha\) is approximated by the standard deviation of these values.
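The thought experiment above can be sketched in a few lines. This is an illustration only: it assumes NumPy and a bivariate normal population with \(\sigma_X^2 = 1\), \(\sigma_Y^2 = 1.25\) and \(\text{Cov}(X,Y) = 0.5\), values picked here so that the true \(\alpha\) equals 0.6. The choices \(n = 100\) and \(S = 1000\) are likewise assumptions.

```python
import numpy as np

def alpha_hat(x, y):
    """Plug-in estimate of the variance-minimizing fraction alpha."""
    cov = np.cov(x, y, ddof=1)
    return (cov[1, 1] - cov[0, 1]) / (cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1])

rng = np.random.default_rng(1)
n, S = 100, 1000                      # sample size and number of data sets (assumed)
mean = np.zeros(2)
true_cov = np.array([[1.0, 0.5],
                     [0.5, 1.25]])    # assumed population; true alpha = 0.6

alphas = np.empty(S)
for s in range(S):
    # Draw a fresh data set of n pairs from the (known) true distribution
    sample = rng.multivariate_normal(mean, true_cov, size=n)
    alphas[s] = alpha_hat(sample[:, 0], sample[:, 1])

se_alpha = alphas.std(ddof=1)         # SD of the S estimates approximates the SE
print(alphas.mean(), se_alpha)
```

This only works because the population is known; with real data there is a single sample and no way to draw fresh ones.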


In reality, we only have \(n\) samples#

A single panel of Fig 5.9
  • However, these samples can be used to approximate the joint distribution of \(X\) and \(Y\).


  • The Bootstrap: Sample from the empirical distribution:

\[\widehat P(X,Y) = \frac{1}{n}\sum_{i=1}^{n} \delta_{(x_i,y_i)}.\]
  • Equivalently, resample the data by drawing \(n\) samples with replacement from the actual observations.

  • Why it works: variances computed under the empirical distribution are good approximations of variances computed under the true distribution (in many cases).
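Resampling with replacement, as described above, might look as follows. This is a sketch under stated assumptions rather than a reference implementation: it assumes NumPy, the names `bootstrap_se` and `alpha_hat` and the number of replicates `B` are hypothetical, and the single "observed" data set is simulated here only to make the example runnable.

```python
import numpy as np

def alpha_hat(x, y):
    """Plug-in estimate of the variance-minimizing fraction alpha."""
    cov = np.cov(x, y, ddof=1)
    return (cov[1, 1] - cov[0, 1]) / (cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1])

def bootstrap_se(x, y, statistic, B=1000, rng=None):
    """Bootstrap standard error of statistic(x, y) from paired observations."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)     # n indices drawn with replacement
        reps[b] = statistic(x[idx], y[idx])  # same idx keeps each (x_i, y_i) pair intact
    return reps.std(ddof=1), reps

# One simulated data set stands in for the observed returns (an assumption)
rng = np.random.default_rng(2)
data = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.25]], size=100)
x, y = data[:, 0], data[:, 1]

se_boot, reps = bootstrap_se(x, y, alpha_hat, B=1000, rng=rng)
print(alpha_hat(x, y), se_boot)
```

Indexing `x` and `y` with the same `idx` is the design choice that matters: resampling the two columns separately would destroy the covariance that \(\widehat\alpha\) depends on.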


A schematic of the Bootstrap#

A single dataset

Comparing Bootstrap sampling to sampling from the true distribution#

Fig 5.10
  • The left panel shows the distribution of \(\widehat{\alpha}\) across data sets drawn from the true population; it is centered (approximately) around the true \(\alpha\).

  • The middle panel shows the bootstrap distribution of \(\widehat{\alpha}\); it is centered (approximately) around the observed \(\widehat{\alpha}\).