Fixed \(X \in \mathbb{R}^{n \times p}\) regression context with \(K+1\) models. Let’s say \(K=10\):
\({\cal M}_0\): ~ V
\({\cal M}_1\): ~ V + X_1
⋮
\({\cal M}_{10}\): ~ V + X_1 + ... + X_10
Model selection is essentially backward stepwise: \[ \hat{E} = \max \{k: |T_k| > c_k, 1 \leq k \leq K\} \] (with \(\hat{E}=0\) if no \(|T_k|\) exceeds its cutoff), where \(T_k\) is the \(t\)-statistic testing \[ H_0: {\cal M}_{k-1} \qquad \text{vs} \qquad H_a: {\cal M}_k \]
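As a concrete sketch of this rule (my own illustration, not from the notes: it assumes Gaussian OLS fits and uses a single cutoff \(c\) in place of the \(c_k\); the names `select_model`, `V`, `X` are hypothetical):

```python
import numpy as np

def select_model(X, V, y, c=2.0):
    """Backward-stepwise selection: return Ehat, the largest k whose
    t-statistic for H0: M_{k-1} vs Ha: M_k exceeds the cutoff c.

    X : (n, K) array of candidate columns X_1, ..., X_K
    V : (n, q) array of columns always kept in the model
    c : critical value (a scalar here; the notes allow c_k to vary with k)
    """
    n, K = X.shape
    for k in range(K, 0, -1):
        Z = np.column_stack([V, X[:, :k]])      # design matrix for model M_k
        beta, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        sigma2 = resid @ resid / (n - Z.shape[1])   # unbiased noise estimate
        cov = sigma2 * np.linalg.inv(Z.T @ Z)
        t_k = beta[-1] / np.sqrt(cov[-1, -1])   # t-stat for the k-th column
        if abs(t_k) > c:
            return k                            # Ehat = max{k : |T_k| > c_k}
    return 0                                    # no test rejects: keep ~ V
```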
Letting \(\hat{\beta}_{\hat E}\) denote the zero-padded OLS estimator in model \({\cal M}_{\hat E}\), Leeb and Pötscher consider, for some fixed \(k\) and fixed \(A \in \mathbb{R}^{q \times p}\), \[ {\cal L}\left(\sqrt{n} A (\hat{\beta}_{\hat E} - \beta^*) \,\Big|\, \hat{E}=k\right) \] in the statistical model \[ Y | X \sim N(X\beta^*, \sigma^2 I). \]
Without conditioning on \(\hat{E}=k\), this distribution is some \(N(\theta_k, \Sigma_k)\), where \(\theta_k\) is expressible in terms of \(E[X^TY]\), with \[ X^TY | X \sim N(X^TX\beta^*, \sigma^2 X^TX). \] Further, \(\hat{E}=\hat{E}(X^TY)\).
We could stack \[ \begin{pmatrix} \sqrt{n} A (\hat{\beta}_{E} - \beta^*) \\ n^{-1/2}X^T(Y-X\beta^*) \end{pmatrix} = \begin{pmatrix} \sqrt{n} (\hat{\theta} - \theta_{\cal T}) \\ D \end{pmatrix} \sim N(?, ?) \]
That is, we are interested in the parameter \(\theta_{\cal T}(\beta^*)\), selection is a function of the data \(D\), and we condition on \(\hat{E}(D)=k\).
Let \(G_{n,\theta,\sigma}(t)\) denote the CDF of our target without selection, and \(G_{n,\theta,\sigma}(t|p)\) its conditional counterpart given the selected model. For some choice of \((\delta_0, \rho_0)\), both positive, we have \[ \sup_{\vartheta \in M_p: \|\vartheta - \theta\| < \rho_0 / \sqrt{n}} |G_{n,\theta,\sigma}(t|p) - G_{n,\vartheta,\sigma}(t|p)| > \delta_0 / 2 \] for all \(n\) sufficiently large. Without selection, on the other hand, clearly \[ G_{n,\theta,\sigma}(t) = G_{n,\vartheta,\sigma}(t) \qquad \forall \theta, \vartheta, \] since the unconditional law is centered Gaussian no matter the value of \(\beta^*\).
Changes of size \(O(n^{-1/2})\) in \(\theta\) yield changes of size \(O(1)\) in the conditional distributions!
The implication above shows that this discrepancy persists even in the limit (when these limits exist…).
This behavior is generic (unless selection is independent of target).
Suppose \[ \mathbb{R}^2 \ni Z \sim N(\mu, I_2) \] and we want to report inference for \(\mu_1\) when \(Z_1 > Z_2\).
We are going to consider \[ F^*_{\mu}(t) = P_{\mu}(Z_1 - \mu_1 \leq t | Z_1 > Z_2) \]
The implication above is essentially the observation that, for any \(\rho > 0\), \[ \sup_{\zeta: \|\zeta - \mu\| \leq \rho} |F^*_{\mu}(t) - F_{\zeta}^*(t)| > 0. \]
In the unconditional model, we might look at \[ F_{\mu}(t) = P_{\mu}(Z_1 - \mu_1 \leq t) \] and clearly \[ \sup_{\zeta: \|\zeta - \mu\| \leq \rho} |F_{\mu}(t) - F_{\zeta}(t)| = 0. \]
Another way of saying this is that the laws of \(Z_1-\mu_1\) do not form a location family under the conditional model \({\cal M}^*\), while they do form a location family under the unconditional model \({\cal M}\).
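A quick Monte Carlo (my own illustration; `cond_cdf` is a hypothetical helper) makes the contrast concrete: moving \(\mu_2\), which leaves the unconditional law of \(Z_1-\mu_1\) untouched, shifts the conditional CDF by an \(O(1)\) amount.

```python
import numpy as np

rng = np.random.default_rng(0)

def cond_cdf(mu, t, n=200_000):
    """Monte Carlo estimate of F*_mu(t) = P(Z1 - mu1 <= t | Z1 > Z2)."""
    Z = rng.normal(size=(n, 2)) + mu
    keep = Z[:, 0] > Z[:, 1]                    # the selection event
    return np.mean(Z[keep, 0] - mu[0] <= t)

mu   = np.array([0.0, 0.0])
zeta = np.array([0.0, 2.0])                     # only mu2 moved; target mu1 fixed

# Unconditionally Z1 - mu1 ~ N(0, 1) under both mu and zeta, so the
# unconditional CDFs agree; the conditional CDFs do not:
gap = abs(cond_cdf(mu, 0.0) - cond_cdf(zeta, 0.0))
print(gap)
```

(At \(\mu = 0\), \(F^*_{\mu}(0) = 1/4\) exactly, by symmetry of the exchangeable pair.)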
Return to our selection event \(Z_1 > Z_2\), but now in a pre-asymptotic scenario where the choice is based on \[ (Z_{1,n_1}, Z_{2,n_2}) = \left(n_1^{1/2}\bar{X}_{1,n_1}, n_2^{1/2} \bar{X}_{2,n_2}\right) \] where \(X_1=(X_{1,1}, \dots, X_{1,n_1}) \overset{IID}{\sim} F_1\) and \(X_2 = (X_{2,1}, \dots, X_{2,n_2}) \overset{IID}{\sim} F_2\).
How might you bootstrap this experiment (ignoring selection)?
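One natural answer is the nonparametric bootstrap within each arm. A sketch for arm 1 (arm 2 is analogous; `boot_roots` and the choice of \(F_1\) are my own, hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def boot_roots(x1, B=2000):
    """Bootstrap the root Z*_{1,n1} - Z_{1,n1} = sqrt(n1) * (xbar* - xbar)."""
    n1 = len(x1)
    z1 = np.sqrt(n1) * x1.mean()
    idx = rng.integers(0, n1, size=(B, n1))     # resample with replacement
    zstar = np.sqrt(n1) * x1[idx].mean(axis=1)
    return zstar - z1

# e.g. centered exponential data as a non-normal F_1
x1 = rng.exponential(size=100) - 1.0
roots = boot_roots(x1)
lo, hi = np.quantile(roots, [0.05, 0.95])       # pre-selection quantiles
```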
Suppose we restrict the bootstrap samples so that \(Z_{1,n_1}^* > Z_{2,n_2}^*\) (i.e., the bootstrapped sample mean in arm 1 beats arm 2). What does Leeb and Pötscher’s result say about the quantiles of \[ Z_{1,n_1}^* - Z_{1,n_1} | X_1, X_2, Z_{1,n_1}^* > Z_{2,n_2}^*?\]
Compare this to what we could say about the pre-selection quantiles of \[ Z_{1,n_1}^* - Z_{1,n_1} | X_1, X_2.\] Do you think the bootstrap quantile interval will work in the conditional setting?
Describe how you might carry out valid conditional inference in this setting. Try your procedure out using a few choices for \((F_1, F_2)\) with non-normal errors and unknown variance.
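One possible procedure (my own sketch, not prescribed by the notes, and assuming scipy is available) is the truncated-Gaussian pivot from selective inference: holding \(Z_2\) fixed, \(Z_1\) conditional on \(Z_1 > Z_2\) is \(N(\mu_1, 1)\) truncated to \((Z_2, \infty)\), so its CDF evaluated at the observed \(Z_1\) is an exact Uniform(0,1) pivot in the normal model, which we can invert for a confidence interval. With non-normal errors and unknown variance one would plug in an estimated \(\sigma\) and lean on the CLT.

```python
import numpy as np
from scipy.stats import norm

def tn_pivot(mu1, z1, z2, sd=1.0):
    """CDF of N(mu1, sd^2) truncated to (z2, inf), evaluated at z1.
    Uniform(0,1) under the selection event Z1 > Z2 (with Z2 held fixed)."""
    # survival-function form avoids cancellation when mu1 is far from the data
    return 1.0 - norm.sf((z1 - mu1) / sd) / norm.sf((z2 - mu1) / sd)

def selective_ci(z1, z2, sd=1.0, alpha=0.1):
    """Invert the pivot by bisection; the pivot is decreasing in mu1."""
    lo, hi = min(z1, z2) - 10 * sd, max(z1, z2) + 10 * sd
    def solve(level):
        a, b = lo, hi
        for _ in range(100):
            m = 0.5 * (a + b)
            if tn_pivot(m, z1, z2, sd) > level:
                a = m                           # root lies to the right of m
            else:
                b = m
        return 0.5 * (a + b)
    return solve(1 - alpha / 2), solve(alpha / 2)

z1, z2 = 2.0, 0.5                               # illustrative numbers only
ci = selective_ci(z1, z2)
```

Near the selection boundary (\(z_1 \approx z_2\)) the interval becomes very wide; that is the price of conditioning.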
Are all bets off if we have paired data? That is, \[ \mathbb{R}^2 \ni W_i \overset{IID}{\sim} F \] and \[ \begin{aligned} Z_{1,n} &= n^{-1/2} \sum_{i=1}^n W_{i,1}, \\ Z_{2,n} &= n^{-1/2} \sum_{i=1}^n W_{i,2}. \end{aligned} \]
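A starting point for exploring this (my own sketch; `paired_boot` and the choice of \(F\) are hypothetical): resampling rows of \(W\) preserves the joint law of \((Z_{1,n}, Z_{2,n})\), including the dependence between the selection statistic and the target.

```python
import numpy as np

rng = np.random.default_rng(3)

def paired_boot(W, B=4000):
    """Pairs bootstrap: resample rows of W, preserving the joint law of
    (Z_{1,n}, Z_{2,n}). Returns the roots and the raw bootstrap replicates."""
    n = len(W)
    z = np.sqrt(n) * W.mean(axis=0)
    idx = rng.integers(0, n, size=(B, n))
    zstar = np.sqrt(n) * W[idx].mean(axis=1)
    return zstar - z, zstar

# a correlated, non-normal example (a hypothetical choice of F)
n = 200
U = rng.exponential(size=(n, 1)) - 1.0
W = np.hstack([U + 0.5 * rng.normal(size=(n, 1)),
               U + 0.5 * rng.normal(size=(n, 1))])
roots, zstar = paired_boot(W)
keep = zstar[:, 0] > zstar[:, 1]                # bootstrap selection event
```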