Fixed \(X \in \mathbb{R}^{n \times p}\) regression context with \(K+1\) models. Let’s say \(K=10\):
\({\cal M}_0\): ~ V
\({\cal M}_1\): ~ V + X_1
⋮
\({\cal M}_{10}\): ~ V + X_1 + ... + X_10
Model selection is essentially backward stepwise: \[ \hat{E} = \max \{k: |T_k| > c_k, 1 \leq k \leq K\} \] (with \(\hat{E}=0\) if no \(|T_k|\) exceeds its cutoff), where \(T_k\) is the \(t\)-statistic testing \[ H_0: {\cal M}_{k-1} \qquad \text{vs} \qquad H_a: {\cal M}_k \]
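As a concrete sketch of this rule (my own illustration, not from the notes: it assumes Gaussian OLS fits and uses a single cutoff \(c\) in place of the \(c_k\); the names `select_model`, `V`, `X` are hypothetical):

```python
import numpy as np

def select_model(X, V, y, c=2.0):
    """Backward-stepwise selection: return Ehat, the largest k whose
    t-statistic for H0: M_{k-1} vs Ha: M_k exceeds the cutoff c.

    X : (n, K) array of candidate columns X_1, ..., X_K
    V : (n, q) array of columns always kept in the model
    c : critical value (a scalar here; the notes allow c_k to vary with k)
    """
    n, K = X.shape
    for k in range(K, 0, -1):
        Z = np.column_stack([V, X[:, :k]])      # design matrix for model M_k
        beta, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        sigma2 = resid @ resid / (n - Z.shape[1])   # unbiased noise estimate
        cov = sigma2 * np.linalg.inv(Z.T @ Z)
        t_k = beta[-1] / np.sqrt(cov[-1, -1])   # t-stat for the k-th column
        if abs(t_k) > c:
            return k                            # Ehat = max{k : |T_k| > c_k}
    return 0                                    # no test rejects: keep ~ V
```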
Letting \(\hat{\beta}_{\hat E}\) denote the zero-padded OLS estimator in model \({\cal M}_{\hat E}\), Leeb and Pötscher consider, for some fixed \(k\) and fixed \(A \in \mathbb{R}^{q \times p}\), \[ {\cal L}\left(\sqrt{n} A (\hat{\beta}_{\hat E} - \beta^*) \,\Big|\, \hat{E}=k\right) \] in the statistical model \[ Y | X \sim N(X\beta^*, \sigma^2 I). \]
Without conditioning on \(\hat{E}=k\), this distribution is some \(N(\theta_k, \Sigma_k)\), where \(\theta_k\) is expressible in terms of \(E[X^TY]\), with \[ X^TY | X \sim N(X^TX\beta^*, \sigma^2 X^TX). \] Further, \(\hat{E}=\hat{E}(X^TY)\).
We could stack \[ \begin{pmatrix} \sqrt{n} A (\hat{\beta}_{E} - \beta^*) \\ n^{-1/2}X^T(Y-X\beta^*) \end{pmatrix} = \begin{pmatrix} \sqrt{n} (\hat{\theta} - \theta_{\cal T}) \\ D \end{pmatrix} \sim N(?, ?) \]
That is, we are interested in the parameter \(\theta_{\cal T}(\beta^*)\), selection is a function of the data \(D\), and we condition on \(\hat{E}(D)=k\).
Let \(G_{n,\theta,\sigma}(t)\) denote the CDF of our target without selection, and \(G_{n,\theta,\sigma}(t|p)\) its conditional counterpart given the selected model. For some choice of \((\delta_0, \rho_0)\), both positive, we have \[ \sup_{\vartheta \in M_p: \|\vartheta - \theta\| < \rho_0 / \sqrt{n}} |G_{n,\theta,\sigma}(t|p) - G_{n,\vartheta,\sigma}(t|p)| > \delta_0 / 2 \] for all \(n\) sufficiently large. Without selection, on the other hand, clearly \[ G_{n,\theta,\sigma}(t) = G_{n,\vartheta,\sigma}(t) \qquad \forall \theta, \vartheta, \] since the unconditional law is centered Gaussian no matter the value of \(\beta^*\).
Changes of size \(O(n^{-1/2})\) in \(\theta\) yield changes of size \(O(1)\) in the conditional distributions!
The implication above shows that this discrepancy persists even in the limit (when these limits exist…).
This behavior is generic (unless selection is independent of target).
Suppose \[ \mathbb{R}^2 \ni Z \sim N(\mu, I_2) \] and we want to report inference for \(\mu_1\) when \(Z_1 > Z_2\).
We are going to consider \[ F^*_{\mu}(t) = P_{\mu}(Z_1 - \mu_1 \leq t | Z_1 > Z_2) \]
The implication above is essentially the observation that, for any \(\rho > 0\), \[ \sup_{\zeta: \|\zeta - \mu\| \leq \rho} |F^*_{\mu}(t) - F_{\zeta}^*(t)| > 0. \]
In the unconditional model, we might look at \[ F_{\mu}(t) = P_{\mu}(Z_1 - \mu_1 \leq t) \] and clearly \[ \sup_{\zeta: \|\zeta - \mu\| \leq \rho} |F_{\mu}(t) - F_{\zeta}(t)| = 0. \]
Another way of saying this is that the laws of \(Z_1-\mu_1\) do not form a location family under the conditional model \({\cal M}^*\), while they do form a location family under the unconditional model \({\cal M}\).
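A quick Monte Carlo (my own illustration; `cond_cdf` is a hypothetical helper) makes the contrast concrete: moving \(\mu_2\), which leaves the unconditional law of \(Z_1-\mu_1\) untouched, shifts the conditional CDF by an \(O(1)\) amount.

```python
import numpy as np

rng = np.random.default_rng(0)

def cond_cdf(mu, t, n=200_000):
    """Monte Carlo estimate of F*_mu(t) = P(Z1 - mu1 <= t | Z1 > Z2)."""
    Z = rng.normal(size=(n, 2)) + mu
    keep = Z[:, 0] > Z[:, 1]                    # the selection event
    return np.mean(Z[keep, 0] - mu[0] <= t)

mu   = np.array([0.0, 0.0])
zeta = np.array([0.0, 2.0])                     # only mu2 moved; target mu1 fixed

# Unconditionally Z1 - mu1 ~ N(0, 1) under both mu and zeta, so the
# unconditional CDFs agree; the conditional CDFs do not:
gap = abs(cond_cdf(mu, 0.0) - cond_cdf(zeta, 0.0))
print(gap)
```

(At \(\mu = 0\), \(F^*_{\mu}(0) = 1/4\) exactly, by symmetry of the exchangeable pair.)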
Return to our selection event \(Z_1 > Z_2\), but now in a pre-asymptotic scenario where the choice is based on \[ (Z_{1,n_1}, Z_{2,n_2}) = \left(n_1^{1/2}\bar{X}_{1,n_1}, n_2^{1/2} \bar{X}_{2,n_2}\right) \] where \(X_1=(X_{1,1}, \dots, X_{1,n_1}) \overset{IID}{\sim} F_1\) and \(X_2 = (X_{2,1}, \dots, X_{2,n_2}) \overset{IID}{\sim} F_2\).
How might you bootstrap this experiment (ignoring selection)?
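One natural answer is the nonparametric bootstrap within each arm. A sketch for arm 1 (arm 2 is analogous; `boot_roots` and the choice of \(F_1\) are my own, hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def boot_roots(x1, B=2000):
    """Bootstrap the root Z*_{1,n1} - Z_{1,n1} = sqrt(n1) * (xbar* - xbar)."""
    n1 = len(x1)
    z1 = np.sqrt(n1) * x1.mean()
    idx = rng.integers(0, n1, size=(B, n1))     # resample with replacement
    zstar = np.sqrt(n1) * x1[idx].mean(axis=1)
    return zstar - z1

# e.g. centered exponential data as a non-normal F_1
x1 = rng.exponential(size=100) - 1.0
roots = boot_roots(x1)
lo, hi = np.quantile(roots, [0.05, 0.95])       # pre-selection quantiles
```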
Suppose we restrict the bootstrap samples so that \(Z_{1,n_1}^* > Z_{2,n_2}^*\) (i.e., the bootstrapped sample mean in arm 1 beats arm 2). What does Leeb and Pötscher’s result say about the quantiles of \[ Z_{1,n_1}^* - Z_{1,n_1} | X_1, X_2, Z_{1,n_1}^* > Z_{2,n_2}^*?\]
Compare this to what we could say about the pre-selection quantiles of \[ Z_{1,n_1}^* - Z_{1,n_1} | X_1, X_2.\] Do you think the bootstrap quantile interval will work in the conditional setting?
Describe how you might carry out valid conditional inference in this setting. Try your procedure out using a few choices for \((F_1, F_2)\) with non-normal errors and unknown variance.
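One possible procedure (my own sketch, not prescribed by the notes, and assuming scipy is available) is the truncated-Gaussian pivot from selective inference: holding \(Z_2\) fixed, \(Z_1\) conditional on \(Z_1 > Z_2\) is \(N(\mu_1, 1)\) truncated to \((Z_2, \infty)\), so its CDF evaluated at the observed \(Z_1\) is an exact Uniform(0,1) pivot in the normal model, which we can invert for a confidence interval. With non-normal errors and unknown variance one would plug in an estimated \(\sigma\) and lean on the CLT.

```python
import numpy as np
from scipy.stats import norm

def tn_pivot(mu1, z1, z2, sd=1.0):
    """CDF of N(mu1, sd^2) truncated to (z2, inf), evaluated at z1.
    Uniform(0,1) under the selection event Z1 > Z2 (with Z2 held fixed)."""
    # survival-function form avoids cancellation when mu1 is far from the data
    return 1.0 - norm.sf((z1 - mu1) / sd) / norm.sf((z2 - mu1) / sd)

def selective_ci(z1, z2, sd=1.0, alpha=0.1):
    """Invert the pivot by bisection; the pivot is decreasing in mu1."""
    lo, hi = min(z1, z2) - 10 * sd, max(z1, z2) + 10 * sd
    def solve(level):
        a, b = lo, hi
        for _ in range(100):
            m = 0.5 * (a + b)
            if tn_pivot(m, z1, z2, sd) > level:
                a = m                           # root lies to the right of m
            else:
                b = m
        return 0.5 * (a + b)
    return solve(1 - alpha / 2), solve(alpha / 2)

z1, z2 = 2.0, 0.5                               # illustrative numbers only
ci = selective_ci(z1, z2)
```

Near the selection boundary (\(z_1 \approx z_2\)) the interval becomes very wide; that is the price of conditioning.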
Are all bets off if we have paired data? That is, \[ \mathbb{R}^2 \ni W_i \overset{IID}{\sim} F \] and \[ \begin{aligned} Z_{1,n} &= n^{-1/2} \sum_{i=1}^n W_{i,1}, \\ Z_{2,n} &= n^{-1/2} \sum_{i=1}^n W_{i,2}. \end{aligned} \]
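A starting point for exploring this (my own sketch; `paired_boot` and the choice of \(F\) are hypothetical): resampling rows of \(W\) preserves the joint law of \((Z_{1,n}, Z_{2,n})\), including the dependence between the selection statistic and the target.

```python
import numpy as np

rng = np.random.default_rng(3)

def paired_boot(W, B=4000):
    """Pairs bootstrap: resample rows of W, preserving the joint law of
    (Z_{1,n}, Z_{2,n}). Returns the roots and the raw bootstrap replicates."""
    n = len(W)
    z = np.sqrt(n) * W.mean(axis=0)
    idx = rng.integers(0, n, size=(B, n))
    zstar = np.sqrt(n) * W[idx].mean(axis=1)
    return zstar - z, zstar

# a correlated, non-normal example (a hypothetical choice of F)
n = 200
U = rng.exponential(size=(n, 1)) - 1.0
W = np.hstack([U + 0.5 * rng.normal(size=(n, 1)),
               U + 0.5 * rng.normal(size=(n, 1))])
roots, zstar = paired_boot(W)
keep = zstar[:, 0] > zstar[:, 1]                # bootstrap selection event
```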