Defines notion of selective Type I error, selective coverage
Suppose \({\cal Q}={\cal Q}(D)\) generates null hypotheses to test: i.e. pairs \(({\cal M}, H_0)\) with \(H_0 \subset {\cal M}\).
Set \(q=({\cal M}, H_0)\). A test \(\phi = \phi_{({\cal M},H_0)}(D) \in \{0,1\}\) (with 1 denoting rejection) has selective type I error at \(F \in H_0\) \[ P_{F}(\phi(D) = 1 \vert q \in {\cal Q}(D)) \] and controls selective type I error at level \(\alpha\) if \[ \sup_{F \in H_0}P_{F}(\phi(D) = 1 \vert q \in {\cal Q}(D)) \leq \alpha. \]
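As a quick illustration (a minimal simulation; the selection rule, arm count, and cutoff below are all hypothetical, not from the notes), selecting the largest of several independent means and then applying a naive level-5% z-test inflates the selective type I error well above the nominal level:

```python
import numpy as np

# Hypothetical setup: K independent arms, Z_i ~ N(0, 1) under the
# global null; Q(D) proposes testing H_0: mu_i = 0 only for the
# winning arm i* = argmax_i Z_i.
rng = np.random.default_rng(0)
K, B = 5, 50000
Z = rng.standard_normal((B, K))
winner = np.argmax(Z, axis=1)
# Naive two-sided z-test applied to the winner, ignoring selection;
# by exchangeability, averaging over the winner matches conditioning
# on any particular i* = i.
naive_reject = np.abs(Z[np.arange(B), winner]) > 1.96
print(naive_reject.mean())  # well above the nominal 0.05
```

The conditional (selective) calibration developed below is what restores level \(\alpha\).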
Instead of choosing null hypotheses, \({\cal Q}\) might generate parameters for which confidence intervals or regions are desired, i.e. \(({\cal M}, \theta: {\cal M} \rightarrow \mathbb{R}^k)\).
Set \(q = ({\cal M}, \theta: {\cal M} \rightarrow \mathbb{R}^k)\). A confidence region \(CI = CI_q(D)\) has \((1-\alpha)\) selective coverage for \(\theta\) if \[ P_F ( \theta(F) \in CI_q(D) \vert q \in {\cal Q}(D)) \geq 1 - \alpha, \qquad \forall F \in {\cal M}. \]
Note: parameters are functions on \({\cal M}\).
Linear regression with saturated model and \(X\) fixed (\(\sigma^2=1\) and known) \[ {\cal M}_s = \left\{N(\mu, I): \mu \in \mathbb{R}^n \right\} \] and \[ {\cal Q}(D) = \left\{({\cal M}_s, \beta_{j|\hat{E}(D)}): j \in \hat{E}(D)\right\} \] and \[ \beta_{j|E}(\mu) = e_j^TX_E^{\dagger}\mu \]
Linear regression with full model \(p < n\) and \(X\) fixed (\(\sigma^2=1\) and known) \[ {\cal M}_f = \left\{N(X\beta, I): \beta \in \mathbb{R}^p \right\} \] and \[ {\cal Q}(D) = \left\{({\cal M}_f, \beta_{j|\hat{E}(D)}): j \in \hat{E}(D)\right\} \] \[ \beta_{j|E}(\beta) = e_j^TX_E^{\dagger}X\beta \]
Linear regression with full model \(p < n\) and \(X\) fixed (\(\sigma^2\) unknown) \[ {\cal M}_U = \left\{N(X\beta, \sigma^2 I): \beta \in \mathbb{R}^p, \sigma^2 > 0 \right\} \] and \[ {\cal Q}(D) = \left\{({\cal M}_U, \beta_{j|\hat{E}(D)}): j \in \hat{E}(D)\right\} \] \[ \beta_{j|E}(\beta) = e_j^TX_E^{\dagger}X\beta \]
Linear regression with a selected model and \(X\) fixed (\(\sigma^2=1\) and known) \[ {\cal M}_E = \left\{N(X_E\beta_E, I): \beta_E \in \mathbb{R}^E \right\} \] and \[ {\cal Q}(D) = \left\{({\cal M}_{\hat{E}(D)}, \beta_{j|\hat{E}(D)}): j \in \hat{E}(D)\right\} \] and \[ \beta_{j|E}(\beta_E) = e_j^T\beta_E \]
Linear regression with a selected model and \(X\) fixed and \(\sigma^2\) unknown \[ {\cal M}_{E,u} = \left\{N(X_E\beta_E, \sigma^2 I): \beta_E \in \mathbb{R}^E, \sigma^2 > 0 \right\} \] and \[ {\cal Q}(D) = \left\{({\cal M}_{\hat{E}(D),u}, \beta_{j|\hat{E}(D)}): j \in \hat{E}(D)\right\} \] and \[ \beta_{j|E}(\beta_E, \sigma^2) = e_j^T\beta_E \]
Linear regression in the pairs model \[ {\cal M}_{NP} = \left\{ {\cal L}(X, Y): (X[i,], Y[i]) \overset{IID}{\sim} F, 1 \leq i \leq n \right\} \] and \[ {\cal Q}(D) = \left\{({\cal M}_{NP}, \beta_{j|\hat{E}(D)}): j \in \hat{E}(D)\right\} \]
Targets \[ \beta_{j|E}(F) = e_j' E_F[X_EX_E^T]^{-1} E_F[X_E \cdot Y]. \]
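This target can be checked numerically: it is the population least-squares coefficient of \(Y\) on \(X_E\), approximated below with empirical moments from a large IID sample (the data-generating model and the set \(E\) are hypothetical):

```python
import numpy as np

# Approximate beta_{j|E}(F) = e_j' E[X_E X_E^T]^{-1} E[X_E Y] by
# plugging in empirical moments; here X has IID N(0,1) columns and
# Y = X_0 + 0.5 X_1 + noise, with (hypothetical) selected set E = {0, 2}.
rng = np.random.default_rng(1)
N = 200000
X = rng.standard_normal((N, 3))
Y = X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(N)
E = [0, 2]
XE = X[:, E]
beta_E = np.linalg.solve(XE.T @ XE / N, XE.T @ Y / N)
print(beta_E)  # approximately [1.0, 0.0]: the partial target omits X_1
```

Note the target is a functional of \(F\) and \(E\) only; it need not equal any "true" coefficient.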
set.seed(0)
X = matrix(rnorm(500), 100, 5)
Y = rnorm(100)
M = lm(Y ~ X, subset=1:50)
E = (abs(coef(summary(M))[,3]) > 1.5)[-1]
print(E)
## X1 X2 X3 X4 X5
## TRUE FALSE FALSE FALSE FALSE
summary(lm(Y ~ X[, E], subset = 51:100))
##
## Call:
## lm(formula = Y ~ X[, E], subset = 51:100)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5089 -0.6060 -0.1135 0.6509 1.8467
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.12880 0.12253 -1.051 0.298
## X[, E] -0.07858 0.14438 -0.544 0.589
##
## Residual standard error: 0.8661 on 48 degrees of freedom
## Multiple R-squared: 0.006134, Adjusted R-squared: -0.01457
## F-statistic: 0.2962 on 1 and 48 DF, p-value: 0.5888
If (X, Y)[1:50,] is collected before (X, Y)[51:100,], this is just the scientific method.

The event \(A_q = \left\{D:q \in {\cal Q}(D)\right\}\) is the selection event for \(q\).
We will assume that \(P_F(A_q) > 0 \ \forall F \in {\cal M}_q\).
Coverage and type I error can be rephrased in terms of \(A_q\), e.g. a selective confidence region for \(({\cal M}_q, \theta_q:{\cal M}_q \rightarrow \mathbb{R}^{k_q})\) satisfies \[ P_F ( \theta_q(F) \in CI_q(D) \vert A_q) \geq 1 - \alpha, \qquad \forall F \in {\cal M}_q. \]
This yields a selected conditional model (really a truncated model) \[ {\cal M}^*_q = \left\{F^*: \frac{dF^*}{dF} \propto 1_{A_q}, F \in {\cal M}_q \right\} \] even if \({\cal M}_q\) does not depend on \(q\).
Auxiliary information is referred to as a selection variable and is denoted as \(S=S_q\).
With \(s_q\) the observed value of \(S_q\), this leads to a final model \[ {\cal M}^*_{q,s_q} = \left\{{\cal L}_{F^*}(D \vert S_q=s_q): F^* \in {\cal M}_q^* \right\} \]
Suppose we construct valid confidence intervals for the family \[ \bigcup_{s_q}{\cal M}^*_{q,s_q}, \] i.e. \[ P_{F}(\theta_q(F) \in CI(D, q,s_q) \vert A_q ,S_q=s_q) \overset{a.s.}{\geq} 1- \alpha \qquad \forall F \in {\cal M}_q. \]
Then, these confidence intervals are valid on \({\cal M}^*_q\): averaging the conditional coverage over the law of \(S_q\) given \(A_q\) gives \[ P_{F}(\theta_q(F) \in CI(D,q,s_q) \vert A_q) \geq 1- \alpha \qquad \forall F \in {\cal M}_q. \]
We could marginalize over \({\cal Q}\) as well, of course.
Suppose \({\cal M}_q\) is an exponential family with sufficient statistic \(T(D)\), natural parameter \(\eta\) and reference measure \(m_0\): \[ \frac{dF}{dm_0}(D) = \frac{dF_{\eta}}{dm_0}(D) \propto \exp\left(\eta'T(D)\right) \]
Then, \({\cal M}^*_q\) is again an exponential family with the same sufficient statistic and natural parameter with truncated reference measure \(m_{0,q}\) \[ \frac{dm_{0,q}}{dm_0}(D) = 1_{A_q}(D). \]
Suppose \(\theta_q(\eta) = L_q'\eta\) is a \(k\)-dimensional linear functional. Then, we can write \[ \eta'T(D) = \eta'P_{q}T(D) + \eta'(I-P_{q})T(D) \] with \(P_{q}\) projection onto \(\text{col}(L_q)\).
Setting \(S_q={\cal N}_q = (I-P_q)T(D)\), standard exponential family results imply that (for every \(s_q\)) the conditional model \({\cal M}^*_{q,s_q}\) is an exponential family with natural parameter \(\theta_q\) and sufficient statistic \(L_q^{\dagger}T\).
To test \(H_0:\theta_q = \theta_{0,q}\) take \(F \in {\cal M}_q\) with \(\theta_q(F) = \theta_{0,q}\) and draw samples from \(F^* \in {\cal M}^*_{q,s_q}\) and use whatever test statistic you like.
Optimal tests can be constructed using tools from exponential family theory (e.g. UMPU).
Suppose \(m_0 = N(0, \Sigma_q)\), i.e. under \(\eta=0\), \(D \in \mathbb{R}^p\) with \(D \sim N(0,\Sigma_q)\).
Sufficient statistic: \(T_q(D)=D\).
Mean parameters: \(\beta_q \in \text{row}(\Sigma_q)\) with natural parameters \(\eta_q=\Sigma^{\dagger}_q\beta_q\). (If \(\Sigma_q\) is degenerate, \(\text{row}(\Sigma_q)\) should probably not depend on \(q\).)
Suppose we want to test \(H_0:L_q'\beta_q=L_q'\Sigma_q\eta_q= \theta_{0,q}\) (wlog we assume \(\text{row}(L_q) \subset \text{row}(\Sigma_q)\)).
Set \[ \hat{\theta}_q(D) = L_q'D \]
Set \(P_q = L_qL_q^{\dagger}\) and \(P_q^{\perp} = I - P_q\) (could also take \(P_q^{\perp} = P_{\text{row}(\Sigma_q)} - P_q\)).
Claim: the laws \(N(\beta_q, \Sigma_q)\) agree on the sigma algebras \[ {\cal F}_1 = \sigma(P_q^{\perp}\Sigma_q^{\dagger}D), \qquad {\cal F}_2 = \sigma(D - \Gamma_q \hat{\theta}_q) \] with \[ \Gamma_q = \Sigma_q L_q (L_q'\Sigma_q L_q)^{-1} \]
Further, these laws depend only on \(L_q'\beta_q\).
Density of \(N(\beta_q, \Sigma_q)\) on \(\text{row}(\Sigma_q)\) is proportional to \[ \begin{aligned} \exp\left(\beta_q'\Sigma_q^{-1} D \right) &= \exp\left(\beta_q'P_q\Sigma_q^{-1} D + \beta_q'P_q^{\perp}\Sigma_q^{-1} D \right) \end{aligned} \]
If we condition this law on \(P_q^{\perp}\Sigma_q^{-1}D\) it will depend on \(\beta_q\) only through \(L_q'\beta_q\). This corresponds to \({\cal F}_1\) above.
If we restrict these conditional laws to \(A_q\) they will again depend on \(\beta_q\) only through \(L_q'\beta_q\).
We need to show that conditioning on \({\cal N}_q(D) = D - \Gamma_q \hat{\theta}_q\) is the same.
Note that \[ \begin{aligned} \text{Cov}(\hat{\theta}_q(D), {\cal N}_q(D)) &= 0\\ \text{Cov}(\hat{\theta}_q(D), P_q^{\perp}\Sigma_q^{-1}D) &= 0 \end{aligned} \] and \(\text{rank}(\text{Cov}({\cal N}_q(D))) = \text{rank}(\text{Cov}(P_q^{\perp}\Sigma_q^{\dagger}D)) = p - k\).
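These zero-covariance claims are easy to verify numerically (a quick check with a randomly generated \(\Sigma_q\) and \(L_q\); the dimensions are arbitrary):

```python
import numpy as np

# Verify Cov(thetahat_q, N_q) = 0 for thetahat_q = L'D and
# N_q = D - Gamma L'D with Gamma = Sigma L (L' Sigma L)^{-1},
# when D ~ N(beta, Sigma): in matrix form Cov = L' Sigma (I - Gamma L')'.
rng = np.random.default_rng(2)
p, k = 6, 2
A = rng.standard_normal((p, p))
Sigma = A @ A.T                      # a full-rank covariance
L = rng.standard_normal((p, k))
Gamma = Sigma @ L @ np.linalg.inv(L.T @ Sigma @ L)
cov = L.T @ Sigma @ (np.eye(p) - Gamma @ L.T).T
print(np.max(np.abs(cov)))           # 0 up to floating point error
```

The algebra behind the check: \(L'\Sigma - (L'\Sigma L)(L'\Sigma L)^{-1}L'\Sigma = 0\).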
We can choose \(C_q\), an inverse square root of \(\Sigma_q\), so that \[ C_qD \sim N(C_q\beta_q, I_{\text{rank}(\Sigma_q)}) \] with \(\hat{\theta}_q\) an invertible linear function of \((C_qD)_{[1:k]}\).
We conclude that \({\cal N}_q(D)\) and \(P_q^{\perp}\Sigma_q^{-1}D\) are linear functions of \((C_qD)_{[(k+1):p]}\).
Setting \(S_q = {\cal N}_q(D) = D - \Gamma_q \hat{\theta}_q\), the final conditional model is equivalent to the restriction of the laws \[ {\cal L}_F(D \vert {\cal N}_q), \qquad F \in {\cal M}_q \] to the event \[ \begin{aligned} \left\{d: d \in A_q, {\cal N}_q(d)=s_q\right\} = \left\{d: s_q + \Gamma_q \hat{\theta}_q(d) \in A_q, {\cal N}_q(d)=s_q\right\}. \end{aligned} \]
These laws are equivalent to the laws \[ {\cal L}_F(\hat{\theta}_q(D) \vert {\cal N}_q + \Gamma_q \hat{\theta}_q \in A_q, {\cal N}_q), \qquad F \in {\cal M}_q \]
To test \(H_0:L_q'\beta_q=\theta_{0,q}\) take any \(F \in {\cal M}_q\) such that \(L_q'\beta_q=\theta_{0,q}\) and sample from the above conditional distribution. Use an appropriate test statistic.
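A minimal rejection-sampling sketch of this recipe in one dimension (the truncation cutoff and observed value below are hypothetical, standing in for a selection event that, with \({\cal N}_q\) fixed, reduces to a half-line in \(\hat{\theta}_q\)):

```python
import numpy as np

# Under H_0, thetahat ~ N(0, 1) (null value and variance hypothetical);
# suppose the event {s_q + Gamma * thetahat in A_q} reduces to
# {thetahat >= cutoff}. Rejection sampling from the conditional law:
rng = np.random.default_rng(3)
cutoff, obs = 1.5, 2.1                  # hypothetical truncation / observation
draws = rng.standard_normal(500000)
cond = draws[draws >= cutoff]           # draws from the truncated null law
pval = np.mean(cond >= obs)             # one-sided selective p-value
print(pval)
```

Here the exact answer is \(\bar\Phi(2.1)/\bar\Phi(1.5) \approx 0.27\), which the Monte Carlo estimate approaches; for events without closed form, the same accept/reject scheme still applies.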
Often, \(\text{dim}(\eta) \neq \text{dim}(\text{row}(\Sigma))\). For example in \({\cal M}_f\) when \(n > p\), \(\text{dim}(\eta)=p\) while \(\Sigma=I_{n \times n}\).
Usually constraints are expressed in mean parameters \(\beta\) rather than natural parameters \(\eta\).
In this case, we might write \[ {\cal M}_q = \left\{N(\beta_q, \Sigma_q): \beta_q \in \text{row}(\Sigma_q), R_{{\cal M}_q}\beta_q=0 \right\}. \] for some residual projection \(R_{{\cal M}_q}\) in our model \({\cal M}_q\).
In constructing the reference distribution it is not necessary to condition on this residual, as its distribution is known.
For \(\beta_q\) such that \(R_q\beta_q=0\), density is proportional to \[ \exp(\beta_q'\Sigma^{-1}_qD) = \exp(\beta_q'P_{{\cal M}_q}\Sigma^{-1}_qD) \] where \(P_{{\cal M}_q} = I - R_{{\cal M}_q}\).
For \(L_q\) such that \(R_{{\cal M}_q}L_q=0\) we might test \(H_0:L_q'\beta_q=\delta\).
Our target estimator is \(\hat{\theta}_q(D) = L_q'D\) and our nuisance sufficient statistic is going to be \[ {\cal N}_q(D) = P_{{\cal M}_q}D - P_{{\cal M}_q}\Sigma_q L_q (L_q'\Sigma_qL_q)^{-1} \hat{\theta}_q(D) \]
This effectively decomposes \(D\) into three independent pieces \((\hat{\theta}_q, {\cal N}_q, \Delta_q)\) with \[ \Delta_q = R_{{\cal M}_q}\Sigma^{-1}_qD \]
We can linearly decompose \[ D = a_q \hat{\theta}_q + b_q {\cal N}_q + c_q \Delta_q \] where \(a_q, b_q\) depend on \((L_q,\Sigma_q, R_{{\cal M}_q})\) and \(c_q\) depends only on \((\Sigma^{-1}_q, R_{{\cal M}_q})\).
Conditioning any \(F \in {\cal M}_q\) on \({\cal N}_q\) leaves \(\hat{\theta}_q\) and \(\Delta_q = R_{{\cal M}_q}\Sigma_q^{-1}D\) independent, with \[ \Delta_q \sim N(0, R_{{\cal M}_q}\Sigma_q^{-1}R_{{\cal M}_q}) \]
Setting \(S_q = {\cal N}_q(D)\), the final conditional model is equivalent to the restriction of the laws \[ {\cal L}_F(D \vert {\cal N}_q), \qquad F \in {\cal M}_q \] to the event \[ \begin{aligned} \left\{d: d \in A_q, {\cal N}_q(d)=s_q\right\} \end{aligned} \]
As \(\hat{\theta}_q\) is sufficient, we can marginalize over \(\Delta_q\) yielding the laws \[ {\cal L}_F(\hat{\theta}_q(D) \vert a_q \hat{\theta}_q + b_q {\cal N}_q + c_q \Delta_q \in A_q, {\cal N}_q), \qquad F \in {\cal M}_q \]
The resulting density under \(H_0:L_q'\beta_q=\delta\) is proportional to \[ t \mapsto \phi_{(\delta, L_q'\Sigma_qL_q)}(t) \cdot \pi_q(t,n_q) \] with \[ \pi_q(t,n_q) = P_F\left( a_q \hat{\theta}_q + b_q {\cal N}_q + c_q \Delta_q \in A_q \,\Big\vert\, \hat{\theta}_q=t, {\cal N}_q=n_q\right) \]
To test \(H_0:L_q'\beta_q=\theta_{0,q}\) take any \(F \in {\cal M}_q\) such that \(L_q'\beta_q=\theta_{0,q}\) and sample from the above conditional distribution. Use an appropriate test statistic.
Let’s go back to our first “example” of this conditional approach, the covTest of Lockhart et al.
Let \(i^*(X,Y)\) denote the first variable to enter the LASSO path with sign \(s^*\).
Let’s take the saturated model \[ {\cal M} = \left\{N(\mu, I): \mu \in \mathbb{R}^n\right\} \]
The framework laid out above says that (in order to avoid the perils of plugin) we must look at \[ {\cal L}(sX_i'Y | (i^*,s^*)=(i,s), (I - P_i)Y) \overset{H_0:X_i^T\mu=0}{=} \text{TN}(0, [M(i,s), \infty)) \] with \[ M(i,s) = M(i,s, (I - P_i)Y) = \max_{l \neq i, s' \in \pm 1} \frac{s' X_l^T(I - P_{i})Y}{1 - s' X_l^TX_{i}} \]
We really want this test to tell us when to stop the LASSO. We might instead look at the model \[ {\cal M}_i = \left\{N(X_i\beta_i, I): \beta_i \in \mathbb{R}\right\}. \]
In this case, the correct reference law is \[ {\cal L}(sX_i'Y | sX_i'Y \geq M(i,s, (I-P_i)Y)) \]
In this case \({\cal N}=0\) and \[ \pi(t,n) = P_F(M(i,s,(I-P_i)Y) \leq t). \]
Note that \((I-P_i)Y \sim N(0, I-P_i)\) and can be simulated.
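A sketch of that simulation (the design matrix, selected index \(i\), and sign \(s\) are hypothetical; columns are unit-normalized so the denominators below stay positive):

```python
import numpy as np

# Simulate (I - P_i) Y ~ N(0, I - P_i) and recompute the truncation
# point M(i, s, (I - P_i)Y) for each draw.
rng = np.random.default_rng(4)
n, p = 50, 5
X = rng.standard_normal((n, p))
X /= np.sqrt((X ** 2).sum(0))            # unit-norm columns
i, s = 0, 1.0                            # hypothetical winner and sign
P_i = np.outer(X[:, i], X[:, i])
def M(resid):
    # max over l != i, s' in {+1,-1} of s' X_l'resid / (1 - s' X_l'X_i)
    vals = [sp * X[:, l] @ resid / (1 - sp * X[:, l] @ X[:, i])
            for l in range(p) if l != i for sp in (1.0, -1.0)]
    return max(vals)
Z = rng.standard_normal((1000, n))
W = Z - Z @ P_i                          # rows are draws of (I - P_i) Y
ref_M = np.array([M(w) for w in W])      # Monte Carlo law of the truncation point
print(ref_M.mean())
```

Each draw yields a truncation point, giving the reference law needed for the selected-model version of the test.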
In the simple case \(X^TX=I\) this gives the (exact) Bonferroni simultaneous test!
Rather than just focusing on exponential family models, we can certainly use the above test as a goodness of fit test for \(H_0:\mu=0\).
For a selection event \(A_q\), the general setup above says that a goodness of fit test in an exponential family model \({\cal M}_q\) can be constructed using \[ {\cal L}_F(D \vert T(D)=T_{obs}, D \in A_q) \]
We note that sampling such distributions can be challenging (c.f. Diaconis et al. (2013))
What about the next step?
Suppose we have found feature \(i\) is the best in first stage with sign \(s\).
A reasonable next question to ask is \(H_0:(I-P_i)\mu=0\). For the model we should allow \(P_i\mu \neq 0\), so we take the model \({\cal M}_i\).
To carry out this goodness of fit test, we might look at \(M(i,s)\) (this would be the covTest approach) or simply forward stepwise \[ T_{\max} = \max_{j \neq i} \frac{|X_j'(I-P_i)Y|}{\|(I-P_i)X_j\|_2}. \]
For this test, to avoid perils of plugin we should look at \[ {\cal L}(T_{\max} | (i^*,s^*)=(i,s), X_i'Y). \]
An alternative test might condition on the identity \(j^*\) and winning sign \(\tilde{s}^*\) of the maximizer in \(T_{\max}\), perhaps expressed as a test of \(H_0:\beta_j=0\) in the model \[ {\cal M}_{(i,j)} = \left\{N(X_i\beta_i + X_j \beta_j, I): \beta_i, \beta_j \in \mathbb{R} \right\}. \]
In this case, appropriate reference would be \[ {\cal L}\left(\frac{\tilde{s}X_j'(I-P_i)Y}{\|(I-P_i)X_j\|_2} \biggl \vert X_i'Y, (i^*,s^*)=(i,s), (j^*,\tilde{s}^*)=(j,\tilde{s}) \right). \]
In both cases the statistic we use for this step is the same, though the reference is slightly different.
Let’s look again at case \(X'X=I\).
The goodness of fit test uses reference \[ {\cal L}\left(\max_{j \neq i}|Z_j| \biggl \vert \ \max_{j \neq i}|Z_j| \leq |Z_i|, Z_i\right) \] with \((Z_j)_{j \neq i} \overset{IID}{\sim} N(0,1)\).
While the second test uses reference \[ {\cal L}\left(|Z_j| \biggl \vert \max_{k \not \in \{i,j\}} \ |Z_k| \leq |Z_j| \leq |Z_i|, Z_i\right) \]
The covTest uses reference \[ {\cal L}\left(|Z_j| \biggl \vert \ \max_{k \not \in \{i,j\}} |Z_k| \leq |Z_j|, (Z_k)_{k \not\in \{i,j\}}\right). \]
import numpy as np
import pandas as pd
from scipy.stats import norm as normal_dbn

p = 100 # number of features; orthogonal design
def bonferroni_sample(p, alpha=0.05, B=10000, steps=1):
    # At step j, Bonferroni will use the distribution
    # of the largest p-j+1 absolute N(0,1) Z's
    Z = np.empty((B, steps))
    for j in range(steps):
        for i in range(B):
            Z[i,j] = np.fabs(np.random.standard_normal(p - j)).max()
    return Z
bonf_ref = bonferroni_sample(p, steps=20)
def tnorm(lower, upper, k, B=10000):
    # Helper (reconstructed; not shown in the original chunk): sample
    # the max absolute value of k IID N(0,1)'s truncated to (lower, upper)
    U = np.random.uniform(normal_dbn.cdf(lower), normal_dbn.cdf(upper), (B, k))
    return np.fabs(normal_dbn.ppf(U)).max(axis=1)

def simulate(truth, B=10000, steps=1, bonf_ref=None):
    # Compute simultaneous test and conditional test
    # of goodness of fit H_0: after step k everything
    # is mean 0
    p = truth.shape[0]
    if bonf_ref is None:
        bonf_ref = bonferroni_sample(p, steps=steps)
    Z = np.random.standard_normal(truth.shape) + truth
    orderZ = np.argsort(-np.fabs(Z))
    sortZ = -np.sort(-np.fabs(Z))
    GOF = []
    bonferroni = []
    covTest = []
    bound = np.inf
    for i in range(steps):
        if i >= 1:
            bound = sortZ[i-1]
        GOF_ref = tnorm(-bound, bound, p - i - 1)
        GOF.append(np.mean(GOF_ref >= sortZ[i]))
        bonferroni.append(np.mean(bonf_ref[:,i] >= sortZ[i]))
        covTest.append(normal_dbn.sf(sortZ[i]) / normal_dbn.sf(sortZ[i+1]))
    return pd.DataFrame({'step': np.arange(steps) + 1,
                         'feature': orderZ[:steps],
                         'GOF': GOF,
                         'Bonferroni': bonferroni,
                         'covTest': covTest})
null = np.zeros(p)
simulate(np.zeros(p), bonf_ref=bonf_ref, steps=3)
## step feature GOF Bonferroni covTest
## 0 1 52 0.1816 0.1768 0.209240
## 1 2 64 0.5286 0.6057 0.705575
## 2 3 88 0.3259 0.7391 0.779091
Step 1: \[ i_1^*(X, Y) = \text{argmax}_{j \in \{1,\dots,p\}} \frac{|X_j^TY|}{\|X_j\|_2} \]
Step 2: \[ i_2^*(X, Y, \{i_1\}) = \text{argmax}_{j \in \{1,\dots,p\} \setminus \{i_1\}} \frac{|X_j^T(I-P_{i_1})Y|}{\|(I-P_{i_1})X_j\|_2} \]
Step \(k+1\): \[ i_{k+1}^*(X, Y, E^*_k) = \text{argmax}_{j \in \{1,\dots,p\} \setminus E^*_k} \frac{|X_j^T(I-P_{E^*_k})Y|}{\|(I-P_{E^*_k})X_j\|_2} \]
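These greedy steps can be sketched in a few lines (a minimal implementation for illustration; at each step the inactive features are compared after projecting out the current active set):

```python
import numpy as np

def forward_stepwise(X, Y, steps):
    # At each step add the feature maximizing |X_j'(I-P_E)Y| / ||(I-P_E)X_j||
    n, p = X.shape
    active = []
    for _ in range(steps):
        if active:
            Q, _ = np.linalg.qr(X[:, active])
            R = np.eye(n) - Q @ Q.T      # projection off the active set
        else:
            R = np.eye(n)
        Xr, Yr = R @ X, R @ Y
        norms = np.sqrt((Xr ** 2).sum(0))
        norms[active] = np.inf           # never re-select active features
        active.append(int(np.argmax(np.abs(Xr.T @ Yr) / norms)))
    return active

# Usage: with orthonormal columns and Y built from column 2,
# the first step must select feature 2.
rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((20, 5)))
Y = 3 * Q[:, 2] + 0.01 * rng.standard_normal(20)
print(forward_stepwise(Q, Y, 1))  # [2]
```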
Goodness of fit test for model \[ {\cal M}_{E_k} = \left\{N(X_{E_k}\beta_{E_k}, \sigma^2 I): \beta_{E_k} \in \mathbb{R}^{E_k}\right\} \]
We will have to condition on \(P_{E_k}Y\) to remove nuisance parameters.
Key observation: \[ \left\{z: E_k^*(z)=E_k, P_{E_k}z=(P_{E_k}Y)_{obs} \right\} = \left\{z: (I-P_{E_k})z \in R((P_{E_k}Y)_{obs}), P_{E_k}z=(P_{E_k}Y)_{obs} \right\} \]
Reference distribution: sample \(W \sim N(0, \sigma^2 (I - P_{E_k}))\) restricted to \(R((P_{E_k}Y)_{obs})\).
Pick a test statistic \(U\) and reject if the observed test statistic is extreme relative to its distribution under the reference.
Special case: use test statistic \[ U_{E_k}(W) = \max_{j \in \{1, \dots, p\} \setminus E_k} \frac{|X_j^T(I-P_{E_k})W|}{\|(I-P_{E_k})X_j\|_2} \]
Fixing \(i^*_{k+1}=i_{k+1}\), this test statistic is a function of \(X_{i_{k+1}}^TY\).
This sufficient statistic gets added to the model \({\cal M}_{E_{k+1}}\).
The variance can be estimated, or we can condition on \(\|(I-P_{E_k})Y\|_2\).
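A sketch of the Monte Carlo reference for \(U_{E_k}\) (hypothetical \(X\) and active set; \(\sigma^2=1\) assumed, and for brevity the restriction to \(R((P_{E_k}Y)_{obs})\) is not imposed here):

```python
import numpy as np

# Null reference for U_E(W) with W ~ N(0, I - P_E):
# simulate W, residualize the inactive features, take the max |t|.
rng = np.random.default_rng(6)
n, p = 60, 8
X = rng.standard_normal((n, p))
E = [0, 1]                               # hypothetical active set E_k
Q, _ = np.linalg.qr(X[:, E])
R = np.eye(n) - Q @ Q.T                  # I - P_E
Xr = R @ X
norms = np.sqrt((Xr ** 2).sum(0))
inactive = [j for j in range(p) if j not in E]
def U(w):
    return max(abs(Xr[:, j] @ w) / norms[j] for j in inactive)
W = (R @ rng.standard_normal((n, 2000))).T   # draws of W ~ N(0, I - P_E)
ref = np.array([U(w) for w in W])
q95 = np.quantile(ref, 0.95)
print(q95)                               # Monte Carlo 5% critical value
```

Rejecting when the observed \(U_{E_k}\) exceeds this critical value gives (up to the omitted restriction) the goodness of fit test described above.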
Consider full model \[ {\cal M}_f = \left\{N(X\beta, \sigma^2 I): \beta \in \mathbb{R}^p \right\} \]
For permutation \(\{i_1,\dots,i_p\}\) \[ k_0(\beta, i_1,\dots,i_p) = \min \left\{k: P_{\{i_1, \dots, i_k\}}X\beta = X\beta \right\} \]
Theorem: the \(p\)-values computed as above after step \(k_0(\beta, i^*_1(X,Y), \dots, i^*_p(X,Y))\) are IID \(\text{Unif}(0,1)\).
Let \((p_1, \dots, p_p)\) denote the \(p\)-values constructed as above.
Set \[ \hat{k}_0(p_1,\dots,p_p) = \min \{k: p_k > \alpha\}. \]
Then \[ P_{\beta}(\hat{k}_0(p_1, \dots, p_p) > k_0(\beta, i^*_1,\dots, i^*_p)) \leq \alpha. \]
Let \(\hat{k}(p_1,\dots,p_p)\) be some stopping rule.
A “candidate” for FDR is \[ E_{\beta} \left( \frac{\max(\hat{k}(p_1, \dots, p_p) - k_0(\beta, i^*_1,\dots, i^*_p), 0)}{\max(\hat{k}(p_1, \dots, p_p), 1)}\right) \]
Leads us into FDR with structure (will discuss next week).
Take \(K\) arms, with \(n\) Bernoulli measurements per arm, and select \[ i^* = \text{argmax}_{1 \leq i \leq K} \hat{p}_i \]
Collect \(n'\) more Bernoulli measurements in arm \(i^*\), yielding \(Y'_{i^*}\).
The sufficient statistic is \((Y_{i^*} + Y'_{i^*}, (Y_j)_{j \neq i^*})\).
The selection event is expressible in terms of \((Y_{i^*}, (Y_j)_{j \neq i^*})\).
Conditioning on winner and \((Y_j)_{j \neq i^*}\) leads to law \(\text{Binomial}(n+n', p_{i^*})\) modified by \[ \pi(t, (Y_j)_{j \neq i^*}) = P(Y_{i^*} > \max_{j \neq i^*}Y_j | \max_{j \neq i^*}Y_j, Y_{i^*} + Y'_{i^*}=t) \]
Law of \(Y_{i^*} | Y_{i^*} + Y_{i^*}'\) is hypergeometric…
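This conditional law is easy to check by simulation (sample sizes and success probability below are hypothetical); note that it does not depend on the arm's success probability:

```python
import numpy as np
from scipy.stats import hypergeom

# With Y ~ Bin(n, p) and Y' ~ Bin(n', p) independent, the law of
# Y given Y + Y' = t is Hypergeometric(n + n', n, t), free of p.
rng = np.random.default_rng(7)
n, n2, p_true, t = 20, 30, 0.4, 18
Y = rng.binomial(n, p_true, size=500000)
Y2 = rng.binomial(n2, p_true, size=500000)
keep = Y[Y + Y2 == t]
emp = np.mean(keep == 8)                 # empirical P(Y = 8 | Y + Y' = t)
exact = hypergeom.pmf(8, n + n2, n, t)   # hypergeometric pmf at 8
print(emp, exact)                        # the two should agree closely
```

Because the conditional law is free of \(p_{i^*}\), it can serve directly as a reference distribution after modifying by the selection probability \(\pi\).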
Can be modified for unequal sample sizes in arms, etc.
Can also apply this to stopping rules for PCA…