Defines notion of selective Type I error, selective coverage
Suppose \({\cal Q}={\cal Q}(D)\) generates null hypotheses to test: i.e. pairs \(({\cal M}, H_0)\) with \(H_0 \subset {\cal M}\).
Set \(q=({\cal M}, H_0)\). A test \(\phi = \phi_{({\cal M},H_0)}(D) \in \{0,1\}\) (with 1 denoting rejection) has selective type I error at \(F \in H_0\) \[ P_{F}(\phi(D) = 1 \vert q \in {\cal Q}(D)) \] and controls selective type I error at level \(\alpha\) if \[ \sup_{F \in H_0}P_{F}(\phi(D) = 1 \vert q \in {\cal Q}(D)) \leq \alpha. \]
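As a quick illustration (a minimal simulation; the selection rule, arm count, and cutoff below are all hypothetical, not from the notes), selecting the largest of several independent means and then applying a naive level-5% z-test inflates the selective type I error well above the nominal level:

```python
import numpy as np

# Hypothetical setup: K independent arms, Z_i ~ N(0, 1) under the
# global null; Q(D) proposes testing H_0: mu_i = 0 only for the
# winning arm i* = argmax_i Z_i.
rng = np.random.default_rng(0)
K, B = 5, 50000
Z = rng.standard_normal((B, K))
winner = np.argmax(Z, axis=1)
# Naive two-sided z-test applied to the winner, ignoring selection;
# by exchangeability, averaging over the winner matches conditioning
# on any particular i* = i.
naive_reject = np.abs(Z[np.arange(B), winner]) > 1.96
print(naive_reject.mean())  # well above the nominal 0.05
```

The conditional (selective) calibration developed below is what restores level \(\alpha\).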
Instead of choosing null hypotheses, \({\cal Q}\) might generate parameters for which confidence intervals or regions are desired, i.e. \(({\cal M}, \theta: {\cal M} \rightarrow \mathbb{R}^k)\).
Set \(q = ({\cal M}, \theta: {\cal M} \rightarrow \mathbb{R}^k)\). A confidence region \(CI = CI_q(D)\) has \((1-\alpha)\) selective coverage for \(\theta\) if \[ P_F ( \theta(F) \in CI_q(D) \vert q \in {\cal Q}(D)) \geq 1 - \alpha, \qquad \forall F \in {\cal M}. \]
Note: parameters are functions on \({\cal M}\).
Linear regression with saturated model and \(X\) fixed (\(\sigma^2=1\) and known) \[ {\cal M}_s = \left\{N(\mu, I): \mu \in \mathbb{R}^n \right\} \] and \[ {\cal Q}(D) = \left\{({\cal M}_s, \beta_{j|\hat{E}(D)}): j \in \hat{E}(D)\right\} \] and \[ \beta_{j|E}(\mu) = e_j^TX_E^{\dagger}\mu \]
Linear regression with full model \(p < n\) and \(X\) fixed (\(\sigma^2=1\) and known) \[ {\cal M}_f = \left\{N(X\beta, I): \beta \in \mathbb{R}^p \right\} \] and \[ {\cal Q}(D) = \left\{({\cal M}_f, \beta_{j|\hat{E}(D)}): j \in \hat{E}(D)\right\} \] \[ \beta_{j|E}(\beta) = e_j^TX_E^{\dagger}X\beta \]
Linear regression with full model \(p < n\) and \(X\) fixed (\(\sigma^2\) unknown) \[ {\cal M}_U = \left\{N(X\beta, \sigma^2 I): \beta \in \mathbb{R}^p, \sigma^2 > 0 \right\} \] and \[ {\cal Q}(D) = \left\{({\cal M}_U, \beta_{j|\hat{E}(D)}): j \in \hat{E}(D)\right\} \] \[ \beta_{j|E}(\beta) = e_j^TX_E^{\dagger}X\beta \]
Linear regression with a selected model and \(X\) fixed (\(\sigma^2=1\) and known) \[ {\cal M}_E = \left\{N(X_E\beta_E, I): \beta_E \in \mathbb{R}^E \right\} \] and \[ {\cal Q}(D) = \left\{({\cal M}_{\hat{E}(D)}, \beta_{j|\hat{E}(D)}): j \in \hat{E}(D)\right\} \] and \[ \beta_{j|E}(\beta_E) = e_j^T\beta_E \]
Linear regression with a selected model and \(X\) fixed and \(\sigma^2\) unknown \[ {\cal M}_{E,u} = \left\{N(X_E\beta_E, \sigma^2 I): \beta_E \in \mathbb{R}^E, \sigma^2 > 0 \right\} \] and \[ {\cal Q}(D) = \left\{({\cal M}_{\hat{E}(D),u}, \beta_{j|\hat{E}(D)}): j \in \hat{E}(D)\right\} \] and \[ \beta_{j|E}(\beta_E, \sigma^2) = e_j^T\beta_E \]
Linear regression in the pairs model \[ {\cal M}_{NP} = \left\{ {\cal L}(X, Y): (X[i,], Y[i]) \overset{IID}{\sim} F, 1 \leq i \leq n \right\} \] and \[ {\cal Q}(D) = \left\{({\cal M}_{NP}, \beta_{j|\hat{E}(D)}): j \in \hat{E}(D)\right\} \]
Targets \[ \beta_{j|E}(F) = e_j' E_F[X_EX_E^T]^{-1} E_F[X_E \cdot Y]. \]
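This target can be checked numerically: it is the population least-squares coefficient of \(Y\) on \(X_E\), approximated below with empirical moments from a large IID sample (the data-generating model and the set \(E\) are hypothetical):

```python
import numpy as np

# Approximate beta_{j|E}(F) = e_j' E[X_E X_E^T]^{-1} E[X_E Y] by
# plugging in empirical moments; here X has IID N(0,1) columns and
# Y = X_0 + 0.5 X_1 + noise, with (hypothetical) selected set E = {0, 2}.
rng = np.random.default_rng(1)
N = 200000
X = rng.standard_normal((N, 3))
Y = X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(N)
E = [0, 2]
XE = X[:, E]
beta_E = np.linalg.solve(XE.T @ XE / N, XE.T @ Y / N)
print(beta_E)  # approximately [1.0, 0.0]: the partial target omits X_1
```

Note the target is a functional of \(F\) and \(E\) only; it need not equal any "true" coefficient.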
set.seed(0)
X = matrix(rnorm(500), 100, 5)
Y = rnorm(100)
M = lm(Y ~ X, subset=1:50)
E = (abs(coef(summary(M))[,3]) > 1.5)[-1]
print(E)
## X1 X2 X3 X4 X5
## TRUE FALSE FALSE FALSE FALSE
summary(lm(Y ~ X[, E], subset = 51:100))
##
## Call:
## lm(formula = Y ~ X[, E], subset = 51:100)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5089 -0.6060 -0.1135 0.6509 1.8467
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.12880 0.12253 -1.051 0.298
## X[, E] -0.07858 0.14438 -0.544 0.589
##
## Residual standard error: 0.8661 on 48 degrees of freedom
## Multiple R-squared: 0.006134, Adjusted R-squared: -0.01457
## F-statistic: 0.2962 on 1 and 48 DF, p-value: 0.5888
If (X, Y)[1:50,] is collected before (X, Y)[51:100,], this is just the scientific method.

The event \(A_q = \left\{D:q \in {\cal Q}(D)\right\}\) is the selection event for \(q\).
We will assume that \(P_F(A_q) > 0 \ \forall F \in {\cal M}_q\).
Coverage and type I error can be rephrased in terms of \(A_q\), e.g. a selective confidence region for \(({\cal M}_q, \theta_q:{\cal M}_q \rightarrow \mathbb{R}^{k_q})\) satisfies \[ P_F ( \theta_q(F) \in CI_q(D) \vert A_q) \geq 1 - \alpha, \qquad \forall F \in {\cal M}_q. \]
This yields a selected conditional model (really a truncated model) \[ {\cal M}^*_q = \left\{F^*: \frac{dF^*}{dF} \propto 1_{A_q}, F \in {\cal M}_q \right\} \] even if \({\cal M}_q\) does not depend on \(q\).
Auxiliary information is referred to as a selection variable and is denoted as \(S=S_q\).
With \(s_q\) the observed value of \(S_q\), this leads to a final model \[ {\cal M}^*_{q,s_q} = \left\{{\cal L}_{F^*}(D \vert S_q=s_q): F^* \in {\cal M}_q^* \right\} \]
Suppose we construct valid confidence intervals for the family \[ \bigcup_{s_q}{\cal M}^*_{q,s_q}, \] i.e. \[ P_{F}(\theta_q(F) \in CI(D, q,s_q) \vert A_q ,S_q=s_q) \overset{a.s.}{\geq} 1- \alpha \qquad \forall F \in {\cal M}_q. \]
Then, these confidence intervals are valid on \({\cal M}^*_q\): averaging the conditional coverage over the law of \(S_q\) given \(A_q\) gives \[ P_{F}(\theta_q(F) \in CI(D,q,s_q) \vert A_q) \geq 1- \alpha \qquad \forall F \in {\cal M}_q. \]
We could marginalize over \({\cal Q}\) as well, of course.
Suppose \({\cal M}_q\) is an exponential family with sufficient statistic \(T(D)\), natural parameter \(\eta\) and reference measure \(m_0\): \[ \frac{dF}{dm_0}(D) = \frac{dF_{\eta}}{dm_0}(D) \propto \exp\left(\eta'T(D)\right) \]
Then, \({\cal M}^*_q\) is again an exponential family with the same sufficient statistic and natural parameter with truncated reference measure \(m_{0,q}\) \[ \frac{dm_{0,q}}{dm_0}(D) = 1_{A_q}(D). \]
Suppose \(\theta_q(\eta) = L_q'\eta\) is a \(k\)-dimensional linear functional. Then, we can write \[ \eta'T(D) = \eta'P_{q}T(D) + \eta'(I-P_{q})T(D) \] with \(P_{q}\) projection onto \(\text{col}(L_q)\).
Setting \(S_q={\cal N}_q = (I-P_q)T(D)\), standard exponential family results imply that (for every \(s_q\)) the conditional model \({\cal M}^*_{q,s_q}\) is an exponential family with natural parameter \(\theta_q\) and sufficient statistic \(L_q^{\dagger}T\).
To test \(H_0:\theta_q = \theta_{0,q}\) take \(F \in {\cal M}_q\) with \(\theta_q(F) = \theta_{0,q}\) and draw samples from \(F^* \in {\cal M}^*_{q,s_q}\) and use whatever test statistic you like.
Optimal tests can be constructed using tools from exponential family theory (e.g. UMPU).
Suppose \(m_0 = N(0, \Sigma_q)\), i.e. under \(\eta=0\), \(D \in \mathbb{R}^p\) with \(D \sim N(0,\Sigma_q)\).
Sufficient statistic: \(T_q(D)=D\).
Mean parameters: \(\beta_q \in \text{row}(\Sigma_q)\) with natural parameters \(\eta_q=\Sigma^{\dagger}_q\beta_q\). (If \(\Sigma_q\) is degenerate, \(\text{row}(\Sigma_q)\) should probably not depend on \(q\).)
Suppose we want to test \(H_0:L_q'\beta_q=L_q'\Sigma_q\eta_q= \theta_{0,q}\) (wlog we assume \(\text{row}(L_q) \subset \text{row}(\Sigma_q)\)).
Set \[ \hat{\theta}_q(D) = L_q'D \]
Set \(P_q = L_qL_q^{\dagger}\) and \(P_q^{\perp} = I - P_q\) (could also take \(P_q^{\perp} = P_{\text{row}(\Sigma_q)} - P_q\)).
Claim: the laws \(N(\beta_q, \Sigma_q)\) agree on the sigma algebras \[ {\cal F}_1 = \sigma(P_q^{\perp}\Sigma_q^{\dagger}D), \qquad {\cal F}_2 = \sigma(D - \Gamma_q \hat{\theta}_q) \] with \[ \Gamma_q = \Sigma_q L_q (L_q'\Sigma_q L_q)^{-1} \]
Further, these laws depend only on \(L_q'\beta_q\).
Density of \(N(\beta_q, \Sigma_q)\) on \(\text{row}(\Sigma_q)\) is proportional to \[ \begin{aligned} \exp\left(\beta_q'\Sigma_q^{-1} D \right) &= \exp\left(\beta_q'P_q\Sigma_q^{-1} D + \beta_q'P_q^{\perp}\Sigma_q^{-1} D \right) \end{aligned} \]
If we condition this law on \(P_q^{\perp}\Sigma_q^{-1}D\) it will depend on \(\beta_q\) only through \(L_q'\beta_q\). This corresponds to \({\cal F}_1\) above.
If we restrict these conditional laws to \(A_q\) they will again depend on \(\beta_q\) only through \(L_q'\beta_q\).
We need to show that conditioning on \({\cal N}_q(D) = D - \Gamma_q \hat{\theta}_q\) is the same.
Note that \[ \begin{aligned} \text{Cov}(\hat{\theta}_q(D), {\cal N}_q(D)) &= 0\\ \text{Cov}(\hat{\theta}_q(D), P_q^{\perp}\Sigma_q^{-1}D) &= 0 \end{aligned} \] and \(\text{rank}(\text{Cov}({\cal N}_q(D))) = \text{rank}(\text{Cov}(P_q^{\perp}\Sigma_q^{\dagger}D)) = p - k\).
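These zero-covariance claims are easy to verify numerically (a quick check with a randomly generated \(\Sigma_q\) and \(L_q\); the dimensions are arbitrary):

```python
import numpy as np

# Verify Cov(thetahat_q, N_q) = 0 for thetahat_q = L'D and
# N_q = D - Gamma L'D with Gamma = Sigma L (L' Sigma L)^{-1},
# when D ~ N(beta, Sigma): in matrix form Cov = L' Sigma (I - Gamma L')'.
rng = np.random.default_rng(2)
p, k = 6, 2
A = rng.standard_normal((p, p))
Sigma = A @ A.T                      # a full-rank covariance
L = rng.standard_normal((p, k))
Gamma = Sigma @ L @ np.linalg.inv(L.T @ Sigma @ L)
cov = L.T @ Sigma @ (np.eye(p) - Gamma @ L.T).T
print(np.max(np.abs(cov)))           # 0 up to floating point error
```

The algebra behind the check: \(L'\Sigma - (L'\Sigma L)(L'\Sigma L)^{-1}L'\Sigma = 0\).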
We can choose \(C_q\), an inverse square root of \(\Sigma_q\), so that \[ C_qD \sim N(C_q\beta_q, I_{\text{rank}(\Sigma_q)}) \] with \(\hat{\theta}_q\) an invertible linear function of \((C_qD)_{[1:k]}\).
We conclude that \({\cal N}_q(D)\) and \(P_q^{\perp}\Sigma_q^{-1}D\) are linear functions of \((C_qD)_{[(k+1):p]}\).
Setting \(S_q = {\cal N}_q(D) = D - \Gamma_q \hat{\theta}_q\), the final conditional model is equivalent to the restriction of the laws \[ {\cal L}_F(D \vert {\cal N}_q), \qquad F \in {\cal M}_q \] to the event \[ \begin{aligned} \left\{d: d \in A_q, {\cal N}_q(d)=s_q\right\} = \left\{d: s_q + \Gamma_q \hat{\theta}_q(d) \in A_q, {\cal N}_q(d)=s_q\right\}. \end{aligned} \]
These laws are equivalent to the laws \[ {\cal L}_F(\hat{\theta}_q(D) \vert {\cal N}_q + \Gamma_q \hat{\theta}_q \in A_q, {\cal N}_q), \qquad F \in {\cal M}_q \]
To test \(H_0:L_q'\beta_q=\theta_{0,q}\) take any \(F \in {\cal M}_q\) such that \(L_q'\beta_q=\theta_{0,q}\) and sample from the above conditional distribution. Use an appropriate test statistic.
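A minimal rejection-sampling sketch of this recipe in one dimension (the truncation cutoff and observed value below are hypothetical, standing in for a selection event that, with \({\cal N}_q\) fixed, reduces to a half-line in \(\hat{\theta}_q\)):

```python
import numpy as np

# Under H_0, thetahat ~ N(0, 1) (null value and variance hypothetical);
# suppose the event {s_q + Gamma * thetahat in A_q} reduces to
# {thetahat >= cutoff}. Rejection sampling from the conditional law:
rng = np.random.default_rng(3)
cutoff, obs = 1.5, 2.1                  # hypothetical truncation / observation
draws = rng.standard_normal(500000)
cond = draws[draws >= cutoff]           # draws from the truncated null law
pval = np.mean(cond >= obs)             # one-sided selective p-value
print(pval)
```

Here the exact answer is \(\bar\Phi(2.1)/\bar\Phi(1.5) \approx 0.27\), which the Monte Carlo estimate approaches; for events without closed form, the same accept/reject scheme still applies.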
Often, \(\text{dim}(\eta) \neq \text{dim}(\text{row}(\Sigma))\). For example in \({\cal M}_f\) when \(n > p\), \(\text{dim}(\eta)=p\) while \(\Sigma=I_{n \times n}\).
Usually constraints are expressed in mean parameters \(\beta\) rather than natural parameters \(\eta\).
In this case, we might write \[ {\cal M}_q = \left\{N(\beta_q, \Sigma_q): \beta_q \in \text{row}(\Sigma_q), R_{{\cal M}_q}\beta_q=0 \right\}. \] for some residual projection \(R_{{\cal M}_q}\) in our model \({\cal M}_q\).
In constructing the reference distribution it is not necessary to condition on this residual, as its distribution is known.
For \(\beta_q\) such that \(R_q\beta_q=0\), density is proportional to \[ \exp(\beta_q'\Sigma^{-1}_qD) = \exp(\beta_q'P_{{\cal M}_q}\Sigma^{-1}_qD) \] where \(P_{{\cal M}_q} = I - R_{{\cal M}_q}\).
For \(L_q\) such that \(R_{{\cal M}_q}L_q=0\) we might test \(H_0:L_q'\beta_q=\delta\).
Our target estimator is \(\hat{\theta}_q(D) = L_q'D\) and our nuisance sufficient statistic is going to be \[ {\cal N}_q(D) = P_{{\cal M}_q}D - P_{{\cal M}_q}\Sigma_q L_q (L_q'\Sigma_qL_q)^{-1} \hat{\theta}_q(D) \]
This effectively decomposes \(D\) into three independent pieces \((\hat{\theta}_q, {\cal N}_q, \Delta_q)\) with \[ \Delta_q = R_{{\cal M}_q}\Sigma^{-1}_qD \]
We can linearly decompose \[ D = a_q \hat{\theta}_q + b_q {\cal N}_q + c_q \Delta_q \] where \(a_q, b_q\) depend on \((L_q,\Sigma_q, R_{{\cal M}_q})\) and \(c_q\) depends only on \((\Sigma^{-1}_q, R_{{\cal M}_q})\).
Conditioning any \(F \in {\cal M}_q\) on \({\cal N}_q\) leaves \(\hat{\theta}_q\) and \(\Delta_q = R_{{\cal M}_q}\Sigma_q^{-1}D\) independent, with \[ \Delta_q \sim N(0, R_{{\cal M}_q}\Sigma_q^{-1}R_{{\cal M}_q}) \]
Setting \(S_q = {\cal N}_q(D)\), the final conditional model is equivalent to the restriction of the laws \[ {\cal L}_F(D \vert {\cal N}_q), \qquad F \in {\cal M}_q \] to the event \[ \begin{aligned} \left\{d: d \in A_q, {\cal N}_q(d)=s_q\right\} \end{aligned} \]
As \(\hat{\theta}_q\) is sufficient, we can marginalize over \(\Delta_q\) yielding the laws \[ {\cal L}_F(\hat{\theta}_q(D) \vert a_q \hat{\theta}_q + b_q {\cal N}_q + c_q \Delta_q \in A_q, {\cal N}_q), \qquad F \in {\cal M}_q \]
The resulting density under \(H_0:L_q'\beta_q=\delta\) is proportional to \[ t \mapsto \phi_{(\delta, L_q'\Sigma_qL_q)}(t) \cdot \pi_q(t,n_q) \] with \[ \pi_q(t,n_q) = P_F\left( a_q \hat{\theta}_q + b_q {\cal N}_q + c_q \Delta_q \in A_q \,\Big\vert\, \hat{\theta}_q=t, {\cal N}_q=n_q\right) \]
To test \(H_0:L_q'\beta_q=\theta_{0,q}\) take any \(F \in {\cal M}_q\) such that \(L_q'\beta_q=\theta_{0,q}\) and sample from the above conditional distribution. Use an appropriate test statistic.
Let’s go back to our first “example” of this conditional approach, the covTest of Lockhart et al.
Let \(i^*(X,Y)\) denote the first variable to enter the LASSO path with sign \(s^*\).
Let’s take the saturated model \[ {\cal M} = \left\{N(\mu, I): \mu \in \mathbb{R}^n\right\} \]
The framework laid out above says that (in order to avoid the perils of plugin) we must look at \[ {\cal L}(sX_i'Y | (i^*,s^*)=(i,s), (I - P_i)Y) \overset{H_0:X_i^T\mu=0}{=} \text{TN}(0, [M(i,s), \infty)) \] with \[ M(i,s) = M(i,s, (I - P_i)Y) = \max_{l \neq i, s' \in \pm 1} \frac{s' X_l^T(I - P_{i})Y}{1 - s' X_l^TX_{i}} \]
We really want this test to tell us when to stop the LASSO. We might instead look at the model \[ {\cal M}_i = \left\{N(X_i\beta_i, I): \beta_i \in \mathbb{R}\right\}. \]
In this case, the correct reference law is \[ {\cal L}(sX_i'Y | sX_i'Y \geq M(i,s, (I-P_i)Y)) \]
In this case \({\cal N}=0\) and \[ \pi(t,n) = P_F(M(i,s,(I-P_i)Y) \leq t). \]
Note that \((I-P_i)Y \sim N(0, I-P_i)\) and can be simulated.
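A sketch of that simulation (the design matrix, selected index \(i\), and sign \(s\) are hypothetical; columns are unit-normalized so the denominators below stay positive):

```python
import numpy as np

# Simulate (I - P_i) Y ~ N(0, I - P_i) and recompute the truncation
# point M(i, s, (I - P_i)Y) for each draw.
rng = np.random.default_rng(4)
n, p = 50, 5
X = rng.standard_normal((n, p))
X /= np.sqrt((X ** 2).sum(0))            # unit-norm columns
i, s = 0, 1.0                            # hypothetical winner and sign
P_i = np.outer(X[:, i], X[:, i])
def M(resid):
    # max over l != i, s' in {+1,-1} of s' X_l'resid / (1 - s' X_l'X_i)
    vals = [sp * X[:, l] @ resid / (1 - sp * X[:, l] @ X[:, i])
            for l in range(p) if l != i for sp in (1.0, -1.0)]
    return max(vals)
Z = rng.standard_normal((1000, n))
W = Z - Z @ P_i                          # rows are draws of (I - P_i) Y
ref_M = np.array([M(w) for w in W])      # Monte Carlo law of the truncation point
print(ref_M.mean())
```

Each draw yields a truncation point, giving the reference law needed for the selected-model version of the test.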
In the simple case \(X^TX=I\) this gives the (exact) Bonferroni simultaneous test!
Rather than just focusing on exponential family models, we can certainly use the above test as a goodness of fit test for \(H_0:\mu=0\).
For a selection event \(A_q\), the general setup above says that a goodness of fit test in an exponential family model \({\cal M}_q\) can be constructed using \[ {\cal L}_F(D \vert T(D)=T_{obs}, D \in A_q) \]
We note that sampling such distributions can be challenging (c.f. Diaconis et al. (2013))
What about the next step?
Suppose we have found feature \(i\) is the best in first stage with sign \(s\).
A reasonable next question to ask is \(H_0:(I-P_i)\mu=0\). For the model we should allow \(P_i\mu \neq 0\), so we take the model \({\cal M}_i\).
To carry out this goodness of fit test, we might look at \(M(i,s)\) (this would be the covTest approach) or simply forward stepwise \[ T_{\max} = \max_{j \neq i} \frac{|X_j'(I-P_i)Y|}{\|(I-P_i)X_j\|_2}. \]
For this test, to avoid perils of plugin we should look at \[ {\cal L}(T_{\max} | (i^*,s^*)=(i,s), X_i'Y). \]
An alternative test might condition on the identity \(j^*\) and winning sign \(\tilde{s}^*\) of the maximizer in \(T_{\max}\), perhaps expressed as a test of \(H_0:\beta_j=0\) in the model \[ {\cal M}_{(i,j)} = \left\{N(X_i\beta_i + X_j \beta_j, I): \beta_i, \beta_j \in \mathbb{R} \right\}. \]
In this case, appropriate reference would be \[ {\cal L}\left(\frac{\tilde{s}X_j'(I-P_i)Y}{\|(I-P_i)X_j\|_2} \biggl \vert X_i'Y, (i^*,s^*)=(i,s), (j^*,\tilde{s}^*)=(j,\tilde{s}) \right). \]
In both cases the statistic we use for this step is the same, though the reference is slightly different.
Let’s look again at case \(X'X=I\).
The goodness of fit test uses reference \[ {\cal L}\left(\max_{j \neq i}|Z_j| \biggl \vert \ \max_{j \neq i}|Z_j| \leq |Z_i|, Z_i\right) \] with \((Z_j)_{j \neq i} \overset{IID}{\sim} N(0,1)\).
While the second test uses reference \[ {\cal L}\left(|Z_j| \biggl \vert \max_{k \not \in \{i,j\}} \ |Z_k| \leq |Z_j| \leq |Z_i|, Z_i\right) \]
The covTest uses reference \[ {\cal L}\left(|Z_j| \biggl \vert \ \max_{k \not \in \{i,j\}} |Z_k| \leq |Z_j|, (Z_k)_{k \not\in \{i,j\}}\right). \]
import numpy as np
import pandas as pd
from scipy.stats import norm as normal_dbn

p = 100 # number of features; orthogonal design
def bonferroni_sample(p, alpha=0.05, B=10000, steps=1):
    # At step j, Bonferroni will use the distribution
    # of the largest p-j+1 absolute N(0,1) Z's
    Z = np.empty((B, steps))
    for j in range(steps):
        for i in range(B):
            Z[i,j] = np.fabs(np.random.standard_normal(p - j)).max()
    return Z
bonf_ref = bonferroni_sample(p, steps=20)
def tnorm(lower, upper, k, B=10000):
    # Helper (reconstructed; not shown in the original chunk): sample
    # the max absolute value of k IID N(0,1)'s truncated to (lower, upper)
    U = np.random.uniform(normal_dbn.cdf(lower), normal_dbn.cdf(upper), (B, k))
    return np.fabs(normal_dbn.ppf(U)).max(axis=1)

def simulate(truth, B=10000, steps=1, bonf_ref=None):
    # Compute simultaneous test and conditional test
    # of goodness of fit H_0: after step k everything
    # is mean 0
    p = truth.shape[0]
    if bonf_ref is None:
        bonf_ref = bonferroni_sample(p, steps=steps)
    Z = np.random.standard_normal(truth.shape) + truth
    orderZ = np.argsort(-np.fabs(Z))
    sortZ = -np.sort(-np.fabs(Z))
    GOF = []
    bonferroni = []
    covTest = []
    bound = np.inf
    for i in range(steps):
        if i >= 1:
            bound = sortZ[i-1]
        GOF_ref = tnorm(-bound, bound, p - i - 1)
        GOF.append(np.mean(GOF_ref >= sortZ[i]))
        bonferroni.append(np.mean(bonf_ref[:,i] >= sortZ[i]))
        covTest.append(normal_dbn.sf(sortZ[i]) / normal_dbn.sf(sortZ[i+1]))
    return pd.DataFrame({'step': np.arange(steps) + 1,
                         'feature': orderZ[:steps],
                         'GOF': GOF,
                         'Bonferroni': bonferroni,
                         'covTest': covTest})
null = np.zeros(p)
simulate(np.zeros(p), bonf_ref=bonf_ref, steps=3)
## step feature GOF Bonferroni covTest
## 0 1 52 0.1816 0.1768 0.209240
## 1 2 64 0.5286 0.6057 0.705575
## 2 3 88 0.3259 0.7391 0.779091
Step 1: \[ i_1^*(X, Y) = \text{argmax}_{j \in \{1,\dots,p\}} \frac{|X_j^TY|}{\|X_j\|_2} \]
Step 2: \[ i_2^*(X, Y, \{i_1\}) = \text{argmax}_{j \in \{1,\dots,p\} \setminus \{i_1\}} \frac{|X_j^T(I-P_{i_1})Y|}{\|(I-P_{i_1})X_j\|_2} \]
Step \(k+1\): \[ i_{k+1}^*(X, Y, E^*_k) = \text{argmax}_{j \in \{1,\dots,p\} \setminus E^*_k} \frac{|X_j^T(I-P_{E^*_k})Y|}{\|(I-P_{E^*_k})X_j\|_2} \]
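These greedy steps can be sketched in a few lines (a minimal implementation for illustration; at each step the inactive features are compared after projecting out the current active set):

```python
import numpy as np

def forward_stepwise(X, Y, steps):
    # At each step add the feature maximizing |X_j'(I-P_E)Y| / ||(I-P_E)X_j||
    n, p = X.shape
    active = []
    for _ in range(steps):
        if active:
            Q, _ = np.linalg.qr(X[:, active])
            R = np.eye(n) - Q @ Q.T      # projection off the active set
        else:
            R = np.eye(n)
        Xr, Yr = R @ X, R @ Y
        norms = np.sqrt((Xr ** 2).sum(0))
        norms[active] = np.inf           # never re-select active features
        active.append(int(np.argmax(np.abs(Xr.T @ Yr) / norms)))
    return active

# Usage: with orthonormal columns and Y built from column 2,
# the first step must select feature 2.
rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((20, 5)))
Y = 3 * Q[:, 2] + 0.01 * rng.standard_normal(20)
print(forward_stepwise(Q, Y, 1))  # [2]
```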
Goodness of fit test for model \[ {\cal M}_{E_k} = \left\{N(X_{E_k}\beta_{E_k}, \sigma^2 I): \beta_{E_k} \in \mathbb{R}^{E_k}\right\} \]
We will have to condition on \(P_{E_k}Y\) to remove nuisance parameters.
Key observation: \[ \left\{z: E_k^*(z)=E_k, P_{E_k}z=(P_{E_k}Y)_{obs} \right\} = \left\{z: (I-P_{E_k})z \in R((P_{E_k}Y)_{obs}), P_{E_k}z=(P_{E_k}Y)_{obs} \right\} \]
Reference distribution: sample \(W \sim N(0, \sigma^2 (I - P_{E_k}))\) restricted to \(R((P_{E_k}Y)_{obs})\).
Pick a test statistic \(U\) and reject if the observed test statistic is extreme relative to its distribution under the reference.
Special case: use test statistic \[ U_{E_k}(W) = \max_{j \in \{1, \dots, p\} \setminus E_k} \frac{|X_j^T(I-P_{E_k})W|}{\|(I-P_{E_k})X_j\|_2} \]
Fixing \(i^*_{k+1}=i_{k+1}\), this test statistic is a function of \(X_{i_{k+1}}^TY\).
This sufficient statistic gets added to the model \({\cal M}_{E_{k+1}}\).
The variance can be estimated, or we can condition on \(\|(I-P_{E_k})Y\|_2\).
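A sketch of the Monte Carlo reference for \(U_{E_k}\) (hypothetical \(X\) and active set; \(\sigma^2=1\) assumed, and for brevity the restriction to \(R((P_{E_k}Y)_{obs})\) is not imposed here):

```python
import numpy as np

# Null reference for U_E(W) with W ~ N(0, I - P_E):
# simulate W, residualize the inactive features, take the max |t|.
rng = np.random.default_rng(6)
n, p = 60, 8
X = rng.standard_normal((n, p))
E = [0, 1]                               # hypothetical active set E_k
Q, _ = np.linalg.qr(X[:, E])
R = np.eye(n) - Q @ Q.T                  # I - P_E
Xr = R @ X
norms = np.sqrt((Xr ** 2).sum(0))
inactive = [j for j in range(p) if j not in E]
def U(w):
    return max(abs(Xr[:, j] @ w) / norms[j] for j in inactive)
W = (R @ rng.standard_normal((n, 2000))).T   # draws of W ~ N(0, I - P_E)
ref = np.array([U(w) for w in W])
q95 = np.quantile(ref, 0.95)
print(q95)                               # Monte Carlo 5% critical value
```

Rejecting when the observed \(U_{E_k}\) exceeds this critical value gives (up to the omitted restriction) the goodness of fit test described above.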
Consider full model \[ {\cal M}_f = \left\{N(X\beta, \sigma^2 I): \beta \in \mathbb{R}^p \right\} \]
For permutation \(\{i_1,\dots,i_p\}\) \[ k_0(\beta, i_1,\dots,i_p) = \min \left\{k: P_{\{i_1, \dots, i_k\}}X\beta = X\beta \right\} \]
Theorem: the \(p\)-values computed as above after step \(k_0(\beta, i^*_1(X,Y), \dots, i^*_p(X,Y))\) are IID \(\text{Unif}(0,1)\).
Let \((p_1, \dots, p_p)\) denote the \(p\)-values constructed as above.
Set \[ \hat{k}_0(p_1,\dots,p_p) = \min \{k: p_k > \alpha\}. \]
Then \[ P_{\beta}(\hat{k}_0(p_1, \dots, p_p) > k_0(\beta, i^*_1,\dots, i^*_p)) \leq \alpha. \]
Let \(\hat{k}(p_1,\dots,p_p)\) be some stopping rule.
A “candidate” for FDR is \[ E_{\beta} \left( \frac{\max(\hat{k}(p_1, \dots, p_p) - k_0(\beta, i^*_1,\dots, i^*_p), 0)}{\max(\hat{k}(p_1, \dots, p_p), 1)}\right) \]
Leads us into FDR with structure (will discuss next week).
Take \(K\) arms, with \(n\) Bernoulli measurements per arm, and select \[ i^* = \text{argmax}_{1 \leq i \leq K} \hat{p}_i \]
Collect \(n'\) more Bernoulli measurements in arm \(i^*\), yielding \(Y'_{i^*}\).
The sufficient statistic is \((Y_{i^*} + Y'_{i^*}, (Y_j)_{j \neq i^*})\).
The selection event is expressible in terms of \((Y_{i^*}, (Y_j)_{j \neq i^*})\).
Conditioning on winner and \((Y_j)_{j \neq i^*}\) leads to law \(\text{Binomial}(n+n', p_{i^*})\) modified by \[ \pi(t, (Y_j)_{j \neq i^*}) = P(Y_{i^*} > \max_{j \neq i^*}Y_j | \max_{j \neq i^*}Y_j, Y_{i^*} + Y'_{i^*}=t) \]
Law of \(Y_{i^*} | Y_{i^*} + Y_{i^*}'\) is hypergeometric…
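This conditional law is easy to check by simulation (sample sizes and success probability below are hypothetical); note that it does not depend on the arm's success probability:

```python
import numpy as np
from scipy.stats import hypergeom

# With Y ~ Bin(n, p) and Y' ~ Bin(n', p) independent, the law of
# Y given Y + Y' = t is Hypergeometric(n + n', n, t), free of p.
rng = np.random.default_rng(7)
n, n2, p_true, t = 20, 30, 0.4, 18
Y = rng.binomial(n, p_true, size=500000)
Y2 = rng.binomial(n2, p_true, size=500000)
keep = Y[Y + Y2 == t]
emp = np.mean(keep == 8)                 # empirical P(Y = 8 | Y + Y' = t)
exact = hypergeom.pmf(8, n + n2, n, t)   # hypergeometric pmf at 8
print(emp, exact)                        # the two should agree closely
```

Because the conditional law is free of \(p_{i^*}\), it can serve directly as a reference distribution after modifying by the selection probability \(\pi\).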
Can be modified for unequal sample sizes in arms, etc.
Can also apply this to stopping rules for PCA…