We’ve spent a lot of time over the last few classes analysing \[ \pi(t,n) = P[\omega + n + \Gamma t \in A \vert n, t] \]
Our most common examples were file drawer (randomized or not) and LASSO (randomized or not).
Surely people do more than just these selection mechanisms!
What can we say for more complex ones?
If our model \({\cal M}\) admits an estimator \(\hat{\theta}(D) \approx N(\theta(F), \sigma^2(F))\) and we have access to \(\pi(t,n)\),
then to test \(H_0:\theta(F)=\theta_0\) we sample from the density \[ t \mapsto \phi((t - \theta_0) / \sigma) \cdot \pi(t, {\cal N}_{obs}) \]
Requires our usual decomposition \[ D = {\cal N} + \Gamma \hat{\theta} \] where \[ \Gamma = \Gamma(F) = \text{Cov}_F(D, \hat{\theta}(D)) \text{Cov}_F(\hat{\theta}(D))^{-1}. \]
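To make the decomposition concrete, here is a small numerical sketch (my own toy example, not from the notes): for a 3-dimensional statistic \(D\) and a scalar estimator \(\hat{\theta}\), the residual \({\cal N} = D - \Gamma\hat{\theta}\) built with this \(\Gamma\) is empirically uncorrelated with \(\hat{\theta}\).

```python
import numpy as np

rng = np.random.default_rng(0)
# toy joint distribution: D is a 3-dimensional statistic, theta_hat a scalar estimator
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
D = rng.multivariate_normal(np.zeros(3), Sigma, size=50000)
theta_hat = D.mean(axis=1)

# Gamma = Cov(D, theta_hat) Var(theta_hat)^{-1}, estimated from the draws
C = np.cov(np.column_stack([D, theta_hat]), rowvar=False)
Gamma = C[:3, 3] / C[3, 3]

# the decomposition D = N + Gamma * theta_hat leaves N uncorrelated with theta_hat
N = D - np.outer(theta_hat, Gamma)
resid_cov = np.cov(np.column_stack([N, theta_hat]), rowvar=False)[:3, 3]
print(np.max(np.abs(resid_cov)))  # numerically zero
```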
If we have access in silico to \(\hat{\cal Q}\), we can draw samples \[ T_i, \qquad Y_i = 1\left\{\hat{\cal Q}({\cal N} + \Gamma T_i) = q\right\} \]
Then, we can learn \(t \mapsto \pi(t, {\cal N})\) using any flexible probability learner.
Given this function, use it in the reference density above.
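Before the more complex examples, the recipe can be checked end to end on the simplest mechanism, the file drawer (this sketch and its identifiers are mine): selection reports \(T\) only when \(T > c\); we learn \(\pi\) with a boosting classifier on simulated \((T_i, Y_i)\) pairs and use it to reweight the reference density, which should recover the exact truncated-normal p-value.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
c = 1.5                                    # file drawer: report T only if T > c

# learning data: spread T out so both selection outcomes are observed
T_learn = rng.normal(0, 2, size=20000)
Y_learn = (T_learn > c).astype(int)
clf = GradientBoostingClassifier(random_state=0)
clf.fit(T_learn.reshape(-1, 1), Y_learn)
pi_hat = lambda t: clf.predict_proba(t.reshape(-1, 1))[:, 1]

# selective p-value for H_0: theta = 0, given that T_obs = 2 was reported
T_obs = 2.0
Z = rng.normal(0, 1, size=200000)          # draws from the unadjusted reference density
W = pi_hat(Z)
p_learned = np.sum(W * (Z > T_obs)) / np.sum(W)
p_exact = norm.sf(T_obs) / norm.sf(c)      # exact truncated-normal p-value, ~0.34
```

The learned \(\hat{\pi}\) is a smoothed step at \(c\), so `p_learned` agrees with `p_exact` up to Monte Carlo and approximation error.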
import numpy as np
from copy import copy
from sklearn.ensemble import GradientBoostingClassifier

def winning_arm(*samples):
    # assumed defined earlier in the notes: the arm with the largest mean wins
    return np.argmax([np.mean(sample) for sample in samples])

truth = [1, 0.9, 1.1]
samples = [np.random.standard_normal(20) + truth[0],
           np.random.standard_normal(20) + truth[1],
           np.random.standard_normal(20) + truth[2]]
i_star = winning_arm(*samples)
followup = np.random.standard_normal(20) + truth[i_star]
i_star
## 0
np.random.seed(2)
def generate_learning_data(i_star,
                           samples,
                           followup,
                           winning_arm=winning_arm,
                           B=5000):
    samples_cp = copy(samples)
    Y, T = [], []
    n_star = len(samples_cp[i_star])
    full_data = np.hstack([samples_cp[i_star], followup])
    for b in range(B):
        # resample the winning arm's pooled data, then split off a stage-1 sample
        full_data_star = np.random.choice(full_data, full_data.shape[0], replace=True)
        samples_cp[i_star] = np.random.choice(full_data_star, n_star, replace=False)
        Y.append(winning_arm(*samples_cp) == i_star)
        T.append(full_data_star.mean())
    return np.array(Y), np.array(T).reshape((-1, 1))

Y, T = generate_learning_data(i_star, samples, followup)
Y.shape, T.shape
## ((5000,), (5000, 1))
What should this curve look like?
Assume \(i^*=1\). What would happen if we instead learnt selection as a function of \((\bar{D}_1, \bar{D}_{1,1})\) rather than only \(\bar{D}_1\)? Can you use such a function to do the correct adjustment?
import matplotlib.pyplot as plt
# %matplotlib inline
clf = GradientBoostingClassifier(random_state=0)
clf.fit(T, Y);
pi_hat = clf.predict_proba(T)
plt.scatter(T[:,0], pi_hat[:,1])

When resampling we must maintain \[ \Gamma = \frac{\text{Cov}(\bar{D}_{1,i}, \bar{D}_i)}{\text{Var}(\bar{D}_i)} \] and \[ \text{Var}(\bar{D}_{1,i}) \]
Bootstrapping with any smaller sample size, as long as it is greater than \(n_{1,i}\), will do…
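A quick illustration of why a smaller resample size helps (toy numbers, my own): the \(m\)-out-of-\(n\) bootstrap mean has standard deviation roughly \(s/\sqrt{m}\), so smaller \(m\) spreads the \(T\)'s out and lets the classifier observe the selection event both succeed and fail over a wider range.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(40) + 1.0               # stand-in for the winner's pooled data

def boot_means(x, m, B=20000):
    # m-out-of-n bootstrap: resample m points with replacement, record the mean
    idx = rng.integers(0, len(x), size=(B, m))
    return x[idx].mean(axis=1)

# smaller m => more spread in the bootstrapped means T
spreads = {m: boot_means(x, m).std() for m in [40, 28, 16]}
```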
def generate_learning_data_frac(i_star,
                                samples,
                                followup,
                                winning_arm=winning_arm,
                                B=5000,
                                frac=0.6):
    samples_cp = copy(samples)
    Y, T = [], []
    n_star = len(samples_cp[i_star])
    full_data = np.hstack([samples_cp[i_star], followup])
    for b in range(B):
        full_data_star = np.random.choice(full_data,
                                          int(frac * full_data.shape[0]),
                                          replace=True)
        samples_cp[i_star] = np.random.choice(full_data_star, n_star, replace=False)
        Y.append(winning_arm(*samples_cp) == i_star)
        T.append(full_data_star.mean())
    return np.array(Y), np.array(T).reshape((-1, 1))

Y2, T2 = generate_learning_data_frac(i_star, samples, followup)
clf2 = GradientBoostingClassifier(random_state=0)
clf2.fit(T2, Y2);
pi_hat2 = clf2.predict_proba(T2)
plt.scatter(T2[:,0], pi_hat2[:,1])

T3, Y3 = np.vstack([T, T2]), np.hstack([Y, Y2])
clf3 = GradientBoostingClassifier(random_state=0)
clf3.fit(T3, Y3);
pi_hat3 = clf3.predict_proba(T3)
plt.scatter(T3[:,0], pi_hat3[:,1])

Y4, T4 = generate_learning_data_frac(i_star, samples, followup, frac=0.7, B=5000)
T4, Y4 = np.vstack([T4, T3]), np.hstack([Y4, Y3])
clf4 = GradientBoostingClassifier(random_state=0)
clf4.fit(T4, Y4);
pi_hat4 = clf4.predict_proba(T4)
plt.scatter(T4[:,0], pi_hat4[:,1])

def pvalue(full_data_mean,
           se,
           pi_hat,
           B=10000,
           truth=1):
    # test H_a: mean in winning arm is > truth
    Z = np.random.standard_normal(B) * se + truth
    W = pi_hat(Z)
    sf_ = np.sum(W * (Z > full_data_mean)) / np.sum(W)
    return sf_
full_data_winning = np.hstack([samples[i_star], followup])
pvalue(full_data_winning.mean(),
       full_data_winning.std() / np.sqrt(full_data_winning.shape[0]),
       lambda t: clf4.predict_proba(t.reshape((-1,1)))[:,1])
## 0.6218785858108452
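A sanity check on pvalue (my addition; the function is restated so the snippet is self-contained): with a constant \(\hat{\pi}\) the weights cancel and the weighted survival function reduces to the usual normal tail probability.

```python
import numpy as np
from scipy.stats import norm

def pvalue(full_data_mean, se, pi_hat, B=10000, truth=1):
    # importance-weighted survival function under the reference density
    Z = np.random.standard_normal(B) * se + truth
    W = pi_hat(Z)
    return np.sum(W * (Z > full_data_mean)) / np.sum(W)

np.random.seed(0)
p = pvalue(1.2, 0.2, lambda t: np.ones_like(t), B=200000, truth=1)
# with pi_hat constant this is just P(N(1, 0.2^2) > 1.2) = norm.sf(1) ~ 0.159
```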
def simulate(sizes=[20, 20, 20],
             size_followup=20,
             fracs=[1, 0.6, 0.7],
             truth=[0.9, 1, 1.1],
             winning_arm=winning_arm):
    # run our trial
    samples = [np.random.standard_normal(size) + truth[i] for i, size in enumerate(sizes)]
    i_star = winning_arm(*samples)
    followup = np.random.standard_normal(size_followup) + truth[i_star]
    # generate our learning data
    Y, T = [], []
    for frac in fracs:
        y, t = generate_learning_data_frac(i_star, samples, followup, frac=frac)
        Y.append(y); T.append(t)
    Y = np.hstack(Y)
    T = np.vstack(T)
    # learn the selection probability
    clf = GradientBoostingClassifier(random_state=0)
    clf.fit(T, Y)
    # use selection probability for inference
    full_data_winning = np.hstack([samples[i_star], followup])
    pi_hat = lambda t: clf.predict_proba(t.reshape((-1,1)))[:,1]
    return pvalue(full_data_winning.mean(),
                  full_data_winning.std() / np.sqrt(full_data_winning.shape[0]),
                  pi_hat,
                  truth=truth[i_star])

simulate()
## 0.2442717377465183
Generating \(Y\)’s from the bootstrap distribution will result in variation in \(\bar{D}_{j,1}, j \neq i^*\).
We can learn \[ (t_1, t_2, t_3) \mapsto \tilde{\pi}(t_1, t_2, t_3) = P\left (i^*(D)=1 \bigl \vert \bar{D}_1=t_1, \bar{D}_{2,1}=t_2, \bar{D}_{3,1}=t_3 \right) \]
If \(i^*=1\), the function \[ t \mapsto P\left(i^*(D)=1 \bigl \vert \bar{D}_1=t, \bar{D}_{j,1}=\bar{D}_{j,1,obs} \; \forall j \geq 2 \right) \]
can be recovered as \[ t \mapsto \tilde{\pi}(t, \bar{D}_{2,1,obs}, \bar{D}_{3,1,obs}) \]
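The plug-in step can be sketched generically with a toy stand-in for the selection event (identifiers are mine): learn a two-argument \(\tilde{\pi}\), then freeze the second coordinate at its observed value to obtain the one-dimensional function used in the reference density.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# toy selection event: arm 1 "wins" when t1 > t2
T = rng.normal(0, 2, size=(20000, 2))
Y = (T[:, 0] > T[:, 1]).astype(int)
clf = GradientBoostingClassifier(random_state=0).fit(T, Y)

t2_obs = 0.5                       # observed value of the conditioned-on coordinate

def pi_slice(t1):
    # tilde-pi evaluated along (t1, t2_obs)
    grid = np.column_stack([t1, np.full_like(t1, t2_obs)])
    return clf.predict_proba(grid)[:, 1]

vals = pi_slice(np.array([-2.0, 3.0]))   # low far below t2_obs, high far above
```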
def generate_learning_data_boot(i_star,
                                samples,
                                followup,
                                winning_arm=winning_arm,
                                B=5000,
                                frac=0.7):
    samples_cp = copy(samples)
    Y, T = [], []
    n_star = len(samples_cp[i_star])
    full_data = np.hstack([samples[i_star], followup])
    for b in range(B):
        full_data_star = np.random.choice(full_data, int(frac * full_data.shape[0]), replace=True)
        samples_cp[i_star] = np.random.choice(full_data_star, n_star, replace=False)
        for j in range(len(samples)):
            if j != i_star:
                samples_cp[j] = np.random.choice(samples[j], int(frac * samples[j].shape[0]), replace=True)
        Y.append(winning_arm(*samples_cp) == i_star)
        T.append([full_data_star.mean()] + [samples_cp[j].mean() for j in range(len(samples)) if j != i_star])
    return np.array(Y), np.array(T)

Y, T = generate_learning_data_boot(i_star, samples, followup)
T.shape
## (5000, 3)
def simulate_boot(sizes=[20, 20, 20],
                  size_followup=20,
                  fracs=[1, 0.6, 0.7],
                  truth=[0.9, 1, 1.1],
                  winning_arm=winning_arm):
    # run our trial
    samples = [np.random.standard_normal(size) + truth[i] for i, size in enumerate(sizes)]
    i_star = winning_arm(*samples)
    followup = np.random.standard_normal(size_followup) + truth[i_star]
    # generate our learning data
    Y, T = [], []
    for frac in fracs:
        y, t = generate_learning_data_boot(i_star, samples, followup, frac=frac)
        Y.append(y); T.append(t)
    Y = np.hstack(Y)
    T = np.vstack(T)
    # learn the selection probability
    clf = GradientBoostingClassifier(random_state=0)
    clf.fit(T, Y)
    # use selection probability for inference
    other_means = [samples[j].mean() for j in range(len(samples)) if j != i_star]
    full_data_winning = np.hstack([samples[i_star], followup])
    def pi_hat(t):
        # slice the learned pi at the observed means of the losing arms
        full_t = np.zeros((t.shape[0], len(samples)))
        full_t[:,0] = t
        full_t[:,1:] = other_means
        return clf.predict_proba(full_t)[:,1]
    return pvalue(full_data_winning.mean(),
                  full_data_winning.std() / np.sqrt(full_data_winning.shape[0]),
                  pi_hat,
                  truth=truth[i_star])

simulate_boot()
## 0.32429574652109217
# center them around 0
for sample in samples:
    sample -= 1

def winning_arm_abs(*samples):
    return np.argmax([np.fabs(np.mean(sample)) for sample in samples])

i_star_abs = winning_arm_abs(*samples)
Y4, T4 = generate_learning_data_frac(i_star_abs, samples, followup, frac=0.6, B=25000,
                                     winning_arm=winning_arm_abs)
clf4 = GradientBoostingClassifier(random_state=0)
clf4.fit(T4, Y4);
pi_hat4 = clf4.predict_proba(T4)
plt.scatter(T4[:,0], pi_hat4[:,1])
stage1_samples = [np.random.standard_normal(20) + 1 for _ in range(5)]

def stage1(*samples):
    return np.argsort([np.mean(sample) for sample in samples])[-3:]

stage1_winners = stage1(*stage1_samples)
stage1_winners
## array([2, 0, 4])
stage2_samples = [np.hstack([np.random.standard_normal(20) + 1,
                             stage1_samples[i]]) for i in stage1_winners]

def stage2(*samples):
    return np.argmax([np.mean(sample) for sample in samples])

stage2_winner = stage2(*stage2_samples)
overall_winner = stage1_winners[stage2_winner]
stage3_sample = np.hstack([stage2_samples[stage2_winner],
                           np.random.standard_normal(20) + 1])
overall_winner
## 2
A little more complex to write, but conceptually straightforward – fix all samples except the overall_winner's.
Bootstrap the overall_winner's full sample, subsampling appropriately to form its stage1 and stage2 data.
def generate_learning_multi(stage1_samples,
                            stage1_winners,
                            stage2_samples,
                            stage2_winner,
                            overall_winner,
                            stage3_sample,
                            frac=0.7,
                            B=5000):
    stage2_samples_cp = copy(stage2_samples)
    stage1_samples_cp = copy(stage1_samples)
    Y, T = [], []
    for _ in range(B):
        # bootstrap the overall winner's full sample ...
        overall_winner_star = np.random.choice(stage3_sample, int(frac * stage3_sample.shape[0]),
                                               replace=True)
        # ... then subsample its stage2 and stage1 data from it
        stage2_samples_cp[stage2_winner] = np.random.choice(overall_winner_star,
                                                            stage2_samples[stage2_winner].shape[0],
                                                            replace=False)
        stage1_samples_cp[overall_winner] = np.random.choice(stage2_samples_cp[stage2_winner],
                                                             stage1_samples_cp[overall_winner].shape[0],
                                                             replace=False)
        stage1_winners_star = stage1(*stage1_samples_cp)
        if overall_winner in stage1_winners_star:
            stage2_winner_star = stage2(*stage2_samples_cp)
            overall_winner_ = stage1_winners_star[stage2_winner_star]
            y = overall_winner_ == overall_winner
        else:
            y = 0
        Y.append(y)
        T.append(overall_winner_star.mean())
    return np.array(Y), np.array(T).reshape((-1, 1))
Y, T = generate_learning_multi(stage1_samples,
                               stage1_winners,
                               stage2_samples,
                               stage2_winner,
                               overall_winner,
                               stage3_sample)
clf = GradientBoostingClassifier(random_state=0)
clf.fit(T, Y);
pi_hat = clf.predict_proba(T)
plt.scatter(T[:,0], pi_hat[:,1])
def simulate_multi(stage1_size=[20]*10,
                   stage2_size=[20]*3,
                   stage3_size=20,
                   fracs=[1, 0.7, 0.8],
                   truth=[1]*10):
    # run our trial
    stage1_samples = [np.random.standard_normal(size) + truth[i] for i, size in enumerate(stage1_size)]
    stage1_winners = stage1(*stage1_samples)
    stage2_samples = [np.hstack([np.random.standard_normal(size) + truth[i],
                                 stage1_samples[i]]) for i, size in zip(stage1_winners, stage2_size)]
    stage2_winner = stage2(*stage2_samples)
    overall_winner = stage1_winners[stage2_winner]
    stage3_sample = np.hstack([stage2_samples[stage2_winner],
                               np.random.standard_normal(stage3_size) + truth[overall_winner]])
    # generate our learning data
    Y, T = [], []
    for frac in fracs:
        y, t = generate_learning_multi(stage1_samples,
                                       stage1_winners,
                                       stage2_samples,
                                       stage2_winner,
                                       overall_winner,
                                       stage3_sample,
                                       frac=frac)
        Y.append(y); T.append(t)
    Y = np.hstack(Y)
    T = np.vstack(T)
    # learn the selection probability
    clf = GradientBoostingClassifier(random_state=0)
    clf.fit(T, Y)
    # use selection probability for inference
    pi_hat = lambda t: clf.predict_proba(t.reshape((-1,1)))[:,1]
    return pvalue(stage3_sample.mean(),
                  stage3_sample.std() / np.sqrt(stage3_sample.shape[0]),
                  pi_hat,
                  truth=truth[overall_winner])

simulate_multi()
## 0.9980666296040841
The sampling here required knowing the structure of the trial, though not the decision functions stage1 and stage2.
This sampling is needed to maintain the proper variance of the implicit randomization in the different arms…
Suppose we are working in a regression model \[ {\cal M} = \left\{Y|X \sim N(X\beta, \sigma^2 I): \beta \in \mathbb{R}^p, \sigma^2 > 0 \right\} \] and our selection \[ \hat{\cal Q} = \hat{\cal Q}(X^TY) \] returns a selected set \(\hat{E}\) of features.
Target is \[ \beta_j = \beta_{j|\{1,\dots,p\}} \]
The one-at-a-time approach for some \(j\) requires rerunning \(\hat{\cal Q}\) on \[ (X^TY) + (T_b - X_j^TY) \cdot e_j \] and recording \[ Y_b = 1\left\{j \in \hat{\cal Q}\left((X^TY) + (T_b - X_j^TY) \cdot e_j\right)\right\} \]
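A sketch of the one-at-a-time rerun for the LASSO (my own construction: sklearn's Lasso takes \((X, Y)\) rather than \(X^TY\), but since the active set depends on the data only through \(X^TX\) and \(X^TY\), we can feed it a pseudo-response \(Y_v = X(X^TX)^{-1}v\) satisfying \(X^TY_v = v\)):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:2] = 1.0
Y = X @ beta + rng.standard_normal(n)
XtY = X.T @ Y

to_response = X @ np.linalg.inv(X.T @ X)     # maps v to a pseudo-response Y_v with X^T Y_v = v

def selected_set(v, alpha=0.5):
    # the lasso active set is a function of X^T X and X^T Y only
    fit = Lasso(alpha=alpha, fit_intercept=False).fit(X, to_response @ v)
    return set(np.nonzero(fit.coef_)[0])

# perturb coordinate j of X^T Y and record whether j stays selected
j, e_j = 0, np.eye(p)[0]
T_b = rng.uniform(-100, 200, size=200)       # spread-out draws for the learning data
Y_b = np.array([j in selected_set(XtY + (t - XtY[j]) * e_j) for t in T_b])
# (T_b, Y_b) is the learning data for pi_hat_j
```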
A many-at-a-time approach for some \(E\) requires rerunning \(\hat{\cal Q}\) on \[ (X^TY) + I[,E] (T_b - X_E^TY) \] and recording, for each \(j \in E\), \[ Y^j_b = 1\left\{j \in \hat{\cal Q}\left((X^TY) + I[,E] (T_b - X_E^TY)\right)\right\} \]
This results in \(|E|\) learning problems, yielding \(\hat{\pi}_j, j \in E\).
The reference density for \(H_0:\beta_j=\beta_{j,0}\) is then expressible as the usual density modified by (something like) \[ t \mapsto \hat{\pi}_j(t, (X_i^TY)_{i \in E \setminus \{j\}}). \]
Can really take any distribution for \(T\)’s on \(\mathbb{R}^E\)
Optimal choice could be cast as active learning problem.
The graphical LASSO solves \[ \hat{\Theta} = \text{argmin}_{\Theta \succ 0} \; \text{Tr}(S\Theta) - \log\det\Theta + \lambda \|\Theta\|_{*,1} \] with \[ S = \frac{1}{n} X^T\left(I - \frac{1}{n}11^T\right)X \] a sample covariance matrix.
Penalty usually doesn’t charge the diagonal.
Alternatively, imagine we have some other algorithm to select edges \((i,j)\) and we want to test \(\Theta_{i,j}=0\) in model \[ {\cal M} = \left\{\text{Wishart}(\Theta^{-1},n-1): \Theta^{-1} \geq 0 \right\}. \]
One-at-a-time reruns algorithm \(\hat{\cal Q}\) on \[ S^*_b = S + (T^*_b - S_{ij}) E_{ij} \] (i.e. \({\cal N} = (S_{kl})_{(k,l) \neq (i,j)}\), accounting for symmetry as well) and records \[ Y_b = 1\left\{(i,j) \in \hat{\cal Q}(S^*_b)\right\} \]
We learn \[ \hat{\pi}(t, {\cal N}) = P(Y=1|T=t) \]
The reference density is the law of \[ S_{ij} \mid S_{kl}, (k,l) \neq (i,j) \] modified by the learned \(\hat{\pi}\). This is well-defined only when \(S \succ 0\)!
Can also use a many-at-a-time resampling scheme.
For \(n=100, p=20\) and \(\Theta=I\), try the above algorithm to construct hypothesis tests for edges selected by the graphical LASSO at a fixed value of the regularization parameter.
Describe how you can use a learnt \(\hat{\pi}\) function to construct confidence intervals for a selected element of \(\Theta\). Try this out on the graphical LASSO above.
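A possible starting point for the edge-selection exercise, using scikit-learn's graphical_lasso (the helper names are my own; the perturbation follows the one-at-a-time scheme above, skipping draws where \(S^*_b\) is not positive definite):

```python
import numpy as np
from sklearn.covariance import graphical_lasso

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))          # data generated with Theta = I
S = np.cov(X, rowvar=False)

def selects_edge(S, i, j, alpha=0.15):
    # graphical lasso at a fixed regularization; edge (i, j) is "selected"
    # when the off-diagonal entry of the estimated precision is nonzero
    _, Theta = graphical_lasso(S, alpha=alpha)
    return np.abs(Theta[i, j]) > 1e-8

# one-at-a-time perturbation of S_ij, holding the rest of S fixed
i, j = 0, 1
E_ij = np.zeros((p, p))
E_ij[i, j] = E_ij[j, i] = 1.0            # symmetric elementary perturbation
T_b, Y_b = [], []
for t in rng.uniform(-0.4, 0.4, size=100):
    S_star = S + (t - S[i, j]) * E_ij
    if np.linalg.eigvalsh(S_star).min() <= 0:
        continue                         # reference density lives only on S > 0
    T_b.append(t)
    Y_b.append(selects_edge(S_star, i, j))
T_b, Y_b = np.array(T_b), np.array(Y_b)
# (T_b, Y_b) is the learning data for pi_hat at edge (i, j)
```

From here, fit a probability learner to \((T_b, Y_b)\) and plug the resulting \(\hat{\pi}\) into the reference density as above.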