Drop the losers

web.stanford.edu/class/stats364/

Jonathan Taylor

Spring 2020

Setup

Conditioning on the winner

Inference in drop-the-losers: nuisance parameters

Inference in drop-the-losers: nuisance parameters and asymptotics

Possible approaches

  1. Null model

  2. Plugin score test

  3. Bootstrap

  4. Centered bootstrap

Data generating mechanism

import numpy as np

winning_idx, K, noise_sd = 0, 10, 1
truth = np.zeros(K)

def draw_one(truth,
             noise_sd,
             winning_idx):
    # Rejection sampling: draw the randomized statistic
    # Z* ~ N(truth, (1 + noise_sd**2) I) until the prescribed arm wins
    # (i.e. is the coordinatewise maximum).
    while True:
        Z_star = (np.random.standard_normal(truth.shape) * np.sqrt(1 + noise_sd**2) 
                  + truth)
        if Z_star[winning_idx] == np.max(Z_star):
            break
    # W is independent of Z* with Var(W) = (1 + noise_sd**2) / noise_sd**2,
    # so that Z = (Z* + noise_sd**2 * W) / (1 + noise_sd**2) ~ N(truth, 1).
    W = (np.sqrt(1 + noise_sd**2) / noise_sd * np.random.standard_normal()
         + truth[winning_idx])
    Z_star[winning_idx] = (Z_star[winning_idx] + noise_sd**2 * W) / (1 + noise_sd**2)
    return Z_star
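
For the reconstruction above to be valid, note that if \(Z \sim N(\mu, 1)\) and \(Z^* = Z + \omega\) with \(\omega \sim N(0, \tau^2)\), then \(W = Z - \omega/\tau^2\) is independent of \(Z^*\) with \(\text{Var}(W) = (1+\tau^2)/\tau^2\), and \(Z = (Z^* + \tau^2 W)/(1+\tau^2)\). A quick Monte Carlo check of this identity (with no selection, so the result is just the marginal \(N(0,1)\)):

```python
import numpy as np

# With K = 1 there is no selection, so the reconstructed coordinate
# Z = (Z* + tau^2 W) / (1 + tau^2) should be (approximately) N(0, 1).
rng = np.random.default_rng(1)
noise_sd, nsim = 1.0, 50000
Z_star = rng.standard_normal(nsim) * np.sqrt(1 + noise_sd**2)
W = rng.standard_normal(nsim) * np.sqrt(1 + noise_sd**2) / noise_sd
Z = (Z_star + noise_sd**2 * W) / (1 + noise_sd**2)
print(Z.mean(), Z.var())   # both close to 0 and 1
```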

Different ways of handling nuisance parameters

def null(Z, gamma, winning_idx):
    # Null model: set every nuisance mean to 0, winner's mean to gamma.
    mean_param = np.zeros_like(Z)
    mean_param[winning_idx] = gamma
    return mean_param

def score(Z, gamma, winning_idx):
    # Plugin "score" approach: plug in the observed Z for the nuisance means.
    mean_param = Z.copy()
    mean_param[winning_idx] = gamma
    return mean_param

def bootstrap(Z, gamma, winning_idx):
    # Bootstrap: center the reference distribution at the observed Z,
    # ignoring gamma entirely.
    return Z.copy()
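
On a toy observation (hypothetical values, winner at index 0, \(\gamma = 0.5\)) the three rules fill in the mean vector differently; they are repeated here so the block runs on its own:

```python
import numpy as np

# The three nuisance rules from above, repeated for a self-contained demo.
def null(Z, gamma, winning_idx):
    mean_param = np.zeros_like(Z)
    mean_param[winning_idx] = gamma
    return mean_param

def score(Z, gamma, winning_idx):
    mean_param = Z.copy()
    mean_param[winning_idx] = gamma
    return mean_param

def bootstrap(Z, gamma, winning_idx):
    return Z.copy()

Z = np.array([2.0, -1.0, 0.5])           # hypothetical observed data
print(null(Z, 0.5, 0))                   # [0.5  0.   0. ]
print(score(Z, 0.5, 0))                  # [ 0.5 -1.   0.5]
print(bootstrap(Z, 0.5, 0))              # [ 2.  -1.   0.5]
```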

Reference distribution by simulation

def reference_full(gamma,
                   observed,
                   winning_idx=0,
                   nuisance=null,
                   noise_sd=1,
                   ndraw=2000):

    # The winner's mean is set to gamma; the losers' means are filled in
    # by the chosen nuisance rule (null, score or bootstrap above).
    reference_mean = nuisance(observed, gamma, winning_idx)

    # Simulate ndraw datasets from the selective model with this mean.
    reference = []
    for _ in range(ndraw):
        reference.append(draw_one(reference_mean,
                                  noise_sd,
                                  winning_idx))
    reference = np.array(reference)

    # One-sided p-value: fraction of simulated winners below the observed winner.
    pvalue = np.mean(reference[:, winning_idx] < observed[winning_idx])
    return observed[winning_idx], reference[:, winning_idx], pvalue
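
Putting the pieces together, a single p-value for \(\gamma = 0\) under the global null can be computed as below (`draw_one`, `null` and `reference_full` are collected in one place so the block runs on its own):

```python
import numpy as np

# Self-contained run of reference_full with the null nuisance rule.
def draw_one(truth, noise_sd, winning_idx):
    # Rejection sampling until the prescribed arm wins on Z*.
    while True:
        Z_star = (np.random.standard_normal(truth.shape)
                  * np.sqrt(1 + noise_sd**2) + truth)
        if Z_star[winning_idx] == np.max(Z_star):
            break
    # Independent W with Var(W) = (1 + noise_sd**2) / noise_sd**2.
    W = (np.sqrt(1 + noise_sd**2) / noise_sd
         * np.random.standard_normal() + truth[winning_idx])
    Z_star[winning_idx] = (Z_star[winning_idx] + noise_sd**2 * W) / (1 + noise_sd**2)
    return Z_star

def null(Z, gamma, winning_idx):
    mean_param = np.zeros_like(Z)
    mean_param[winning_idx] = gamma
    return mean_param

def reference_full(gamma, observed, winning_idx=0, nuisance=null,
                   noise_sd=1, ndraw=2000):
    reference_mean = nuisance(observed, gamma, winning_idx)
    reference = np.array([draw_one(reference_mean, noise_sd, winning_idx)
                          for _ in range(ndraw)])
    pvalue = np.mean(reference[:, winning_idx] < observed[winning_idx])
    return observed[winning_idx], reference[:, winning_idx], pvalue

np.random.seed(0)
K = 10
observed = draw_one(np.zeros(K), 1, 0)           # data from the global null
obs, ref, pvalue = reference_full(0., observed)  # test gamma = 0 for the winner
print(pvalue)
```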

Null model for nuisance parameters

Works well when null is correct

Null model for nuisance parameters

Not so good when null is not true

Score test for nuisance parameters

Works reasonably well here

Testing via bootstrap quantile intervals

Not a good idea…

## proportion covering truth: 0.07
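
The failure does not require any bootstrap machinery. A cruder illustration (a simplification, not the slide's simulation): with \(K = 10\) null arms and a naive winner \(\pm 1.96\) interval that ignores selection, coverage already falls well short of the nominal level:

```python
import numpy as np

# Coverage of a naive [winner - 1.96, winner + 1.96] interval for the
# selected arm's mean when all K means are zero and selection is by the
# raw maximum (no randomization).
rng = np.random.default_rng(0)
K, nsim = 10, 20000
Z = rng.standard_normal((nsim, K))
winner = Z.max(axis=1)                    # estimate for the selected arm
covered = np.abs(winner - 0.0) <= 1.96    # does the interval cover the truth, 0?
coverage = covered.mean()
print(coverage)                           # well below the nominal 0.95
```

The exact coverage here is \(\Phi(1.96)^{10} \approx 0.78\): the winner is the maximum of ten null statistics, so it is biased upward and the unadjusted interval misses far too often.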

Using mean-corrected bootstrap

Not pivotal (cf. Leeb and Pötscher)

Score test with larger \(K\)

Not pivotal

Score test with \(K\) a little larger and \(\tau=0.5\)

Not pivotal

Null with \(K\) a little larger and \(\tau=0.5\) (and null is true)

Looks pretty good (when null is true)

An unachievable oracle

K, noise_sd = 20, 0.5
truth = np.ones(K) * 0
oracle = lambda Z, gamma, winning_idx: truth # an oracle for "nuisance"

Looks pretty good (as it should)

A valid approach (Sampson and Sill)

from scipy.stats import norm as normal_dbn
# discrete_family implements a one-parameter exponential family supported
# on sampled points (as in the selectinf package).
from selectinf.distributions.discrete_family import discrete_family

def reference_conditional(gamma,
                          observed,
                          winning_idx=0,
                          noise_sd=1,
                          ndraw=20000):

    # Condition on the winner beating the best of the losers.
    random_threshold = max([observed[j] for j in range(observed.shape[0]) 
                            if j != winning_idx])

    # Importance sample Z ~ N(gamma, 1); each draw is weighted by its
    # probability of winning the randomized comparison with the threshold.
    Zsample = np.random.standard_normal(ndraw) + gamma
    W = normal_dbn.sf((random_threshold - Zsample) / noise_sd)
    F = discrete_family(Zsample, W)
    pvalue = F.cdf(0, observed[winning_idx])
    return pvalue
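
If `discrete_family` is unavailable, the same construction can be hand-rolled: sample from \(N(\gamma, 1)\), weight each draw by its selection probability, and tilt by \(e^{-\gamma z}\) to move from natural parameter \(\gamma\) to \(0\). A minimal sketch (`conditional_pvalue` is a hypothetical stand-in, not the course API):

```python
import numpy as np
from scipy.stats import norm

def conditional_pvalue(gamma, observed, winning_idx=0, noise_sd=1,
                       ndraw=200000, seed=0):
    # Hand-rolled version of reference_conditional: importance sampling
    # plus an explicit exponential tilt instead of discrete_family.
    rng = np.random.default_rng(seed)
    threshold = max(observed[j] for j in range(len(observed))
                    if j != winning_idx)
    Z = rng.standard_normal(ndraw) + gamma       # proposal N(gamma, 1)
    sel = norm.sf((threshold - Z) / noise_sd)    # P(win | Z) under randomization
    w = sel * np.exp(-gamma * Z)                 # tilt gamma -> 0
    return np.sum(w[Z <= observed[winning_idx]]) / np.sum(w)

# Sanity check: as noise_sd -> 0 the selection event becomes {Z > threshold},
# so the p-value approaches P(Z <= 2 | Z > 1) for Z ~ N(0, 1).
pval = conditional_pvalue(0., np.array([2., 1.]), noise_sd=0.01)
print(pval)  # about 0.86
```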

Exact inference even when null is not true

Looks pretty good

Inference in drop-the-losers: some plugins are OK

Exercise

  1. In the data splitting example (with only one parameter) we noted that there exists a normally distributed unbiased estimator of \(\mu\) (at least when \(\tau > 0\)) in model \({\cal M}^*\). What is the analogous estimator of \(\mu_i\) in model \({\cal M}_i^*\)?

  2. Write a function that computes the 95% CI for \(\mu_i\) in model \({\cal M}_i^*\). Compare its length to that of the 95% CI for \(\mu_i\) based on the unbiased estimator above, for various values of \(\mu\), with \(\tau=1\) and \(K=20\).

  3. The CI based on the unbiased estimator has the advantage that it’s easy to compute. Suppose now that you make your living by constructing 95% CIs \([L, U]\) in drop-the-losers trials where your payout for a given trial is \[ 1_{[L, U]}(\mu_{w^*}) - \lambda \cdot (U - L). \] That is, you are rewarded if the constructed interval covers its target, but you must pay for each unit in length. What CI would you use? For a few different values of \(\mu\) and \(\lambda\) compare your payout using the full conditional drop-the-losers interval and the data splitting interval.

Exercise

We argued above that plugin estimates of variance can be OK under some conditions. Are there any linear functions of \(\mu\) for which plugin estimates are safe to use in drop-the-losers? Are there additional assumptions you might make under which more, or perhaps many, linear functions of \(\mu\) are safe to use with a method like reference_full above?

Exercise: meta analysis

Suppose we have run \(M\) different drop-the-losers studies in which hydroxychloroquine is one of the candidate treatments for COVID-19, with the other candidate treatments differing across the \(M\) studies. Of these, \(m < M\) show hydroxychloroquine to be the most effective treatment. Confident in the efficacy of hydroxychloroquine, you want to pool these \(m\) studies to improve your interval or point estimate for the effect of hydroxychloroquine.

  1. For \(m\) different studies in which hydroxychloroquine is the best treatment, describe a model of all corresponding treatment effects under which you can carry out exact inference as in the drop-the-losers design. Do you trust the resulting confidence interval based on this model? (Added recently: does it matter that hydroxychloroquine did not win the other \(M-m\) trials if our goal is valid conditional inference over all COVID-19 trials run?)

  2. A colleague suggests you might also use the \(M-m\) studies in which hydroxychloroquine was not the winner. What do you think of this suggestion? Can it help you in your task of valid inference for the hydroxychloroquine effect?

  3. Going back to the original drop-the-losers problem, suppose that remdesivir as well as hydroxychloroquine is among the candidate treatments. Suppose you ran a trial in which you committed to drawing more samples from the hydroxychloroquine arm as well as from whatever the winning arm was. How would you construct valid confidence intervals for the effect of hydroxychloroquine conditional on your trial producing remdesivir as its winner?