Brief notes on readings #5

web.stanford.edu/class/stats364/

Jonathan Taylor

Spring 2020

Fithian, Sun, T. (2014)

General conditional approach

Examples of \({\cal Q}\)

Examples of \({\cal Q}\): selected models

Examples of \({\cal Q}\): nonparametric models

Is it OK to use a selected model?

set.seed(0)
X = matrix(rnorm(500), 100, 5)  # 100 observations, 5 pure-noise features
Y = rnorm(100)                  # response independent of X
M = lm(Y ~ X, subset=1:50)      # select using the first 50 observations only
E = (abs(coef(summary(M))[,3]) > 1.5)[-1]  # keep features with |t| > 1.5; drop intercept
print(E)
##    X1    X2    X3    X4    X5 
##  TRUE FALSE FALSE FALSE FALSE
summary(lm(Y ~ X[,E], subset=51:100))
## 
## Call:
## lm(formula = Y ~ X[, E], subset = 51:100)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.5089 -0.6060 -0.1135  0.6509  1.8467 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.12880    0.12253  -1.051    0.298
## X[, E]      -0.07858    0.14438  -0.544    0.589
## 
## Residual standard error: 0.8661 on 48 degrees of freedom
## Multiple R-squared:  0.006134,   Adjusted R-squared:  -0.01457 
## F-statistic: 0.2962 on 1 and 48 DF,  p-value: 0.5888
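
The split-sample trick in the R snippet can be checked directly: because selection uses only the first half of the data, held-out p-values for the selected feature are exactly uniform under the global null. A minimal sketch in Python (illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import t as t_dbn

rng = np.random.default_rng(1)
reps, n, p = 2000, 100, 5
pvals = []
for _ in range(reps):
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal(n)
    # select on the first half: feature with the largest |inner product| with Y
    j = np.argmax(np.abs(X[:50].T @ Y[:50]))
    # test that feature on the untouched second half via the correlation t-test
    r = np.corrcoef(X[50:, j], Y[50:])[0, 1]
    t = r * np.sqrt(48 / (1 - r**2))
    pvals.append(2 * t_dbn.sf(abs(t), df=48))
# because selection and inference use disjoint halves, these p-values
# are exactly uniform under the global null
```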

What to condition on?

Marginalization

Exponential families are preserved

Special case: Gaussian (known covariance)

Removing dependence on nuisance parameters

Proof (wlog assume \(\text{rank}(\Sigma_q)=p, \text{rank}(L_q'\Sigma_q L_q)=k\))
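
A sketch of the statement being proved, in the usual generic form (writing \(\eta\) for the contrast of interest rather than the slides' \(L_q\) notation; a paraphrase, not the slides' exact statement): conditioning on the selection event \(A\) and on the component of \(y\) orthogonal to \(\eta\) leaves a one-parameter family,

\[ \eta'y \;\Big|\; \{y \in A\},\; P_{\eta^\perp} y \;\sim\; N(\eta'\mu,\ \eta'\Sigma\eta) \ \text{truncated to}\ \{v : y \in A,\ \eta'y = v,\ P_{\eta^\perp} y \text{ fixed}\}, \]

so the nuisance parameters \(P_{\eta^\perp}\mu\) drop out and selective tests of \(\eta'\mu\) reduce to computations in a truncated Gaussian family.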

Generalization of polyhedral lemma

Special case: Gaussian (known covariance) linear constraints on \(\eta\)

Generalization of polyhedral lemma
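
In the orthogonal-design, known-variance special case, the lemma reduces to a one-dimensional truncated-Gaussian tail computation: conditional on which coordinate achieves the max, the winning \(|Z|\) is (under the null) a standard normal truncated to exceed the runner-up. A minimal sketch (the function name is illustrative, not from the paper):

```python
import numpy as np
from scipy.stats import norm

def max_z_selective_pvalue(Z):
    # conditional on which coordinate wins, the winner is (under H0)
    # N(0,1) truncated to |z| beyond the second-largest |Z_i|, so the
    # selective p-value is a ratio of Gaussian tail areas
    a = np.sort(np.fabs(Z))
    t, b = a[-1], a[-2]
    return norm.sf(t) / norm.sf(b)

# under the global null these p-values are exactly U(0,1)
rng = np.random.default_rng(0)
pvals = [max_z_selective_pvalue(rng.standard_normal(20)) for _ in range(2000)]
```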

Does this matter?

A simulation example

import numpy as np

p = 100 # number of features; orthogonal design

def bonferroni_sample(p, alpha=0.05, B=10000, steps=1):
    # At step j (0-indexed), Bonferroni uses the distribution of the
    # largest of the remaining p - j absolute N(0,1)'s
    Z = np.empty((B, steps))
    for j in range(steps):
        Z[:, j] = np.fabs(np.random.standard_normal((B, p - j))).max(1)
    return Z

bonf_ref = bonferroni_sample(p, steps=20)


from scipy.stats import norm as normal_dbn

def tnorm(lower, upper, p, B=10000):
    # Sample B replicates of the max of p absolute N(0,1)'s,
    # each truncated to [lower, upper], via inverse-CDF sampling
    U = np.random.sample((B, p))
    sample = normal_dbn.ppf(normal_dbn.cdf(lower) + 
                            U * (normal_dbn.cdf(upper) - 
                                 normal_dbn.cdf(lower)))
    return np.fabs(sample).max(1)


import pandas as pd

def simulate(truth, B=10000, steps=1, bonf_ref=None):
    # Compute simultaneous test and conditional test
    # of goodness of fit H_0: after step k everything 
    # is mean 0
    
    p = truth.shape[0]
    
    if bonf_ref is None:
        bonf_ref = bonferroni_sample(p, steps=steps)
    
    Z = np.random.standard_normal(truth.shape) + truth
    orderZ = np.argsort(-np.fabs(Z))
    sortZ = -np.sort(-np.fabs(Z))
    GOF = []
    bonferroni = []
    covTest = []
    bound = np.inf
    for i in range(steps):
        if i >= 1:
            bound = sortZ[i-1]

        GOF_ref = tnorm(-bound, bound, p - i - 1)
        GOF.append(np.mean(GOF_ref >= sortZ[i]))
        bonferroni.append(np.mean(bonf_ref[:,i] >= sortZ[i]))
        covTest.append(normal_dbn.sf(sortZ[i]) / normal_dbn.sf(sortZ[i+1]))
        
    return pd.DataFrame({'step':np.arange(steps)+1,
                         'feature':orderZ[:steps], 
                         'GOF': GOF,
                         'Bonferroni':bonferroni,
                         'covTest':covTest})
null = np.zeros(p)
simulate(null, bonf_ref=bonf_ref, steps=3)
##    step  feature     GOF  Bonferroni   covTest
## 0     1       52  0.1816      0.1768  0.209240
## 1     2       64  0.5286      0.6057  0.705575
## 2     3       88  0.3259      0.7391  0.779091
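
As a sanity check mirroring the "step under null" slides that follow: under the global null, the naive max-\(|Z|\) p-value is badly anti-conservative, while a p-value calibrated against the max reference distribution is roughly uniform (an illustrative sketch, not the slides' code):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
p, reps = 100, 2000
# Monte Carlo reference for the max of p absolute N(0,1)'s
ref = np.fabs(rng.standard_normal((10000, p))).max(1)
naive, adjusted = [], []
for _ in range(reps):
    m = np.fabs(rng.standard_normal(p)).max()
    naive.append(2 * norm.sf(m))        # treats the max as one pre-chosen Z
    adjusted.append(np.mean(ref >= m))  # compares the max to its own law
# naive p-values pile up near 0; adjusted ones are approximately uniform
```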

First step under null

Second step under null

Third step under null

Fourth step under null

Adding some signal

First step some signal

Second step some signal

Third step some signal

Fourth step some signal

After having discovered truth: step 7

After having discovered truth: step 8

After having discovered truth: step 9

After having discovered truth: step 10

Fithian, T., Tibshirani^2 (2015)

Selective sequential model selection: what about the general \(X\) case?

Forward stepwise

Should we stop after finding \(E_k\)?

Stopping rules

Sequential FWER

\[ P_{\beta}(\hat{k}_0(p_1, \dots, p_p) > k_0(\beta, i^*_1,\dots, i^*_p)) \leq \alpha. \]
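
The guarantee above is for the rule that stops at the first non-rejected p-value; a minimal sketch (the function name is hypothetical, not from the paper):

```python
def sequential_stop(pvals, alpha=0.05):
    # \hat{k}_0: count consecutive rejections from the start and
    # stop the first time some p_k exceeds alpha
    k = 0
    for p in pvals:
        if p > alpha:
            break
        k += 1
    return k

# the later small p-value is never reached once the rule stops
print(sequential_stop([0.001, 0.02, 0.40, 0.01]))  # -> 2
```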

Sequential FDR?

Binomial drop the losers
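
In a "drop the losers" design, the arm with the most successes is kept and the rest are dropped, so naive inference for the winner inherits a selection bias; hence the need to condition on the selection event. A minimal illustration of the bias (parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, p0, B = 5, 50, 0.5, 2000   # 5 equal arms, true success rate 0.5
counts = rng.binomial(n, p0, size=(B, K))
winner_hat = counts.max(1) / n   # naive estimate for the selected arm
# the winner's naive estimate systematically overshoots p0 = 0.5,
# which is why inference must condition on the selection event
```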
