Brief notes on readings #3

web.stanford.edu/class/stats364/

Jonathan Taylor

Spring 2020

From Monday’s lecture

Exercise:

library(MASS)
data(Boston)
X = model.matrix(lm(medv ~ . - 1, data=Boston))
Y = Boston[,14]

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

Problem

library(lars)
## Loaded lars 1.2
L = lars(X, Y)
plot(L)

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

X = matrix(rnorm(50000), 500, 100)
Y = rnorm(500)
L = lars(X, Y)
plot(L)

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

What about when correlation is present?

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

A simple manipulation

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

Where does exponential come in?
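
A gloss on the answer, following the paper: under the global null with orthogonal design, \(\lambda_1 \ge \lambda_2\) are the two largest of \(p\) independent \(|N(0,\sigma^2)|\) variables, and extreme-value theory for Gaussian maxima gives \[ \frac{\lambda_1(\lambda_1 - \lambda_2)}{\sigma^2} \overset{d}{\to} \text{Exp}(1) \quad \text{as } p \to \infty. \] This product is exactly the quantity E computed in the simulation below (with \(\sigma=1\)).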

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

Later time points
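
As I read the paper, the same picture holds at later steps under the orthogonal-design global null: \(\lambda_k(\lambda_k - \lambda_{k+1})/\sigma^2\) converges to an exponential with mean \(1/k\), so the statistics at successive steps are stochastically smaller.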

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

Why did we condition on \(M(j_0, s_0)\)?
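
One short answer: conditioning on \(M(j_0,s_0)\) fixes the truncation point, so the law of \(s^*X_{j^*}^TY\) becomes an explicit one-dimensional truncated Gaussian free of the nuisance coordinates \(\beta_j\), \(j \neq j_0\); without conditioning, the truncation point is random with a distribution that depends on those nuisance parameters (the situation probed in Exercise 2 below).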

Exercises

  1. Assuming \(X^TX=I\), \(p=100\), and \(\sigma^2=1\), try plugging in \(\widehat{\beta}\) to calibrate the law of \[ s^*X_{j^*}^TY | (j^*,s^*)=(j_0, s_0) \] to test e.g. \(H_{j^*,0}\) or \(H_{j^*,\delta}\) for various true values of \(\beta\) (including 0). Does this work well? What do you plug in for the value of \(\beta_{j^*}\) under \(H_{j^*,\delta}\)? (A simulation sketch for this setup follows the list.)

  2. Suppose \((j^*,s^*)=(1,+1)\) and \(\beta=(\delta, 1, \dots, 1)\) (unbeknownst to you, of course). Try plugging in beta[2:p]=0 and testing \(H_{0,j^*}:\delta=1\) without conditioning on \(M(1,+1)\), instead simulating its distribution (again keep \(p=100\)). Do you expect this to work?

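A minimal simulation sketch for Exercise 1 (my construction, not an official solution). With \(X^TX=I\) and \(\sigma^2=1\), the coordinates \(Z=X^TY\) are independent \(N(\beta_j,1)\), and conditional on the winning pair \((j^*,s^*)\) and the runner-up value, \(s^*Z_{j^*}\) is a unit-variance Gaussian centered at \(s^*\beta_{j^*}\) truncated to lie above the runner-up:

p = 100
beta = rep(0, p)                          # try nonzero values as well
Z = rnorm(p) + beta                       # X^T Y when X^T X = I
jstar = which.max(abs(Z))
sstar = sign(Z[jstar])
V = max(abs(Z[-jstar]))                   # runner-up = truncation point
# plug-in p-value for H_{j*,0}: tail of N(0,1) truncated to [V, Inf)
pval = pnorm(sstar * Z[jstar], lower.tail=FALSE) / pnorm(V, lower.tail=FALSE)

Repeating this many times and checking pval for uniformity is one way to judge the calibration; under \(H_{j^*,\delta}\) the same ratio is computed after shifting by the plugged-in \(\beta_{j^*}\).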
Lockhart, Taylor, Tibshirani & Tibshirani (2014)

simulate = function(n=1000, p=20, plot=FALSE, s=0, k=1) {
    # Standardized Gaussian design with s strong signals in beta
    X = matrix(rnorm(n*p), n, p)
    X = scale(X, TRUE, TRUE)
    beta = rep(0, p)
    if (s > 0) {
        beta[1:s] = (1:s + 4) / sqrt(n)
    }
    Y = rnorm(n) + X %*% beta
    L = lars(X, Y)

    if (plot) {
        plot(L)
    }
    # Knots of the lars path: L[k] is the lambda at which the k-th
    # variable enters
    L = L$lambda
    return(list(L1=L[k],
                L2=L[k+1],
                # covariance test statistic (up to sigma^2) at step k
                E=(L[k]-L[k+1])*L[k],
                # spacing-type p-value: ratio of normal tails at successive knots
                P=pnorm(L[k], lower.tail=FALSE)/pnorm(L[k+1], lower.tail=FALSE),
                # Bonferroni-style bound over the remaining signed variables
                B=min(2 * (p - k + 1) * pnorm(L[k], lower.tail=FALSE), 1)))
}

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

simulate(plot=TRUE)

## $L1
## [1] 1.873002
## 
## $L2
## [1] 1.68958
## 
## $E
## [1] 0.3435503
## 
## $P
## [1] 0.6702797
## 
## $B
## [1] 1

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

simulate(s=3, plot=TRUE)

## $L1
## [1] 8.105509
## 
## $L2
## [1] 6.262178
## 
## $E
## [1] 14.94114
## 
## $P
## [1] 1.383557e-06
## 
## $B
## [1] 1.050501e-14

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

Step 1 – null

D = data.frame(simulate())
for (i in 1:500) {
    result = simulate()
    D = rbind(D, result)
}
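
Under the global null, the step-1 statistic E should be approximately Exp(1); a QQ-plot against exponential quantiles is a quick check (my addition, not from the original slides):

qqplot(qexp(ppoints(nrow(D))), D$E,
       xlab="Exp(1) quantiles", ylab="covariance statistic E")
abline(0, 1)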

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

Step 1 – 3 strong signals

D = data.frame(simulate(s=3))
for (i in 1:500) {
    result = simulate(s=3)
    D = rbind(D, result)
}

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

Step 2 – 3 strong signals

D = data.frame(simulate(s=3, k=2))
for (i in 1:500) {
    result = simulate(s=3, k=2)
    D = rbind(D, result)
}

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

Step 3 – 3 strong signals

D = data.frame(simulate(s=3, k=3))
for (i in 1:500) {
    result = simulate(s=3, k=3)
    D = rbind(D, result)
}

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

Step 4 – 3 strong signals

D = data.frame(simulate(s=3, k=4))
for (i in 1:500) {
    result = simulate(s=3, k=4)
    D = rbind(D, result)
}

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

Step 5 – 3 strong signals

D = data.frame(simulate(s=3,k=5))
for (i in 1:500) {
    result = simulate(s=3,k=5)
    D = rbind(D, result)
}
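
With three strong signals, steps 4 and 5 are null steps, so the spacing-type ratio P should now look roughly Uniform(0,1); a histogram is a quick diagnostic (my addition):

hist(D$P, breaks=20, main="Step 5", xlab="spacing-type p-value P")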

Lockhart, Taylor, Tibshirani & Tibshirani (2014)

library(selectiveInference)
## Loading required package: glmnet
## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-18
## Loading required package: intervals
## 
## Attaching package: 'intervals'
## The following object is masked from 'package:Matrix':
## 
##     expand
## Loading required package: survival
## Loading required package: adaptMCMC
## Loading required package: parallel
## Loading required package: coda
Y = Boston[,14]
X = model.matrix(lm(medv ~ . - 1, data=Boston))
L = lar(X, Y)
larInf(L)
## 
## Call:
## larInf(obj = L)
## 
## Standard deviation of noise (specified or estimated) sigma = 4.745
## 
## Sequential testing results with alpha = 0.100
## 
##  Step Var    Coef Z-score P-value LowConfPt UpConfPt Spacing CovTest LowTailArea UpTailArea
##     1  13  -0.950 -32.129   0.000    -0.999   -0.901   0.000   0.000       0.049      0.049
##     2   6   5.095  13.383   0.000     4.463    5.791   0.000   0.000       0.048      0.049
##     3  11  -0.931  -8.718   0.000    -1.108   -0.746   0.000   0.000       0.049      0.049
##     4  12   0.010   4.160   0.042     0.001    0.018   0.042   0.037       0.049      0.050
##     5   4   3.320   3.955   0.012     1.071    6.175   0.012   0.008       0.050      0.050
##     6   1  -0.037  -1.272   0.617    -0.172    0.377   0.617   0.766       0.050      0.050
##     7   8  -0.597  -4.937   0.144    -1.911    0.406   0.018   0.015       0.050      0.000
##     8   5 -16.223  -5.111   0.270   -35.591   23.349   0.001   0.000       0.050      0.050
##     9   2   0.042   3.189   0.150    -0.029    0.114   0.041   0.032       0.050      0.050
##    10   3  -0.046  -0.837   0.800    -0.475    2.814   0.856   0.953       0.050      0.050
##    11   9   0.135   3.310   0.418    -0.571    0.614   0.001   0.175       0.050      0.050
##    12  10  -0.012  -3.280   0.187    -0.063    0.019   0.001   0.000       0.050      0.000
##    13   7   0.001   0.052   0.466    -0.735    0.854   0.956   0.997       0.050      0.050
## 
## Estimated stopping point from ForwardStop rule = 5
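
The last line refers to the ForwardStop rule of G'Sell et al.: stop at the largest \(k\) whose running average of \(-\log(1-p_i)\) is at most \(\alpha\). A hand-rolled sketch (my version, not the package's code):

forward_stop = function(pv, alpha=0.10) {
    # running average of -log(1 - p_i); 0 means nothing is selected
    stat = cumsum(-log(1 - pv)) / seq_along(pv)
    if (any(stat <= alpha)) max(which(stat <= alpha)) else 0
}

Applied to the P-value column above with \(\alpha=0.10\), this returns 5, matching the printout.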

Zhong and Prentice (2008)

Exercise:

For various values of \(c\), plot the 95% confidence intervals of Zhong and Prentice with the data \(Z\) on the \(x\)-axis. Repeat the procedure with a single half-open interval \([c,\infty)\) as the truncation region. What is the most striking feature of these intervals? Based on the intervals, compare the power of one-sided tests of \(H_0:\mu=0\) using the single half-open interval to the power of a two-sided test with the truncation region \(R_c\).
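
A rough starting point for the half-open case (my construction; \(\sigma=1\) and \(\alpha=0.05\) assumed): \(Z \sim N(\mu,1)\) is observed only when \(Z \ge c\), the conditional survival function \(P(Z>z \mid Z\ge c)\) is increasing in \(\mu\), and the interval inverts it at \(\alpha/2\) and \(1-\alpha/2\).

trunc_ci = function(z, cutoff, alpha=0.05) {
    # P(Z > z | Z >= cutoff) for Z ~ N(mu, 1); increasing in mu.
    # Working with log tail probabilities keeps far tails stable.
    S = function(mu) {
        exp(pnorm(z - mu, lower.tail=FALSE, log.p=TRUE) -
            pnorm(cutoff - mu, lower.tail=FALSE, log.p=TRUE))
    }
    lo = uniroot(function(mu) S(mu) - alpha/2, c(z - 1000, z + 20))$root
    hi = uniroot(function(mu) S(mu) - (1 - alpha/2), c(z - 1000, z + 20))$root
    c(lo, hi)
}
c0 = 2                                     # an example cutoff
Z = seq(c0 + 0.02, c0 + 5, length=200)     # observed values above c0
CI = t(sapply(Z, trunc_ci, cutoff=c0))
matplot(Z, CI, type="l", lty=1, col=1, xlab="Z", ylab="95% CI for mu")

The two-sided region \(R_c\) is handled the same way, with the survival function of \(Z\) conditioned on \(\{|Z|\ge c\}\) instead.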