Assignment 3

You may discuss homework problems with other students, but you must prepare the written solutions yourself.

Please combine all of your answers, computer code, and figures into one PDF file and submit it to Gradescope.

Grading scheme: 10 points for each textbook problem, 20 points for each of the remaining problems.

Due date: February 18, 2022, 11:59PM.

Questions from Agresti

  • 8.8

  • 8.10

  • 8.17

  • 8.23

  • 9.7

  • 9.8

  • 9.20

  • 9.25

  • 10.19

Logistic regression

For this problem use the zip.train and zip.test data sets that can be found in the ElemStatLearn package or here

  1. Extract the 6’s and the 8’s from both zip.train and zip.test.

  2. Fit a (logistic) LASSO path and a ridge path predicting the class of the digits in your 6 vs. 8 data set. Finally, fit an elastic net path with \(\alpha=0.5\).

  3. Using the values of lambda.1se, use zip.test to evaluate which method is best for this prediction problem in terms of classification accuracy.
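One possible workflow for these three steps is sketched below. The variable names (train, fits, etc.) are my own, and the sketch assumes the ElemStatLearn layout in which column 1 of zip.train and zip.test holds the digit label and the remaining columns hold the pixel features:

```r
library(ElemStatLearn)
library(glmnet)
data(zip.train); data(zip.test)

# Step 1: keep only the 6's and 8's
train = zip.train[zip.train[, 1] %in% c(6, 8), ]
test  = zip.test[zip.test[, 1] %in% c(6, 8), ]
Xtr = train[, -1]; Ytr = as.factor(train[, 1])
Xte = test[, -1];  Yte = as.factor(test[, 1])

# Step 2: alpha = 1 is the LASSO, alpha = 0 is ridge, alpha = 0.5 is elastic net
fits = lapply(c(1, 0, 0.5), function(a)
    cv.glmnet(Xtr, Ytr, family = "binomial", alpha = a))

# Step 3: classification accuracy on zip.test at lambda.1se
sapply(fits, function(f) {
    pred = predict(f, Xte, s = "lambda.1se", type = "class")
    mean(pred == Yte)
})
```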

A deeper dive on LASSO for least squares

In class we saw that for the LASSO problem

\[ \text{minimize}_{\beta} \frac{1}{2} \|Y-X\beta\|^2_2 + \lambda \|\beta\|_1 \]

the KKT conditions were

\[ X'(Y-X\hat{\beta}) = \lambda \hat{u} \]

where \(\hat{u} \in \partial (\|\cdot\|_1)(\hat{\beta})\).
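For reference, when the active set is \(E\) with signs \(s_E\), the subgradient \(\hat{u}\) equals \(s_E\) on \(E\) and lies in \([-1,1]\) coordinatewise off \(E\), so the conditions split blockwise as:

```latex
\begin{aligned}
\text{(active)}\quad   & X_E'\bigl(Y - X_E\hat{\beta}_E\bigr) = \lambda s_E,
  \qquad \mathrm{sign}(\hat{\beta}_E) = s_E,\\
\text{(inactive)}\quad & X_{-E}'\bigl(Y - X_E\hat{\beta}_E\bigr) = \lambda \hat{u}_{-E},
  \qquad \|\hat{u}_{-E}\|_\infty \le 1.
\end{aligned}
```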

To make life / notation a little easier we’ve assumed the intercept is 0.

  1. We wrote out these conditions when the selected variables were \(E\) with signs \(s_E\). The conditions split into two blocks, an active block and an inactive block. Explain these conditions in your own words.

  2. Below, we use glmnet (which modifies the squared error by dividing by \(n\)) to solve the lasso on the prostate data from library(ElemStatLearn). Use the data (X,Y) to verify that beta.hat below satisfies the KKT conditions. (You will in the process construct a vector u.hat.) What should the u.hat vector look like?

  3. Keeping the same variables and signs, describe how to construct a response \(Y\) for which the LASSO selects the same variables with the same signs. Produce a response vector \(Y\) whose beta.hat has the same sparsity pattern for s=0.17 but has \(\hat{\beta}_{lcavol}=0.6\), \(\hat{\beta}_{lweight}=0.15\) and \(\hat{\beta}_{svi}=0.20\). Use glmnet to verify that your response yields such a solution. How would you ensure that your response vector \(Y\) had a specified value of \(\hat{u}_{-E}\)?

library(ElemStatLearn)
data(prostate)
library(glmnet)
X = model.matrix(lm(lpsa ~ lcavol + lweight + age + lbph + svi + lcp + gleason + pgg45, data=prostate))
X = scale(X, TRUE, TRUE)[, 2:ncol(X)]
Y = as.numeric(prostate$lpsa - mean(prostate$lpsa))
G = glmnet(X, Y, intercept=FALSE, standardize=FALSE)
beta.hat = coef(G, s=0.17, exact=TRUE, x=X, y=Y)
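A sketch of the check in part 2 follows (names n, lam, u.hat are illustrative). Since glmnet divides the squared error by \(n\), the conditions to verify read \(X'(Y-X\hat{\beta})/n = \lambda \hat{u}\), and they should hold up to glmnet's convergence tolerance:

```r
# Verify the KKT conditions numerically for the fit above
n   = nrow(X)
lam = 0.17
b   = beta.hat[-1]   # drop the (zero) intercept row

# u.hat solves X'(Y - X b)/n = lam * u.hat
u.hat = as.numeric(t(X) %*% (Y - X %*% b)) / (n * lam)
round(u.hat, 3)

# Active coordinates of u.hat should match sign(b);
# inactive coordinates should lie in [-1, 1]
E = which(b != 0)
abs(u.hat[E] - sign(b[E]))   # small, up to solver tolerance
max(abs(u.hat[-E])) <= 1     # inactive block
```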

Bonus (+20pts): a deeper dive on the LASSO for logistic regression

For logistic regression, the squared error loss is replaced with the logistic negative log-likelihood. The problem is

\[ \text{minimize}_{\beta} - \log L(\beta|Y,X) + \lambda \|\beta\|_1 \]

with KKT conditions

\[ X'(Y - \pi(X\hat{\beta})) = \lambda \hat{u} \]

where \(\hat{u} \in \partial (\|\cdot\|_1)(\hat{\beta})\).
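As in the least-squares case, when the active set is \(E\) with signs \(s_E\) these conditions split into an active and an inactive block:

```latex
\begin{aligned}
\text{(active)}\quad   & X_E'\bigl(Y - \pi(X\hat{\beta})\bigr) = \lambda s_E,
  \qquad \mathrm{sign}(\hat{\beta}_E) = s_E,\\
\text{(inactive)}\quad & X_{-E}'\bigl(Y - \pi(X\hat{\beta})\bigr) = \lambda \hat{u}_{-E},
  \qquad \|\hat{u}_{-E}\|_\infty \le 1.
\end{aligned}
```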

We fit this model to the SAheart data:

data(SAheart)
X = model.matrix(glm(chd ~ ., data=SAheart, family=binomial))[,-1]
X = scale(X, TRUE, TRUE)
Y = SAheart$chd
cvG = cv.glmnet(x=X,
                y=Y,
                intercept=FALSE,
                standardize=FALSE,
                family="binomial")
G = glmnet(x=X, y=Y,
           intercept=FALSE,
           standardize=FALSE,
           family="binomial")
beta.hat = coef(G,
                s=0.043,
                exact=TRUE,
                x=X, y=Y,
                intercept=FALSE,
                standardize=FALSE)
E = which(beta.hat != 0)

To make life / notation a little easier we’ve assumed the intercept is 0.

  1. The LASSO has selected the variables E above at the value \(\lambda=0.043\) (roughly lambda.1se). What do we know about the \(E\) coordinates of \(\nabla (-\log L(\hat{\beta}|X,Y))\)? (Hint: consider the active block of the KKT conditions for the LASSO.)

  2. If we had started with a model containing only the variables E, we could have fit it as M=glm(Y ~ X[,E], family=binomial) using, e.g., Newton-Raphson. Define \(\bar{\beta}_E\) to be the estimator formed by taking one Newton-Raphson step starting from \(\hat{\beta}_E\). Assuming \(E\) had been fixed in advance and \(\hat{\beta}_E\) was already close to the MLE of the model M, roughly what distribution would \(\bar{\beta}_E\) have?

  3. The active block of the KKT conditions, when the LASSO selects variables \(E\) with signs \(s_E\), places a constraint on \(\bar{\beta}_E\). Ignoring the inactive conditions, how might you model the distribution of \(\bar{\beta}_E\) conditional on the LASSO having selected variables \(E\) with signs \(s_E\)? Use this conditional distribution to describe a test of \(H_0:\beta_E=0\) in model M.

  4. Carry out your test for the variables selected on the SAheart data.
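A possible starting point for parts 2 and 4 is computing the one-step estimator itself. The sketch below uses my own variable names, and its E indexes columns of X directly (unlike the coefficient-vector indexing in the code chunk above); it takes one Newton-Raphson step of the unpenalized restricted logistic likelihood from the LASSO solution:

```r
# One Newton-Raphson step for the logistic model restricted to the
# selected variables, starting from the LASSO solution
b  = as.numeric(beta.hat)[-1]   # drop the (zero) intercept row
E  = which(b != 0)              # active set, indexing columns of X
XE = X[, E]
bE = b[E]
p  = as.numeric(1 / (1 + exp(-XE %*% bE)))  # fitted probabilities
W  = p * (1 - p)                            # logistic variance weights
score = t(XE) %*% (Y - p)                   # gradient of the log-likelihood
H     = t(XE) %*% (XE * W)                  # observed information X_E' W X_E
b.bar = bE + as.numeric(solve(H, score))    # one-step estimator
# If E were fixed in advance and bE were near the MLE of model M,
# b.bar would be approximately N(beta_E, solve(H)).
```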