Assignment 3
You may discuss homework problems with other students, but you have to prepare the written assignments yourself.
Please combine all your answers, computer code, and figures into one PDF file and submit it to Gradescope.
Grading scheme: 10 points per numbered problem, 20 for remaining problems.
Due date: February 18, 2022, 11:59PM.
Questions from Agresti
8.8
8.10
8.17
8.23
9.7
9.8
9.20
9.25
10.19
Logistic regression
For this problem use the zip.train and zip.test data sets that can be found in the ElemStatLearn package or here
Extract the 6’s and the 8’s from both zip.train and zip.test.
Fit a (logistic) LASSO and a ridge path predicting the class of the digits in your 6 vs. 8 data set. Finally, fit an elastic net path with \(\alpha=0.5\).
Using the values of lambda.1se, use zip.test to evaluate which method is best for this prediction problem in terms of classification accuracy.
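The workflow above can be sketched as follows. This is only an illustration on simulated data, not the zip data: the names simulate, beta_true, alphas and accuracy are made up for the example, and the sketch assumes the glmnet package is installed. It fits the three penalized logistic paths with cv.glmnet and compares test-set accuracy at lambda.1se.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in for a two-class problem (NOT the 6 vs. 8 zip data).
n_train <- 200; n_test <- 100; p <- 20
beta_true <- c(2, -2, 1.5, rep(0, p - 3))   # hypothetical true coefficients
simulate <- function(n) {
  X <- matrix(rnorm(n * p), n, p)
  y <- rbinom(n, 1, plogis(drop(X %*% beta_true)))
  list(X = X, y = y)
}
train <- simulate(n_train)
test <- simulate(n_test)

# alpha = 1 is the LASSO, alpha = 0 is ridge, alpha = 0.5 is an elastic net.
alphas <- c(lasso = 1, ridge = 0, enet = 0.5)
accuracy <- sapply(alphas, function(a) {
  cvfit <- cv.glmnet(train$X, train$y, family = "binomial", alpha = a)
  # Predicted class labels on the held-out data at lambda.1se
  pred <- predict(cvfit, newx = test$X, s = "lambda.1se", type = "class")
  mean(as.numeric(pred) == test$y)
})
print(accuracy)
```

For the actual problem, the simulated train and test lists would be replaced by the 6 vs. 8 rows of zip.train and zip.test, with the digit label recoded as a two-level response.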
A deeper dive on LASSO for least squares
In class we saw that for the LASSO problem
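\[\text{minimize}_{\beta} \; \frac{1}{2} \|Y - X\beta\|^2_2 + \lambda \|\beta\|_1\]
the KKT conditions were
\[X^T(X\hat{\beta} - Y) + \lambda \hat{u} = 0\]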
where \(\hat{u} \in \partial (\|\cdot\|_1)(\hat{\beta})\).
To make life / notation a little easier we’ve assumed the intercept is 0.
We wrote out these conditions when the selected variables were \(E\) with signs \(s_E\). The conditions split up into two blocks, an active block and an inactive block. Explain these conditions in your own words.
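As a reminder of the notation, one way to write the two blocks is sketched below. It assumes the objective \(\frac{1}{2}\|Y - X\beta\|^2_2 + \lambda\|\beta\|_1\) with zero intercept; the scaling of the squared error is an assumption here, and a different scaling only rescales \(\lambda\).

```latex
\begin{aligned}
X_E^T(X_E\hat{\beta}_E - Y) + \lambda s_E &= 0,
  \qquad \text{sign}(\hat{\beta}_E) = s_E
  && \text{(active block)} \\
\|X_{-E}^T(X_E\hat{\beta}_E - Y)\|_\infty &\le \lambda
  && \text{(inactive block)}
\end{aligned}
```

In this form \(\hat{u}_E = s_E\) and \(\hat{u}_{-E} = -X_{-E}^T(X_E\hat{\beta}_E - Y)/\lambda\), so the inactive block says that every coordinate of \(\hat{u}_{-E}\) lies in \([-1, 1]\).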
Below, we use glmnet (which modifies the squared error by dividing by \(n\)) to solve the lasso on the prostate data from library(ElemStatLearn). Use the data (X,Y) to verify that beta.hat below satisfies the KKT conditions. (You will in the process construct a vector u.hat.) What should the u.hat vector look like?
Describe how to construct responses \(Y\) for which the LASSO selects the same variables with the same signs. Produce a response vector \(Y\) whose beta.hat has the same sparsity pattern for s=0.17 but has \(\hat{\beta}_{lcavol}=0.6\), \(\hat{\beta}_{lweight}=0.15\) and \(\hat{\beta}_{svi}=0.20\). Use glmnet to verify that your response yields such a solution. How would you ensure that your response vector \(Y\) had a specified value for \(\hat{u}_{-E}\)?
library(ElemStatLearn)
data(prostate)
library(glmnet)
X = model.matrix(lm(lpsa ~ lcavol + lweight + age + lbph + svi + lcp + gleason + pgg45, data=prostate))
X = scale(X, TRUE, TRUE)[, 2:ncol(X)]
Y = as.numeric(prostate$lpsa - mean(prostate$lpsa))
G = glmnet(X, Y, intercept=FALSE, standardize=FALSE)
beta.hat = coef(G, s=0.17, exact=TRUE, x=X, y=Y)
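The kind of KKT check asked for above can be sketched on simulated data without glmnet. Everything in this block is illustrative: lasso_cd and soft are hypothetical helpers written for the example (a plain cyclic coordinate descent), the data are simulated rather than the prostate data, and the objective is taken to be \(\frac{1}{2n}\|Y - X\beta\|^2_2 + \lambda\|\beta\|_1\), which is the scaling glmnet is described as using.

```r
set.seed(2)
n <- 100; p <- 8
X <- scale(matrix(rnorm(n * p), n, p))          # simulated, standardized design
Y <- drop(X %*% c(1, 0.5, rep(0, p - 2)) + rnorm(n))
Y <- Y - mean(Y)
lambda <- 0.17

# Soft-thresholding operator
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Hypothetical helper: cyclic coordinate descent for
#   (1 / (2 n)) * ||Y - X b||^2 + lambda * ||b||_1
lasso_cd <- function(X, Y, lambda, n_iter = 2000) {
  n <- nrow(X)
  b <- rep(0, ncol(X))
  r <- Y                                        # current residual Y - X b
  col_ss <- colSums(X^2) / n
  for (it in seq_len(n_iter)) {
    for (j in seq_along(b)) {
      rho <- sum(X[, j] * r) / n + col_ss[j] * b[j]
      b_new <- soft(rho, lambda) / col_ss[j]
      r <- r - X[, j] * (b_new - b[j])          # keep the residual up to date
      b[j] <- b_new
    }
  }
  b
}

beta.hat <- lasso_cd(X, Y, lambda)
# With the 1/(2n) scaling, the KKT conditions read X^T (Y - X beta.hat) / n = lambda * u.hat
u.hat <- drop(crossprod(X, Y - X %*% beta.hat)) / (n * lambda)
E <- which(beta.hat != 0)
print(cbind(beta.hat, u.hat))
```

On the active set E the entries of u.hat should match sign(beta.hat[E]) up to numerical tolerance, and the remaining entries should lie in \([-1, 1]\). With glmnet output the same computation applies, though agreement is only as tight as glmnet's convergence threshold.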
Bonus (+20pts): a deeper dive on LASSO for logistic regression
For logistic regression, the squared error loss is replaced with the logistic negative log-likelihood. The problem is
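\[\text{minimize}_{\beta} \; -\log L(\beta|X,Y) + \lambda \|\beta\|_1\]
with KKT conditions
\[\nabla (-\log L(\hat{\beta}|X,Y)) + \lambda \hat{u} = 0\]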
where \(\hat{u} \in \partial (\|\cdot\|_1)(\hat{\beta})\).
We fit this model to the SAheart data:
data(SAheart)
X = model.matrix(glm(chd ~ ., data=SAheart, family=binomial))[,-1]
X = scale(X, TRUE, TRUE)
Y = SAheart$chd
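cvG = cv.glmnet(x=X,
                y=Y,
                intercept=FALSE,
                standardize=FALSE,
                family="binomial")
G = glmnet(x=X, y=Y, family="binomial")
beta.hat = coef(G,
                s=0.043,
                exact=TRUE,
                x=X, y=Y,
                intercept=FALSE,
                standardize=FALSE)
# coef() puts the intercept (0 here) in its first row; drop it so that
# beta.hat and E line up with the columns of X
beta.hat = as.numeric(beta.hat)[-1]
E = which(beta.hat != 0)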
To make life / notation a little easier we’ve assumed the intercept is 0.
The LASSO has selected variables E above for the value of \(\lambda=0.043\) (roughly lambda.1se). What do we know about the \(E\) coordinates of \(\nabla (-\log L(\hat{\beta}|X,Y))\)? (Hint: consider the active block of the KKT conditions for the LASSO.)
If we had started with a model with the variables E, we could have fit a model M=glm(Y ~ X[,E], family=binomial) using e.g. Newton-Raphson. Define \(\bar{\beta}_E\) to be the estimator formed by taking one Newton-Raphson step starting from \(\hat{\beta}[E]\). Assuming E had been fixed in advance and \(\hat{\beta}_E\) was already close to the MLE of the model M, roughly what distribution would \(\bar{\beta}_E\) have? (Again, assume \(E\) was fixed in advance here.)
The active block of the KKT conditions when the LASSO selects variables \(E\) with signs \(s_E\) puts a constraint on \(\bar{\beta}_E\). Ignoring the inactive conditions, how might you model the distribution of \(\bar{\beta}_E\) conditional on the LASSO having selected variables \(E\) with signs \(s_E\)? Use this conditional distribution to describe a test of \(H_0:\beta_E=0\) in model M.
Carry out your test for the variables selected by the LASSO on the SAheart data.
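One possible way to set up the notation for the one-step estimator is sketched below. It is not the only formulation, and it rests on two assumptions: \(\ell(\beta_E) = -\log L(\beta_E|X_E,Y)\) is the unscaled negative log-likelihood of model M (with glmnet's scaling of the loss by \(1/n\), \(\lambda\) below would be \(n\) times the glmnet value), and \(H\) denotes the Hessian of \(\ell\) at \(\hat{\beta}_E\).

```latex
\begin{aligned}
\bar{\beta}_E &= \hat{\beta}_E - H^{-1} \nabla \ell(\hat{\beta}_E),
  \qquad H = X_E^T \hat{W} X_E, \quad
  \hat{W} = \text{diag}\big(\hat{\pi}_i (1 - \hat{\pi}_i)\big) \\
\nabla \ell(\hat{\beta}_E) &= -\lambda s_E
  \quad \Longrightarrow \quad
  \hat{\beta}_E = \bar{\beta}_E - \lambda H^{-1} s_E \\
\text{sign}(\hat{\beta}_E) = s_E
  &\iff \text{diag}(s_E)\big(\bar{\beta}_E - \lambda H^{-1} s_E\big) > 0
\end{aligned}
```

Under these assumptions the sign condition is a set of linear inequalities in \(\bar{\beta}_E\), which suggests modeling \(\bar{\beta}_E\) as approximately Gaussian with covariance \(H^{-1}\), restricted to the region those inequalities define.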