Assignment 4
You may discuss homework problems with other students, but you have to prepare the written assignments yourself.
Please combine all your answers, computer code and figures into one PDF file and submit it to Gradescope.
Grading scheme: 10 points per numbered problem, 20 for remaining problems.
Due date: March 10, 2022, 11:59PM.
Questions from Agresti
11.3
11.9
11.11
11.26
11.32
11.34
Modelling time to staphylococcus infection after burn
For this problem use the burn data set found in the KMsurv package available on CRAN.
1. Consider the time T3 (the time to Staphylococcus aureus infection or on-study time) with censoring variable D3 (1 if the patient did get an infection, 0 otherwise). Plot Kaplan-Meier survival curves stratifying by treatment variable Z1 (0: routine bathing, 1: body cleansing) along with 95% pointwise confidence intervals. Include a legend.
2. Use a log-rank test to test the null hypothesis of no difference in time to infection between treatment groups (Z1).
3. Fit a Cox model with just treatment variable Z1. Compare the score test to your results in part 2.
4. Fit a Cox model with all Z variables in the dataset (ignoring T1, D1, T2, D2). Make sure that all categorical variables considered are properly treated as factors.
5. Use glmnet to fit a LASSO using the design matrix from your model in part 4, choosing \(\lambda\) by cv.glmnet with partial likelihood as the objective. Which variables are selected at lambda.min?
6. Repeat part 5 using the \(C\)-index as criterion.
7. One of the categorical variables is a factor with 4 levels. What penalty might you use rather than a LASSO to automatically select categorical variables in a regression?
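As a reminder of what the Kaplan-Meier curves in part 1 estimate, the product-limit computation can be sketched in a few lines. This is a minimal illustration in plain Python rather than the R survival package. The function name kaplan_meier and the toy data are hypothetical and are not taken from the burn data set.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times  : observed times (failure or censoring)
    events : 1 if the observed time is a failure, 0 if censored
    Returns a list of (event_time, survival_estimate) pairs.
    """
    # Number of failures at each distinct failure time.
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Multiply by the conditional probability of surviving past t.
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical toy data: six subjects, two of them censored (event = 0).
toy_times = [2, 3, 3, 5, 7, 8]
toy_events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"t = {t}: S(t) = {s:.4f}")
```

Censored subjects never trigger a drop in the curve, but they do count in the risk set up to their censoring time, which is why the subject censored at time 3 is still included in the denominator at time 3.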
Bone marrow transplant data
For this problem use bmt from the KMsurv package. The time we consider is the disease-free survival time t2 with censoring indicator d3.
1. Plot Kaplan-Meier survival curves stratifying by variable group, which describes the type of disease each patient suffers from. Include a legend.
2. Use a log-rank test to test the null hypothesis of no difference in disease-free survival among different groups. This test is based on a generalization of the Mantel-Haenszel test to combine \(K\) different 2x2 tables. With 3 groups, the test combines \(K\) different 2x3 tables. What are the rows / columns of the tables in this context? What is \(K\)? What is the analog of the hypergeometric distribution used in the 2x2 case?
3. The return value of survdiff has attributes exp and var. Explain how these are used to compute the test statistic in the output of survdiff.
4. Now consider the response ta (time to acute graft vs. host disease) with right censoring indicator da. Of particular interest is the effect of MTX on survival time (covariate z10) and a potential interaction between z10 and group. Use forward stepwise to select a model including this interaction. Report an estimate of the main effect of MTX, the interaction effect and the corresponding 95% confidence intervals. Do you expect these confidence intervals to cover their intended target?
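For orientation on the log-rank questions above, the familiar two-group case (the Mantel-Haenszel combination of 2x2 tables) can be sketched as follows. This is only an illustration in plain Python with hypothetical toy data and a hypothetical function name, logrank_two_groups. It is not the survdiff implementation and it does not cover the 3-group generalization asked about.

```python
def logrank_two_groups(times, events, groups):
    """Two-sample log-rank statistic built from one 2x2 table per event time.

    times  : observed times (failure or censoring)
    events : 1 if the observed time is a failure, 0 if censored
    groups : 1 for the group being tracked, 0 for the other group
    Returns (observed minus expected failures in group 1, variance, chi-square).
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        # Risk-set sizes just before time t: overall and in group 1.
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        # Failures at time t: overall and in group 1.
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(
            1
            for u, e, g in zip(times, events, groups)
            if u == t and e == 1 and g == 1
        )
        # Hypergeometric mean and variance of d1 given the table margins.
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return o_minus_e, var, chi2


# Hypothetical toy data: four uncensored subjects, two per group.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 1, 1]
toy_groups = [1, 0, 1, 0]
o_minus_e, var, chi2 = logrank_two_groups(toy_times, toy_events, toy_groups)
print(o_minus_e, var, chi2)
```

With two groups the statistic is a scalar ratio compared to a chi-square distribution with 1 degree of freedom. With more groups the observed-minus-expected counts form a vector and the variance becomes a matrix, which is the situation the questions above ask you to work out.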
Likelihood under right censoring
The observed data in survival analysis with right censoring is \((O_i, \delta_i, X_i)\), with \(O_i\) the observed time, \(\delta_i\) the indicator of whether or not the observed time corresponds to a failure, and \(X_i\) the covariates. Each \((O, \delta, X)\) is a function of \((T, C, X)\), where \(T\) is a failure time, \(C\) is a right-censoring time and \(X\) are covariates, with
\[O = \min(T, C), \qquad \delta = 1\{T \leq C\}.\]
The censoring time should be independent of the failure time in some sense. There are at least two possible notions: i) \(C\) and \(T\) are independent; and ii) \(C\) and \(T\) are conditionally independent given \(X\). Which assumption is more natural in the context of building a “regression model” for the hazard?
For some value \(o\) for the observed time, using the two notions of independence compute
You can assume the joint law \((T,C)\) has a density. Which one yields the likelihood we’ve been using for survival analysis under right censoring? Besides the notion of independence, what assumptions do you have to make about the law of the censoring times \(C\)?
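For reference, the likelihood commonly used under right censoring has the form below. It is an assumption that this is the form meant by "the likelihood we've been using", since the assignment does not restate it. The symbols \(f\), \(S\) and \(h\) are notation introduced here for the conditional density, survival function and hazard of \(T\) given \(X\).

```latex
\[
L = \prod_{i=1}^n f(O_i \mid X_i)^{\delta_i} \, S(O_i \mid X_i)^{1 - \delta_i}
  = \prod_{i=1}^n h(O_i \mid X_i)^{\delta_i} \, S(O_i \mid X_i),
\qquad
h(t \mid x) = \frac{f(t \mid x)}{S(t \mid x)}.
\]
```

The second equality uses only \(f = h \cdot S\). Note that no term involving the law of \(C\) appears in this expression, which is what the last question above asks you to justify.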