Survival analysis

web.stanford.edu/class/stats305b

Jonathan Taylor

Winter 2022

Survival analysis

References

Estimands

\[ E_{F_{X_i}}(T - a | T \geq a) \]

\[ \text{median}_{F_{X_i}}(T - a | T \geq a). \]

Hazard function

\[ \begin{aligned} H([t,t+dt)) &= P_F(T < t+dt|T \geq t) \\ \end{aligned} \]

\[ \begin{aligned} H([t,t+dt)) \approx h(t) \; dt \end{aligned} \]

or

\[ \begin{aligned} h(t) &= \frac{f(t)}{S(t)} \\ &= - \frac{d}{dt} \log S(t) \end{aligned} \]

with \(S(t) = \bar{F}(t) = 1 - F(t)\)

Cumulative hazard

\[ H(t) = \int_{[0,t]} h(s) \; ds \]

\[ S(t) = \exp(-H(t)). \]

Hazard function for discrete laws

\[ h(t) = \begin{cases} \frac{P(T=t)}{\sum_{s \geq t} P(T=s)} &\leq 1 \\ 0 &= \text{if $t$ is not in support} \end{cases} \]

\[ H(t) = \sum_{j:T_j \leq t} h(T_j) \]

\[ S(t) = \exp(-H(t)). \]

\[ \tilde{H}(t) = \sum_{j:T_j \leq t} -\log(1 - h(T_j)) \]

Hazard models

Constant hazard

\[ f_T(t) = \frac{1}{\lambda} e^{-t/\lambda} 1_{[0,\infty)}(t) \]

\[ h(t) = \frac{1}{\lambda} \]

and

\[ H(t) = t / \lambda. \]

Estimating a hazard rate

Nelson-Aalen

\[ \hat{H}(t) = \sum_{j:T_j \leq t} \frac{1}{ |\left\{T_j \geq t\right\}|} = \sum_{j:T_j \leq t} \frac{1}{ |R(t)|} \]

where \(R(t) = \left\{j:T_j \geq t \right\}\) is the risk set at time \(t\).

library(survival)
events = rexp(100)
censor = rep(1, 100)
H = basehaz(coxph(Surv(events, censor)~1))
plot(H$time, H$hazard, type='l')
abline(0, 1, col='red', lty=2)

Nelson-Aalen estimator

\[ (\hat{H}(t) - H(t))_{t \geq 0} \]

can be analyzed using martingale techniques to provide pointwise confidence intervals for \(H\).

Nelson-Aalen variance estimate

\[ \widehat{Var}(\hat{H}(t)) = \sum_{j:T_j \leq t} \frac{1}{|R(T_j)|^2} \]

\[ \exp(-\hat{H}(t)) \pm 1.96 * \exp\left(-\widehat{H}(t)\right) \sqrt{\sum_{j:T_j \leq t} \frac{1}{|R(t)|^2}} \]

Where does this variance estimate come from?

\[ \widehat{H}(t) = \sum_{j=0}^{m-1} \widehat{H}(s_{j+1}) - \widehat{H}(s_j) \]

and

\[ \widehat{H}(s_{j+1}) - \widehat{H}(s_j) = \frac{\sum_{i=1}^n 1_{[s_{j+1},s_j]}(T_i)}{|R(s_j)|} \]

\[\widehat{H}(s_{j+1}) - \widehat{H}(s_j) \biggl| |R(s_j)| \sim \frac{1}{|R(s_j)|} \text{Poisson}(|R(s_j)| h(s_j) (s_{j+1}-s_j)) \]

leads to (conditional) variance above

\[ \frac{h(s_j) (s_{j+1}-s_j)}{|R(s_j)|} \]

\[ \int_{[0,t]} \frac{1}{|R(t)|} h(s) \; ds \]

\[f \mapsto \int_{[0,t]} f(s) h(s)\; ds \approx \int_{[0,t]} f(s) \hat{H}(ds) \]

\[ \int_{[0,t]} f(s) h(s)\; ds \approx \sum_{j:T_j \leq t} f(T_j) \frac{1}{|R(t_j)|} \]

\[ \sum_{j:T_j \leq t} \frac{1}{|R(T_j)|^2}. \]

# a larger sample
events = rexp(1000)
censor = rep(1, 1000)
H = basehaz(coxph(Surv(events, censor)~1))
plot(H$time, H$hazard, type='l')
abline(0, 1, col='red', lty=2)

Models

Proportional hazards

\[ h_{\beta}(t;x) = \exp(x^T\beta) \cdot h_0(t) \]

with baseline hazard rate \(h_0\).

Cox’s model

Accelerated failure time

\[ h_{\beta}(t;x) = \theta(x^T\beta) \cdot h_0(\theta(x^T\beta) t) \]

with \(h_0\) the hazard function of \(Z_i\).

\[ \log(T_i) = -\log(\theta(X_i^T\beta)) + \log Z_i \]

\[ \log(T_i) = X_i^T\beta + \log Z_i \]

summary(survreg(Surv(events, censor)~1, dist='weibull'))
## 
## Call:
## survreg(formula = Surv(events, censor) ~ 1, dist = "weibull")
##               Value Std. Error     z     p
## (Intercept)  0.0603     0.0326  1.85 0.065
## Log(scale)  -0.0194     0.0248 -0.79 0.432
## 
## Scale= 0.981 
## 
## Weibull distribution
## Loglik(model)= -1052   Loglik(intercept only)= -1052
## Number of Newton-Raphson Iterations: 6 
## n= 1000
X = runif(1000)
censor = rep(1,1000)
events = rexp(1000)^0.5 * exp(2*X-1)
M = survreg(Surv(events, censor) ~ X, dist='weibull')
summary(M)
## 
## Call:
## survreg(formula = Surv(events, censor) ~ X, dist = "weibull")
##               Value Std. Error     z      p
## (Intercept) -1.0071     0.0315 -32.0 <2e-16
## X            2.0110     0.0540  37.2 <2e-16
## Log(scale)  -0.6994     0.0247 -28.4 <2e-16
## 
## Scale= 0.497 
## 
## Weibull distribution
## Loglik(model)= -585.6   Loglik(intercept only)= -982.8
##  Chisq= 794.41 on 1 degrees of freedom, p= 8.9e-175 
## Number of Newton-Raphson Iterations: 6 
## n= 1000
M2 = survreg(Surv(events, censor) ~ X, dist='exponential')
summary(M2)
## 
## Call:
## survreg(formula = Surv(events, censor) ~ X, dist = "exponential")
##               Value Std. Error     z      p
## (Intercept) -1.1134     0.0628 -17.7 <2e-16
## X            1.9812     0.1089  18.2 <2e-16
## 
## Scale fixed at 1 
## 
## Exponential distribution
## Loglik(model)= -874   Loglik(intercept only)= -1035.5
##  Chisq= 323.06 on 1 degrees of freedom, p= 3.1e-72 
## Number of Newton-Raphson Iterations: 4 
## n= 1000

Censoring

\[ (O_i, \delta_i, X_i) \]

with \(O_i\) the observed “time” for subject \(i\) and \(\delta_i=1\) denotes the time of interest was observed, \(\delta_i=0\) denotes this observation was right-censored.

Likelihood under censoring

\[ \left(\prod_{i:\delta_i=1} f_{\beta}(O_i;X_i) \right) \left(\prod_{i:\delta_i=0} S_{\beta}(O_i;X_i) \right) \]

\[ \prod_{i} h_{\beta}(O_i;X_i)^{\delta_i} S_{\beta}(O_i;X_i) \]

\[ f(t) = h(t) S(t). \]

Tcens = rexp(length(events)) * 1.5
observed = pmin(events, Tcens)
censor = (events <= Tcens)
M3 = survreg(Surv(observed, censor) ~ X, dist='weibull')
summary(M3)
## 
## Call:
## survreg(formula = Surv(observed, censor) ~ X, dist = "weibull")
##               Value Std. Error     z      p
## (Intercept) -1.0243     0.0370 -27.7 <2e-16
## X            2.0303     0.0723  28.1 <2e-16
## Log(scale)  -0.7227     0.0305 -23.7 <2e-16
## 
## Scale= 0.485 
## 
## Weibull distribution
## Loglik(model)= -341.6   Loglik(intercept only)= -624.1
##  Chisq= 565.11 on 1 degrees of freedom, p= 6.5e-125 
## Number of Newton-Raphson Iterations: 6 
## n= 1000

Nelson-Aalen with censoriing

\[ \hat{H}(t) = \sum_{j: \delta_j = 1, T_j \leq t} \frac{d_j}{n_j} \]

Nelson-Aalen: variance with censoring

\[ \widehat{Var}(\hat{H}(t))= \sum_{\delta_j=1, T_j \leq t} \frac{d_j}{n_j (n_j - d_j)} \]

\[ \widehat{Var}(\hat{H}(t))= \sum_{\delta_j=1, T_j \leq t} \frac{d_j}{n_j^2} \]

\[ \hat{S}(t) = \exp(-\hat{H}(t)) \]

events = rexp(1000)
Tcens = rexp(1000) * 3
observed = pmin(events, Tcens)
censor = (events < Tcens)
H = basehaz(coxph(Surv(observed, censor) ~ 1))
plot(H$time, H$hazard, type='l')
abline(0, 1, col='red', lty=2)

plot(H$time, exp(-H$hazard), type='l')
lines(H$time, exp(-H$time), col='red')

What happens if we ignore censoring?

ignore_censor = rep(1, length(censor))
Hbias = basehaz(coxph(Surv(observed, ignore_censor) ~ 1))
plot(Hbias$time, exp(-Hbias$hazard), type='l')
lines(H$time, exp(-H$time), col='red')

Kaplan-Meier

\[ \hat{S}(t) = \prod_{j:\delta_j=1, T_j \leq t} \left(1 - \frac{d_j}{n_j} \right) \]

\[ P(T > v_i) = \prod_{j=1}^{i} P(T > v_{j} | T > v_{j-1}) \]

with \(V=(0,T_1,\dots,T_{\sum_i \delta_i}, \infty)\) and

\[ \widehat{ P}(T > T_{j} | T > T_{j-1}) = 1 - \frac{d_j}{n_j}. \]

plot(survfit(Surv(observed, censor) ~ 1), lwd=4)
lines(H$time, exp(-H$hazard), col='yellow', lwd=2)

plot(survfit(Surv(observed, censor) ~ 1), lwd=4, xlim=c(2,3), ylim=c(0,0.2))
lines(H$time, exp(-H$hazard), col='yellow', lwd=2)

plot(survfit(Surv(observed, censor) ~ 1), lwd=4)
lines(H$time, exp(-H$time), col='red', lwd=2) # truth

Bias when we ignore censoring

plot(survfit(Surv(observed, rep(1, length(censor))) ~ 1))
F = survfit(Surv(observed, censor) ~ 1)
lines(H$time, exp(-H$time), col='red')

Comparing populations

Log-rank test

Contingency table

Status
Failed Survived Total
Sample 1 \(d_{1,j}\) \(|R_1(T_j)| - d_{1,j}\) \(|R_1(T_j)|\)
2 \(d_{2,j}\) \(|R_2(T_j)| - d_{2,j}\) \(|R_2(T_j)|\)
Total \(d_j = d_{1,j} + d_{2,j}\) \(|R(T_j)| - d_j\) \(|R(T_j)|\)

Comparing populations

Log-rank test

\[ \frac{d_j \cdot |R_1(T_j)|}{|R(T_j)|} \]

\[ \frac{\sum_{j: d_j \neq 0} d_{1,j} - \frac{d_j |R_1(T_j)|}{|R(T_j)|}} {\sqrt{\sum_{j: d_j \neq 0} \text{Var}(d_{1,j} | |R_1(T_j)|, |R(T_j)|, d_j) }} \overset{H_0}{\Rightarrow} N(0,1) \]

brainCancer = read.table(
    'https://www.stanford.edu/class/stats305b/data/BrainCancer.csv',
    sep=',',
    header=TRUE)
names(brainCancer)
## [1] "sex"       "diagnosis" "loc"       "ki"        "gtv"       "stereo"   
## [7] "status"    "time"
plot(survfit(Surv(time, status) ~ sex, data=brainCancer), lty=2:3) # Kaplan-Meier does not drop to 0 -- why?

survdiff(Surv(time, status) ~ sex, data=brainCancer)
## Call:
## survdiff(formula = Surv(time, status) ~ sex, data = brainCancer)
## 
##             N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=Female 45       15     18.5     0.676      1.44
## sex=Male   43       20     16.5     0.761      1.44
## 
##  Chisq= 1.4  on 1 degrees of freedom, p= 0.2

Cox (proportional hazards) model

\[ h_{\beta}(t;x) = \exp(x^T\beta) \cdot h_0(t) \]

with baseline hazard rate \(h_0\).

Partial likelihood

\[ \frac{\exp(X_i^T\beta)}{\sum_{j \in R(t)} \exp(X_j^T\beta)} \]

\[ \prod_{i: \delta_i=1} \left(\frac{\exp(X_i^T\beta)}{\sum_{j \in R(T_i)} \exp(X_j^T\beta)} \right) = \prod_{i: \delta_i=1} \left(\frac{\exp(X_i^T\beta)}{\sum_{j: O_j \geq O_i} \exp(X_j^T\beta)} \right) \]

Partial likelihood when no ties

\[ \begin{aligned} P_{\beta}(T_1<T_2) &= \int_0^{\infty} \left[\int_{t_1}^{\infty} e^{\eta_2} h_0(t_2) e^{-e^{\eta_2} H_0(t_2)} \; dt_2 \right] \\ & \qquad \cdot e^{\eta_1} h_0(t_1) e^{-e^{\eta_2} H_0(t_1)} \; dt_1. \end{aligned} \]

Inner integral

\[ \int_{t_1}^{\infty} e^{\eta_2} h_0(t_2) e^{-e^{\eta_2} H_0(t_2)} \; dt_2 = e^{-e^{\eta_2} H_0(t_1)}. \]

Final integral

\[ \int_0^{\infty} e^{\eta_1} h_0(t) e^{-(e^{-\eta_1}+e^{-\eta_2}) H_0(t_1)} \; dt_1 = \frac{e^{\eta}_1}{e^{\eta_1}+e^{\eta_2}} \]

brainCancer = read.table(
    'https://www.stanford.edu/class/stats305b/data/BrainCancer.csv',
    sep=',',
    header=TRUE)
library(survival)
M = coxph(Surv(time, status) ~ sex, data=brainCancer)
summary(M) # why no intercept?
## Call:
## coxph(formula = Surv(time, status) ~ sex, data = brainCancer)
## 
##   n= 88, number of events= 35 
## 
##           coef exp(coef) se(coef)     z Pr(>|z|)
## sexMale 0.4077    1.5033   0.3420 1.192    0.233
## 
##         exp(coef) exp(-coef) lower .95 upper .95
## sexMale     1.503     0.6652     0.769     2.939
## 
## Concordance= 0.565  (se = 0.045 )
## Likelihood ratio test= 1.44  on 1 df,   p=0.2
## Wald test            = 1.42  on 1 df,   p=0.2
## Score (logrank) test = 1.44  on 1 df,   p=0.2
sqrt(diag(vcov(M)))
##   sexMale 
## 0.3420042
table(factor(brainCancer$diagnosis))
## 
##  HG glioma  LG glioma Meningioma      Other 
##         22          9         42         14
subs = !is.na(brainCancer$diagnosis)
M = coxph(Surv(time, status) ~ sex, data=brainCancer, subset=subs) # one missing diagnosis
M2 = coxph(Surv(time, status) ~ sex + diagnosis, data=brainCancer, subset=subs)
anova(M, M2)
## Analysis of Deviance Table
##  Cox model: response is  Surv(time, status)
##  Model 1: ~ sex
##  Model 2: ~ sex + diagnosis
##    loglik  Chisq Df P(>|Chi|)    
## 1 -136.54                        
## 2 -125.68 21.728  3 7.432e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(M2)
## Call:
## coxph(formula = Surv(time, status) ~ sex + diagnosis, data = brainCancer, 
##     subset = subs)
## 
##   n= 87, number of events= 35 
## 
##                        coef exp(coef) se(coef)      z Pr(>|z|)    
## sexMale              0.0353    1.0359   0.3617  0.098   0.9223    
## diagnosisLG glioma  -1.1108    0.3293   0.5632 -1.972   0.0486 *  
## diagnosisMeningioma -1.9822    0.1378   0.4382 -4.524 6.07e-06 ***
## diagnosisOther      -1.1891    0.3045   0.5113 -2.325   0.0200 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                     exp(coef) exp(-coef) lower .95 upper .95
## sexMale                1.0359     0.9653   0.50990    2.1046
## diagnosisLG glioma     0.3293     3.0368   0.10919    0.9931
## diagnosisMeningioma    0.1378     7.2586   0.05837    0.3252
## diagnosisOther         0.3045     3.2842   0.11177    0.8295
## 
## Concordance= 0.749  (se = 0.038 )
## Likelihood ratio test= 23.51  on 4 df,   p=1e-04
## Wald test            = 23.62  on 4 df,   p=1e-04
## Score (logrank) test = 29.69  on 4 df,   p=6e-06

Score

\[ \sum_{i:\delta_i=1} \left(\frac{\sum_{j: O_j \geq O_i} X_j \exp(X_j^T\beta)}{\sum_{j: O_j \geq O_i} \exp(X_j^T\beta)} - X_i \right) \]

\[ \sum_{i:\delta_i=1} \left( E_{\beta,i}[X] - X_i \right) \]

Hessian

Relation between score and log-rank test

\[ E_{0,i}[X] = \frac{d_i \cdot \sum_{j:O_j \geq O_i} X_i}{ \sum_{j:O_j \geq O_i} 1 } = \frac{d_i \cdot \# R_1(T_i)}{\# R(T_i)} \]

Variations of log-rank test

Weighted log-rank

\[ \frac{\sum_{i:\delta_i=1} W(T_i)(X_i - E_{0,i}[X])}{\left(\sum_{i:\delta_i=1} W(T_i)^2 \text{Var}_{0,i}[X]\right)^{1/2}} \]

Multivariate for \(K\) groups:

\[ \left(\sum_{i:\delta_i=1} W(T_i)(X_i - E_{0,i}[X])\right)^T \left(\sum_{i:\delta_i=1} W(T_i)^2 \text{Var}_{0,i}[X]\right)^{-1} \left(\sum_{i:\delta_i=1} W(T_i)(X_i - E_{0,i}[X])\right) \overset{H_0}{\approx} \chi^2_{K-1}. \]

survdiff(Surv(time, status) ~ sex, data=brainCancer)
## Call:
## survdiff(formula = Surv(time, status) ~ sex, data = brainCancer)
## 
##             N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=Female 45       15     18.5     0.676      1.44
## sex=Male   43       20     16.5     0.761      1.44
## 
##  Chisq= 1.4  on 1 degrees of freedom, p= 0.2
summary(coxph(Surv(time, status) ~ sex, data=brainCancer))
## Call:
## coxph(formula = Surv(time, status) ~ sex, data = brainCancer)
## 
##   n= 88, number of events= 35 
## 
##           coef exp(coef) se(coef)     z Pr(>|z|)
## sexMale 0.4077    1.5033   0.3420 1.192    0.233
## 
##         exp(coef) exp(-coef) lower .95 upper .95
## sexMale     1.503     0.6652     0.769     2.939
## 
## Concordance= 0.565  (se = 0.045 )
## Likelihood ratio test= 1.44  on 1 df,   p=0.2
## Wald test            = 1.42  on 1 df,   p=0.2
## Score (logrank) test = 1.44  on 1 df,   p=0.2
survdiff(Surv(time, status) ~ sex, 
         data=brainCancer,
         rho=1)
## Call:
## survdiff(formula = Surv(time, status) ~ sex, data = brainCancer, 
##     rho = 1)
## 
##             N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=Female 45     11.3     14.4     0.661      1.73
## sex=Male   43     16.0     13.0     0.735      1.73
## 
##  Chisq= 1.7  on 1 degrees of freedom, p= 0.2

An example with \(K=3\)

survdiff(Surv(time, status) ~ diagnosis, data=brainCancer)
## Call:
## survdiff(formula = Surv(time, status) ~ diagnosis, data = brainCancer)
## 
## n=87, 1 observation deleted due to missingness.
## 
##                       N Observed Expected (O-E)^2/E (O-E)^2/V
## diagnosis=HG glioma  22       17     5.69   22.4456   27.5457
## diagnosis=LG glioma   9        4     3.75    0.0167    0.0188
## diagnosis=Meningioma 42        9    20.26    6.2583   15.2537
## diagnosis=Other      14        5     5.30    0.0165    0.0196
## 
##  Chisq= 29.7  on 3 degrees of freedom, p= 2e-06
summary(coxph(Surv(time, status) ~ diagnosis, data=brainCancer))
## Call:
## coxph(formula = Surv(time, status) ~ diagnosis, data = brainCancer)
## 
##   n= 87, number of events= 35 
##    (1 observation deleted due to missingness)
## 
##                        coef exp(coef) se(coef)      z Pr(>|z|)    
## diagnosisLG glioma  -1.1072    0.3305   0.5620 -1.970   0.0488 *  
## diagnosisMeningioma -1.9937    0.1362   0.4221 -4.723 2.32e-06 ***
## diagnosisOther      -1.1872    0.3051   0.5109 -2.324   0.0201 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                     exp(coef) exp(-coef) lower .95 upper .95
## diagnosisLG glioma     0.3305      3.026   0.10985    0.9943
## diagnosisMeningioma    0.1362      7.342   0.05955    0.3115
## diagnosisOther         0.3051      3.278   0.11207    0.8305
## 
## Concordance= 0.725  (se = 0.044 )
## Likelihood ratio test= 23.5  on 3 df,   p=3e-05
## Wald test            = 23.61  on 3 df,   p=3e-05
## Score (logrank) test = 29.68  on 3 df,   p=2e-06
survdiff(Surv(time, status) ~ diagnosis,
         data=brainCancer,
         rho=1)
## Call:
## survdiff(formula = Surv(time, status) ~ diagnosis, data = brainCancer, 
##     rho = 1)
## 
## n=87, 1 observation deleted due to missingness.
## 
##                       N Observed Expected (O-E)^2/E (O-E)^2/V
## diagnosis=HG glioma  22    14.13     4.73  18.71252   27.5305
## diagnosis=LG glioma   9     2.72     2.88   0.00869    0.0122
## diagnosis=Meningioma 42     6.53    15.41   5.11289   14.8544
## diagnosis=Other      14     3.86     4.23   0.03263    0.0474
## 
##  Chisq= 29.3  on 3 degrees of freedom, p= 2e-06

Baseline hazard

\[ \widehat{H}_B(t;\beta) = \sum_{i:\delta_i=1, T_i \leq t} \frac{d_i}{\sum_{j \in R(T_i)} \exp(X_j^T\beta)} \]

Estimated survival function

\[ S_{\widehat{\beta}}(t;X_{new})=\exp\left(-\exp(X_{new}^T\widehat{\beta}) \cdot \widehat{H}_B(t;\widehat{\beta}) \right) \]

Profile likelihood

\[ \ell_{\text{P}}(\beta) = \inf_{\theta} \ell(\beta,\theta) \]

with minimizer \(\widehat{\beta}_{\text{P}}\)

\[ \widehat{\beta}_{\text{P}} = \widehat{\beta}_{MLE}! \]

Profile likelihood

Profile likelihood

Example: OLS

\[ \ell_{\text{P}}(\beta) = \frac{n}{2} \log(\|Y-X\beta\|^2_2) \]

and

\[ \nabla^2 \ell_{\text{P}}(\widehat{\beta}_{\text{P}}) = \frac{n}{\|Y-X\widehat{\beta}_{\text{P}}\|^2_2} X^TX. \]

Breslow estimate of hazard (sketch, no ties)

\[ \prod_{i=1}^n \left(e^{X_i^T\beta} h_i \right)^{\delta_i} \left(\exp\left(-e^{X_i^T\beta} \sum_{j:O_j \leq O_i} h_j\right) \right) \]

\[ \left(\sum_{i=1}^n \delta_i (X_i^T\beta + \log h_i) \right) - \left( \sum_{i=1}^n e^{X_i^T\beta} \sum_{j:O_j \leq O_i} h_j\right) \]

\[ \left(\sum_{i=1}^n \delta_i (X_i^T\beta + \log h_i) \right) - \left( \sum_{j=1}^n h_j \sum_{i:O_i \geq O_j} e^{X_i^T\beta} \right) \]

\[ \left(\sum_{i=1}^n \delta_i (X_i^T\beta + \log h_i) \right) - \left( \sum_{i=1}^n h_i \sum_{j:O_j \geq O_i} e^{X_j^T\beta} \right) \]

\[ \frac{\partial}{\partial h_i}: \frac{\delta_i}{h_i} - \sum_{j:O_j \geq O_i} e^{X_j^T\beta} \]

\[ h_i(\beta) = \frac{\delta_i}{\sum_{j:O_j \geq O_i} e^{X_j^T\beta}}. \]

\[ \sum_{i=1}^n \delta_i (X_i^T\beta) - \log\left(\sum_{j:O_j \geq O_i} e^{X_j^T\beta}\right) \]

\[ \left(\sum_{i=1}^n \frac{\delta_i}{\sum_{k:O_k \geq O_i} e^{X_k^T\beta}} \sum_{j:O_j \geq O_i} e^{X_j^T\beta} \right) = \sum_{i=1}^n \delta_i. \]

Diagnostics

Residuals (several)

\[ \delta_i - \exp(X_i'\widehat{\beta}) \cdot \widehat{H}_B(O_i;\widehat{\beta}) \]

\[ -\log \left(S_{\widehat{\beta}}(O_i;X_i)\right) = \exp(X_i'\widehat{\beta}) \cdot \widehat{H}_B(T_i;\widehat{\beta}) \]

M3 = coxph(Surv(time, status) ~ sex + ki + diagnosis + gtv, data=brainCancer)
summary(M3)
## Call:
## coxph(formula = Surv(time, status) ~ sex + ki + diagnosis + gtv, 
##     data = brainCancer)
## 
##   n= 87, number of events= 35 
##    (1 observation deleted due to missingness)
## 
##                         coef exp(coef) se(coef)      z Pr(>|z|)    
## sexMale              0.14985   1.16166  0.36031  0.416  0.67749    
## ki                  -0.05484   0.94664  0.01832 -2.993  0.00277 ** 
## diagnosisLG glioma  -1.16457   0.31206  0.57321 -2.032  0.04219 *  
## diagnosisMeningioma -2.20885   0.10983  0.44137 -5.005  5.6e-07 ***
## diagnosisOther      -1.53888   0.21462  0.53680 -2.867  0.00415 ** 
## gtv                  0.04190   1.04279  0.02033  2.061  0.03934 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                     exp(coef) exp(-coef) lower .95 upper .95
## sexMale                1.1617     0.8608   0.57330    2.3538
## ki                     0.9466     1.0564   0.91324    0.9813
## diagnosisLG glioma     0.3121     3.2046   0.10146    0.9597
## diagnosisMeningioma    0.1098     9.1053   0.04624    0.2609
## diagnosisOther         0.2146     4.6594   0.07495    0.6146
## gtv                    1.0428     0.9590   1.00205    1.0852
## 
## Concordance= 0.782  (se = 0.042 )
## Likelihood ratio test= 40.5  on 6 df,   p=4e-07
## Wald test            = 38.44  on 6 df,   p=9e-07
## Score (logrank) test = 46.21  on 6 df,   p=3e-08
plot(predict(M3), resid(M3)) # default martingale residuals
abline(h=0, lty=2)

cox_snell = pmax(brainCancer$status[subs] - resid(M3), 0)
H = basehaz(coxph(Surv(cox_snell, brainCancer$status[subs]) ~ 1))
plot(H$time, H$hazard)
abline(c(0, 1))

Regularization

LASSO

\[ \text{minimize}_{\beta} \ell(X\beta) + \lambda \|\beta\|_1. \]

library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.0-2
M = coxph(Surv(time, status) ~ ., data=brainCancer)
D = model.frame(M) # treats diagnosis as continuous
X = model.matrix(M)[,-1]
Y = D[,1]
plot(glmnet(X, Y, family='cox'))

Ridge

plot(glmnet(X, Y, family='cox', alpha=0))

Elastic Net

plot(glmnet(X, Y, family='cox', alpha=0.2))

plot(cv.glmnet(X, Y, family='cox'))

Cross-validation score

Harrell’s concordance score (no ties)

plot(cv.glmnet(X, Y, family='cox', type.measure='C'))

Verifying proportional hazard assumption

plot(survfit(Surv(time, status) ~ diagnosis, data=brainCancer), lty=c(1:4))

plot(survfit(Surv(time, status) ~ sex, data=brainCancer), lty=c(1:2))

Verifying proportional hazard assumption

Residuals (continued)

Using Schoenfeld residuals to test PH

\[ \sum_{i:\delta_i=1} \ell_i(\beta(T_i)) \]

where

\[ \ell_i(\beta(t)) = \left[X_i^T\beta(t) - \log\left(\sum_{j: j \in R(t)} \exp(X_j^T\beta(t)) \right) \right] \]

\[ \begin{aligned} \ell_i(\widehat{\beta}_{MLE} + \zeta_i) - \ell_i(\widehat{\beta}_{MLE}) &= \nabla \ell_i(\widehat{\beta}_{MLE})^T\zeta_i + \frac{1}{2} \zeta_i^T\nabla^2 \ell_i(\widehat{\beta}_{MLE}) \zeta_i + \dots \\ &= S_i^T\zeta_i - \frac{1}{2} \zeta_i^T\text{Var}_{\widehat{\beta}_{MLE},i}[X]\zeta_i + \dots \end{aligned} \]

Residual plot

\[ \widehat{\zeta}_i = S_i^*(\widehat{\beta}_{MLE}) \]

\[ (T_i, \widehat{\beta}_{MLE,j} + \widehat{\zeta}_{i,j}), \qquad \delta_i=1, 1 \leq j \leq p \]

should look flat if \(\beta_j(t) \equiv \beta_j\).

par(mfrow=c(2,1))
plot(cox.zph(M2))

M2
## Call:
## coxph(formula = Surv(time, status) ~ sex + diagnosis, data = brainCancer, 
##     subset = subs)
## 
##                        coef exp(coef) se(coef)      z        p
## sexMale              0.0353    1.0359   0.3617  0.098   0.9223
## diagnosisLG glioma  -1.1108    0.3293   0.5632 -1.972   0.0486
## diagnosisMeningioma -1.9822    0.1378   0.4382 -4.524 6.07e-06
## diagnosisOther      -1.1891    0.3045   0.5113 -2.325   0.0200
## 
## Likelihood ratio test=23.51  on 4 df, p=0.0001003
## n= 87, number of events= 35

Formal test of PH assumption

cox.zph(M2)
##           chisq df    p
## sex       0.671  1 0.41
## diagnosis 1.337  3 0.72
## GLOBAL    1.951  4 0.74
cox.zph(M3)
##            chisq df    p
## sex       0.9708  1 0.32
## ki        0.0547  1 0.81
## diagnosis 1.1849  3 0.76
## gtv       0.1370  1 0.71
## GLOBAL    2.5483  6 0.86