Survival analysis

  • Very brief description of data generating mechanism and models.

  • A large part of a full course (EPI262).

  • Assuming no censoring, we observe \((T_i, X_i)\), with \(T_i\) an event / failure time and \(X_i\) explanatory covariates.

  • With (right) censoring, we observe \((O_i, \delta_i, X_i)\), where \(O_i\) is a time (either an event or a censoring time) and \(\delta_i\) denotes whether the time is a true event or not.

  • Goal: understand \(F_{X_i} = {\cal L}(T_i|X_i)\) where \(T_i\) are the true event times.

Estimands

  • Given a candidate \(F_{X_i}\) and age \(a\) we might be interested in remaining lifetime

\[ E_{F_{X_i}}(T - a | T \geq a) \]
  • Or perhaps the median

\[ \text{median}_{F_{X_i}}(T - a | T \geq a). \]
  • No natural loss function as in linear / logistic regression.

  • If we can model / estimate \({\cal L}(T|X)\) then we can derive estimates of the above quantities (a simulation sketch follows this list).

  • Tests of equality of distribution functions are common rather than tests about means / medians.
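  • As a quick illustration (a toy example of mine, not from the notes), both estimands can be computed by simulation when \(F\) is the standard exponential and \(a=1\); by memorylessness the conditional mean is 1 and the conditional median is \(\log 2\).

set.seed(0)
Tsim = rexp(1e6)                                    # draws from F = Exp(1)
a = 1
c(mean_remaining   = mean(Tsim[Tsim >= a] - a),     # E(T - a | T >= a), should be near 1
  median_remaining = median(Tsim[Tsim >= a] - a))   # should be near log(2) ~ 0.693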


Hazard function

  • For a survival distribution \(F\), many quantities of interest can be more simply expressed using hazard

\[ H([t,t+dt)) = P_F(T < t+dt|T \geq t) \]
  • When a density exists, we get the hazard rate

\[ \begin{aligned} H([t,t+dt)) \approx h(t) \; dt \end{aligned} \]

or

\[\begin{split} \begin{aligned} h(t) &= \frac{f(t)}{S(t)} \\ &= - \frac{d}{dt} \log S(t) \end{aligned} \end{split}\]

with \(S(t) = \bar{F}(t) = 1 - F(t)\) the survival function.


Cumulative hazard

  • Define

\[ H(t) = \int_{[0,t]} h(s) \; ds \]
  • Or,

\[ S(t) = \exp(-H(t)). \]

Hazard function for discrete laws

  • For discrete distributions supported on \(\{t_j\}\), the hazard is 0 everywhere except at the support points

\[\begin{split} h(t) = \begin{cases} \dfrac{P(T=t)}{\sum_{s \geq t} P(T=s)} & \text{if $t$ is a support point} \\ 0 & \text{otherwise,} \end{cases} \end{split}\]

and note that \(h(t) \leq 1\).
  • “Natural” cumulative hazard

\[ H(t) = \sum_{j:t_j \leq t} h(t_j) \]
  • This does not satisfy the property

\[ S(t) = \exp(-H(t)). \]
  • It does hold for the alternative cumulative hazard

\[ \tilde{H}(t) = \sum_{j:t_j \leq t} -\log(1 - h(t_j)) \]
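  • As a small numerical check (a toy example of mine, not from the notes): for a discrete law on \(\{1,2,3\}\), \(\exp(-H(t))\) differs from \(S(t)\) while \(\exp(-\tilde{H}(t))\) matches it exactly.

p = c(0.2, 0.5, 0.3)                     # P(T=1), P(T=2), P(T=3)
S_geq = rev(cumsum(rev(p)))              # P(T >= t_j) at each support point
h = p / S_geq                            # discrete hazard h(t_j)
S = S_geq - p                            # survival function S(t_j) = P(T > t_j)
rbind(S = S,
      exp_minus_H      = exp(-cumsum(h)),            # does not equal S
      exp_minus_Htilde = exp(-cumsum(-log(1 - h))))  # equals S exactly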

Hazard models

  • Many models in survival analysis model the hazard function (or cumulative version).

  • The hazard function just has to be non-negative (and its integral must eventually diverge to \(\infty\)…).

Constant hazard

  • If \(T \sim \text{Exp}(\lambda)\) (parameterized by its mean \(\lambda\)) with density

\[ f_T(t) = \frac{1}{\lambda} e^{-t/\lambda} 1_{[0,\infty)}(t) \]
  • Then \(S(t) = e^{-t/\lambda}\) and

\[ h(t) = \frac{1}{\lambda} \]

and

\[ H(t) = t / \lambda. \]
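  • A quick numerical check of \(h = f/S\) for this parameterization (toy code of mine; note that R’s dexp / pexp use the rate \(1/\lambda\)):

lambda = 2
tgrid = seq(0.1, 5, by=0.1)
hazard = dexp(tgrid, rate=1/lambda) / pexp(tgrid, rate=1/lambda, lower.tail=FALSE)
range(hazard)                            # constant, equal to 1/lambda = 0.5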

Estimating a hazard rate

Nelson-Aalen

  • Non-parametric estimate (assuming no ties in data)

\[ \hat{H}(t) = \sum_{j:T_j \leq t} \frac{1}{ |\left\{i: T_i \geq T_j\right\}|} = \sum_{j:T_j \leq t} \frac{1}{ |R(T_j)|} \]

where \(R(t) = \left\{j:T_j \geq t \right\}\) is the risk set at time \(t\).

  • This assumes no censoring – coming up…

library(survival)
events = rexp(100)                            # Exp(1) event times
censor = rep(1, 100)                          # no censoring
H = basehaz(coxph(Surv(events, censor)~1))    # Breslow estimate of the baseline cumulative hazard
plot(H$time, H$hazard, type='l')
abline(0, 1, col='red', lty=2)                # true cumulative hazard H(t) = t
[Figure: estimated cumulative hazard \(\hat{H}(t)\) (black) vs. the true \(H(t)=t\) (red, dashed)]

Nelson-Aalen estimator

  • Asymptotic properties can be analyzed using techniques from point processes in which \(H\) is the “cumulative intensity”.

  • Fluctuation process

\[ (\hat{H}(t) - H(t))_{t \geq 0} \]

can be analyzed using martingale techniques to provide pointwise confidence intervals for \(H\).


Nelson-Aalen variance estimate

  • Standard error

\[ \widehat{Var}(\hat{H}(t)) = \sum_{j:T_j \leq t} \frac{1}{|R(T_j)|^2} \]
  • Can use the delta method to get a 95% confidence interval for \(S(t)\) based on the Breslow estimator of survival:

\[ \exp(-\hat{H}(t)) \pm 1.96 \cdot \exp\left(-\hat{H}(t)\right) \sqrt{\sum_{j:T_j \leq t} \frac{1}{|R(T_j)|^2}} \]
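  • A sketch of these formulas computed by hand on simulated, uncensored data (my own code, not from the notes); in recent versions of the survival package, the cumhaz component of survfit should agree with the hand-computed Nelson-Aalen estimate.

set.seed(1)
ev = rexp(200)                          # uncensored sample
n_at_risk = 200:1                       # |R(T_j)| when the times are sorted
H_hat = cumsum(1 / n_at_risk)           # Nelson-Aalen estimate at the sorted times
var_H = cumsum(1 / n_at_risk^2)         # variance estimate above
S_hat = exp(-H_hat)                     # Breslow estimator of survival
CI = cbind(lower = S_hat - 1.96 * S_hat * sqrt(var_H),
           upper = S_hat + 1.96 * S_hat * sqrt(var_H))
fit = survfit(Surv(ev, rep(1, 200)) ~ 1)
max(abs(H_hat - fit$cumhaz))            # essentially zero if cumhaz is the Nelson-Aalen estimate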

Where does this variance estimate come from?

  • Roughly, partitioning \([0,t]\) finely, set \(s_j=tj/m\)

\[ \widehat{H}(t) = \sum_{j=0}^{m-1} \left(\widehat{H}(s_{j+1}) - \widehat{H}(s_j)\right) \]

and

\[ \widehat{H}(s_{j+1}) - \widehat{H}(s_j) = \frac{\sum_{i=1}^n 1_{[s_j,s_{j+1})}(T_i)}{|R(s_j)|} \]
  • A reasonable heuristic

\[\widehat{H}(s_{j+1}) - \widehat{H}(s_j) \biggl| |R(s_j)| \sim \frac{1}{|R(s_j)|} \text{Poisson}(|R(s_j)| h(s_j) (s_{j+1}-s_j)) \]

leads to (conditional) variance above

\[ \frac{h(s_j) (s_{j+1}-s_j)}{|R(s_j)|} \]

  • Summing all these conditional variances (this uses the martingale property) leads to

\[ \int_{[0,t]} \frac{1}{|R(s)|} h(s) \; ds \]
  • We can approximate integrals against \(h(s)\,ds\) by integrals against \(\hat{H}(ds)\)

\[\int_{[0,t]} f(s) h(s)\; ds \approx \int_{[0,t]} f(s) \hat{H}(ds)\]
  • Or,

\[ \int_{[0,t]} f(s) h(s)\; ds \approx \sum_{j:T_j \leq t} f(T_j) \frac{1}{|R(T_j)|} \]
  • With \(f(t)=|R(t)|^{-1}\) this yields

\[ \sum_{j:T_j \leq t} \frac{1}{|R(T_j)|^2}. \]

# a larger sample
events = rexp(1000)
censor = rep(1, 1000)
H = basehaz(coxph(Surv(events, censor)~1))
plot(H$time, H$hazard, type='l')
abline(0, 1, col='red', lty=2)
[Figure: the same comparison of \(\hat{H}(t)\) with \(H(t)=t\), now with \(n=1000\)]

Models

Proportional hazards

  • Models the hazard rate

\[ h_{\beta}(t;x) = \exp(x^T\beta) \cdot h_0(t) \]

with baseline hazard rate \(h_0\).

  • In principle, the exponential could be swapped with another link.
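  • A small simulation sketch (mine, not from the notes): with a constant baseline \(h_0 \equiv 1\), the model says \(T | x \sim \text{Exponential}(\text{rate} = e^{x\beta})\), so we can simulate directly and check that coxph recovers \(\beta\).

library(survival)
set.seed(2)
x = rnorm(2000)
beta = 1
T_ph = rexp(2000, rate = exp(x * beta))       # hazard is exp(x * beta) * h_0(t), with h_0 = 1
fit_ph = coxph(Surv(T_ph, rep(1, 2000)) ~ x)
coef(fit_ph)                                  # should be close to beta = 1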

Cox’s model

  • Allows nonparametric model for \(h_0\).

  • Example of a semiparametric model.

  • We will come back to Cox model shortly.


Accelerated failure time

  • Assumes \(T_i = \theta(X_i^T\beta)^{-1} \cdot Z_i\) where the law of \(Z_i\) does not depend on any parameters.

  • It is not hard to see this corresponds to a hazard of the form

\[ h_{\beta}(t;x) = \theta(x^T\beta) \cdot h_0(\theta(x^T\beta) t) \]

with \(h_0\) the hazard function of \(Z_i\).

  • Usually a parametric model for \(Z_i\), e.g. Weibull, log-logistic, etc.


  • Equivalent to

\[ \log(T_i) = -\log(\theta(X_i^T\beta)) + \log Z_i \]
  • Choosing \(\theta(\eta)=\exp(-\eta)\) yields

\[ \log(T_i) = X_i^T\beta + \log Z_i \]
  • Just a parametric regression model – errors are not Gaussian.
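  • A quick sketch (my own simulation): with \(\theta(\eta)=\exp(-\eta)\), even ordinary least squares on \(\log T\) recovers the slope, with the intercept absorbing \(E[\log Z]\).

set.seed(5)
x_aft = runif(2000)
Z = rexp(2000)                      # baseline error distribution for the AFT model
T_aft = exp(1 + 2 * x_aft) * Z      # log(T) = 1 + 2*x + log(Z)
coef(lm(log(T_aft) ~ x_aft))        # slope should be close to 2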


summary(survreg(Surv(events, censor)~1, dist='weibull'))
Call:
survreg(formula = Surv(events, censor) ~ 1, dist = "weibull")
              Value Std. Error     z    p
(Intercept)  0.0236     0.0332  0.71 0.48
Log(scale)  -0.0030     0.0246 -0.12 0.90

Scale= 0.997 

Weibull distribution
Loglik(model)= -1022.3   Loglik(intercept only)= -1022.3
Number of Newton-Raphson Iterations: 6 
n= 1000 

  • Let’s generate some data from a correctly specified model

X = runif(1000)
censor = rep(1,1000)
events = rexp(1000)^0.5 * exp(2*X-1)   # Weibull AFT: log(T) = 2*X - 1 + 0.5*log(E), E ~ Exp(1)
M = survreg(Surv(events, censor) ~ X, dist='weibull')
summary(M)
Call:
survreg(formula = Surv(events, censor) ~ X, dist = "weibull")
              Value Std. Error     z      p
(Intercept) -1.0567     0.0324 -32.6 <2e-16
X            2.0856     0.0546  38.2 <2e-16
Log(scale)  -0.6852     0.0247 -27.7 <2e-16

Scale= 0.504 

Weibull distribution
Loglik(model)= -605.7   Loglik(intercept only)= -1011.4
	Chisq= 811.31 on 1 degrees of freedom, p= 1.9e-178 
Number of Newton-Raphson Iterations: 6 
n= 1000 

  • Using the Weibull distribution estimates a scale parameter (a variance of sorts) in the regression of \(\log T\); the exponential fixes the scale at 1…

M2 = survreg(Surv(events, censor) ~ X, dist='exponential')
summary(M2)
Call:
survreg(formula = Surv(events, censor) ~ X, dist = "exponential")
              Value Std. Error     z      p
(Intercept) -1.1673     0.0636 -18.4 <2e-16
X            2.0651     0.1085  19.0 <2e-16

Scale fixed at 1 

Exponential distribution
Loglik(model)= -883.3   Loglik(intercept only)= -1056.4
	Chisq= 346.3 on 1 degrees of freedom, p= 2.7e-77 
Number of Newton-Raphson Iterations: 4 
n= 1000 

Censoring

  • Let’s imagine a lab running experiments on development of cancer in mice after exposure to different levels of carcinogen.

  • Mice are born on day \(B\), exposed to carcinogen \(X\) and their symptoms are detected at time \(D\) yielding event time \(T=D-B\).

  • Funding for the experiment was only guaranteed up to the end date \(E=\)3/31/2020.

  • Not all mice had symptoms at end date \(E\).

  • For those mice who have presented symptoms, we observe the event time \(O=D-B=T\).

  • For the others, we observe \(O=E-B\) and only know that their symptoms would have presented at some time later than \(O\).

  • This is called right-censoring: we only know that the time of interest lies in the interval \([O,\infty)\).

  • With right-censored data, observations are

\[ (O_i, \delta_i, X_i) \]

with \(O_i\) the observed “time” for subject \(i\); \(\delta_i=1\) denotes that the time of interest was observed, while \(\delta_i=0\) denotes that the observation was right-censored.


Likelihood under censoring

  • In order to fit the model and carry out inference, we usually must assume the censoring is non-informative.

  • For non-informative censoring, the likelihood is

\[ \left(\prod_{i:\delta_i=1} f_{\beta}(O_i;X_i) \right) \left(\prod_{i:\delta_i=0} S_{\beta}(O_i;X_i) \right) \]
  • Or,

\[ \prod_{i} h_{\beta}(O_i;X_i)^{\delta_i} S_{\beta}(O_i;X_i) \]
  • We have used

\[ f(t) = h(t) S(t). \]
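  • As a sketch (my own example, not from the notes): for an exponential model with \(\text{rate}(x) = e^{b_0 + b_1 x}\), the censored log-likelihood above is easy to code and maximize directly. Note that survreg uses the AFT parameterization \(\log T = \gamma_0 + \gamma_1 x + W\), so its coefficients should be roughly the negatives of these.

set.seed(3)
x_sim = runif(1000)
T_sim = rexp(1000, rate = exp(-1 + 2 * x_sim))   # true (b0, b1) = (-1, 2)
C_sim = rexp(1000, rate = 0.5)                   # independent (non-informative) censoring
O_sim = pmin(T_sim, C_sim)
d_sim = as.numeric(T_sim <= C_sim)
negloglik = function(b) {                        # minus the censored log-likelihood
  rate = exp(b[1] + b[2] * x_sim)
  -sum(d_sim * log(rate) - O_sim * rate)         # sum_i [ delta_i * log h + log S ]
}
optim(c(0, 0), negloglik)$par                    # should be near (-1, 2)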

Tcens = rexp(length(events)) * 1.5    # independent censoring times
observed = pmin(events, Tcens)        # observed time O = min(T, C)
censor = (events <= Tcens)            # delta = 1 if the event was observed
M3 = survreg(Surv(observed, censor) ~ X, dist='weibull')
summary(M3)
Call:
survreg(formula = Surv(observed, censor) ~ X, dist = "weibull")
              Value Std. Error     z      p
(Intercept) -1.0666     0.0392 -27.2 <2e-16
X            2.1073     0.0779  27.0 <2e-16
Log(scale)  -0.6881     0.0318 -21.7 <2e-16

Scale= 0.503 

Weibull distribution
Loglik(model)= -340.8   Loglik(intercept only)= -610.9
	Chisq= 540.15 on 1 degrees of freedom, p= 1.8e-119 
Number of Newton-Raphson Iterations: 6 
n= 1000 

Nelson-Aalen with censoring

  • Can still estimate hazard under right-censoring

\[ \hat{H}(t) = \sum_{j: \delta_j = 1, T_j \leq t} \frac{d_j}{n_j} \]
  • \(R(t) = \left\{j: O_j \geq t\right\}\) is the risk set

  • \(d_j\) the number of events at \(T_j\) (1 if no ties possible)

  • \(n_j = |R(T_j)|\) the number at risk just prior to \(T_j\) (i.e. candidates to fail at \(T_j\))


Nelson-Aalen: variance with censoring

  • Standard error is estimated as

\[ \widehat{Var}(\hat{H}(t))= \sum_{\delta_j=1, T_j \leq t} \frac{d_j}{n_j (n_j - d_j)} \]
  • Or (asymptotically similar)

\[ \widehat{Var}(\hat{H}(t))= \sum_{\delta_j=1, T_j \leq t} \frac{d_j}{n_j^2} \]
  • Can similarly use delta method for CI for \(S(t)\) based on Breslow estimator of survival function

\[ \hat{S}(t) = \exp(-\hat{H}(t)) \]
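  • A hand computation of these formulas (the helper name na_hand is mine, not from the notes); it uses the observed / censor vectors simulated above and allows ties.

na_hand = function(time, status) {
  ut = sort(unique(time[status == 1]))                        # distinct event times
  d = sapply(ut, function(s) sum(time == s & status == 1))    # d_j: events at each time
  n = sapply(ut, function(s) sum(time >= s))                  # n_j: number at risk just before
  data.frame(time = ut,
             H = cumsum(d / n),                               # Nelson-Aalen estimate
             varH = cumsum(d / n^2),                          # simpler variance estimate
             varH2 = cumsum(d / (n * (n - d))))               # first form (undefined if n_j = d_j)
}
head(na_hand(observed, censor))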

events = rexp(1000)                  # true event times
Tcens = rexp(1000) * 3               # independent censoring times
observed = pmin(events, Tcens)
censor = (events < Tcens)
H = basehaz(coxph(Surv(observed, censor) ~ 1))
plot(H$time, H$hazard, type='l')
abline(0, 1, col='red', lty=2)
[Figure: Nelson-Aalen cumulative hazard under censoring (black) vs. the true \(H(t)=t\) (red, dashed)]
plot(H$time, exp(-H$hazard), type='l')
lines(H$time, exp(-H$time), col='red')
[Figure: Breslow survival estimate \(\exp(-\hat{H}(t))\) (black) vs. the true survival \(e^{-t}\) (red)]

What happens if we ignore censoring?

ignore_censor = rep(1, length(censor))
Hbias = basehaz(coxph(Surv(observed, ignore_censor) ~ 1))
plot(Hbias$time, exp(-Hbias$hazard), type='l')
lines(H$time, exp(-H$time), col='red')
[Figure: survival estimate when censored times are treated as events (black) vs. the true survival \(e^{-t}\) (red)]

Kaplan-Meier

  • A more commonly used estimate of the survival function

\[ \hat{S}(t) = \prod_{j:\delta_j=1, T_j \leq t} \left(1 - \frac{d_j}{n_j} \right) \]
  • Based on the product formula (for a partition \(V=(v_0=0 < v_1 < \dots < v_m=\infty)\) of \([0,\infty)\))

\[ P(T > v_i) = \prod_{j=1}^{i} P(T > v_{j} | T > v_{j-1}) \]

with \(V=(0,T_1,\dots,T_{\sum_i \delta_i}, \infty)\) and

\[ \widehat{ P}(T > T_{j} | T > T_{j-1}) = 1 - \frac{d_j}{n_j}. \]
  • Note the Taylor expansion \(e^{-h} \approx 1 - h\), so Kaplan-Meier is close to Breslow’s estimator \(\exp(-\hat{H}(t))\) when the \(d_j/n_j\) are small (see the check below).
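  • A quick numerical check (a sketch) on the censored data above, assuming the cumhaz component of survfit is the Nelson-Aalen estimate:

KM = survfit(Surv(observed, censor) ~ 1)
breslow = exp(-KM$cumhaz)            # Breslow estimate from the Nelson-Aalen cumulative hazard
max(abs(KM$surv - breslow))          # small when the d_j / n_j are small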


plot(survfit(Surv(observed, censor) ~ 1), lwd=4)
lines(H$time, exp(-H$hazard), col='yellow', lwd=2)
[Figure: Kaplan-Meier estimate with confidence bands (black) and the Breslow estimate \(\exp(-\hat{H}(t))\) (yellow)]

plot(survfit(Surv(observed, censor) ~ 1), lwd=4, xlim=c(2,3), ylim=c(0,0.2))
lines(H$time, exp(-H$hazard), col='yellow', lwd=2)
[Figure: the same comparison zoomed in to \(t \in [2,3]\) and \(S(t) \in [0, 0.2]\)]

plot(survfit(Surv(observed, censor) ~ 1), lwd=4)
lines(H$time, exp(-H$time), col='red', lwd=2) # truth
[Figure: Kaplan-Meier estimate (black) vs. the true survival \(e^{-t}\) (red)]

Bias when we ignore censoring

plot(survfit(Surv(observed, rep(1, length(censor))) ~ 1))
F = survfit(Surv(observed, censor) ~ 1)
lines(H$time, exp(-H$time), col='red')
[Figure: survival estimate ignoring censoring vs. the true survival \(e^{-t}\) (red)]

Comparing populations

  • Suppose we have two populations from which we observe right-censored survival times \((O_{1,i},\delta_{1,i}), 1 \leq i \leq n_1\) and \((O_{2,i}, \delta_{2,i}), 1 \leq i \leq n_2\).

  • Can we tell if their underlying survival times (some of which we have not observed) come from the same distribution?

  • Null hypothesis \(H_0: F_1 = F_2\) where \(T_{1,i} \overset{IID}{\sim} F_1\) and \(T_{2,i} \overset{IID}{\sim} F_2\).


Log-rank test

  • Suppose we have a failure time \(T_j\) (could have come from either sample).

  • We have two risk sets \(R_1(t) = \left\{i: O_{1,i} \geq t\right\}, R_2(t) = \left\{i: O_{2,i} \geq t\right\}\)

  • The log-rank test models what happens in the interval \([T_j, T_j+dt)\) under \(H_0\).

  • At time \(T_j\), the total (pooled) risk set is \(R(T_j) = R_1(T_j) \cup R_2(T_j)\), with cardinality \(|R(T_j)|\).


Contingency table

  • At each event time \(T_j\), we construct a \(2\times 2\) table

Status of those at risk at time \(T_j\):

               Failed                          Survived                      Total
  Sample 1     \(d_{1,j}\)                     \(|R_1(T_j)| - d_{1,j}\)      \(|R_1(T_j)|\)
  Sample 2     \(d_{2,j}\)                     \(|R_2(T_j)| - d_{2,j}\)      \(|R_2(T_j)|\)
  Total        \(d_j = d_{1,j} + d_{2,j}\)     \(|R(T_j)| - d_j\)            \(|R(T_j)|\)
  • Under \(H_0\), the \(d_j\) deaths assigned at time \(T_j\) should be distributed between the two samples in proportion to the sizes of their risk sets.


Comparing populations

Log-rank test

  • Expected number of deaths under \(H_0\) in Sample 1 at time \(T_j\) is

\[ \frac{d_j \cdot |R_1(T_j)|}{|R(T_j)|} \]
  • Law of \(d_{1,j} | |R_1(T_j)|, |R(T_j)|, d_j\) is Hypergeometric.

  • Summing over \(j\)

\[ \frac{\sum_{j: d_j \neq 0} \left(d_{1,j} - \frac{d_j |R_1(T_j)|}{|R(T_j)|}\right)} {\sqrt{\sum_{j: d_j \neq 0} \text{Var}(d_{1,j} | |R_1(T_j)|, |R(T_j)|, d_j) }} \overset{H_0}{\Rightarrow} N(0,1) \]
  • This is the Mantel-Haenszel test applied to this collection of \(2\times 2\) tables (one per event time)!
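  • A sketch (my own simulation, not from the notes) computing the log-rank statistic from these per-time \(2\times 2\) tables and comparing it with survdiff:

library(survival)
set.seed(4)
grp = rep(1:2, c(80, 120))
time_lr = c(rexp(80), rexp(120, rate=1.3))     # sample 2 has a higher hazard
status_lr = rep(1, 200)                        # no censoring, for simplicity
ut = sort(unique(time_lr[status_lr == 1]))
O = E = V = 0
for (s in ut) {
  n1 = sum(time_lr >= s & grp == 1)
  n2 = sum(time_lr >= s & grp == 2)
  d1 = sum(time_lr == s & grp == 1 & status_lr == 1)
  d = sum(time_lr == s & status_lr == 1)
  n = n1 + n2
  O = O + d1                                   # observed deaths in sample 1
  E = E + d * n1 / n                           # expected deaths under H0
  V = V + d * (n1 / n) * (n2 / n) * (n - d) / max(n - 1, 1)   # hypergeometric variance
}
c(by_hand = (O - E)^2 / V,
  survdiff = survdiff(Surv(time_lr, status_lr) ~ grp)$chisq)  # should agree closely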


brainCancer = read.table(
    'https://www.stanford.edu/class/stats305b/data/BrainCancer.csv',
    sep=',',
    header=TRUE)
names(brainCancer)
'sex' 'diagnosis' 'loc' 'ki' 'gtv' 'stereo' 'status' 'time'
plot(survfit(Surv(time, status) ~ sex, data=brainCancer), lty=2:3) # Kaplan-Meier does not drop to 0 -- why?
[Figure: Kaplan-Meier survival curves by sex for the brain cancer data]

survdiff(Surv(time, status) ~ sex, data=brainCancer)
Call:
survdiff(formula = Surv(time, status) ~ sex, data = brainCancer)

            N Observed Expected (O-E)^2/E (O-E)^2/V
sex=Female 45       15     18.5     0.676      1.44
sex=Male   43       20     16.5     0.761      1.44

 Chisq= 1.4  on 1 degrees of freedom, p= 0.2