Survival analysis

  • Very brief description of data generating mechanism and models.

  • A large part of a full course (EPI262).

  • Assuming no censoring, we observe \((T_i, X_i)\), with \(T_i\) an event / failure time and \(X_i\) explanatory covariates.

  • With (right) censoring, we observe \((O_i, \delta_i, X_i)\), where \(O_i\) is a time (either an event or a censoring time) and \(\delta_i\) denotes whether the time is a true event or not.

  • Goal: understand \(F_{X_i} = {\cal L}(T_i|X_i)\) where \(T_i\) are the true event times.

Estimands

  • Given a candidate \(F_{X_i}\) and age \(a\) we might be interested in remaining lifetime

\[ E_{F_{X_i}}(T - a | T \geq a) \]
  • Or perhaps the median

\[ \text{median}_{F_{X_i}}(T - a | T \geq a). \]
  • No natural loss function as in linear / logistic regression.

  • If we can model / estimate \({\cal L}(T|X)\) then we can derive estimates of the above quantities (a simulation sketch follows this list).

  • Tests of equality of distribution functions are common rather than tests about means / medians.
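  • As a quick illustration (a toy example of mine, not from the notes), both estimands can be computed by simulation when \(F\) is the standard exponential and \(a=1\); by memorylessness the conditional mean is 1 and the conditional median is \(\log 2\).

set.seed(0)
Tsim = rexp(1e6)                                    # draws from F = Exp(1)
a = 1
c(mean_remaining   = mean(Tsim[Tsim >= a] - a),     # E(T - a | T >= a), should be near 1
  median_remaining = median(Tsim[Tsim >= a] - a))   # should be near log(2) ~ 0.693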


Hazard function

  • For a survival distribution \(F\), many quantities of interest can be more simply expressed using hazard

\[ H([t,t+dt)) = P_F(T < t+dt|T \geq t) \]
  • When a density exists, we get the hazard rate

\[ \begin{aligned} H([t,t+dt)) \approx h(t) \; dt \end{aligned} \]

or

\[\begin{split} \begin{aligned} h(t) &= \frac{f(t)}{S(t)} \\ &= - \frac{d}{dt} \log S(t) \end{aligned} \end{split}\]

with \(S(t) = \bar{F}(t) = 1 - F(t)\) the survival function.


Cumulative hazard

  • Define

\[ H(t) = \int_{[0,t]} h(s) \; ds \]
  • Or,

\[ S(t) = \exp(-H(t)). \]

Hazard function for discrete laws

  • For discrete distributions supported on \(\{t_j\}\), the hazard is 0 everywhere except at the support points

\[\begin{split} h(t) = \begin{cases} \dfrac{P(T=t)}{\sum_{s \geq t} P(T=s)} & \text{if $t$ is a support point} \\ 0 & \text{otherwise,} \end{cases} \end{split}\]

and note that \(h(t) \leq 1\).
  • “Natural” cumulative hazard

\[ H(t) = \sum_{j:t_j \leq t} h(t_j) \]
  • This does not satisfy the property

\[ S(t) = \exp(-H(t)). \]
  • It does hold for the alternative cumulative hazard

\[ \tilde{H}(t) = \sum_{j:t_j \leq t} -\log(1 - h(t_j)) \]
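  • As a small numerical check (a toy example of mine, not from the notes): for a discrete law on \(\{1,2,3\}\), \(\exp(-H(t))\) differs from \(S(t)\) while \(\exp(-\tilde{H}(t))\) matches it exactly.

p = c(0.2, 0.5, 0.3)                     # P(T=1), P(T=2), P(T=3)
S_geq = rev(cumsum(rev(p)))              # P(T >= t_j) at each support point
h = p / S_geq                            # discrete hazard h(t_j)
S = S_geq - p                            # survival function S(t_j) = P(T > t_j)
rbind(S = S,
      exp_minus_H      = exp(-cumsum(h)),            # does not equal S
      exp_minus_Htilde = exp(-cumsum(-log(1 - h))))  # equals S exactly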

Hazard models

  • Many models in survival analysis model the hazard function (or cumulative version).

  • The hazard function just has to be non-negative (and its integral must eventually diverge to \(\infty\)…).

Constant hazard

  • If \(T \sim \text{Exp}(\lambda)\) (parameterized by its mean \(\lambda\)) with density

\[ f_T(t) = \frac{1}{\lambda} e^{-t/\lambda} 1_{[0,\infty)}(t) \]
  • Then \(S(t) = e^{-t/\lambda}\) and

\[ h(t) = \frac{1}{\lambda} \]

and

\[ H(t) = t / \lambda. \]
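  • A quick numerical check of \(h = f/S\) for this parameterization (toy code of mine; note that R’s dexp / pexp use the rate \(1/\lambda\)):

lambda = 2
tgrid = seq(0.1, 5, by=0.1)
hazard = dexp(tgrid, rate=1/lambda) / pexp(tgrid, rate=1/lambda, lower.tail=FALSE)
range(hazard)                            # constant, equal to 1/lambda = 0.5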

Estimating a hazard rate

Nelson-Aalen

  • Non-parametric estimate (assuming no ties in data)

\[ \hat{H}(t) = \sum_{j:T_j \leq t} \frac{1}{ |\left\{i: T_i \geq T_j\right\}|} = \sum_{j:T_j \leq t} \frac{1}{ |R(T_j)|} \]

where \(R(t) = \left\{j:T_j \geq t \right\}\) is the risk set at time \(t\).

  • This assumes no censoring – coming up…

library(survival)
events = rexp(100)                            # Exp(1) event times
censor = rep(1, 100)                          # no censoring
H = basehaz(coxph(Surv(events, censor)~1))    # Breslow estimate of the baseline cumulative hazard
plot(H$time, H$hazard, type='l')
abline(0, 1, col='red', lty=2)                # true cumulative hazard H(t) = t
[Figure: estimated cumulative hazard \(\hat{H}(t)\) (black) vs. the true \(H(t)=t\) (red, dashed)]

Nelson-Aalen estimator

  • Asymptotic properties can be analyzed using techniques from point processes in which \(H\) is the “cumulative intensity”.

  • Fluctuation process

\[ (\hat{H}(t) - H(t))_{t \geq 0} \]

can be analyzed using martingale techniques to provide pointwise confidence intervals for \(H\).


Nelson-Aalen variance estimate

  • Standard error

\[ \widehat{Var}(\hat{H}(t)) = \sum_{j:T_j \leq t} \frac{1}{|R(T_j)|^2} \]
  • Can use the delta method to get a 95% confidence interval for \(S(t)\) based on the Breslow estimator of survival:

\[ \exp(-\hat{H}(t)) \pm 1.96 \cdot \exp\left(-\hat{H}(t)\right) \sqrt{\sum_{j:T_j \leq t} \frac{1}{|R(T_j)|^2}} \]
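  • A sketch of these formulas computed by hand on simulated, uncensored data (my own code, not from the notes); in recent versions of the survival package, the cumhaz component of survfit should agree with the hand-computed Nelson-Aalen estimate.

set.seed(1)
ev = rexp(200)                          # uncensored sample
n_at_risk = 200:1                       # |R(T_j)| when the times are sorted
H_hat = cumsum(1 / n_at_risk)           # Nelson-Aalen estimate at the sorted times
var_H = cumsum(1 / n_at_risk^2)         # variance estimate above
S_hat = exp(-H_hat)                     # Breslow estimator of survival
CI = cbind(lower = S_hat - 1.96 * S_hat * sqrt(var_H),
           upper = S_hat + 1.96 * S_hat * sqrt(var_H))
fit = survfit(Surv(ev, rep(1, 200)) ~ 1)
max(abs(H_hat - fit$cumhaz))            # essentially zero if cumhaz is the Nelson-Aalen estimate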

Where does this variance estimate come from?

  • Roughly, partitioning \([0,t]\) finely, set \(s_j=tj/m\)

\[ \widehat{H}(t) = \sum_{j=0}^{m-1} \left(\widehat{H}(s_{j+1}) - \widehat{H}(s_j)\right) \]

and

\[ \widehat{H}(s_{j+1}) - \widehat{H}(s_j) = \frac{\sum_{i=1}^n 1_{[s_j,s_{j+1})}(T_i)}{|R(s_j)|} \]
  • A reasonable heuristic

\[\widehat{H}(s_{j+1}) - \widehat{H}(s_j) \biggl| |R(s_j)| \sim \frac{1}{|R(s_j)|} \text{Poisson}(|R(s_j)| h(s_j) (s_{j+1}-s_j)) \]

leads to (conditional) variance above

\[ \frac{h(s_j) (s_{j+1}-s_j)}{|R(s_j)|} \]

  • Summing all these conditional variances (this uses the martingale property) leads to

\[ \int_{[0,t]} \frac{1}{|R(s)|} h(s) \; ds \]
  • We can approximate integrals against \(h(s)\,ds\) by integrals against \(\hat{H}(ds)\)

\[\int_{[0,t]} f(s) h(s)\; ds \approx \int_{[0,t]} f(s) \hat{H}(ds)\]
  • Or,

\[ \int_{[0,t]} f(s) h(s)\; ds \approx \sum_{j:T_j \leq t} f(T_j) \frac{1}{|R(T_j)|} \]
  • With \(f(t)=|R(t)|^{-1}\) this yields

\[ \sum_{j:T_j \leq t} \frac{1}{|R(T_j)|^2}. \]

# a larger sample
events = rexp(1000)
censor = rep(1, 1000)
H = basehaz(coxph(Surv(events, censor)~1))
plot(H$time, H$hazard, type='l')
abline(0, 1, col='red', lty=2)
[Figure: the same comparison of \(\hat{H}(t)\) with \(H(t)=t\), now with \(n=1000\)]

Models

Proportional hazards

  • Models the hazard rate

\[ h_{\beta}(t;x) = \exp(x^T\beta) \cdot h_0(t) \]

with baseline hazard rate \(h_0\).

  • In principle, the exponential could be swapped with another link.
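  • A small simulation sketch (mine, not from the notes): with a constant baseline \(h_0 \equiv 1\), the model says \(T | x \sim \text{Exponential}(\text{rate} = e^{x\beta})\), so we can simulate directly and check that coxph recovers \(\beta\).

library(survival)
set.seed(2)
x = rnorm(2000)
beta = 1
T_ph = rexp(2000, rate = exp(x * beta))       # hazard is exp(x * beta) * h_0(t), with h_0 = 1
fit_ph = coxph(Surv(T_ph, rep(1, 2000)) ~ x)
coef(fit_ph)                                  # should be close to beta = 1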

Cox’s model

  • Allows nonparametric model for \(h_0\).

  • Example of a semiparametric model.

  • We will come back to Cox model shortly.


Accelerated failure time

  • Assumes \(T_i = \theta(X_i^T\beta)^{-1} \cdot Z_i\) where the law of \(Z_i\) does not depend on any parameters.

  • It is not hard to see this corresponds to a hazard of the form

\[ h_{\beta}(t;x) = \theta(x^T\beta) \cdot h_0(\theta(x^T\beta) t) \]

with \(h_0\) the hazard function of \(Z_i\).

  • Usually a parametric model for \(Z_i\), e.g. Weibull, log-logistic, etc.


  • Equivalent to

\[ \log(T_i) = -\log(\theta(X_i^T\beta)) + \log Z_i \]
  • Choosing \(\theta(\eta)=\exp(-\eta)\) yields

\[ \log(T_i) = X_i^T\beta + \log Z_i \]
  • Just a parametric regression model – errors are not Gaussian.
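  • A quick sketch (my own simulation): with \(\theta(\eta)=\exp(-\eta)\), even ordinary least squares on \(\log T\) recovers the slope, with the intercept absorbing \(E[\log Z]\).

set.seed(5)
x_aft = runif(2000)
Z = rexp(2000)                      # baseline error distribution for the AFT model
T_aft = exp(1 + 2 * x_aft) * Z      # log(T) = 1 + 2*x + log(Z)
coef(lm(log(T_aft) ~ x_aft))        # slope should be close to 2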


summary(survreg(Surv(events, censor)~1, dist='weibull'))
Call:
survreg(formula = Surv(events, censor) ~ 1, dist = "weibull")
              Value Std. Error     z    p
(Intercept)  0.0236     0.0332  0.71 0.48
Log(scale)  -0.0030     0.0246 -0.12 0.90

Scale= 0.997 

Weibull distribution
Loglik(model)= -1022.3   Loglik(intercept only)= -1022.3
Number of Newton-Raphson Iterations: 6 
n= 1000 

  • Let’s generate some data from a correctly specified model

X = runif(1000)
censor = rep(1,1000)
events = rexp(1000)^0.5 * exp(2*X-1)   # Weibull AFT: log(T) = 2*X - 1 + 0.5*log(E), E ~ Exp(1)
M = survreg(Surv(events, censor) ~ X, dist='weibull')
summary(M)
Call:
survreg(formula = Surv(events, censor) ~ X, dist = "weibull")
              Value Std. Error     z      p
(Intercept) -1.0567     0.0324 -32.6 <2e-16
X            2.0856     0.0546  38.2 <2e-16
Log(scale)  -0.6852     0.0247 -27.7 <2e-16

Scale= 0.504 

Weibull distribution
Loglik(model)= -605.7   Loglik(intercept only)= -1011.4
	Chisq= 811.31 on 1 degrees of freedom, p= 1.9e-178 
Number of Newton-Raphson Iterations: 6 
n= 1000 

  • Using the Weibull distribution estimates a scale parameter (a variance of sorts) in the regression of \(\log T\); the exponential fixes the scale at 1…

M2 = survreg(Surv(events, censor) ~ X, dist='exponential')
summary(M2)
Call:
survreg(formula = Surv(events, censor) ~ X, dist = "exponential")
              Value Std. Error     z      p
(Intercept) -1.1673     0.0636 -18.4 <2e-16
X            2.0651     0.1085  19.0 <2e-16

Scale fixed at 1 

Exponential distribution
Loglik(model)= -883.3   Loglik(intercept only)= -1056.4
	Chisq= 346.3 on 1 degrees of freedom, p= 2.7e-77 
Number of Newton-Raphson Iterations: 4 
n= 1000 

Censoring

  • Let’s imagine a lab running experiments on development of cancer in mice after exposure to different levels of carcinogen.

  • Mice are born on day \(B\), exposed to carcinogen \(X\) and their symptoms are detected at time \(D\) yielding event time \(T=D-B\).

  • Funding for the experiment was only guaranteed up to the end date \(E=\)3/31/2020.

  • Not all mice had symptoms at end date \(E\).

  • For those mice who have presented symptoms, we observe the event time \(O=D-B=T\).

  • For the others, we observe \(O=E-B\) and only know that their symptoms would have presented at some time later than \(O\).

  • This is called right-censoring: we only know that the time of interest lies in the interval \([O,\infty)\).

  • With right-censored data, observations are

\[ (O_i, \delta_i, X_i) \]

with \(O_i\) the observed “time” for subject \(i\); \(\delta_i=1\) denotes that the time of interest was observed, while \(\delta_i=0\) denotes that the observation was right-censored.


Likelihood under censoring

  • In order to fit the model and carry out inference, we usually must assume the censoring is non-informative.

  • For non-informative censoring, the likelihood is

\[ \left(\prod_{i:\delta_i=1} f_{\beta}(O_i;X_i) \right) \left(\prod_{i:\delta_i=0} S_{\beta}(O_i;X_i) \right) \]
  • Or,

\[ \prod_{i} h_{\beta}(O_i;X_i)^{\delta_i} S_{\beta}(O_i;X_i) \]
  • We have used

\[ f(t) = h(t) S(t). \]
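  • As a sketch (my own example, not from the notes): for an exponential model with \(\text{rate}(x) = e^{b_0 + b_1 x}\), the censored log-likelihood above is easy to code and maximize directly. Note that survreg uses the AFT parameterization \(\log T = \gamma_0 + \gamma_1 x + W\), so its coefficients should be roughly the negatives of these.

set.seed(3)
x_sim = runif(1000)
T_sim = rexp(1000, rate = exp(-1 + 2 * x_sim))   # true (b0, b1) = (-1, 2)
C_sim = rexp(1000, rate = 0.5)                   # independent (non-informative) censoring
O_sim = pmin(T_sim, C_sim)
d_sim = as.numeric(T_sim <= C_sim)
negloglik = function(b) {                        # minus the censored log-likelihood
  rate = exp(b[1] + b[2] * x_sim)
  -sum(d_sim * log(rate) - O_sim * rate)         # sum_i [ delta_i * log h + log S ]
}
optim(c(0, 0), negloglik)$par                    # should be near (-1, 2)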

Tcens = rexp(length(events)) * 1.5    # independent censoring times
observed = pmin(events, Tcens)        # observed time O = min(T, C)
censor = (events <= Tcens)            # delta = 1 if the event was observed
M3 = survreg(Surv(observed, censor) ~ X, dist='weibull')
summary(M3)
Call:
survreg(formula = Surv(observed, censor) ~ X, dist = "weibull")
              Value Std. Error     z      p
(Intercept) -1.0666     0.0392 -27.2 <2e-16
X            2.1073     0.0779  27.0 <2e-16
Log(scale)  -0.6881     0.0318 -21.7 <2e-16

Scale= 0.503 

Weibull distribution
Loglik(model)= -340.8   Loglik(intercept only)= -610.9
	Chisq= 540.15 on 1 degrees of freedom, p= 1.8e-119 
Number of Newton-Raphson Iterations: 6 
n= 1000 

Nelson-Aalen with censoring

  • Can still estimate hazard under right-censoring

\[ \hat{H}(t) = \sum_{j: \delta_j = 1, T_j \leq t} \frac{d_j}{n_j} \]
  • \(R(t) = \left\{j: O_j \geq t\right\}\) is the risk set

  • \(d_j\) the number of events at \(T_j\) (1 if no ties possible)

  • \(n_j = |R(T_j)|\) the number at risk just prior to \(T_j\) (i.e. candidates to fail at \(T_j\))


Nelson-Aalen: variance with censoring

  • Standard error is estimated as

\[ \widehat{Var}(\hat{H}(t))= \sum_{\delta_j=1, T_j \leq t} \frac{d_j}{n_j (n_j - d_j)} \]
  • Or (asymptotically similar)

\[ \widehat{Var}(\hat{H}(t))= \sum_{\delta_j=1, T_j \leq t} \frac{d_j}{n_j^2} \]
  • Can similarly use delta method for CI for \(S(t)\) based on Breslow estimator of survival function

\[ \hat{S}(t) = \exp(-\hat{H}(t)) \]
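  • A hand computation of these formulas (the helper name na_hand is mine, not from the notes); it uses the observed / censor vectors simulated above and allows ties.

na_hand = function(time, status) {
  ut = sort(unique(time[status == 1]))                        # distinct event times
  d = sapply(ut, function(s) sum(time == s & status == 1))    # d_j: events at each time
  n = sapply(ut, function(s) sum(time >= s))                  # n_j: number at risk just before
  data.frame(time = ut,
             H = cumsum(d / n),                               # Nelson-Aalen estimate
             varH = cumsum(d / n^2),                          # simpler variance estimate
             varH2 = cumsum(d / (n * (n - d))))               # first form (undefined if n_j = d_j)
}
head(na_hand(observed, censor))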

events = rexp(1000)                  # true event times
Tcens = rexp(1000) * 3               # independent censoring times
observed = pmin(events, Tcens)
censor = (events < Tcens)
H = basehaz(coxph(Surv(observed, censor) ~ 1))
plot(H$time, H$hazard, type='l')
abline(0, 1, col='red', lty=2)
[Figure: Nelson-Aalen cumulative hazard under censoring (black) vs. the true \(H(t)=t\) (red, dashed)]
plot(H$time, exp(-H$hazard), type='l')
lines(H$time, exp(-H$time), col='red')
[Figure: Breslow survival estimate \(\exp(-\hat{H}(t))\) (black) vs. the true survival \(e^{-t}\) (red)]

What happens if we ignore censoring?

ignore_censor = rep(1, length(censor))
Hbias = basehaz(coxph(Surv(observed, ignore_censor) ~ 1))
plot(Hbias$time, exp(-Hbias$hazard), type='l')
lines(H$time, exp(-H$time), col='red')
[Figure: survival estimate when censored times are treated as events (black) vs. the true survival \(e^{-t}\) (red)]

Kaplan-Meier

  • A more commonly used estimate of the survival function

\[ \hat{S}(t) = \prod_{j:\delta_j=1, T_j \leq t} \left(1 - \frac{d_j}{n_j} \right) \]
  • Based on the product formula (for a partition \(V=(v_0=0 < v_1 < \dots < v_m=\infty)\) of \([0,\infty)\))

\[ P(T > v_i) = \prod_{j=1}^{i} P(T > v_{j} | T > v_{j-1}) \]

with \(V=(0,T_1,\dots,T_{\sum_i \delta_i}, \infty)\) and

\[ \widehat{ P}(T > T_{j} | T > T_{j-1}) = 1 - \frac{d_j}{n_j}. \]
  • Note the Taylor expansion \(e^{-h} \approx 1 - h\), so Kaplan-Meier is close to Breslow’s estimator \(\exp(-\hat{H}(t))\) when the \(d_j/n_j\) are small (see the check below).
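  • A quick numerical check (a sketch) on the censored data above, assuming the cumhaz component of survfit is the Nelson-Aalen estimate:

KM = survfit(Surv(observed, censor) ~ 1)
breslow = exp(-KM$cumhaz)            # Breslow estimate from the Nelson-Aalen cumulative hazard
max(abs(KM$surv - breslow))          # small when the d_j / n_j are small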


plot(survfit(Surv(observed, censor) ~ 1), lwd=4)
lines(H$time, exp(-H$hazard), col='yellow', lwd=2)
[Figure: Kaplan-Meier estimate with confidence bands (black) and the Breslow estimate \(\exp(-\hat{H}(t))\) (yellow)]

plot(survfit(Surv(observed, censor) ~ 1), lwd=4, xlim=c(2,3), ylim=c(0,0.2))
lines(H$time, exp(-H$hazard), col='yellow', lwd=2)
[Figure: the same comparison zoomed in to \(t \in [2,3]\) and \(S(t) \in [0, 0.2]\)]

plot(survfit(Surv(observed, censor) ~ 1), lwd=4)
lines(H$time, exp(-H$time), col='red', lwd=2) # truth
[Figure: Kaplan-Meier estimate (black) vs. the true survival \(e^{-t}\) (red)]

Bias when we ignore censoring

plot(survfit(Surv(observed, rep(1, length(censor))) ~ 1))
F = survfit(Surv(observed, censor) ~ 1)
lines(H$time, exp(-H$time), col='red')
[Figure: survival estimate ignoring censoring vs. the true survival \(e^{-t}\) (red)]

Comparing populations

  • Suppose we have two populations from which we observe right-censored survival times \((O_{1,i},\delta_{1,i}), 1 \leq i \leq n_1\) and \((O_{2,i}, \delta_{2,i}), 1 \leq i \leq n_2\).

  • Can we tell if their underlying survival times (some of which we have not observed) come from the same distribution?

  • Null hypothesis \(H_0: F_1 = F_2\) where \(T_{1,i} \overset{IID}{\sim} F_1\) and \(T_{2,i} \overset{IID}{\sim} F_2\).


Log-rank test

  • Suppose we have a failure time \(T_j\) (could have come from either sample).

  • We have two risk sets \(R_1(t) = \left\{i: O_{1,i} \geq t\right\}, R_2(t) = \left\{i: O_{2,i} \geq t\right\}\)

  • The log-rank test models what happens in the interval \([T_j, T_j+dt)\) under \(H_0\).

  • At time \(T_j\), the total (pooled) risk set is \(R(T_j) = R_1(T_j) \cup R_2(T_j)\), with cardinality \(|R(T_j)|\).


Contingency table

  • At each event time \(T_j\), we construct a \(2\times 2\) table

Status of those at risk at time \(T_j\):

               Failed                          Survived                      Total
  Sample 1     \(d_{1,j}\)                     \(|R_1(T_j)| - d_{1,j}\)      \(|R_1(T_j)|\)
  Sample 2     \(d_{2,j}\)                     \(|R_2(T_j)| - d_{2,j}\)      \(|R_2(T_j)|\)
  Total        \(d_j = d_{1,j} + d_{2,j}\)     \(|R(T_j)| - d_j\)            \(|R(T_j)|\)
  • Under \(H_0\), the \(d_j\) deaths assigned at time \(T_j\) should be distributed between the two samples in proportion to the sizes of their risk sets.


Comparing populations

Log-rank test

  • Expected number of deaths under \(H_0\) in Sample 1 at time \(T_j\) is

\[ \frac{d_j \cdot |R_1(T_j)|}{|R(T_j)|} \]
  • Law of \(d_{1,j} | |R_1(T_j)|, |R(T_j)|, d_j\) is Hypergeometric.

  • Summing over \(j\)

\[ \frac{\sum_{j: d_j \neq 0} \left(d_{1,j} - \frac{d_j |R_1(T_j)|}{|R(T_j)|}\right)} {\sqrt{\sum_{j: d_j \neq 0} \text{Var}(d_{1,j} | |R_1(T_j)|, |R(T_j)|, d_j) }} \overset{H_0}{\Rightarrow} N(0,1) \]
  • This is the Mantel-Haenszel test applied to this collection of \(2\times 2\) tables (one per event time)!
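  • A sketch (my own simulation, not from the notes) computing the log-rank statistic from these per-time \(2\times 2\) tables and comparing it with survdiff:

library(survival)
set.seed(4)
grp = rep(1:2, c(80, 120))
time_lr = c(rexp(80), rexp(120, rate=1.3))     # sample 2 has a higher hazard
status_lr = rep(1, 200)                        # no censoring, for simplicity
ut = sort(unique(time_lr[status_lr == 1]))
O = E = V = 0
for (s in ut) {
  n1 = sum(time_lr >= s & grp == 1)
  n2 = sum(time_lr >= s & grp == 2)
  d1 = sum(time_lr == s & grp == 1 & status_lr == 1)
  d = sum(time_lr == s & status_lr == 1)
  n = n1 + n2
  O = O + d1                                   # observed deaths in sample 1
  E = E + d * n1 / n                           # expected deaths under H0
  V = V + d * (n1 / n) * (n2 / n) * (n - d) / max(n - 1, 1)   # hypergeometric variance
}
c(by_hand = (O - E)^2 / V,
  survdiff = survdiff(Surv(time_lr, status_lr) ~ grp)$chisq)  # should agree closely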


brainCancer = read.table(
    'https://www.stanford.edu/class/stats305b/data/BrainCancer.csv',
    sep=',',
    header=TRUE)
names(brainCancer)
'sex' 'diagnosis' 'loc' 'ki' 'gtv' 'stereo' 'status' 'time'
plot(survfit(Surv(time, status) ~ sex, data=brainCancer), lty=2:3) # Kaplan-Meier does not drop to 0 -- why?
[Figure: Kaplan-Meier survival curves by sex for the brain cancer data]

survdiff(Surv(time, status) ~ sex, data=brainCancer)
Call:
survdiff(formula = Surv(time, status) ~ sex, data = brainCancer)

            N Observed Expected (O-E)^2/E (O-E)^2/V
sex=Female 45       15     18.5     0.676      1.44
sex=Male   43       20     16.5     0.761      1.44

 Chisq= 1.4  on 1 degrees of freedom, p= 0.2