Cox (proportional hazards) model¶

Models the hazard rate

\[ h_{\beta}(t;x) = \exp(x^T\beta) \cdot h_0(t) \]

with baseline hazard rate $h_0$.

Given that a death has occurred at time $t$, what are the relative chances that individual $i$ died rather than individual $k$?
This should be the ratio $$ \frac{h_{\beta}(t;X_i)}{h_{\beta}(t;X_k)} = \exp((X_i-X_k)^T\beta) $$
The baseline hazard $h_0(t)$ cancels!
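
As a small numerical illustration of the cancellation, the sketch below evaluates the hazard ratio under two different baseline hazards. The baselines `h0_a` and `h0_b`, the coefficient vector, and the covariate values are all made up for illustration; none of them come from the brainCancer data.

```r
# Two hypothetical baseline hazards (illustrative choices only)
h0_a <- function(t) rep(0.1, length(t))   # constant baseline
h0_b <- function(t) 0.5 * sqrt(t)         # increasing baseline

# Cox hazard h(t; x) = exp(x^T beta) * h0(t), vectorized over t
hazard <- function(t, x, beta, h0) exp(sum(x * beta)) * h0(t)

beta <- c(0.4, -1.1)   # assumed coefficients
x_i <- c(1, 0)         # covariates of individual i
x_k <- c(0, 1)         # covariates of individual k
t_grid <- c(0.5, 1, 2, 5)

ratio_a <- hazard(t_grid, x_i, beta, h0_a) / hazard(t_grid, x_k, beta, h0_a)
ratio_b <- hazard(t_grid, x_i, beta, h0_b) / hazard(t_grid, x_k, beta, h0_b)
expected <- exp(sum((x_i - x_k) * beta))

# Same ratio at every time and under both baselines
rbind(ratio_a, ratio_b, expected)
```

The ratio depends on neither $t$ nor the choice of baseline, which is why the partial likelihood never needs $h_0$.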

Partial likelihood¶

Therefore, given that individual $i$ dies at some time $t$, the contribution to the (partial) likelihood is the following, where $R(t)=\{j: O_j \geq t\}$ denotes the risk set at time $t$:

\[ \frac{\exp(X_i^T\beta)}{\sum_{j \in R(t)} \exp(X_j^T\beta)} \]

As the death times are actual observed times for individuals with $\delta_i=1$, this yields something like a likelihood (recall that when $\delta_i=1$, $O_i=T_i$):

\[ \prod_{i: \delta_i=1} \left(\frac{\exp(X_i^T\beta)}{\sum_{j \in R(T_i)} \exp(X_j^T\beta)} \right) = \prod_{i: \delta_i=1} \left(\frac{\exp(X_i^T\beta)}{\sum_{j: O_j \geq O_i} \exp(X_j^T\beta)} \right) \]

This is effectively the conditional likelihood given the set of observed failure times (under proportional hazards assumption).
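
The product above can be evaluated directly. The following is a minimal sketch, assuming no tied event times and using simulated data (the simulation settings and the helper name `log_partial_lik` are illustrative, not part of the original notes). It compares the hand-computed log partial likelihood with the value `coxph` reports when it is held at a fixed `init` with `iter.max = 0`.

```r
library(survival)

# Log partial likelihood, assuming no tied event times.
# time: observed times O_i; status: delta_i; X: n x p matrix; beta: length-p vector
log_partial_lik <- function(beta, time, status, X) {
  eta <- drop(X %*% beta)
  terms <- sapply(which(status == 1), function(i) {
    at_risk <- time >= time[i]            # risk set {j : O_j >= O_i}
    eta[i] - log(sum(exp(eta[at_risk])))
  })
  sum(terms)
}

# Simulated data (hypothetical settings)
set.seed(1)
n <- 50
x <- rnorm(n)
event_time <- rexp(n, rate = exp(0.5 * x))
censor_time <- rexp(n, rate = 0.3)
time <- pmin(event_time, censor_time)
status <- as.numeric(event_time <= censor_time)

beta0 <- 0.3
by_hand <- log_partial_lik(beta0, time, status, cbind(x))

# coxph evaluated at beta0 without taking any Newton steps
fit0 <- coxph(Surv(time, status) ~ x, init = beta0,
              control = coxph.control(iter.max = 0))
c(by_hand = by_hand, coxph = fit0$loglik[1])
```

With tied event times the two values would generally differ, since `coxph` then applies a tie correction (Efron by default).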

Partial likelihood when no ties¶

When there is no censoring, this really is the conditional likelihood.
Let’s just consider $n=2$ with no censoring.
Let $R(T_1,T_2)$ denote the ranking of our two times. Under the Cox model its distribution depends on $\beta$, more precisely on $\eta=(\eta_1,\eta_2)=(X_1'\beta,X_2'\beta)$.
Let’s compute (working with densities, and writing $H_0(t)=\int_0^t h_0(s)\,ds$ for the cumulative baseline hazard)

\[\begin{split} \begin{aligned} P_{\beta}(T_1<T_2) &= \int_0^{\infty} \left[\int_{t_1}^{\infty} e^{\eta_2} h_0(t_2) e^{-e^{\eta_2} H_0(t_2)} \; dt_2 \right] \\ & \qquad \cdot e^{\eta_1} h_0(t_1) e^{-e^{\eta_1} H_0(t_1)} \; dt_1. \end{aligned} \end{split}\]

Inner integral¶

\[ \int_{t_1}^{\infty} e^{\eta_2} h_0(t_2) e^{-e^{\eta_2} H_0(t_2)} \; dt_2 = e^{-e^{\eta_2} H_0(t_1)}. \]
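
One way to see this, sketched here as a supplementary step: since $H_0'=h_0$, the integrand is an exact derivative,

\[ \frac{d}{dt_2}\left(-e^{-e^{\eta_2} H_0(t_2)}\right) = e^{\eta_2} h_0(t_2) e^{-e^{\eta_2} H_0(t_2)}. \]

Assuming $H_0(t_2) \to \infty$ as $t_2 \to \infty$ (so that the survival function decays to zero), the boundary term at infinity vanishes and only the term at $t_2=t_1$ remains. In other words, the inner integral is just the survival function of $T_2$ evaluated at $t_1$.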

Final integral¶

\[ \int_0^{\infty} e^{\eta_1} h_0(t_1) e^{-(e^{\eta_1}+e^{\eta_2}) H_0(t_1)} \; dt_1 = \frac{e^{\eta_1}}{e^{\eta_1}+e^{\eta_2}} \]

A similar (but tedious) calculation handles the case $n>2$ (again with no censoring).
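
The $n=2$ result can be checked by simulation. The sketch below assumes a Weibull-type baseline with $H_0(t)=t^2$ and arbitrary values of $\eta_1, \eta_2$; both are illustrative choices, and any increasing $H_0$ should give the same answer.

```r
set.seed(305)
eta <- c(0.7, -0.2)   # assumed linear predictors (eta_1, eta_2)
n_sim <- 1e5

# Assumed baseline cumulative hazard H_0(t) = t^2.
# If E ~ Exp(1), then T = H_0^{-1}(E / exp(eta)) has survival
# function exp(-exp(eta) * H_0(t)), as in the Cox model.
draw_time <- function(n, eta) sqrt(rexp(n) / exp(eta))

t1 <- draw_time(n_sim, eta[1])
t2 <- draw_time(n_sim, eta[2])

mc_estimate <- mean(t1 < t2)
closed_form <- exp(eta[1]) / sum(exp(eta))
c(monte_carlo = mc_estimate, closed_form = closed_form)
```

The Monte Carlo estimate should agree with the closed form up to simulation error, regardless of the baseline used to generate the times.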

brainCancer = read.table(
    'https://www.stanford.edu/class/stats305b/data/BrainCancer.csv',
    sep=',',
    header=TRUE)

library(survival)
M = coxph(Surv(time, status) ~ sex, data=brainCancer)
summary(M) # why no intercept?

Call:
coxph(formula = Surv(time, status) ~ sex, data = brainCancer)

  n= 88, number of events= 35 

          coef exp(coef) se(coef)     z Pr(>|z|)
sexMale 0.4077    1.5033   0.3420 1.192    0.233

        exp(coef) exp(-coef) lower .95 upper .95
sexMale     1.503     0.6652     0.769     2.939

Concordance= 0.565  (se = 0.045 )
Likelihood ratio test= 1.44  on 1 df,   p=0.2
Wald test            = 1.42  on 1 df,   p=0.2
Score (logrank) test = 1.44  on 1 df,   p=0.2

sqrt(diag(vcov(M)))

sexMale: 0.34200423243715

table(factor(brainCancer$diagnosis))

 HG glioma  LG glioma Meningioma      Other 
        22          9         42         14 

subs = !is.na(brainCancer$diagnosis)
M = coxph(Surv(time, status) ~ sex, data=brainCancer, subset=subs) # one missing diagnosis
M2 = coxph(Surv(time, status) ~ sex + diagnosis, data=brainCancer, subset=subs)
anova(M, M2)
summary(M2)

A anova: 2 × 4
	loglik	Chisq	Df	P(>|Chi|)
	<dbl>	<dbl>	<int>	<dbl>
1	-136.5441	NA	NA	NA
2	-125.6802	21.72773	3	7.431759e-05

Call:
coxph(formula = Surv(time, status) ~ sex + diagnosis, data = brainCancer, 
    subset = subs)

  n= 87, number of events= 35 

                       coef exp(coef) se(coef)      z Pr(>|z|)    
sexMale              0.0353    1.0359   0.3617  0.098   0.9223    
diagnosisLG glioma  -1.1108    0.3293   0.5632 -1.972   0.0486 *  
diagnosisMeningioma -1.9822    0.1378   0.4382 -4.524 6.07e-06 ***
diagnosisOther      -1.1891    0.3045   0.5113 -2.325   0.0200 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                    exp(coef) exp(-coef) lower .95 upper .95
sexMale                1.0359     0.9653   0.50990    2.1046
diagnosisLG glioma     0.3293     3.0368   0.10919    0.9931
diagnosisMeningioma    0.1378     7.2586   0.05837    0.3252
diagnosisOther         0.3045     3.2842   0.11177    0.8295

Concordance= 0.749  (se = 0.038 )
Likelihood ratio test= 23.51  on 4 df,   p=1e-04
Wald test            = 23.62  on 4 df,   p=1e-04
Score (logrank) test = 29.69  on 4 df,   p=6e-06

Score¶

Differentiating the negative log partial likelihood with respect to $\beta$ yields

\[ \sum_{i:\delta_i=1} \left(\frac{\sum_{j: O_j \geq O_i} X_j \exp(X_j^T\beta)}{\sum_{j: O_j \geq O_i} \exp(X_j^T\beta)} - X_i \right) \]

Each first term is an expectation of $X$ w.r.t. a distribution defined over the risk set with weights $\exp(X_j^T\beta)$.
We could write this as

\[ \sum_{i:\delta_i=1} \left( E_{\beta,i}[X] - X_i \right) \]
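
A minimal sketch of this score, assuming no tied event times and using simulated data (the helper name `cox_score` and the simulation settings are illustrative assumptions). At the `coxph` estimate the score should be numerically zero, while at $\beta=0$ it generally is not.

```r
library(survival)

# Score of the negative log partial likelihood, assuming no tied event times.
# X is an n x p covariate matrix; time and status are O_i and delta_i.
cox_score <- function(beta, time, status, X) {
  w <- exp(drop(X %*% beta))
  total <- numeric(ncol(X))
  for (i in which(status == 1)) {
    r <- time >= time[i]                                   # risk set
    # E_{beta,i}[X]: weighted mean of covariates over the risk set
    e_x <- colSums(X[r, , drop = FALSE] * w[r]) / sum(w[r])
    total <- total + (e_x - X[i, ])
  }
  total
}

# Simulated data (hypothetical settings)
set.seed(3)
n <- 100
X <- cbind(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
event_time <- rexp(n, rate = exp(drop(X %*% c(0.5, -0.5))))
censor_time <- rexp(n, rate = 0.3)
time <- pmin(event_time, censor_time)
status <- as.numeric(event_time <= censor_time)

fit <- coxph(Surv(time, status) ~ X)
score_at_zero <- cox_score(c(0, 0), time, status, X)
score_at_mle <- cox_score(coef(fit), time, status, X)
rbind(score_at_zero, score_at_mle)
```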

Hessian¶

Differentiating the negative log partial likelihood again yields

\[ \sum_{i:\delta_i=1} \text{Var}_{\beta,i}[X] \]

with

\[ \begin{aligned} \text{Var}_{\beta,i}[X] &= \frac{\sum_{j: O_j \geq O_i} X_jX_j^T \exp(X_j^T\beta)}{\sum_{j:O_j \geq O_i} \exp(X_j^T\beta)} - E_{\beta,i}[X] E_{\beta,i}[X]^T \\ &= E_{\beta,i}[XX^T] - E_{\beta,i}[X] E_{\beta,i}[X]^T. \end{aligned} \]
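
With the score and Hessian in hand, a plain Newton-Raphson iteration can be sketched as follows. This is an illustration under the assumption of no tied event times and without the safeguards (such as step halving) that a production fitter would include; it is not a description of how `coxph` is implemented. The function name `cox_newton` and the simulated data are hypothetical.

```r
library(survival)

# Newton-Raphson for the Cox partial likelihood, assuming no tied event times.
cox_newton <- function(time, status, X, n_iter = 25) {
  beta <- numeric(ncol(X))
  for (it in seq_len(n_iter)) {
    w <- exp(drop(X %*% beta))
    score <- numeric(ncol(X))
    hess <- matrix(0, ncol(X), ncol(X))
    for (i in which(status == 1)) {
      r <- time >= time[i]
      p <- w[r] / sum(w[r])               # probabilities over the risk set
      Xr <- X[r, , drop = FALSE]
      e_x <- drop(crossprod(Xr, p))       # E_{beta,i}[X]
      e_xx <- crossprod(Xr, Xr * p)       # E_{beta,i}[X X^T]
      score <- score + (e_x - X[i, ])
      hess <- hess + e_xx - tcrossprod(e_x)
    }
    beta <- beta - solve(hess, score)     # Newton step on the negative log-likelihood
  }
  list(coef = beta, se = sqrt(diag(solve(hess))))
}

# Simulated data (hypothetical settings)
set.seed(4)
n <- 100
X <- cbind(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
event_time <- rexp(n, rate = exp(drop(X %*% c(0.5, -0.5))))
censor_time <- rexp(n, rate = 0.3)
time <- pmin(event_time, censor_time)
status <- as.numeric(event_time <= censor_time)

newton <- cox_newton(time, status, X)
fit <- coxph(Surv(time, status) ~ X)
rbind(newton_coef = newton$coef, coxph_coef = coef(fit),
      newton_se = newton$se, coxph_se = sqrt(diag(vcov(fit))))
```

The inverse of the summed variances plays the role of the estimated covariance of $\hat\beta$, which is what `sqrt(diag(vcov(M)))` reports as standard errors.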

Relation between score and log-rank test¶

Suppose $X$ is a binary indicator (e.g. sex in our brainCancer data) with, say, $X_i=1$ denoting female.
At an event time $T_i$, $X_i$ is the indicator that the death at that time was a female death.
Under $H_0:\beta=0$, $E_{0,i}[X]$ is the expected number of female deaths at time $T_i$, where $d_i$ is the number of deaths at $T_i$ (so $d_i=1$ when there are no ties) and $R_1(T_i)$ is the set of females in the risk set:

\[ E_{0,i}[X] = \frac{d_i \cdot \sum_{j:O_j \geq O_i} X_j}{ \sum_{j:O_j \geq O_i} 1 } = \frac{d_i \cdot \# R_1(T_i)}{\# R(T_i)} \]

In other words, $E_{0,i}[X]$ is the mean of the hypergeometric distribution on the $i$-th table of the log-rank test!

Standard optimality of score / LRT indicates that the log-rank tests will perform well against alternatives that are proportional hazards.
This connection to Cox score provides a clearer picture of what low-dimensional log-rank is testing.
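
To make the connection concrete, the sketch below builds the statistic from the Cox score and variance at $\beta=0$ and compares it with `survdiff` and with the score test stored in a `coxph` fit. It assumes a binary covariate and no tied event times, and uses simulated data with made-up settings.

```r
library(survival)

# Simulated two-group data (hypothetical settings)
set.seed(2)
n <- 80
g <- rbinom(n, 1, 0.5)                       # binary group indicator X
event_time <- rexp(n, rate = exp(0.5 * g))
censor_time <- rexp(n, rate = 0.4)
time <- pmin(event_time, censor_time)
status <- as.numeric(event_time <= censor_time)

U <- 0
V <- 0
for (i in which(status == 1)) {
  p <- mean(g[time >= time[i]])   # E_{0,i}[X]: fraction of the risk set with X = 1
  U <- U + (g[i] - p)
  V <- V + p * (1 - p)            # Var_{0,i}[X] when there are no ties
}

by_hand <- U^2 / V
logrank <- survdiff(Surv(time, status) ~ g)$chisq
cox_score_test <- coxph(Surv(time, status) ~ g)$score
c(by_hand = by_hand, survdiff = logrank, coxph_score = cox_score_test)
```

With tied event times the three numbers need not agree exactly, because the variance used by the log-rank test and the tie handling in `coxph` can differ.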

Variations of log-rank test¶

Weighted log-rank¶

\[ \frac{\sum_{i:\delta_i=1} W(T_i)(X_i - E_{0,i}[X])}{\left(\sum_{i:\delta_i=1} W(T_i)^2 \text{Var}_{0,i}[X]\right)^{1/2}} \]

Multivariate for $K$ groups:¶

\[ \left(\sum_{i:\delta_i=1} W(T_i)(X_i - E_{0,i}[X])\right)^T \left(\sum_{i:\delta_i=1} W(T_i)^2 \text{Var}_{0,i}[X]\right)^{-1} \left(\sum_{i:\delta_i=1} W(T_i)(X_i - E_{0,i}[X])\right) \overset{H_0}{\approx} \chi^2_{K-1}. \]

Examples of weights $W(T_i) = \widehat{S}_P(T_i)^p(1-\widehat{S}_P(T_i))^q$ for pooled survival; $W(T_i)=(\# R(T_i))^{\alpha}$…
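
A hand-rolled version of the two-group weighted statistic, using the risk-set-size weights $W(T_i)=(\# R(T_i))^{\alpha}$, might look like the sketch below. It assumes a binary group indicator and no tied event times; the function name `weighted_logrank` and the simulated data are illustrative. Setting $\alpha=0$ should recover the ordinary log-rank test.

```r
library(survival)

# Weighted log-rank Z statistic for a binary group indicator g,
# assuming no tied event times, with W(T_i) = (#R(T_i))^alpha.
weighted_logrank <- function(time, status, g, alpha = 0) {
  U <- 0
  V <- 0
  for (i in which(status == 1)) {
    r <- time >= time[i]
    w <- sum(r)^alpha             # weight from the size of the risk set
    p <- mean(g[r])               # E_{0,i}[X]
    U <- U + w * (g[i] - p)
    V <- V + w^2 * p * (1 - p)    # w^2 * Var_{0,i}[X]
  }
  U / sqrt(V)
}

# Simulated two-group data (hypothetical settings)
set.seed(5)
n <- 80
g <- rbinom(n, 1, 0.5)
event_time <- rexp(n, rate = exp(0.5 * g))
censor_time <- rexp(n, rate = 0.4)
time <- pmin(event_time, censor_time)
status <- as.numeric(event_time <= censor_time)

z_unweighted <- weighted_logrank(time, status, g, alpha = 0)
z_sqrt <- weighted_logrank(time, status, g, alpha = 0.5)
z_risk <- weighted_logrank(time, status, g, alpha = 1)
logrank <- survdiff(Surv(time, status) ~ g)$chisq
c(z0_squared = z_unweighted^2, survdiff = logrank,
  z_sqrt = z_sqrt, z_risk = z_risk)
```

Larger $\alpha$ puts more weight on early event times, when the risk set is still large.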

survdiff(Surv(time, status) ~ sex, data=brainCancer)
summary(coxph(Surv(time, status) ~ sex, data=brainCancer))

Call:
survdiff(formula = Surv(time, status) ~ sex, data = brainCancer)

            N Observed Expected (O-E)^2/E (O-E)^2/V
sex=Female 45       15     18.5     0.676      1.44
sex=Male   43       20     16.5     0.761      1.44

 Chisq= 1.4  on 1 degrees of freedom, p= 0.2 

Call:
coxph(formula = Surv(time, status) ~ sex, data = brainCancer)

  n= 88, number of events= 35 

          coef exp(coef) se(coef)     z Pr(>|z|)
sexMale 0.4077    1.5033   0.3420 1.192    0.233

        exp(coef) exp(-coef) lower .95 upper .95
sexMale     1.503     0.6652     0.769     2.939

Concordance= 0.565  (se = 0.045 )
Likelihood ratio test= 1.44  on 1 df,   p=0.2
Wald test            = 1.42  on 1 df,   p=0.2
Score (logrank) test = 1.44  on 1 df,   p=0.2

Using weight $W(t)=\hat{S}_P(t)$

survdiff(Surv(time, status) ~ sex, 
         data=brainCancer,
         rho=1)

Call:
survdiff(formula = Surv(time, status) ~ sex, data = brainCancer, 
    rho = 1)

            N Observed Expected (O-E)^2/E (O-E)^2/V
sex=Female 45     11.3     14.4     0.661      1.73
sex=Male   43     16.0     13.0     0.735      1.73

 Chisq= 1.7  on 1 degrees of freedom, p= 0.2 

An example with $K=4$¶

survdiff(Surv(time, status) ~ diagnosis, data=brainCancer)

Call:
survdiff(formula = Surv(time, status) ~ diagnosis, data = brainCancer)

n=87, 1 observation deleted due to missingness.

                      N Observed Expected (O-E)^2/E (O-E)^2/V
diagnosis=HG glioma  22       17     5.69   22.4456   27.5457
diagnosis=LG glioma   9        4     3.75    0.0167    0.0188
diagnosis=Meningioma 42        9    20.26    6.2583   15.2537
diagnosis=Other      14        5     5.30    0.0165    0.0196

 Chisq= 29.7  on 3 degrees of freedom, p= 2e-06 

summary(coxph(Surv(time, status) ~ diagnosis, data=brainCancer))

Call:
coxph(formula = Surv(time, status) ~ diagnosis, data = brainCancer)

  n= 87, number of events= 35 
   (1 observation deleted due to missingness)

                       coef exp(coef) se(coef)      z Pr(>|z|)    
diagnosisLG glioma  -1.1072    0.3305   0.5620 -1.970   0.0488 *  
diagnosisMeningioma -1.9937    0.1362   0.4221 -4.723 2.32e-06 ***
diagnosisOther      -1.1872    0.3051   0.5109 -2.324   0.0201 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                    exp(coef) exp(-coef) lower .95 upper .95
diagnosisLG glioma     0.3305      3.026   0.10985    0.9943
diagnosisMeningioma    0.1362      7.342   0.05955    0.3115
diagnosisOther         0.3051      3.278   0.11207    0.8305

Concordance= 0.725  (se = 0.044 )
Likelihood ratio test= 23.5  on 3 df,   p=3e-05
Wald test            = 23.61  on 3 df,   p=3e-05
Score (logrank) test = 29.68  on 3 df,   p=2e-06

Using weight $W(t)=\hat{S}_P(t)$

survdiff(Surv(time, status) ~ diagnosis,
         data=brainCancer,
         rho=1)

Call:
survdiff(formula = Surv(time, status) ~ diagnosis, data = brainCancer, 
    rho = 1)

n=87, 1 observation deleted due to missingness.

                      N Observed Expected (O-E)^2/E (O-E)^2/V
diagnosis=HG glioma  22    14.13     4.73  18.71252   27.5305
diagnosis=LG glioma   9     2.72     2.88   0.00869    0.0122
diagnosis=Meningioma 42     6.53    15.41   5.11289   14.8544
diagnosis=Other      14     3.86     4.23   0.03263    0.0474

 Chisq= 29.3  on 3 degrees of freedom, p= 2e-06