Topics covered in course¶
Models for discrete data¶
Draw \(X_i \overset{IID}{\sim} F , 1 \leq i \leq N\)
Let \(R(X_i) \in \{r_1, \dots, r_I\}\) and \(C(X_i) \in \{c_1, \dots, c_J\}\) be discrete random variables, e.g. presence or absence of a trait, or a label for a trait.
The sample can be summarized in an \(I \times J\) table of counts, where \(Y_{ij} = \#\{k : R(X_k) = r_i, C(X_k) = c_j\}\):
| | \(c_1\) | \(\dots\) | \(c_J\) | Row total |
|---|---|---|---|---|
| \(r_1\) | \(Y_{11}\) | \(\dots\) | \(Y_{1J}\) | \(Y_{1.}\) |
| \(\vdots\) | \(\dots\) | \(\dots\) | \(\dots\) | \(\dots\) |
| \(r_I\) | \(Y_{I1}\) | \(\dots\) | \(Y_{IJ}\) | \(Y_{I.}\) |
| Column total | \(Y_{.1}\) | \(\dots\) | \(Y_{.J}\) | \(Y_{..}\) |
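Distribution described by the cell probabilities \(\pi_{ij} = P(R = r_i, C = c_j)\), \(1 \leq i \leq I\), \(1 \leq j \leq J\).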
Common questions¶
Independence
Marginal homogeneity: when \(I=J\) and the rows and columns share a common set of values
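As an illustration of the independence question, here is a minimal sketch of Pearson's chi-square statistic computed from a table of counts. It is one common choice, not necessarily the test used in the course. Under independence the expected count in cell \((i,j)\) is \(Y_{i.} Y_{.j} / Y_{..}\). The function name `chi_square_independence` and the counts are made up for illustration, and the sketch assumes every row and column total is positive.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an I x J table of counts."""
    row_totals = [sum(row) for row in table]          # Y_{i.}
    col_totals = [sum(col) for col in zip(*table)]    # Y_{.j}
    n = sum(row_totals)                               # Y_{..}
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical 2 x 2 table (illustrative numbers, not course data).
counts = [[10, 20],
          [20, 10]]
stat, df = chi_square_independence(counts)
```

Under independence and with large counts, `stat` is approximately chi-square distributed with `df` degrees of freedom, so large values are evidence against independence.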
Regression¶
Linear regression¶
Response matrix: \(Y \in \mathbb{R}^{n \times q}\)
Design matrix: \(X \in \mathbb{R}^{n \times p}\)
Usual estimation problem¶
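The formula for the estimation problem is not shown here. Assuming it is ordinary least squares, \(\hat{B} = \arg\min_{B \in \mathbb{R}^{p \times q}} \|Y - XB\|_F^2\), a minimal NumPy sketch looks as follows. The dimensions, the coefficient matrix `B_true`, and the simulated data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 3, 2

# Simulated design and response (illustrative only).
X = rng.normal(size=(n, p))
B_true = np.array([[1.0, -2.0],
                   [0.5, 0.0],
                   [0.0, 3.0]])
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))

# Least squares fit; each column of Y is regressed on X separately.
B_hat, residual_ss, rank, singular_values = np.linalg.lstsq(X, Y, rcond=None)

# At the minimizer the residuals are orthogonal to the columns of X.
normal_eq_gap = X.T @ (Y - X @ B_hat)
```

`B_hat` has shape \(p \times q\), and `normal_eq_gap` should be numerically zero, which is the normal-equations characterization of the least squares solution.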
Binary regression models: \(Y \in \{0,1\}^n\)¶
Usual model¶
For some CDF \(F\): \(P(Y_i = 1 \mid X_i) = F(X_i^T\beta)\), where \(X_i\) is the \(i\)-th row of the design matrix.
Common choices:
probit: \(F = \Phi\), the CDF of \(N(0,1)\)
logit: \(F(x) = e^x/(1+e^x)\)
Usual estimation problem¶
Asymptotic distribution? Inference?
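A sketch of how these questions are often approached for the logit choice, assuming the model \(P(Y_i = 1 \mid X_i) = F(X_i^T\beta)\) and estimation by maximum likelihood: Newton-Raphson on the log-likelihood, with the inverse Fisher information at \(\hat\beta\) used as an approximate covariance. The function names, the simulated data, and `beta_true` are hypothetical.

```python
import numpy as np


def logistic(z):
    """Logit CDF F(z) = e^z / (1 + e^z)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logit(X, y, max_iter=25, tol=1e-10):
    """Newton-Raphson for the logistic log-likelihood; returns (beta_hat, cov_hat)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        prob = logistic(X @ beta)
        score = X.T @ (y - prob)                            # gradient of the log-likelihood
        info = X.T @ (X * (prob * (1.0 - prob))[:, None])   # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Information at the final estimate; its inverse approximates Cov(beta_hat).
    prob = logistic(X @ beta)
    info = X.T @ (X * (prob * (1.0 - prob))[:, None])
    return beta, np.linalg.inv(info)


# Simulated data: intercept plus one covariate (illustrative only).
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (rng.uniform(size=n) < logistic(X @ beta_true)).astype(float)

beta_hat, cov_hat = fit_logit(X, y)
std_err = np.sqrt(np.diag(cov_hat))   # Wald standard errors
```

Wald intervals of the form `beta_hat ± 1.96 * std_err` rely on the large-sample normal approximation for the MLE; the sketch does not guard against perfectly separated data, where the MLE does not exist.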
Survival analysis¶
\(T\) a survival time
Basic object: survival function and hazard¶
Complications: censoring, truncation.
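For reference, the standard definitions of the two basic objects, assuming \(T\) is continuous with density \(f\) (the notation \(S\), \(h\), \(f\) is an assumption, since the formulas are not shown here):

\[
S(t) = P(T > t), \qquad
h(t) = \lim_{\delta \downarrow 0} \frac{P(t \leq T < t + \delta \mid T \geq t)}{\delta}
     = \frac{f(t)}{S(t)} = -\frac{d}{dt} \log S(t).
\]

Integrating the last identity gives \(S(t) = \exp\left(-\int_0^t h(s)\,ds\right)\), so either object determines the other.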
Non-parametric methods¶
Kaplan-Meier estimator: direct estimate of \(P(T > t)\) based on IID draws of censored observations from \(F\)
Log-rank test
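A minimal sketch of the Kaplan-Meier product-limit estimate, assuming right-censored data given as pairs (observed time, event indicator). The function name `kaplan_meier` and the six observations are hypothetical. At each distinct death time \(t\), the estimate is multiplied by \(1 - d_t / n_t\), where \(d_t\) is the number of deaths at \(t\) and \(n_t\) the number still at risk just before \(t\).

```python
def kaplan_meier(times, events):
    """Return [(t, S_hat(t))] at each distinct time with an observed death.

    times  : observed times, min(survival time, censoring time)
    events : 1 if the death was observed, 0 if the observation was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t; censored ties count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical sample: event = 0 marks a censored observation.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

For this sample the estimate steps down at times 1, 2, 3 and 5, taking the values 5/6, 2/3, 4/9 and 0; the censored observation at time 4 only reduces the risk set.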
Usual model¶
Cox model: \(h(t \mid x) = h_0(t) \exp(x^T\beta)\), with the baseline hazard \(h_0\) left unspecified.