Assignment 1
You may discuss homework problems with other students, but you have to prepare the written assignments yourself.
Please combine all your answers, the computer code and the figures into one PDF file and submit it to Gradescope.
Grading scheme: 10 points per numbered problem and 20 points for the remaining 3 problems.
Due date: January 17, 2022, 11:59PM (Monday evening).
Numbered questions from Agresti
1.14
1.29
2.4
2.13
2.18
2.27
3.6
3.7
3.16
3.23
Other questions
Matched case-control revisited
Section 2.1.7 of Agresti describes the sampling scheme used to collect the data in Table 2.5. (For simplicity, we assume that age \(A\) is matched exactly, as is biological sex \(S\).) We also have variables \(C\) for lung cancer status and \(T\) for treatment or exposure status (smoking in this case). Assume in what follows that we can draw simple random samples from some distribution \(\mathbb{P}\), and that we have a mechanism for sampling from \(\mathbb{P}\) restricted to a given cancer status, sex and age level \((c,s,a)\). Answer the following:
In the sampling scheme described by Agresti, what distribution would the sampled sexes and ages settle down to if we sample indefinitely?
Let \(N_{ij}\) denote the count in cell \((i,j)\) of Table 2.5. Suppose we sample \(N \sim \text{Poisson}(\lambda)\) lung cancer patients. Express the distribution of \(N_{ij}\) in terms of \(\mathbb{P}\).
If we sample indefinitely, what will our estimator
settle down to? Express your answer in terms of \(\mathbb{P}\). Is this easily related to the odds ratio?
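For experimenting with these questions numerically, here is a minimal simulation sketch, not a solution. Everything in it is an assumption for illustration: the joint distribution `joint` over \((C, T, S, A)\) is randomly generated rather than taken from Table 2.5, age is coarsened to 3 hypothetical groups, each case gets exactly one control matched on \((S, A)\), and the 2×2 table is laid out with rows \(T=1, T=0\) and columns cases, controls. The function names are hypothetical, and `sample_or` is the usual sample odds ratio, used only as a stand-in for the estimator whose formula is missing above.

```python
import numpy as np


def sample_restricted(rng, joint, n, c, s=None, a=None):
    """Draw n rows (c, t, s, a) from P restricted to C = c and, if given, S = s and A = a."""
    index = (c, slice(None),
             slice(None) if s is None else s,
             slice(None) if a is None else a)
    restricted = np.zeros_like(joint)
    restricted[index] = joint[index]  # zero out every cell outside the restriction
    probs = restricted.ravel() / restricted.sum()
    draws = rng.choice(probs.size, size=n, p=probs)
    return np.stack(np.unravel_index(draws, joint.shape), axis=1)


def matched_case_control_table(rng, joint, lam):
    """Sample N ~ Poisson(lam) cases, then one control per case matched on (S, A).

    Returns a 2x2 count table: rows T = 1, T = 0; columns cases, controls.
    Assumes lam is large enough that at least one case is drawn.
    """
    n_cases = rng.poisson(lam)
    cases = sample_restricted(rng, joint, n_cases, c=1)
    controls = np.array([sample_restricted(rng, joint, 1, c=0, s=s, a=a)[0]
                         for _, _, s, a in cases])
    table = np.zeros((2, 2), dtype=int)
    table[:, 0] = [np.sum(cases[:, 1] == 1), np.sum(cases[:, 1] == 0)]
    table[:, 1] = [np.sum(controls[:, 1] == 1), np.sum(controls[:, 1] == 0)]
    return table


rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2, 3))  # hypothetical P over (C, T, S, A); not real data
joint /= joint.sum()

table = matched_case_control_table(rng, joint, lam=2000)
sample_or = table[0, 0] * table[1, 1] / (table[0, 1] * table[1, 0])
print(table)
print(sample_or)
```

Rerunning with a larger `lam` and comparing `sample_or` against odds ratios computed directly from `joint` is one way to check a candidate answer to the last question.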
Comparison of Pearson’s \(X^2\) and likelihood ratio
In the multinomial model, suppose \(N_{I \times J} \sim \text{Multinomial}(N, \pi_{I \times J})\) for some fixed \(\pi_{I \times J}\) satisfying the independence model. Show directly that, as \(N \to \infty\), Pearson’s \(X^2\) is asymptotically equivalent to the likelihood ratio test statistic \(G^2\) for independence, in the sense that their difference converges to 0 in probability.
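A quick Monte Carlo check of the claim (an illustration, not the requested proof). Under a hypothetical independence model with made-up marginals `row_p` and `col_p`, the sketch computes \(X^2 = \sum_{ij} (n_{ij} - \hat\mu_{ij})^2/\hat\mu_{ij}\) and \(G^2 = 2\sum_{ij} n_{ij}\log(n_{ij}/\hat\mu_{ij})\) with \(\hat\mu_{ij} = n_{i+}n_{+j}/N\), then averages \(|X^2 - G^2|\) over repeated multinomial draws. The helper names are hypothetical.

```python
import numpy as np


def pearson_and_lr(counts):
    """Pearson X^2 and likelihood ratio G^2 for independence in a two-way table."""
    n = counts.sum()
    mu = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n  # fitted counts n_i+ n_+j / n
    x2 = np.sum((counts - mu) ** 2 / mu)
    pos = counts > 0  # empty cells contribute 0 to G^2 (0 log 0 = 0)
    g2 = 2.0 * np.sum(counts[pos] * np.log(counts[pos] / mu[pos]))
    return x2, g2


def mean_abs_gap(rng, N, row_p, col_p, reps=200):
    """Monte Carlo estimate of E|X^2 - G^2| when pi = outer(row_p, col_p)."""
    pi = np.outer(row_p, col_p)  # cell probabilities satisfying independence
    gaps = []
    for _ in range(reps):
        counts = rng.multinomial(N, pi.ravel()).reshape(pi.shape)
        x2, g2 = pearson_and_lr(counts)
        gaps.append(abs(x2 - g2))
    return float(np.mean(gaps))


rng = np.random.default_rng(1)
row_p, col_p = [0.2, 0.3, 0.5], [0.1, 0.2, 0.3, 0.4]
sizes = [200, 20_000, 2_000_000]
gaps = [mean_abs_gap(rng, N, row_p, col_p) for N in sizes]
for N, gap in zip(sizes, gaps):
    print(f"N = {N:>9,}: mean |X^2 - G^2| = {gap:.4f}")
```

The average gap shrinks as \(N\) grows, consistent with the stated convergence in probability; the simulation only illustrates the result for one choice of \(\pi\).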
Homogeneous association
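Consider 3 binary random variables \(X, Y, Z\), each taking values in \(\{0,1\}\), with law
\[\mathbb{P}(X=x, Y=y, Z=z) \propto \exp\left(\alpha_X x + \alpha_Y y + \alpha_Z z + \alpha^2_{XY} xy + \alpha^2_{XZ} xz + \alpha^2_{YZ} yz\right)\]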
for some choices of parameters \(\alpha^1=(\alpha_X,\alpha_Y,\alpha_Z)\) and \(\alpha^2=(\alpha^2_{XY},\alpha^2_{XZ},\alpha^2_{YZ})\). Let
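\[\theta_{YZ}(x) = \frac{\mathbb{P}(Y=1, Z=1 \mid X=x)\,\mathbb{P}(Y=0, Z=0 \mid X=x)}{\mathbb{P}(Y=1, Z=0 \mid X=x)\,\mathbb{P}(Y=0, Z=1 \mid X=x)}\]
denote the conditional odds ratio of \(Y\) and \(Z\) given \(X=x\). Express it as a function of \((\alpha^1, \alpha^2)\) and the value of \(x\), and show that it does not depend on \(x\).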
Similarly, show that the analogously defined \(\theta_{XY}(z)\) does not depend on \(z\) and that \(\theta_{XZ}(y)\) does not depend on \(y\). (That is, the model satisfies homogeneous association.)
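A numerical sanity check, assuming the log-linear law \(\mathbb{P}(x,y,z) \propto \exp(\alpha_X x + \alpha_Y y + \alpha_Z z + \alpha^2_{XY}xy + \alpha^2_{XZ}xz + \alpha^2_{YZ}yz)\) on \(\{0,1\}^3\) with arbitrary illustrative parameter values; the helper names are hypothetical. Because the normalizing constant of a conditional distribution cancels in an odds ratio, each conditional odds ratio can be read directly off a 2×2 slice of the joint pmf.

```python
import numpy as np


def joint_pmf(a1, a2):
    """Joint pmf p[x, y, z] of the assumed log-linear model on {0, 1}^3."""
    aX, aY, aZ = a1
    aXY, aXZ, aYZ = a2
    x, y, z = np.meshgrid([0, 1], [0, 1], [0, 1], indexing="ij")
    logp = aX * x + aY * y + aZ * z + aXY * x * y + aXZ * x * z + aYZ * y * z
    p = np.exp(logp)
    return p / p.sum()


def odds_ratio(t):
    """Odds ratio t[1,1] t[0,0] / (t[1,0] t[0,1]) of a 2x2 table; overall scaling cancels."""
    return t[1, 1] * t[0, 0] / (t[1, 0] * t[0, 1])


a1, a2 = (0.3, -0.5, 1.0), (0.8, -1.2, 0.4)  # illustrative values only
p = joint_pmf(a1, a2)
theta_YZ = [odds_ratio(p[x, :, :]) for x in (0, 1)]  # slices with X = x
theta_XY = [odds_ratio(p[:, :, z]) for z in (0, 1)]  # slices with Z = z
theta_XZ = [odds_ratio(p[:, y, :]) for y in (0, 1)]  # slices with Y = y
print(theta_YZ, theta_XY, theta_XZ)
```

Each printed list contains two equal entries; comparing them with `np.exp(a2)` is a useful check on a hand derivation.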
Suppose \(\alpha^2_{XY}=0\). Show that \(X\) and \(Y\) are conditionally independent given \(Z\). Compute the joint distribution of \((X,Y)\) given \(Z=0\) and given \(Z=1\) as a function of \((\alpha^1,\alpha^2)\).
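A sketch for checking conditional independence numerically, again assuming the log-linear law \(\mathbb{P}(x,y,z) \propto \exp(\alpha_X x + \alpha_Y y + \alpha_Z z + \alpha^2_{XY}xy + \alpha^2_{XZ}xz + \alpha^2_{YZ}yz)\) on \(\{0,1\}^3\) with illustrative parameters. The conditional law of \((X,Y)\) given \(Z=z\) is the normalized slice `p[:, :, z]`; the hypothetical helper `ci_gap` measures how far it is from the product of its own marginals.

```python
import numpy as np


def joint_pmf(a1, a2):
    """Joint pmf p[x, y, z] of the assumed log-linear model on {0, 1}^3."""
    aX, aY, aZ = a1
    aXY, aXZ, aYZ = a2
    x, y, z = np.meshgrid([0, 1], [0, 1], [0, 1], indexing="ij")
    p = np.exp(aX * x + aY * y + aZ * z + aXY * x * y + aXZ * x * z + aYZ * y * z)
    return p / p.sum()


def ci_gap(p, z):
    """Max |P(x, y | z) - P(x | z) P(y | z)|; zero iff X and Y are independent given Z = z."""
    pxy = p[:, :, z] / p[:, :, z].sum()
    return np.abs(pxy - np.outer(pxy.sum(axis=1), pxy.sum(axis=0))).max()


p_null = joint_pmf((0.3, -0.5, 1.0), (0.0, -1.2, 0.4))  # alpha^2_XY = 0
p_alt = joint_pmf((0.3, -0.5, 1.0), (0.8, -1.2, 0.4))   # alpha^2_XY != 0, for contrast
print([ci_gap(p_null, z) for z in (0, 1)])
print([ci_gap(p_alt, z) for z in (0, 1)])
```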
Suppose we consider more than 3 binary variables \((B_1, \dots, B_V)\). Generalize the model above to the setting with \(V\) variables. Do the conclusions about homogeneous association still hold? What about conditional independence?
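One natural generalization, assumed here rather than taken from the assignment, keeps main effects and all pairwise interactions: \(\log \mathbb{P}(b) = \sum_v \alpha_v b_v + \sum_{u<v} \alpha_{uv} b_u b_v + \text{const}\), a binary pairwise (Ising-type) model. The sketch enumerates all \(2^V\) states, so it only suits small \(V\); parameters are random and function names hypothetical. It computes the odds ratio of a pair \((B_u, B_v)\) in every conditional 2×2 table obtained by fixing the remaining variables (indices are 0-based in the code).

```python
import itertools

import numpy as np


def pairwise_pmf(alpha, A):
    """pmf over {0,1}^V as an array of shape (2,) * V.

    Assumed model: log p(b) = sum_v alpha[v] b_v + sum_{u<v} A[u, v] b_u b_v + const.
    Only the strict upper triangle of A is used.
    """
    V = len(alpha)
    U = np.triu(A, k=1)
    logp = np.empty((2,) * V)
    for b in itertools.product((0, 1), repeat=V):
        bv = np.array(b)
        logp[b] = alpha @ bv + bv @ U @ bv
    p = np.exp(logp - logp.max())  # subtract the max for numerical stability
    return p / p.sum()


def conditional_odds_ratios(p, u, v):
    """Odds ratio of (B_u, B_v) at every configuration of the other variables."""
    # Move axes u and v last so each trailing 2x2 block is one conditional table.
    q = np.moveaxis(p, (u, v), (-2, -1)).reshape(-1, 2, 2)
    return q[:, 1, 1] * q[:, 0, 0] / (q[:, 1, 0] * q[:, 0, 1])


rng = np.random.default_rng(2)
V = 5
alpha = rng.normal(size=V)
A = rng.normal(size=(V, V))
A[0, 3] = 0.0  # no direct interaction between B_0 and B_3
p = pairwise_pmf(alpha, A)
print(conditional_odds_ratios(p, 1, 2), np.exp(A[1, 2]))
print(conditional_odds_ratios(p, 0, 3))
```

An odds ratio of 1 in every conditional 2×2 table means the two variables are independent given all the remaining ones; the sketch says nothing about conditioning on a smaller subset of variables.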
Can you generalize the above model to the case that \(X,Y,Z\) are not binary but general categorical variables?
Suppose \(X \sim N(0,\Sigma)\) for some non-singular \(\Sigma\), and let \(\Theta=\Sigma^{-1}\). Show that \(\Theta_{ij}=0\) implies that \(X_i\) and \(X_j\) are conditionally independent given \((X_k, k \not\in \{i,j\})\). (Recall that for jointly Gaussian variables, independence is equivalent to being uncorrelated.) Comment on how this observation relates to our initial binary model on three variables.
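A numerical illustration with a hypothetical 4×4 precision matrix (indices are 0-based in the code). It uses the standard formula for the conditional covariance of a Gaussian subvector, \(\Sigma_{AA} - \Sigma_{AB}\Sigma_{BB}^{-1}\Sigma_{BA}\), which equals \((\Theta_{AA})^{-1}\). With \(\Theta_{01}=0\), the conditional covariance of \(X_0\) and \(X_1\) given the rest vanishes even though their marginal covariance does not.

```python
import numpy as np

# Hypothetical precision matrix with Theta[0, 1] = 0; it is symmetric and
# strictly diagonally dominant with a positive diagonal, hence positive definite.
Theta = np.array([[2.0, 0.0, 0.5, 0.3],
                  [0.0, 2.0, 0.4, -0.2],
                  [0.5, 0.4, 2.0, 0.1],
                  [0.3, -0.2, 0.1, 2.0]])
Sigma = np.linalg.inv(Theta)

A, B = [0, 1], [2, 3]  # condition (X_0, X_1) on (X_2, X_3)
# Conditional covariance of X_A given X_B: Schur complement of Sigma_BB in Sigma.
cond_cov = Sigma[np.ix_(A, A)] - Sigma[np.ix_(A, B)] @ np.linalg.solve(
    Sigma[np.ix_(B, B)], Sigma[np.ix_(B, A)])

print("marginal Cov(X_0, X_1):   ", Sigma[0, 1])
print("conditional Cov(X_0, X_1):", cond_cov[0, 1])
print(np.allclose(cond_cov, np.linalg.inv(Theta[np.ix_(A, A)])))
```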