Normal Theory#

Multivariate normal#

\[ \mathbb{R}^p \ni X \sim N(\mu, \Sigma) \]

Linear transformations#

\[ AX \sim N(A\mu, A\Sigma A^T) \]
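The transformation rule can be checked numerically. The sketch below is only an illustration: the values of `mu`, `Sigma` and `A` are made up, and the Monte Carlo comparison holds only up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, chosen only for illustration.
mu = np.array([1.0, -2.0, 0.5])
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [-0.3, 0.2, 1.0]])
Sigma = L @ L.T                       # positive definite by construction
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 1.0]])

# Mean and covariance of AX predicted by the rule above.
mean_AX = A @ mu
cov_AX = A @ Sigma @ A.T

# Monte Carlo check: each row of X is one draw from N(mu, Sigma).
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T                           # each row is A x for one draw x
emp_mean = Y.mean(axis=0)
emp_cov = np.cov(Y, rowvar=False)
```

Comparing `emp_mean` with `mean_AX` and `emp_cov` with `cov_AX` should show agreement to roughly two decimal places at this sample size.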

Mahalanobis distance#

  • If \(\Sigma > 0\) then

\[ (X-\mu)^T \Sigma^{-1}(X-\mu) \sim \chi^2_p \]
  • If \(\Sigma\) is degenerate (singular), with \(\Sigma^{\dagger}\) its Moore-Penrose pseudo-inverse,

\[ (X-\mu)^T\Sigma^{\dagger} (X-\mu) \sim \chi^2_{\text{rank}(\Sigma)} \]
  • The pseudo-inverse form is defined even if \(X-\mu\) is not in \(\text{row}(\Sigma)\): it first projects \(X-\mu\) onto \(\text{row}(\Sigma)\).
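A small sketch of the degenerate case, assuming a made-up rank-2 covariance in \(\mathbb{R}^3\). It checks that the pseudo-inverse quadratic form gives the same value whether or not the vector is first projected onto \(\text{row}(\Sigma)\).

```python
import numpy as np

# Hypothetical rank-2 covariance in R^3 (illustrative values only).
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
Sigma = B @ B.T
Sigma_pinv = np.linalg.pinv(Sigma)    # Moore-Penrose pseudo-inverse
rank = np.linalg.matrix_rank(Sigma)

# For symmetric Sigma, Sigma @ Sigma_pinv is the orthogonal projector
# onto row(Sigma) = col(Sigma).
P = Sigma @ Sigma_pinv

# v plays the role of X - mu; it is deliberately NOT in row(Sigma).
v = np.array([1.0, 2.0, -1.0])

d_full = v @ Sigma_pinv @ v                  # distance computed directly
d_proj = (P @ v) @ Sigma_pinv @ (P @ v)      # distance after projecting first
```

Here `d_full` and `d_proj` coincide, and `rank` gives the degrees of freedom of the limiting \(\chi^2\).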

Quadratic forms#

  • What is the distribution of \((X-\mu)^TQ(X-\mu)\) (with \(Q\) symmetric)?

\[ (X-\mu)^TQ(X-\mu) \overset{D}{=} \sum_{j=1}^p \lambda_j(Q^{1/2}\Sigma Q^{1/2}) W_j \]
  • Above, \(W_j \sim \chi^2_1\) independently, and \(\lambda_j(A), 1 \leq j \leq p\), are the eigenvalues of \(A\).

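  • Here \(Q^{1/2} = UD^{1/2}U^T\) where \(Q=UDU^T\), which requires \(Q \succeq 0\). Could also use any square root \(A\) of \(Q\), i.e. \(AA^T=Q\), with \(A^T\Sigma A\) in place of \(Q^{1/2}\Sigma Q^{1/2}\).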
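The weights in the mixture can be computed directly. The sketch below uses made-up \(\Sigma\) and positive definite \(Q\), and checks that the symmetric square root and a Cholesky factor (one particular choice of \(A\) with \(AA^T=Q\)) give the same eigenvalues.

```python
import numpy as np

# Hypothetical positive definite Sigma and Q (illustrative values only).
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Q = np.array([[1.0, 0.3],
              [0.3, 2.0]])

# Symmetric square root Q^{1/2} = U D^{1/2} U^T.
d, U = np.linalg.eigh(Q)
Q_half = U @ np.diag(np.sqrt(d)) @ U.T
lam_sym = np.linalg.eigvalsh(Q_half @ Sigma @ Q_half)

# Another square root: the Cholesky factor A satisfies A A^T = Q.
A = np.linalg.cholesky(Q)
lam_chol = np.linalg.eigvalsh(A.T @ Sigma @ A)

# Since E[chi^2_1] = 1, the mean of the quadratic form is the sum of the weights.
mean_qf = lam_sym.sum()
```

The sum of the weights equals \(\text{Tr}(Q\Sigma)\), which is the familiar expression for the mean of the quadratic form.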

Normal data matrix (Section 3.3 of MKB)#

Normal data matrix#

\[ \mathbb{R}^{n \times p } \ni X \sim N(M, \Sigma_R \otimes \Sigma_C) \]

  • IID rows will have \(\Sigma_R = I_{n \times n}\)

  • Mean: \(M \in \mathbb{R}^{n \times p}\)

  • Covariance: a Kronecker product

\[ \text{Cov}(a^TXb, c^TXd) = a^T\Sigma_R c \cdot b^T\Sigma_C d \]
  • Not the most general covariance structure: it separates operations on rows from operations on columns.
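The covariance identity can be verified with `np.kron`. This sketch assumes \(X\) is vectorized row by row (NumPy's default `ravel()`), so that the covariance of the vectorized matrix is `np.kron(Sigma_R, Sigma_C)`; the helper `random_spd` and all numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 3, 2

def random_spd(k):
    """Hypothetical helper: a random symmetric positive definite k x k matrix."""
    G = rng.standard_normal((k, k))
    return G @ G.T + k * np.eye(k)

Sigma_R = random_spd(n)   # covariance across rows
Sigma_C = random_spd(p)   # covariance across columns

# Covariance of X.ravel() (row-major vectorization) under the Kronecker model.
cov_vec = np.kron(Sigma_R, Sigma_C)

a, c = rng.standard_normal(n), rng.standard_normal(n)
b, d = rng.standard_normal(p), rng.standard_normal(p)

# a^T X b equals kron(a, b) . X.ravel(), so the covariance of two such
# bilinear forms is a bilinear form in cov_vec.
lhs = np.kron(a, b) @ cov_vec @ np.kron(c, d)
rhs = (a @ Sigma_R @ c) * (b @ Sigma_C @ d)

# Sanity check of the vectorization identity on an arbitrary matrix.
X = rng.standard_normal((n, p))
bilinear_direct = a @ X @ b
bilinear_vec = np.kron(a, b) @ X.ravel()
```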

Kronecker product#

  • \(A \otimes B\) a tensor…

  • Can be thought of as a linear map \(\mathbb{R}^{n \times p} \to \mathbb{R}^{n \times p}\)

  • Defined by

\[ \text{Tr}(E_{ij}^T (A \otimes B)(E_{kl})) = A_{ik} \cdot B_{jl} \]
  • \(E_{ij} \in \mathbb{R}^{n \times p}\) is one-hot: selects \((i,j)\) entry of a matrix.
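A minimal sketch of the Kronecker product as a linear map on matrices. It assumes the action \((A \otimes B)(E) = A E B^T\) and reads the trace as the inner product \(\text{Tr}(E_{ij}^T \, \cdot)\); the helper names `kron_apply` and `one_hot` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((p, p))

def kron_apply(A, B, E):
    """Apply A (x) B to an n x p matrix E, assuming (A (x) B)(E) = A E B^T."""
    return A @ E @ B.T

def one_hot(i, j):
    """n x p matrix with a single 1 in position (i, j)."""
    E = np.zeros((n, p))
    E[i, j] = 1.0
    return E

# Defining identity: Tr(E_ij^T (A (x) B)(E_kl)) = A_ik * B_jl.
i, j, k, l = 2, 1, 0, 1
entry = np.trace(one_hot(i, j).T @ kron_apply(A, B, one_hot(k, l)))

# The same map, written as a matrix acting on the row-major vectorization.
E = rng.standard_normal((n, p))
via_matrix = np.kron(A, B) @ E.ravel()
via_map = kron_apply(A, B, E).ravel()
```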

Linear transformations of normal data matrices#

\[ AXB \sim N\left(AMB, (A \Sigma_R A^T) \otimes (B^T \Sigma_C B)\right) \]
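The covariance of \(AXB\) follows from a vectorization identity and the mixed-product property of Kronecker products. The sketch below checks both numerically, assuming row-major vectorization; `random_spd` and the dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, m, q = 4, 3, 2, 2

def random_spd(k):
    """Hypothetical helper: a random symmetric positive definite k x k matrix."""
    G = rng.standard_normal((k, k))
    return G @ G.T + k * np.eye(k)

Sigma_R, Sigma_C = random_spd(n), random_spd(p)
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, q))

# Row-major identity: (A X B).ravel() == kron(A, B^T) @ X.ravel().
T = np.kron(A, B.T)
X = rng.standard_normal((n, p))
vec_ok = np.allclose((A @ X @ B).ravel(), T @ X.ravel())

# Covariance of vec(AXB) obtained by pushing Cov(vec X) through T ...
cov_vec = T @ np.kron(Sigma_R, Sigma_C) @ T.T
# ... and the closed form stated above.
cov_formula = np.kron(A @ Sigma_R @ A.T, B.T @ Sigma_C @ B)
```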

Covariance#

\[ \text{Cov}(AXB, A'XB') = (A \Sigma_R (A')^T) \otimes (B^T \Sigma_C B') \]

Independence#

\(AXB\) and \(A'XB'\) are independent if either

  • \(A \Sigma_R (A')^T=0\) or

  • \(B^T \Sigma_C B' =0\).

Wishart distribution (Section 3.4 of MKB)#

Wishart distribution#

  • Suppose \(X \sim N(0, I_{n \times n} \otimes \Sigma_{p \times p})\)

  • Define

\[ W = X^TX \sim \text{Wishart}(n, \Sigma) \]
  • Degrees of freedom: \(n\).

  • Standard Wishart: \(\Sigma = I_{p \times p}\).
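A Wishart draw can be simulated straight from the definition. The sketch below uses a made-up \(\Sigma\) and checks the first moment \(E[W] = n\Sigma\), which follows because each row of \(X\) contributes \(\Sigma\) in expectation; the comparison holds only up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 5, 2

# Hypothetical column covariance (illustrative values only).
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
L = np.linalg.cholesky(Sigma)         # Sigma = L L^T

reps = 20_000
Z = rng.standard_normal((reps, n, p)) # reps independent n x p standard normal matrices
X = Z @ L.T                           # each X[r] has iid N(0, Sigma) rows
W = np.swapaxes(X, 1, 2) @ X          # W[r] = X[r]^T X[r], one Wishart(n, Sigma) draw each

W_mean = W.mean(axis=0)               # should be close to n * Sigma
```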

Operations with Wisharts#

Linear transformation#

  • Let \(W \sim \text{Wishart}(n, \Sigma)\).

\[ AWA^T \sim \text{Wishart}(n, A\Sigma A^T) \]

Addition#

  • Let \(M_1 \sim \text{Wishart}(n_1, \Sigma), M_2 \sim \text{Wishart}(n_2, \Sigma)\) be independent

\[ M_1 + M_2 \sim \text{Wishart}(n_1+n_2, \Sigma) \]

Quadratic forms#

  • Let \(\mathbb{R}^{n \times p} \ni X \sim N(0, I \otimes \Sigma)\) and let \(Q \in \mathbb{R}^{n \times n}\) be symmetric

\[ X^TQX \overset{D}{=} \sum_{j=1}^n \lambda_j(Q) W_j \]
  • Matrices \(W_j \sim \text{Wishart}(1, \Sigma)\) independently.
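The decomposition rests on an algebraic identity: with \(Q = U \Lambda U^T\) and \(Y = U^TX\), one has \(X^TQX = \sum_j \lambda_j y_j y_j^T\), where \(y_j^T\) is the \(j\)-th row of \(Y\). Since \(U\) is orthogonal, \(Y\) has the same distribution as \(X\), so each \(y_j y_j^T\) is a \(\text{Wishart}(1, \Sigma)\). The sketch below checks the identity for an arbitrary made-up \(X\) and symmetric \(Q\).

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 4, 2

X = rng.standard_normal((n, p))       # any n x p matrix works for the identity
G = rng.standard_normal((n, n))
Q = (G + G.T) / 2                     # hypothetical symmetric Q

lam, U = np.linalg.eigh(Q)            # Q = U diag(lam) U^T
Y = U.T @ X                           # rotated data; row j is y_j^T

direct = X.T @ Q @ X
# Weighted sum of rank-one p x p matrices y_j y_j^T.
spectral = sum(lam[j] * np.outer(Y[j], Y[j]) for j in range(n))
```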

Special case: projections#

  • If \(P\) is a projection (i.e. \(P^2=P, P=P^T\)) then

\[ X^TPX \sim \text{Wishart}(\text{tr}(P), \Sigma) \]

Example#

  • Let \(R=I - \frac{1}{n}11^T\) be the centering matrix and \(X \sim N(1\mu^T, I \otimes \Sigma)\)

\[ (n-1)\hat{\Sigma} = n \hat{\Sigma}_{MLE} = X^TRX \sim \text{Wishart}(n-1, \Sigma) \]
  • Let \(H=n^{-1}11^T\) be the hat matrix for the intercept-only model.

  • Then \(HX, RX\) are independent, since \(H \cdot I \cdot R^T = HR = 0\)…
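The algebraic facts behind this example are easy to confirm numerically. The sketch below uses a made-up data matrix and checks that the centering matrix is a projection of trace \(n-1\), that it is orthogonal to the intercept hat matrix, and that \(X^TRX\) matches the usual covariance estimates from `np.cov`.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 6, 3

ones = np.ones((n, 1))
H = ones @ ones.T / n                 # hat matrix for the intercept-only model
R = np.eye(n) - H                     # centering matrix

# Hypothetical data with a nonzero mean in each column.
X = rng.standard_normal((n, p)) + np.array([1.0, 2.0, 3.0])

S = X.T @ R @ X                       # centered sum of squares and cross-products

unbiased = np.cov(X, rowvar=False)            # divides by n - 1
mle = np.cov(X, rowvar=False, bias=True)      # divides by n
```

Because \(HR = 0\), the sample mean (a function of \(HX\)) and the centered data \(RX\) are uncorrelated, which for jointly normal quantities gives independence.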