One sample problem#

library(MVT)
data(examScor)
mu_hat = apply(examScor, 2, mean)
mu_hat

Loading required package: fastmatrix

mechanics   38.9545454545455
vectors     50.5909090909091
algebra     50.6022727272727
analysis    46.6818181818182
statistics  42.3068181818182
  • Suppose I want to test \(H_0:\mu=(50,50,50,50,50)\).

Choosing a statistic#

  • A natural choice? How should we standardize?

null_mean = rep(50, 5) # hypothesized mean of 50 on each score
sample_mean = apply(examScor, 2, mean)
T_naive = sum((null_mean - sample_mean)^2) # avoid `T`, which masks TRUE in R
T_naive

192.909349173554

Gaussian model#

  • Common statistical model:

\[ X_i = X[i,] \overset{IID}{\sim} N(\mu, \Sigma), \qquad 1 \leq i \leq n \]
  • Density:

\[ f(x) = \det(2\pi\Sigma)^{-1/2} \exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right) \]
  • MGF:

\[ E[e^{a^TX}] = \exp\left(a^T\mu + \frac{1}{2} a^T\Sigma a \right) \]
  • Assume \(\Sigma>0\) known to start

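As a quick numerical sanity check of the density formula, here is a minimal sketch (the helper `dmvn` and the diagonal \(\Sigma\) are illustrative, not part of MVT); for diagonal \(\Sigma\) the density factors into a product of univariate normal densities:

```r
# Evaluate the N(mu, Sigma) density directly from the formula
dmvn <- function(x, mu, Sigma) {
  d <- x - mu
  exp(-0.5 * sum(d * solve(Sigma, d))) / sqrt(det(2 * pi * Sigma))
}

# For diagonal Sigma the joint density is a product of univariate normals
mu <- c(0, 1)
Sigma <- diag(c(2, 3))
x <- c(0.5, -0.2)
c(dmvn(x, mu, Sigma), prod(dnorm(x, mu, sqrt(diag(Sigma)))))  # equal
```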
Likelihood#

  • The negative log-likelihood, up to additive terms not involving \(\mu\), is

\[\begin{split} \begin{aligned} -\log L(\mu | X) &= \sum_{i=1}^n \frac{1}{2}(X_i-\mu)^T\Sigma^{-1}(X_i-\mu) \\ &= \frac{1}{2} \text{Tr}\left(\Sigma^{-1}(X-1\mu^T)^T(X-1\mu^T)\right) \\ \end{aligned} \end{split}\]
  • The MLE of \(\mu\) does not depend on \(\Sigma\). Why?


  • Enough to solve MLE when \(\Sigma=I\)

\[\begin{split} \begin{aligned} \hat{\mu} &= \text{argmin}_{\mu} \|X-1\mu^T\|_F^2 \\ &= \text{argmin}_{\mu} \sum_{j=1}^p \|X[,j]-\mu_j \cdot 1\|_2^2 \\ \end{aligned} \end{split}\]
  • So,

\[ \hat{\mu} = \bar{X} \sim N\left(\mu, n^{-1}\Sigma\right) \]

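A Monte Carlo sanity check of this sampling distribution (a minimal sketch; the \(2 \times 2\) \(\Sigma\) below is made up):

```r
set.seed(1)
n <- 50; p <- 2
mu <- c(1, -1)
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
R <- chol(Sigma)  # Sigma = t(R) %*% R

# Covariance of the sample mean over many replications should be Sigma / n
xbars <- t(replicate(20000, {
  X <- matrix(rnorm(n * p), n, p) %*% R + matrix(mu, n, p, byrow = TRUE)
  colMeans(X)
}))
round(n * cov(xbars), 2)  # approximately Sigma
```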
Testing \(H_0: \mu=\mu_0\)#

  • Log-likelihood under \(H_0\)

\[ -\frac{1}{2} \text{Tr}\left(\Sigma^{-1}(X-1\mu_0^T)^T(X-1\mu_0^T)\right) \]
  • Log-likelihood under \(H_a\)

\[ -\frac{1}{2} \text{Tr}\left(\Sigma^{-1}\,X^T\left(I-\frac{1}{n}11^T\right)X\right) \]
  • Difference

\[\begin{split} \begin{aligned} \frac{1}{2} \text{Tr}\left(\Sigma^{-1}(X-1\mu_0^T)^T \left[\frac{1}{n}11^T \right](X-1\mu_0^T) \right) &= \frac{n}{2} \text{Tr}\left(\Sigma^{-1}(\hat{\mu}-\mu_0)(\hat{\mu}-\mu_0)^T\right) \\ &= \frac{n}{2} (\hat{\mu}-\mu_0)^T\Sigma^{-1}(\hat{\mu}-\mu_0) \end{aligned} \end{split}\]

Mahalanobis distance#

  • Given \(\Sigma > 0\), define the (squared) Mahalanobis distance

\[ d_{\Sigma}(x,y) = (x-y)^T\Sigma^{-1}(x-y) \]
  • Twice the log-likelihood ratio with \(\Sigma\) known is

\[ d_{n^{-1}\Sigma}(\hat{\mu}, \mu_0) \]
  • Claim: \(Z \sim N(\mu,\Sigma) \implies d_{\Sigma}(Z, \mu) \sim \chi^2_p\)

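The claim is easy to check by simulation (a minimal sketch with a made-up \(3 \times 3\) \(\Sigma\); base R's `mahalanobis()` computes the same quadratic form):

```r
set.seed(2)
p <- 3
A <- matrix(rnorm(p * p), p, p)
Sigma <- crossprod(A) + diag(p)  # a random positive definite matrix
mu <- rnorm(p)
L <- t(chol(Sigma))              # Sigma = L %*% t(L)

# d_Sigma(Z, mu) for Z ~ N(mu, Sigma) should follow chi^2_p
d2 <- replicate(20000, {
  Z <- mu + drop(L %*% rnorm(p))
  d <- Z - mu
  sum(d * solve(Sigma, d))
})
round(c(mean(d2), unname(quantile(d2, 0.95)), qchisq(0.95, df = p)), 2)
```

The mean should be close to \(p\) and the empirical 0.95 quantile close to the \(\chi^2_p\) quantile.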
Unknown \(\Sigma\)#

  • What if we don’t know \(\Sigma\)?

  • Univariate case \(p=1\), MLE is

\[\begin{split} \begin{aligned} \hat{\sigma}^2_{MLE} &= \frac{1}{n} \sum_{i=1}^n (X_i-\bar{X})^2 \\ &= \frac{1}{n} X^T\left(I-\frac{1}{n}11^T\right)X \\ \end{aligned} \end{split}\]
  • Also, we know \(n \cdot \hat{\sigma}^2_{MLE} \sim \sigma^2 \cdot \chi^2_{n-1}\).

  • What is analog for \(p>1\)?

Maximizing likelihood#

  • With \(\Sigma\) unknown, after maximizing over \(\mu\), the profile log-likelihood for \(\Sigma\) is

\[ -\frac{n}{2} \log \det(2\pi\Sigma) - \frac{1}{2} \text{Tr}\left(\Sigma^{-1}\left(X^T\left(I-\frac{1}{n}11^T\right)X\right)\right) \]
  • Introducing the precision matrix \(\Theta=\Sigma^{-1}\), the log-likelihood (dropping constants) is

\[ \frac{n}{2} \log \det(\Theta) - \frac{1}{2} \text{Tr}\left(\Theta\left(X^T\left(I-\frac{1}{n}11^T\right)X\right)\right) \]

  • Differentiate w.r.t. \(\Theta\) using:

\[ \nabla \log \det(\Theta) = \Theta^{-1} \]
  • Yields

\[ (\hat{\Sigma}_{MLE})_{p \times p} = \hat{\Theta}^{-1}_{MLE} = \frac{1}{n} X^T\left(I-\frac{1}{n}11^T\right)X \]
  • A sum of squares matrix

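A small check that this matrix formula agrees with R's `cov` up to the \(n\) vs. \(n-1\) scaling (a sketch on simulated data):

```r
set.seed(3)
n <- 40; p <- 3
X <- matrix(rnorm(n * p), n, p)

C <- diag(n) - matrix(1, n, n) / n          # centering matrix I - (1/n) 11^T
Sigma_mle <- t(X) %*% C %*% X / n           # MLE of Sigma
max(abs(Sigma_mle - (n - 1) / n * cov(X)))  # essentially 0
```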
Wishart distribution#

  • Suppose \(Z_i \overset{IID}{\sim} N(0, \Sigma)\), \(1 \leq i \leq k\), and let \(Z\) be the \(k \times p\) matrix with rows \(Z_i^T\). Then

\[ Z^TZ \overset{\text{def}}{\sim} \textrm{Wishart}(k, \Sigma) \]
  • Claim:

\[ n \cdot \hat{\Sigma}_{MLE} \sim \textrm{Wishart}(n-1, \Sigma) \]
  • Unbiased estimate: (using fact \(E[Z^TZ]=k\Sigma\))

\[ \hat{\Sigma} = \frac{1}{n-1} X^T\left(I - \frac{1}{n}11^T\right)X \]
  • Analogous to univariate case except sums of squares are matrices!

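A Monte Carlo check of the expectation behind the unbiased estimate: averaging \(n \hat{\Sigma}_{MLE}\) over replications should give \((n-1)\Sigma\) (a sketch; the \(\Sigma\) below is made up):

```r
set.seed(4)
n <- 10; p <- 2
Sigma <- matrix(c(1, 0.6, 0.6, 2), 2, 2)
R <- chol(Sigma)

B <- 20000
W <- matrix(0, p, p)
for (b in 1:B) {
  X <- matrix(rnorm(n * p), n, p) %*% R
  # crossprod of the centered data is n * Sigma_hat_MLE
  W <- W + crossprod(scale(X, center = TRUE, scale = FALSE))
}
round(W / (B * (n - 1)), 2)  # approximately Sigma
```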
Estimate of covariance#

Sigma_hat = cov(examScor) # unbiased estimate
Sigma_hat

A matrix: 5 × 5 of type dbl

           mechanics   vectors   algebra  analysis statistics
mechanics   305.7680 127.22257 101.57941 106.27273  117.40491
vectors     127.2226 172.84222  85.15726  94.67294   99.01202
algebra     101.5794  85.15726 112.88597 112.11338  121.87056
analysis    106.2727  94.67294 112.11338 220.38036  155.53553
statistics  117.4049  99.01202 121.87056 155.53553  297.75536

Back to test#

  • W.l.o.g. we take \(\mu_0=0\) below

  • Under \(H_0\), maximized log-likelihood is

\[\begin{split} \begin{aligned} - \frac{n}{2} \log \det \left(\hat{\Sigma}_{0,MLE} \right) &= - \frac{n}{2} \log \det \left(\frac{1}{n}(X-1\mu_0^T)^T(X-1\mu_0^T)\right) \\ &= - \frac{n}{2} \log \det \left(\frac{1}{n}X^TX\right) \\ \end{aligned} \end{split}\]
  • Under \(H_a\), we claim maximized log-likelihood is

\[\begin{split} \begin{aligned} - \frac{n}{2} \log \det \left(\hat{\Sigma}_{MLE} \right) &= - \frac{n}{2} \log \det \left(\frac{1}{n}X^T\left(I-\frac{1}{n}11^T\right)X\right) \\ \end{aligned} \end{split}\]

  • LRT is based on

\[\begin{split} \begin{aligned} n \cdot \log \det \left(X^TX\left(X^T\left(I-\frac{1}{n}11^T\right)X\right)^{-1}\right) &= n \cdot \log \det(I + \hat{\mu}\hat{\mu}^T \hat{\Sigma}_{MLE}^{-1}) \\ &= n \cdot \log(1 + \hat{\mu}^T\hat{\Sigma}_{MLE}^{-1}\hat{\mu}) \end{aligned} \end{split}\]
  • Spring 2025: Initial version had a mistaken \(n\)

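The determinant identity above can be verified numerically (a sketch on simulated data, taking \(\mu_0 = 0\)):

```r
set.seed(5)
n <- 30; p <- 4
X <- matrix(rnorm(n * p), n, p)      # mu_0 = 0 w.l.o.g.
mu_hat <- colMeans(X)
Xc <- scale(X, center = TRUE, scale = FALSE)
Sigma_mle <- crossprod(Xc) / n       # n * Sigma_mle = X^T (I - 11^T/n) X

lhs <- as.numeric(determinant(crossprod(X) %*% solve(n * Sigma_mle))$modulus)
rhs <- log(1 + sum(mu_hat * solve(Sigma_mle, mu_hat)))
c(lhs, rhs)  # equal
```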
Hotelling’s \(T^2\)#

  • Suppose \(W \sim \text{Wishart}(k, \Sigma)\), independent of \(Z \sim N(0, \Sigma)\), with \(\Sigma \in \mathbb{R}^{p \times p}\), \(\Sigma > 0\)

  • The random variable

\[ Z^T(k^{-1}W)^{-1}Z \sim T^2_k \]
  • As \(k \to \infty\), \(T^2_k \overset{D}{\to} \chi^2_p\).

  • In one-sample problem, LRT equivalent to

\[ T^2 = d_{n^{-1}\hat{\Sigma}}(\hat{\mu}, \mu_0) \]

The “right” statistic#

mu_0 = rep(50, 5)
n = nrow(examScor)
T2 = sum((mu_hat - mu_0) * (solve(Sigma_hat / n) %*%
         (mu_hat - mu_0)))
T2

101.957411658211

Recap of one-sample problem#

  • Estimation of the mean structure is independent of \(\Sigma\), leading to coordinate-wise (uncoupled) estimation of \(\mu\).

  • Estimates of \(\Sigma\) involve sum-of-squared error matrix.

  • Wishart distribution is analog of \(\chi^2\).

  • Hotelling’s \(T^2\) is the LRT with \(\Sigma\) unknown. Analogous to Student’s \(t\) in the univariate case.

  • While the multivariate normal model is likely a simplification, using the likelihood provides reasonably intuitive methods and tests.

  • Distribution theory of the LRT can get hairy, though in this case

\[ T^2 \overset{D}{=} C_{n, p} \cdot F_{p, n-p}, \qquad C_{n,p} = \frac{p(n-1)}{n-p} \]
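Putting this together for the exam-scores test: with the standard constant \(C_{n,p} = p(n-1)/(n-p)\), the \(T^2\) value computed above converts to an \(F\) statistic and p-value (a sketch; \(n = 88\), \(p = 5\) for examScor, and the \(T^2\) value is hard-coded from the earlier output):

```r
# Hotelling's T^2 relates to F via T^2 = [p(n-1)/(n-p)] * F_{p, n-p}
n <- 88; p <- 5              # examScor: 88 students, 5 exam scores
T2 <- 101.957411658211       # value computed above
Fstat <- (n - p) / (p * (n - 1)) * T2
pval <- pf(Fstat, p, n - p, lower.tail = FALSE)
c(Fstat, pval)               # strong evidence against H_0
```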