One sample problem#

library(MVT)
data(examScor)
mu_hat = apply(examScor, 2, mean)
mu_hat

Loading required package: fastmatrix

mechanics   38.9545454545455
vectors     50.5909090909091
algebra     50.6022727272727
analysis    46.6818181818182
statistics  42.3068181818182
  • Suppose I want to test \(H_0:\mu=(50,50,50,50,50)\).

Choosing a statistic#

  • A natural choice? How should we standardize?

null_mean = rep(50, 5) # hypothesized mean of 50 on each score
sample_mean = apply(examScor, 2, mean)
T_naive = sum((null_mean - sample_mean)^2) # avoid `T`, which masks TRUE in R
T_naive

192.909349173554

Gaussian model#

  • Common statistical model:

\[ X_i = X[i,] \overset{IID}{\sim} N(\mu, \Sigma), \qquad 1 \leq i \leq n \]
  • Density:

\[ f(x) = \det(2\pi\Sigma)^{-1/2} \exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right) \]
  • MGF:

\[ E[e^{a^TX}] = \exp\left(a^T\mu + \frac{1}{2} a^T\Sigma a \right) \]
  • Assume \(\Sigma>0\) known to start

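As a quick numerical sanity check of the density formula, here is a minimal sketch (the helper `dmvn` and the diagonal \(\Sigma\) are illustrative, not part of MVT); for diagonal \(\Sigma\) the density factors into a product of univariate normal densities:

```r
# Evaluate the N(mu, Sigma) density directly from the formula
dmvn <- function(x, mu, Sigma) {
  d <- x - mu
  exp(-0.5 * sum(d * solve(Sigma, d))) / sqrt(det(2 * pi * Sigma))
}

# For diagonal Sigma the joint density is a product of univariate normals
mu <- c(0, 1)
Sigma <- diag(c(2, 3))
x <- c(0.5, -0.2)
c(dmvn(x, mu, Sigma), prod(dnorm(x, mu, sqrt(diag(Sigma)))))  # equal
```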
Likelihood#

  • The negative log-likelihood, up to additive terms not involving \(\mu\), is

\[\begin{split} \begin{aligned} -\log L(\mu | X) &= \sum_{i=1}^n \frac{1}{2}(X_i-\mu)^T\Sigma^{-1}(X_i-\mu) \\ &= \frac{1}{2} \text{Tr}\left(\Sigma^{-1}(X-1\mu^T)^T(X-1\mu^T)\right) \\ \end{aligned} \end{split}\]
  • The MLE of \(\mu\) does not depend on \(\Sigma\). Why?


  • Enough to solve MLE when \(\Sigma=I\)

\[\begin{split} \begin{aligned} \hat{\mu} &= \text{argmin}_{\mu} \|X-1\mu^T\|_F^2 \\ &= \text{argmin}_{\mu} \sum_{j=1}^p \|X[,j]-\mu_j \cdot 1\|_2^2 \\ \end{aligned} \end{split}\]
  • So,

\[ \hat{\mu} = \bar{X} \sim N\left(\mu, n^{-1}\Sigma\right) \]

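A Monte Carlo sanity check of this sampling distribution (a minimal sketch; the \(2 \times 2\) \(\Sigma\) below is made up):

```r
set.seed(1)
n <- 50; p <- 2
mu <- c(1, -1)
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
R <- chol(Sigma)  # Sigma = t(R) %*% R

# Covariance of the sample mean over many replications should be Sigma / n
xbars <- t(replicate(20000, {
  X <- matrix(rnorm(n * p), n, p) %*% R + matrix(mu, n, p, byrow = TRUE)
  colMeans(X)
}))
round(n * cov(xbars), 2)  # approximately Sigma
```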
Testing \(H_0: \mu=\mu_0\)#

  • Log-likelihood under \(H_0\)

\[ -\frac{1}{2} \text{Tr}\left(\Sigma^{-1}(X-1\mu_0^T)^T(X-1\mu_0^T)\right) \]
  • Log-likelihood under \(H_a\)

\[ -\frac{1}{2} \text{Tr}\left(\Sigma^{-1}\,X^T\left(I-\frac{1}{n}11^T\right)X\right) \]
  • Difference

\[\begin{split} \begin{aligned} \frac{1}{2} \text{Tr}\left(\Sigma^{-1}(X-1\mu_0^T)^T \left[\frac{1}{n}11^T \right](X-1\mu_0^T) \right) &= \frac{n}{2} \text{Tr}\left(\Sigma^{-1}(\hat{\mu}-\mu_0)(\hat{\mu}-\mu_0)^T\right) \\ &= \frac{n}{2} (\hat{\mu}-\mu_0)^T\Sigma^{-1}(\hat{\mu}-\mu_0) \end{aligned} \end{split}\]

Mahalanobis distance#

  • Given \(\Sigma > 0\), define the (squared) Mahalanobis distance

\[ d_{\Sigma}(x,y) = (x-y)^T\Sigma^{-1}(x-y) \]
  • Twice the log-likelihood ratio with \(\Sigma\) known is

\[ d_{n^{-1}\Sigma}(\hat{\mu}, \mu_0) \]
  • Claim: \(Z \sim N(\mu,\Sigma) \implies d_{\Sigma}(Z, \mu) \sim \chi^2_p\)

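The claim is easy to check by simulation (a minimal sketch with a made-up \(3 \times 3\) \(\Sigma\); base R's `mahalanobis()` computes the same quadratic form):

```r
set.seed(2)
p <- 3
A <- matrix(rnorm(p * p), p, p)
Sigma <- crossprod(A) + diag(p)  # a random positive definite matrix
mu <- rnorm(p)
L <- t(chol(Sigma))              # Sigma = L %*% t(L)

# d_Sigma(Z, mu) for Z ~ N(mu, Sigma) should follow chi^2_p
d2 <- replicate(20000, {
  Z <- mu + drop(L %*% rnorm(p))
  d <- Z - mu
  sum(d * solve(Sigma, d))
})
round(c(mean(d2), unname(quantile(d2, 0.95)), qchisq(0.95, df = p)), 2)
```

The mean should be close to \(p\) and the empirical 0.95 quantile close to the \(\chi^2_p\) quantile.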
Unknown \(\Sigma\)#

  • What if we don’t know \(\Sigma\)?

  • Univariate case \(p=1\), MLE is

\[\begin{split} \begin{aligned} \hat{\sigma}^2_{MLE} &= \frac{1}{n} \sum_{i=1}^n (X_i-\bar{X})^2 \\ &= \frac{1}{n} X^T\left(I-\frac{1}{n}11^T\right)X \\ \end{aligned} \end{split}\]
  • Also, we know \(n \cdot \hat{\sigma}^2_{MLE} \sim \sigma^2 \cdot \chi^2_{n-1}\).

  • What is analog for \(p>1\)?

Maximizing likelihood#

  • With \(\Sigma\) unknown, after maximizing over \(\mu\), the profile log-likelihood for \(\Sigma\) is

\[ -\frac{n}{2} \log \det(2\pi\Sigma) - \frac{1}{2} \text{Tr}\left(\Sigma^{-1}\left(X^T\left(I-\frac{1}{n}11^T\right)X\right)\right) \]
  • Introducing the precision matrix \(\Theta=\Sigma^{-1}\), the log-likelihood (dropping constants) is

\[ \frac{n}{2} \log \det(\Theta) - \frac{1}{2} \text{Tr}\left(\Theta\left(X^T\left(I-\frac{1}{n}11^T\right)X\right)\right) \]

  • Differentiate w.r.t. \(\Theta\) using:

\[ \nabla \log \det(\Theta) = \Theta^{-1} \]
  • Yields

\[ (\hat{\Sigma}_{MLE})_{p \times p} = \hat{\Theta}^{-1}_{MLE} = \frac{1}{n} X^T\left(I-\frac{1}{n}11^T\right)X \]
  • A sum of squares matrix

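A small check that this matrix formula agrees with R's `cov` up to the \(n\) vs. \(n-1\) scaling (a sketch on simulated data):

```r
set.seed(3)
n <- 40; p <- 3
X <- matrix(rnorm(n * p), n, p)

C <- diag(n) - matrix(1, n, n) / n          # centering matrix I - (1/n) 11^T
Sigma_mle <- t(X) %*% C %*% X / n           # MLE of Sigma
max(abs(Sigma_mle - (n - 1) / n * cov(X)))  # essentially 0
```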
Wishart distribution#

  • Suppose \(Z_i \overset{IID}{\sim} N(0, \Sigma)\), \(1 \leq i \leq k\), and let \(Z\) be the \(k \times p\) matrix with rows \(Z_i^T\). Then

\[ Z^TZ \overset{\text{def}}{\sim} \textrm{Wishart}(k, \Sigma) \]
  • Claim:

\[ n \cdot \hat{\Sigma}_{MLE} \sim \textrm{Wishart}(n-1, \Sigma) \]
  • Unbiased estimate: (using fact \(E[Z^TZ]=k\Sigma\))

\[ \hat{\Sigma} = \frac{1}{n-1} X^T\left(I - \frac{1}{n}11^T\right)X \]
  • Analogous to univariate case except sums of squares are matrices!

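A Monte Carlo check of the expectation behind the unbiased estimate: averaging \(n \hat{\Sigma}_{MLE}\) over replications should give \((n-1)\Sigma\) (a sketch; the \(\Sigma\) below is made up):

```r
set.seed(4)
n <- 10; p <- 2
Sigma <- matrix(c(1, 0.6, 0.6, 2), 2, 2)
R <- chol(Sigma)

B <- 20000
W <- matrix(0, p, p)
for (b in 1:B) {
  X <- matrix(rnorm(n * p), n, p) %*% R
  # crossprod of the centered data is n * Sigma_hat_MLE
  W <- W + crossprod(scale(X, center = TRUE, scale = FALSE))
}
round(W / (B * (n - 1)), 2)  # approximately Sigma
```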
Estimate of covariance#

Sigma_hat = cov(examScor) # unbiased estimate
Sigma_hat

A matrix: 5 × 5 of type dbl

           mechanics   vectors   algebra  analysis statistics
mechanics   305.7680 127.22257 101.57941 106.27273  117.40491
vectors     127.2226 172.84222  85.15726  94.67294   99.01202
algebra     101.5794  85.15726 112.88597 112.11338  121.87056
analysis    106.2727  94.67294 112.11338 220.38036  155.53553
statistics  117.4049  99.01202 121.87056 155.53553  297.75536

Back to test#

  • W.l.o.g. we take \(\mu_0=0\) below

  • Under \(H_0\), maximized log-likelihood is

\[\begin{split} \begin{aligned} - \frac{n}{2} \log \det \left(\hat{\Sigma}_{0,MLE} \right) &= - \frac{n}{2} \log \det \left(\frac{1}{n}(X-1\mu_0^T)^T(X-1\mu_0^T)\right) \\ &= - \frac{n}{2} \log \det \left(\frac{1}{n}X^TX\right) \\ \end{aligned} \end{split}\]
  • Under \(H_a\), we claim maximized log-likelihood is

\[\begin{split} \begin{aligned} - \frac{n}{2} \log \det \left(\hat{\Sigma}_{MLE} \right) &= - \frac{n}{2} \log \det \left(\frac{1}{n}X^T\left(I-\frac{1}{n}11^T\right)X\right) \\ \end{aligned} \end{split}\]

  • LRT is based on

\[\begin{split} \begin{aligned} n \cdot \log \det \left(X^TX\left(X^T\left(I-\frac{1}{n}11^T\right)X\right)^{-1}\right) &= n \cdot \log \det(I + \hat{\mu}\hat{\mu}^T \hat{\Sigma}_{MLE}^{-1}) \\ &= n \cdot \log(1 + \hat{\mu}^T\hat{\Sigma}_{MLE}^{-1}\hat{\mu}) \end{aligned} \end{split}\]
  • Spring 2025: Initial version had a mistaken \(n\)

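The determinant identity above can be verified numerically (a sketch on simulated data, taking \(\mu_0 = 0\)):

```r
set.seed(5)
n <- 30; p <- 4
X <- matrix(rnorm(n * p), n, p)      # mu_0 = 0 w.l.o.g.
mu_hat <- colMeans(X)
Xc <- scale(X, center = TRUE, scale = FALSE)
Sigma_mle <- crossprod(Xc) / n       # n * Sigma_mle = X^T (I - 11^T/n) X

lhs <- as.numeric(determinant(crossprod(X) %*% solve(n * Sigma_mle))$modulus)
rhs <- log(1 + sum(mu_hat * solve(Sigma_mle, mu_hat)))
c(lhs, rhs)  # equal
```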
Hotelling’s \(T^2\)#

  • Suppose \(W \sim \text{Wishart}(k, \Sigma)\), independent of \(Z \sim N(0, \Sigma)\), with \(\Sigma \in \mathbb{R}^{p \times p}\), \(\Sigma > 0\)

  • The random variable

\[ Z^T(k^{-1}W)^{-1}Z \sim T^2_k \]
  • As \(k \to \infty\), \(T^2_k \overset{D}{\to} \chi^2_p\).

  • In one-sample problem, LRT equivalent to

\[ T^2 = d_{n^{-1}\hat{\Sigma}}(\hat{\mu}, \mu_0) \]

The “right” statistic#

mu_0 = rep(50, 5)
n = nrow(examScor)
T2 = sum((mu_hat - mu_0) * (solve(Sigma_hat / n) %*%
         (mu_hat - mu_0)))
T2

101.957411658211

Recap of one-sample problem#

  • Estimation of the mean structure is independent of \(\Sigma\), leading to coordinate-wise (uncoupled) estimation of \(\mu\).

  • Estimates of \(\Sigma\) involve sum-of-squared error matrix.

  • Wishart distribution is analog of \(\chi^2\).

  • Hotelling’s \(T^2\) is the LRT with \(\Sigma\) unknown. Analogous to Student’s \(t\) in the univariate case.

  • While the multivariate normal model is likely a simplification, using the likelihood provides reasonably intuitive methods and tests.

  • Distribution theory of the LRT can get hairy, though in this case

\[ T^2 \overset{D}{=} C_{n, p} \cdot F_{p, n-p}, \qquad C_{n,p} = \frac{p(n-1)}{n-p} \]
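Putting this together for the exam-scores test: with the standard constant \(C_{n,p} = p(n-1)/(n-p)\), the \(T^2\) value computed above converts to an \(F\) statistic and p-value (a sketch; \(n = 88\), \(p = 5\) for examScor, and the \(T^2\) value is hard-coded from the earlier output):

```r
# Hotelling's T^2 relates to F via T^2 = [p(n-1)/(n-p)] * F_{p, n-p}
n <- 88; p <- 5              # examScor: 88 students, 5 exam scores
T2 <- 101.957411658211       # value computed above
Fstat <- (n - p) / (p * (n - 1)) * T2
pval <- pf(Fstat, p, n - p, lower.tail = FALSE)
c(Fstat, pval)               # strong evidence against H_0
```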