
Many applied settings use PCA simply to visualize the data.
For this, typically only the first few PCs are used. No need to really pick a rank…
Suppose we believe \(X \sim N(0, I \otimes \Sigma)\).
If we think \(\Sigma = \sigma^2 I + VV^T\) then we might want to decide if \(V=0\) or not… what rank is \(V\)?
Note: \(\text{rank}(V)\) is not the rank of \(\Sigma\)…
The matrix \(X\) may show up later in downstream analyses using its PCs for regression. \(\implies\) can use CV to pick ranks. number of PCs.
We may want to fit a generative model to \(X\) itself: probabilistic PCA of Bishop & Tittering (1999)
\[ \hat{\mu} = \sum_i 1_{d_i > c} d_i \cdot u_i v_i^T \]
Suppose \(X_{n \times p} \sim N(0, I_{n \times n} \otimes \Sigma)\)
Often (e.g. as presented in MKB) assumes \(n \to \infty\) and \(p\) essentially fixed.
Allows (with extra conditions) for MLE analysis of eigendecomposition of \(\Sigma=U\Lambda U^T\)
Notes there is ambiguity when there are ties in \(\Lambda\): variance of MLE of \(U[,i]\) depends on gaps \(\lambda_{i-1}-\lambda_i\) and \(\lambda_i - \lambda_{i+1}\)…
Could it be used for testing \(H_0:\Sigma=\sigma^2 \cdot I\) vs. \(H_a:\Sigma=\sigma^2 \cdot I + vv^T\) for \(v \in \mathbb{R}^p\)? (wlog \(\sigma^2=1\) going forward)
We know LRT here would be based on \(\log \det(\hat{\Sigma})\).
As all gaps are 0 under the null, at face value this is a context where much of \(\Sigma\) is poorly estimable…
\[ d_1 = \max_{u \in \mathbb{R}^p} \frac{1}{n} u'X'Xu \]
Set \(p/n=\gamma\) and consider the eigenvalues of \(X \sim N(0, I_{n \times n} \otimes I_{p \times p})\).
The population eigenvalues of \(\mathbb{E}[\Sigma]\) are all (essentially) 1…
What about the sample eigenvalues?
\[ \frac{1}{p} \sum_{j=1}^p \delta_{\lambda_i(W)/n} \]
\[ \frac{d_1/n - \mu_{n,p}}{\sigma_{n,p}} \Rightarrow TW_1 \]
\[ \begin{aligned} \mu_{n,p} &= \frac{\left(\sqrt{n-1} + \sqrt{p}\right)^2}{n} \\ & \approx (1 + \sqrt{\gamma})^2 \\ \sigma_{n,p} &= \frac{\left(\sqrt{n-1} + \sqrt{p}\right)}{n} \left(\frac{1}{\sqrt{n-1}} + \frac{1}{\sqrt{p}}\right)^{1/3} \\ &\approx n^{-2/3} (1 + \sqrt{\gamma})\left(1 + \frac{1}{\sqrt{\gamma}} \right)^{1/3}. \end{aligned} \]
library(RMTstat)
X = matrix(rnorm(500), 50, 10)
eigvals_hat = svd(X)$d^2 / (nrow(X) - 1)
P_c = diag(rep(1, 50)) - matrix(1, 50, 50) / 50
Sigma_hat = t(X) %*% P_c %*% X / (nrow(X) - 1)
noise_level = sum(diag(Sigma_hat)) / ncol(X)
pWishartMax(eigvals_hat[1], nrow(X) - 1, ncol(X), var=noise_level, lower.tail=FALSE)[1] 0.1194141
Given a test statistic for the largest eigenvalue, there’s a natural “forward stepwise” procedure: starting at \(k=0\)
Compare \(d_{k+1}\) to appropriate quantile (\(1-\alpha\) ?) of Tracy-Widom (\(TW_1\)).
If larger, then \(k += 1\) and return to 1. Else stop.
Basis of pseudorank of Krichtman and Nadler (2008)
A little more refined (and involved) model Choi et al. (2017)
\[ \hat{\mu} = \sum_{i:d_i > c} d_i \cdot u_i v_i^T \]
\[ \|\hat{\mu} - \mu\|^2_F \]
In testing context, BPP phenomenon says only large enough (properly scaled) eigenvalues will surpass the edge \((1 + \sqrt{\gamma})^2\)
Suggests hard thresholding eigenvalues at \((1 + \sqrt{\gamma})^2\): a formal elbow rule….
Gavish and Donoho (2013) show that this choice is not asymptotically minimax optimal one in terms of MSE even if it can be used to properly determine the rank of \(V\) in \(H_a:I + VV'\)…
This work has been extended beyond standard Wishart in ScreeNOT (2022)
Rough approach: given the limiting spectrum of “noise”, they can derive an asymptotically optimal thresholding function.



We talked about models of the form \(I + VV'\) as alternatives in testing \(H_0:\Sigma=I\).
This model is essentially Probabilistic PCA: \(q < p\)
\[ \begin{aligned} Z & \sim N(0, I_{q \times q}) \\ X|Z &\sim N(VZ, \sigma^2 I_{p \times p}) \end{aligned} \]
It models a low-rank structure within the covariance, not as the the covariance.
It gives us a likelihood, this allows:
AIC / BIC / CV for rank selection
Mixtures, Bayesian modeling etc (at least in theory)
Latent scores \(Z|X=x\)
Let \(\hat{\Sigma}_{MLE} = U D U'\).
Then, estimate \(\sigma^2\) as
\[ \hat{\sigma}^2 = \frac{1}{p-k} \sum_{i=k+1}^p d_i^2 \]
\[ \hat{V} = U_k \text{diag}(d_k-\hat{\sigma}^2)^{1/2} R \]
where \(R_{k \times k}\) is an arbitrary rotation.
\[ \frac{np}{2} \log \sigma^2 + \sum_{i=1}^n \left[\frac{1}{2\sigma^2} \|X_i - VZ_i\|^2_2 + \frac{1}{2} \|Z_i\|^2_2 \right] \]
\[ \frac{np}{2} \log \sigma^2 + \sum_{i=1}^n \frac{1}{2 \sigma^2} \left[\|X_i\|^2_2 - 2 \text{Tr}(V\mathbb{E}_{V_c, \sigma^2_c}[Z_i|X_i]X_i') + \text{Tr}(V'V\mathbb{E}_{V_c,\sigma^2_c}[Z_iZ_i'|X_i]) \right] \]
[1] 0.4187987