Factor analysis#



  • Our PPCA model

\[ \Sigma = \sigma^2 I + V V^T \]
  • More generally, we could assume

\[ \Sigma = \Psi + \Gamma \Gamma^T, \qquad \Gamma \in \mathbb{R}^{p \times k} \]

with \(\Psi\) a diagonal matrix of nonnegative "uniquenesses". (The book uses \(\Lambda\) for my \(\Gamma\).)

Probabilistic PCA vs Factor Analysis#

  • Suppose \(A=\text{diag}(a_1, \dots, a_p)\); then

\[ AX \sim N\left(A\mu, A^2 \Psi + A \Gamma \Gamma^T A^T\right) \]

  • \(\implies\) a factor model remains a factor model

  • \(\implies\) not true for PPCA!

  • If \(R\) is an orthogonal rotation then

\[ RX \sim N\left(R\mu, RVV'R' + \sigma^2 I\right) \]
  • \(\implies\) PPCA model is preserved but factor model is not…
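The diagonal-scaling claim is easy to verify numerically. A minimal sketch in R (the matrices below are arbitrary illustrations, not from any dataset in these notes):

```r
set.seed(1)
p = 5; k = 2
Gamma = matrix(rnorm(p * k), p, k)    # p x k loadings
Psi = diag(runif(p, 0.5, 1))          # diagonal uniquenesses
Sigma = Psi + Gamma %*% t(Gamma)      # factor-model covariance
A = diag(runif(p, 0.5, 2))            # diagonal rescaling

# A Sigma A is again diagonal + rank k, with new loadings A Gamma:
resid = A %*% Sigma %*% A - (A %*% Gamma) %*% t(A %*% Gamma)
max(abs(resid - diag(diag(resid))))   # off-diagonal part vanishes
```

The residual is exactly \(A\Psi A\), which is diagonal; no analogous decomposition with a spherical noise term survives the rescaling, which is the PPCA failure noted above.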

Likelihood of factor analysis model#

  • In the normal model there is a well-defined likelihood (with \(R_c\) we have profiled out \(\mu\)):

\[ -\ell(\Gamma,\Psi|X) = \frac{n}{2} \log \det(2\pi(\Psi + \Gamma \Gamma^T)) + \frac{1}{2} \text{Tr}((\Psi + \Gamma \Gamma^T)^{-1}X^TR_cX) \]
  • This negative log-likelihood is not amenable to simple algorithms for minimization over \((\Psi,\Gamma)\).

  • Still, having a log-likelihood opens up AIC / BIC / CV for choosing rank \(k\).

Generative model#

  • Similar to Probabilistic PCA:

\[\begin{split} \begin{aligned} Z & \sim N(0, I_{k \times k}) \\ X|Z &\sim N(\Gamma Z, \Psi) \end{aligned} \end{split}\]
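As a sanity check, sampling from this generative model reproduces the marginal covariance \(\Psi + \Gamma\Gamma^T\); a sketch with arbitrary sizes:

```r
set.seed(2)
n = 1e5; p = 4; k = 2
Gamma = matrix(rnorm(p * k), p, k)
psi = runif(p, 0.5, 1)

Z = matrix(rnorm(n * k), n, k)                      # Z ~ N(0, I)
E = matrix(rnorm(n * p), n, p) %*% diag(sqrt(psi))  # noise ~ N(0, Psi)
X = Z %*% t(Gamma) + E                              # X | Z ~ N(Gamma Z, Psi)

max(abs(cov(X) - (Gamma %*% t(Gamma) + diag(psi)))) # small, O(1/sqrt(n))
```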

Estimation of factor analysis model#

  • In the population setting, note that if \(\Psi\) is known

\[ \Sigma - \Psi = \Gamma \Gamma^T = V \Lambda V^T \]
  • Hence, we might take \(\Gamma = V[,1:k]\, \Lambda[1:k]^{1/2}\), based on the leading rank-\(k\) eigendecomposition of \(\Sigma-\Psi\).

Principal factor analysis for rank \(k\)#

  • Start with initial estimate \(\widehat{\Psi}^{(1)}\) and covariance (or correlation) matrix \(\widehat{\Sigma}\).

  • For \(i=1,\dots\), run rank \(k\) eigendecomposition on \(\widehat{\Sigma}-\widehat{\Psi}^{(i)}\) yielding

\[ \widehat{\Gamma}^{(i)} = V^{(i)}[,1:k]\, D^{(i)}[1:k]^{1/2} \]
  • Set

\[ \widehat{\Psi}^{(i+1)} = \text{diag}\left(\widehat{\Sigma} - \widehat{\Gamma}^{(i)} \widehat{\Gamma}^{(i)\, T} \right) \]
  • Repeat the eigendecomposition and diagonal-update steps until \(\widehat{\Psi}^{(i)}\) converges.
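The iteration above is only a few lines of R. A minimal sketch (the helper principal_factor and its initialization are illustrative, not from any package):

```r
# Principal factor analysis: alternate a rank-k eigendecomposition of
# Sigma - Psi with a diagonal update of Psi.
principal_factor = function(Sigma, k, n_iter = 100) {
  Psi = diag(diag(Sigma))                          # initial uniquenesses
  for (i in 1:n_iter) {
    eig = eigen(Sigma - Psi, symmetric = TRUE)
    Gamma = eig$vectors[, 1:k, drop = FALSE] %*%
      diag(sqrt(pmax(eig$values[1:k], 0)), k)      # Gamma^(i)
    Psi = diag(diag(Sigma - Gamma %*% t(Gamma)))   # Psi^(i+1)
  }
  list(loadings = Gamma, uniquenesses = diag(Psi))
}
```

For instance, principal_factor(cor(flow), 2) would give a non-MLE alternative to the factanal fit shown below.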

Factor analysis for flow data (MLE)#

flow = read.table('http://www.stanford.edu/class/stats305c/data/sachs.data.txt', sep=' ', header=FALSE)
FA_flow = factanal(flow, 2, scores='regression')
names(FA_flow)
FA_flow
  1. 'converged'
  2. 'loadings'
  3. 'uniquenesses'
  4. 'correlation'
  5. 'criteria'
  6. 'factors'
  7. 'dof'
  8. 'method'
  9. 'rotmat'
  10. 'scores'
  11. 'STATISTIC'
  12. 'PVAL'
  13. 'n.obs'
  14. 'call'
Call:
factanal(x = flow, factors = 2, scores = "regression")

Uniquenesses:
   V1    V2    V3    V4    V5    V6    V7    V8    V9   V10   V11 
0.014 0.005 0.816 0.841 0.998 0.979 0.786 0.955 0.038 0.046 0.315 

Loadings:
    Factor1 Factor2
V1   0.121   0.986 
V2   0.148   0.987 
V3   0.371   0.215 
V4   0.352   0.189 
V5                 
V6   0.145         
V7   0.389   0.249 
V8  -0.164  -0.136 
V9   0.978         
V10  0.971   0.102 
V11  0.822         

               Factor1 Factor2
SS loadings      3.072   2.133
Proportion Var   0.279   0.194
Cumulative Var   0.279   0.473

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 22471.07 on 34 degrees of freedom.
The p-value is 0 

Compare to PCA (1st loading)#

pca_flow = prcomp(flow, center=TRUE, scale.=TRUE)
plot(FA_flow$loadings[,1], pca_flow$rotation[,1], ylab='PCA', xlab='Factor Analysis', pch=22, bg='red')
![](../_images/ae1efbb962018c33121c4f607c759b68b7218310bcc3ba86c3f902691b83b27f.png)

Compare to PCA (2nd loading)#

plot(FA_flow$loadings[,2], pca_flow$rotation[,2], ylab='PCA', xlab='Factor Analysis', pch=22, bg='red')
![](../_images/adf7c5ef0bccd9d10605277af6735a46e715b8e44685664f1abd1d0305cec306.png)

Latent scores#

  • Scores \(Z\) are such that

\[ X|Z \sim N(\Gamma Z, \Psi) \]

suggesting that, given an estimate \((\Gamma, \Psi)\) we can recover \(Z\) from \(X\) by regression (Bartlett’s scores)

\[ \widehat{Z} = (\Gamma^T\Psi^{-1}\Gamma)^{-1} \Gamma^T\Psi^{-1}X \]
  • Alternatively (Thomson’s scores)

\[ E[Z|X] = (I+\Gamma^T\Psi^{-1}\Gamma)^{-1}\Gamma^T\Psi^{-1}X \]
  • Thomson's scores have lower MSE; Bartlett's are unbiased.
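Both formulas are easy to apply directly to factanal output; a sketch (the helper fa_scores is illustrative; since factanal works on the correlation scale, the data are standardized first):

```r
# Bartlett (GLS) and Thomson (posterior mean) scores from loadings Gamma
# and uniquenesses psi; rows of X are observations.
fa_scores = function(X, Gamma, psi) {
  Xs = scale(X)                 # factanal fits the correlation matrix
  W = diag(1 / psi)             # Psi^{-1}
  M = t(Gamma) %*% W %*% Gamma  # Gamma' Psi^{-1} Gamma
  list(bartlett = Xs %*% W %*% Gamma %*% solve(M),
       thomson  = Xs %*% W %*% Gamma %*% solve(diag(ncol(Gamma)) + M))
}
```

Calling fa_scores(flow, FA_flow$loadings, FA_flow$uniquenesses) should approximately reproduce the scores returned by factanal (which uses the sample correlation matrix for its regression scores, so small differences are expected).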

Comparing scores#

par(mfrow=c(2, 1))
plot(FA_flow$scores[,1], pca_flow$x[,1], pch=22, bg='red', xlab='Factor 1', ylab='PC 1')
plot(FA_flow$scores[,2], pca_flow$x[,2], pch=22, bg='red', xlab='Factor 2', ylab='PC 2')
![](../_images/dc8a1c1308101fb9ef920150f4077a0b19a5de300b1ab7eea66da136361fbac3.png)

Comparing Thomson vs. Bartlett scores#

FA_flow2 = factanal(flow, 2, scores='Bartlett')
plot(FA_flow$scores[,1], FA_flow2$scores[,1], bg='red', pch=22, xlab='Thomson (regression)', ylab='Bartlett')
![](../_images/af724996afc7e54dbd5416a48f746451a6496d0bd21b90d0a7560bc613161f14.png)

Comparing Thomson vs. Bartlett scores#

plot(FA_flow$scores[,2], FA_flow2$scores[,2], bg='red', pch=22, xlab='Thomson (regression)', ylab='Bartlett')
![](../_images/63cde00ba4479778d053e2b61f1710eb2d1a4135467ad0f5b52b846067fc881c.png)

Olive oil data#

#| echo: true
library(pdfCluster)
data(oliveoil)
OO = oliveoil[,3:10]
FA_OO = factanal(OO, 2, scores='Bartlett')
FA_OO2 = factanal(OO, 2, scores='regression')
pdfCluster 1.0-4

plot(FA_OO$scores[,1], FA_OO2$scores[,1], bg='red', pch=22)
![](../_images/023ae420e1e4628260a160223ffd69daad109b8ffc55c7dc17ba244be71ac616.png)

How many factors can we fit?#

  • Remaining degrees of freedom after we fit for \(k\) factors is

\[ \frac{1}{2}(p-k)^2 - \frac{1}{2}(p+k) \]
  • MLE can still be challenging for factanal

#| echo: true
k = 3; p = 11
print(0.5 * ((p - k)^2 - (p + k)))
tryCatch({
  factanal_output <- factanal(flow, k)
}, error = function(e) {
   paste("Factanal failed with k =", k, ". Error:", e$message)
})
[1] 25
'Factanal failed with k = 3 . Error: unable to optimize from this starting value'

Latent variables and EM#

  • Likelihood is somewhat hard to maximize having observed only \(X\).

  • BUT, if we observed \(Z\), estimation is easy…

  • Estimation of \(\Gamma\) would be a multivariate regression of \(X\) onto \(Z\).

  • Estimation of \(\Psi\) would be easy: take diagonal part of usual MLE of \(\widehat{\Sigma}\) in the above regression.

  • Suggests EM algorithm is a nice fit for factor analysis…

EM for factor analysis#

Complete (negative) log-likelihood#

\[ \frac{n}{2} \log \det \Psi + \frac{1}{2} \text{Tr}\left(\Psi^{-1} (X-Z\Gamma^T)^T(X-Z\Gamma^T)\right) + \frac{1}{2} \|Z\|^2_F \]
  • Note that \(Z|X \sim N\left(\Gamma^T(\Gamma\Gamma^T + \Psi)^{-1} X, I - \Gamma^T(\Gamma\Gamma^T + \Psi)^{-1}\Gamma\right)\)

Expected value given \(X\) at \((\Gamma_{\text{cur}}, \Psi_{\text{cur}}) = (\Gamma_c, \Psi_c)\) (ignoring irrelevant terms)#

\[ \frac{n}{2} \log \det \Psi + \frac{1}{2}\text{Tr}\left(\Psi^{-1}\left(X^TX - 2 X^T\mathbb{E}_{\Gamma_c,\Psi_c}[Z|X]\Gamma^T + \Gamma\, \mathbb{E}_{\Gamma_c,\Psi_c}[Z^TZ|X]\,\Gamma^T \right)\right) \]
  • Estimation of the new \((\Gamma_{\text{new}}, \Psi_{\text{new}})\): a multi-response regression with covariance \(I_{n \times n} \otimes \Psi\)
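Putting the E- and M-steps together gives a compact iteration on the sample covariance. A minimal sketch in R (the function fa_em is illustrative; it works with \(S = X^TX/n\) for centered \(X\), and the initialization is arbitrary):

```r
# EM for the factor model (Rubin-Thayer style).
# E-step: posterior moments of Z given X at the current (Gamma, psi).
# M-step: regression of X on E[Z|X], then a diagonal residual variance.
fa_em = function(S, k, n_iter = 200) {
  p = ncol(S)
  Gamma = diag(p)[, 1:k, drop = FALSE]  # crude initial loadings
  psi = diag(S) / 2                     # initial uniquenesses
  for (it in 1:n_iter) {
    B = t(Gamma) %*% solve(Gamma %*% t(Gamma) + diag(psi))  # E[Z|X] = X B'
    Ezz = diag(k) - B %*% Gamma + B %*% S %*% t(B)          # E[Z'Z|X] / n
    Gamma = S %*% t(B) %*% solve(Ezz)                       # M-step for Gamma
    psi = diag(S - Gamma %*% B %*% S)                       # M-step for Psi
  }
  list(loadings = Gamma, uniquenesses = psi)
}
```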

A convex version of factor analysis?#

  • In the factor analysis model \(\Sigma^{-1}\) also has a particular structure.

  • For argument's sake, assume \(\Psi=I\) and \(\Gamma^T\Gamma=\kappa I\). Then, by the Woodbury identity,

\[ \Sigma^{-1} = (I + \Gamma\Gamma^T)^{-1} = I - \Gamma(I_k + \Gamma^T\Gamma)^{-1}\Gamma^T = I - \frac{1}{\kappa + 1} \Gamma \Gamma^T \]

  • This suggests parametrizing \(\Theta = \Sigma^{-1} = \Phi - \Omega\), with \(\Phi\) diagonal and \(\Omega \geq 0\) of rank \(k\):

\[ -\ell(\Phi,\Omega|X) = - \frac{n}{2} \log \det(\Phi - \Omega) + \frac{1}{2}\text{Tr}(X^TR_cX(\Phi - \Omega)) \]

subject to \(\Phi\) being diagonal and \(\Omega\) being of rank \(k\) (implicitly \(\Phi - \Omega > 0\)).

Convex relaxation#

  • The rank constraint is not convex, but the usual convex surrogate for rank is the nuclear norm

\[ \text{minimize}_{(\Phi, \Omega)} \; - \frac{n}{2} \log \det(\Phi - \Omega) + \frac{1}{2}\text{Tr}(X^TR_cX(\Phi - \Omega)) + \lambda \|\Omega\|_* \]
  • This is convex once we recall that, to be a valid inverse covariance, we must have \(\Phi-\Omega \geq 0\)

  • Introducing this (convex) constraint yields a convex version of factor analysis

\[ \text{minimize}_{(\Phi, \Omega):\, \Phi-\Omega \geq 0,\, \Omega \geq 0} \; - \frac{n}{2} \log \det(\Phi - \Omega) + \frac{1}{2}\text{Tr}(X^TR_cX(\Phi - \Omega)) + \lambda \|\Omega\|_* \]

subject to \(\Phi\) being diagonal.

  • A similar model was considered by Chandrasekaran et al., relaxing the diagonal \(\Phi\) to a sparse \(\Phi\): a combination of factor analysis and the graphical LASSO.

Fitting with sklearn#

CV curve#