CCA

CCA#

Download#

Canonical correlation analysis#

For a data matrix $X \in \mathbb{R}^{n \times p}$ with $X_i \overset{IID}{\sim} F$, PCA and Factor Analysis can be thought of as models / decompositions of

\[ \Sigma = \text{Var}_F(X) \]

Canonical correlation analysis deals with pairs $\mathbb{R}^p \times \mathbb{R}^q \ni (X_i, Y_i) \overset{IID}{\sim} F$ and models / decomposes the cross-covariance matrix

\[\begin{split} \text{Var}_F(X,Y) = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix}. \end{split}\]

First goal:

\[ \text{maximize}_{(a,b): a^T\Sigma_{XX}a \leq 1, b^T\Sigma_{YY}b \leq 1} a^T\Sigma_{XY}b \]

Example from here
$X$: psychological variables: (Control, Concept, Motivation)
$Y$: academic + sex: (Read, Write, Math, Science, Sex)

#| echo: true
mm = read.csv("https://stats.idre.ucla.edu/stat/data/mmreg.csv")
head(mm)
dim(mm)

A data.frame: 6 × 8
	locus_of_control	self_concept	motivation	read	write	math	science	female
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<int>
1	-0.84	-0.24	1.00	54.8	64.5	44.5	52.6	1
2	-0.38	-0.47	0.67	62.7	43.7	44.7	52.6	1
3	0.89	0.59	0.67	60.6	56.7	70.5	58.0	0
4	0.71	0.28	0.67	62.7	56.7	54.7	58.0	0
5	-0.64	0.03	1.00	41.6	46.3	38.4	36.3	1
6	1.11	0.90	0.33	62.7	64.5	61.4	58.0	1

600
8

Substituting $u=\Sigma_{XX}^{1/2}a, v=\Sigma_{YY}^{1/2}v$ we see that $(u, v)$ solve

\[ \text{maximize}_{(u,v): \|u\|_2 \leq 1, \|v\|_2 \leq 1} u^T\Sigma_{XX}^{-1/2}\Sigma_{XY}\Sigma_{YY}^{-1/2}v \]

So $u=U[,1]$ and $v=V[,1]$ are leading left and right singular vectors of $\Sigma_{XX}^{-1/2}\Sigma_{XY}\Sigma_{YY}^{-1/2}= U \rho V^T$ with $\rho=\text{diag}(\rho_1, \dots,\rho_q)$.
Achieved maximal correlation value is the leading singular value $\rho_1$.

Practically speaking, can find SVD by eigendecomposition of the symmetric matrix

\[ \Sigma_{XX}^{-1/2}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX} \Sigma_{XX}^{-1/2} = U\rho^2 U' \]

Loadings $a=\Sigma_{XX}^{-1/2}u, b=\Sigma_{YY}^{-1/2}v$ determine first canonical variates $(a^TX, b^TY)$ with

\[ \text{Var}_F(a^TX) = \text{Var}_F(b^TY) = 1, \qquad \text{Cov}_F(a^TX, b^TY) = \rho_1.\]

Let’s call $a=a_1, b=b_1$ for first canonical variates.
Why not find more?

Second canonical variate problem:

\[ \text{maximize}_{(a_2,b_2): a_2^T\Sigma_{XX}a_2 \leq 1, b_2^T\Sigma_{YY}b_2 \leq 1} a_2^T\Sigma_{XY}b_2 \]

subject to $\text{Cov}_F(a_1^TX,a_2^TX)=\text{Cov}_F(b_1^TY, b_2^TY)=0$.

Not hard to see that $a_2=\Sigma_{XX}^{-1/2}U[,2], b_2=\Sigma_{YY}^{-1/2}V[,2]$.
Realized correlation is $\rho_2$.
Continuing

\[ A = \Sigma_{XX}^{-1/2}U, \qquad B=\Sigma_{YY}^{-1/2}V. \]

Final form of covariance matrix

\[\begin{split} \text{Cov}_F(A'X, B'Y) = \begin{pmatrix} I & R \\ R^T & I \end{pmatrix} \end{split}\]

Assuming non-singular covariances, when $p > q$

\[\begin{split} R = \begin{pmatrix} \rho \\ 0 \end{pmatrix} \end{split}\]

transposing when $p \leq q$.

Generative model#

Several ways to realize the law… here is one
Pick $\bar{U} \in \mathbb{R}^{p \times p}$ be orthonormal with first $q$ columns equal to $U$.

\[\begin{split} \begin{aligned} \check{Z} & \sim N(0, I_q) \\ \epsilon_X & \sim N(0, I_p) \\ \epsilon_Y & \sim N(0, I_q) \\ Z_Y &= \sqrt{1 - \rho} \cdot \epsilon_Y + \sqrt{\rho} \cdot \check{Z} \\ Z_X[1:p] &= \sqrt{1 - \rho} \cdot \epsilon_X[1:q] + \sqrt{\rho} \cdot \check{Z} \\ Z_X[(q+1):p] &= \epsilon_X[(q+1):p] \\ X &= \Sigma_{XX}^{1/2} \textcolor{red}{\bar{U}} Z_X \\ Y &= \Sigma_{YY}^{1/2} \textcolor{red}{V} Z_Y \end{aligned} \end{split}\]

Sample version#

Simply replace $\Sigma$ with $\widehat{\Sigma}$.
Under non-degeneracy assumptions, matrices $\widehat{\Sigma}_{XX}$ and $\widehat{\Sigma}_{YY}$ will be invertible if $n > \text{max}(q,p)$.
As in PPCA, Bach & Jordan (2006) show that sample estimates are MLE for this generative model
$\implies$ could use likelihood to choose rank…

#| echo: true
library(CCA)
X = mm[,1:3]
Y = mm[,-c(1:3)]
CCA_mm = CCA::cc(X, Y)
names(CCA_mm)

Loading required package: fda

Loading required package: splines

Loading required package: fds

Loading required package: rainbow

Loading required package: MASS

Loading required package: pcaPP

Loading required package: RCurl

Loading required package: deSolve

Attaching package: ‘fda’

The following object is masked from ‘package:graphics’:

    matplot

Loading required package: fields

Loading required package: spam

Spam version 2.11-1 (2025-01-20) is loaded.
Type 'help( Spam)' or 'demo( spam)' for a short introduction 
and overview of this package.
Help for individual functions is also obtained by adding the
suffix '.spam' to the function name, e.g. 'help( chol.spam)'.

Attaching package: ‘spam’

The following objects are masked from ‘package:base’:

    backsolve, forwardsolve

Loading required package: viridisLite

Try help(fields) to get started.

'cor'
'names'
'xcoef'
'ycoef'
'scores'

Checking CCA solution#

#| echo: true
crosscov = cov(mm)[1:3,4:8]
t(CCA_mm$xcoef) %*% crosscov %*% CCA_mm$ycoef
CCA_mm$xcoef

A matrix: 3 × 3 of type dbl
4.640861e-01	5.599087e-17	-1.930228e-16
3.610083e-16	1.675092e-01	-3.725764e-16
4.012970e-16	-5.498621e-16	1.039911e-01

A matrix: 3 × 3 of type dbl
locus_of_control	-1.2538339	-0.6214776	-0.6616896
self_concept	0.3513499	-1.1876866	0.8267210
motivation	-1.2624204	2.0272641	2.0002283

Diagonalizes the $X$ and $Y$ subblocks#

#| echo: true
Xcov = cov(mm)[1:3,1:3]
t(CCA_mm$xcoef) %*% Xcov %*% CCA_mm$xcoef

A matrix: 3 × 3 of type dbl
1.000000e+00	2.478633e-16	2.048775e-16
3.413977e-16	1.000000e+00	-1.345034e-16
2.169002e-16	-1.469862e-16	1.000000e+00

#| echo: true
Ycov = cov(mm)[4:8,4:8]
t(CCA_mm$ycoef) %*% Ycov %*% CCA_mm$ycoef

A matrix: 3 × 3 of type dbl
1.000000e+00	-1.975726e-16	2.677851e-17
-1.686276e-16	1.000000e+00	-4.013920e-15
-1.444754e-17	-4.112697e-15	1.000000e+00

#| echo: true
plot(CCA_mm$scores$xscores[,1], CCA_mm$scores$yscores[,1])

../_images/37cc15677769b6f738e823a2ce75e5741d33f0e73854c0bb6adf776be82af449.png

Invariance under non-singular transformations#

Clearly the CCA decomposition is invariant under change of means of $(X, Y)$.
The matrix

\[ \Sigma_{XX}^{-1/2}\Sigma_{XY} \Sigma_{YY}^{-1/2} \]

is also invariant under non-singular transformations (of $X$ or of $Y$)

$\implies$ CCA is affine equivariant…

#| echo: true
CCA_mm_scaled = CCA::cc(scale(X, TRUE, TRUE), scale(Y, TRUE, TRUE))
cbind(CCA_mm$cor, CCA_mm_scaled$cor)
lm(CCA_mm$scores$xscores[,1] ~ CCA_mm_scaled$scores$xscores[,1])

A matrix: 3 × 2 of type dbl
0.4640861	0.4640861
0.1675092	0.1675092
0.1039911	0.1039911

Call:
lm(formula = CCA_mm$scores$xscores[, 1] ~ CCA_mm_scaled$scores$xscores[, 
    1])

Coefficients:
                      (Intercept)  CCA_mm_scaled$scores$xscores[, 1]  
                         9.15e-16                           1.00e+00  

#| echo: true
A = matrix(rnorm(9), 3, 3)
XA = as.matrix(X) %*% A
CCA_affine = CCA::cc(XA, Y)
cbind(CCA_mm$cor, CCA_affine$cor)
lm(CCA_mm$scores$xscores[,1] ~ CCA_affine$scores$xscores[,1])

A matrix: 3 × 2 of type dbl
0.4640861	0.4640861
0.1675092	0.1675092
0.1039911	0.1039911

Call:
lm(formula = CCA_mm$scores$xscores[, 1] ~ CCA_affine$scores$xscores[, 
    1])

Coefficients:
                   (Intercept)  CCA_affine$scores$xscores[, 1]  
                    -7.066e-15                       1.000e+00  

Relation to reduced rank regression#

For each of $X$ and $Y$, we saw that there are mappings

\[ X \mapsto U^T\Sigma^{-1/2}_{XX}X = A^TX = Z_X, \qquad Y \mapsto V^T\Sigma_{YY}^{-1/2}Y = B^TY = Z_Y \]

to unitless canonical variates.

These canonical variates drive the correlation between $X$ and $Y$…

Reduced rank regression for rank $k \leq \text{min}(p, q)$#

Due to affine equivariance, it’s not hard to see that

\[ \hat{Y} = \Sigma_{YY}^{1/2} V[,1:k] \rho[1:k] Z_X[1:k] \]

Inference: how to test $\rho_j=0, j \geq k$?#

Complicated joint densities…
Not too hard under global null $H_0:\Sigma_{XY}=0$, though these tests will all look like tests for regression $Y$ onto $X$.
Under global null $H_0: X \ \text{ independent of} \ Y$ could permute either $X$ or $Y$…

Generalizations#

For non-Euclidean data we might featurize either $X$, $Y$ or both.
This idea shows up over and over: given features $X$, make new features through a basis expansion (which include the kernel trick).
We see this in two-sample tests (e.g. in the kernelized versions); linear regression; kernel PCA; etc.
Wherever we form random variables like $\beta^TX$, we can replace with $W=f(X)$ …
Often, methods with richer (i.e. more) features will also feature a regularization term …
Which new features? This is domain dependent…

Kernelized CCA#

Given linear spaces of features ${\cal H}_X, {\cal H}_Y$ we might look at

\[ \text{maximize}_{(f, g) \in {\cal H}_X \times {\cal H}_Y: \text{Var}_F(f(X)) \leq 1, \text{Var}_F(g(Y)) \leq 1} \text{Cov}_F(f(X), g(Y)). \]

If ${\cal H}_X$ and ${\cal H}_Y$ are RKHS, this is kernelized CCA.

A possible issue?#

For kernelized CCA $\text{max}(\text{dim}({\cal H}_X), \text{dim}({\cal H}_Y)) > n$ the matrix

\[ \widehat{\text{Cov}}_F(f_i(X), f_j(X)) = \widehat{\Sigma}_{f(X),f(X)} \]

is degenerate so we can’t really form $\widehat{\Sigma}_{f(X),f(X)}^{-1/2}$…

Generic RKHS will always have this problem. How to resolve?

Regularized covariance estimate#

Let’s just assume that ${\cal H}_X$ is finite dimensional of dimension $k > n$ and ${\cal H}_Y$ are just the usual linear functions.
We can then just work on the usual CCA with data matrices $Y$ and $W=f(X) \in \mathbb{R}^{n \times k}$.
A simple remedy is to consider instead

\[ \text{maximize}_{(a,b): a^T(\widehat{\Sigma}_{WW}+\epsilon I)a \leq 1, b^T\widehat{\Sigma}_{YY}b \leq 1} a^T\widehat{Cov}_{WY}b \]

Will require SVD of $(\widehat{\Sigma}_{WW} + \epsilon I)^{-1/2} \widehat{\Sigma}_{WY} \widehat{\Sigma}_{YY}^{-1/2}$.

Kernelized version#

Instead of $\widehat{\text{Var}}(f(X))$ could use

\[ f \mapsto \widehat{\text{Var}}(f(X)) + \epsilon \|f\|^2_{{\cal H}_X}. \]

Even simpler (in theory): subsample the “knots” in the kernel (e.g. Snelson & Ghahramani (2005))
By usual kernel trick, this reduces problem to an $n$ dimensional problem.
In this case, an eigenvalue problem for $n \times n$ matrix…

Other covariance regularization#

(PMD) Witten et al. (pretends $\Sigma_{WW}= \kappa I$)
NOCCO in Fukumizu et al. (uses the regularized covariance operator in RKHS)
Use row/column covariance Allen et al.
Each of these methods boils down to choosing two quadratic forms $Q_X, Q_Y$…

Other structure#

Restricting to finite dimensions, the first regularized problems morally look like (assuming $X$ and $Y$ have been centered)

\[ \text{maximize}_{(a,b): a^TQ_Xa \leq 1, b^TQ_Yb \leq 1} a^TX^TYb \]

It is often tempting to interpret loadings $(a,b)$ just as in PCA.
For such settings, it is often desirable to impose structure on $(a,b)$.

Sparse CCA and variants#

In bound form: $$ \text{maximize}_{(a,b): a^TQ_Xa \leq 1, b^TQ_Yb \leq 1, \|a\|_1 \leq c_X } a^TX^TYb $$
In Lagrange form: $$ \text{maximize}_{(a,b): a^TQ_Xa \leq 1, b^TQ_Yb \leq 1} a^TX^TYb - \lambda_X \|a\|_1 $$

Non-negative factors#

\[ \text{maximize}_{(a,b): a^TQ_Xa \leq 1, b^TQ_Yb \leq 1, a \geq 0, b \geq 0} a^TX^TYb \]

Relation to sparse PCA#

Corresponds to swapping $X^TY$ with $X$ …

A bi-convex problem#

The function $(a,b) \mapsto -a^TX^TYb$ is not convex.
Numeric feasibility of solving PCA rests on computability of SVD even though it is not a convex problem.
However, for $b$ fixed, $a \mapsto -a^TX^TYb$ is linear, hence convex.
Alternating algorithm: given feasibile initial $(\widehat{a}^{(0)}, \widehat{b}^{(0)})$ we iterate (for $\ell_1$ bound form above):

\[\begin{split} \begin{aligned} \widehat{a}^{(t)} &= \text{argmin}_{a: a^TQ_Xa \leq 1, \|a\|_1 \leq c_1} - a^TX^TY\widehat{b}^{(t-1)} \\ \widehat{b}^{(t)} &= \text{argmin}_{b: b^TQ_Yb \leq 1} - (\widehat{a}^{(t)})^TX^TYb \\ \end{aligned} \end{split}\]

Yields an ascent algorithm on objective $(a,b) \mapsto a^TX^TYb$… often with pretty simple updates.
Deflation not completely obvious, but there are reasonable proposals

An example with a pretty picture: CGH#

# install.packages('PMA', repos='http://cloud.r-project.org') 
# might need to install `impute` from BioConductor
library(PMA)
# http://web.stanford.edu/class/stats305c/data/breastdata.rda
load('~/Downloads/breastdata.rda') 
attach(breastdata)
PlotCGH(dna[,1], chrom=chrom, main="Sample 1", nuc=nuc)

../_images/8fe388b4f805c923680c1fcd8c226b614849171b25bb23a68f01dd14c7f6edad.png

An example with a pretty picture: CGH#

Dataset breastdata from PMA has both gene expression data (rna) and CGH measurements (dna) on 89 samples.
Our example (which is in PMA package) will just use CGH from chromosome 1: t(dna) is a $89 \times 136$ matrix and all of t(rna), an $89 \times 19672$ matrix.
CGH data is modeled as piecewise constant with a (sparse) fused LASSO penalty constraint $$ {\cal P}_{FL}(v) = \|v\|_1 + \alpha \|Dv\|_1 \leq c_v $$ while gene expression is sparse $$ {\cal P}_S(u) = \|u\|_1. $$

Optimization problem: $$ \text{maximize}_{(u,v): \|u\|_2\leq 1, {\cal P}_S(u) \leq c_u, \|v\|_2 \leq 1, {\cal P}_{FL}(v) \leq c_v} u^TX^TYv $$
Inevitably, one has to choose tuning parameters…
Authors propose permuting $X$ and rerunning algorithm while retaining the value of the problem. Parameters are chosen to have smallest p-value.

#| echo: true
set.seed(22)
dna_t = t(dna)
rna_t = t(rna)
perm.out = CCA.permute(x=rna_t,
                       ## Run CCA using all gene exp. data, but CGH data on chrom 1 only.
                       z=dna_t[,chrom==1],
                       typex="standard",
                       typez="ordered",
                       nperms=10,
                       penaltyxs=seq(.02,.7,len=10))

Warning message in CCA.permute(x = rna_t, z = dna_t[, chrom == 1], typex = "standard", :
“Since type of z is ordered, the penalty for z was chosen w/o permutations.”

 Permutation  1  out of  10  12345678910
 Permutation  2  out of  10  12345678910
 Permutation  3  out of  10  12345678910
 Permutation  4  out of  10  12345678910
 Permutation  5  out of  10  12345678910
 Permutation  6  out of  10  12345678910
 Permutation  7  out of  10  12345678910
 Permutation  8  out of  10  12345678910
 Permutation  9  out of  10  12345678910
 Permutation  10  out of  10  12345678910

#| echo: true
print(perm.out)

Call: CCA.permute(x = rna_t, z = dna_t[, chrom == 1], typex = "standard", 
    typez = "ordered", penaltyxs = seq(0.02, 0.7, len = 10), 
    nperms = 10)

Type of x:  standard 
Type of z:  ordered 
   X Penalty Z Penalty Z-Stat P-Value  Cors Cors Perm FT(Cors) FT(Cors Perm)
1      0.020     0.032  3.694     0.0 0.825     0.641    1.171         0.762
2      0.096     0.032  1.487     0.0 0.801     0.684    1.101         0.845
3      0.171     0.032  1.882     0.0 0.805     0.649    1.113         0.783
4      0.247     0.032  1.786     0.0 0.782     0.623    1.050         0.739
5      0.322     0.032  1.591     0.0 0.753     0.603    0.980         0.706
6      0.398     0.032  1.396     0.0 0.726     0.587    0.921         0.681
7      0.473     0.032  1.221     0.0 0.701     0.574    0.869         0.661
8      0.549     0.032  1.063     0.0 0.677     0.563    0.823         0.644
9      0.624     0.032  0.922     0.2 0.655     0.553    0.784         0.630
10     0.700     0.032  0.796     0.2 0.635     0.546    0.750         0.619
   # U's Non-Zero # Vs Non-Zero
1              12            90
2             267            72
3            1189            71
4            2639            62
5            4329            62
6            6278            61
7            8399            61
8           10860            61
9           13536            61
10          16252            61
Best L1 bound for x:  0.02
Best lambda for z:  0.03178034

What is the $Z$-score?#

For each permutation, there is a realized value to the optimization problem for each value of the tuning parameter.
Can compute mean and SD over permutation data (for each value of the tuning parameter), yielding a reference distribution.
$Z$-score compares observed realized value to the permutation distribution.
Tuning parameter is chosen to maximize the $Z$-score.

#| echo: true
plot(perm.out)

../_images/acf0b8d468a9f9e980157823ddf74908969562ea3a694a5945e6069164f47736.png

#| echo: true
out = CCA(x=rna_t,
          z=dna_t[,chrom==1], 
          typex="standard", 
          typez="ordered",
          penaltyx=perm.out$bestpenaltyx,
          v=perm.out$v.init, 
          penaltyz=perm.out$bestpenaltyz,
          xnames=substr(genedesc,1,20),
          znames=paste("Pos", sep="", nuc[chrom==1]))
print(out, verbose=FALSE) # could do print(out,verbose=TRUE)

123456789101112131415

Call: CCA(x = rna_t, z = dna_t[, chrom == 1], typex = "standard", typez = "ordered", 
    penaltyx = perm.out$bestpenaltyx, penaltyz = perm.out$bestpenaltyz, 
    v = perm.out$v.init, xnames = substr(genedesc, 1, 20), znames = paste("Pos", 
        sep = "", nuc[chrom == 1]))


Num non-zeros u's:  12 
Num non-zeros v's:  90 
Type of x:  standard 
Type of z:  ordered 
Penalty for x: L1 bound is  0.02 
Penalty for z: Lambda is  0.03178034 
Cor(Xu,Zv):  0.8247056

#| echo: true
print(genechr[out$u!=0]) # which chromosome are the selected genes on
PlotCGH(out$v, 
        nuc=nuc[chrom==1], 
        chrom=chrom[chrom==1],
        main="Regions of gain/loss on Chrom 1 assoc'd with gene expression")

 [1] 1 1 1 1 1 1 1 1 1 1 1 1

../_images/991d3316430573460a0cfd89a3e1c69625b8a4c66f6012e09ce6737c08be88d9.png

Multiple CCA#

We had gene expression and CGH data on the same samples.
Can easily conceive of situations where we have yet another type of phenotype.
Given samples from $X=(X_1, \dots, X_r) \sim F, X_i \in \mathbb{R}^{d_i}$ we can write its covariance as

\[\begin{split} \text{Var}_F(X) = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \dots \\ \Sigma_{21} & \Sigma_{22} & \dots \\ \vdots & \ddots & \dots \end{pmatrix} \end{split}\]

with $\Sigma_{ij} \in \mathbb{R}^{d_i \times d_j}$.

Forming linear combinations $a_i^TX_i, a_j^TX_j$ yields

\[\begin{split} \text{Var}_F(a_1^TX_1,\dots, a_r^TX_r) = \begin{pmatrix} a_1^T\Sigma_{11}a_1 & a_2^T\Sigma_{12}a_2 & \dots \\ a_2^T\Sigma_{21}a_1 & a_2^T\Sigma_{22}a_2 & \dots \\ \vdots & \ddots & \dots \end{pmatrix} \in \mathbb{R}^{r \times r} \end{split}\]

Pick your favorite functional $\Phi$ of the off-diagonal blocks – try to maximize

\[ \Phi\left(\text{Var}_F(a_1^TX_1,\dots, a_r^TX_r)\right) \]

subject to $\text{Var}_F(a_i^TX_i)=1, 1 \leq i \leq r$.

Examples of $\Phi$: sum of off-diagonals; max of off-diagonals…
Covariance can be regularized…
Can impose structure (sparsity, non-negativity, etc.)
Solving the problem? Certainly the sum of off-diagonals is multi-convex…
See Witten and Tibshirani (2009)

Cooperative learning#

The CCA problem is unsupervised (though is related to reduced rank regression…)
In supervised setting, we might have different views (qualitatively different types of data)
- Genetic data
- Different types of biological assay
- Imaging data

Multi-view supervised#

Multi-view objective: cooperative learning #

\[ \text{minimize}_{f_X, f_Z} \frac{1}{2} \mathbb{E} \left[ \left(Y - f_Z(Z) - f_X(Z) \right) + \frac{\rho}{2}\left(f_X(X) - f_Z(Z)\right)^2 \right] \]

Underlying generative model (?)#

$X$, $Z$ as in the CCA generative model with common variation $\check{Z}$
$Y | \check{Z}, X, Z \sim N(\alpha'\check{Z} + \beta'X + \gamma'Z, \sigma^2)$ with $\beta$, $\gamma$ simple
Supervised target shares some of the same latent variables as $X$ and $Z$…

CCA

Contents

CCA#

Download#

Canonical correlation analysis#

Generative model#

Sample version#

Checking CCA solution#

Diagonalizes the \(X\) and \(Y\) subblocks#

Invariance under non-singular transformations#

Relation to reduced rank regression#

Reduced rank regression for rank \(k \leq \text{min}(p, q)\)#

Inference: how to test \(\rho_j=0, j \geq k\)?#

Generalizations#

Kernelized CCA#

A possible issue?#

Regularized covariance estimate#

Kernelized version#

Other covariance regularization#

Other structure#

Sparse CCA and variants#

Non-negative factors#

Relation to sparse PCA#

A bi-convex problem#

An example with a pretty picture: CGH#

An example with a pretty picture: CGH#

What is the \(Z\)-score?#

Multiple CCA#

Cooperative learning#

Multi-view supervised#

Multi-view objective: cooperative learning #

Underlying generative model (?)#

CCA

Contents

CCA#

Download#

Canonical correlation analysis#

Generative model#

Sample version#

Checking CCA solution#

Diagonalizes the \(X\) and \(Y\) subblocks#

Invariance under non-singular transformations#

Relation to reduced rank regression#

Reduced rank regression for rank \(k \leq \text{min}(p, q)\)#

Inference: how to test \(\rho_j=0, j \geq k\)?#

Generalizations#

Kernelized CCA#

A possible issue?#

Regularized covariance estimate#

Kernelized version#

Other covariance regularization#

Other structure#

Sparse CCA and variants#

Non-negative factors#

Relation to sparse PCA#

A bi-convex problem#

An example with a pretty picture: CGH#

An example with a pretty picture: CGH#

What is the \(Z\)-score?#

Multiple CCA#

Cooperative learning#

Multi-view supervised#

Multi-view objective: cooperative learning#

Underlying generative model (?)#

Multi-view objective: cooperative learning #