This notebook demonstrates the knockoff construction for the heavy-tailed Markov chain where each variable has t-distributed tails. The model is \[X_1=\sqrt{\frac{\nu-2}{\nu}}Z_1, \quad X_{j+1}=\rho_j X_j + \sqrt{1-\rho_j^2}\sqrt{\frac{\nu-2}{\nu}}Z_{j+1}, \quad Z_j\stackrel{i.i.d.}{\sim} t_\nu\] for \(j=1,\dots,p-1\), and we will take \(\nu = 5\). The factor \(\sqrt{(\nu-2)/\nu}\) standardizes the \(t_\nu\) innovations, so each \(X_j\) has mean zero and unit variance. Section 5.2.1 of the accompanying paper presents simulation results in this setting.
We demonstrate the multiple-try Metropolis (MTM) technique (Section 3.3 of the paper) with the recommended settings.
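To fix ideas, here is a minimal sketch of a generic multiple-try Metropolis step (in the style of Liu, Liang, and Wong, 2000) for a one-dimensional target with symmetric Gaussian proposals. This is purely illustrative: the knockoff-specific MTM variant, which proposes from a fixed grid, is implemented in t_MTM_core.R, and the function and argument names below are hypothetical.
# generic multiple-try Metropolis step (illustration only, hypothetical names)
mtm.step = function(x, target, m = 8, stepsize = 1.5) {
    y = rnorm(m, mean = x, sd = stepsize)  # draw m candidates around x
    w = target(y)  # weight each candidate by the target density
    y.sel = y[sample.int(m, 1, prob = w)]  # select one candidate proportionally
    x.ref = c(rnorm(m - 1, mean = y.sel, sd = stepsize), x)  # reference set
    if (runif(1) < sum(w)/sum(target(x.ref))) y.sel else x  # MTM accept/reject
}
# e.g., a chain targeting a t_5 density:
# x = 0; for (it in 1:1000) x = mtm.step(x, function(z) dt(z, df = 5))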
# simulation parameters
df.t = 5 # degrees of freedom of the t-distribution
p = 50 # dimension of the random vector X
n_samples = 300 # number of repetitions
rhos = rep(0.6, p - 1) # correlations in the generative distribution
# algorithm parameters
halfnumtry = 4 # half the number of proposal candidates (the grid has 2 * halfnumtry + 1 points)
stepsize = 1.5 # step size, in units of 1/sqrt((Sigma^-1)_jj)
# configure parallelism
library(foreach)
library(doMC)
n_cores = 3
registerDoMC(cores = n_cores)
# additional packages
library(latex2exp)
library(ggplot2)
source("../heavy-tailed-t/t_MTM_core.R") #load the knockoff functions
First, we compute the stepsize of the grid of proposals for each variable. The recommended stepsize for variable \(j\) (Section 3.3 of the paper) is \[ 1.5 / \sqrt{(\Sigma^{-1})_{jj}}. \]
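For this chain the quantity \((\Sigma^{-1})_{jj}\) has a closed form. Since the correlations satisfy \(\mathrm{cor}(X_i, X_k) = \prod_{l=i}^{k-1}\rho_l\) for \(i < k\), the reciprocal \(1/(\Sigma^{-1})_{jj}\) equals the residual variance of the best linear predictor of \(X_j\) from the other coordinates, which depends only on the neighbors: \[ \frac{1}{(\Sigma^{-1})_{jj}} = \frac{(1-\rho_{j-1}^2)(1-\rho_j^2)}{1-\rho_{j-1}^2\rho_j^2} \] for \(1 < j < p\), with \(1-\rho_1^2\) and \(1-\rho_{p-1}^2\) at the two endpoints. The code below implements exactly this formula.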
# compute the proposal grid at each step using an exact analytic calculation
quantile.x = matrix(0, ncol = 2 * halfnumtry + 1, nrow = p)
sds = rep(0, p)  # sds[i] = conditional sd of X_i given its neighbors
sds[1] = sqrt(1 - rhos[1]^2)
for (i in 2:(p - 1)) {
    sds[i] = sqrt((1 - rhos[i - 1]^2) * (1 - rhos[i]^2)/(1 - rhos[i - 1]^2 * rhos[i]^2))
}
sds[p] = sqrt(1 - rhos[p - 1]^2)
for (i in 1:p) {
    # symmetric grid of 2 * halfnumtry + 1 offsets around the current value
    quantile.x[i, ] = ((-halfnumtry):halfnumtry) * sds[i] * stepsize
}
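As a quick sanity check (not part of the original notebook), we can compare the analytic values against a direct matrix computation: the chain's covariance matrix has entries \(\Sigma_{ik} = \prod_{l=i}^{k-1}\rho_l\), so sds should equal \(1/\sqrt{(\Sigma^{-1})_{jj}}\) entrywise.
# sanity check: analytic conditional sds vs. direct computation from Sigma
Sigma = diag(p)
for (i in 1:(p - 1)) {
    for (k in (i + 1):p) {
        Sigma[i, k] = Sigma[k, i] = prod(rhos[i:(k - 1)])
    }
}
max(abs(sds - 1/sqrt(diag(solve(Sigma)))))  # should be numerically zero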
Then, we sample the Markov chain and construct its knockoffs with the SCEP.MH.MC function.
set.seed(100)
temp = matrix(0, ncol = 2 * p, nrow = 1)
start = Sys.time()
result_mtm = foreach(i = 1:n_samples, .combine = "rbind") %dopar% {
    # sample one observation from the Markov chain
    temp[1] = rt(1, df.t) * sqrt((df.t - 2)/df.t)
    for (j in 2:p) {
        temp[j] = rt(1, df.t) * sqrt((df.t - 2)/df.t) * sqrt(1 - rhos[j - 1]^2) +
            rhos[j - 1] * temp[j - 1]
    }
    # generate the knockoffs
    temp[(p + 1):(2 * p)] = SCEP.MH.MC(temp[1:p], 0.999, quantile.x)
    temp
}
end = Sys.time()
print(paste0("Average time per observation + knockoff (seconds): ",
    round((end - start) * n_cores/n_samples, digits = 3)))
[1] "Average time per observation + knockoff (seconds): 0.017"
X = result_mtm[, 1:p]
Xk = result_mtm[, (p + 1):(2 * p)]
dim(result_mtm)
[1] 300 100
We can compute the mean correlation between \(X_j\) and \(\tilde{X}_j\) to measure the knockoff quality:
cors = sapply(1:p, function(j) cor(X[, j], Xk[, j]))  # per-coordinate correlations
mean(cors)
[1] 0.5798123
We can also plot the correlation between \(X_j\) and \(\tilde{X}_j\) across different coordinates:
# plot the correlation versus the variable index
qplot(1:p, cors) + theme_bw() + xlab("variable index") + ylab(TeX("cor($X_j$, $\\tilde{X}_j$)")) +
scale_y_continuous(limits = c(0, 1))
We see that the first and last knockoffs have higher quality, since these variables each depend on only one other variable.
As a basic diagnostic, we check that each coordinate has mean zero and that the empirical covariance matrices of \(X\) and \(\tilde{X}\) are close. As the number of samples increases, the empirical covariances of \(X\) and \(\tilde{X}\) will converge to the same population covariance.
max(abs(colMeans(X))) # largest absolute coordinate mean of X
[1] 0.1344224
max(abs(colMeans(Xk))) # largest absolute coordinate mean of Xk
[1] 0.1147447
S = cor(X) #correlation matrix of X
Sk = cor(Xk) #correlation matrix of Xk
max(abs(S - Sk)) # largest entrywise difference between the two correlation matrices
[1] 0.2481599
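By exchangeability, the population correlation between \(X_j\) and \(\tilde{X}_k\) for \(j \neq k\) equals that between \(X_j\) and \(X_k\), so as an additional diagnostic (not in the original notebook) we can check that the off-diagonal entries of the cross-correlation matrix match those of \(S\):
C = cor(X, Xk)  # cross-correlation matrix between X and the knockoffs
max(abs((C - S)[row(C) != col(C)]))  # off-diagonal entries should be small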
We can visualize the difference between the empirical correlation matrices of \(X\) and \(\tilde{X}\) with a heatmap.
# visualize the difference in the empirical covariance
library(ggcorrplot)
ggcorrplot(cor(X) - cor(Xk), ggtheme = ggplot2::theme_bw, legend.title = "Corr diff",
title = "Difference in correlation structure") + scale_y_reverse()
We see that the differences between the empirical correlation matrix of \(X\) and the empirical correlation matrix of \(\tilde{X}\) are small, as expected, since \(X\) and \(\tilde{X}\) have the same distribution.
We can also generate knockoffs with the covariance-guided proposals (Section 3.2 of the paper) using the SCEP.MH.MC.COV function. These proposals are motivated by a Gaussian approximation (indeed, they are optimal for Gaussian distributions), but they produce exact knockoffs because of the Metropolis–Hastings correction. The sampler takes precomputed proposal parameters (prop.mat, cond.means.coeff, and cond.vars), which we assume have been constructed from the chain's covariance matrix (e.g., by the sourced t_MTM_core.R script).
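As a rough illustration of where such parameters could come from, here is a minimal sketch under the Gaussian approximation, reusing the Sigma computed in the sanity check above. Every detail below (the choice of s, the joint covariance G) is our assumption rather than the actual construction, and prop.mat is not sketched.
# hypothetical sketch: Gaussian conditional parameters for the knockoff
# proposals; the actual construction in t_MTM_core.R may differ
s = rep(min(1, 2 * min(eigen(Sigma)$values)), p)  # equicorrelated s (assumption)
G = rbind(cbind(Sigma, Sigma - diag(s)),
    cbind(Sigma - diag(s), Sigma))  # joint Gaussian covariance of (X, Xk)
cond.means.coeff = vector("list", p)
cond.vars = numeric(p)
for (j in 1:p) {
    idx = 1:(p + j - 1)  # condition on X and the knockoffs sampled so far
    coeff = solve(G[idx, idx], G[idx, p + j])
    cond.means.coeff[[j]] = coeff  # linear coefficients for the conditional mean
    cond.vars[j] = G[p + j, p + j] - sum(G[p + j, idx] * coeff)  # conditional variance
}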
n_samples = 1000 # number of samples to generate knockoffs for
set.seed(100)
# generate the covariance guided knockoffs
temp = matrix(0, ncol = 2 * p, nrow = 1)
start = Sys.time()
result_cov = foreach(i = 1:n_samples, .combine = "rbind", .packages = "MASS") %do% {
    # sample one observation from the Markov chain
    temp[1] = rt(1, df.t) * sqrt((df.t - 2)/df.t)
    for (j in 2:p) {
        temp[j] = rt(1, df.t) * sqrt((df.t - 2)/df.t) * sqrt(1 - rhos[j - 1]^2) +
            rhos[j - 1] * temp[j - 1]
    }
    # generate the knockoffs with the covariance-guided proposals
    xk = SCEP.MH.MC.COV(temp[1:p], 1, rep(0, p), prop.mat, cond.means.coeff, cond.vars)
    temp[(p + 1):(2 * p)] = xk[1:p]
    temp
}
end = Sys.time()
print(paste0("Average time per observation + knockoff (seconds): ",
    round((end - start)/n_samples, digits = 3)))
[1] "Average time per observation + knockoff (seconds): 0.008"
X = result_cov[, 1:p]
Xk = result_cov[, (p + 1):(2 * p)]
dim(result_cov)
[1] 1000 100
We can again compute the mean correlation between \(X_j\) and \(\tilde{X}_j\) to check the knockoff quality:
cors = sapply(1:p, function(j) cor(X[, j], Xk[, j]))  # per-coordinate correlations
mean(cors)
[1] 0.6292854
We can also plot the correlation between \(X_j\) and \(\tilde{X}_j\) across different coordinates:
# plot the correlation versus the variable index
qplot(1:p, cors) + theme_bw() + xlab("variable index") + ylab(TeX("cor($X_j$, $\\tilde{X}_j$)")) +
scale_y_continuous(limits = c(0, 1))
As a basic diagnostic, we check that each coordinate has mean zero and that the empirical covariance matrices of \(X\) and \(\tilde{X}\) are close. As the number of samples increases, the empirical covariances of \(X\) and \(\tilde{X}\) will converge to the same population covariance.
max(abs(colMeans(X))) # largest absolute coordinate mean of X
[1] 0.05669989
max(abs(colMeans(Xk))) # largest absolute coordinate mean of Xk
[1] 0.07744425
S = cor(X) #correlation matrix of X
Sk = cor(Xk) #correlation matrix of Xk
max(abs(S - Sk)) # largest entrywise difference between the two correlation matrices
[1] 0.1521069
We can visualize the difference between the empirical correlation matrices of \(X\) and \(\tilde{X}\) with a heatmap.
# visualize the difference in the empirical covariance
library(ggcorrplot)
ggcorrplot(cor(X) - cor(Xk), ggtheme = ggplot2::theme_bw, legend.title = "Corr diff",
title = "Difference in correlation structure") + scale_y_reverse()
We see that the differences between the empirical correlation matrix of \(X\) and the empirical correlation matrix of \(\tilde{X}\) are small, as expected, since \(X\) and \(\tilde{X}\) have the same distribution. Note that we generated more observations with the covariance-guided proposals (1000 versus 300), so the correlation matrices match more closely than in the MTM case.