This notebook demonstrates the knockoff construction for the heavy-tailed Markov chain where each variable has t-distributed tails. The model is \[X_1=\sqrt{\frac{\nu-2}{\nu}}Z_1, \quad X_{j+1}=\rho_j X_j + \sqrt{1-\rho_j^2}\sqrt{\frac{\nu-2}{\nu}}Z_{j+1}, \quad Z_j\stackrel{i.i.d.}{\sim} t_\nu\] for \(j=1,\dots,p-1\), and we will take \(\nu = 5\). The factor \(\sqrt{(\nu-2)/\nu}\) standardizes the \(t_\nu\) innovations, so each \(X_j\) has mean zero and unit variance. Section 5.2.1 of the accompanying paper presents simulation results in this setting.
We demonstrate the multiple-try Metropolis (MTM) technique (Section 3.3 of the paper) with the recommended settings.
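To fix ideas, here is a minimal sketch of a generic multiple-try Metropolis step (in the style of Liu, Liang, and Wong, 2000) for a one-dimensional target with symmetric Gaussian proposals. This is purely illustrative: the knockoff-specific MTM variant, which proposes from a fixed grid, is implemented in t_MTM_core.R, and the function and argument names below are hypothetical.
# generic multiple-try Metropolis step (illustration only, hypothetical names)
mtm.step = function(x, target, m = 8, stepsize = 1.5) {
    y = rnorm(m, mean = x, sd = stepsize)  # draw m candidates around x
    w = target(y)  # weight each candidate by the target density
    y.sel = y[sample.int(m, 1, prob = w)]  # select one candidate proportionally
    x.ref = c(rnorm(m - 1, mean = y.sel, sd = stepsize), x)  # reference set
    if (runif(1) < sum(w)/sum(target(x.ref))) y.sel else x  # MTM accept/reject
}
# e.g., a chain targeting a t_5 density:
# x = 0; for (it in 1:1000) x = mtm.step(x, function(z) dt(z, df = 5))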
# simulation parameters
df.t = 5 # degrees of freedom of the t-distribution
p = 50 # dimension of the random vector X
n_samples = 300 # number of repetitions
rhos = rep(0.6, p - 1) # correlations in the generative distribution
# algorithm parameters
halfnumtry = 4 # half the number of proposal candidates (the grid has 2 * halfnumtry + 1 points)
stepsize = 1.5 # step size, in units of 1/sqrt((Sigma^-1)_jj)
# configure parallelism
library(foreach)
library(doMC)
n_cores = 3
registerDoMC(cores = n_cores)
# additional packages
library(latex2exp)
library(ggplot2)
source("../heavy-tailed-t/t_MTM_core.R") #load the knockoff functions
First, we compute the stepsize of the grid of proposals for each variable. The recommended stepsize for variable \(j\) (Section 3.3 of the paper) is \[ 1.5 / \sqrt{(\Sigma^{-1})_{jj}}. \]
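For this chain the quantity \((\Sigma^{-1})_{jj}\) has a closed form. Since the correlations satisfy \(\mathrm{cor}(X_i, X_k) = \prod_{l=i}^{k-1}\rho_l\) for \(i < k\), the reciprocal \(1/(\Sigma^{-1})_{jj}\) equals the residual variance of the best linear predictor of \(X_j\) from the other coordinates, which depends only on the neighbors: \[ \frac{1}{(\Sigma^{-1})_{jj}} = \frac{(1-\rho_{j-1}^2)(1-\rho_j^2)}{1-\rho_{j-1}^2\rho_j^2} \] for \(1 < j < p\), with \(1-\rho_1^2\) and \(1-\rho_{p-1}^2\) at the two endpoints. The code below implements exactly this formula.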
# compute the proposal grid at each step using an exact analytic calculation
quantile.x = matrix(0, ncol = 2 * halfnumtry + 1, nrow = p)
sds = rep(0, p)  # sds[i] = conditional sd of X_i given its neighbors
sds[1] = sqrt(1 - rhos[1]^2)
for (i in 2:(p - 1)) {
    sds[i] = sqrt((1 - rhos[i - 1]^2) * (1 - rhos[i]^2)/(1 - rhos[i - 1]^2 * rhos[i]^2))
}
sds[p] = sqrt(1 - rhos[p - 1]^2)
for (i in 1:p) {
    # symmetric grid of 2 * halfnumtry + 1 offsets around the current value
    quantile.x[i, ] = ((-halfnumtry):halfnumtry) * sds[i] * stepsize
}
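As a quick sanity check (not part of the original notebook), we can compare the analytic values against a direct matrix computation: the chain's covariance matrix has entries \(\Sigma_{ik} = \prod_{l=i}^{k-1}\rho_l\), so sds should equal \(1/\sqrt{(\Sigma^{-1})_{jj}}\) entrywise.
# sanity check: analytic conditional sds vs. direct computation from Sigma
Sigma = diag(p)
for (i in 1:(p - 1)) {
    for (k in (i + 1):p) {
        Sigma[i, k] = Sigma[k, i] = prod(rhos[i:(k - 1)])
    }
}
max(abs(sds - 1/sqrt(diag(solve(Sigma)))))  # should be numerically zero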
Then, we sample the Markov chain and construct its knockoffs with the SCEP.MH.MC function.
set.seed(100)
temp = matrix(0, ncol = 2 * p, nrow = 1)
start = Sys.time()
result_mtm = foreach(i = 1:n_samples, .combine = "rbind") %dopar% {
    # sample one observation from the Markov chain
    temp[1] = rt(1, df.t) * sqrt((df.t - 2)/df.t)
    for (j in 2:p) {
        temp[j] = rt(1, df.t) * sqrt((df.t - 2)/df.t) * sqrt(1 - rhos[j - 1]^2) +
            rhos[j - 1] * temp[j - 1]
    }
    # generate the knockoffs
    temp[(p + 1):(2 * p)] = SCEP.MH.MC(temp[1:p], 0.999, quantile.x)
    temp
}
end = Sys.time()
print(paste0("Average time per observation + knockoff (seconds): ",
    round((end - start) * n_cores/n_samples, digits = 3)))
[1] "Average time per observation + knockoff (seconds): 0.017"
X = result_mtm[, 1:p]
Xk = result_mtm[, (p + 1):(2 * p)]
dim(result_mtm)
[1] 300 100
We can compute the mean correlation between \(X_j\) and \(\tilde{X}_j\) to measure the knockoff quality:
cors = sapply(1:p, function(j) cor(X[, j], Xk[, j]))  # per-coordinate correlations
mean(cors)
[1] 0.5798123
We can also plot the correlation between \(X_j\) and \(\tilde{X}_j\) across different coordinates:
# plot the correlation versus the variable index
qplot(1:p, cors) + theme_bw() + xlab("variable index") + ylab(TeX("cor($X_j$, $\\tilde{X}_j$)")) +
scale_y_continuous(limits = c(0, 1))
We see that the first and last knockoffs have higher quality, since these variables each depend on only one other variable.
As a basic diagnostic, we check that each coordinate has mean zero and that the empirical covariance matrices of \(X\) and \(\tilde{X}\) are close. As the number of samples increases, the empirical covariances of \(X\) and \(\tilde{X}\) will converge to the same population covariance.
max(abs(colMeans(X))) # largest absolute coordinate mean of X
[1] 0.1344224
max(abs(colMeans(Xk))) # largest absolute coordinate mean of Xk
[1] 0.1147447
S = cor(X) #correlation matrix of X
Sk = cor(Xk) #correlation matrix of Xk
max(abs(S - Sk)) # largest entrywise difference between the two correlation matrices
[1] 0.2481599
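By exchangeability, the population correlation between \(X_j\) and \(\tilde{X}_k\) for \(j \neq k\) equals that between \(X_j\) and \(X_k\), so as an additional diagnostic (not in the original notebook) we can check that the off-diagonal entries of the cross-correlation matrix match those of \(S\):
C = cor(X, Xk)  # cross-correlation matrix between X and the knockoffs
max(abs((C - S)[row(C) != col(C)]))  # off-diagonal entries should be small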
We can visualize the difference between the empirical correlation matrices of \(X\) and \(\tilde{X}\) with a heatmap.
# visualize the difference in the empirical covariance
library(ggcorrplot)
ggcorrplot(cor(X) - cor(Xk), ggtheme = ggplot2::theme_bw, legend.title = "Corr diff",
title = "Difference in correlation structure") + scale_y_reverse()
We see that the differences between the empirical correlation matrix of \(X\) and the empirical correlation matrix of \(\tilde{X}\) are small, as expected, since \(X\) and \(\tilde{X}\) have the same distribution.
We can also generate knockoffs with the covariance-guided proposals (Section 3.2 of the paper) using the SCEP.MH.MC.COV function. These proposals are motivated by a Gaussian approximation (indeed, they are optimal for Gaussian distributions), but they produce exact knockoffs because of the Metropolis–Hastings correction. The sampler takes precomputed proposal parameters (prop.mat, cond.means.coeff, and cond.vars), which we assume have been constructed from the chain's covariance matrix (e.g., by the sourced t_MTM_core.R script).
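As a rough illustration of where such parameters could come from, here is a minimal sketch under the Gaussian approximation, reusing the Sigma computed in the sanity check above. Every detail below (the choice of s, the joint covariance G) is our assumption rather than the actual construction, and prop.mat is not sketched.
# hypothetical sketch: Gaussian conditional parameters for the knockoff
# proposals; the actual construction in t_MTM_core.R may differ
s = rep(min(1, 2 * min(eigen(Sigma)$values)), p)  # equicorrelated s (assumption)
G = rbind(cbind(Sigma, Sigma - diag(s)),
    cbind(Sigma - diag(s), Sigma))  # joint Gaussian covariance of (X, Xk)
cond.means.coeff = vector("list", p)
cond.vars = numeric(p)
for (j in 1:p) {
    idx = 1:(p + j - 1)  # condition on X and the knockoffs sampled so far
    coeff = solve(G[idx, idx], G[idx, p + j])
    cond.means.coeff[[j]] = coeff  # linear coefficients for the conditional mean
    cond.vars[j] = G[p + j, p + j] - sum(G[p + j, idx] * coeff)  # conditional variance
}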
n_samples = 1000 # number of samples to generate knockoffs for
set.seed(100)
# generate the covariance guided knockoffs
temp = matrix(0, ncol = 2 * p, nrow = 1)
start = Sys.time()
result_cov = foreach(i = 1:n_samples, .combine = "rbind", .packages = "MASS") %do% {
    # sample one observation from the Markov chain
    temp[1] = rt(1, df.t) * sqrt((df.t - 2)/df.t)
    for (j in 2:p) {
        temp[j] = rt(1, df.t) * sqrt((df.t - 2)/df.t) * sqrt(1 - rhos[j - 1]^2) +
            rhos[j - 1] * temp[j - 1]
    }
    # generate the knockoffs with the covariance-guided proposals
    xk = SCEP.MH.MC.COV(temp[1:p], 1, rep(0, p), prop.mat, cond.means.coeff, cond.vars)
    temp[(p + 1):(2 * p)] = xk[1:p]
    temp
}
end = Sys.time()
print(paste0("Average time per observation + knockoff (seconds): ",
    round((end - start)/n_samples, digits = 3)))
[1] "Average time per observation + knockoff (seconds): 0.008"
X = result_cov[, 1:p]
Xk = result_cov[, (p + 1):(2 * p)]
dim(result_cov)
[1] 1000 100
We can again compute the mean correlation between \(X_j\) and \(\tilde{X}_j\) to check the knockoff quality:
cors = sapply(1:p, function(j) cor(X[, j], Xk[, j]))  # per-coordinate correlations
mean(cors)
[1] 0.6292854
We can also plot the correlation between \(X_j\) and \(\tilde{X}_j\) across different coordinates:
# plot the correlation versus the variable index
qplot(1:p, cors) + theme_bw() + xlab("variable index") + ylab(TeX("cor($X_j$, $\\tilde{X}_j$)")) +
scale_y_continuous(limits = c(0, 1))
As a basic diagnostic, we check that each coordinate has mean zero and that the empirical covariance matrices of \(X\) and \(\tilde{X}\) are close. As the number of samples increases, the empirical covariances of \(X\) and \(\tilde{X}\) will converge to the same population covariance.
max(abs(colMeans(X))) # largest absolute coordinate mean of X
[1] 0.05669989
max(abs(colMeans(Xk))) # largest absolute coordinate mean of Xk
[1] 0.07744425
S = cor(X) #correlation matrix of X
Sk = cor(Xk) #correlation matrix of Xk
max(abs(S - Sk)) # largest entrywise difference between the two correlation matrices
[1] 0.1521069
We can visualize the difference between the empirical correlation matrices of \(X\) and \(\tilde{X}\) with a heatmap.
# visualize the difference in the empirical covariance
library(ggcorrplot)
ggcorrplot(cor(X) - cor(Xk), ggtheme = ggplot2::theme_bw, legend.title = "Corr diff",
title = "Difference in correlation structure") + scale_y_reverse()
We see that the differences between the empirical correlation matrix of \(X\) and the empirical correlation matrix of \(\tilde{X}\) are small, as expected, since \(X\) and \(\tilde{X}\) have the same distribution. Note that we generated more observations with the covariance-guided proposals (1000 versus 300), so the correlation matrices match more closely than in the MTM case.