This notebook demonstrates the knockoff construction for the asymmetric Markov chain experiments. The data is generated as follows. Let $$ Z_j\stackrel{i.i.d.}{\sim} \frac{I\cdot \mid Y_\text{G} \mid -(1-I)\cdot Y_\text{E}- \mu}{\sigma} \ \text{ for } j=1,\dots,p, $$ where $I\sim\text{Bernoulli}(1/2)$, $Y_\text{G}\sim\mathcal{N}(0,1)$, and $Y_\text{E}\sim\text{Exp}(1)$ are independent, and $\mu$ and $\sigma$ are chosen so that the $Z_j$ are centered with unit variance. Then define $$ X_1=Z_1, \quad X_{j+1}=\rho_j X_j + \sqrt{1-\rho_j^2}\,Z_{j+1} \ \text{ for } j=1,\dots,p-1. $$ Section 5.2.2 of the accompanying paper presents a large set of simulation results in this setting.
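For reference, the standardizing constants can be computed in closed form. Since $E\mid Y_\text{G}\mid = \sqrt{2/\pi}$, $E[Y_\text{E}] = 1$, $E[Y_\text{G}^2] = 1$, and $E[Y_\text{E}^2] = 2$, we get $$ \mu = \frac{1}{2}\sqrt{\frac{2}{\pi}} - \frac{1}{2} = \frac{1}{\sqrt{2\pi}} - \frac{1}{2}, \qquad \sigma^2 = \frac{1}{2}\cdot 1 + \frac{1}{2}\cdot 2 - \mu^2 = \frac{3}{2} - \mu^2, $$ which are exactly the constants asymean and variance defined in the code below.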
We demonstrate the multiple-try Metropolis (MTM) proposals (Section 3.3 of the paper) below.
import math
import numpy as np
import scipy
import time
import matplotlib.pyplot as plt
import seaborn as sns
%run ../asymmetric/asym_core
#simulation parameters
p = 30 # dimension of the random vector X
numsamples = 500 # number of samples to generate knockoffs for
rhos = [0.6]*(p-1) # the correlations
#algorithm parameters
halfnumtry = 2 # half the number of proposal candidates (the grid has 2*halfnumtry + 1 points)
stepsize = 1.5 # step size, in units of 1/sqrt((Sigma^{-1})_{jj})
#distributional constants
asymean = 1/math.sqrt(2*math.pi) - 1/2
variance = 1.5 - (1/math.sqrt(2*math.pi)- 1/2)**2
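As a quick sanity check (not part of the original analysis), we can verify by Monte Carlo that the standardized innovation distribution has mean zero and unit variance:
#Monte Carlo check: the standardized innovations should have mean ~0 and variance ~1
n_check = 10**6
coin = np.random.uniform(size=n_check) >= 0.5
raw = np.where(coin,
               np.abs(np.random.normal(size=n_check)),   #half-normal component
               np.log(np.random.uniform(size=n_check)))  #negative exponential component
z = (raw - asymean) / math.sqrt(variance)
print(np.mean(z), np.var(z))  #should be close to 0 and 1, respectively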
We first compute the proposal scaling for each variable. Recall that the recommended step size for the proposal for variable $j$ is $1.5 / \sqrt{(\Sigma^{-1})_{jj}}$ (Section 3.3 of the paper); for this Markov chain the quantities $1/\sqrt{(\Sigma^{-1})_{jj}}$ are available in closed form and are computed as sds below.
#generate the proposal grid
quantile_x = np.zeros([p, 2*halfnumtry + 1])
#closed-form values of 1/sqrt((Sigma^{-1})_{jj}) for this chain
sds = [0]*p
sds[0] = math.sqrt(1 - rhos[0]**2)
for i in range(1, p - 1):
    sds[i] = math.sqrt((1 - rhos[i - 1]**2)*(1 - rhos[i]**2) /
                       (1 - rhos[i - 1]**2*rhos[i]**2))
sds[p - 1] = math.sqrt(1 - rhos[p - 2]**2)
#symmetric grid of 2*halfnumtry + 1 proposal offsets, scaled per variable
for i in range(p):
    quantile_x[i] = [x*sds[i]*stepsize for x in range(-halfnumtry, halfnumtry + 1)]
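As a sanity check on these closed-form expressions, we can construct the covariance matrix of the chain numerically (the recursion above implies $\Sigma_{jk} = \prod_{l=j}^{k-1}\rho_l$ for $j < k$, with unit diagonal), invert it, and confirm that sds[j] equals $1/\sqrt{(\Sigma^{-1})_{jj}}$:
#numerical check: sds[j] should equal 1/sqrt((Sigma^{-1})_{jj}),
#where Sigma is the covariance matrix implied by the recursion above
Sigma = np.ones((p, p))
for j in range(p):
    for k in range(j + 1, p):
        Sigma[j, k] = Sigma[k, j] = np.prod(rhos[j:k])
inv_diag = np.diag(np.linalg.inv(Sigma))
print(np.max(np.abs(np.array(sds) - 1 / np.sqrt(inv_diag))))  #should be ~0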
Next, we sample observations from the Markov chain and generate knockoffs with the MTM technique, using the SCEP_MH_MC function.
bigmatrix = np.zeros([numsamples, 2*p]) #store simulation data
#generate each observation and knockoff
start = time.time()
for i in range(numsamples):
    #sample one instance from the Markov chain; each innovation is a
    #standardized half-normal or negative exponential, each with probability 1/2
    if np.random.uniform() >= 0.5:
        bigmatrix[i, 0] = (abs(np.random.normal()) - asymean) / \
            math.sqrt(variance)
    else:
        #math.log(np.random.uniform()) is the negative of an Exp(1) variable
        bigmatrix[i, 0] = (math.log(np.random.uniform()) - asymean) / \
            math.sqrt(variance)
    for j in range(1, p):
        if np.random.uniform() >= 0.5:
            bigmatrix[i, j] = ((abs(np.random.normal()) - asymean) /
                               math.sqrt(variance)) * \
                math.sqrt(1 - rhos[j - 1]**2) + rhos[j - 1] * \
                bigmatrix[i, j - 1]
        else:
            bigmatrix[i, j] = ((math.log(np.random.uniform()) - asymean) /
                               math.sqrt(variance)) * \
                math.sqrt(1 - rhos[j - 1]**2) + rhos[j - 1] * \
                bigmatrix[i, j - 1]
    #generate a knockoff for the observation
    bigmatrix[i, p:(2*p)] = SCEP_MH_MC(bigmatrix[i, 0:p], 0.999,
                                       quantile_x, rhos)
end = time.time()
print("Average time per observation + knockoff (seconds): " + \
'%.3f'%((end - start) / numsamples))
np.shape(bigmatrix)
We can compute the mean correlation between $X_j$ and $\tilde{X}_j$ as a measure of knockoff quality (lower correlation means higher-quality knockoffs):
cors = []
for j in range(p):
    cors.append(np.corrcoef(bigmatrix[:, j], bigmatrix[:, j + p])[0, 1])
np.mean(cors)
We can also plot the correlation between $X_j$ and $\tilde{X}_j$ across different coordinates:
plt.rc('text', usetex=True)
plt.rc('font', family='serif')
plt.scatter(range(p), cors)
plt.ylim(0, 1)
plt.ylabel(r"cor$(X_j, \tilde{X}_j)$")
plt.xlabel("variable index")
plt.show()
We see that the first and last knockoffs are of higher quality (lower correlation with their original variables), since these variables each depend on only one other variable.
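For a quick numerical comparison of the endpoint correlations with the interior ones:
#endpoint vs. interior knockoff correlations
print(cors[0], cors[p - 1], np.mean(cors[1:(p - 1)]))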
As a basic diagnostic, we check that each coordinate has mean approximately zero and that the empirical correlation matrices of $X$ and $\tilde{X}$ are close. As the number of samples increases, the empirical correlations of $X$ and of $\tilde{X}$ converge to the same population values.
#largest mean of the columns of X
print(np.max(np.abs(np.mean(bigmatrix[:, 0:p], axis = 0))))
#largest mean of the columns of Xk
print(np.max(np.abs(np.mean(bigmatrix[:, p:(2*p)], axis = 0))))
#empirical correlation matrices
S = np.corrcoef(bigmatrix[:, 0:p].T)
Sk = np.corrcoef(bigmatrix[:, p:(2*p)].T)
#largest entrywise difference between the empirical correlation matrices
np.max(np.abs(S - Sk))
We can visualize the difference between the empirical correlation matrices of $X$ and $\tilde{X}$ with a heatmap.
S = np.corrcoef(bigmatrix[:, 0:p].T)
Sk = np.corrcoef(bigmatrix[:, p:(2*p)].T)
ax = sns.heatmap(S - Sk, vmin = -1, vmax = 1, cmap="coolwarm")
We see that the difference between the empirical correlation matrix of $X$ and that of $\tilde{X}$ is small, as expected, since $X$ and $\tilde{X}$ have the same distribution.
We can also generate knockoffs with the covariance-guided proposals (Section 3.2 of the paper) using the SCEP_MH_COV function. These proposals are motivated by a Gaussian approximation (indeed, they are optimal when $X$ is Gaussian), but they produce exact knockoffs for the true distribution because of the Metropolis-Hastings correction.
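To give intuition for the Gaussian motivation, here is a minimal sketch of the joint covariance that the covariance-guided proposals are derived from. This is for illustration only and is not the package implementation: compute_proposals solves an SDP for the actual parameters, whereas the sketch uses the simple equicorrelated choice $s = \min(1, 2\lambda_{\min}(\Sigma))$. Under the Gaussian model, $(X, \tilde{X})$ has joint covariance $G = \begin{pmatrix} \Sigma & \Sigma - sI \\ \Sigma - sI & \Sigma \end{pmatrix}$, and the proposal for coordinate $j$ is the conditional law of $\tilde{X}_j$ under $\mathcal{N}(0, G)$ given $X$ and the knockoff coordinates sampled so far.
#illustrative sketch only (not the package implementation): the Gaussian
#knockoff joint covariance motivating the covariance-guided proposals,
#with the equicorrelated choice of s; Sigma is from the check above
s_val = min(1.0, 2*np.min(np.linalg.eigvalsh(Sigma)))
G = np.block([[Sigma, Sigma - s_val*np.eye(p)],
              [Sigma - s_val*np.eye(p), Sigma]])
#e.g., the proposal for the first knockoff coordinate of observation 0:
w = np.linalg.solve(G[0:p, 0:p], G[0:p, p])
prop_mean = bigmatrix[0, 0:p] @ w   #conditional (proposal) mean
prop_var = G[p, p] - G[0:p, p] @ w  #conditional (proposal) variance
print(prop_mean, prop_var)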
#solve the SDP to find the optimal proposal parameters
param_list = compute_proposals(rhos)
bigmatrix_cov = bigmatrix.copy() #reuse the X samples from the MTM experiment
#generate the covariance-guided knockoffs
start = time.time()
for i in range(numsamples):
    #generate a knockoff for the observation
    bigmatrix_cov[i, p:(2*p)] = SCEP_MH_COV(bigmatrix_cov[i, 0:p], 0.999,
                                            np.zeros(p), rhos, param_list)
end = time.time()
print("Average time per observation + knockoff (seconds): " + \
'%.3f'%((end - start) / numsamples))
np.shape(bigmatrix_cov)
We can compute the mean correlation between $X_j$ and $\tilde{X}_j$ as a measure of knockoff quality (lower correlation means higher-quality knockoffs):
cors = []
for j in range(p):
    cors.append(np.corrcoef(bigmatrix_cov[:, j], bigmatrix_cov[:, j + p])[0, 1])
np.mean(cors)
We can also plot the correlation between $X_j$ and $\tilde{X}_j$ across different coordinates:
plt.rc('text', usetex=True)
plt.rc('font', family='serif')
plt.scatter(range(p), cors)
plt.ylim(0, 1)
plt.ylabel(r"cor$(X_j, \tilde{X}_j)$")
plt.xlabel("variable index")
plt.show()
We see again that the first and last knockoffs are of higher quality (lower correlation with their original variables), since these variables each depend on only one other variable.
As a basic diagnostic, we check that each coordinate has mean approximately zero and that the empirical correlation matrices of $X$ and $\tilde{X}$ are close. As the number of samples increases, the empirical correlations of $X$ and of $\tilde{X}$ converge to the same population values.
#largest mean of the columns of X
print(np.max(np.abs(np.mean(bigmatrix_cov[:, 0:p], axis = 0))))
#largest mean of the columns of Xk
print(np.max(np.abs(np.mean(bigmatrix_cov[:, p:(2*p)], axis = 0))))
#empirical correlation matrices
S = np.corrcoef(bigmatrix_cov[:, 0:p].T)
Sk = np.corrcoef(bigmatrix_cov[:, p:(2*p)].T)
#largest entrywise difference between the empirical correlation matrices
np.max(np.abs(S - Sk))
We can visualize the difference between the empirical correlation matrices of $X$ and $\tilde{X}$ with a heatmap.
S = np.corrcoef(bigmatrix_cov[:, 0:p].T)
Sk = np.corrcoef(bigmatrix_cov[:, p:(2*p)].T)
sns.heatmap(S - Sk, vmin = -1, vmax = 1, cmap="coolwarm")
We see that the difference between the empirical correlation matrix of $X$ and that of $\tilde{X}$ is small, as expected, since $X$ and $\tilde{X}$ have the same distribution.