In these notes, we’ll study the pivot \[ \begin{aligned} {\cal P}(D, \hat{\theta}, \theta^*) = \frac{\int_{\hat{\theta}}^{\infty} \pi(t, D - \Gamma \hat{\theta}) \phi((t-\theta^*) / \sigma^*) \; dt} {\int_{-\infty}^{\infty} \pi(t, D - \Gamma \hat{\theta}) \phi((t-\theta^*) / \sigma^*) \; dt} \end{aligned} \] for \[ \pi(t,n) = \pi_q(t,n) = P((\omega,n + \Gamma t) \in E_q \vert n, t) \] for selection event \(E_q\).
We want a selective CLT: for \(g \in C^3\), \[ |E_F^*[g({\cal P})] - E_{\Phi}^*[g({\cal P})] | \leq C(n, g, \pi, \dots) \qquad \forall F=F_n \in {\cal M}_n. \]
Almost enough to get a uniform result (but not a rate) – (anticoncentration?)
Gaussian?
Log-lipschitz density (i.e. Laplace mechanism)?
Basic approach is standard i.e. compare expectation of \(C^3\) functions of \({\cal P}\) under a Gaussian \(\Phi=\Phi_n\) and a pre-asymptotic \(F_n\).
We assume \(\omega \sim N(0, I_p)\), linearly transforming \(D\) if needed.
We assume the selection region \(A=A_q\) is polyhedral, i.e. cut out by affine inequalities: \[ A = \left\{z: L_Az \leq b_A \right\}. \]
Define the \(\Phi\)-normalized selective likelihood ratio \[ \ell^*_{E_F[D]}(D) = \frac{\pi(\hat{\theta},D-\Gamma \hat{\theta})}{E_{\Phi_{E_F[D]}}[\pi(\hat{\theta},D-\Gamma \hat{\theta})]}. \]
It is clear we will need to understand laws like \(N(\mu, I)\) restricted to \(C\) i.e. \[ \frac{d\Phi^*_{\mu}}{d\Phi}(z) \propto e^{\mu^Tz} 1_C(z) \] with \(\Phi=N(0,I)\).
These laws show up when evaluating derivatives of \(\log({\cal P}), \log(\ell^*), \dots\)
Observation: these derivatives are closely related to CGFs of restricted Gaussian.
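A minimal numerical sketch of this observation, assuming the simplest one-dimensional case \(C = [a, \infty)\) (the helper names below are illustrative, not from the notes). It compares a finite-difference derivative of the restricted CGF \(\mu \mapsto \log E[e^{\mu Z} 1_C(Z)]\) with the mean of \(N(\mu, 1)\) restricted to \(C\).

```python
import math

def norm_pdf(x):
    # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_sf(x):
    # upper tail P(Z > x) for Z ~ N(0, 1); erfc keeps accuracy in the tail
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def restricted_cgf(mu, a):
    # log E[exp(mu * Z) 1{Z >= a}] for Z ~ N(0, 1).
    # Completing the square gives mu^2 / 2 + log P(N(mu, 1) >= a).
    return 0.5 * mu * mu + math.log(norm_sf(a - mu))

def restricted_mean(mu, a):
    # mean of N(mu, 1) conditioned on [a, inf)
    return mu + norm_pdf(a - mu) / norm_sf(a - mu)

def cgf_derivative(mu, a, h=1e-5):
    # central finite difference of the restricted CGF in mu
    return (restricted_cgf(mu + h, a) - restricted_cgf(mu - h, a)) / (2.0 * h)

for mu in (-3.0, 0.0, 2.0):
    print(mu, cgf_derivative(mu, 1.0), restricted_mean(mu, 1.0))
```

The two printed columns should agree up to finite-difference error, which is the first-derivative case of the CGF relation; higher derivatives give higher cumulants of the restricted law in the same way.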
Pivot can be rewritten \[ \begin{aligned} {\cal P}(D, \hat{\theta}, \theta^*) &= \frac{\int_{Z}^{\infty} P[\omega + {\cal N}_0 + \Gamma_0 Z \in A \vert {\cal N}_0=n, Z=t] \phi(t) \; dt}{\int_{-\infty}^{\infty} P[\omega + {\cal N}_0 + \Gamma_0 Z \in A \vert {\cal N}_0=n, Z=t] \phi(t) \; dt} \\
&= \frac{\int_{0}^{\infty} \int_{\mathbb{R}^p} 1_A(\omega + \Gamma_0 t ) e^{-tZ - Z^2/2} e^{\omega^TD - \frac{1}{2} \|D\|^2_2} \; \phi(\omega) \; \phi(t) \; d\omega \; dt}{\int_{-\infty}^{\infty} \int_{\mathbb{R}^p} 1_A(\omega + \Gamma_0 t ) e^{-tZ - Z^2/2} e^{\omega^TD - \frac{1}{2} \|D\|^2_2} \phi(\omega) \; \phi(t) \; d\omega \; dt} \end{aligned} \]
Where \({\cal N}_0 = D - \Gamma_0 Z = {\cal N} + \Gamma \cdot \theta^*\) and \(\Gamma_0= \sigma \Gamma\).
This is slightly different notation from usual…we will proceed going forward with \(\Gamma=\Gamma_0\).
We see \(\log({\cal P})\) is a difference of CGFs!
Numerator and denominator are each a CGF minus a quadratic in \((D, Z)\), and the quadratics cancel in the ratio.
We can write \[ \ell^* = \frac{e^{-\|D\|^2/2} E[e^{D^T\omega} 1_A(\omega) \vert D]}{E_{\Phi_{E_F[D]}}\left[e^{-\|D\|^2/2} E[e^{D^T\omega} 1_A(\omega) \vert D]\right]} \]
We see \(\log(\ell^*)\) is a CGF minus a quadratic…
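To spell out where the CGF comes from, here is a short check, assuming (as above) that \(\omega \sim N(0, I_p)\) is independent of \(D\) and substituting \(u = \omega + D\): \[ \begin{aligned} P(\omega + D \in A \vert D) &= \int 1_A(u) \phi(u - D) \; du \\ &= e^{-\|D\|^2_2/2} \int 1_A(u) e^{D^Tu} \phi(u) \; du \\ &= e^{-\|D\|^2_2/2} E[e^{D^T\omega} 1_A(\omega) \vert D]. \end{aligned} \] Taking logs, \(\log P(\omega + D \in A \vert D) = \log E[e^{D^T\omega} 1_A(\omega) \vert D] - \frac{1}{2}\|D\|^2_2\), i.e. the CGF of \(\omega\) restricted to \(A\), evaluated at \(D\), minus a quadratic.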
Each restricted Gaussian \(\Phi^*_{\mu}\) has a corresponding projection \[ \pi_C(\mu) = \text{argmin}_{z \in C} \frac{1}{2} \|z-\mu\|^2_2 \]
When \(\mu\) is far from \(C\), \(\Phi^*_{\mu}\) concentrates around \(\pi_C(\mu)\).
How good a proxy is it?
What about higher moments?
Gaussian law \(\Phi_{\mu}\) has bounded cumulants of order 2 and higher – uniformly in \(\mu\)? What about \(\Phi_{\mu}^*\)?
Claim 1: \(E_{\Phi^*_{\mu}}[Z] - \pi_C(\mu)\) is bounded uniformly in \(\mu\) (bound depends on \(C\))
Claim 2: Cumulants of order 2 and higher of \(\Phi^*_{\mu}\) are bounded uniformly in \(\mu\) (essentially a corollary of Claim 1 + concentration around \(\pi_C(\mu)\)).
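A small numerical illustration of Claims 1 and 2, assuming the one-dimensional case \(C = [0, \infty)\) where the restricted law is a truncated normal with closed-form mean and variance (function names are illustrative). It scans a grid of \(\mu\) and records the largest gap between the restricted mean and \(\pi_C(\mu)\), together with the range of the variance.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_sf(x):
    # upper tail of N(0, 1); erfc keeps relative accuracy far in the tail
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def project(mu):
    # projection of mu onto C = [0, inf)
    return max(mu, 0.0)

def restricted_mean_var(mu):
    # mean and variance of N(mu, 1) conditioned on C = [0, inf)
    alpha = -mu                                # standardized lower endpoint
    hazard = norm_pdf(alpha) / norm_sf(alpha)  # inverse Mills ratio
    mean = mu + hazard
    var = 1.0 + alpha * hazard - hazard * hazard
    return mean, var

grid = [k / 10.0 for k in range(-300, 301)]    # mu from -30 to 30
gaps = [abs(restricted_mean_var(m)[0] - project(m)) for m in grid]
variances = [restricted_mean_var(m)[1] for m in grid]
max_gap = max(gaps)
max_var = max(variances)
min_var = min(variances)
print(max_gap, max_var, min_var)
```

In this special case the gap is largest at \(\mu = 0\), where it equals \(\sqrt{2/\pi}\), and the variance never exceeds 1. This is only a sanity check of the claims for one particular \(C\), not a proof of uniformity for general polyhedra.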
\(\implies\) first derivatives of \(\log( {\cal P})\) are bounded by a difference of projections of \((-Z,D)\) (an affine function of \(D\)).
\(\implies\) derivatives of order 2 and higher of \(\log({\cal P})\) and \(\log(\ell^*)\) uniformly bounded in \(D, E_F[D]\).
Numerator and denominator of \(\ell^*\) related to \[ (D,\omega) \sim N\left( \begin{pmatrix} E_F[D] \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma & 0 \\ 0 & I \end{pmatrix} \right) = \Phi_n \] restricted to \(\omega + D \in A\).
Set \[ (d^*(\mu), \omega^*(\mu)) = \text{argmin}_{(d,\omega): d+\omega \in A} \frac{1}{2}(d-\mu)^T\Sigma^{-1}(d-\mu) + \frac{1}{2} \|\omega\|^2_2 \]
Under \(\Phi_n^*\), \(D-d^*(E_F[D])\) has bounded moments of all orders, uniformly in \(\mu = E_F[D]\).
Differentiating and using Claim 1, \[ \begin{aligned} \|\nabla \log (\ell^*(D))\|_2 &\leq \|D- \pi_A(D)\|_2 + C(A) \\
& \leq 2 \|D-d^*(E_F[D])\|_2 + \|d^*(E_F[D]) - \pi_A(d^*(E_F[D]))\|_2 + C(A) \end{aligned} \]
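The passage from \(\|D - \pi_A(D)\|_2\) to the final bound can be unpacked as follows. This sketch uses only the triangle inequality and the fact that projection onto a closed convex set is 1-Lipschitz, writing \(d^* = d^*(E_F[D])\): \[ \begin{aligned} \|D - \pi_A(D)\|_2 &\leq \|D - d^*\|_2 + \|d^* - \pi_A(d^*)\|_2 + \|\pi_A(d^*) - \pi_A(D)\|_2 \\ &\leq 2\|D - d^*\|_2 + \|d^* - \pi_A(d^*)\|_2. \end{aligned} \]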
When \(E_F[D]\) is such that the selection event is rare under \(\Phi_n\), this second term grows… a selection effect!
Recognizing \((D,Z)\) is an affine function of \(D\), define \[ \begin{aligned} (t^-(d), \omega^-(d)) &= \text{argmin}_{t,\omega: \omega + \Gamma t \in A} \frac{1}{2}\|\omega - d\|^2_2 + \frac{1}{2} (t+z)^2 \\ (t^+(d), \omega^+(d))&= \text{argmin}_{t,\omega: \omega + \Gamma t \in A, t \geq 0} \frac{1}{2}\|\omega - d\|^2_2 + \frac{1}{2} (t+z)^2 \\ \end{aligned} \]
Note \[(t^+(d), \omega^+(d))= \begin{cases}(t^-(d), \omega^-(d)) & t^-(d) > 0 \\ (0, \pi_A(d)) & t^-(d) \leq 0 \end{cases} \]
Note \[ \omega^-(d) - \pi_A(d) = \pi_A(d - \Gamma t^-(d)) - \Gamma t^-(d) - \pi_A(d) \]
We see then that \[ \|(t^+(d),\omega^+(d)) - (t^-(d),\omega^-(d))\|_2 \leq (1 + 2 \|\Gamma\|_2) |t^-(d)| \]
The pair \((t^-(d), \omega^-(d))\) can be expressed in terms of \[ (t^0(x), \omega^0(x)) = \text{argmin}_{(t,\omega): \omega + \Gamma t \in A} \frac{1}{2} \|\omega - x\|^2_2 + \frac{1}{2} t^2. \]
The relation being \[ (t^-(d), \omega^-(d)) = (t^0(d-\Gamma z) - z, \omega^0(d-\Gamma z) + \Gamma z) \]
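A short verification of this relation, by the change of variables \(s = t + z\), \(\nu = \omega - \Gamma z\). The constraint is unchanged because \(\omega + \Gamma t = \nu + \Gamma s\), and the objective becomes \[ \frac{1}{2}\|\omega - d\|^2_2 + \frac{1}{2}(t+z)^2 = \frac{1}{2}\|\nu - (d - \Gamma z)\|^2_2 + \frac{1}{2}s^2. \] Minimizing over \((s,\nu)\) with \(\nu + \Gamma s \in A\) gives \((s,\nu) = (t^0(d-\Gamma z), \omega^0(d - \Gamma z))\), and undoing the substitution recovers \(t^- = s - z\) and \(\omega^- = \nu + \Gamma z\).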
Define \[ \begin{aligned} t^*(\mu) &= \frac{L (d^*(\mu) - \mu)}{\sigma} \\ n^*(\mu) &= d^*(\mu) - \Gamma t^*(\mu) \end{aligned} \]
We can re-express the denominator of \(\ell^*\) in terms of \[ (t^*(\mu), n^*(\mu), \omega^*(\mu)) = \text{argmin}_{t,n,\omega: \omega + n + \Gamma t \in A} \frac{1}{2} \|\omega\|^2_2 + \frac{t^2}{2} + \frac{1}{2}(n-\mu)^T\Sigma_{\cal N}^{-1}(n-\mu) \]
Claim: \(t^0(n^*(\mu)) = t^*(\mu)\).
We conclude that \[ |t^-(d)| \leq |z-t^*(E_F[D])| + |t^0(n)-t^0(n^*(E_F[D]))| \leq |z-t^*(E_F[D])| + \|n-n^*(E_F[D])\|_2 \leq C \|d-d^*(E_F[D])\|_2 \]
Under \(\Phi^*\) the difference of projections is controlled uniformly in \(E_F[D]\)!
First derivatives of \({\cal P}\) wrt \(D\) are bounded in norm by \[ C_1(A) \|D-d^*(E_F[D])\|_2 + C_2(A) \]
First derivatives of \(\ell^*\) wrt \(D\) are bounded in norm by \[ C_3(A)\left( \|D-d^*(E_F[D])\|_2 + \|d^*(E_F[D]) - \pi_A(d^*(E_F[D]))\|_2 \right) + C_4(A) \]
Second and third derivatives of \({\cal P}, \ell^*\) uniformly bounded in norm by \(C(A)\).
\(\implies\) derivatives up to order 3 of \(g({\cal P}(D, \theta^*)) \ell^*(D)\) for \(g \in C^3\) are bounded in norm by \[ \left(C_3(A)\left( \|D-d^*(E_F[D])\|_2 + \|d^*(E_F[D]) - \pi_A(d^*(E_F[D]))\|_2 \right) + C_4(A) \right) \ell^*(D) \]
Actual noise scale \(\tau\) of randomization will show as \(\tau^{-3}\) in third derivatives.
Under \(\Phi^*_{E_F[D]}\) the random variable \(\|D-d^*(E_F[D])\|_2\) has moments bounded uniformly in \(E_F[D]\).
Assume \(\|D-d^*(E_F[D])\|_2\) has similar moments under \(F^*=F^*_n\) and \[ |E_F[\ell^*] - E_{\Phi}^*[\ell^*]| = |E_{F_n}^*[\ell^*] - E_{\Phi_n}^*[\ell^*]| = |E_{F_n}^*[\ell^*] - 1| \to 0 \]
\(\implies\) expectations of \(g({\cal P})\) under \(F^*\) for \(g \in C^3\) can be controlled (Selective CLT)
We have hidden \(n\) throughout… we should think of \(D_n=n^{1/2}\bar{\tilde{D}}_n\)
\(\implies\) if \(\mu_n=E_{F_n}[\tilde{D}]\) then \(E_{F_n}[D_n] = n^{1/2}\mu_n\).
If \(\mu_n = O(n^{-1/2})\) then we are in local asymptotics \[ \limsup_n \|E_{F_n}[D_n]\|_2 = O(1) \]
\(\implies\) Gaussian would concentrate around \(d^*(E_{F_n}[D_n]) = O(1)\)
\(\implies\) selection effect \(d^*(E_{F_n}[D_n]) - \pi_A(d^*(E_{F_n}[D_n]))\) is \(O(1)\).
What if we want to allow rare selection, i.e. \(\|d^*(E_{F_n}[D_n]) - \pi_A(d^*(E_{F_n}[D_n]))\|_2 \gg 1\)?
From the bound on the derivative of \(\ell^*\) we should still get a bound if \(\|d^*(E_{F_n}[D_n]) - \pi_A(d^*(E_{F_n}[D_n]))\|_2 = o(n^{1/2})\) and \(D_n\) concentrates around \(d^*(E_{F_n}[D_n])\).
Note: in this setting \(\|d^*(E_{F_n}[D_n])\|_2 \to \infty\) so \(D_n\) moves away from 0 but is near \(d^*(E_{F_n}[D_n])\)…
This is the moderate deviations regime.
Exponential moments of \(\tilde{D}\) are sufficient.
Recall \[ \pi(t,n) = \int_{A - n - \Gamma t} \psi(\omega) \; d\omega = \int_{A - d} \psi(\omega) \; d\omega, \] where \(\psi\) is the density of \(\omega\) and \(d = n + \Gamma t\).
Suppose \(\log \psi\) is smooth and Lipschitz: \[ |\log \psi(x) - \log \psi(y)| \leq C \|y-x\|_2. \]
\(\implies\) \[ e^{ - C \|D-E_F[D]\|_2} \leq \frac{\int_{A-D} \psi(\omega) \; d\omega}{\int_{A-E_F[D]} \psi(\omega) \; d\omega} \leq e^{ C \|D-E_F[D]\|_2} \]
No selection effect in the analog of \(\ell^*\).
If \(D-E_F[D]\) is subgaussian then expectations of \(g({\cal P})\) under \(F^*\) for \(g \in C^3\) can similarly be controlled (Selective CLT).
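The two-sided bound on the ratio of selection probabilities can be checked numerically in a toy case. The sketch below assumes a one-dimensional standard Laplace randomization, \(\psi(w) = e^{-|w|}/2\) (so the Lipschitz constant is 1), and a half-line \(A = [a, \infty)\); both choices and all names are illustrative.

```python
import math

def laplace_sf(x):
    # P(omega >= x) for the standard Laplace density psi(w) = exp(-|w|) / 2
    return 0.5 * math.exp(-x) if x >= 0 else 1.0 - 0.5 * math.exp(x)

def selection_prob(d, a=1.0):
    # P(omega + d in A) for A = [a, inf), i.e. the integral of psi over A - d
    return laplace_sf(a - d)

def sandwich_holds(d, mean_d, lipschitz=1.0, a=1.0, tol=1e-12):
    # checks exp(-C|d - m|) <= ratio <= exp(C|d - m|); the small tolerance is
    # needed because the bound is attained exactly in the Laplace tail
    ratio = selection_prob(d, a) / selection_prob(mean_d, a)
    bound = math.exp(lipschitz * abs(d - mean_d))
    return (1.0 - tol) / bound <= ratio <= bound * (1.0 + tol)

points = [k / 2.0 for k in range(-10, 11)]
all_ok = all(sandwich_holds(d, m) for d in points for m in points)
print(all_ok)
```

Note that the bound depends only on \(|D - E_F[D]|\) and not on how far \(E_F[D]\) sits from \(A\), which is the sense in which there is no selection effect here.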
Let \[ \begin{aligned} \kappa^1_C(\mu) &= \frac{\int \omega \cdot 1_C(\omega) \phi(\omega-\mu) \; d\omega} {\int 1_C(\omega) \phi(\omega-\mu) \; d\omega} \\
\Delta_C(\mu) &= \frac{\int (\omega - \pi_C(\mu)) \cdot 1_C(\omega) \phi(\omega-\mu) \; d\omega} {\int 1_C(\omega) \phi(\omega-\mu) \; d\omega} \end{aligned} \]
Now, \[ \begin{aligned} \frac{\phi(\omega-\mu)}{\phi(\omega - \pi_C(\mu))} \exp(\|\mu-\pi_C(\mu)\|^2_2/2) &= \exp \left((\omega-\pi_C(\mu))^T(\mu - \pi_C(\mu)) \right) \end{aligned} \]
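The intermediate steps of this identity, writing \(m = \pi_C(\mu)\) for brevity and taking logs: \[ \begin{aligned} \log \frac{\phi(\omega-\mu)}{\phi(\omega - m)} + \frac{1}{2}\|\mu - m\|^2_2 &= \omega^T(\mu - m) - \frac{1}{2}\|\mu\|^2_2 + \frac{1}{2}\|m\|^2_2 + \frac{1}{2}\|\mu - m\|^2_2 \\ &= \omega^T(\mu - m) + \|m\|^2_2 - \mu^Tm \\ &= (\omega - m)^T(\mu - m). \end{aligned} \]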
For convex \(C\) fix a base point \(x \in C\).
Associated to \(x\) is the cone \(S_xC\) generated by \[ \left\{y-x: y \in C \right\} \]
Its polar is \[ N_xC = \left\{v: v^Tu \leq 0 \;\; \forall u \in S_xC \right\} \]
KKT conditions for \(\pi_C\) imply \[ \mu-\pi_C(\mu) \in N_{\pi_C(\mu)}C \]
\(\implies\) for any \(\omega \in C\) and any \(\mu\) \[ (\omega - \pi_C(\mu))^T(\mu-\pi_C(\mu)) \leq 0. \]
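A quick numerical check of this sign condition, assuming for illustration that \(C\) is the nonnegative orthant in \(\mathbb{R}^3\), where the projection is a coordinatewise clip at zero (names are illustrative).

```python
import random

def project_orthant(mu):
    # projection onto C = {z : z_i >= 0 for all i}
    return [max(m, 0.0) for m in mu]

def kkt_inner_product(omega, mu):
    # (omega - pi_C(mu))^T (mu - pi_C(mu))
    proj = project_orthant(mu)
    return sum((w - p) * (m - p) for w, p, m in zip(omega, proj, mu))

rng = random.Random(0)
worst = max(
    kkt_inner_product(
        [abs(rng.gauss(0.0, 1.0)) for _ in range(3)],  # a point omega in C
        [rng.gauss(0.0, 2.0) for _ in range(3)],       # an arbitrary mu
    )
    for _ in range(1000)
)
print(worst)
```

The largest inner product found should never be positive, so the exponential weight \(\exp((\omega-\pi_C(\mu))^T(\mu-\pi_C(\mu)))\) is at most 1 on \(C\).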
Finally, recall that for a random variable \(U \geq 0\) and a constant \(c \geq 0\) \[ \frac{E[U e^{-cU}]}{E[e^{-cU}]} \leq E[U]. \]
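One way to justify this inequality, reading the constant in the exponent as a fixed nonnegative scalar \(c\) (an assumption about the intended statement) and assuming \(E[U] < \infty\): the maps \(u \mapsto u\) and \(u \mapsto e^{-cu}\) are monotone in opposite directions, so for an independent copy \(U'\) of \(U\) \[ 0 \geq E\left[(U - U')(e^{-cU} - e^{-cU'})\right] = 2\left(E[U e^{-cU}] - E[U] E[e^{-cU}]\right). \] Dividing by \(2E[e^{-cU}] > 0\) gives the bound.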
Cumulants can be expressed in terms of central moments.
Central moments can be bounded by approximate central moments and \(\|\Delta_C(\mu)\|_2\) \[ \frac{\int \otimes^k(\omega - \pi_C(\mu)) \cdot 1_C(\omega) \exp \left((\omega-\pi_C(\mu))^T(\mu-\pi_C(\mu)) \right) \phi(\omega - \pi_C(\mu)) \; d\omega}{\int 1_C(\omega) \exp \left((\omega-\pi_C(\mu))^T(\mu-\pi_C(\mu)) \right) \phi(\omega - \pi_C(\mu)) \; d\omega} \]
Approximate central moments bounded in norm by \[ \sup_{x \in C} \frac{E[\|Z\|^k_2 1_{C_x}(Z)]}{E[ 1_{C_x}(Z)]} \leq C_{\phi} \cdot \frac{E[\|Z\|^k_2]}{\inf_{x \in C}\lambda(C_x \cap B(0,1))}. \]

## Asymptotics
Suppose we can embed \(F\) in an exponential family \[ \frac{dF_{\eta}}{dF} = \exp\left(D^T\eta - \Lambda(\eta) \right) \] and restrict to \(C\) with \(F=F_0\).
The analog of the projection considers exponential tilts of \(F_0\) so that the new mean is in \(C\).
The projection is determined by the convex conjugate (Fenchel-Legendre transform) \[ \begin{aligned} \Lambda^*(u) &= \sup_{\zeta} u^T\zeta - \Lambda(\zeta) \end{aligned} \]
Corresponding optimization problem \[ u^*(F) = u^*_C(F) = \text{argmin}_{u \in C} \Lambda^*(u) \]
KKT conditions \[ \eta^*(F) + \nabla \Lambda^*(u^*(F)) = 0, \qquad \eta^* \in N_{u^*}C. \]
\(\implies\) Tilted law \(F_{-\eta^*}\) has mean \(u^* \in C\).
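A minimal sketch of this tilting construction, assuming a hypothetical discrete law \(F\) on four points and \(C = [c, \infty)\) with \(c\) above the mean of \(F\), so that \(u^* = c\). The support, weights and function names are made up for illustration. The tilt is found by bisection on the increasing map \(\zeta \mapsto \Lambda'(\zeta)\).

```python
import math

def cgf(eta, xs, ws):
    # Lambda(eta) = log E_F[exp(eta * D)] for a discrete law F
    return math.log(sum(w * math.exp(eta * x) for x, w in zip(xs, ws)))

def tilted_mean(eta, xs, ws):
    # mean of the tilted law F_eta, i.e. Lambda'(eta)
    z = [w * math.exp(eta * x) for x, w in zip(xs, ws)]
    return sum(x * zi for x, zi in zip(xs, z)) / sum(z)

def tilt_to_mean(target, xs, ws, lo=-50.0, hi=50.0, iters=200):
    # bisection: Lambda' is increasing, so solve Lambda'(zeta) = target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid, xs, ws) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

xs = [0.0, 1.0, 2.0, 3.0]       # hypothetical support of D under F
ws = [0.4, 0.3, 0.2, 0.1]       # hypothetical probabilities, mean 1.0
c = 2.0                         # C = [c, inf); u* = c since c exceeds the mean
zeta = tilt_to_mean(c, xs, ws)  # zeta = grad Lambda^*(u*)
eta_star = -zeta                # KKT: eta* + grad Lambda^*(u*) = 0
rate = c * zeta - cgf(zeta, xs, ws)  # Lambda^*(u*)
print(eta_star, tilted_mean(-eta_star, xs, ws), rate)
```

Here \(\eta^* \leq 0\), which is the normal cone of \([c, \infty)\) at \(c\), and the tilted law \(F_{-\eta^*}\) has mean \(c\), matching the KKT conditions in this toy case.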
Set \[ \begin{aligned} \Delta_C(F) &= \frac{E_F[(D - u^*) 1_C(D)]}{E_F[1_C(D)]} \\ &= \frac{E_{F_{-\eta^*}}[e^{D^T\eta^*}(D - u^*) 1_C(D)]}{E_{F_{-\eta^*}}[e^{D^T\eta^*} 1_C(D)]} \\ &= \frac{E_{F_{-\eta^*}}[e^{(D-u^*)^T\eta^*}(D - u^*) 1_C(D)]}{E_{F_{-\eta^*}}[e^{(D-u^*)^T\eta^*}1_C(D)]} \\ \end{aligned} \]
Similar to Gaussian, we have \[ \|\Delta_C(F)\|_2 \leq \frac{E_{F_{-\eta^*(F)}}[\|D - u^*(F)\|_2 1_C(D)]}{E_{F_{-\eta^*(F)}}[ 1_C(D)]} \]
To get a bound we could use Cauchy-Schwarz in numerator and assume \[ \sup_{\eta} \|\nabla^2 \Lambda(\eta)\| \leq K \]
Denominator can be assumed to be bounded below \[ E_{F_{-\eta^*(F)}}[1_C(D)] = E_{F}[e^{-D^T\eta^*(F) - \Lambda(-\eta^*(F))} 1_C(D)] \geq K' \]
Approximate centered moments can be bounded similarly, in norm by \[ \frac{E_{F_{-\eta^*(F)}}[\|D - u^*(F)\|^k_2 1_C(D)]}{E_{F_{-\eta^*(F)}}[ 1_C(D)]} \]
All of this presumes \(u^*(F)\) exists…
\(\implies\) the law \(F^*\) concentrates around \(u^*(F)\). If \(d^*(E_F[D])-u^*(F)\) is bounded, then \(F^*\) concentrates around \(d^*(E_F[D])\). (Note: \(u^*(\Phi_{E_F[D]})=d^*(E_F[D])\)).
Quantity \(d^*(E_F[D])-u^*(F)\) can be controlled by studying a related optimization problem…