A better bound for the mean-field gap

$\providecommand\opnorm[1]{\|#1\|_\mathrm{op}}$

$\providecommand\eps\varepsilon\providecommand\E{\mathbb E} \providecommand\GAP{\mathsf{GAP}} \providecommand\tr{\mathop{\mathrm{tr}}} \providecommand{\diag}{\mathop{\mathrm{diag}}} \providecommand\norm[1]{\left\|#1\right\|} \providecommand{\Var}{\mathrm{Var}} \providecommand\lrangle[1]{\langle#1\rangle}$

In a previous blogpost I proved a mean-field gap bound for PSD matrices in the high-temperature regime with the correct dependence $O(\beta^2)$ on the (inverse) temperature $\beta$. I recently revisited this problem and found a much better tool, which lets us not only handle all matrices $J$ but also deal with an external field $h$.

Statement

For the partition function $$Z = \sum_{\sigma\in \{-1,1\}^N} \exp\left(\frac \beta 2 \sigma^\top J \sigma + \lrangle{h,\sigma}\right)$$ we have the following gap bound: $$\GAP \le C\beta^2 \|J^\circ\|_F^2 \qquad \text{when }\beta \le C'/\|J^\circ\|_2,$$ where $C,C'>0$ are absolute constants and $J^\circ$ is the off-diagonal part of $J$.

Recall the mean-field approximation gap, where the supremum is taken over product measures $\nu$ and $f(\sigma) := \frac \beta 2 \sigma^\top J \sigma + \lrangle{h,\sigma}$:

$$ \GAP = \log Z - \sup_\nu \left\{\E_\nu[f] + H(\nu)\right\} $$

Now, plugging in $\nu = \mu_h$, the product measure proportional to $\exp(\lrangle{h,\sigma})$ (with normalizing constant $Z_h$), we get the following:

$$ \begin{align*} \GAP &\le \log Z - \E_{\mu_h}\left[\frac \beta 2 \sigma^\top J \sigma + \lrangle{h,\sigma}\right] - \left(\log Z_h - \E_{\mu_h} [\lrangle{h,\sigma}]\right) \\ &= \log \E_{\mu_h} \exp\left(\frac \beta 2 \sigma^\top J \sigma - \E_{\mu_h}\left[ \frac \beta 2 \sigma^\top J \sigma\right]\right) \end{align*} $$
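As a quick numerical sanity check of this chain (not part of the argument; the couplings, field, and temperature below are arbitrary choices), one can enumerate a tiny system, compute the plug-in bound $\log Z - \E_{\mu_h}[f] - H(\mu_h)$, and confirm that it coincides with the centered exponential moment on the last line:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, beta = 5, 0.3
J = rng.normal(size=(N, N)) / np.sqrt(N)
J = (J + J.T) / 2                                  # symmetric coupling matrix
h = 0.5 * rng.normal(size=N)                       # external field

spins = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)  # all 2^N configs
quad = 0.5 * beta * np.einsum('si,ij,sj->s', spins, J, spins)              # (beta/2) s^T J s
f = quad + spins @ h
logZ = np.log(np.sum(np.exp(f)))

# mu_h is the product measure proportional to exp(<h, sigma>).
w = np.exp(spins @ h)
logZ_h = np.log(w.sum())
mu_h = w / w.sum()
H_mu_h = logZ_h - mu_h @ (spins @ h)               # entropy of mu_h

# Upper bound on GAP obtained by plugging nu = mu_h into the variational formula...
plug_in = logZ - (mu_h @ f + H_mu_h)
# ...and the same quantity rewritten as a centered exponential moment under mu_h.
log_mgf = np.log(mu_h @ np.exp(quad - mu_h @ quad))

print(plug_in, log_mgf)                            # agree up to floating point error
```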

This is really saying that we need $\sigma^\top J \sigma$ to be subgaussian under $\mu_h$, so that its centered log-MGF at scale $\beta$ is proportional to $\beta^2$. I have actually blogged about this interpretation before here. Turns out, there's an actual theorem (Hanson-Wright) that's exactly about showing that quadratic forms are subgaussian!

So we’re able to modify it and prove something like this:

(Tilted Hanson-Wright, MGF version) Let $\sigma_i$ be independent $\pm 1$ variables with mean $m_i$. Then, there exist universal constants $c,C>0$ such that for all $|\lambda|\le c/\|J\|_{\rm op}$,

$$ \log \mathbb E\exp\left\{\lambda \sum_{i\lt j}J_{ij}(\sigma_i\sigma_j - m_i m_j)\right\} \le C\lambda^2\|J\|_F^2. $$

And then you can sort of see that we're done: under $\mu_h$ the spins are independent with means $m_i = \tanh(h_i)$, so applying the lemma to $J^\circ$ with $\lambda = \beta$ bounds the centered exponential moment above by $C\beta^2\|J^\circ\|_F^2$. But where does this lemma come from?

Subgaussianity

First, a quick note on subgaussianity. A random variable $X$ is said to be subgaussian if its tails decay at least as fast as those of some Gaussian variable - in particular there's the idea of a variance proxy (denoted $K^2$), and so this would mean (concretely) that $$P(|X| \ge t)\le 2\exp\left(-\frac{t^2}{2K^2}\right),$$ which matches (up to constants) the tail of $K\gamma$ for $\gamma \sim \mathcal N(0,1)$. This means that there are a bunch of inequalities that mean the same thing, including:

$$\E\left[e^{t(X-\E[X])}\right] \le e^{t^2K_1^2/2} \;\text{ for all } t, \qquad\text{and}\qquad P(|X-\E[X]| \ge t)\le 2\exp\left(-\frac{t^2}{2K_2^2}\right) \;\text{ for all } t\ge 0,$$

where $K_1, K_2$ are within some fixed constant factor of $K$. (There are actually a bunch of these definitions.)

So what I failed to realize when I was looking at a theorem like this:

(Hanson-Wright) Let $X = (X_1, \ldots, X_n)$ be a random vector with independent mean-zero coordinates, each subgaussian with variance proxy $K^2$, and let $A$ be an $n\times n$ matrix. Then for every $t \ge 0$, $$P\left(\left|X^\top A X - \E[X^\top A X]\right| \ge t\right) \le 2\exp\left(-c\min\left(\frac{t^2}{K^4\|A\|_F^2},\ \frac{t}{K^2\opnorm{A}}\right)\right).$$

… is that I was also really looking at some equivalent bound on the MGF, which is something that is much more useful for me.

In particular, the first criterion means that we can switch $X-\E [X]$ out for $K_1\gamma$, where $\gamma$ is an independent standard normal: i.e. $$\E[e^{t(X-\E [X])}] \le \E[e^{tK_1\gamma}].$$ Crucially, $t$ could even be a random variable independent from $X$! (Or, pushing this to the extreme: it is enough for $X$ to be conditionally subgaussian given $t$.)
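A tiny illustration of that last point (an aside, not used later): for a Rademacher $X$ we can take $K_1 = 1$, and the MGF comparison survives averaging over an independent random $t$; the distribution of $t$ below is an arbitrary choice.

```python
import numpy as np

# For X Rademacher (E[X] = 0, K_1 = 1) and gamma ~ N(0,1):
#   E[exp(t X)] = cosh(t)   and   E[exp(t * gamma)] = exp(t**2 / 2),
# and cosh(t) <= exp(t**2 / 2) pointwise:
t = np.linspace(-6, 6, 1001)
assert np.all(np.cosh(t) <= np.exp(t**2 / 2))

# Because the comparison holds for every fixed t, it survives averaging over a
# random t independent of X (here t ~ Uniform[-2, 2], an arbitrary choice).
rng = np.random.default_rng(1)
t_rand = rng.uniform(-2, 2, size=10**6)
print(np.mean(np.cosh(t_rand)), np.mean(np.exp(t_rand**2 / 2)))   # first <= second
```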

Chaos

Chaos is the term for a quadratic form of random variables (so what we’re dealing with here could be called Rademacher chaos). I outline the approach from Rudelson-Vershynin to prove that chaoses are subgaussian: the two ideas are (1) a known trick from HDP Ch 6.1 and (2) switching out things for independent gaussians. The first trick is the decoupling for chaoses:

(Decoupling argument) Let $A$ be a matrix with zero diagonal, and let $X = (X_1, \ldots, X_n)$ be a mean-zero random vector with independent coordinates. Then, $$\E \exp(X^\top A X) \le \E \exp(4X^\top A X')$$ where $X'$ is an independent copy of $X$.

Since $A$ has zero diagonal, $X^\top A X = \sum_{i\ne j} A_{ij}X_iX_j$. Now introduce independent Bernoulli$(1/2)$ variables $\delta_1,\ldots,\delta_n$ (independent of everything else) and note that $\E_\delta[\delta_i(1-\delta_j)] = 1/4$ for $i\ne j$, so that $$X^\top A X = 4\,\E_\delta\left[\sum_{i,j}\delta_i(1-\delta_j)A_{ij}X_iX_j\right].$$ Applying Jensen's inequality to pull $\E_\delta$ out of the exponential, taking $\E_X$ on both sides, and then choosing a realization of $\delta$ for which the expectation over $X$ is largest, we get a fixed subset $S = \{i : \delta_i = 1\}$ with $$\E \exp(X^\top A X) \le \E \exp\left(4\sum_{i\in S,\, j\notin S} A_{ij}X_iX_j\right).$$

In this way, we have partitioned the coordinates into two independent parts: so distributionally we can switch out the indices of $S^c$ for their copies in $X'$ to get $$\E \exp(X^\top A X) \le \E \exp\left(4\sum_{i\in S, j\notin S} A_{ij} X_i X_j'\right)$$

To finish, we consider the rest of the RHS sum:

$$X^\top A X' = \underbrace{\sum_{i\in S,\, j\notin S} A_{ij}X_iX_j'}_{Y} + \underbrace{\sum_{i\in S,\, j\in S} A_{ij}X_iX_j' + \sum_{i\notin S,\, j\in S} A_{ij}X_iX_j' + \sum_{i\notin S,\, j\notin S} A_{ij}X_iX_j'}_{Z}$$

Now, notice that $Z$ conditioned on $\{X_i\}_{i\in S}$ and $\{X_j'\}_{j\in S^c}$ has mean 0, so $Z$ conditioned on $Y$ has mean 0. By (conditional) Jensen, this means that $\E \exp(4Y) \le \E \exp(4(Y+Z)) = \E\exp(4X^\top A X')$, and so the lemma is proven.
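Here is a quick sanity check of the decoupling inequality by exact enumeration (an illustration, not part of the proof); the size and the zero-diagonal matrix $A$ below are arbitrary.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
N = 5
A = rng.normal(size=(N, N)) * 0.2
np.fill_diagonal(A, 0.0)               # the lemma is stated for zero-diagonal A

X = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)

# LHS: E exp(X^T A X), averaging over all 2^N sign patterns.
lhs = np.mean(np.exp(np.einsum('si,ij,sj->s', X, A, X)))

# RHS: E exp(4 X^T A X'), averaging over all 2^N * 2^N pairs (X, X').
cross = X @ A @ X.T                     # cross[s, t] = X_s^T A X_t
rhs = np.mean(np.exp(4.0 * cross))

print(lhs, rhs)                         # expect lhs <= rhs
```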


The implication here is that if the $X_i$ (and $X_i'$) were subgaussian (say, with variance proxy $K^2$), then one can comfortably bound the RHS. Recall from earlier that this means that $\E \exp(tX_i) \le \exp(t^2K^2/2)$. Thus, we can bound in the following way (removing the constant $4$ for convenience): $$ \begin{align*} \E \exp(X^\top A X') &\le \E \exp\left(K X^\top A \gamma'\right) \\ &\le \E \exp\left(K^2 \gamma^\top A \gamma'\right)\\ &= \E \exp\left(K^4 \|A \gamma'\|_2^2/2\right)\\ &= \det(I - K^4 A^\top A)^{-1/2} \end{align*} $$ where $\gamma,\gamma'$ are independent standard Gaussian vectors, the first two steps condition on everything but one of the vectors and swap the subgaussian vector for a Gaussian one, and the last step requires $K^2 \|A\|_2< 1$. Furthermore, expanding $-\frac12\log\det(I-K^4A^\top A)$ gives a bound which is a multiple of $K^4\tr(A^\top A) = K^4\|A\|_F^2$, as long as $K^2\|A\|_2$ stays bounded away from $1$.
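The last two steps are easy to check numerically: below is a quick Monte Carlo illustration of $\E\exp(K^4\|A\gamma'\|_2^2/2) = \det(I-K^4A^\top A)^{-1/2}$, and of the fact that its logarithm is of order $K^4\|A\|_F^2$ when $K^2\|A\|_2$ is small. The dimension, $K$, and the normalization of $A$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 4, 0.8
A = rng.normal(size=(N, N))
A = A / (2 * np.linalg.norm(A, 2))        # scale so that ||A||_op = 1/2, hence K^2 ||A||_op < 1

# Monte Carlo estimate of E exp(K^4 ||A g||^2 / 2) over g ~ N(0, I_N).
g = rng.normal(size=(10**6, N))
mc = np.mean(np.exp(0.5 * K**4 * np.sum((g @ A.T)**2, axis=1)))

# Closed form det(I - K^4 A^T A)^(-1/2); its log is of order K^4 ||A||_F^2.
closed = np.linalg.det(np.eye(N) - K**4 * A.T @ A) ** -0.5
print(mc, closed)                                           # these should be close
print(np.log(closed), K**4 * np.linalg.norm(A, 'fro')**2)   # log of the bound vs K^4 ||A||_F^2
```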

This is exactly the kind of bound that we want here, with some mild adjustments - we mainly need to account for the fact that $\sigma$ is not centered under $\mu_h$.

The mild adjustment(s)

Without loss of generality, switch out $J$ for just its off-diagonal part. (We can do this because $\sigma_i^2 = 1$, so every diagonal term contributes a constant that cancels once we center.)

Start from the left hand side here: $$\log \mathbb E\exp\left(\lambda \left(\sigma^\top J \sigma - m^\top J m\right)\right).$$ Notice that if we had a separate copy $\sigma'$ of $\sigma$, then $$\sigma^\top J \sigma - m^\top J m = \E_{\sigma'} \left[(\sigma-\sigma')^\top J (\sigma-\sigma')\right].$$

Therefore, by Jensen's inequality (pulling $\E_{\sigma'}$ outside the exponential), $$\log \mathbb E\exp\left(\lambda \left(\sigma^\top J \sigma - m^\top J m\right)\right) \le \log \mathbb E\exp\left(\lambda (\sigma-\sigma')^\top J (\sigma-\sigma')\right).$$

Now we need a way to measure how subgaussian $Y_i := \sigma_i-\sigma'_i$ is. Recall that we’re looking for some $K_i$ where $$\E[\exp(tY_i)] \le \exp(t^2K_i^2/2) = \E[\exp(tK_i\gamma)].$$
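As a quick aside, $Y_i$ has a three-point distribution ($\pm 2$ with probability $(1-m_i^2)/4$ each, and $0$ otherwise), so its MGF is explicit and one can numerically scan for the smallest workable $K_i^2 = \sup_t 2\log\E[\exp(tY_i)]/t^2$. The script below is only an illustration of the claim in the next paragraph that a constant proxy suffices; the grid and the values of $m$ are arbitrary.

```python
import numpy as np

def mgf_Y(t, m):
    """Exact MGF of Y = sigma - sigma', with sigma, sigma' iid +-1 of mean m."""
    p2 = (1 - m**2) / 4          # P(Y = 2) = P(Y = -2)
    return (1 - 2 * p2) + p2 * (np.exp(2 * t) + np.exp(-2 * t))

t = np.linspace(1e-3, 5, 5000)   # Y is symmetric, so t >= 0 suffices
for m in [0.0, 0.5, 0.9]:
    proxy = np.max(2 * np.log(mgf_Y(t, m)) / t**2)   # smallest workable K^2 on this grid
    print(m, proxy)              # ~2(1 - m^2) for small t; stays below the crude bound of 4
```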

A simple thing to use is that $Y_i$ is symmetric and bounded by $2$, so it has variance proxy at most $4$ (this is Hoeffding's lemma, the fact used to prove Hoeffding's inequality): $$ \begin{align*} &\log \mathbb E\exp\left(\lambda (\sigma-\sigma')^\top J (\sigma-\sigma')\right) \\ &= \log \E \exp(\lambda Y^\top J Y)\\ &\le \log \E \exp(4\lambda Y^\top J Y') \\ &\le -\tfrac12\log\det(I-256\lambda^2 J^\top J) = O(\lambda^2 \|J\|_F^2) \end{align*} $$

where $Y'$ is an independent copy of $Y$, the middle inequality is the decoupling lemma, and the last line comes from applying the centered Hanson-Wright computation of the previous section to the matrix $4\lambda J$ with $K^2 = 4$ (the condition $|\lambda|\le c/\opnorm J$ is exactly what makes the determinant bound valid). So our tilted version follows.
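Finally, a small exact-enumeration check of this last chain (illustration only; the couplings $J$, the means $m$, and the choice of $\lambda$ below are arbitrary, with $16|\lambda|\,\|J\|_2 < 1$ so that the determinant bound applies, and the constant $256$ matching the crude proxy $K^2 = 4$ used above):

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
N = 5
J = rng.normal(size=(N, N))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)                         # off-diagonal coupling matrix
m = rng.uniform(-0.9, 0.9, size=N)               # arbitrary means, i.e. some external field
lam = 0.9 / (16 * np.linalg.norm(J, 2))          # within the allowed range of lambda

spins = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)
prob = np.prod((1 + spins * m) / 2, axis=1)      # product measure with means m

# E exp(lambda * Y^T J Y) over independent sigma, sigma', with Y = sigma - sigma'.
lhs = 0.0
for p, s in zip(prob, spins):
    Y = s - spins                                # all values of sigma - sigma' for this sigma
    lhs += p * np.sum(prob * np.exp(lam * np.einsum('si,ij,sj->s', Y, J, Y)))

rhs = np.linalg.det(np.eye(N) - 256 * lam**2 * J.T @ J) ** -0.5
print(lhs, rhs)                                  # expect lhs <= rhs
```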

What’s left?

For $h=0$, this basically gives a gap bound which is within a constant factor of optimal (and in particular, has the right dependence on $\beta$). We know this because of the quenched spin-glass construction. However, with nonzero $h$ the correct answer is actually $$\frac {\beta^2} 4 \sum_{i\lt j} J_{ij}^2 (1-m_im_j).$$

Unfortunately, there is no chance of obtaining this bound with the same technique - in the symmetrization step we lose the “tightness” of the second derivative that is required. I think we start to stand a chance if we can understand how to perform the decoupling in a more general (and maybe optimal) way.