
A bound for the mean-field gap.

In this article, we present a new bound on the mean-field gap for PSD matrices. In particular, it is the only known bound with the “correct” dependence on the inverse temperature $\beta$: all results in the literature give an $O(\beta)$ gap, while the optimal construction (using the GOE matrix) is actually $O(\beta^2)$.

$\providecommand\opnorm[1]{\|#1\|_\mathrm{op}}$

$\providecommand\eps\varepsilon\providecommand\E{\mathbb E} \providecommand\GAP{\mathsf{GAP}} \providecommand\tr{\mathop{\mathrm{tr}}} \providecommand{\diag}{\mathop{\mathrm{diag}}} \providecommand\norm[1]{\left\|#1\right\|} \providecommand{\Var}{\mathrm{Var}}$

Setup.

For a symmetric $n \times n$ matrix $J$ and an inverse temperature $\beta \ge 0$, define the partition function of the Ising model:

$$ Z = \sum_{\sigma \in \{\pm 1\}^n} e^{\frac \beta 2 \sigma^\top J \sigma} $$

Jensen’s inequality immediately gives $$ Z = 2^n \E_{\sigma} \left[ e^{\frac \beta 2 \sigma^\top J \sigma} \right] \ge 2^n \exp\left(\E_\sigma \left[\tfrac \beta 2 \sigma^\top J \sigma\right]\right) = 2^n \exp(\tfrac \beta 2\tr J)$$ where $\E_\sigma$ is taken over the uniform distribution on $\{\pm 1\}^n$, and the last equality uses $\E_\sigma[\sigma_i \sigma_j] = \delta_{ij}$.
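Here is a quick numerical sanity check of this lower bound (a minimal sketch with an arbitrary random PSD matrix; brute force is only feasible for small $n$):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, beta = 8, 0.3

# Random PSD coupling matrix (arbitrary illustrative choice).
A = rng.standard_normal((n, n))
J = A @ A.T / n

# Brute-force partition function over all 2^n spin configurations.
Z = 0.0
for s in itertools.product([-1.0, 1.0], repeat=n):
    s = np.asarray(s)
    Z += np.exp(0.5 * beta * s @ J @ s)

jensen = 2.0**n * np.exp(0.5 * beta * np.trace(J))
print(Z >= jensen)                  # True
print(np.log(Z), np.log(jensen))
```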

A weighted version of this argument gives Gibbs’ inequality: writing $f(\sigma) = \frac \beta 2 \sigma^\top J \sigma$, for any distribution $\nu$ on $\{\pm 1\}^n$,

$$\log Z \ge \E_\nu f + H(\nu)$$

and taking the sup over product distributions gives the mean-field free energy.

In particular, if $\beta < 1/\opnorm J$, one can check that the RHS is maximized at the uniform distribution on $\{\pm 1\}^n$, which recovers the same naive bound as in Jensen’s inequality.
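As a rough illustration of this claim, one can run the naive mean-field fixed-point iteration $m \mapsto \vec\tanh(\beta J m)$ for a random PSD matrix with $\beta \opnorm J < 1$ and watch it collapse to the uniform measure $m = 0$. This is only a sketch with arbitrary parameters; note that the diagonal of $J$ only contributes a constant on the hypercube, so it is dropped in the iteration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Random PSD coupling (arbitrary choice); beta below the 1/||J||_op threshold.
A = rng.standard_normal((n, n))
J = A @ A.T / n
beta = 0.3 / np.linalg.norm(J, 2)

# On the hypercube the diagonal of J only contributes the constant (beta/2) tr J,
# so we drop it when optimizing over product measures.
J0 = J - np.diag(np.diag(J))

# Naive mean-field fixed-point iteration m <- tanh(beta * J0 m).
m = rng.uniform(-1, 1, n)
for _ in range(2000):
    m = np.tanh(beta * J0 @ m)

print(np.max(np.abs(m)))   # ~0: the optimizer is the uniform measure (m = 0)
```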

Physicists often hope that the RHS of this inequality gives the correct value of the free energy density $\frac 1 n \log Z$. However, we know at least one case where this fails in the “high-temperature regime” (i.e. $\beta$ small): the so-called Gaussian Orthogonal Ensemble.

Take $J_{ij} \sim N(0,1/n)$ independently over all pairs $i < j$, with $J_{ji} = J_{ij}$ and $J_{ii} = 0$. We obtain $$\E_J[Z] = 2^n \cdot e^{(n-1)\beta^2/4}$$ and if one believes that $Z$ concentrates about its mean, then with high probability $\log Z$ differs from $n\log 2$ by around $n\beta^2 / 4$.
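A small Monte Carlo sketch (arbitrary choices of $n$, $\beta$, and number of trials) comparing brute-force values against these formulas:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, beta, trials = 10, 0.4, 200

sigmas = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))   # all 2^n configurations

Zs, logZs = [], []
for _ in range(trials):
    # GOE-style draw: J_ij ~ N(0, 1/n) for i < j, symmetric, zero diagonal.
    G = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))
    J = np.triu(G, 1)
    J = J + J.T
    Z = np.exp(0.5 * beta * np.sum((sigmas @ J) * sigmas, axis=1)).sum()
    Zs.append(Z)
    logZs.append(np.log(Z))

print(np.mean(Zs), 2**n * np.exp((n - 1) * beta**2 / 4))   # E_J[Z] vs the exact formula
print(np.mean(logZs) - n * np.log(2), n * beta**2 / 4)     # rough agreement at this small n
```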

In the literature, many papers seek to bound the mean-field gap $$\GAP = \log Z - \sup_{\nu \text{ prod.}} \left\{ \E_\nu f + H(\nu) \right\} $$ but none of these results attain $O(\beta^2)$ in the high-temperature regime.

This article achieves precisely this.

Result

We will first establish the result for PSD matrices.

Theorem. Let $\beta\ge 0$ and $J$ be a PSD matrix. Suppose $\beta \opnorm J < 1$. Then we claim that

$$Z \le 2^n \cdot (\det(I-\beta J))^{-1/2}$$

Proof. We will make use of the identity

$$\E_{\gamma \sim N(0,I)} e^{b^\top \gamma +\frac{1}{2}\gamma^\top A\gamma} = e^{\frac{1}{2} b^\top(I-A)^{-1} b}\cdot \det(I-A)^{-1/2}$$ valid for symmetric $A$ with $\opnorm{A} < 1$. A helpful special case (take $b = (\beta J)^{1/2} x$ and $A = 0$) is $$e^{\frac{\beta}{2} x^\top J x} = \E_{\gamma \sim N(0,I)} e^{\langle (\beta J)^{1/2} x, \gamma \rangle }$$
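A quick Monte Carlo check of the identity, with an arbitrary small matrix $A$ and vector $b$ (the two printed numbers should agree up to sampling error):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3

# Arbitrary symmetric A with ||A||_op = 0.4 < 1, and an arbitrary small b.
M = rng.standard_normal((d, d))
S = M + M.T
A = 0.4 * S / np.linalg.norm(S, 2)
b = 0.5 * rng.standard_normal(d)

# Monte Carlo estimate of E exp(b^T g + g^T A g / 2) over g ~ N(0, I).
g = rng.standard_normal((1_000_000, d))
mc = np.mean(np.exp(g @ b + 0.5 * np.einsum('ki,ij,kj->k', g, A, g)))

# Closed form: exp(b^T (I - A)^{-1} b / 2) * det(I - A)^{-1/2}.
I = np.eye(d)
closed = np.exp(0.5 * b @ np.linalg.solve(I - A, b)) / np.sqrt(np.linalg.det(I - A))
print(mc, closed)   # should agree up to Monte Carlo error
```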

We will also need the inequality $$\log \cosh x = \log \frac{e^x + e^{-x}}{2} \le \frac {x^2} 2.$$

For convenience, we will vectorize functions $\vec{f} (x)= (f(x_i))_{i=1,2,...,n}$.

We compute $$ \begin{aligned} Z &= \sum_{\sigma \in \{\pm 1\}^n} e^{\frac \beta 2 \sigma^\top J \sigma}\\ &= \E_\gamma \sum_{\sigma\in \{\pm 1\}^n} e^{\langle (\beta J)^{1/2} \gamma, \sigma \rangle }\\ &= 2^n \cdot \E_\gamma \prod_i \vec\cosh ((\beta J)^{1/2} \gamma)_i \\ &\le 2^n \cdot \E_\gamma \exp\left(\frac{\| (\beta J)^{1/2} \gamma\|^2}{2}\right) \\ &= 2^n \cdot \E_\gamma \exp\left(\frac{\beta}{2} \gamma^\top J \gamma\right) = 2^n \cdot \det(I-\beta J)^{-1/2} \end{aligned} $$ where the last equality is the identity above with $b = 0$ and $A = \beta J$, as desired. $\blacksquare$
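A brute-force check of the theorem on a random PSD matrix (arbitrary choices of $n$ and $\beta$):

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n = 10

# Random PSD J and beta safely below 1/||J||_op (arbitrary choices).
A = rng.standard_normal((n, n))
J = A @ A.T / n
beta = 0.8 / np.linalg.norm(J, 2)

sigmas = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
Z = np.exp(0.5 * beta * np.sum((sigmas @ J) * sigmas, axis=1)).sum()

bound = 2**n / np.sqrt(np.linalg.det(np.eye(n) - beta * J))
print(np.log(Z) <= np.log(bound), np.log(Z), np.log(bound))
```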

It is not so obvious, but this is indeed the desired $O(\beta^2)$ bound. On the interval $[0, 1-\eps)$, there exists some constant $C_\eps$ such that $$-\frac{1}{2}\log (1-\lambda) \le \frac{\lambda}{2} + \frac{\lambda^2}{4} + C_\eps \lambda^3 \le \frac \lambda 2 + C_\eps' \lambda^2 $$ so if $\beta \opnorm{J} < 1-\eps$, summing over the eigenvalues of $\beta J$ gives $$ \begin{align*} \log Z &\le n\log 2 - \frac{1}{2}\log\det(I-\beta J) \\ &\le n\log 2 + \frac{\beta} 2 \tr J + \frac {\beta^2}{4} \|J\|_F^2 + C_\eps \cdot \beta^3(\tr J^3)\\ &\le n\log 2 + \frac{\beta} 2 \tr J + C_\eps'\cdot \beta^2 \|J\|_F^2 \end{align*} $$
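One can also see the quadratic-in-$\beta$ behaviour of the gap numerically; the sketch below (arbitrary random PSD $J$) compares $\log Z$ against the uniform mean-field value $n\log 2 + \frac\beta2 \tr J$ for a few small $\beta$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
n = 10
A = rng.standard_normal((n, n))
J = A @ A.T / n

sigmas = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
quad = 0.5 * np.sum((sigmas @ J) * sigmas, axis=1)   # sigma^T J sigma / 2 for each sigma

for beta in [0.01, 0.02, 0.05, 0.1]:
    logZ = np.log(np.exp(beta * quad).sum())
    gap = logZ - (n * np.log(2) + 0.5 * beta * np.trace(J))
    print(beta, gap / beta**2)   # roughly constant: the gap is quadratic in beta
```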

It is quick to extend this to general symmetric matrices $J$: on the hypercube, $\sigma^\top D \sigma = \tr D$ is constant for any diagonal matrix $D$, so $\log Z(J+D) = \log Z(J) + \frac\beta2 \tr D$.

Corollary. For an arbitrary symmetric matrix $J$, if $\beta \opnorm{J}< \frac 1 2 - \eps$ then $$Z \le 2^n \cdot e^{-\frac{\beta n}{2}\opnorm{J}} \cdot \det(I-\beta (J + \opnorm{J} I))^{-1/2}$$

so the gap is at most $C_\eps' \cdot \beta^2 \cdot (\|J\|_F^2 + n \opnorm{J}^2)$.

Proof. Note that $J + \opnorm{J} I$ is PSD with $\beta \opnorm{J + \opnorm{J} I} \le 2\beta \opnorm{J} < 1$, so the theorem applies to it; dividing the resulting bound on $Z(J + \opnorm{J} I)$ by $e^{\frac{\beta n}{2}\opnorm{J}}$ gives the desired answer. $\blacksquare$
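A brute-force sanity check of the corollary on a random symmetric (non-PSD) matrix, again with arbitrary parameters:

```python
import itertools
import numpy as np

rng = np.random.default_rng(6)
n = 10

# Random symmetric (generally non-PSD) J with beta * ||J||_op < 1/2.
G = rng.standard_normal((n, n))
J = (G + G.T) / 2
op = np.linalg.norm(J, 2)
beta = 0.4 / op

sigmas = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
Z = np.exp(0.5 * beta * np.sum((sigmas @ J) * sigmas, axis=1)).sum()

# The corollary's bound, built from the shifted PSD matrix J + ||J||_op I.
Jshift = J + op * np.eye(n)
bound = (2**n * np.exp(-0.5 * beta * n * op)
         / np.sqrt(np.linalg.det(np.eye(n) - beta * Jshift)))
print(np.log(Z) <= np.log(bound))   # True
```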

One slight disadvantage is that we expect this to work poorly for low-rank matrices. Say that $J = - \frac 1 n 1 1^\top$; then $\norm{J}_F = 1$ but $\opnorm{J} = 1$ as well, so the extra $n \opnorm{J}^2$ term is of order $n$ even though $\|J\|_F^2 = 1$, and the corollary is much weaker than the Frobenius-norm term alone would suggest.

Non-zero external field

We can generalize slightly to non-zero external fields. Suppose instead that

$$f(\sigma) = \frac 1 2 \sigma^\top J \sigma + \langle h, \sigma \rangle $$

and again, we would like to bound $Z = \sum_{\sigma \in \{\pm 1\}^n} e^{\beta f(\sigma)}$.

Theorem. If $\beta \ge 0$, $J$ is PSD and $\beta \opnorm{J} < 1$, then

$$Z \le 2^n \exp\left( \frac{\beta^2}{2} h^\top (I-\beta J)^{-1}h \right) \cdot \det(I-\beta J)^{-1/2}$$

Proof. We compute $$ \begin{aligned} Z &= \sum_{\sigma \in \{\pm 1\}^n} e^{\beta \langle h,\sigma \rangle } \cdot e^{\frac \beta 2 \sigma^\top J \sigma}\\ &= \E_\gamma \sum_{\sigma\in \{\pm 1\}^n} e^{\beta \langle h,\sigma \rangle } \cdot e^{\langle (\beta J)^{1/2} \gamma, \sigma \rangle }\\ &= 2^n \cdot \E_\gamma \prod_i \vec\cosh ((\beta J)^{1/2} \gamma + \beta h)_i \\ &\le 2^n \cdot \E_\gamma \exp\left(\frac{\| (\beta J)^{1/2} \gamma + \beta h\|^2}{2}\right) \\ &= 2^n \cdot \exp\left(\frac{\| \beta h \|^2}{2}\right) \cdot \E_\gamma \exp\left( \langle \beta^{3/2}J^{1/2}h,\gamma \rangle + \frac{\beta \gamma^\top J \gamma}{2} \right)\\ &= 2^n \cdot \exp\left( \frac{\|\beta h\|^2}{2} + \frac{\beta^3}{2}h^\top J^{1/2}(I-\beta J)^{-1} J^{1/2} h \right) \cdot \det(I-\beta J)^{-1/2}\\ &= 2^n \cdot \exp\left( \frac{\beta^2}{2} h^\top (I-\beta J)^{-1}h \right) \cdot \det(I-\beta J)^{-1/2} \end{aligned} $$ where the last line uses $I + \beta J (I-\beta J)^{-1} = (I-\beta J)^{-1}$, as desired. $\blacksquare$
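And a brute-force check of this external-field bound (random PSD $J$ and random $h$, arbitrary parameters):

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
n = 10

# Random PSD J and random external field h (arbitrary choices).
A = rng.standard_normal((n, n))
J = A @ A.T / n
h = rng.standard_normal(n)
beta = 0.6 / np.linalg.norm(J, 2)

sigmas = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
f = 0.5 * np.sum((sigmas @ J) * sigmas, axis=1) + sigmas @ h
Z = np.exp(beta * f).sum()

I = np.eye(n)
bound = (2**n
         * np.exp(0.5 * beta**2 * h @ np.linalg.solve(I - beta * J, h))
         / np.sqrt(np.linalg.det(I - beta * J)))
print(np.log(Z) <= np.log(bound))   # True
```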

Independence of $h$?

Ideally (see Eldan’s bound), the bound should be “independent” of $h$. To see this more clearly, we substitute $\beta h \mapsto h$: so our partition function is instead $$Z = \sum_\sigma \exp\left( \frac{\beta}{2} \sigma^\top J \sigma + \langle h, \sigma \rangle \right)$$

Then the bound is $$Z \le 2^n \cdot \exp\left( \frac{1}{2} h^\top(I-\beta J)^{-1}h \right) \cdot \det(I-\beta J)^{-1/2}$$

This is a “big” dependence on $h$: at $\beta = 0$ the truth is $Z = 2^n \prod_i \cosh(h_i)$, while the bound only gives $2^n e^{\|h\|^2/2}$, which can be much larger. To fix this, we will do a mean-field approximation that is exact at $\beta = 0$: the distribution we will use is $\xi(\sigma) \propto e^{\langle h, \sigma \rangle }$, a product measure with mean $\tilde h := \vec\tanh (h)$.

The mean-field approximation gives $$ \providecommand{\Jo}{J^{\circ}} \begin{aligned} \log Z &\ge \E_\xi f + H(\xi) \\ &= \E_\xi f - \langle \tilde h, h \rangle + \langle \tilde h, h \rangle + H(\xi) \\ &= \E_\xi f - \langle \tilde h, h \rangle + n \log 2 + \sum_i \log \cosh (h_i) \\ &= \frac{\beta}{2}(\tilde h^\top \Jo \tilde h + \tr J) + n \log 2 + \sum_i \log \cosh (h_i) \end{aligned} $$ where $\Jo$ denotes $J$ with its diagonal zeroed out, and we used $\langle \tilde h, h\rangle + H(\xi) = \sum_i \log (2\cosh h_i)$.

so our mean-field lower bound looks like $$Z \ge 2^n \prod_i \cosh (h_i) \cdot \exp\left( \frac{\beta}{2}(\tilde h^\top \Jo \tilde h + \tr J) \right)$$
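A quick numerical check of this lower bound, with arbitrary random $J$ and $h$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(8)
n = 10

# Random PSD J and field h; beta below 1/||J||_op (arbitrary choices).
A = rng.standard_normal((n, n))
J = A @ A.T / n
h = 0.5 * rng.standard_normal(n)
beta = 0.5 / np.linalg.norm(J, 2)

sigmas = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
Z = np.exp(0.5 * beta * np.sum((sigmas @ J) * sigmas, axis=1) + sigmas @ h).sum()

th = np.tanh(h)                    # tilde h
Joff = J - np.diag(np.diag(J))     # J with the diagonal zeroed out
lower = (2**n * np.prod(np.cosh(h))
         * np.exp(0.5 * beta * (th @ Joff @ th + np.trace(J))))
print(np.log(Z) >= np.log(lower))  # True
```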

For the upper bound, we just do the same thing, except now we expand about $h$ instead of $0$. So we use the inequality

$$\log \cosh (x+h) \le \log \cosh h + x \cdot \tanh h + \frac{x^2}{2}$$

and so we compute

$$ \begin{align*} Z &= \sum_{\sigma \in \{\pm 1\}^n} e^{ \langle h,\sigma \rangle } \cdot e^{\frac \beta 2 \sigma^\top J \sigma}\\ &= \E_\gamma \sum_{\sigma\in \{\pm 1\}^n} e^{ \langle h,\sigma \rangle } \cdot e^{\langle (\beta J)^{1/2} \gamma, \sigma \rangle }\\ &= 2^n \cdot \E_\gamma \prod_i \vec\cosh ((\beta J)^{1/2} \gamma + h)_i \\ &\le 2^n \prod_i \cosh(h_i) \cdot \E_\gamma \exp\left(\langle (\beta J)^{1/2} \tilde h, \gamma \rangle + \frac \beta 2 \gamma^\top J \gamma \right) \\ &= 2^n \prod_i \cosh(h_i) \cdot \exp\left(\frac{\beta}{2}\tilde h^\top J^{1/2}(I-\beta J)^{-1} J^{1/2} \tilde h\right) \cdot \det(I-\beta J)^{-1/2}\\ &= 2^n \prod_i \cosh(h_i) \cdot \exp\left(\frac \beta 2 \left( \tilde h^\top J \tilde h + \tr J \right) + O(\beta^2 \|J\|_F^2)\right) \end{align*} $$

It almost works, but not quite. On the $\beta$-term, we’re off by $\frac \beta 2 \tilde h^\top J^{diag} \tilde h$, where $J^{diag} := J - \Jo$ denotes the diagonal part of $J$. We can further bound this by

$$\GAP \le \frac \beta 2 \tilde h^\top J^{diag} \tilde h + O(\beta^2 \| J\|_F^2) \le \frac \beta 2 \|\tilde h\|^2_\infty \cdot \sum_i |J_{ii}| + O(\beta^2 \|J\|_F^2)$$

The good news is that as $h\to 0$, we recover the original bound. The bad news is that we have an undesirable $O(\beta)$ term. We will see below that the tight quadratic approximation to $\log\cosh$ would remove it, though only as an approximation rather than a rigorous bound.

Effectiveness

Let’s do a quick comparison with [Aug2021], which shows that $\GAP = O(\beta \sqrt{n} \norm{J}_F)$ but for general matrices $J$. Our bound is better on PSD matrices, since $$\sum_{i} |J_{ii}| \le \Big(n\sum_i |J_{ii}|^2\Big)^{1/2} \le \sqrt{n} \norm{J}_F$$ and much more so if most of the squared weight in $\norm{J}_F^2$ is not on the diagonal! The problem is that for non-PSD $J$, doing the same trick only gets us $\beta n\opnorm{J}$, which is always at least as large as their bound.

Thinking back to the GOE matrix example: again, we fall short when we extend from PSD to non-PSD.

The “ideal” bound

Turns out, if you just plug in the tight quadratic approximation to $\log\cosh$ about $h$… $$\log \cosh (x+h) \approx \log \cosh h + x \tanh h + \frac{x^2(1-\tanh^2 h)}{2}$$

Then we just plug this into the calculation we’ve been doing all along (let $D=\diag(1-\tilde h_i^2) $):

$$ \begin{align*} & 2^n \cdot \E_\gamma \prod_i \vec\cosh ((\beta J)^{1/2} \gamma + h)_i \\ &\approx 2^n \prod_i \cosh(h_i) \cdot \E_\gamma \exp\left(\langle (\beta J)^{1/2} \tilde h, \gamma \rangle + \frac \beta 2 \gamma^\top J^{1/2} D J^{1/2} \gamma \right) \\ &= 2^n \prod_i \cosh(h_i) \cdot \exp\left(\frac{\beta}{2}\tilde h^\top J^{1/2}(I-\beta J^{1/2} D J^{1/2})^{-1} J^{1/2} \tilde h\right) \cdot \det(I-\beta J^{1/2} D J^{1/2})^{-1/2}\\ &= 2^n \prod_i \cosh(h_i) \cdot \exp\left(\frac \beta 2 \left( \tilde h^\top J \tilde h + \tr (JD) \right) + O(\beta^2 \|J\|_F^2)\right) \end{align*} $$

Then this is precisely correct, since $\tr (JD) = \tr J - \tilde h^\top J^{diag} \tilde h$. Our only lament is that we cannot turn our approximation into a bound…
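Numerically, the error of this approximation does look quadratic in $\beta$; in the sketch below (arbitrary random $J$ and $h$) the discrepancy roughly quadruples as $\beta$ doubles:

```python
import itertools
import numpy as np

rng = np.random.default_rng(9)
n = 10

A = rng.standard_normal((n, n))
J = A @ A.T / n
h = 0.5 * rng.standard_normal(n)
th = np.tanh(h)
D = np.diag(1 - th**2)

sigmas = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
quad = 0.5 * np.sum((sigmas @ J) * sigmas, axis=1)
field = sigmas @ h

for beta in [0.02, 0.04, 0.08]:
    logZ = np.log(np.exp(beta * quad + field).sum())
    approx = (n * np.log(2) + np.log(np.cosh(h)).sum()
              + 0.5 * beta * (th @ J @ th + np.trace(J @ D)))
    print(beta, logZ - approx)   # error roughly quadruples as beta doubles
```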

Further directions

(1) Is there a better distribution for $\gamma$ rather than a multivariate normal?

(2) At the moment we’ve expanded around $\tilde h$, but the true mean-field optimizer is the solution to $m = \tanh (\beta Jm + h)$. I suspect this doesn’t matter since $$m - \tilde h = \tanh(\beta J m + h) - \tanh(h) = D\,\beta J m + O(\beta^2)$$ (with $D = \diag(1-\tilde h_i^2)$ as before), so $m = (I+\beta D J) \tilde h + O(\beta^2)$. This should have no effect on the $O(\beta)$ term in our computation, but I could be wrong.
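A small numerical sketch supporting this heuristic (arbitrary random $J$, $h$): solve the fixed-point equation by iteration (a contraction for these small $\beta$) and compare $m$ against $\tilde h$ and against $(I+\beta D J)\tilde h$.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 50

A = rng.standard_normal((n, n))
J = A @ A.T / n
h = 0.5 * rng.standard_normal(n)
th = np.tanh(h)
D = np.diag(1 - th**2)

for beta in [0.01, 0.02, 0.04]:
    # Solve m = tanh(beta J m + h) by fixed-point iteration (a contraction here).
    m = th.copy()
    for _ in range(500):
        m = np.tanh(beta * J @ m + h)
    approx = (np.eye(n) + beta * D @ J) @ th
    # ||m - th|| shrinks like beta, ||m - approx|| like beta^2.
    print(beta, np.linalg.norm(m - th), np.linalg.norm(m - approx))
```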

(3) Direct expansion in terms of $\beta$: We make the $\beta$ explicit in $Z_\beta$. We have

$$\log Z_\beta = \log Z_0 + \beta\cdot \frac{\partial \log Z_\beta}{\partial \beta}|_{\beta = 0} + \int_0^\beta (\beta - \alpha) \frac{\partial^2 \log Z}{\partial \beta^2}|_{\beta = \alpha}\, d\alpha$$

where we abuse notation slightly.

The linear term is the mean-field approximation: it expands into $$\frac{\partial \log Z_\beta}{\partial \beta}\Big|_{\beta = 0} = \E_{\xi} \left[\frac{\sigma^\top J \sigma}{2}\right]$$ since the $\beta = 0$ Gibbs measure is exactly the product measure $\xi$ with mean $\tilde h$. So the GAP is at most the second-order remainder term, and if the second derivative were $O(\beta)$ for all $\beta$ up to some constant, we would also be done.