
Mean-field approximation as the maximum of the regularized log-partition

$\newcommand\E{\mathbb E}$ $\renewcommand\R{\mathbb R}$ $\newcommand\normal{\mathcal N}$ $\newcommand\diag{\mathop{\mathrm {diag}}}$ $\newcommand\tr{\mathop{\mathrm {tr}}}$ $\newcommand\braces[1]{\left\lbrace #1 \right\rbrace}$

I just want to capture a quick idea here.

In my past articles about computing/bounding the log-partition function of the Ising model (for PSD interaction matrices $J$), we used the Gaussian decomposition (Hubbard–Stratonovich) trick: $$Z := \sum_{\sigma} \exp(\tfrac 1 2 \beta \sigma^\top J \sigma) = \E_\gamma \sum_{\sigma} \exp(\langle (\beta J)^{1/2}\sigma, \gamma\rangle) = \E_\gamma e^{f(\gamma)} $$ where, after summing over $\sigma \in \{\pm 1\}^n$ coordinate by coordinate, $$f(\gamma) = n\log 2 + \sum_i \log \cosh(e_i^\top(\beta J)^{1/2} \gamma)$$ and $\gamma \sim \normal(0,I)$.
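
As a quick sanity check (mine, not part of the original derivation), here is a minimal numerical sketch of the identity $Z = \E_\gamma e^{f(\gamma)}$ on a small random PSD instance. The choices of $n$, $\beta$, the random seed, and all variable names are arbitrary, and the Monte Carlo estimate is only expected to agree roughly.

```python
# Numerical sanity check (mine, not from the post) of Z = E_gamma[e^{f(gamma)}]
# on a small random PSD instance. n, beta, the seed, and all names are arbitrary.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, beta = 4, 0.5
A = rng.standard_normal((n, n))
J = A @ A.T / n                                  # a random PSD interaction matrix

# Exact partition function by brute force over {-1, +1}^n.
Z = sum(np.exp(0.5 * beta * np.array(s) @ J @ np.array(s))
        for s in itertools.product([-1.0, 1.0], repeat=n))

# Symmetric square root (beta J)^{1/2} via the eigendecomposition.
w, V = np.linalg.eigh(beta * J)
M = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

# Monte Carlo estimate of E_gamma[e^{f(gamma)}] with gamma ~ N(0, I), where
# f(gamma) = n log 2 + sum_i log cosh(e_i^T (beta J)^{1/2} gamma).
gammas = rng.standard_normal((200_000, n))
f = n * np.log(2.0) + np.sum(np.log(np.cosh(gammas @ M)), axis=1)
print(np.log(Z), np.log(np.mean(np.exp(f))))     # should roughly agree, up to Monte Carlo error
```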

The usual Laplace approximation gives us $$\E_\gamma e^{f(\gamma)} \approx \frac{1}{\sqrt{\det\left(I - \nabla^2 f(\gamma^*)\right)}} \exp \max_{\gamma} \braces{ f(\gamma) - \frac{\|\gamma\|^2}{2} },$$ where $\gamma^*$ is the maximizer. The approximation holds up to a $(1+o(1))$ multiplicative factor if $\gamma$ ranges over a space of fixed dimension and $f$ scales like $n$. This is unfortunately not quite true in our case, since here the dimension of $\gamma$ also grows with $n$.
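
To make the comparison concrete, here is a rough sketch (my own, assuming numpy and scipy, with all parameters chosen arbitrarily) that maximizes the regularized functional by multi-start local optimization, applies the Laplace formula above, and compares it with the exact $\log Z$ on a tiny instance. Since the optimization is local and $n$ is small, this only illustrates the shape of the computation, not any asymptotics.

```python
# A rough Laplace-estimate sketch (mine, not from the post). Assumes numpy and scipy;
# n, beta, the seed, and the multi-start heuristic are arbitrary choices.
import itertools
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, beta = 4, 0.5
A = rng.standard_normal((n, n))
J = A @ A.T / n
w, V = np.linalg.eigh(beta * J)
M = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T    # (beta J)^{1/2}

def f(g):
    # f(gamma) = n log 2 + sum_i log cosh((M gamma)_i)
    return n * np.log(2.0) + np.sum(np.log(np.cosh(M @ g)))

def neg_obj(g):
    # negative of the regularized functional f(gamma) - ||gamma||^2 / 2
    return 0.5 * g @ g - f(g)

# gamma = 0 is always a critical point, so use several random starts and keep the best.
best = min((minimize(neg_obj, rng.standard_normal(n)) for _ in range(20)),
           key=lambda r: r.fun)
g_star = best.x
H_f = M @ np.diag(1.0 / np.cosh(M @ g_star) ** 2) @ M    # Hessian of f at gamma*
laplace = -best.fun - 0.5 * np.linalg.slogdet(np.eye(n) - H_f)[1]

logZ = np.log(sum(np.exp(0.5 * beta * np.array(s) @ J @ np.array(s))
                  for s in itertools.product([-1.0, 1.0], repeat=n)))
print(logZ, laplace)   # comparable in size, but there is no reason for a close match here
```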

The interesting thing is as follows: we have the following Gibbs’ inequality bound: $$\log Z \ge \E_{\sigma\sim\nu} \left[\tfrac 1 2 \beta \sigma^\top J \sigma\right] + H(\nu)$$ for any product distribution $\nu$ on $\{\pm 1\}^n$. In particular, if $\nu$ has mean $m\in [-1,1]^n$, then (since $\E_\nu[\sigma_i\sigma_j] = m_i m_j$ for $i\ne j$, the diagonal contributes $\tfrac 1 2 \beta\sum_i J_{ii}(1-m_i^2)\ge 0$, and $H(\nu) = \sum_i h(\tfrac{1+m_i}{2})$) $$\log Z \ge \frac 1 2 \beta m^\top J m + \sum_i h(\tfrac{1+m_i}{2})$$ where $h$ is the binary entropy function $h(p) = -p\log p - (1-p)\log (1-p)$.
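
A quick spot check of this lower bound on random $m$ (again a sketch of mine, not the post's code; parameters arbitrary):

```python
# Spot check (mine) of the mean-field lower bound on random m in [-1,1]^n.
# Assumes numpy; n, beta, and the seed are arbitrary.
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, beta = 4, 0.8
A = rng.standard_normal((n, n))
J = A @ A.T / n                                  # PSD, so in particular J_ii >= 0

logZ = np.log(sum(np.exp(0.5 * beta * np.array(s) @ J @ np.array(s))
                  for s in itertools.product([-1.0, 1.0], repeat=n)))

def h(p):
    # binary entropy in nats (clipped so that h(0) = h(1) = 0 numerically)
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)

for _ in range(5):
    m = rng.uniform(-1.0, 1.0, size=n)
    lower = 0.5 * beta * m @ J @ m + np.sum(h((1.0 + m) / 2.0))
    assert lower <= logZ + 1e-9, (lower, logZ)
print("mean-field lower bound held on all samples")
```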

Here is the surprise: the one-dimensional case of this inequality is, after optimizing over $m$, an equality:

$$\log (e^x + e^{-x}) = \sup_{m\in[-1,1]} \braces{ mx + h(\tfrac{1+m}2)},$$ so by expressing $\frac 1 2 \beta m^\top J m = \frac 1 2 \|(\beta J)^{1/2} m\|^2$ as a supremum over affine functions (using its convex conjugate representation $\frac 1 2\|y\|^2 = \sup_\gamma \braces{\langle y, \gamma\rangle - \frac{\|\gamma\|^2}{2}}$), we get $$ \begin{align*} \sup_m \braces{ \frac 1 2 \beta m^\top J m + \sum_i h(\tfrac{1+m_i}{2}) } &= \sup_{m,\gamma} \braces{ \langle m, (\beta J)^{1/2}\gamma \rangle - \frac{\|\gamma\|^2}{2} + \sum_i h(\tfrac{1+m_i}{2}) }\\ &= \sup_{\gamma} \braces{ \sum_i \log\left(e^{e_i^\top (\beta J)^{1/2}\gamma}+e^{-e_i^\top (\beta J)^{1/2}\gamma}\right)- \frac{\|\gamma\|^2}{2}}\\ &= \sup_{\gamma} \braces{f(\gamma) - \frac{\|\gamma\|^2}{2}}. \end{align*} $$
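
The one-dimensional identity driving this chain is easy to check numerically; the following sketch (mine, with arbitrary test points) just maximizes over a grid of $m$'s.

```python
# Grid check (mine) of the one-dimensional identity
# log(e^x + e^{-x}) = sup_{m in [-1,1]} { m x + h((1+m)/2) }.
import numpy as np

def h(p):
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)

m = np.linspace(-1.0, 1.0, 200_001)
for x in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    lhs = np.logaddexp(x, -x)                     # log(e^x + e^{-x}), computed stably
    rhs = np.max(m * x + h((1.0 + m) / 2.0))      # sup over the grid; the maximizer is tanh(x)
    print(x, lhs, rhs)                            # the last two columns should agree closely
```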

In comparison, $$\log \E_\gamma e^{f(\gamma)} = \log \int_{\R^n} \exp(f(\gamma) - \tfrac{\|\gamma\|^2}{2}) d\gamma - \frac n 2\log 2\pi$$
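
To see the gap between $\log Z$ and the regularized maximum on an actual (tiny) instance, here is a brute-force sketch (mine; the grid resolution and parameters are arbitrary) that evaluates $\sup_m$ of the mean-field functional, which by the computation above equals $\sup_\gamma \{f(\gamma) - \|\gamma\|^2/2\}$, and compares it with the exact $\log Z$.

```python
# Brute-force comparison (mine) of log Z with sup_m of the mean-field functional
# (equal to sup_gamma { f(gamma) - ||gamma||^2/2 } by the computation above),
# over a grid of m's on a tiny instance. Parameters and grid resolution are arbitrary.
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, beta = 3, 0.8
A = rng.standard_normal((n, n))
J = A @ A.T / n

logZ = np.log(sum(np.exp(0.5 * beta * np.array(s) @ J @ np.array(s))
                  for s in itertools.product([-1.0, 1.0], repeat=n)))

def h(p):
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)

grid = np.linspace(-1.0, 1.0, 101)
Ms = np.stack(np.meshgrid(*([grid] * n), indexing="ij"), axis=-1).reshape(-1, n)
mf = 0.5 * beta * np.einsum("ki,ij,kj->k", Ms, J, Ms) + h((1.0 + Ms) / 2.0).sum(axis=1)
print(logZ, mf.max(), logZ - mf.max())   # log Z sits above the sup; the gap is the interesting part
```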

The term $(2\pi)^{n/2}$ corresponds roughly to the volume of a ball of radius of order $\sqrt{n}$… I take this to mean that a proper estimation of $\log Z$ in fact involves the “average” value of the functional $f(\gamma) - \|\gamma\|^2/2$ over a ball of radius $\sim\sqrt{n}$ around the maximizer. Alternatively, perhaps one thinks of this as related to the volume of the superlevel sets near the maximum.