We discuss Laplace’s method, Gaussian decomposition (the H-S trick), and applications to the replica method and free energy computations.
$\newcommand\E{\mathbb E}$ $\renewcommand\R{\mathbb R}$ $\newcommand\normal{\mathcal N}$ $\newcommand\diag{\mathop{\mathrm {diag}}}$ $\newcommand\tr{\mathop{\mathrm {tr}}}$ $\newcommand\FMF{\mathcal F_{MF}}$ $\newcommand\hax{}$ $\newcommand\ub[2]{\underbrace{#1}_{#2}}$
Let’s say we have a sum that looks like $$S_n = \sum_i a_i^n$$ where $a_i>0$. How does $S_n$ grow for large $n$?
We expect this sum to (1) grow on the order of $A^n$, where $A = \max a_i$, and (2) have leading coefficient equal to the number of $a_i$’s which are equal to $A$. A very quick and cheap reason comes from considering $$S_n / A^n = \sum_i \left( \frac {a_i} A \right)^n$$ and noticing that every term goes to zero unless $a_i = A$.
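Here’s a quick numerical sanity check of both claims; the $a_i$’s below are arbitrary ones I picked.

```python
import numpy as np

# S_n / A^n should approach the number of a_i's equal to A = max a_i.
a = np.array([3.0, 3.0, 2.5, 1.0, 0.2])   # two entries attain the max
for n in [5, 20, 80]:
    print(n, np.sum(a**n) / a.max()**n)   # -> 2.0 as n grows
```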
A very similar thing holds for integrals: $$I_n = \int_a^b e^{nf(x)} dx \sim \sqrt{\frac{2\pi}{n\,(-f''(x_0))}}\, e^{nf(x_0)}$$
where $x_0 = \arg\max f(x)$ is assumed to be the unique maximizer, lying in the interior of $(a,b)$. Heuristically, around the maximum $x_0$, $f$ expands like $$f(x) \approx f(x_0) + \frac 1 2 f''(x_0) (x-x_0)^2 $$ so all the points where $n(f(x) - f(x_0))$ is $O(1)$ lie in the range $x_0\pm c\,(n(-f''(x_0)))^{-1/2}$.
The constant $\sqrt{2\pi}$ comes from the analogy with a Gaussian integral, e.g. by setting $f(x) = -c(x-x_0)^2$ with $c>0$. (The exact computation can be found on Wikipedia… we might revisit this later.)
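Here’s a quick numerical check of the formula on an example I chose, $f(x) = \sin x$ over $[0,\pi]$ (so $x_0 = \pi/2$ and $f''(x_0) = -1$); to avoid overflow we compare $\int e^{n(f(x)-f(x_0))}\,dx$ against $\sqrt{2\pi/(n(-f''(x_0)))}$.

```python
import numpy as np
from scipy.integrate import quad

# Laplace check for f(x) = sin(x) on [0, pi]: x0 = pi/2, f''(x0) = -1.
x0, fpp = np.pi / 2, -1.0
for n in [10, 100, 1000]:
    integral, _ = quad(lambda x: np.exp(n * (np.sin(x) - 1.0)), 0, np.pi, points=[x0])
    approx = np.sqrt(2 * np.pi / (n * (-fpp)))
    print(n, integral / approx)   # ratio -> 1 as n grows
```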
It makes sense to also provide a Gaussian version: $$\E_\gamma e^{nf(\gamma)} \sim (1-f''(x_0))^{-1/2}\cdot \exp\left(n\Big(f(x_0) - \tfrac {x_0^2} {2}\Big)\right), \qquad \gamma\sim \mathcal N(0,1/n)$$ where $x_0$ maximizes the exponent $f(x) - x^2/2$ (again assumed to be the unique maximizer).
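And a check of the Gaussian version, with an arbitrary asymmetric example $f(x) = \log(e^x + e^{-x}) + 0.3x$ (chosen so that $f(x) - x^2/2$ has a unique maximizer):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Gaussian-Laplace check for f(x) = log(e^x + e^{-x}) + 0.3 x, gamma ~ N(0, 1/n).
f = lambda x: np.logaddexp(x, -x) + 0.3 * x
g = lambda x: f(x) - x**2 / 2                        # the exponent to be maximized
x0 = minimize_scalar(lambda x: -g(x), bounds=(-5, 5), method="bounded").x
fpp = 1 - np.tanh(x0)**2                             # f''(x0)
for n in [20, 100, 400]:
    # E exp(n f(gamma)), normalized by exp(n g(x0)) to avoid overflow
    lhs, _ = quad(lambda x: np.sqrt(n / (2 * np.pi)) * np.exp(n * (g(x) - g(x0))),
                  x0 - 6, x0 + 6, points=[x0])
    print(n, lhs, (1 - fpp) ** -0.5)                 # lhs -> (1 - f''(x0))^{-1/2}
```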
There’s a really good reason to care about this.
Recall the moment generating function of a Gaussian: $$\E_\gamma e^{t\gamma} = e^{t^2/2}, \qquad \gamma\sim \mathcal N(0,1)$$ and cheaply we can extend this to $$\E_\gamma e^{t\gamma} = e^{at^2/2}, \qquad \gamma\sim \mathcal N(0,a)$$
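A tiny Monte Carlo check of the second identity (the $a$ and $t$ are arbitrary):

```python
import numpy as np

# E exp(t*gamma) for gamma ~ N(0, a) should equal exp(a t^2 / 2).
rng = np.random.default_rng(0)
a, t = 0.7, 1.3
gam = rng.normal(0.0, np.sqrt(a), size=1_000_000)
print(np.mean(np.exp(t * gam)), np.exp(a * t**2 / 2))
```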
The implication here is that “quadratic” exponents can be decomposed as a weighted average of “linear” ones over a Gaussian, and this is fantastic for us, because we can deal with linear exponents easily but not quadratic ones.
This trick is known in physics as the Hubbard-Stratonovich transformation (communicated to me by Jensen).
We previously did this in The Curie-Weiss Model, but let’s see how we can use these ideas to do the same thing and to glean some insight in the process. We would like to compute $$Z = \sum_{\sigma\in \lbrace \pm 1 \rbrace^n} e^{\frac \beta {2n} (\sum\sigma_i)^2}$$ and so let’s express this as an average over the Gaussian $\gamma\sim \mathcal N(0,1/n)$: $$= \sum_{\sigma\in \lbrace \pm 1 \rbrace^n}\E_\gamma \exp(\beta^{1/2}\gamma \sum \sigma_i)$$ and of course we can commute the sum and the expectation freely, but since the exponent is linear now it factors: $$= \E_\gamma \prod_i \sum_{\sigma_i\in \lbrace \pm 1 \rbrace} \exp(\beta^{1/2} \gamma \sigma_i) = \E_\gamma \left(\exp(\beta^{1/2}\gamma) + \exp(-\beta^{1/2} \gamma) \right)^n$$ so the crucial “(neg)energy” to be maximized is $$f(\gamma) = \log((\exp(\beta^{1/2} \gamma) + \exp(-\beta^{1/2}\gamma))) - \frac{\gamma^2}{2}$$ Actually, we reparametrize to make this a little easier (replace $\gamma \mapsto \beta^{1/2}\gamma$): $$f(\gamma) = \log(\exp(\beta \gamma) + \exp(-\beta \gamma)) - \frac{\beta \gamma^2}{2}$$
(One can simplify this a little using $\cosh$, but we only care if it simplifies the computations/expressions.)
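Before taking any limits, the Hubbard-Stratonovich rewriting of $Z$ is an exact identity, which we can check for a small $n$ (the $n$ and $\beta$ below are arbitrary):

```python
import numpy as np
from itertools import product
from scipy.integrate import quad

# Exact check of Z = E_{gamma ~ N(0,1/n)} (e^{sqrt(beta) gamma} + e^{-sqrt(beta) gamma})^n.
n, beta = 10, 1.2
Z = sum(np.exp(beta / (2 * n) * sum(s)**2) for s in product([-1, 1], repeat=n))
Z_hs, _ = quad(lambda x: np.sqrt(n / (2 * np.pi)) * np.exp(-n * x**2 / 2)
               * (2 * np.cosh(np.sqrt(beta) * x))**n, -np.inf, np.inf)
print(Z, Z_hs)   # these agree up to quadrature error
```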
When is this maximized? Differentiating gives $$f'(\gamma) = \beta(\tanh(\beta\gamma) - \gamma)$$
so we spot the usual mean-field relations, and $\gamma$ corresponds to the “mean” of the spins.
Furthermore, we know a priori that the mean-field approximation gives the correct answer, so we do expect that $$\max_m \left\lbrace \frac {\beta m^2} {2} + h\left( \frac {1+m} 2 \right)\right \rbrace = \max_\gamma \left\lbrace \log(e^{\beta \gamma} + e^{-\beta \gamma}) - \frac{\beta \gamma^2} 2 \right \rbrace$$
Shockingly, this is actually true if you plot this in Desmos. Try it!
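If Desmos isn’t handy, a crude grid search does the job too (grid ranges chosen by hand, and taking $h$ to be the binary entropy in nats):

```python
import numpy as np

def h(p):                              # binary entropy, in nats
    q = np.clip(p, 1e-12, 1 - 1e-12)
    return -q * np.log(q) - (1 - q) * np.log(1 - q)

m = np.linspace(-0.999, 0.999, 20001)
g = np.linspace(-3, 3, 20001)
for beta in [0.5, 1.0, 2.0]:
    lhs = np.max(beta * m**2 / 2 + h((1 + m) / 2))
    rhs = np.max(np.logaddexp(beta * g, -beta * g) - beta * g**2 / 2)
    print(beta, lhs, rhs)              # the two maxima agree
```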
To make this slightly less mysterious, the 1-dimensional Gibbs variational principle gives $$\log(e^{-x} + e^x) = mx + h(\tfrac {1+m} 2) $$ where $m = \tanh (x) = \frac{e^x}{e^x + e^{-x}} - \frac{e^{-x}}{e^x + e^{-x}}$, so really the RHS is more like
$$\max_\gamma \left\lbrace \beta \gamma \tanh (\beta\gamma) + h\left( \frac{1+\tanh(\beta\gamma)}{2} \right) - \frac{\beta \gamma^2}{2} \right \rbrace$$ hence, writing $m = \tanh(\beta\gamma)$ and recalling that the maximizer satisfies $\gamma = \tanh(\beta\gamma)$, we see a parallel between the two. But why not just wrap it up now? We realize that we’re really just doing $$\max_{m, \gamma} \left\lbrace \beta \gamma m + h\left( \frac{1+m}{2} \right) -\frac{\beta\gamma^2}{2} \right \rbrace $$ two ways: maximizing over $\gamma$ first (a quadratic, maximized at $\gamma = m$) gives the mean-field formula, while maximizing over $m$ first (via Gibbs, at $m = \tanh(\beta\gamma)$) gives the Hubbard-Stratonovich one.
The computation that inspired this approach actually comes from the replica trick, which I will show here. Just for this section, I will use $s$ instead of $\sigma$ to denote the spins.
Here we consider the Sherrington-Kirkpatrick model. Let our neg-energy be $f(s) = \sum_{A\lt B} J_{AB} s_As_B$, but the interaction terms $J_{AB}$ are random and independent with $J_{AB} \sim \normal(0,1/N)$. Naturally, $Z$ also becomes random, and our job is to find the free energy density $\frac 1 N\E \log Z$.
$J_{AB}$ being random has some good consequences. For instance, we can easily compute $\E Z$, since the randomness of the $J$’s allows us to ignore the spins. With some work, we can even do $\E Z^2$.
The first step of the replica trick requires us to compute $\E Z^n$ for each $n$, and the physical interpretation is that this is the normalizing constant for $n$ copies of the $N$-particle system. (Here we follow the physicists’ convention for the letters. We will use $s_A^i$ to denote the spin of the $A$-th particle in the $i$-th replica.)
So here it begins:
$$ \begin{aligned} \E Z^n &= \E \sum_{\lbrace s^i \rbrace} \prod_i \exp\left(\sum_{A\lt B} \beta J_{AB} s_A^i s_B^i\right)\\ &= \sum_{\lbrace s^i \rbrace} \E \prod_{A\lt B} \exp\left(J_{AB}\cdot \beta\sum_i s_A^i s_B^i \right)\\ &= \sum_{\lbrace s^i \rbrace} \prod_{A\lt B} \E\exp\left(J_{AB}\cdot \beta\sum_i s_A^i s_B^i \right) \\ \end{aligned} $$ where the last step uses the independence of the $J_{AB}$’s. Now we use the Gaussian MGF, $\E e^{tJ_{AB}} = e^{t^2/(2N)}$: $$ \begin{aligned} \hax &= \sum_{\lbrace s^i \rbrace} \prod_{A\lt B}\exp\left(\frac {\beta^2} {2N} \left( \sum_i s_A^i s_B^i \right)^2 \right)\\ &= \sum_{\lbrace s^i \rbrace} \exp\left(\frac {\beta^2} {2N} \sum_{A\lt B} \left( \sum_i s_A^i s_B^i \right)^2 \right)\\ &= \sum_{\lbrace s^i \rbrace} \exp\left(\frac {\beta^2} {2N} \sum_{A\lt B} \left( n + 2\sum_{i\lt j} s_A^i s_B^i s_A^j s_B^j \right) \right)\\ \end{aligned} $$ Now is the crucial part: we swap the order of summation between $\lbrace A\lt B \rbrace$ and $\lbrace i \lt j \rbrace$: $$ \begin{aligned} \hax &= \sum_{\lbrace s^i \rbrace} \exp\left(\frac {\beta^2} {2N} \left( \frac{nN(N-1)}{2}+ 2\sum_{A\lt B, i\lt j} s_A^i s_B^i s_A^j s_B^j \right) \right)\\ &= \sum_{\lbrace s^i \rbrace} \exp\left(\frac {\beta^2} {2N} \left( \frac{nN(N-1)}{2}-\frac{Nn(n-1)} 2 + \sum_{i\lt j} \left(\sum_{A} s_A^i s_A^j\right)^2 \right) \right)\\ &= \exp(\tfrac{\beta^2n(N-n)}{4}) \cdot \sum_{\lbrace s^i \rbrace} \exp\left(\frac{\beta^2}{2N} \sum_{i\lt j}\left(\sum_{A} s_A^i s_A^j\right)^2\right) \end{aligned} $$ Now we do the only trick we know how to do: let $Q_{ij} \sim \normal(0,1/N)$ independently for $i\lt j$. Then, $$ \begin{aligned} \hax &= \exp(...) \cdot \sum_{\lbrace s^i \rbrace} \E_Q \exp\left( \beta \sum_{i\lt j, A} Q_{ij} s_A^i s_A^j \right) \\ &= \exp(...) \cdot \E_Q \left( \sum_{s^\bullet\in \lbrace \pm 1 \rbrace^n} \exp\left( \beta \sum_{i\lt j} Q_{ij} s^i s^j \right) \right)^N\\ \end{aligned} $$ By now we know how this plays out. In the limit $N\to \infty$, $$\frac 1 N \log \E Z^n \to \frac{n \beta^2}{4} + \max_Q \left\lbrace \log \left(\sum_{s\in \lbrace \pm 1 \rbrace^n} \exp(\tfrac \beta 2 s^\top Q s) \right) - \frac 1 2 \sum_{i\lt j} Q_{ij}^2\right \rbrace $$ where $Q$ ranges over symmetric matrices with zero diagonal.
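The combinatorial rewriting above (from right after the MGF step down to pulling out the prefactor) is an exact identity for every finite $N$ and $n$, so we can brute-force check it for tiny values (chosen arbitrarily below):

```python
import numpy as np
from itertools import product

# Check: sum over configs of exp(beta^2/(2N) * sum_{A<B} (sum_i s_A^i s_B^i)^2)  equals
# exp(beta^2 n (N-n)/4) * sum over configs of exp(beta^2/(2N) * sum_{i<j} (sum_A s_A^i s_A^j)^2).
N, n, beta = 4, 2, 0.7
lhs = rhs = 0.0
for flat in product([-1, 1], repeat=N * n):
    s = np.array(flat).reshape(N, n)               # s[A, i] = spin of particle A in replica i
    pair_AB = sum((s[A] @ s[B])**2 for A in range(N) for B in range(A + 1, N))
    pair_ij = sum((s[:, i] @ s[:, j])**2 for i in range(n) for j in range(i + 1, n))
    lhs += np.exp(beta**2 / (2 * N) * pair_AB)
    rhs += np.exp(beta**2 / (2 * N) * pair_ij)
print(lhs, np.exp(beta**2 * n * (N - n) / 4) * rhs)   # equal, up to floating point
```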
The so-called “replica-symmetric” ansatz takes all the $Q_{ij}$’s to be equal. But it is known that this is not the right answer at higher $\beta$, where “replica symmetry breaking” happens.
Before we dive into an idea I had, it will be fruitful to consider multivariate versions of the two tricks we’ve seen so far.
Firstly, for a PSD matrix $J$, we have $$\E_{\gamma\sim \normal(0, J)}[e^{\langle v, \gamma\rangle}] = e^{\frac 1 2 v^\top J v}$$
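A quick Monte Carlo check, with an arbitrary small PSD $J$ and test vector $v$:

```python
import numpy as np

# E exp(<v, gamma>) for gamma ~ N(0, J) should equal exp(v^T J v / 2).
rng = np.random.default_rng(0)
d = 3
A = rng.standard_normal((d, d))
J = A @ A.T / d                           # an arbitrary PSD covariance
v = 0.5 * rng.standard_normal(d)
gam = rng.multivariate_normal(np.zeros(d), J, size=200_000)
print(np.mean(np.exp(gam @ v)), np.exp(0.5 * v @ J @ v))
```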
Next, we also have $$\E_{\gamma \sim \normal(0,J/N)} e^{Nf(\gamma)} \sim\sqrt{\frac{1}{\det J \det (-H)}} \exp \left(N\cdot \max_x \left\lbrace f(x) - \frac{1}{2} x^\top J^{-1} x\right \rbrace \right)$$
where $H$ is the Hessian of the expression inside the braces, evaluated at the maximizer.
Notation: back to using $n$ for the number of particles.
Back to the Curie-Weiss computation. This got me thinking a little bit: what if we could just use the eigendecomposition of $J$ and see what we get?
For now, let $J$ be positive semidefinite.
We try to replicate the computation: $$ \begin{aligned} Z &= \sum_\sigma \exp\left(\frac\beta2 \sigma^\top J \sigma\right) \\ &= \E_{\gamma\sim \normal (0,\beta^{-1}J)} \sum_\sigma \exp\left(\beta \langle \gamma, \sigma \rangle \right) \\ &= \E_{\gamma\sim\normal (0,\beta^{-1}J)} \prod_i \left( e^{\beta\gamma_i} + e^{-\beta\gamma_i} \right) \end{aligned} $$
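As before this is an exact identity, which a Monte Carlo estimate can confirm for a small $n$ and some arbitrary full-rank PSD $J$ (not the Curie-Weiss one):

```python
import numpy as np
from itertools import product

# Check Z = E_{gamma ~ N(0, J/beta)} prod_i (e^{beta*gamma_i} + e^{-beta*gamma_i}).
rng = np.random.default_rng(0)
n, beta = 6, 0.5
A = rng.standard_normal((n, n))
J = A @ A.T / n                                            # arbitrary full-rank PSD coupling
sigmas = np.array(list(product([-1, 1], repeat=n)))        # all 2^n spin configurations
Z = np.sum(np.exp(beta / 2 * np.einsum('ki,ij,kj->k', sigmas, J, sigmas)))
gam = rng.multivariate_normal(np.zeros(n), J / beta, size=400_000)
Z_hs = np.mean(np.prod(2 * np.cosh(beta * gam), axis=1))
print(Z, Z_hs)                                             # agree up to Monte Carlo error
```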
Nothing insane yet. But what if we did Laplace’s method here? At first sight there’s nothing wrong, but $\gamma$ now has $n$ coordinates, so the dimension itself is scaling with $n$!
Ok, let us suspend our judgement for a second. Maybe the approximation still works, but now we also have to pay attention to the Hessian term, which could scale in $n$ as well. So let’s do some scratch work: the “naive” neg-energy is $$f_1(\gamma) = \sum_i \log (e^{\beta\gamma_i} + e^{-\beta\gamma_i}) - \beta \cdot \frac{\gamma^\top J^{-1} \gamma}{2}$$ Now we get a sense of what the Hessian is: $$H(f_1)(\gamma) = \beta^2 \diag(1-\tanh^2(\beta \gamma_i)) - \beta J^{-1}$$ because $(\log (e^x + e^{-x}))'' = (\tanh x)' = 1-\tanh^2 x$, so writing $D = \diag(1-\tanh^2(\beta \gamma_i))$, the extra term comes to $$\sqrt{\frac{1}{\det (\beta^{-1}J) \det (-H(f_1))}} = \det(I-\beta JD)^{-1/2}$$ Hence, the true neg-energy is $$f(\gamma) = f_1(\gamma) - \frac{1}{2} \log\det(I-\beta J D)$$ and perhaps we expand the first two orders using $-\log\det(I-X) =\sum_{k\ge 1} \tr(X^k)/k$: $$f(\gamma) = f_1(\gamma) + \frac{\beta}{2} \tr (JD) + \frac{\beta^2}{4} \|D^{1/2}JD^{1/2}\|_F^2 + ...$$
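Just to check the algebra of that expansion numerically (with an arbitrary PSD $J$, a $D$ with entries in $[0,1]$, and $\beta$ small enough that $\beta JD$ is a contraction):

```python
import numpy as np

# -1/2 log det(I - beta*J*D) vs its first two expansion terms;
# note tr((JD)^2) = ||D^{1/2} J D^{1/2}||_F^2 since D is PSD.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
J = A @ A.T / n                                   # arbitrary PSD coupling
D = np.diag(rng.uniform(0.0, 1.0, n))             # plays the role of diag(1 - tanh^2(beta*gamma_i))
for beta in [0.01, 0.05, 0.1]:
    exact = -0.5 * np.log(np.linalg.det(np.eye(n) - beta * J @ D))
    JD = J @ D
    two_terms = beta / 2 * np.trace(JD) + beta**2 / 4 * np.trace(JD @ JD)
    print(beta, exact, two_terms)                 # agree up to O(beta^3)
```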
Now, let’s say $\gamma$ maximizes $f_1(\gamma)$ (which is not, strictly speaking, the mean-field). The consistency equations tell us that $$\tanh(\beta \gamma) = J^{-1}\gamma =: m$$ so plugging this back in, we get $$ \begin{aligned} f(\gamma) &= f_1(\gamma) + \frac\beta 2 \sum_i J_{ii} (1-m_i^2)+ \frac{\beta^2}{4} \sum_{i,j} J_{ij}^2 (1-m_i^2)(1-m_j^2) + ...\\ &= \ub{\frac{\beta}{2}\E_{\xi(m)} x^\top J x + H(\xi(m))}{\FMF} + \frac{\beta^2}{4} \sum_{i,j} J_{ij}^2 (1-m_i^2)(1-m_j^2) + ... \end{aligned} $$ so the next term is precisely the Onsager correction!
This is also really strong evidence that we should be able to give an upper bound similar to $-\frac{1}{2} (\log\det(I-\beta J) - \beta\tr J)$.
Some questions left over: