$\providecommand{\E}{\mathbb E} \providecommand{\Unif}{\mathrm {Unif}} \providecommand{\GAP}{\mathsf{GAP}} \providecommand{\norm}[1]{\left\| #1 \right\|}$
We prove Gibbs’ Inequality, which gives a lower bound on the log-partition function in terms of an optimization problem over distributions. This naturally leads us to mean-field approximations.
Prerequisites: Setup for Statistical Physical Models
Recall that the (neg)energy is defined as
$$f(\sigma) = \sum_{i < j} J_{ij}\, \sigma_i \sigma_j,$$ where $J$ is the matrix of couplings.
Notice that under the uniform distribution on $\{\pm 1\}^n$, $f(\sigma)$ averages to 0 (since for $i < j$, $\sigma_i$ and $\sigma_j$ are independent with mean zero). Thus by Jensen’s inequality, $$ \begin{align*} \log Z = \log \sum_\sigma e^{\beta f(\sigma)} &= n\log 2 + \log \E_{\sigma \sim \Unif\{\pm 1\}^n} e^{\beta f(\sigma)} \\ &\ge n\log 2 + \beta \cdot \E_{\sigma \sim \Unif\{\pm 1\}^n} f(\sigma) = n\log 2 \end{align*} $$ Unfortunately, this bound can be very loose. In the limit $\beta \to \infty$, the sum in $Z$ is dominated by $e^{\beta\max_{\sigma} f(\sigma)}$, so we expect $\log Z \approx \beta \max_{\sigma} f(\sigma)$, and this bound misses the dependence on $\beta$ entirely.
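To make this concrete, here is a minimal numerical sketch: it computes $\log Z$ by brute-force enumeration (feasible only for small $n$) and compares it with the trivial bound $n \log 2$ and the large-$\beta$ proxy $\beta \max_\sigma f(\sigma)$. The quadratic $f$ from above is assumed, with an illustrative Gaussian coupling matrix $J$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, beta = 10, 2.0

# Illustrative couplings: i.i.d. Gaussians above the diagonal, scaled by
# 1/sqrt(n) so that ||J||_op stays O(1) as n grows.
J = np.triu(rng.normal(size=(n, n)), k=1) / np.sqrt(n)

# Enumerate all 2^n configurations; f(sigma) = sum_{i<j} J_ij sigma_i sigma_j.
configs = np.array(list(itertools.product([-1, 1], repeat=n)))
energies = np.einsum("si,ij,sj->s", configs, J, configs)

# log Z via a numerically stable log-sum-exp.
shift = (beta * energies).max()
log_Z = shift + np.log(np.exp(beta * energies - shift).sum())

print("log Z        :", log_Z)
print("n log 2      :", n * np.log(2))          # the trivial Jensen bound
print("beta * max f :", beta * energies.max())  # dominant as beta -> infinity
```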
By taking more care with the choice of distribution, we can obtain a much better bound. Let $\mu$ be any distribution with full support on $\{\pm 1\}^n$; then: $$ \begin{align*} \log Z = \log \sum_\sigma e^{\beta f(\sigma)} &= \log \sum_{\sigma} e^{\beta f(\sigma) -\log \mu(\sigma)} \cdot \mu(\sigma)\\ &\ge \sum_{\sigma} (\beta f(\sigma) -\log \mu(\sigma)) \cdot \mu(\sigma)\\ &= \beta \cdot \E_{\sigma \sim \mu} f(\sigma) + H(\mu) \end{align*} $$ where $H(\mu) = \E_{\sigma \sim \mu}[-\log \mu(\sigma)]$ is the entropy of $\mu$, and the inequality is again Jensen’s. Equality is attained when $\mu(\sigma) \propto e^{\beta f(\sigma)}$, i.e. when $\mu$ is the Gibbs distribution itself, so this is often written as: $$\log Z = \sup_\mu \left\{\beta \cdot \E_{\sigma \sim \mu} f(\sigma) + H(\mu) \right\}$$ where we take the supremum over all distributions on the hypercube $\{\pm 1\}^n$.
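Continuing the sketch above, we can evaluate the variational objective $\beta \cdot \E_{\sigma\sim\mu} f(\sigma) + H(\mu)$ exactly for a few choices of $\mu$ and verify that it never exceeds $\log Z$, with equality at the Gibbs distribution. (The helper name `variational_value` is ad hoc, not a library function.)

```python
# Check: beta * E_mu[f] + H(mu) <= log Z, with equality when
# mu(sigma) is proportional to exp(beta * f(sigma)).
def variational_value(mu):
    nz = mu > 0  # convention: 0 * log 0 = 0
    return beta * mu @ energies - np.sum(mu[nz] * np.log(mu[nz]))

mu_unif = np.full(len(configs), 1 / len(configs))  # recovers n log 2
mu_rand = rng.dirichlet(np.ones(len(configs)))     # an arbitrary distribution
mu_gibbs = np.exp(beta * energies - log_Z)         # the optimizer

for name, mu in [("uniform", mu_unif), ("random", mu_rand), ("Gibbs", mu_gibbs)]:
    print(f"{name:8s}: {variational_value(mu):.4f} <= {log_Z:.4f}")
```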
The space of distributions over $\{\pm 1\}^n$ has dimension $2^n-1$, and we can’t possibly search over it for the best distribution. Is there a smaller distribution class that gives the correct answer?
The mean-field hypothesis is that under certain assumptions, we only have to look at product measures, i.e. the mean-field gap $$\GAP := \log Z - \sup_{\xi \text{ prod.}} \{\beta \cdot \E_{\sigma \sim \xi} f(\sigma) + H(\xi) \}$$ is small. This is a much more tractable problem: a product measure over the hypercube $\{\pm 1\}^n$ is determined by its $n$ coordinate means, so there are only $n$ free variables to optimize.
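Under the quadratic $f$ above, restricting to product measures makes the objective a function of the means $m_i = \E_\xi[\sigma_i]$ alone: $\E_\xi f(\sigma) = \sum_{i<j} J_{ij} m_i m_j$ and $H(\xi)$ is a sum of binary entropies. Here is a minimal sketch of this optimization, using the classical damped mean-field fixed-point iteration; this is a heuristic that reaches a stationary point, not necessarily the supremum, so it only certifies an upper bound on $\GAP$.

```python
# Stationarity of m -> beta * m^T J m + sum_i H((1+m_i)/2) gives the
# classical mean-field fixed point m_i = tanh(beta * ((J + J^T) m)_i).
def binary_entropy(m):
    p = (1 + m) / 2
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

m = rng.uniform(-0.1, 0.1, size=n)  # small random initialization
for _ in range(1000):
    # Damped iteration; convergence is assumed here, not guaranteed in general.
    m = 0.5 * m + 0.5 * np.tanh(beta * (J + J.T) @ m)

mean_field = beta * m @ J @ m + binary_entropy(m).sum()
print("mean-field value:", mean_field, "<= log Z =", log_Z)
print("GAP <=", log_Z - mean_field)
```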
When $\beta = 0$, $Z = 2^n$, so $\log Z$ grows like $\Theta(n)$. This remains true for any fixed $\beta$ as long as $\sup_{\sigma} f(\sigma) = O(n)$, since $n \log 2 \le \log Z \le n\log 2 + \beta \sup_\sigma f(\sigma)$. For this reason, the scaling of $J$ is usually chosen so that $\norm{J}_{op} = \Theta(1)$.
One thus speaks of the asymptotic free energy $$F(\beta) = \lim_{n\to\infty}\frac{\log Z(n,\beta)}{n},$$ and we can compute it accurately using product measures whenever $\GAP = o(n)$.