Over the summer, I gave weekly talks as part of a reading group. The first half of the talks was centered on information complexity, while the second half was about the random k-SAT threshold and methods from statistical physics. Here’s an attempt to transcribe (and revise) the “sales pitch” I gave for statistical physics.
(Credits. The first example, which I took from Chatterjee’s introduction to the subject, is also largely a sales pitch for large deviations theory.)
$\newcommand\E{\mathbb E}$ $\newcommand\I{\mathbb I}$ $\newcommand\eps{\epsilon}$
Here’s a problem:
Flip $N$ coins. What is the probability that at least $2/3$ of them are tails?
The exact probability here can be computed directly in terms of binomial coefficients, but let’s try to get a more qualitative understanding (especially about how it changes in $N$).
Rewrite the problem slightly:
Let $X_i$ take the values $\pm 1$ uniformly and independently. Establish the asymptotics of $$P(X_1+X_2+...+X_N \ge N/3)$$ in $N$.
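Before getting clever, it’s worth sanity-checking numerically. Here’s a few lines of plain Python computing the exact tail via binomial coefficients (the sum is at least $N/3$ exactly when at least $\lceil 2N/3\rceil$ of the $X_i$ equal $+1$); the last column is the probability’s $N$-th root, the quantity whose limit we’re after:

```python
from math import comb

# Exact tail probability: X_1 + ... + X_N >= N/3 iff the number of +1's
# is at least ceil(2N/3).
for N in (30, 90, 300, 900):
    k0 = -(-2 * N // 3)  # ceil(2N/3)
    p = sum(comb(N, k) for k in range(k0, N + 1)) / 2**N
    print(f"N={N:4d}   P={p:.3e}   P^(1/N)={p ** (1 / N):.4f}")
# The last column creeps upward toward the mystery base.
```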
Here’s an (incorrect) approach: by the central limit theorem, $\frac{X_1+...+X_N}{\sqrt N}$ is approximately a standard Gaussian $Z$, so $$P(X_1+...+X_N \ge N/3) = P\left(\frac{X_1+...+X_N}{\sqrt N} \ge \frac{\sqrt N}{3}\right) \approx P\left(Z \ge \frac{\sqrt N}{3}\right) \approx e^{-N/18},$$ using the Gaussian tail estimate $P(Z \ge t) \approx e^{-t^2/2}$.
Ok, so $P(X\ge N/3)$ is exponentially decreasing in $N$ (writing $X = X_1+...+X_N$ from here on). That much is correct, but it turns out that this argument gives the wrong base constant $e^{-1/18}$. Where exactly we went wrong is that the CLT only states that $$\frac{X_1+...+X_N}{\sqrt N} \overset d \to Z,$$ so the convergence is guaranteed for any $P(X\ge c\sqrt{N})$ with $c$ constant, but not a “moving target” value like $c=\sqrt N/3$.
So the puzzle here is to figure out:
For which $\alpha$ do we have $$P(X\ge N/3) \overset \cdot = \alpha^N ?$$ (Here $\overset\cdot=$ means equality up to factors that are subexponential in $N$.)
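One quick way to see numerically that $e^{-1/18}$ can’t be the base: if it were, the ratio $P(X\ge N/3)/e^{-N/18}$ would shrink only polynomially (like $1/\sqrt N$), but in fact it dies off exponentially fast. A few lines of Python make this visible:

```python
from math import comb, exp

# If exp(-1/18) were the right base, this ratio would shrink only like
# 1/sqrt(N); instead it collapses exponentially fast.
for N in (1000, 4000, 8000):
    k0 = -(-2 * N // 3)
    p = sum(comb(N, k) for k in range(k0, N + 1)) / 2**N
    print(N, p / exp(-N / 18))
```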
Let’s look at the hypercube $\{\pm 1\}^N$. We want to understand the size of the subset $$S= \{(x_1,x_2,...,x_N)\in \{\pm 1\}^N: x_1+x_2+...+x_N \ge N/3 \},$$ since the probability in question is exactly $|S|/2^N$.
We need a bunch of inputs from physics:

- An *energy* on configurations: here, let’s reward membership in $S$ by setting $f(x) = \I\{x\in S\}$.
- A *partition function* at inverse temperature $\beta$: $$Z = \sum_{x\in\{\pm 1\}^N} 2^{\beta f(x)} = |S|\, 2^\beta + (2^N - |S|).$$ (All exponentials and logs are base $2$ from here on; this only amounts to rescaling $\beta$.) For $\beta$ large enough, the first term dominates, so $\log Z \approx \beta + \log|S|$.
- *Gibbs’ inequality*, a.k.a. the variational principle: for any distribution $\nu$ on $\{\pm 1\}^N$, $$\beta\, \E_\nu f(X) + H(\nu) \le \log Z,$$ with equality exactly when $\nu$ is the Gibbs measure $\nu(x) \propto 2^{\beta f(x)}$.
So now assume that $\nu$ is a product distribution (i.e. it is made of independent $X_i$’s, each equal to $1$ with probability $p$). We try to maximize the LHS (for $\beta$ large enough!) by understanding its two parts:

- The energy term $\beta\, \E_\nu f(X) = \beta\, P_\nu(X \in S)$ wants $\nu$ to put essentially all of its mass on $S$; since $\E_\nu[X_1+...+X_N] = N(2p-1)$, the law of large numbers says this requires $p \ge 2/3$.
- The entropy term $H(\nu)$, which is $N$ times the entropy of a single coordinate, wants $p$ as close to $1/2$ as possible.

The best compromise is to sit at (really, just above) the boundary $p = 2/3$: the energy term stays $\approx \beta$ while we give up as little entropy as we can.
At this point, we can plug this distribution back in and obtain $$\beta + \log |S| \approx \log Z \approx \beta + N h_2(1/3)$$ where $h_2(p)$ is the binary entropy function defined by $$h_2(p) = -p\log p -(1-p) \log (1-p).$$ (This also happens to be the entropy of a two-valued r.v. whose probability mass splits as $p$ and $1-p$; in particular $H(\nu) = N h_2(2/3)$ above, and $h_2(2/3) = h_2(1/3)$ by symmetry.)
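Here’s a small numerical sketch of this optimization, under the conventions above ($f$ the indicator of $S$, base-$2$ weights in $Z$): for a modest $N$, compute $\log Z$ exactly and scan the bound $\beta \E_\nu f + H(\nu)$ over the bias $p$. Gibbs’ inequality holds at every $p$, and the maximizer sits near $p = 2/3$ (for finite $N$ it lands a bit above):

```python
from math import comb, log2

N, beta = 60, 20
k0 = -(-2 * N // 3)                      # x is in S iff at least k0 coords are +1
S = sum(comb(N, k) for k in range(k0, N + 1))
log_Z = log2(S * 2**beta + (2**N - S))   # Z = |S| 2^beta + (2^N - |S|)

def h2(p):                               # binary entropy, in bits
    return -p * log2(p) - (1 - p) * log2(1 - p)

def lhs(p):                              # beta E_nu f + H(nu) for a bias-p product nu
    Ef = sum(comb(N, k) * p**k * (1 - p)**(N - k) for k in range(k0, N + 1))
    return beta * Ef + N * h2(p)

best_val, best_p = max((lhs(i / 1000), i / 1000) for i in range(1, 1000))
print(f"log2 Z = {log_Z:.2f}")
print(f"best product bound = {best_val:.2f} at p = {best_p:.3f}")
# The bound never exceeds log2 Z, and the maximizer hovers a bit above 2/3.
```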
In any case, the answer is $|S| \overset\cdot= (2^{h_2(1/3)})^N = (3/2^{2/3})^N$, so $\alpha = 2^{h_2(1/3)}/2 = 3/2^{5/3} \approx 0.9449$, just shy of the CLT impostor $e^{-1/18} \approx 0.9460$. Did you guess it right?
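Double-checking the arithmetic:

```python
from math import exp, log2

h = -(1/3) * log2(1/3) - (2/3) * log2(2/3)
print(2**h, 3 / 2**(2/3))        # both ~1.88988: the per-coin count of S
print(2**h / 2, 3 / 2**(5/3))    # both ~0.94494: our alpha
print(exp(-1/18))                # ~0.94595: the near-miss CLT guess
```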
That’s where my talk ended.
Gibbs’ inequality lives in “log space”, and we can get a combinatorial interpretation by exponentiating it. But first, we have to take a detour.
Let’s say $(X_1,X_2,...,X_n)$ was drawn from $n$ independent copies of $\nu$, and write $p(x_1,...,x_n) = \nu(x_1)\cdots\nu(x_n)$ for the likelihood. The typical likelihood is $2^{-nH(\nu)}$: by the law of large numbers, $$\frac{1}{n} \log p(X_1,X_2,...,X_n) \overset p \to -H(\nu).$$ Concretely, this means that for any $\eps > 0$ and sufficiently large $n$, $$P\left(p(X_1,X_2,...,X_n) = 2^{-n(H(\nu) \pm \eps)}\right)\ge 1-\eps.$$
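This concentration is easy to watch happen; here’s a tiny simulation, taking $\nu$ to be the bias-$2/3$ distribution from earlier:

```python
import random
from math import log2

# nu on {-1, +1} with P(+1) = 2/3, so H(nu) = h2(1/3) ~ 0.9183 bits
q = 2 / 3
H = -q * log2(q) - (1 - q) * log2(1 - q)

random.seed(0)
for n in (100, 1000, 10000):
    xs = [1 if random.random() < q else -1 for _ in range(n)]
    log_lik = sum(log2(q if x == 1 else 1 - q) for x in xs)
    print(n, -log_lik / n, "vs", H)   # -(1/n) log2 p(X_1,...,X_n) -> H(nu)
```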
This has a useful consequence: each such “typical” sequence has likelihood at most $2^{-n(H(\nu)-\eps)}$, so any event of probability at least $\delta$ consisting of typical sequences must contain at least $\delta\cdot 2^{n(H(\nu)-\eps)}$ of them. Let’s put this to work.
We consider $n$ large enough that the following both hold: $$ \begin{aligned} P\left(p(X_1,X_2,...,X_n) = 2^{-n(H(\nu) \pm \eps)}\right) &\ge 1-\eps,\\ P\left(\beta(f(X_1)+...+f(X_n)) = n(\beta \E_\nu f(X) \pm \eps)\right) &\ge 1-\eps. \end{aligned}$$ (The second line is the law of large numbers again, this time applied to $f(X_i)$.)
Their intersection has probability mass at least $1-2\eps$, so the number of such sequences is at least $(1-2\eps)2^{n(H(\nu)-\eps)}$. On the other hand, $Z^n$ is the sum of $2^{\beta(f(x_1) + ... + f(x_n))}$ over all length-$n$ sequences, and each sequence in the intersection contributes at least $2^{n(\beta \E_\nu f(X) - \eps)}$. Thus $$ \begin{aligned} Z^n &\ge (1-2\eps)\, 2^{n(H(\nu) - \eps)} \cdot 2^{n(\beta \E_\nu f(X) -\eps )}\\ &= (1-2\eps)\, 2^{n(\beta \E_\nu f + H(\nu) - 2\eps)}. \end{aligned} $$ Taking logs and dividing by $n$, we get $$\log Z \ge \beta \E_{\nu} f + H(\nu) - 2\eps + \frac{1}{n}\log(1-2\eps);$$ the last term vanishes as $n \to \infty$, and then we simply take $\eps \to 0$.
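As a closing sanity check, the inequality we just derived is saturated by the Gibbs measure, which is quick to verify exactly on a small hypercube (same $f$ and base-$2$ convention as before):

```python
from itertools import product
from math import log2

N, beta = 10, 3
space = list(product((-1, 1), repeat=N))

def f(x):                                 # indicator that x_1 + ... + x_N >= N/3
    return 1 if sum(x) >= N / 3 else 0

Z = sum(2 ** (beta * f(x)) for x in space)
gibbs = {x: 2 ** (beta * f(x)) / Z for x in space}

Ef = sum(nu_x * f(x) for x, nu_x in gibbs.items())
H = -sum(nu_x * log2(nu_x) for nu_x in gibbs.values())
print(beta * Ef + H, log2(Z))             # equal, up to float rounding
```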
The actual way to prove these things rigorously is large deviations theory. There’s a whole textbook about it, so I will refrain from jamming it into a footnote of this blog post. Until next time!