We will give the basic setup for statistical mechanics, and define the log-partition function, the Gibbs distribution, and the Ising model.
Imagine you have $n$ particles, each of which can be in one of two states, $-1$ or $+1$; we call this state the particle's spin. The spins can be collectively encoded as a spin vector $\sigma \in \{-1,+1\}^n$.
We want to estimate (for $\beta > 0$) the partition function $$Z = \sum_{\sigma\in \{\pm 1\}^n} e^{\beta f(\sigma)}$$ for various functions $f$.
Here is a really easy case. Suppose that $f(\sigma) = \sum_i h_i\sigma_i$. Then the sum factors over the coordinates: $$Z = \sum_{\sigma\in \{\pm 1\}^n} e^{\beta \sum_i h_i \sigma_i} = \prod_{i=1}^n \left(e^{\beta h_i} + e^{-\beta h_i} \right).$$
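To sanity-check this factorization, here is a minimal sketch (assuming NumPy; the values of $n$, $\beta$, and the fields $h_i$ are arbitrary illustrative choices) comparing the brute-force $2^n$-term sum against the product formula:

```python
import numpy as np
from itertools import product

# Illustrative small instance: n = 4 spins with random fields h_i.
rng = np.random.default_rng(0)
n = 4
beta = 1.5
h = rng.normal(size=n)

# Brute force: sum e^{beta * f(sigma)} over all 2^n spin vectors.
Z_brute = sum(
    np.exp(beta * np.dot(h, sigma))
    for sigma in product([-1, 1], repeat=n)
)

# Factored form: prod_i (e^{beta h_i} + e^{-beta h_i}).
Z_factored = np.prod(np.exp(beta * h) + np.exp(-beta * h))

print(Z_brute, Z_factored)  # agree up to floating-point error
```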
The first nontrivial case is quadratic $f$, which is known as the Ising model.
In the above, $f$ is actually the (neg)energy of the spins. When the energy is lower (i.e. $f$ is higher), the system is more stable and prefers to be in that state.
In particular, for the given (neg)energy $f$, we can define the probability distribution $$p(\sigma) \propto e^{\beta f(\sigma)}$$ known as the Boltzmann (or Gibbs) distribution. (Why this? See this later section.)
The parameter $\beta > 0$ is the inverse temperature of the system. We can study the behavior of the probability distribution for extreme $\beta$: as $\beta \to 0$, the distribution approaches the uniform distribution over $\{\pm 1\}^n$; as $\beta \to \infty$, it concentrates on the maximizers of $f$, i.e., the lowest-energy states.
In this framework, the partition function $Z$ is the normalizing constant for the distribution, and $\log Z$ is the log-partition function. This is important if we want to compute the probability $p(\sigma)$ explicitly.
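As a concrete illustration of the Boltzmann distribution, its normalizing constant, and the extreme-$\beta$ behavior above, here is a small sketch (assuming NumPy; the toy $f$ on three spins is a hypothetical choice):

```python
import numpy as np
from itertools import product

# Toy (neg)energy f on n = 3 spins; the coefficients are chosen by hand.
def f(sigma):
    s1, s2, s3 = sigma
    return s1 * s2 - 0.5 * s2 * s3 + 0.3 * s1

spins = list(product([-1, 1], repeat=3))

for beta in [0.01, 1.0, 10.0]:
    weights = np.array([np.exp(beta * f(s)) for s in spins])
    Z = weights.sum()   # partition function = normalizing constant
    p = weights / Z     # explicit Boltzmann probabilities
    print(f"beta={beta}: max prob = {p.max():.3f}")

# For beta near 0 the probabilities are close to uniform (1/8 each);
# for large beta, almost all mass sits on the maximizers of f.
```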
One last thing to note is that for linear $f$ (as we saw earlier), we get a product distribution over $\{\pm 1\}^n$, which is the same as saying that the spins are independent.
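A short sketch of this independence (the fields $h_i$ and the sample count are illustrative choices): each spin can be sampled on its own, with $P(\sigma_i = +1) = e^{\beta h_i}/(e^{\beta h_i} + e^{-\beta h_i})$, and the empirical means match $\tanh(\beta h_i)$:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, h = 1.0, np.array([0.5, -1.0, 2.0])  # illustrative fields

# For linear f, the Gibbs distribution factors, so sample each spin independently.
p_plus = np.exp(beta * h) / (np.exp(beta * h) + np.exp(-beta * h))
samples = np.where(rng.random((10000, len(h))) < p_plus, 1, -1)

# Mean of spin i is P(+1) - P(-1) = tanh(beta * h_i).
print(samples.mean(axis=0))
print(np.tanh(beta * h))
```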
Suppose that every pair of spins can either prefer alignment or anti-alignment. We encode this using interaction coefficients $\{J_{ij}\}$: when $J_{ij} > 0$, spins $i$ and $j$ prefer to align ($\sigma_i\sigma_j = +1$), and when $J_{ij} < 0$, they prefer to anti-align.
Summing these pairwise preferences gives the overall (neg)energy $$f(\sigma) = \sum_{i < j} J_{ij} \sigma_i \sigma_j.$$
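For small $n$, the Ising partition function can be computed by direct enumeration. Here is a sketch (the interaction matrix $J$ is a hypothetical random choice, and only its $i < j$ entries are used):

```python
import numpy as np
from itertools import product, combinations

# Illustrative Ising instance: n = 5 spins with random interactions.
rng = np.random.default_rng(2)
n, beta = 5, 0.7
J = rng.normal(size=(n, n))

def f(sigma):
    # Ising (neg)energy: sum over pairs i < j of J_ij * sigma_i * sigma_j.
    return sum(J[i, j] * sigma[i] * sigma[j] for i, j in combinations(range(n), 2))

# Exact for small n; the 2^n-term sum is why estimating Z is hard in general.
Z = sum(np.exp(beta * f(sigma)) for sigma in product([-1, 1], repeat=n))
print(Z)
```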