Insights from statistical mechanics
Thinking about forward and backward processes
I ran into a wonderfully insightful blog post by Alex Alemi which ties modern machine-learning thinking to classical statistical mechanics, and is able to straightforwardly derive some famous statistical results that have always intrigued me. Go read his well-argued post; this one just writes up the formulas so as to have them at hand for future reference. I’m putting this here as a follow-up to my previous post:
The argument considers distributions given by a canonical ensemble or Gibbs measure,

  p(x) = e^(−βH(x)) / Z,   Z = Σₓ e^(−βH(x)),

so that −βF := log Z. We use Alex’s idea to consider the joint probabilities of a forward process p and a backward process q, assuming x₁ is some event occurring after x₀. Writing the two joint probabilities in distinct ways, forward and backward,

  p(x₀, x₁) = p(x₁|x₀) p(x₀) = p(x₁|x₀) e^(−βH₀(x₀)) / Z₀
  q(x₀, x₁) = q(x₀|x₁) q(x₁) = q(x₀|x₁) e^(−βH₁(x₁)) / Z₁,

and taking their ratio gives

  p(x₀, x₁) / q(x₀, x₁) = [p(x₁|x₀) / q(x₀|x₁)] · e^(β(W − ΔF)),
where βΔF = β(F₁ − F₀) = log(Z₀/Z₁) is the change in free energy and W := H₁(x₁) − H₀(x₀) is the work done¹. Now in equilibrium, when q(x₀|x₁) = p(x₁|x₀) (the forward process x₀→x₁ and the backward process x₁→x₀ are equally likely), we read off the Crooks Fluctuation Theorem (Wiki):

  p(x₀, x₁) / q(x₀, x₁) = e^(β(W − ΔF)).

The famous Jarzynski equality (Wiki) follows immediately by averaging e^(−βW) over the forward process:

  ⟨e^(−βW)⟩_p = Σ p(x₀, x₁) e^(−βW) = Σ q(x₀, x₁) e^(−βΔF) = e^(−βΔF),
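The Crooks relation can be verified exactly on a small discrete system. The sketch below is my own illustration, not from the post: I pick arbitrary energies H₀, H₁ for a hypothetical four-state system and a symmetric transition kernel K (so that q(x₀|x₁) = p(x₁|x₀) holds by construction), then check the identity element-wise.

```python
import numpy as np

beta = 1.0
# hypothetical 4-state system; energies are illustrative, not from the post
H0 = np.array([0.0, 0.7, 1.3, 2.0])
H1 = np.array([0.5, 0.1, 1.0, 1.6])
n = len(H0)

Z0, Z1 = np.exp(-beta * H0).sum(), np.exp(-beta * H1).sum()
dF = (np.log(Z0) - np.log(Z1)) / beta    # βΔF = log(Z0/Z1)

# a symmetric transition kernel K(x1|x0) = K(x0|x1): uniform over states
K = np.full((n, n), 1.0 / n)

# forward joint  p(x0, x1) = K(x1|x0) e^{-βH0(x0)} / Z0
# backward joint q(x0, x1) = K(x0|x1) e^{-βH1(x1)} / Z1
p = (np.exp(-beta * H0) / Z0)[:, None] * K
q = (np.exp(-beta * H1) / Z1)[None, :] * K.T

W = H1[None, :] - H0[:, None]            # work W = H1(x1) - H0(x0) per pair
crooks_rhs = np.exp(beta * (W - dF))

assert np.allclose(p / q, crooks_rhs)    # Crooks holds exactly, pair by pair
```

With the symmetric kernel the conditionals cancel in the ratio, leaving exactly e^(β(W − ΔF)); any symmetric K (e.g. a random symmetric stochastic matrix) would do equally well.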
since F does not depend on any x. These results from non-equilibrium thermodynamics address the Loschmidt paradox (Wiki) about the arrow of time.
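The Jarzynski equality is also easy to check numerically. The sketch below, again my own illustration with made-up energies, treats the simplest case of an instantaneous quench H₀ → H₁: sample microstates from the H₀ equilibrium, record the work W(x) = H₁(x) − H₀(x), and compare ⟨e^(−βW)⟩ against e^(−βΔF).

```python
import numpy as np

beta = 2.0
rng = np.random.default_rng(0)
# hypothetical 5-state system; energies are illustrative, not from the post
H0 = np.array([0.0, 1.0, 0.4, 2.0, 1.5])
H1 = np.array([0.3, 0.2, 1.8, 1.0, 0.7])

Z0, Z1 = np.exp(-beta * H0).sum(), np.exp(-beta * H1).sum()
dF = (np.log(Z0) - np.log(Z1)) / beta        # ΔF = F1 - F0

p0 = np.exp(-beta * H0) / Z0                 # equilibrium start under H0
# instantaneous quench: work on microstate x is W(x) = H1[x] - H0[x]
x = rng.choice(len(H0), size=500_000, p=p0)
W = H1[x] - H0[x]

jarzynski_lhs = np.exp(-beta * W).mean()     # Monte Carlo <e^{-βW}>
jarzynski_rhs = np.exp(-beta * dF)           # e^{-βΔF} = Z1/Z0

# exact average for comparison: Σ_x p0(x) e^{-βW(x)} = Z1/Z0
exact = (p0 * np.exp(-beta * (H1 - H0))).sum()
assert np.isclose(exact, jarzynski_rhs)
```

Note that even though the system never equilibrates under H₁, the exponential average of the work recovers the equilibrium free-energy difference, which is the striking content of the equality.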
¹ If you’re reading along with Alex’s blog, I think I have a sign difference here because I always confuse work done ON the system with work done BY the system.

