Overview of basic statistical mechanics of NNs

charlesmartin141 · Mar 07, 2025

About This Presentation

Overview of topics in the paper
A walk in the statistical mechanical formulation of neural networks (2014)
https://arxiv.org/abs/1407.5300

Audio: https://youtu.be/zIxg69Q8UTk


Slide Content

A Walk in the Statistical Mechanical Formulation
of Neural Networks
Based on Agliari et al.
March 7, 2025

Introduction
▶Statistical mechanics provides a theoretical framework for
neural networks.
▶Models inspired by spin systems describe memory and learning.
▶Key models: Curie-Weiss, Sherrington-Kirkpatrick, Hopfield,
Restricted Boltzmann Machines (RBM).

Hamiltonian in Statistical Mechanics
▶The Hamiltonian describes the energy of a system.
▶General form:
H = −Σ_{i<j} J_ij σ_i σ_j − Σ_i h_i σ_i  (1)
▶J_ij are the pairwise interaction couplings.
▶h_i are external fields.
▶σ_i are binary spin variables (±1).
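As a concrete illustration of Eq. (1), here is a minimal Python sketch (illustrative only, not from the paper) that evaluates the Hamiltonian for a single spin configuration; the system size, couplings, and fields are arbitrary placeholders.

```python
import numpy as np

def ising_energy(sigma, J, h):
    """Energy H = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i for spins sigma in {-1, +1}^N."""
    pair_term = np.sum(np.triu(J, k=1) * np.outer(sigma, sigma))  # each pair i<j counted once
    field_term = h @ sigma
    return -pair_term - field_term

rng = np.random.default_rng(0)
N = 8
sigma = rng.choice([-1, 1], size=N)                # one spin configuration
J = rng.normal(size=(N, N)); J = (J + J.T) / 2     # symmetric placeholder couplings
h = np.zeros(N)                                    # no external field
print(ising_energy(sigma, J, h))
```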

Free Energy and Partition Function
▶The free energy F determines system behavior.
▶It is related to the partition function Z:
Z = Σ_{σ} e^(−βH)  (2)
F = −(1/β) ln Z  (3)
▶β = 1/(k_B T) (inverse temperature).
▶Low F implies stability.
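For a handful of spins, the sum over all 2^N configurations in Eq. (2) can be carried out explicitly, which makes Z and F concrete. A minimal sketch with arbitrary placeholder couplings (feasible only for small N):

```python
import itertools
import numpy as np

def energy(sigma, J, h):
    # H = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i
    return -np.sum(np.triu(J, k=1) * np.outer(sigma, sigma)) - h @ sigma

def partition_and_free_energy(J, h, beta):
    # Z = sum over all 2^N configurations of exp(-beta * H); F = -(1/beta) ln Z
    N = len(h)
    Z = sum(np.exp(-beta * energy(np.array(c), J, h))
            for c in itertools.product([-1, 1], repeat=N))
    return Z, -np.log(Z) / beta

rng = np.random.default_rng(0)
N = 8
J = rng.normal(size=(N, N)) / N; J = (J + J.T) / 2
h = np.zeros(N)
Z, F = partition_and_free_energy(J, h, beta=1.0)
print(f"Z = {Z:.3f}, F = {F:.3f}")
```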

Training and Generalization Errors
▶Following Agliari et al. (2014), we define the Average Training Error in the annealed approximation as:
Ē^an_train := −(1/n) ∂(ln Z^an_n)/∂β = −(1/n)(1/Z^an_n) ∂Z^an_n/∂β.  (4)
▶Since −∂(ln Z)/∂β = ⟨H⟩, this is the thermal average of the (annealed) energy per example.

Generalization Error
▶The Average Generalization Error measures how predictions change as more data is introduced:
Ē^an_gen := −(1/β) ∂(ln Z^an_n)/∂n + (1/β) ln z(β) = −(1/β)(1/Z^an_n) ∂Z^an_n/∂n + (1/β) ln z(β).  (5)

Curie-Weiss Model
▶Simplest spin system with all-to-all interactions:
H = −(J/N) Σ_{i<j} σ_i σ_j  (6)
Figure: σ_i = ±1; activated (σ_i = +1) vs. inactive (σ_i = −1).

Why is the Curie-Weiss Model Important?
▶**Simple but Powerful:** Captures the fundamental behavior
of phase transitions.
▶**Mean-Field Approximation:** Treats each spin as
interacting with an average field.
▶**Key Order Parameter:** The magnetization ⟨m⟩ distinguishes phases.
▶**Connection to Neural Networks:** Leads to the Hopfield model and associative memory.
⟨m⟩ = tanh(βJ⟨m⟩).  (7)

Paramagnetic vs. Ferromagnetic Phases
Figure: paramagnetic phase (T > Tc), random spin orientations; ferromagnetic phase (T < Tc), aligned spins (ordered state); phase transition at Tc.
Paramagnetic Phase (T > Tc)
▶Spins are randomly oriented, ⟨m⟩ = 0.
▶High thermal noise destroys
correlations.
▶Analogous to an **untrained
neural network** with
random weights.
Ferromagnetic Phase (T < Tc)
▶Spins align, leading to ⟨m⟩ ≠ 0.
▶System has structured order.
▶Analogous to a **trained
neural network**, where
memories are stored.

Learning as a Phase Transition
▶In neural networks, learning corresponds to **moving from
disorder to order**.
▶Before training: **paramagnetic-like state** (random
weights).
▶During training: **system near criticality** (learning increases
correlations).
▶After training: **ferromagnetic-like state** (patterns are
stored and retrieved).
⟨m⟩ = tanh(βJ⟨m⟩).  (8)

Mean-Field Approximation: Concept
▶Mean-field approximation simplifies many-body interactions by
replacing them with an **average effect**.
▶Instead of considering all pairwise interactions, each spin σ_i experiences an **effective field**:
H_MF = −Jm Σ_i σ_i.  (9)
▶The magnetization m satisfies:
m = tanh(βJm).  (10)
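Equation (10) can be solved numerically by fixed-point iteration. A minimal sketch (assuming J = 1, so Tc = 1 in these units) that scans a few temperatures and shows the magnetization vanishing above Tc:

```python
import numpy as np

def mean_field_magnetization(beta, J=1.0, m0=0.9, tol=1e-10, max_iter=100000):
    """Iterate m <- tanh(beta * J * m) until convergence (fixed point of Eq. 10)."""
    m = m0
    for _ in range(max_iter):
        m_new = np.tanh(beta * J * m)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

for T in [0.5, 0.8, 0.95, 1.2, 2.0]:
    m = mean_field_magnetization(beta=1.0 / T)
    print(f"T = {T:.2f}  ->  <m> = {m:.4f}")
# Below Tc = 1 the iteration settles at a nonzero magnetization (ferromagnetic);
# above Tc it converges to m = 0 (paramagnetic).
```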

Mean-Field Approximation: Saddle-Point and Large-N
▶More rigorously, mean-field theory arises from **saddle-point approximations** in the **large-N limit**.
▶Define the magnetization order parameter:
m = (1/N) Σ_i σ_i.  (11)
▶Rewrite the partition function:
Z = ∫ dm e^(−βN f(m)).  (12)
▶Evaluating this integral via the **saddle-point method** gives:
Z ≈ e^(−βN f(m*)), where (df/dm)|_{m=m*} = 0.  (13)
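A minimal numerical sketch of the saddle-point condition (13). The explicit Curie-Weiss free-energy density used here, f(m) = Jm²/2 − (1/β) ln[2 cosh(βJm)], is an assumption (the standard textbook form; the slide only states the general result). Minimizing it on a grid recovers an m* that satisfies m = tanh(βJm):

```python
import numpy as np

def f(m, beta, J=1.0):
    # assumed Curie-Weiss free-energy density: f(m) = J m^2 / 2 - (1/beta) ln(2 cosh(beta J m))
    return J * m**2 / 2 - np.log(2 * np.cosh(beta * J * m)) / beta

beta, J = 2.0, 1.0                            # T = 0.5 < Tc = 1
m_grid = np.linspace(0, 1, 10001)             # restrict to m >= 0 by symmetry
m_star = m_grid[np.argmin(f(m_grid, beta, J))]
print(f"saddle point   m* = {m_star:.4f}")
print(f"tanh(beta*J*m*)   = {np.tanh(beta * J * m_star):.4f}")  # df/dm = 0  <=>  m = tanh(beta J m)
```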

Magnetization vs. Temperature (From Paper)
Figure: ⟨m⟩ versus temperature T for the Curie-Weiss model.
▶Simulation results (black dots) match the theoretical prediction (green curve).
▶The inset shows a **power-law decay** near the critical temperature.
▶The system transitions from an **ordered** state to a **disordered** state at Tc = 1.

Curie-Weiss Model and Phase Transition
▶The Curie-Weiss model exhibits a phase transition at Tc = 1.
▶For T > Tc, the system is **paramagnetic**: ⟨m⟩ = 0.
▶For T < Tc, the system is **ferromagnetic**: ⟨m⟩ ≠ 0.
⟨m⟩ ∼ (Tc − T)^β as T → Tc from below, with critical exponent β = 1/2.  (14)

Sherrington-Kirkpatrick (SK) Model
▶Generalization of the Curie-Weiss model with random interactions:
H = −Σ_{i<j} J_ij σ_i σ_j  (15)
Figure: rugged energy landscape (energy vs. configuration space) with a global minimum and local minima separated by energy barriers.
▶J_ij drawn from a Gaussian distribution.
▶Describes spin glasses.
▶Many metastable states → complex energy landscape.
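To make the "many metastable states" point concrete, one can sample Gaussian couplings (the 1/N variance scaling below is a common convention, not stated on the slides) and run greedy single-spin-flip descent from different random starts; different starts typically get stuck in different local minima.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
J = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
J = np.triu(J, k=1); J = J + J.T               # symmetric couplings, zero diagonal

def greedy_descent(sigma, J):
    """Flip the spin that lowers the energy most, until no single flip helps (a local minimum)."""
    sigma = sigma.copy()
    while True:
        delta_E = 2 * sigma * (J @ sigma)      # energy change from flipping each spin
        i = np.argmin(delta_E)
        if delta_E[i] >= 0:
            return -0.5 * sigma @ J @ sigma    # energy H = -sum_{i<j} J_ij s_i s_j at the local minimum
        sigma[i] *= -1

minima = [greedy_descent(rng.choice([-1, 1], size=N), J) for _ in range(5)]
print(np.round(minima, 3))                     # several distinct local-minimum energies
```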

Hopfield Model
▶Neural network with the Hebbian learning rule:
J_ij = (1/N) Σ_{μ=1}^{P} ξ^μ_i ξ^μ_j  (16)
▶Stores P patterns ξ^μ.
▶Retrieval via attractor dynamics.
▶Works as an associative memory.
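A minimal sketch of Hebbian storage (Eq. 16) and retrieval by iterating σ ← sign(Jσ) from a corrupted cue; the sizes and the 10% corruption level are illustrative, chosen so that the load α = P/N is well below the retrieval limit:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 5                                      # alpha = P/N = 0.025, deep in the retrieval phase
xi = rng.choice([-1, 1], size=(P, N))              # P random patterns xi^mu
J = (xi.T @ xi) / N                                # Hebbian couplings J_ij = (1/N) sum_mu xi_i xi_j
np.fill_diagonal(J, 0)                             # no self-coupling

def retrieve(sigma, J, steps=20):
    """Synchronous attractor dynamics: sigma <- sign(J @ sigma)."""
    for _ in range(steps):
        sigma = np.where(J @ sigma >= 0, 1, -1)
    return sigma

cue = xi[0].copy()
flipped = rng.choice(N, size=N // 10, replace=False)
cue[flipped] *= -1                                 # corrupt 10% of the bits
overlap = retrieve(cue, J) @ xi[0] / N             # overlap m = 1 means perfect recall
print(f"overlap with stored pattern: {overlap:.3f}")
```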

Hopfield Model: Phase Diagram Overview
Figure: Phase diagram of the Hopfield model. α = P/N controls the storage load, while T sets the noise level.
Key Phases:
▶Paramagnetic (PM): HighT, no memory retrieval.
▶Retrieval (R): LowT, memory recall works.
▶Spin Glass (SG): Highα, patterns interfere.
▶Coexistence (SG+R): Some retrieval, but errors.

Hopfield Model: Paramagnetic vs. Retrieval Phases
Paramagnetic (PM) Phase:
▶High temperature (T>1).
▶Random neuron activity
(⟨m⟩= 0).
▶No memory retrieval –
noise dominates.
▶Analogous to an
**untrained neural
network**.
Retrieval (R) Phase:
▶Low temperature (T<1),
moderate storage load.
▶Neurons settle into
**stored patterns**
(⟨m⟩ ≠ 0).
▶Network correctly recalls
memories.
▶This is the **optimal
memory regime**.

Hopfield Model: Spin Glass and SG+R Phases
Spin Glass (SG) Phase:
▶High storage load
(α >0.14).
▶Overlapping patterns cause
**interference**.
▶Network gets stuck in
**spurious states**.
▶Analogous to **overfitting
in machine learning**.
Coexistence (SG+R):
▶Mix of **retrieval and
glassy behavior**.
▶Some patterns can be
recalled, but with errors.
▶Unstable memories due to
pattern interference.
▶Happens at intermediateα
andT.

Restricted Boltzmann Machines (RBMs)
▶RBMs are **energy-based models** with two layers:
▶Visible units v_i (observed data).
▶Hidden units h_j (latent variables).
▶The RBM energy function is:
H = −Σ_{i,j} W_ij v_i h_j − Σ_i b_i v_i − Σ_j c_j h_j  (17)
where:
▶W_ij connects visible and hidden units.
▶b_i, c_j are biases for visible and hidden units.
▶**Key Property:** No intra-layer connections.
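A minimal sketch of the energy function (17) for a single (visible, hidden) configuration; the layer sizes, weights, and the binary {0, 1} unit convention are illustrative assumptions:

```python
import numpy as np

def rbm_energy(v, h, W, b, c):
    """H(v, h) = -v^T W h - b^T v - c^T h."""
    return -(v @ W @ h) - (b @ v) - (c @ h)

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))   # visible-hidden weights W_ij
b = np.zeros(n_visible)                                  # visible biases b_i
c = np.zeros(n_hidden)                                   # hidden biases c_j
v = rng.integers(0, 2, size=n_visible)                   # one binary visible configuration
h = rng.integers(0, 2, size=n_hidden)                    # one binary hidden configuration
print(rbm_energy(v, h, W, b, c))
```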

Mean-Field Approximation in RBMs
▶Since hidden units are **conditionally independent** given the visible units:
p(h_j = 1 | v) = σ(Σ_i W_ij v_i + c_j)  (18)
▶Summing over the hidden units simplifies the partition function:
Z = Σ_{v} e^(Σ_i b_i v_i) Π_j [1 + e^(Σ_i W_ij v_i + c_j)]  (19)
▶This eliminates h_j and gives an **effective energy function** for the visible units.
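A minimal sketch of Eqs. (18) and (19): the factorized conditional p(h_j = 1 | v) and the effective energy of the visible layer obtained by summing out binary hidden units (layer sizes and weights are placeholders):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_hidden_given_visible(v, W, c):
    """p(h_j = 1 | v) = sigmoid(sum_i W_ij v_i + c_j), as in Eq. (18)."""
    return sigmoid(v @ W + c)

def effective_energy(v, W, b, c):
    """-log of the unnormalized marginal over v: each hidden unit contributes log(1 + exp(.)), as in Eq. (19)."""
    return -(b @ v) - np.sum(np.log1p(np.exp(v @ W + c)))

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 4))
b, c = np.zeros(6), np.zeros(4)
v = rng.integers(0, 2, size=6)
print(p_hidden_given_visible(v, W, c))
print(effective_energy(v, W, b, c))
```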

RBMs and Their Connection to Other Models
▶After summing over the hidden units, we obtain an **effective energy function**:
H_eff = −Σ_{i<j} J_ij v_i v_j − Σ_i b_i v_i  (20)
where:
J_ij = Σ_μ W_iμ W_jμ  (21)
▶This links RBMs to:
▶**Hopfield Networks**: Weights define an associative memory
model.
▶**Sigmoidal Neural Networks**: Hidden units transform
inputs nonlinearly.
▶**Spin Glasses**: When weights are random, RBMs resemble
spin-glass systems.
▶**Key Insight:** RBMs serve as a bridge between statistical
mechanics and deep learning.
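Numerically, the effective coupling of Eq. (21) is just J = W Wᵀ, which makes the Hopfield analogy explicit: the hidden units play a role similar to stored patterns. A minimal sketch with placeholder weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
J_eff = W @ W.T                        # J_ij = sum_mu W_imu W_jmu, Eq. (21)
np.fill_diagonal(J_eff, 0)             # drop self-couplings, as in the Hopfield model
print(np.round(J_eff, 3))
```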

Summary
▶Statistical mechanics provides a framework for neural
networks.
▶Curie-Weiss model explains basic ferromagnetism.
▶SK model introduces disorder and complexity.
▶Hopfield model connects to associative memory.
▶RBM links statistical physics to machine learning.

References
▶E. Agliari et al., "A walk in the statistical mechanical formulation of neural networks" (2014). arXiv:1407.5300, https://arxiv.org/abs/1407.5300.