Overview of topics in the paper
A walk in the statistical mechanical formulation of neural networks (2014)
https://arxiv.org/abs/1407.5300
Audio: https://youtu.be/zIxg69Q8UTk
Slides: 24 pages
Slide Content
A Walk in the Statistical Mechanical Formulation
of Neural Networks
Based on Agliari et al.
March 7, 2025
Introduction
▶ Statistical mechanics provides a theoretical framework for neural networks.
▶ Models inspired by spin systems describe memory and learning.
▶ Key models: Curie-Weiss, Sherrington-Kirkpatrick, Hopfield, Restricted Boltzmann Machines (RBM).
Hamiltonian in Statistical Mechanics
▶ The Hamiltonian describes the energy of a system.
▶ General form:
H = −∑_{i<j} J_ij σ_i σ_j − ∑_i h_i σ_i   (1)
▶ J_ij are interaction terms.
▶ h_i represents external fields.
▶ σ_i are binary spin variables (±1).
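As a concrete illustration, here is a minimal Python sketch of Eq. (1) for a small system; the couplings J, fields h, and spins sigma below are made-up example values, not taken from the paper:

```python
import numpy as np

def hamiltonian(sigma, J, h):
    """Energy H = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i."""
    # Upper triangle only, so each pair (i, j) with i < j counts once.
    pair_term = np.sum(np.triu(J, k=1) * np.outer(sigma, sigma))
    field_term = np.dot(h, sigma)
    return -pair_term - field_term

sigma = np.array([1, -1, 1])          # binary spins, +/-1
J = np.array([[0.0, 0.5, -0.2],
              [0.5, 0.0, 0.3],
              [-0.2, 0.3, 0.0]])      # symmetric example couplings
h = np.array([0.1, 0.0, -0.1])        # example external fields
print(hamiltonian(sigma, J, h))
```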
Free Energy and Partition Function
▶ The free energy F determines system behavior.
▶ Related to the partition function Z:
Z = ∑_{σ} e^{−βH}   (2)
F = −(1/β) ln Z   (3)
▶ β = 1/(k_B T) (inverse temperature).
▶ Low F implies stability.
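For a handful of spins, Eqs. (2)–(3) can be evaluated directly by enumerating all 2^N configurations; a minimal sketch with arbitrary example couplings:

```python
import itertools
import numpy as np

def energy(sigma, J, h):
    # H = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i
    return -np.sum(np.triu(J, k=1) * np.outer(sigma, sigma)) - np.dot(h, sigma)

def free_energy(J, h, beta):
    """F = -(1/beta) ln Z, with Z summed over all 2^N spin configurations."""
    N = len(h)
    Z = sum(np.exp(-beta * energy(np.array(s), J, h))
            for s in itertools.product([-1, 1], repeat=N))
    return -np.log(Z) / beta

rng = np.random.default_rng(0)
J = rng.normal(size=(4, 4)); J = (J + J.T) / 2   # symmetric example couplings
h = np.zeros(4)
for beta in [0.1, 1.0, 10.0]:                    # high to low temperature
    print(beta, free_energy(J, h, beta))
```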
Training and Generalization Errors
▶ We define the Average Training Error in the annealed approximation as:
Ē^an_train := −(1/n) ∂(ln Z^an_n)/∂β = −(1/n) (1/Z^an_n) ∂Z^an_n/∂β.   (4)
Generalization Error
▶ The Average Generalization Error measures how predictions change as more data is introduced:
Ē^an_gen := −(1/β) ∂(ln Z^an_n)/∂n + (1/β) ln z(β) = −(1/β) (1/Z^an_n) ∂Z^an_n/∂n + (1/β) ln z(β).   (5)
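Both quantities are derivatives of ln Z^an_n. As a purely illustrative sketch, assume a made-up closed form for ln Z^an_n (the real one comes from the annealed average in the paper) and estimate the derivatives by finite differences:

```python
import numpy as np

def log_Z_an(n, beta):
    # Hypothetical stand-in for ln Z^an_n; the true function comes from
    # the annealed average over the training set in Agliari et al.
    return n * np.log(2 * np.cosh(beta)) - 0.5 * n * beta**2

def train_error(n, beta, eps=1e-5):
    # E_train = -(1/n) d(ln Z^an_n)/d(beta), via central finite difference
    return -(log_Z_an(n, beta + eps) - log_Z_an(n, beta - eps)) / (2 * eps * n)

def gen_error(n, beta, log_z, eps=1e-3):
    # E_gen = -(1/beta) d(ln Z^an_n)/dn + (1/beta) ln z(beta)
    dn = (log_Z_an(n + eps, beta) - log_Z_an(n - eps, beta)) / (2 * eps)
    return (-dn + log_z) / beta

print(train_error(100, 0.5))
print(gen_error(100, 0.5, log_z=np.log(2 * np.cosh(0.5))))
```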
Curie-Weiss Model
▶ Simplest spin system with all-to-all interactions:
H = −(J/N) ∑_{i<j} σ_i σ_j   (6)
Figure: activation function f(x); activated state (σ_i = +1) vs. inactive state (σ_i = −1).
Why is the Curie-Weiss Model Important?
▶ **Simple but Powerful:** Captures the fundamental behavior of phase transitions.
▶ **Mean-Field Approximation:** Treats each spin as interacting with an average field.
▶ **Key Order Parameter:** The magnetization ⟨m⟩ distinguishes phases.
▶ **Connection to Neural Networks:** Leads to the Hopfield model and associative memory.
⟨m⟩ = tanh(βJ⟨m⟩).   (7)
Paramagnetic vs. Ferromagnetic Phases
Figure: random spin orientations (paramagnetic phase, T > T_c) vs. aligned spins in an ordered state (ferromagnetic phase, T < T_c); phase transition at T_c.
Paramagnetic Phase (T > T_c)
▶ Spins are randomly oriented, ⟨m⟩ = 0.
▶ High thermal noise destroys correlations.
▶ Analogous to an **untrained neural network** with random weights.
Ferromagnetic Phase (T < T_c)
▶ Spins align, leading to ⟨m⟩ ≠ 0.
▶ System has structured order.
▶ Analogous to a **trained neural network**, where memories are stored.
Learning as a Phase Transition
▶ In neural networks, learning corresponds to **moving from disorder to order**.
▶ Before training: **paramagnetic-like state** (random weights).
▶ During training: **system near criticality** (learning increases correlations).
▶ After training: **ferromagnetic-like state** (patterns are stored and retrieved).
⟨m⟩ = tanh(βJ⟨m⟩).   (8)
Mean-Field Approximation: Concept
▶ Mean-field approximation simplifies many-body interactions by replacing them with an **average effect**.
▶ Instead of considering all pairwise interactions, each spin σ_i experiences an **effective field**:
H_MF = −Jm ∑_i σ_i.   (9)
▶ The magnetization m satisfies:
m = tanh(βJm).   (10)
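Eq. (10) has no closed-form solution for m, but fixed-point iteration converges quickly; a minimal sketch with J = 1 (so T_c = 1):

```python
import numpy as np

def solve_magnetization(beta, J=1.0, m0=0.5, tol=1e-10, max_iter=10_000):
    """Iterate m <- tanh(beta * J * m) to a fixed point."""
    m = m0
    for _ in range(max_iter):
        m_new = np.tanh(beta * J * m)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

# Above T_c = J = 1 the iteration collapses to m = 0 (paramagnet);
# below T_c it converges to a nonzero magnetization (ferromagnet).
for T in [2.0, 1.5, 1.1, 0.8, 0.5]:
    print(T, solve_magnetization(beta=1.0 / T))
```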
Mean-Field Approximation: Saddle-Point and Large-N
▶ More rigorously, mean-field theory arises from **saddle-point approximations** in the **large-N limit**.
▶ Define the magnetization order parameter:
m = (1/N) ∑_i σ_i.   (11)
▶ Rewrite the partition function:
Z = ∫ dm e^{−βNf(m)}.   (12)
▶ Evaluating this integral via the **saddle-point method** gives:
Z ≈ e^{−βNf(m*)},  where  df/dm |_{m*} = 0.   (13)
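For the Curie-Weiss model the function f(m) in Eq. (12) can be written down explicitly; the following LaTeX snippet sketches the standard computation and shows that the saddle-point condition reproduces Eq. (10):

```latex
% Curie-Weiss free-energy density obtained by decoupling the
% pairwise interaction with the order parameter m (standard result):
\[
  f(m) = \frac{J m^2}{2} - \frac{1}{\beta}\,\ln\!\bigl(2\cosh(\beta J m)\bigr).
\]
% The saddle-point condition df/dm = 0 at m^* gives
\[
  J m^* - J \tanh(\beta J m^*) = 0
  \quad\Longrightarrow\quad
  m^* = \tanh(\beta J m^*),
\]
% which is exactly the self-consistency equation (10).
```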
Magnetization vs. Temperature (From Paper)
Figure: ⟨m⟩ versus temperature T for the Curie-Weiss model.
▶ Simulation results (black dots) match the theoretical prediction (green curve).
▶ The inset shows a **power-law decay** near the critical temperature.
▶ The system transitions from an **ordered** to a **disordered** state at T_c = 1.
Curie-Weiss Model and Phase Transition
▶ The Curie-Weiss model exhibits a phase transition at T_c = 1.
▶ For T > T_c, the system is **paramagnetic**: ⟨m⟩ = 0.
▶ For T < T_c, the system is **ferromagnetic**: ⟨m⟩ ≠ 0.
⟨m⟩ ∼ (T_c − T)^β,  with β = 1/2.   (14)
▶ Here β denotes the critical exponent, not the inverse temperature.
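The scaling in Eq. (14) can be checked numerically by iterating the self-consistency equation just below T_c; in this sketch (J = 1, so T_c = 1) the ratio ⟨m⟩ / √(T_c − T) should approach the constant √3 ≈ 1.73:

```python
import numpy as np

def magnetization(T, J=1.0, iters=200_000):
    # Fixed-point iteration of m = tanh(J m / T); many iterations are
    # used because convergence slows down close to T_c.
    m = 0.5
    for _ in range(iters):
        m = np.tanh(J * m / T)
    return m

for dT in [0.1, 0.01, 0.001]:
    T = 1.0 - dT
    m = magnetization(T)
    # If <m> ~ (T_c - T)^(1/2), then m / sqrt(dT) approaches a constant.
    print(dT, m, m / np.sqrt(dT))
```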
Sherrington-Kirkpatrick (SK) Model
▶ Generalization of Curie-Weiss with random interactions:
Figure: energy landscape over configuration space with a global minimum, local minima, and energy barriers between metastable states.
H = −∑_{i<j} J_ij σ_i σ_j   (15)
▶ J_ij drawn from a Gaussian distribution.
▶ Describes spin glasses.
▶ Many metastable states → complex energy landscape.
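A small numerical sketch of this ruggedness: draw Gaussian couplings for N = 10 spins and count how many of the 2^N configurations are stable against every single spin flip (all parameters are example choices):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N = 10
J = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)                 # Gaussian couplings, no self-interaction

def energy(s):
    return -0.5 * s @ J @ s              # H = -sum_{i<j} J_ij s_i s_j

local_minima = 0
for conf in itertools.product([-1, 1], repeat=N):
    s = np.array(conf, dtype=float)
    E = energy(s)
    # Local minimum: no single spin flip lowers the energy.
    if all(energy(s * np.where(np.arange(N) == i, -1, 1)) >= E for i in range(N)):
        local_minima += 1
print("single-flip local minima:", local_minima)
```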
Hopfield Model
▶ Neural network with Hebbian learning rule:
J_ij = (1/N) ∑_{µ=1}^{P} ξ^µ_i ξ^µ_j   (16)
▶ Stores P patterns ξ^µ.
▶ Retrieval via attractor dynamics.
▶ Works as an associative memory.
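A minimal Python sketch of Hebbian storage, Eq. (16), and zero-temperature retrieval dynamics; N, P, and the noise level are example choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 5
xi = rng.choice([-1, 1], size=(P, N))    # P random binary patterns

J = (xi.T @ xi) / N                      # Hebbian rule: J_ij = (1/N) sum_mu xi^mu_i xi^mu_j
np.fill_diagonal(J, 0.0)

def retrieve(state, steps=20):
    """Zero-temperature synchronous dynamics: s <- sign(J s)."""
    for _ in range(steps):
        state = np.sign(J @ state)
        state[state == 0] = 1
    return state

# Start from pattern 0 with 20% of spins flipped, then let the dynamics run.
noisy = xi[0] * np.where(rng.random(N) < 0.2, -1, 1)
recovered = retrieve(noisy.astype(float))
print("fraction of sites matching the stored pattern:", np.mean(recovered == xi[0]))
```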
Hopfield Model: Phase Diagram Overview
Figure: Phase diagram of the Hopfield model. α = P/N controls the storage load, while T affects the noise.
Key Phases:
▶ Paramagnetic (PM): High T, no memory retrieval.
▶ Retrieval (R): Low T, memory recall works.
▶ Spin Glass (SG): High α, patterns interfere.
▶ Coexistence (SG+R): Some retrieval, but with errors.
Hopfield Model: Paramagnetic vs. Retrieval Phases
Paramagnetic (PM) Phase:
▶ High temperature (T > 1).
▶ Random neuron activity (⟨m⟩ = 0).
▶ No memory retrieval – noise dominates.
▶ Analogous to an **untrained neural network**.
Retrieval (R) Phase:
▶ Low temperature (T < 1), moderate storage load.
▶ Neurons settle into **stored patterns** (⟨m⟩ ≠ 0).
▶ Network correctly recalls memories.
▶ This is the **optimal memory regime**.
Hopfield Model: Spin Glass and SG+R Phases
Spin Glass (SG) Phase:
▶ High storage load (α > 0.14).
▶ Overlapping patterns cause **interference**.
▶ Network gets stuck in **spurious states**.
▶ Analogous to **overfitting in machine learning**.
Coexistence (SG+R):
▶ Mix of **retrieval and glassy behavior**.
▶ Some patterns can be recalled, but with errors.
▶ Unstable memories due to pattern interference.
▶ Happens at intermediate α and T.
Restricted Boltzmann Machines (RBMs)
▶ RBMs are **energy-based models** with two layers:
▶ Visible units v_i (observed data).
▶ Hidden units h_j (latent variables).
▶ The RBM energy function is:
H = −∑_{i,j} W_ij v_i h_j − ∑_i b_i v_i − ∑_j c_j h_j   (17)
where:
▶ W_ij connects visible and hidden units.
▶ b_i, c_j are biases for visible and hidden units.
▶ **Key Property:** No intra-layer connections.
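A minimal sketch of Eq. (17) for small binary layers; W, b, and c are random placeholders rather than trained parameters:

```python
import numpy as np

def rbm_energy(v, h, W, b, c):
    """H = -sum_ij W_ij v_i h_j - sum_i b_i v_i - sum_j c_j h_j."""
    return -(v @ W @ h) - (b @ v) - (c @ h)

rng = np.random.default_rng(0)
n_vis, n_hid = 6, 3
W = rng.normal(scale=0.1, size=(n_vis, n_hid))   # visible-hidden couplings only
b = rng.normal(scale=0.1, size=n_vis)            # visible biases
c = rng.normal(scale=0.1, size=n_hid)            # hidden biases
v = rng.choice([0, 1], size=n_vis).astype(float)
h = rng.choice([0, 1], size=n_hid).astype(float)
print(rbm_energy(v, h, W, b, c))
```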
Mean-Field Approximation in RBMs
▶ Since hidden units are **conditionally independent** given the visible units:
p(h_j = 1 | v) = σ(∑_i W_ij v_i + c_j)   (18)
▶ Summing over the hidden units simplifies the partition function:
Z = ∑_{v} e^{∑_i b_i v_i} ∏_j (1 + e^{∑_i W_ij v_i + c_j})   (19)
▶ This eliminates h_j and gives an **effective energy function** for the visible units.
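Both formulas can be verified on a tiny RBM; a minimal sketch with random example parameters and {0, 1} units (σ in Eq. (18) is the logistic sigmoid):

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_vis, n_hid = 4, 3
W = rng.normal(scale=0.5, size=(n_vis, n_hid))
b = rng.normal(scale=0.5, size=n_vis)
c = rng.normal(scale=0.5, size=n_hid)

v = np.array([1.0, 0.0, 1.0, 1.0])
print("p(h_j = 1 | v):", sigmoid(v @ W + c))    # Eq. (18)

# Z with the hidden units summed out analytically, Eq. (19):
Z = sum(np.exp(b @ np.array(vv)) * np.prod(1 + np.exp(np.array(vv) @ W + c))
        for vv in itertools.product([0.0, 1.0], repeat=n_vis))
print("Z (hidden units marginalized):", Z)
```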
RBMs and Their Connection to Other Models
▶ After summing over hidden units, we obtain an **effective energy function**:
H_eff = −∑_{i<j} J_ij v_i v_j − ∑_i b_i v_i   (20)
where:
J_ij = ∑_µ W_iµ W_jµ   (21)
▶ This links RBMs to:
▶ **Hopfield Networks**: Weights define an associative memory model.
▶ **Sigmoidal Neural Networks**: Hidden units transform inputs nonlinearly.
▶ **Spin Glasses**: When weights are random, RBMs resemble spin-glass systems.
▶ **Key Insight:** RBMs serve as a bridge between statistical mechanics and deep learning.
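To make the Hopfield connection concrete: Eq. (21) gives J = W Wᵀ, which has the same outer-product structure as the Hebbian rule of Eq. (16), with the columns of W playing the role of stored patterns (up to the 1/N normalization). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid = 5, 2
W = rng.choice([-1.0, 1.0], size=(n_vis, n_hid))  # columns act like stored patterns

J_eff = W @ W.T                                    # Eq. (21): J_ij = sum_mu W_imu W_jmu

# Hebbian couplings built from the same "patterns" (columns of W):
J_hebb = sum(np.outer(W[:, mu], W[:, mu]) for mu in range(n_hid))
print(np.allclose(J_eff, J_hebb))                  # True: identical structure
```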
Summary
▶ Statistical mechanics provides a framework for neural networks.
▶ Curie-Weiss model explains basic ferromagnetism.
▶ SK model introduces disorder and complexity.
▶ Hopfield model connects to associative memory.
▶ RBM links statistical physics to machine learning.