Hidden Markov Models
STATS 305C: Applied Statistics
Scott Linderman
May 16, 2023
1 / 23
Where are we?
Model | Algorithm | Application
Multivariate Normal Models | Conjugate Inference | Bayesian Linear Regression
Hierarchical Models | MCMC (MH & Gibbs) | Modeling Polling Data
Probabilistic PCA & Factor Analysis | MCMC (HMC) | Image Reconstruction
Mixture Models | EM & Variational Inference | Image Segmentation
Mixed Membership Models | Coordinate Ascent VI | Topic Modeling
Variational Autoencoders | Gradient-based VI | Image Generation
State Space Models | Message Passing | Segmenting Video Data
Bayesian Nonparametrics | Fancy MCMC | Modeling Neural Spike Trains
2 / 23
Gaussian Mixture Models
Recall the basic Gaussian mixture model,
$z_t \overset{\text{iid}}{\sim} \mathrm{Cat}(\pi)$  (1)
$x_t \mid z_t \sim \mathcal{N}(\mu_{z_t}, \Sigma_{z_t})$  (2)
where
▶ $z_t \in \{1, \dots, K\}$ is a latent mixture assignment
▶ $x_t \in \mathbb{R}^D$ is an observed data point
▶ $\pi \in \Delta_K$, $\mu_k \in \mathbb{R}^D$, and $\Sigma_k \in \mathbb{R}^{D \times D}$, $\Sigma_k \succeq 0$, are parameters
(Here we've switched to indexing data points by $t$ rather than $n$.)
Let $\Theta$ denote the set of parameters. We can be Bayesian and put a prior on $\Theta$ and run Gibbs or VI, or we can point estimate $\Theta$ with EM, etc.
3 / 23
Gaussian Mixture Models II
Draw the graphical model.
4 / 23
Gaussian Mixture Models III
Recall the EM algorithm for mixture models,
▶ E step: Compute the posterior distribution
$q(z_{1:T}) = p(z_{1:T} \mid x_{1:T}; \Theta)$  (3)
$= \prod_{t=1}^{T} p(z_t \mid x_t; \Theta)$  (4)
$= \prod_{t=1}^{T} q_t(z_t)$  (5)
▶ M step: Maximize the ELBO wrt $\Theta$,
$\mathcal{L}(\Theta) = \mathbb{E}_{q(z_{1:T})}[\log p(x_{1:T}, z_{1:T}; \Theta) - \log q(z_{1:T})]$  (6)
$= \mathbb{E}_{q(z_{1:T})}[\log p(x_{1:T}, z_{1:T}; \Theta)] + c$.  (7)
For exponential family mixture models, the M-step only requires expected sufficient statistics.
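As a concrete (if minimal) sketch of this E step, Eqs. (3)-(5) amount to computing a normalized responsibility vector $q_t(z_t)$ for each data point. The function and argument names below are illustrative, not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_e_step(X, pi, mus, Sigmas):
    """Responsibilities q_t(z_t = k) = p(z_t = k | x_t; Theta).

    X: (T, D) data, pi: (K,) mixture weights,
    mus: (K, D) means, Sigmas: (K, D, D) covariances.
    """
    T, K = X.shape[0], len(pi)
    log_q = np.zeros((T, K))
    for k in range(K):
        # log pi_k + log N(x_t | mu_k, Sigma_k) for every t
        log_q[:, k] = np.log(pi[k]) + multivariate_normal.logpdf(X, mus[k], Sigmas[k])
    # Normalize each row in a numerically stable way (subtract the max, then exponentiate).
    log_q -= log_q.max(axis=1, keepdims=True)
    q = np.exp(log_q)
    return q / q.sum(axis=1, keepdims=True)
```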
5 / 23
Hidden Markov Models
Hidden Markov Models (HMMs) are like mixture models with temporal dependencies between the
mixture assignments.
This graphical model says that the joint distribution factors as,
$p(z_{1:T}, x_{1:T}) = p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1}) \prod_{t=1}^{T} p(x_t \mid z_t)$.  (8)
We call this an HMM because the hidden states follow a Markov chain, $p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1})$.
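As a sanity check on Eq. (8), here is a minimal NumPy sketch that evaluates the log joint of a discrete-emission HMM under this factorization. The emission matrix B is an illustrative assumption (a categorical observation model); pi0 and P anticipate the notation on the next slide.

```python
import numpy as np

def hmm_log_joint(z, x, pi0, P, B):
    """log p(z_{1:T}, x_{1:T}) for a discrete-emission HMM.

    z, x: integer arrays of length T (0-indexed states / symbols),
    pi0: (K,) initial distribution, P: (K, K) row-stochastic
    transition matrix, B: (K, V) emission probabilities.
    """
    lp = np.log(pi0[z[0]])                   # log p(z_1)
    lp += np.sum(np.log(P[z[:-1], z[1:]]))   # sum_{t>=2} log p(z_t | z_{t-1})
    lp += np.sum(np.log(B[z, x]))            # sum_t log p(x_t | z_t)
    return lp
```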
6 / 23
Hidden Markov Models II
An HMM consists of three components:
1. $z_1 \sim \mathrm{Cat}(\pi_0)$
2. $z_t \sim \mathrm{Cat}(P_{z_{t-1}})$, where $P \in [0,1]^{K \times K}$ is a row-stochastic transition matrix with rows $P_k$
3. $x_t \sim p(\cdot \mid \theta_{z_t})$
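A minimal sketch of ancestral sampling from this generative process, assuming a categorical emission model with an illustrative emission matrix B (B[k] is the emission distribution of state k); the names are not from the slides.

```python
import numpy as np

def sample_hmm(T, pi0, P, B, seed=0):
    """Ancestral sampling: z_1 ~ Cat(pi0), z_t ~ Cat(P[z_{t-1}]), x_t ~ Cat(B[z_t])."""
    rng = np.random.default_rng(seed)
    K, V = B.shape
    z = np.zeros(T, dtype=int)
    x = np.zeros(T, dtype=int)
    z[0] = rng.choice(K, p=pi0)
    x[0] = rng.choice(V, p=B[z[0]])
    for t in range(1, T):
        z[t] = rng.choice(K, p=P[z[t - 1]])   # row of the transition matrix
        x[t] = rng.choice(V, p=B[z[t]])       # emission given the current state
    return z, x
```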
7 / 23
Example: The occasionally dishonest casino
Figure: An occasionally dishonest casino that sometimes throws loaded dice.
From https://probml.github.io/dynamax/notebooks/hmm/casino_hmm_inference.html
8 / 23
Example: HMM for splice site recognition
Figure: A toy model for parsing a genome to find 5' splice sites. From Eddy [2004].
Question: Suppose the splice site always had a GT sequence. How would you change the model to detect such sites?
9 / 23
Example: Autoregressive HMM for video segmentation
Figure: Segmenting videos of freely moving mice [Wiltschko et al., 2015]. (Show video.)
10 / 23
Hidden Markov Models III
We are interested in questions like:
▶ What are the predictive distributions $p(z_{t+1} \mid x_{1:t})$?
▶ What is the posterior marginal distribution $p(z_t \mid x_{1:T})$?
▶ What is the posterior pairwise marginal distribution $p(z_t, z_{t+1} \mid x_{1:T})$?
▶ What is the posterior mode $z^\star_{1:T} = \arg\max p(z_{1:T} \mid x_{1:T})$?
▶ How can we sample the posterior $p(z_{1:T} \mid x_{1:T})$ of an HMM?
▶ What is the marginal likelihood $p(x_{1:T})$?
▶ How can we learn the parameters of an HMM?
Question: Why might these sound like hard problems?
11 / 23
Computing the predictive distributions
The predictive distributions give the probability of the latent state $z_{t+1}$ given observations up to but not including time $t+1$. Let,
$\alpha_{t+1}(z_{t+1}) \triangleq p(z_{t+1}, x_{1:t})$  (9)
$= \sum_{z_1=1}^{K} \cdots \sum_{z_t=1}^{K} p(z_1) \prod_{s=1}^{t} p(x_s \mid z_s)\, p(z_{s+1} \mid z_s)$  (10)
$= \sum_{z_t=1}^{K} \left( \sum_{z_1=1}^{K} \cdots \sum_{z_{t-1}=1}^{K} p(z_1) \prod_{s=1}^{t-1} p(x_s \mid z_s)\, p(z_{s+1} \mid z_s) \right) p(x_t \mid z_t)\, p(z_{t+1} \mid z_t)$  (11)
$= \sum_{z_t=1}^{K} \alpha_t(z_t)\, p(x_t \mid z_t)\, p(z_{t+1} \mid z_t)$.  (12)
We call $\alpha_t(z_t)$ the forward messages. We can compute them recursively! The base case is $p(z_1 \mid \emptyset) \triangleq p(z_1)$.
12 / 23
Computing the predictive distributions II
We can also write these recursions in a vectorized form. Let
$\alpha_t = \begin{bmatrix} \alpha_t(z_t = 1) \\ \vdots \\ \alpha_t(z_t = K) \end{bmatrix} = \begin{bmatrix} p(z_t = 1, x_{1:t-1}) \\ \vdots \\ p(z_t = K, x_{1:t-1}) \end{bmatrix}$ and $l_t = \begin{bmatrix} p(x_t \mid z_t = 1) \\ \vdots \\ p(x_t \mid z_t = K) \end{bmatrix}$  (13)
both be vectors in $\mathbb{R}_+^K$. Then,
$\alpha_{t+1} = P^\top (\alpha_t \odot l_t)$  (14)
where $\odot$ denotes the Hadamard (elementwise) product and $P$ is the transition matrix.
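A minimal NumPy sketch of the recursion in Eq. (14), assuming the per-step likelihoods have been precomputed into a matrix L whose rows are the $l_t$ vectors (an illustrative choice, not notation from the slides). As written it uses unnormalized messages, which will underflow on long sequences; see the stability fix a few slides ahead.

```python
import numpy as np

def forward_messages(pi0, P, L):
    """Unnormalized forward messages alpha_t(k) = p(z_t = k, x_{1:t-1}).

    pi0: (K,) initial distribution, P: (K, K) transition matrix,
    L: (T, K) likelihoods with L[t, k] = p(x_t | z_t = k).
    """
    T, K = L.shape
    alphas = np.zeros((T, K))
    alphas[0] = pi0                               # alpha_1 = p(z_1)
    for t in range(T - 1):
        alphas[t + 1] = P.T @ (alphas[t] * L[t])  # Eq. (14)
    return alphas
```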
13 / 23
Computing the predictive distributions III
Finally, to get the predictive distributions we just have to normalize,
$p(z_{t+1} \mid x_{1:t}) \propto p(z_{t+1}, x_{1:t}) = \alpha_{t+1}(z_{t+1})$.  (15)
Question: What does the normalizing constant tell us?
14 / 23
Computing the posterior marginal distributions
The posterior marginal distributions give the probability of the latent state $z_t$ given all the observations up to time $T$.
$p(z_t \mid x_{1:T}) \propto \sum_{z_1=1}^{K} \cdots \sum_{z_{t-1}=1}^{K} \sum_{z_{t+1}=1}^{K} \cdots \sum_{z_T=1}^{K} p(z_{1:T}, x_{1:T})$  (16)
$= \left( \sum_{z_1=1}^{K} \cdots \sum_{z_{t-1}=1}^{K} p(z_1) \prod_{s=1}^{t-1} p(x_s \mid z_s)\, p(z_{s+1} \mid z_s) \right) \times p(x_t \mid z_t) \times \left( \sum_{z_{t+1}=1}^{K} \cdots \sum_{z_T=1}^{K} \prod_{u=t+1}^{T} p(z_u \mid z_{u-1})\, p(x_u \mid z_u) \right)$  (17)
$= \alpha_t(z_t) \times p(x_t \mid z_t) \times \beta_t(z_t)$  (18)
where we have introduced the backward messages $\beta_t(z_t)$.
15 / 23
Computing the backward messages
The backward messages can be computed recursively too,
$\beta_t(z_t) \triangleq \sum_{z_{t+1}=1}^{K} \cdots \sum_{z_T=1}^{K} \prod_{u=t+1}^{T} p(z_u \mid z_{u-1})\, p(x_u \mid z_u)$  (19)
$= \sum_{z_{t+1}=1}^{K} p(z_{t+1} \mid z_t)\, p(x_{t+1} \mid z_{t+1}) \left( \sum_{z_{t+2}=1}^{K} \cdots \sum_{z_T=1}^{K} \prod_{u=t+2}^{T} p(z_u \mid z_{u-1})\, p(x_u \mid z_u) \right)$  (20)
$= \sum_{z_{t+1}=1}^{K} p(z_{t+1} \mid z_t)\, p(x_{t+1} \mid z_{t+1})\, \beta_{t+1}(z_{t+1})$.  (21)
For the base case, let $\beta_T(z_T) = 1$.
16 / 23
Computing the backward messages (vectorized)
Let
$\beta_t = \begin{bmatrix} \beta_t(z_t = 1) \\ \vdots \\ \beta_t(z_t = K) \end{bmatrix}$  (22)
be a vector in $\mathbb{R}_+^K$. Then,
$\beta_t = P(\beta_{t+1} \odot l_{t+1})$.  (23)
Let $\beta_T = \mathbf{1}_K$.
Now we have everything we need to compute the posterior marginal,
$p(z_t = k \mid x_{1:T}) = \dfrac{\alpha_{t,k}\, l_{t,k}\, \beta_{t,k}}{\sum_{j=1}^{K} \alpha_{t,j}\, l_{t,j}\, \beta_{t,j}}$.  (24)
We just derived the forward-backward algorithm for HMMs [Rabiner and Juang, 1986].
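Putting the two recursions and Eq. (24) together, here is a minimal, self-contained NumPy sketch of a forward-backward pass. It works with unnormalized messages, so it is only reliable for short sequences (a later slide addresses numerical stability); the argument names are illustrative, not from the slides.

```python
import numpy as np

def forward_backward(pi0, P, L):
    """Posterior marginals p(z_t = k | x_{1:T}) via Eqs. (14), (23), (24).

    pi0: (K,) initial distribution, P: (K, K) transition matrix,
    L: (T, K) likelihoods with L[t, k] = p(x_t | z_t = k).
    """
    T, K = L.shape
    alphas = np.zeros((T, K))
    betas = np.zeros((T, K))
    alphas[0] = pi0                                    # alpha_1 = p(z_1)
    for t in range(T - 1):
        alphas[t + 1] = P.T @ (alphas[t] * L[t])       # Eq. (14)
    betas[-1] = 1.0                                    # beta_T = 1_K
    for t in range(T - 2, -1, -1):
        betas[t] = P @ (betas[t + 1] * L[t + 1])       # Eq. (23)
    unnorm = alphas * L * betas                        # alpha_{t,k} l_{t,k} beta_{t,k}
    return unnorm / unnorm.sum(axis=1, keepdims=True)  # Eq. (24)
```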
17 / 23
What do the backward messages represent?
Question: If the forward messages represent the predictive probabilities $\alpha_{t+1}(z_{t+1}) = p(z_{t+1}, x_{1:t})$, what do the backward messages represent?
18 / 23
Computing the posterior pairwise marginals
Exercise: Use the forward and backward messages to compute the posterior pairwise marginals $p(z_t, z_{t+1} \mid x_{1:T})$.
19 / 23
Normalizing the messages for numerical stability
If you're working with long time series, especially in 32-bit floating point, you need to be careful.
The messages involve products of probabilities, which can quickly underflow.
There's a simple fix though: after each step, re-normalize the messages so that they sum to one. That is, replace
$\alpha_{t+1} = P^\top (\alpha_t \odot l_t)$  (25)
with
$\tilde{\alpha}_{t+1} = \frac{1}{A_t} P^\top (\tilde{\alpha}_t \odot l_t)$  (26)
$A_t = \sum_{k=1}^{K} \sum_{j=1}^{K} P_{jk}\, \tilde{\alpha}_{t,j}\, l_{t,j} \equiv \sum_{j=1}^{K} \tilde{\alpha}_{t,j}\, l_{t,j}$ (since $P$ is row-stochastic).  (27)
This leads to a nice interpretation: the normalized messages are predictive probabilities $\tilde{\alpha}_{t+1,k} = p(z_{t+1} = k \mid x_{1:t})$, and the normalizing constants are $A_t = p(x_t \mid x_{1:t-1})$.
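A minimal NumPy sketch of the normalized recursion in Eqs. (26)-(27), with the same illustrative arguments as the earlier sketches. Accumulating $\log A_t$ also yields the log marginal likelihood, since $p(x_{1:T}) = \prod_t p(x_t \mid x_{1:t-1})$.

```python
import numpy as np

def normalized_forward(pi0, P, L):
    """Normalized forward messages and log p(x_{1:T}).

    Returns alpha_tilde with alpha_tilde[t, k] = p(z_t = k | x_{1:t-1})
    and the accumulated log marginal likelihood.
    """
    T, K = L.shape
    alpha_tilde = np.zeros((T, K))
    alpha_tilde[0] = pi0
    log_marginal = 0.0
    for t in range(T - 1):
        unnorm = P.T @ (alpha_tilde[t] * L[t])
        A_t = unnorm.sum()                   # A_t = p(x_t | x_{1:t-1}), Eq. (27)
        alpha_tilde[t + 1] = unnorm / A_t    # Eq. (26)
        log_marginal += np.log(A_t)
    # Final factor: p(x_T | x_{1:T-1}) = sum_k alpha_tilde[T-1, k] * L[T-1, k].
    log_marginal += np.log(alpha_tilde[-1] @ L[-1])
    return alpha_tilde, log_marginal
```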
20 / 23
EM for Hidden Markov Models
Now we can put it all together. To perform EM in an HMM,
▶ E step: Compute the posterior distribution
$q(z_{1:T}) = p(z_{1:T} \mid x_{1:T}; \Theta)$.  (28)
(Really, run the forward-backward algorithm to get posterior marginals and pairwise marginals.)
▶ M step: Maximize the ELBO wrt $\Theta$,
$\mathcal{L}(\Theta) = \mathbb{E}_{q(z_{1:T})}[\log p(x_{1:T}, z_{1:T}; \Theta)] + c$  (29)
$= \mathbb{E}_{q(z_{1:T})}\!\left[ \sum_{k=1}^{K} \mathbb{I}[z_1 = k] \log \pi_{0,k} \right] + \mathbb{E}_{q(z_{1:T})}\!\left[ \sum_{t=1}^{T-1} \sum_{i=1}^{K} \sum_{j=1}^{K} \mathbb{I}[z_t = i, z_{t+1} = j] \log P_{i,j} \right] + \mathbb{E}_{q(z_{1:T})}\!\left[ \sum_{t=1}^{T} \sum_{k=1}^{K} \mathbb{I}[z_t = k] \log p(x_t; \theta_k) \right]$  (30)
For exponential family observations, the M-step only requires expected sufficient statistics.
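For instance, the first two terms of Eq. (30) have closed-form maximizers: the updates for $\pi_0$ and $P$ are just normalized expected counts. A rough sketch, assuming the E step has produced posterior marginals gammas[t, k] = q(z_t = k) and pairwise marginals xis[t, i, j] = q(z_t = i, z_{t+1} = j) (illustrative names); the emission update depends on the chosen observation family.

```python
import numpy as np

def m_step_pi0_P(gammas, xis, eps=1e-12):
    """Closed-form M-step updates for pi_0 and P from expected sufficient statistics.

    gammas: (T, K) posterior marginals, xis: (T-1, K, K) pairwise marginals.
    """
    pi0 = gammas[0] / gammas[0].sum()          # E[I[z_1 = k]], normalized
    expected_trans = xis.sum(axis=0)           # sum_t E[I[z_t = i, z_{t+1} = j]]
    P = expected_trans / (expected_trans.sum(axis=1, keepdims=True) + eps)
    return pi0, P
```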
21 / 23
What else?
▶How can we sample the posterior?
▶How can we find the posterior mode?
▶How can we choose the number of states?
▶What if my transition matrix is sparse?
22 / 23
References
Sean R. Eddy. What is a hidden Markov model? Nature Biotechnology, 22(10):1315–1316, 2004.
Alexander B. Wiltschko, Matthew J. Johnson, Giuliano Iurilli, Ralph E. Peterson, Jesse M. Katon, Stan L. Pashkovski, Victoria E. Abraira, Ryan P. Adams, and Sandeep Robert Datta. Mapping sub-second structure in mouse behavior. Neuron, 88(6):1121–1135, 2015.
Lawrence Rabiner and Biing-Hwang Juang. An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1):4–16, 1986.
23 / 23