1. It is explained from the view of Probabilistic Graphical Models.
2. It is a supplement for a Bayesian Networks course.
Size: 3.37 MB
Language: en
Added: May 13, 2011
Slides: 44 pages
Slide Content
Markov Random Field Explained from the View of
Probabilistic Graphical Models
SUPPLEMENT FOR BAYESIAN NETWORKS COURSE
YUAN‐KAI WANG, 2011/05/05
1
Goal of This Unit
•We have seen that directed graphical models
specify a factorization of the joint distribution
over a set of variables into a product of local conditional distributions
•We turn now to the second major class of
graphical models that are described by
undirected graphs and that again specify both
a factorization and a set of conditional
independence relations.
•We will talk about Markov Random Fields (MRF).
•No inference algorithms
•But more on modeling and energy functions
2
Self‐Study Reference
•Source of this unit
•Section 8.3 Markov Random Fields, Pattern
Recognition and Machine Learning, C. M. Bishop, 2006.
•Background of this unit
•Chapter 8 Graphical Models, Pattern Recognition
and Machine Learning, C. M. Bishop, 2006.
•Probabilistic Graphical Models, Yuan‐Kai Wang’s
Lecture Notes for Bayesian Networks Courses, 2011.
3
1. Background
•We have seen that directed graphical models
•Specify a factorization of the joint distribution
over a set of variables
•Into a product of local conditional distributions
5
Bayesian Networks
Directed Acyclic Graph (DAG)
6
Bayesian Networks
General Factorization
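For reference, the general factorization of a directed graphical model over variables x_1, …, x_K with parent sets pa_k (as in Bishop, Ch. 8) is
p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)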
7
What Is a Markov Random Field (MRF)?
•A Markov random field (MRF) has a set of
•Nodes
•Each node corresponds to a variable or group of
variables
•Links
•Each connects a pair of nodes.
•The links are undirected
•They do not carry arrows
•MRF is also known as
•Markov network, or
•Undirected graphical model
(Kindermann and Snell, 1980)
8
Why Use MRF for Computer Vision
•Image de‐noising
•Image de‐blurring
•Image segmentation
•Image super‐resolution
9
2. Conditional Independence Property
•In the case of directed graphs, we can test
whether a conditional independence (CI)
property holds by applying a graphical test
called d‐separation.
•This involved testing whether or not the paths
connecting two sets of nodes were ‘blocked’.
•This d‐separation test does not apply to MRFs and
undirected graphical models (UGMs).
•But we will find an alternative semantics of the CI
property for MRFs and UGMs.
10
CI Definition for UGM
•Suppose that in an UGM we identify three
sets of nodes, denoted A, B, and C,
•And we consider the CI property A ⊥⊥ B | C
•To test whether this CI property is satisfied by a
probability distribution defined by a UGM
•We consider all possible paths that connect
nodes in set A to nodes in set B, and check
whether every such path passes through C.
11
An Example of CI
•Every path from any
node in set A to any
node in set B passes
through at least one
node in set C.
•Consequently the
conditional
independence property
holds for any probability
distribution described by
this graph.
12
Markov Blanket
•The Markov blanket for a UGM
takes a particularly simple form,
•Because a node will be conditionally
independent of all other nodes
conditioned only on the
neighbouring nodes.
Markov Blanket
13
3. Factorization Property
•It is the factorization rule for UGMs that
corresponds to the conditional
independence test.
•What is factorization?
•Expressing the joint distribution p(x) as a
product of functions defined over sets of
variables that are local to the graph.
•Remember the factorization rule in directed
graphs
Product of factors
14
The Factorization Rule – Two Nodes
•Consider two nodes x_i and x_j that are not
connected by a link
•Then these variables must be conditionally
independent given all other nodes in the graph.
•There is no direct path between the two nodes.
•And all other paths pass through nodes that are
observed, and hence those paths are blocked.
•This CI property can be expressed as
p(x_i, x_j \mid \mathbf{x}_{\setminus\{i,j\}}) = p(x_i \mid \mathbf{x}_{\setminus\{i,j\}}) \, p(x_j \mid \mathbf{x}_{\setminus\{i,j\}})
where \mathbf{x}_{\setminus\{i,j\}} denotes the set x of all variables with x_i and x_j removed.
15
The Factorization Rule – All Nodes
•Extend the factorization of two nodes to
the joint distribution p(x) of all nodes
•It must be the product of a set of factors
•Each factor has some nodes x_C = {x_i, …, x_j} that do
not appear in other factors
•In order for the CI property to hold for all possible
distributions belonging to the graph.
p(\mathbf{x}) = \prod_C \psi_C(\mathbf{x}_C)
16
Clique
•How to find the set of nodes {x_C}?
•We need to consider a graph terminology:
clique
•It is a subset of the nodes in a graph such that
there exists a link between all pairs of nodes in
the subset.
•The set of nodes in a clique is fully connected.
p(\mathbf{x}) = \prod_C \psi_C(\mathbf{x}_C)
17
Cliques and Maximal Cliques
•A maximal clique is a clique such that
•It is not possible to include any other nodes
from the graph in the set without it ceasing to
be a clique.
Clique
Maximal Clique
18
An Example of Clique
•This graph has five cliques of two nodes
•{x_1, x_2}, {x_2, x_3}, {x_3, x_4}, {x_4, x_2}, {x_1, x_3}
•It has two maximal cliques (see the check below)
•{x_1, x_2, x_3}, {x_2, x_3, x_4}
•The set {x_1, x_2, x_3, x_4} is
not a clique because of
the missing link
from x_1 to x_4.
Clique
Maximal Clique
19
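The clique structure of this example can be verified mechanically; a minimal sketch in Python using the networkx library (the node labels 1–4 and the five links are taken from the slide, everything else is illustrative):

```python
import networkx as nx

# The five links from the slide: every pair of the four nodes except (x1, x4)
G = nx.Graph([(1, 2), (2, 3), (3, 4), (4, 2), (1, 3)])

# Maximal cliques of the undirected graph: {1, 2, 3} and {2, 3, 4}
# (the ordering of nodes and cliques in the output may vary)
print(list(nx.find_cliques(G)))
```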
Factorization by Maximal Clique
•We can define the factors in the
decomposition of the joint distribution to be
functions of the variables in the cliques.
•The set of nodes x_C is a clique
•In fact, we can consider functions of the
maximal cliques, without loss of generality,
•Because other cliques must be subsets of maximal
cliques.
•The set of nodes x_C is a maximal clique
p(\mathbf{x}) = \prod_C \psi_C(\mathbf{x}_C)
20
The Factorization Rule
•Denote a clique by C and the set of
variables in that clique by x_C.
•The joint distribution p(x) is written as a
product of potential functions ψ_C(x_C) over
the maximal cliques of the graph
p(\mathbf{x}) = \frac{1}{Z} \prod_C \psi_C(\mathbf{x}_C)
•The quantity Z, sometimes called the partition
function, is a normalization constant and is
given by
Z = \sum_{\mathbf{x}} \prod_C \psi_C(\mathbf{x}_C)
21
Why Not Probability Function
for the Factorization Rule (1/2)
•Why a potential function
but not a probability function?
•In directed graphs, each factor represents the
conditional distribution of a variable given its
parents.
•But here we do not restrict the choice of
potential functions to a specific probability
distribution.
p(\mathbf{x}) = \prod_C \psi_C(\mathbf{x}_C)
22
Why Not Probability Function
for the Factorization Rule (2/2)
•Why a potential function
but not a probability function?
•It is all for flexibility
•We can define any function we want
•But a small restriction (compared to a probability)
still has to be made for the potential function ψ_C(x_C).
•And note that p(x) is still a probability
function
p(\mathbf{x}) = \prod_C \psi_C(\mathbf{x}_C)
23
The Potential Function
•Potential function ψ_C(x_C)
•ψ_C(x_C) ≥ 0, to ensure that p(x) ≥ 0.
•Therefore it is usually convenient to express
them as exponentials
\psi_C(\mathbf{x}_C) = \exp\{-E(\mathbf{x}_C)\}
•E(x_C) is called an energy function, and the
exponential representation is called the
Boltzmann distribution.
24
Energy Function
•The joint distribution is defined as the
product of potentials
•The total energy is obtained by adding the
energies of each of the maximal cliques.
p(\mathbf{x}) = \frac{1}{Z} \prod_C \psi_C(\mathbf{x}_C)
       = \frac{1}{Z} \prod_C \exp\{-E(\mathbf{x}_C)\}
       = \frac{1}{Z} \exp\{-\sum_C E(\mathbf{x}_C)\}
       = \frac{1}{Z} \exp\{-E(\mathbf{x})\}
25
Total Energy Function
•The total energy function can define the
property of a MRF
E(\mathbf{x}) = \sum_C E(\mathbf{x}_C)
Next, we will use an example, image de‐noising,
to illustrate how to design the energy function.
26
Binary Image
•Let the observed noisy image be described
by an array of binary pixel values
y_i ∈ {−1, +1}, where the index i = 1, . . . , D
runs over all pixels.
28
Simulation: generate noisy images
•We have the original noise‐free image,
described by binary pixel values x_i ∈ {−1, +1}.
•We randomly flip the sign of pixels with
some small probability, say 10% (see the sketch below).
Simulation
29
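This flipping step is easy to reproduce; a minimal sketch in Python with NumPy (the ±1 pixel encoding and the 10% flip rate come from the slide, while the function name, image size and random seed are only illustrative):

```python
import numpy as np

def add_flip_noise(x, flip_prob=0.10, seed=0):
    """Flip the sign of each binary (+1/-1) pixel with probability flip_prob."""
    rng = np.random.default_rng(seed)
    flips = rng.random(x.shape) < flip_prob   # True where a pixel is corrupted
    return np.where(flips, -x, x)             # flipped pixels change sign

# x: a noise-free binary image with values in {-1, +1}
x = np.where(np.random.default_rng(1).random((64, 64)) < 0.5, -1, 1)
y = add_flip_noise(x)                         # observed noisy image
```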
Modelling
•Because the noise level is small,
•There is a strong
correlation between
x_i and y_i.
•We also know that
•Neighbouring pixels
x_i and x_j in an image
are strongly correlated.
(Figure: lattice MRF with noise‐free pixels x_i, x_j and noisy pixels y_i)
30
Modelling – Cliques
•This graph has two types of cliques, each of
which contains two variables.
•{x_i, y_i}
•{x_i, x_j}
•We need to define
two energy functions
for the two cliques.
31
Modelling – Energy Function (1/2)
•{x_i, y_i} energy function
•Expresses the correlation
between these variables
•−η x_i y_i
•η is a positive constant
•Why?
•Remember that p(\mathbf{x}) \propto \exp\{-E(\mathbf{x})\}
•A lower energy encourages
a higher probability
•Low energy when x_i and y_i
have the same sign
•Higher energy when they have the opposite sign
32
Modelling – Energy Function (2/2)
•{x_i, x_j} energy function
•Expresses the correlation
between these variables
•−β x_i x_j
•β is a positive constant
•Why?
•Low energy when x_i and x_j
have the same sign
•Higher energy when they have the opposite sign.
33
Modelling – Total Energy Function (2/2)
•The complete energy function
for the model
(noisy pixel)
(noise‐free pixel)
•We add an extra term h x_i for each
pixel i in the noise‐free image.
•It has the effect of
•Biasing the model towards pixel
values that have one particular
sign in preference to the other
E(\mathbf{x}, \mathbf{y}) = h \sum_i x_i - \beta \sum_{\{i,j\}} x_i x_j - \eta \sum_i x_i y_i
35
Modelling – Final Representation
p(\mathbf{x}, \mathbf{y}) = \frac{1}{Z} \exp\{-E(\mathbf{x}, \mathbf{y})\}
E(\mathbf{x}, \mathbf{y}) = h \sum_i x_i - \beta \sum_{\{i,j\}} x_i x_j - \eta \sum_i x_i y_i
\mathbf{x} = \{x_i\}, \quad \mathbf{y} = \{y_i\}
36
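To make the model concrete, here is a minimal sketch of the total energy E(x, y) in Python/NumPy for a 2D lattice with 4-neighbour pairwise cliques; the form of the energy follows the slide, while the function name and the default values of h, β, η are only illustrative:

```python
import numpy as np

def total_energy(x, y, h=0.0, beta=1.0, eta=2.1):
    """E(x, y) = h*sum_i x_i - beta*sum_{i,j} x_i*x_j - eta*sum_i x_i*y_i,
    with {i,j} ranging over horizontally and vertically adjacent pixels."""
    pairwise = np.sum(x[:, :-1] * x[:, 1:]) + np.sum(x[:-1, :] * x[1:, :])
    return h * np.sum(x) - beta * pairwise - eta * np.sum(x * y)
```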
Two Algorithms for Solutions
•How to find the noise‐free image x that maximizes p(x | y),
i.e. minimizes the energy E(x, y)
•Iterated Conditional Modes (ICM)
•Proposed by Kittler & Föglein, 1984
•Simply a coordinate‐wise gradient ascent algorithm
(see the sketch below)
•Local maximum solution
•Description in Wikipedia
•Graph Cuts
•Guaranteed to find the global maximum solution
•Description in Wikipedia
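A minimal sketch of ICM for this de-noising model in Python/NumPy; the local update rule follows directly from the energy E(x, y) above, while the function name, sweep count and parameter defaults are illustrative:

```python
def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, n_sweeps=10):
    """Iterated Conditional Modes: coordinate-wise minimization of E(x, y).
    Each pixel is set to the value in {-1, +1} with the lower local energy,
    so the result is a local (not global) optimum."""
    x = y.copy()
    rows, cols = x.shape
    for _ in range(n_sweeps):
        changed = False
        for i in range(rows):
            for j in range(cols):
                # Sum of the 4-neighbourhood values of pixel (i, j)
                nb = 0
                if i > 0:        nb += x[i - 1, j]
                if i < rows - 1: nb += x[i + 1, j]
                if j > 0:        nb += x[i, j - 1]
                if j < cols - 1: nb += x[i, j + 1]
                # Local energy of setting x_ij = s is  h*s - beta*s*nb - eta*s*y_ij,
                # so the minimizing value is the sign of (beta*nb + eta*y_ij - h).
                new_val = 1 if beta * nb + eta * y[i, j] - h > 0 else -1
                if new_val != x[i, j]:
                    x[i, j] = new_val
                    changed = True
        if not changed:
            break
    return x

# Usage: x_hat = icm_denoise(y) recovers an estimate of the noise-free image.
```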
5. Relation to Directed Graphs
•We have introduced two graphical
frameworks for representing probability
distributions, corresponding to directed and
undirected graphs
•It is instructive to discuss the relation
between these.
•Details are TBU (To Be Updated)
40
Converting Directed to Undirected Graphs
(1)
41
Converting Directed to Undirected Graphs
(2)
Additional links
42