The Multivariate Gaussian Probability Distribution

Peter Ahrendt
IMM, Technical University of Denmark
mail: [email protected], web: www.imm.dtu.dk/~pa
January 7, 2005

Contents

1 Definition
2 Functions of Gaussian Variables
3 Characteristic function and Moments
4 Marginalization and Conditional Distribution
  4.1 Marginalization
  4.2 Conditional distribution
5 Tips and Tricks
  5.1 Products
  5.2 Gaussian Integrals
  5.3 Useful integrals

Chapter 1
Definition

The definition of a multivariate gaussian probability distribution can be stated in several equivalent ways. A random vector X = [X_1 X_2 \ldots X_N] can be said to belong to a multivariate gaussian distribution if one of the following statements is true.

- Any linear combination Y = a_1 X_1 + a_2 X_2 + \ldots + a_N X_N, a_i \in R, is a (univariate) gaussian distribution.
- There exists a random vector Z = [Z_1, \ldots, Z_M] with components that are independent and standard normal distributed, a vector \mu = [\mu_1, \ldots, \mu_N] and an N-by-M matrix A such that X = A Z + \mu.
- There exists a vector \mu and a symmetric, positive semi-definite matrix \Gamma such that the characteristic function of X can be written \phi_x(t) \equiv \langle e^{i t^T X} \rangle = e^{i \mu^T t - \frac{1}{2} t^T \Gamma t}.

Under the assumption that the covariance matrix \Sigma is non-singular, the probability density function (pdf) can be written as:

    N_x(\mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)
                     = |2\pi\Sigma|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)    (1.1)

Here \mu is the mean value, \Sigma is the covariance matrix and |\cdot| denotes the determinant. Note that it is possible to have multivariate gaussian distributions with a singular covariance matrix, in which case the above expression cannot be used for the pdf. In the following, however, non-singular covariance matrices will be assumed.

In the limit of one dimension, the familiar expression of the univariate gaussian pdf is found:

    N_x(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
                       = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} (x - \mu)\, \sigma^{-2} (x - \mu) \right)    (1.2)
Neither of them has a closed-form expression for the cumulative distribution function.
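
As a quick numerical check of equation (1.1), the sketch below (assuming NumPy and SciPy are available) evaluates the pdf directly from the formula and compares it with SciPy's multivariate_normal; the particular \mu, \Sigma and test point are arbitrary example values, not taken from the report.

    import numpy as np
    from scipy.stats import multivariate_normal

    def gaussian_pdf(x, mu, Sigma):
        """Evaluate N_x(mu, Sigma) as in eq. (1.1), assuming Sigma is non-singular."""
        d = len(mu)
        diff = x - mu
        quad = diff @ np.linalg.solve(Sigma, diff)       # (x - mu)^T Sigma^{-1} (x - mu)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
        return np.exp(-0.5 * quad) / norm

    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.3], [0.3, 0.5]])
    x = np.array([0.5, -1.5])
    print(gaussian_pdf(x, mu, Sigma))                    # direct use of eq. (1.1)
    print(multivariate_normal(mu, Sigma).pdf(x))         # should agree
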
Symmetries

It is noted that in the one-dimensional case there is a symmetry in the pdf N_x(\mu, \sigma^2), which is centered on \mu. This can be seen by looking at "contour lines", i.e. setting the exponent -\frac{(x - \mu)^2}{2\sigma^2} = c. It is seen that \sigma determines the width of the distribution.

In the multivariate case, it is similarly useful to look at -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) = c. This is a quadratic form, and geometrically the contour curves (for fixed c) are hyperellipsoids. In 2D, these are ordinary ellipses of the form \left( \frac{x - x_0}{a} \right)^2 + \left( \frac{y - y_0}{b} \right)^2 = r^2, which gives symmetries along the principal axes. Similarly, the hyperellipsoids show symmetries along their principal axes.
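
The principal axes mentioned above are the eigenvectors of \Sigma. A minimal sketch, assuming NumPy and an arbitrary 2-D example covariance, that prints the axis directions and the semi-axis lengths of the contour ellipse for a given level c:

    import numpy as np

    Sigma = np.array([[2.0, 0.8],
                      [0.8, 1.0]])
    c = -0.5                                   # contour level of the exponent (negative)

    eigvals, eigvecs = np.linalg.eigh(Sigma)
    # The contour -1/2 (x - mu)^T Sigma^{-1} (x - mu) = c is an ellipse whose principal
    # axes point along the eigenvectors of Sigma, with semi-axis lengths sqrt(-2 c * eigval).
    print(eigvecs)                             # columns: principal axis directions
    print(np.sqrt(-2 * c * eigvals))           # semi-axis lengths
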
Notation: If a random variable X has a gaussian distribution, it is written as X \sim N(\mu, \Sigma). The probability density function of this variable is then given by N_x(\mu, \Sigma).

Chapter 2
Functions of Gaussian Variables

Linear transformation and addition of variables

Let A, B \in M_{c \times d} and c \in R^c. Let X \sim N(\mu_x, \Sigma_x) and Y \sim N(\mu_y, \Sigma_y) be independent variables. Then

    Z = A X + B Y + c \sim N(A \mu_x + B \mu_y + c, \; A \Sigma_x A^T + B \Sigma_y B^T)    (2.1)
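
A small Monte Carlo sketch of equation (2.1), assuming NumPy; the matrices A, B and the parameters below are made-up example values.

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, -1.0]])    # 3x2
    B = np.array([[0.5, 0.0], [1.0, 1.0], [0.0, 2.0]])     # 3x2
    c = np.array([1.0, 0.0, -1.0])
    mu_x, Sigma_x = np.array([0.0, 1.0]), np.array([[1.0, 0.2], [0.2, 2.0]])
    mu_y, Sigma_y = np.array([2.0, -1.0]), np.array([[0.5, 0.0], [0.0, 0.3]])

    X = rng.multivariate_normal(mu_x, Sigma_x, size=200_000)
    Y = rng.multivariate_normal(mu_y, Sigma_y, size=200_000)
    Z = X @ A.T + Y @ B.T + c                              # Z = AX + BY + c, row-wise

    print(Z.mean(axis=0))                                  # empirical mean of Z
    print(A @ mu_x + B @ mu_y + c)                         # mean from eq. (2.1)
    print(np.cov(Z.T))                                     # empirical covariance of Z
    print(A @ Sigma_x @ A.T + B @ Sigma_y @ B.T)           # covariance from eq. (2.1)
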
Transform to standard normal variables

Let X \sim N(\mu, \Sigma). Then

    Z = \Sigma^{-\frac{1}{2}} (X - \mu) \sim N(0, I)    (2.2)

Note that by \Sigma^{-\frac{1}{2}} is actually meant a unique matrix, although in general matrices with fractional exponents are not unique. The matrix that is meant can be found from the diagonalisation \Sigma = U \Lambda U^T = (U \Lambda^{\frac{1}{2}})(U \Lambda^{\frac{1}{2}})^T, where \Lambda is the diagonal matrix with the eigenvalues of \Sigma and U is the matrix with the eigenvectors. Then \Sigma^{-\frac{1}{2}} = (U \Lambda^{\frac{1}{2}})^{-1} = \Lambda^{-\frac{1}{2}} U^{-1}.

In the one-dimensional case, this corresponds to the transformation of X \sim N(\mu, \sigma^2) into Y = \sigma^{-1}(X - \mu) \sim N(0, 1).
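
The eigendecomposition route to \Sigma^{-1/2} described above can be written out directly; a minimal sketch assuming NumPy, with an arbitrary example covariance.

    import numpy as np

    rng = np.random.default_rng(1)
    mu = np.array([1.0, -1.0, 0.5])
    Sigma = np.array([[2.0, 0.5, 0.2],
                      [0.5, 1.0, 0.1],
                      [0.2, 0.1, 0.8]])

    # Sigma = U Lambda U^T; the text's choice is Sigma^{-1/2} = (U Lambda^{1/2})^{-1} = Lambda^{-1/2} U^T
    eigvals, U = np.linalg.eigh(Sigma)
    Sigma_inv_half = np.diag(eigvals ** -0.5) @ U.T

    X = rng.multivariate_normal(mu, Sigma, size=100_000)
    Z = (X - mu) @ Sigma_inv_half.T                  # eq. (2.2): Z = Sigma^{-1/2} (X - mu)
    print(Z.mean(axis=0))                            # ~ 0
    print(np.cov(Z.T))                               # ~ identity
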
Addition

Let X_i \sim N(\mu_i, \Sigma_i), i \in 1, \ldots, N, be independent variables. Then

    \sum_{i}^{N} X_i \sim N\left( \sum_{i}^{N} \mu_i, \; \sum_{i}^{N} \Sigma_i \right)    (2.3)

Note: This is a direct implication of equation (2.1).

Quadratic

Let X_i \sim N(0, 1), i \in 1, \ldots, N, be independent variables. Then

    \sum_{i}^{N} X_i^2 \sim \chi_N^2    (2.4)

Alternatively, let X \sim N(\mu, \Sigma). Then

    Z = (X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi_N^2    (2.5)

This is, however, the same thing, since Z = \tilde{X}^T \tilde{X} = \sum_{i}^{N} \tilde{X}_i^2, where the \tilde{X}_i are the decorrelated components (see eqn. (2.2)).
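
The chi-square result in equation (2.5) can be checked empirically; a sketch assuming NumPy and SciPy, with an arbitrary 3-dimensional example.

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(2)
    mu = np.array([1.0, 0.0, -2.0])
    Sigma = np.array([[1.0, 0.3, 0.0],
                      [0.3, 2.0, 0.4],
                      [0.0, 0.4, 0.5]])

    X = rng.multivariate_normal(mu, Sigma, size=100_000)
    diff = X - mu
    Z = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(Sigma), diff)   # (X-mu)^T Sigma^{-1} (X-mu)

    # Compare empirical moments with a chi-square distribution on N = 3 degrees of freedom
    print(Z.mean(), chi2(df=3).mean())    # both ~ 3
    print(Z.var(), chi2(df=3).var())      # both ~ 6
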

Chapter 3
Characteristic function and Moments

The characteristic function of the univariate gaussian distribution is given by \phi_x(t) \equiv \langle e^{i t X} \rangle = e^{i t \mu - \sigma^2 t^2 / 2}. The generalization to multivariate gaussian distributions is

    \phi_x(t) \equiv \langle e^{i t^T X} \rangle = e^{i \mu^T t - \frac{1}{2} t^T \Sigma t}    (3.1)
The pdf p(x) is related to the characteristic function by

    p(x) = \frac{1}{(2\pi)^d} \int_{R^d} \phi_x(t) \, e^{-i t^T x} \, dt    (3.2)

It is seen that the characteristic function is the inverse Fourier transform of the pdf.
Moments of a pdf are generally defined as:

    \langle X_1^{k_1} X_2^{k_2} \cdots X_N^{k_N} \rangle \equiv \int_{R^d} x_1^{k_1} x_2^{k_2} \cdots x_N^{k_N} \, p(x) \, dx    (3.3)

where \langle X_1^{k_1} X_2^{k_2} \cdots X_N^{k_N} \rangle is the k'th order moment, k = [k_1, k_2, \ldots, k_N] (k_i \in N) and k = k_1 + k_2 + \ldots + k_N. A well-known example is the first order moment, called the mean value \mu_i (of variable X_i), or the mean \mu \equiv [\mu_1 \, \mu_2 \ldots \mu_N] of the whole random vector X.

The k'th order central moment is defined as above, but with X_i replaced by X_i - \mu_i in equation (3.3). An example is the second order central moment, called the variance, which is given by \langle (X_i - \mu_i)^2 \rangle.
Any moment (that exists) can be found from the characteristic function [8]:

    \langle X_1^{k_1} X_2^{k_2} \cdots X_N^{k_N} \rangle = (-i)^k \left. \frac{\partial^k \phi_x(t)}{\partial t_1^{k_1} \ldots \partial t_N^{k_N}} \right|_{t=0}    (3.4)

where k = k_1 + k_2 + \ldots + k_N.
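
Equation (3.4) can be exercised symbolically; a sketch assuming SymPy is available, using a made-up 2-dimensional example to recover the second-order moment \langle X_1 X_2 \rangle = c_{12} + \mu_1 \mu_2.

    import sympy as sp

    t1, t2 = sp.symbols('t1 t2', real=True)
    mu1, mu2, c11, c12, c22 = sp.symbols('mu1 mu2 c11 c12 c22', real=True)
    t = sp.Matrix([t1, t2])
    mu = sp.Matrix([mu1, mu2])
    S = sp.Matrix([[c11, c12], [c12, c22]])

    # Characteristic function of eq. (3.1) in 2 dimensions
    phi = sp.exp(sp.I * (mu.T * t)[0] - sp.Rational(1, 2) * (t.T * S * t)[0])

    # <X1 X2> = (-i)^2 d^2 phi / (dt1 dt2) evaluated at t = 0, as in eq. (3.4)
    moment = sp.simplify(((-sp.I) ** 2 * sp.diff(phi, t1, t2)).subs({t1: 0, t2: 0}))
    print(moment)                                  # expect c12 + mu1*mu2
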
1. Order Moments

    Mean:  \mu \equiv \langle X \rangle    (3.5)

2. Order Moments

    Variance:  c_{ii} \equiv \langle (X_i - \mu_i)^2 \rangle = \langle X_i^2 \rangle - \mu_i^2    (3.6)
    Covariance:  c_{ij} \equiv \langle (X_i - \mu_i)(X_j - \mu_j) \rangle    (3.7)
    Covariance matrix:  \Sigma \equiv \langle (X - \mu)(X - \mu)^T \rangle \equiv [c_{ij}]    (3.8)
3. Order Moments

Often the skewness is used:

    Skew(X) \equiv \frac{\langle (X_i - \langle X_i \rangle)^3 \rangle}{\langle (X_i - \langle X_i \rangle)^2 \rangle^{3/2}} = \frac{\langle (X_i - \mu_i)^3 \rangle}{\langle (X_i - \mu_i)^2 \rangle^{3/2}}    (3.9)

All 3. order central moments are zero for gaussian distributions, and thus also the skewness.
4. Order Moments

The kurtosis is (in newer literature) given as

    Kurt(X) \equiv \frac{\langle (X_i - \mu_i)^4 \rangle}{\langle (X_i - \mu_i)^2 \rangle^2} - 3    (3.10)

Let X \sim N(\mu, \Sigma). Then

    \langle (X_i - \mu_i)(X_j - \mu_j)(X_k - \mu_k)(X_l - \mu_l) \rangle = c_{ij} c_{kl} + c_{il} c_{jk} + c_{ik} c_{lj}    (3.11)

and

    Kurt(X) = 0    (3.12)
N. Order Moments

Any central moment of a gaussian distribution can (fairly easily) be calculated with the following method [3] (sometimes known as Wick's theorem).

Let X \sim N(\mu, \Sigma). Then

- Assume k is odd. Then the central k'th order moments are all zero.
- Assume k is even. Then the central k'th order moments are equal to \sum (c_{ij} c_{kl} \cdots c_{xz}). The sum is taken over all different permutations of the k indices, where it is noted that c_{ij} = c_{ji}. This gives (k-1)! / (2^{k/2-1} (k/2-1)!) terms, each of which is a product of k/2 covariances.
An example is illustrative. The different 4. order central moments of X are found with the above method to give

    \langle (X_i - \mu_i)^4 \rangle = 3 c_{ii}^2
    \langle (X_i - \mu_i)^3 (X_j - \mu_j) \rangle = 3 c_{ii} c_{ij}
    \langle (X_i - \mu_i)^2 (X_j - \mu_j)^2 \rangle = c_{ii} c_{jj} + 2 c_{ij}^2
    \langle (X_i - \mu_i)^2 (X_j - \mu_j)(X_k - \mu_k) \rangle = c_{ii} c_{jk} + 2 c_{ij} c_{ik}
    \langle (X_i - \mu_i)(X_j - \mu_j)(X_k - \mu_k)(X_l - \mu_l) \rangle = c_{ij} c_{kl} + c_{il} c_{jk} + c_{ik} c_{lj}
    (3.13)

The above results were found by seeing that the different permutations of the k = 4 indices are (12)(34), (13)(24) and (14)(23). Other permutations are equivalent, such as for instance (32)(14), which is equivalent to (14)(23). When calculating e.g. \langle (X_i - \mu_i)^2 (X_j - \mu_j)(X_k - \mu_k) \rangle, the assignment (1 \to i, 2 \to i, 3 \to j, 4 \to k) gives the terms c_{ii} c_{jk}, c_{ij} c_{ik} and c_{ik} c_{ij} in the sum.
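
A Monte Carlo sketch of the fourth-order relations in (3.13), assuming NumPy; the covariance matrix below is an arbitrary example, with i and j taken as the first two components and \mu = 0 so that central and raw moments coincide.

    import numpy as np

    rng = np.random.default_rng(3)
    mu = np.zeros(3)
    Sigma = np.array([[1.5, 0.6, 0.2],
                      [0.6, 1.0, 0.3],
                      [0.2, 0.3, 0.8]])

    X = rng.multivariate_normal(mu, Sigma, size=1_000_000)
    xi, xj = X[:, 0], X[:, 1]
    cii, cjj, cij = Sigma[0, 0], Sigma[1, 1], Sigma[0, 1]

    print((xi ** 4).mean(), 3 * cii ** 2)                        # <(X_i - mu_i)^4> = 3 c_ii^2
    print((xi ** 3 * xj).mean(), 3 * cii * cij)                  # = 3 c_ii c_ij
    print((xi ** 2 * xj ** 2).mean(), cii * cjj + 2 * cij ** 2)  # = c_ii c_jj + 2 c_ij^2
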
Calculations with moments

Let b \in R^c and A, B \in M_{c \times d}. Let X and Y be random vectors and f and g vector functions. Then

    \langle A f(X) + B g(X) + b \rangle = A \langle f(X) \rangle + B \langle g(X) \rangle + b    (3.14)
    \langle A X + b \rangle = A \langle X \rangle + b    (3.15)
    \langle \langle Y | X \rangle \rangle \equiv E(E(Y|X)) = \langle Y \rangle    (3.16)

If X_i and X_j are independent, then

    \langle X_i X_j \rangle = \langle X_i \rangle \langle X_j \rangle    (3.17)

Chapter 4
Marginalization and Conditional Distribution

4.1 Marginalization

Marginalization is the operation of integrating out variables of the pdf of a random vector X. Assume that X is split into two parts (since the ordering of the X_i is arbitrary, this corresponds to any division of the variables), X = [X_{1:c}^T \, X_{c+1:N}^T]^T = [X_1 X_2 \ldots X_c \, X_{c+1} \ldots X_N]^T. Let the pdf of X be p(x) = p(x_1, \ldots, x_N). Then:

    p(x_1, \ldots, x_c) = \int \cdots \int_{R^{N-c}} p(x_1, \ldots, x_N) \, dx_{c+1} \ldots dx_N    (4.1)
The nice part about gaussian distributions is that every marginal distribution of a gaussian distribution is itself a gaussian. More specifically, let X be split into two parts as above and X \sim N(\mu, \Sigma). Then:

    p(x_1, \ldots, x_c) = p(x_{1:c}) = N_{1:c}(\mu_{1:c}, \Sigma_{1:c})    (4.2)

where \mu_{1:c} = [\mu_1, \mu_2, \ldots, \mu_c] and

    \Sigma_{1:c} =
    \begin{pmatrix}
      c_{11} & c_{12} & \cdots & c_{1c} \\
      c_{21} & c_{22} &        & \vdots \\
      \vdots &        & \ddots & \vdots \\
      c_{c1} & \cdots & \cdots & c_{cc}
    \end{pmatrix}

In words, the mean and covariance matrix of the marginal distribution are the same as the corresponding elements of the joint distribution.
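
Equation (4.2) amounts to picking out the corresponding entries of \mu and the upper-left block of \Sigma; a short sketch assuming NumPy, with an arbitrary split after the first two of four variables.

    import numpy as np

    rng = np.random.default_rng(4)
    mu = np.array([1.0, -1.0, 0.0, 2.0])
    Sigma = np.array([[2.0, 0.5, 0.1, 0.0],
                      [0.5, 1.0, 0.2, 0.3],
                      [0.1, 0.2, 0.8, 0.1],
                      [0.0, 0.3, 0.1, 1.5]])
    c = 2                                        # keep X_1, X_2; marginalize out X_3, X_4

    mu_marg = mu[:c]                             # mean of the marginal, eq. (4.2)
    Sigma_marg = Sigma[:c, :c]                   # upper-left c-by-c block of Sigma

    X = rng.multivariate_normal(mu, Sigma, size=200_000)
    print(X[:, :c].mean(axis=0), mu_marg)        # should agree
    print(np.cov(X[:, :c].T), Sigma_marg)        # should agree
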

4.2 Conditional distribution

As in the previous section, let X = [X_{1:c}^T \, X_{c+1:N}^T]^T = [X_1 X_2 \ldots X_c \, X_{c+1} \ldots X_N]^T be a division of the variables into two parts. Let X \sim N(\mu, \Sigma) and use the notation X = [X_{1:c}^T \, X_{c+1:N}^T]^T = [X_{(1)}^T \, X_{(2)}^T]^T and

    \mu = \begin{pmatrix} \mu_{(1)} \\ \mu_{(2)} \end{pmatrix}
    \qquad
    \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}

It is found that the conditional distribution p(x_{(1)} | x_{(2)}) is in fact again a gaussian distribution, and

    X_{(1)} | X_{(2)} \sim N\left( \mu_{(1)} + \Sigma_{12} \Sigma_{22}^{-1} (x_{(2)} - \mu_{(2)}), \; \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{12}^T \right)    (4.3)
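
Equation (4.3) can be turned into a small helper; a sketch under the block notation above, assuming NumPy, with made-up parameters and a made-up observed x_(2).

    import numpy as np

    def conditional_gaussian(mu, Sigma, c, x2):
        """Parameters of X_(1) | X_(2) = x2 for X ~ N(mu, Sigma), split after index c (eq. 4.3)."""
        mu1, mu2 = mu[:c], mu[c:]
        S11, S12 = Sigma[:c, :c], Sigma[:c, c:]
        S21, S22 = Sigma[c:, :c], Sigma[c:, c:]
        mu_cond = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
        Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S21)   # = S11 - S12 S22^{-1} S12^T by symmetry
        return mu_cond, Sigma_cond

    mu = np.array([0.0, 1.0, -1.0])
    Sigma = np.array([[1.0, 0.4, 0.2],
                      [0.4, 2.0, 0.5],
                      [0.2, 0.5, 1.5]])
    print(conditional_gaussian(mu, Sigma, c=1, x2=np.array([1.5, -0.5])))
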

Chapter 5
Tips and Tricks

5.1 Products

Consider the product N_x(\mu_a, \Sigma_a) \cdot N_x(\mu_b, \Sigma_b) and note that both factors have x as their "random variable". Then

    N_x(\mu_a, \Sigma_a) \cdot N_x(\mu_b, \Sigma_b) = z_c \, N_x(\mu_c, \Sigma_c)    (5.1)

where \Sigma_c = (\Sigma_a^{-1} + \Sigma_b^{-1})^{-1} and \mu_c = \Sigma_c (\Sigma_a^{-1} \mu_a + \Sigma_b^{-1} \mu_b) and

    z_c = |2\pi \Sigma_a \Sigma_b \Sigma_c^{-1}|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} (\mu_a - \mu_b)^T \Sigma_a^{-1} \Sigma_c \Sigma_b^{-1} (\mu_a - \mu_b) \right)
        = |2\pi (\Sigma_a + \Sigma_b)|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} (\mu_a - \mu_b)^T (\Sigma_a + \Sigma_b)^{-1} (\mu_a - \mu_b) \right)    (5.2)
In words, the product of two gaussians is another (unnormalized) gaussian. This can be generalised to a product of K gaussians with distributions X_k \sim N(\mu_k, \Sigma_k):

    \prod_{k=1}^{K} N_x(\mu_k, \Sigma_k) = \tilde{z} \cdot N_x(\tilde{\mu}, \tilde{\Sigma})    (5.3)

where \tilde{\Sigma} = \left( \sum_{k=1}^{K} \Sigma_k^{-1} \right)^{-1} and \tilde{\mu} = \tilde{\Sigma} \left( \sum_{k=1}^{K} \Sigma_k^{-1} \mu_k \right) = \left( \sum_{k=1}^{K} \Sigma_k^{-1} \right)^{-1} \left( \sum_{k=1}^{K} \Sigma_k^{-1} \mu_k \right), and

    \tilde{z} = \frac{|2\pi \tilde{\Sigma}|^{\frac{1}{2}}}{\prod_{k=1}^{K} |2\pi \Sigma_k|^{\frac{1}{2}}} \prod_{i<j} \exp\left( -\frac{1}{2} (\mu_i - \mu_j)^T B_{ij} (\mu_i - \mu_j) \right)    (5.4)

where

    B_{ij} = \Sigma_i^{-1} \left( \sum_{k=1}^{K} \Sigma_k^{-1} \right)^{-1} \Sigma_j^{-1}    (5.5)
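
A numerical sketch of equations (5.1)-(5.2), assuming NumPy and SciPy, with two arbitrary example gaussians: the pointwise product of the two pdfs is compared with z_c N_x(\mu_c, \Sigma_c) at a test point.

    import numpy as np
    from scipy.stats import multivariate_normal

    mu_a, Sigma_a = np.array([0.0, 0.0]), np.array([[1.0, 0.2], [0.2, 0.5]])
    mu_b, Sigma_b = np.array([1.0, -1.0]), np.array([[0.8, 0.0], [0.0, 1.2]])

    Sigma_c = np.linalg.inv(np.linalg.inv(Sigma_a) + np.linalg.inv(Sigma_b))
    mu_c = Sigma_c @ (np.linalg.solve(Sigma_a, mu_a) + np.linalg.solve(Sigma_b, mu_b))
    z_c = multivariate_normal(mu_b, Sigma_a + Sigma_b).pdf(mu_a)   # eq. (5.2), second form

    x = np.array([0.3, -0.4])
    lhs = multivariate_normal(mu_a, Sigma_a).pdf(x) * multivariate_normal(mu_b, Sigma_b).pdf(x)
    rhs = z_c * multivariate_normal(mu_c, Sigma_c).pdf(x)
    print(lhs, rhs)                                                # should agree
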
5.2 Gaussian Integrals

A nice thing about the fact that products of gaussian functions are again gaussian functions is that it makes gaussian integrals easier to calculate, since \int N_x(\tilde{\mu}, \tilde{\Sigma}) \, dx = 1. Using this with equations (5.1) and (5.3) of the previous section gives the following.

    \int_{R^d} N_x(\mu_a, \Sigma_a) \cdot N_x(\mu_b, \Sigma_b) \, dx = \int_{R^d} z_c \, N_x(\mu_c, \Sigma_c) \, dx = z_c    (5.6)

Similarly,

    \int_{R^d} \prod_{k=1}^{K} N_x(\mu_k, \Sigma_k) \, dx = \tilde{z}    (5.7)

Equation (5.3) can also be used to calculate integrals such as \int |x|^q \left( \prod_k N_x(\mu_k, \Sigma_k) \right) dx or similar, by using the same technique as above.
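
Equation (5.6) can be spot-checked by noting that the integral is just the expectation of one gaussian pdf under the other; a sketch assuming NumPy and SciPy, reusing arbitrary example parameters.

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(6)
    mu_a, Sigma_a = np.array([0.0, 0.0]), np.array([[1.0, 0.2], [0.2, 0.5]])
    mu_b, Sigma_b = np.array([1.0, -1.0]), np.array([[0.8, 0.0], [0.0, 1.2]])

    # Monte Carlo: int N_a N_b dx = E_{x ~ N_a}[ N_b(x) ]
    X = rng.multivariate_normal(mu_a, Sigma_a, size=500_000)
    mc = multivariate_normal(mu_b, Sigma_b).pdf(X).mean()

    # Closed form z_c from eq. (5.2), second line
    z_c = multivariate_normal(mu_b, Sigma_a + Sigma_b).pdf(mu_a)
    print(mc, z_c)                                 # should roughly agree
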
5.3 Useful integrals

Let X \sim N(\mu, \Sigma) and let a \in R^d be an arbitrary vector. Then

    \langle e^{a^T x} \rangle \equiv \int N_x(\mu, \Sigma) \, e^{a^T x} \, dx = e^{a^T \mu + \frac{1}{2} a^T \Sigma a}    (5.8)

From this expression, it is possible to find integrals such as \int \exp(x^T A x + a^T x) \, dx. Another useful integral is

    \langle e^{x^T A x} \rangle \equiv \int N_x(\mu, \Sigma) \, e^{x^T A x} \, dx = |I - 2 \Sigma A|^{-\frac{1}{2}} \, e^{-\frac{1}{2} \mu^T \left( \Sigma - \frac{1}{2} A^{-1} \right)^{-1} \mu}    (5.9)

where A \in M_{d \times d} is a non-singular matrix.
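
Both integrals can be checked by Monte Carlo; a sketch assuming NumPy, with arbitrary \mu, \Sigma, a and A (A chosen small enough that I - 2 \Sigma A stays positive definite so the integral exists).

    import numpy as np

    rng = np.random.default_rng(5)
    mu = np.array([0.5, -0.2])
    Sigma = np.array([[0.6, 0.1], [0.1, 0.4]])
    a = np.array([0.3, -0.7])
    A = np.array([[0.2, 0.05], [0.05, 0.1]])

    X = rng.multivariate_normal(mu, Sigma, size=2_000_000)

    # eq. (5.8): <exp(a^T x)> = exp(a^T mu + 0.5 a^T Sigma a)
    print(np.exp(X @ a).mean(), np.exp(a @ mu + 0.5 * a @ Sigma @ a))

    # eq. (5.9): <exp(x^T A x)> = |I - 2 Sigma A|^{-1/2} exp(-0.5 mu^T (Sigma - 0.5 A^{-1})^{-1} mu)
    lhs = np.exp(np.einsum('ni,ij,nj->n', X, A, X)).mean()
    d = len(mu)
    rhs = np.linalg.det(np.eye(d) - 2 * Sigma @ A) ** -0.5
    rhs *= np.exp(-0.5 * mu @ np.linalg.solve(Sigma - 0.5 * np.linalg.inv(A), mu))
    print(lhs, rhs)                                # should roughly agree
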

Bibliography

[1] T. W. Anderson, An Introduction to Multivariate Statistical Analysis. Wiley, 1984.
[2] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[3] K. Triantafyllopoulos, "On the central moments of the multidimensional Gaussian distribution," The Mathematical Scientist, vol. 28, pp. 125-128, 2003.
[4] S. Roweis, "Gaussian Identities," http://www.cs.toronto.edu/~roweis/notes.html.
[5] J. Larsen, "Gaussian Integrals," Tech. Rep., http://isp.imm.dtu.dk/staff/jlarsen/pubs/frame.htm
[6] www.Wikipedia.org
[7] www.MathWorld.wolfram.com
[8] P. Kidmose, "Blind Separation of Heavy Tail Signals," Ph.D. Thesis, http://isp.imm.dtu.dk/staff/kidmose/pkpublications.html