Probabilistic Machine Learning
Lecture 03
Continuous Variables
Philipp Hennig
24 April 2023
Faculty of Science
Department of Computer Science
Chair for the Methods of Machine Learning
Recap from Lecture 1
Plausibility as a Measure [A.N. Kolmogorov. Grundbegriffe der Wahrscheinlichkeitsrechnung, 1933]

Definition (Measure & Probability Measure)
Let $(\Omega, \mathcal{F})$ be a measurable space (aka. Borel space). A nonnegative real function $P: \mathcal{F} \to \mathbb{R}_{0,+}$ is called a measure if it satisfies the following properties:

1. $P(\emptyset) = 0$
2. For any countable sequence $\{A_i \in \mathcal{F}\}_{i=1,\dots}$ of pairwise disjoint sets ($A_i \cap A_j = \emptyset$ if $i \neq j$), $P$ satisfies countable additivity (aka. $\sigma$-additivity):
$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i)$$

A measure $P$ is a probability measure if, in addition, $P(\Omega) = 1$.
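A minimal Python sketch of these axioms on a finite sample space (a fair die, a hypothetical example of my choosing), where every subset of $\Omega$ is measurable:

```python
# Illustrative finite probability space: a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
weights = {w: 1 / 6 for w in omega}  # point masses

def P(A):
    """Probability measure: sum of point masses over the event A."""
    return sum(weights[w] for w in A)

assert P(set()) == 0.0                        # axiom 1: P(empty set) = 0
assert abs(P(omega) - 1.0) < 1e-12            # probability measure: P(Omega) = 1

# On a finite space, countable additivity reduces to finite additivity:
A, B = {1, 2}, {5}
assert A & B == set()                         # pairwise disjoint
assert abs(P(A | B) - (P(A) + P(B))) < 1e-12  # additivity for disjoint events
```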
Cumulative Distributions
Connecting probabilities to integration

Definition (Cumulative Distribution Function (CDF))
Let $\mathcal{B}$ be the Borel $\sigma$-algebra in $\mathbb{R}^d$. For probability measures $P$ on $(\mathbb{R}^d, \mathcal{B})$, the cumulative distribution function is the function
$$F(x) = P\big(\{x' \in \mathbb{R}^d : x'_1 \leq x_1, \dots, x'_d \leq x_d\}\big)$$
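For $d = 1$ this is the familiar $F(x) = P((-\infty, x])$. A short sketch of my own, using SciPy's standard Gaussian, of how the CDF turns the measure of an interval into a difference of two function values:

```python
from scipy.stats import norm

F = norm(loc=0.0, scale=1.0).cdf      # CDF of the standard Gaussian

a, b = -1.0, 1.0
prob = F(b) - F(a)                    # P((a, b]) = F(b) - F(a)
print(f"P(({a}, {b}]) = {prob:.4f}")  # ~0.6827
```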
Probability Densities
a convenient way to write things down

Definition (Probability Density Functions (pdf’s))
A probability measure $P$ on $(\mathbb{R}^d, \mathcal{B})$ has a density $p$ if $p$ is a non-negative (Borel) measurable function on $\mathbb{R}^d$ satisfying, for all $B \in \mathcal{B}$,
$$P(B) = \int_B p(x)\,dx =: \int_B p(x_1, \dots, x_d)\,dx_1 \cdots dx_d$$
In particular, if the CDF $F$ of $P$ is sufficiently differentiable, then $P$ has a density, given by
$$p(x) = \frac{\partial^d F}{\partial x_1 \cdots \partial x_d}$$
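A quick numerical sanity check of the last identity in one dimension (my own sketch, using the standard Gaussian): a central finite difference of the CDF recovers the pdf.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3.0, 3.0, 7)
h = 1e-5
pdf_from_cdf = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)  # dF/dx, numerically

assert np.allclose(pdf_from_cdf, norm.pdf(x), atol=1e-6)
```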
Change of Measure
The transformation law

Theorem (Change of Variable for Probability Density Functions)
Let $X$ be a continuous random variable with PDF $p_X(x)$ over $c_1 < x < c_2$, and let $Y = u(X)$ be a monotonic differentiable function with inverse $X = v(Y)$. Then the PDF of $Y$ is
$$p_Y(y) = p_X(v(y)) \left| \frac{dv(y)}{dy} \right| = p_X(v(y)) \left| \frac{du(x)}{dx} \right|^{-1}$$

Proof: for $u'(X) > 0$: $\forall\, d_1 = u(c_1) < y < u(c_2) = d_2$,
$$F_Y(y) = P(Y \leq y) = P(u(X) \leq y) = P(X \leq v(y)) = \int_{c_1}^{v(y)} p_X(x)\,dx$$
$$p_Y(y) = \frac{dF_Y(y)}{dy} = p_X(v(y))\,\frac{dv(y)}{dy} = p_X(v(y)) \left| \frac{dv(y)}{dy} \right|,$$
since $dv(y)/dy > 0$ in this case.
Proof (continued): for $u'(X) < 0$: $\forall\, d_2 = u(c_2) < y < u(c_1) = d_1$,
$$F_Y(y) = P(Y \leq y) = P(u(X) \leq y) = P(X \geq v(y)) = 1 - P(X \leq v(y)) = 1 - \int_{c_1}^{v(y)} p_X(x)\,dx$$
$$p_Y(y) = \frac{dF_Y(y)}{dy} = -p_X(v(y))\,\frac{dv(y)}{dy} = p_X(v(y)) \left| \frac{dv(y)}{dy} \right|,$$
since $dv(y)/dy < 0$ in this case.
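The theorem invites a quick empirical check (a sketch of my own; the monotonic map $u(x) = \exp(x)$ is a hypothetical choice): with $X \sim \mathcal{N}(0, 1)$, the inverse is $v(y) = \log y$ with $dv/dy = 1/y$, so a histogram of $Y = \exp(X)$ should match $p_X(v(y))\,|dv/dy|$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y_samples = np.exp(rng.standard_normal(200_000))  # Y = u(X) = exp(X)

# Density estimate from the samples (normalized against ALL samples):
counts, edges = np.histogram(y_samples, bins=60, range=(0.1, 5.0))
hist = counts / (len(y_samples) * np.diff(edges))
centers = 0.5 * (edges[:-1] + edges[1:])

p_Y = norm.pdf(np.log(centers)) / centers         # p_X(v(y)) |dv/dy|
print(np.max(np.abs(hist - p_Y)))                 # small: MC + binning error
```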
Example — inferring probability of wearing glasses (3)
Step 2: Define probability space, taking care of conditional independence

Probability of wearing glasses without observations
$$p(\pi \mid \text{“nothing”}) = p(\pi)$$
Probability of wearing glasses after one observation
$$p(\pi \mid x_1) = \frac{p(x_1 \mid \pi)\,p(\pi)}{\int p(x_1 \mid \pi)\,p(\pi)\,d\pi} = Z_1^{-1}\,p(x_1 \mid \pi)\,p(\pi)$$
Probability of wearing glasses after two observations
$$p(\pi \mid x_1, x_2) = Z_2^{-1}\,p(x_2 \mid x_1, \pi)\,p(x_1 \mid \pi)\,p(\pi) = Z_2^{-1}\,p(x_2 \mid \pi)\,p(x_1 \mid \pi)\,p(\pi)$$
…
Probability of wearing glasses after five observations
$$p(\pi \mid x_1, x_2, x_3, x_4, x_5) = Z_5^{-1} \prod_{i=1}^{5} p(x_i \mid \pi)\;p(\pi)$$
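A minimal grid-based sketch of this sequential update (my own implementation, with hypothetical observations, using the Bernoulli likelihood defined on the next slide): each step multiplies by the likelihood and divides by the normalization constant $Z_k$.

```python
import numpy as np

pi = np.linspace(0.0, 1.0, 1001)       # grid over the latent probability pi
posterior = np.ones_like(pi)           # uniform prior p(pi)

observations = [1, 0, 1, 1, 0]         # hypothetical observations x_1, ..., x_5
for x in observations:
    likelihood = pi if x == 1 else 1.0 - pi   # Bernoulli likelihood p(x | pi)
    posterior = likelihood * posterior
    posterior /= np.trapz(posterior, pi)      # normalize: divide by Z_k

print(pi[np.argmax(posterior)])        # posterior mode after five observations
```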
Example — inferring probability of wearing glasses (4)
Step 3: Define analytic forms of generative model

What is the likelihood?
$$p(x_1 \mid \pi) = \begin{cases} \pi & \text{for } x_1 = 1 \\ 1 - \pi & \text{for } x_1 = 0 \end{cases}$$
More helpful RVs:
▶ RV $N$ for the number of observations being 1 (with values $n$)
▶ RV $M$ for the number of observations being 0 (with values $m$)

Probability of wearing glasses after five observations
$$p(\pi \mid x_1, x_2, x_3, x_4, x_5) = Z_5^{-1}\,\pi^{n}(1 - \pi)^{m}\,p(\pi)$$
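Because the likelihood depends on the data only through the counts $n$ and $m$, the posterior can be computed in one step; with a uniform prior it is the Beta$(n+1, m+1)$ density. A short check of my own (hypothetical data, as above):

```python
import numpy as np
from scipy.stats import beta

observations = [1, 0, 1, 1, 0]     # hypothetical data, as before
n = sum(observations)              # value of N: number of x_i = 1
m = len(observations) - n          # value of M: number of x_i = 0

pi = np.linspace(0.0, 1.0, 1001)
post = pi**n * (1.0 - pi)**m       # pi^n (1 - pi)^m, with uniform prior p(pi) = 1
post /= np.trapz(post, pi)         # divide by Z_5

assert np.allclose(post, beta.pdf(pi, n + 1, m + 1), atol=1e-3)
```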
Laplace’s Approximation
What you do when it’s 1814 and you don’t have a computer [P.-S. Laplace, 1814]
Pierre-Simon, marquis de Laplace (1749–1827)

$$p(x \mid a, b) = \frac{x^{a-1}(1 - x)^{b-1}}{B(a, b)}$$
$$\log p(x \mid a, b) = (a - 1)\log x + (b - 1)\log(1 - x) - \text{const.}$$
Setting the derivative to zero gives the mode:
$$\frac{\partial \log p(x \mid a, b)}{\partial x} = \frac{a - 1}{x} - \frac{b - 1}{1 - x} \;\Rightarrow\; \hat{x} := \frac{a - 1}{a + b - 2}$$
$$\frac{\partial^2 \log p(x \mid a, b)}{\partial x^2} = -\frac{a - 1}{x^2} - \frac{b - 1}{(1 - x)^2}$$
Evaluating the curvature at the mode $\hat{x} := \frac{a - 1}{a + b - 2}$ gives
$$\Psi := \frac{\partial^2 \log p(x \mid a, b)}{\partial x^2}\bigg|_{x = \hat{x}} = -(a + b - 2)^2 \left( \frac{1}{a - 1} + \frac{1}{b - 1} \right)$$
Replacing the density near its mode by the Gaussian with mean $\hat{x}$ and variance $-\Psi^{-1}$, and using that $p$ is normalized,
$$1 = \int p(x)\,dx \approx p(\hat{x}) \int \exp\left( -\frac{(x - \hat{x})^2}{2(-\Psi^{-1})} \right) dx = p(\hat{x}) \sqrt{\frac{2\pi}{-\Psi}}$$
$$\Rightarrow\quad B(a, b) \approx \hat{x}^{a-1} (1 - \hat{x})^{b-1} \sqrt{\frac{2\pi}{-\Psi}}$$
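A numerical check of this approximation (my own sketch; $a$ and $b$ chosen arbitrarily), comparing against the exact Beta function from scipy.special:

```python
import numpy as np
from scipy.special import beta

a, b = 20.0, 30.0
x_hat = (a - 1) / (a + b - 2)                          # mode of the Beta density
psi = -(a + b - 2) ** 2 * (1 / (a - 1) + 1 / (b - 1))  # curvature at the mode
B_laplace = x_hat ** (a - 1) * (1 - x_hat) ** (b - 1) * np.sqrt(2 * np.pi / -psi)

print(B_laplace, beta(a, b))   # close for moderately large a and b
```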