bayes_machine_learning_book for data scientists

About This Presentation

An overview of Bayesian theory, which is very important in machine learning, and of how we can use it for ML.


Slide Content

BML lecture #1: Bayesics
http://github.com/rbardenet/bml-course
Rémi Bardenet
[email protected]
CNRS & CRIStAL, Univ. Lille, France
1 / 38

What comes to your mind when you hear "Bayesian ML"?
2 / 38

Course outline
3 / 38

Outline
1. A warmup: Estimation in regression models
2. ML as data-driven decision-making
3. Subjective expected utility
4. Specifying joint models
5. 50 shades of Bayes
4 / 38

Quotes from Gelman et al., 2013 on Bayesian methods
▶ [...] practical methods for making inferences from data, using probability models for quantities we observe and for quantities about which we wish to learn.
▶ The essential characteristic of Bayesian methods is their explicit use of probability for quantifying uncertainty in inferences based on statistical data analysis.
▶ Three steps:
1. Setting up a full probability model,
2. Conditioning on observed data, calculating and interpreting the appropriate “posterior distribution”,
3. Evaluating the fit of the model and the implications of the resulting posterior distribution. In response, one can alter or expand the model and repeat the three steps.
5 / 38
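To make the three steps concrete, here is a minimal sketch on an assumed toy coin-flipping problem (my own example, not from the slides): a Beta prior with a Bernoulli likelihood gives a closed-form posterior, and a simple posterior predictive check probes the fit.

```python
# Toy illustration of the three steps (assumed Beta-Bernoulli example, not from the slides).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Step 1: set up a full probability model.
# theta ~ Beta(a, b), y_i | theta ~ Bernoulli(theta), i = 1, ..., n.
a, b = 1.0, 1.0                      # uniform prior on theta
y = rng.binomial(1, 0.7, size=50)    # observed data (simulated here for the example)

# Step 2: condition on the observed data.
# Conjugacy gives the posterior in closed form: theta | y ~ Beta(a + sum(y), b + n - sum(y)).
posterior = stats.beta(a + y.sum(), b + len(y) - y.sum())
print("posterior mean:", posterior.mean())

# Step 3: evaluate the fit.
# Draw replicated datasets from the posterior predictive and compare a test
# statistic (here the total number of successes) with the observed value.
theta_rep = posterior.rvs(size=1000, random_state=rng)
replicated_sums = rng.binomial(len(y), theta_rep)
print("posterior predictive p-value:", np.mean(replicated_sums >= y.sum()))
```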

Notation that I will try to stick to
▶ $y_{1:n} = (y_1, \dots, y_n) \in \mathcal{Y}^n$ denote observable data/labels.
▶ $x_{1:n} \in \mathcal{X}^n$ denote covariates/features/hidden states.
▶ $z_{1:n} \in \mathcal{Z}^n$ denote hidden variables.
▶ $\theta \in \Theta$ denotes parameters.
▶ $X$ denotes an $\mathcal{X}$-valued random variable. Lowercase $x$ denotes either a point in $\mathcal{X}$ or an $\mathcal{X}$-valued random variable.
6 / 38

More notation
▶ Whenever it can easily be made formal, we write densities for our random variables and let the context indicate what is meant. So if $X \sim \mathcal{N}(0, \sigma^2)$, we write
$$\mathbb{E}\, h(X) = \int h(x)\, \frac{e^{-x^2/2\sigma^2}}{\sigma\sqrt{2\pi}}\, \mathrm{d}x = \int h(x)\, p(x)\, \mathrm{d}x.$$
Similarly, for $X \sim \mathcal{P}(\lambda)$, we write
$$\mathbb{E}\, h(X) = \sum_{k=0}^{\infty} h(k)\, e^{-\lambda} \frac{\lambda^k}{k!} = \int h(x)\, p(x)\, \mathrm{d}x.$$
▶ All pdfs are denoted by $p$, so that, e.g.,
$$\mathbb{E}\, h(Y, \theta) = \int h(y, \theta)\, p(y, \theta)\, \mathrm{d}y\, \mathrm{d}\theta
= \int h(y, \theta)\, p(y, x, \theta)\, \mathrm{d}x\, \mathrm{d}y\, \mathrm{d}\theta
= \int h(y, \theta)\, p(y, \theta \mid x)\, p(x)\, \mathrm{d}x\, \mathrm{d}y\, \mathrm{d}\theta.$$
7 / 38
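As a quick numerical sanity check of this convention (my own illustration, not part of the slides), both expectations can be approximated with NumPy/SciPy and compared to their known closed-form values.

```python
# Checking E h(X) for the Gaussian and Poisson examples (assumed illustration).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
h = lambda x: x ** 2                         # any test function; here E h(X) is known

# X ~ N(0, sigma^2): E h(X) = integral of h(x) p(x) dx = sigma^2 for h(x) = x^2.
sigma = 2.0
samples = rng.normal(0.0, sigma, size=200_000)
print(h(samples).mean(), "vs", sigma ** 2)   # Monte Carlo vs exact

# X ~ Poisson(lambda): E h(X) = sum_k h(k) e^{-lambda} lambda^k / k! = lambda + lambda^2.
lam = 3.0
k = np.arange(0, 200)
print(np.sum(h(k) * stats.poisson.pmf(k, lam)), "vs", lam + lam ** 2)
```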

Outline
1. A warmup: Estimation in regression models
2. ML as data-driven decision-making
3. Subjective expected utility
4. Specifying joint models
5. 50 shades of Bayes
8 / 38

Outline
1. A warmup: Estimation in regression models
2. ML as data-driven decision-making
3. Subjective expected utility
4. Specifying joint models
5. 50 shades of Bayes
9 / 38

Inference in regression models
10 / 38

Inference in regression models
11 / 38

Inference in regression models
12 / 38

Inference in regression models
13 / 38

Inference in regression models
14 / 38

Outline
1. A warmup: Estimation in regression models
2. ML as data-driven decision-making
3. Subjective expected utility
4. Specifying joint models
5. 50 shades of Bayes
15 / 38

Describing a decision problem under uncertainty
▶ A state space $\mathcal{S}$: every quantity you need to consider to make your decision.
▶ Actions $\mathcal{A} \subset \mathcal{F}(\mathcal{S}, \mathcal{Z})$: making a decision means picking one of the available actions.
▶ A reward space $\mathcal{Z}$: encodes how you feel about having picked a particular action.
▶ A loss function $L : \mathcal{A} \times \mathcal{S} \to \mathbb{R}_+$: how much you would suffer from picking action $a$ in state $s$.
16 / 38
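One way to make these four ingredients concrete in code is the following sketch (the names and the squared-error example are my own assumptions, not from the course): actions are functions of the state, and the loss scores an action in a given state.

```python
# Minimal encoding of a decision problem under uncertainty (illustrative names only).
from typing import Callable

State = float                          # S: here a single unknown quantity
Reward = float                         # Z: here a real-valued guess
Action = Callable[[State], Reward]     # A is a subset of F(S, Z)

def loss(a: Action, s: State) -> float:
    """L(a, s): how much you suffer from picking action a in state s."""
    return (a(s) - s) ** 2             # e.g. squared error for an estimation problem

guess_zero: Action = lambda s: 0.0     # one particular (rather stubborn) action
print(loss(guess_zero, 1.5))           # 2.25
```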

Classification as a decision problem
▶ $\mathcal{S} = \mathcal{X}^n \times \mathcal{Y}^n \times \mathcal{X} \times \mathcal{Y}$, i.e. $s = (x_{1:n}, y_{1:n}, x, y)$.
▶ $\mathcal{Z} = \{0, 1\}$.
▶ $\mathcal{A} = \{ a_g : s \mapsto 1_{y \neq g(x;\, x_{1:n}, y_{1:n})},\ g \in \mathcal{G} \}$.
▶ $L(a_g, s) = 1_{y \neq g(x;\, x_{1:n}, y_{1:n})}$.
PAC bounds; see e.g. (Shalev-Shwartz and Ben-David, 2014)
Let $(x_{1:n}, y_{1:n}) \sim P^{\otimes n}$, and independently $(x, y) \sim P$; we want an algorithm $g(\cdot\,; x_{1:n}, y_{1:n}) \in \mathcal{G}$ such that if $n \geqslant n(\delta, \varepsilon)$,
$$P^{\otimes n}\Big( \mathbb{E}_{(x,y) \sim P}\, L(a_g, s) \leqslant \varepsilon \Big) \geqslant 1 - \delta.$$
17 / 38
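As an assumed illustration of the 0-1 loss above (the distribution $P$ and the classifier $g$ are my own toy choices), the generalization error $\mathbb{E}_{(x,y)\sim P}\, L(a_g, s)$ of a fixed classifier can be estimated by Monte Carlo on fresh draws from $P$.

```python
# Monte Carlo estimate of the expected 0-1 loss of a fixed classifier (toy example).
import numpy as np

rng = np.random.default_rng(0)

def sample_P(m):
    """Draw m pairs (x, y): x ~ N(0, 1), y = 1 if x plus some noise is positive."""
    x = rng.normal(size=m)
    y = (x + 0.5 * rng.normal(size=m) > 0).astype(int)
    return x, y

# A fixed classifier g; in the PAC setting g would be learned from (x_{1:n}, y_{1:n}).
g = lambda x: (x > 0).astype(int)

x_test, y_test = sample_P(100_000)
zero_one_risk = np.mean(g(x_test) != y_test)   # estimates E_{(x,y)~P} 1_{y != g(x)}
print(zero_one_risk)
```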

Regression as a decision problem
▶ $\mathcal{S} =$
▶ $\mathcal{Z} =$
▶ $\mathcal{A} =$

18 / 38

Estimation as a decision problem
▶ $\mathcal{S} =$
▶ $\mathcal{Z} =$
▶ $\mathcal{A} =$

19 / 38

Clustering as a decision problem
▶ $\mathcal{S} =$
▶ $\mathcal{Z} =$
▶ $\mathcal{A} =$

20 / 38

Outline
1. A warmup: Estimation in regression models
2. ML as data-driven decision-making
3. Subjective expected utility
4. Specifying joint models
5. 50 shades of Bayes
21 / 38

SEU is what defines the Bayesian approach
The subjective expected utility principle
1. Choose $\mathcal{S}$, $\mathcal{Z}$, $\mathcal{A}$ and a loss function $L(a, s)$,
2. Choose $p$ over $\mathcal{S}$,
3. Take the corresponding
$$a^\star \in \arg\min_{a \in \mathcal{A}} \mathbb{E}_{s \sim p}\, L(a, s). \qquad (1)$$
Corollary: minimize the posterior expected loss
Now partition $s = (s_{\mathrm{obs}}, s_u)$, then
$$a^\star \in \arg\min_{a \in \mathcal{A}} \mathbb{E}_{s_{\mathrm{obs}}}\, \mathbb{E}_{s_u \mid s_{\mathrm{obs}}}\, L(a, s).$$
In ML, $\mathcal{A} = \{a_g\}$, with $g = g(s_{\mathrm{obs}})$, so that (1) is equivalent to $a^\star = a_{g^\star}$, with
$$g^\star(s_{\mathrm{obs}}) \triangleq \arg\min_{g}\, \mathbb{E}_{s_u \mid s_{\mathrm{obs}}}\, L(a_g, s).$$
22 / 38
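A small numerical sketch of the corollary (an assumed toy example, not from the slides): with a discrete posterior over the unobserved part of the state and a grid of candidate point estimates under squared-error loss, the SEU action is the grid point minimizing the posterior expected loss, which lands at the posterior mean as expected.

```python
# Choosing the action that minimizes the posterior expected loss (assumed toy example).
import numpy as np

# Posterior p(s_u | s_obs) over a discrete unknown, already conditioned on the data.
s_u_values = np.array([0.0, 1.0, 2.0])
posterior = np.array([0.2, 0.5, 0.3])              # sums to 1

# Candidate actions: point estimates of s_u on a grid, scored by squared-error loss.
candidate_actions = np.linspace(0.0, 2.0, 201)
loss = lambda a, s: (a - s) ** 2

# E_{s_u | s_obs} L(a, s) for every candidate action, then arg min over actions.
expected_loss = np.array([np.sum(posterior * loss(a, s_u_values))
                          for a in candidate_actions])
a_star = candidate_actions[np.argmin(expected_loss)]
print(a_star)   # approximately 1.1, the posterior mean, as expected for squared loss
```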

Outline
1. A warmup: Estimation in regression models
2. ML as data-driven decision-making
3. Subjective expected utility
4. Specifying joint models
5. 50 shades of Bayes
23 / 38

A recap on probabilistic graphical models 1/2
▶ PGMs (aka “Bayesian” networks) represent the dependencies in a joint distribution $p(s)$ by a graph $G = (E, V)$.
▶ Two important properties:
$$p(s) = \prod_{v \in V} p(s_v \mid s_{\mathrm{pa}(v)}) \quad \text{and} \quad y_v \perp y_{\mathrm{nd}(v)} \mid y_{\mathrm{pa}(v)}.$$
24 / 38
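A tiny numerical example of the factorization property (the chain graph and the numbers are my own assumptions): for binary variables on the DAG $s_1 \to s_2 \to s_3$, the joint is the product of each node's conditional given its parents.

```python
# Joint distribution factorized over the small DAG s1 -> s2 -> s3 (assumed example).
import numpy as np

p_s1 = np.array([0.6, 0.4])                 # p(s1)
p_s2_given_s1 = np.array([[0.7, 0.3],       # p(s2 | s1); rows indexed by s1
                          [0.2, 0.8]])
p_s3_given_s2 = np.array([[0.9, 0.1],       # p(s3 | s2); rows indexed by s2
                          [0.5, 0.5]])

# p(s) = prod_v p(s_v | s_pa(v)), evaluated on every configuration of (s1, s2, s3).
joint = np.zeros((2, 2, 2))
for s1 in range(2):
    for s2 in range(2):
        for s3 in range(2):
            joint[s1, s2, s3] = p_s1[s1] * p_s2_given_s1[s1, s2] * p_s3_given_s2[s2, s3]

print(joint.sum())   # 1.0: the factorization defines a valid joint distribution
```

In this chain, $s_3$ is independent of its non-descendant $s_1$ given its parent $s_2$, which is the second property above.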

A recap on probabilistic graphical models 2/2
Also good to know how to determine whether $A \perp B \mid C$; see (Murphy, 2012, Section 10.5).
d-blocking
An undirected path $P$ in $G$ is d-blocked by $E \subset V$ if at least one of the following conditions holds.
▶ $P$ contains a “chain” $a \to b \to c$ and $b \in E$.
▶ $P$ contains a “tent” $a \leftarrow b \to c$ and $b \in E$.
▶ $P$ contains a “v-structure” $a \to b \leftarrow c$ and neither $b$ nor any of its descendants are in $E$.
Theorem
$A \perp B \mid C$ if and only if every undirected path between a node of $A$ and a node of $B$ is d-blocked by $C$.
25 / 38
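The d-blocking conditions above translate almost literally into code. Below is a sketch of a path-based d-separation check for small DAGs (my own implementation, not code from the course): it enumerates the undirected paths between two nodes and tests each consecutive triple against the three conditions.

```python
# Path-based d-separation check for small DAGs (illustrative sketch, not course code).
def descendants(dag, v):
    """All descendants of v; dag maps each node to the set of its children."""
    found, stack = set(), [v]
    while stack:
        for child in dag.get(stack.pop(), set()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def undirected_paths(dag, a, b):
    """All simple paths from a to b in the undirected skeleton of the DAG."""
    nbrs = {}
    for u, children in dag.items():
        for c in children:
            nbrs.setdefault(u, set()).add(c)
            nbrs.setdefault(c, set()).add(u)
    paths, stack = [], [(a, [a])]
    while stack:
        node, path = stack.pop()
        if node == b:
            paths.append(path)
            continue
        for n in nbrs.get(node, set()):
            if n not in path:
                stack.append((n, path + [n]))
    return paths

def d_blocked(dag, path, E):
    """Is the undirected path d-blocked by the conditioning set E?"""
    for u, v, w in zip(path, path[1:], path[2:]):
        is_v_structure = v in dag.get(u, set()) and v in dag.get(w, set())  # u -> v <- w
        if is_v_structure:
            if v not in E and not (descendants(dag, v) & E):
                return True        # blocked v-structure
        elif v in E:
            return True            # blocked chain or tent
    return False

def d_separated(dag, a, b, E):
    """Single-node version of the theorem: a ⊥ b | E iff every path is d-blocked by E."""
    return all(d_blocked(dag, p, E) for p in undirected_paths(dag, a, b))

# Example: the v-structure x1 -> x3 <- x2.
dag = {"x1": {"x3"}, "x2": {"x3"}, "x3": set()}
print(d_separated(dag, "x1", "x2", set()))     # True: marginally independent
print(d_separated(dag, "x1", "x2", {"x3"}))    # False: conditioning opens the collider
```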

Exercise
▶ Does $x_2 \perp x_6 \mid x_5, x_1$?
▶ Does $x_2 \perp x_6 \mid x_1$?
▶ Write the joint distribution as factorized over the graph.
26 / 38

Estimation as a decision problem: point estimates
27 / 38

Estimation as a decision problem: credible intervals
28 / 38

Choosing priors (see Exercises)
29 / 38

Classification as a decision problem
30 / 38

Regression as a decision problem 1/2
31 / 38

Regression as a decision problem 2/2
32 / 38

Dimensionality reduction as a decision problem
33 / 38

Clustering as a decision problem
34 / 38

Topic modelling as a decision problem
35 / 38

Outline
1. A warmup: Estimation in regression models
2. ML as data-driven decision-making
3. Subjective expected utility
4. Specifying joint models
5. 50 shades of Bayes
36 / 38

50 shades of Bayes
An issue (or is it?)
Depending on how they interpret and how they implement SEU, you will meet many types of Bayesians (46,656, according to Good).
A few divisive questions
▶ Using data or the likelihood to choose your prior; see Lecture #5.
▶ Using MAP estimators for their computational tractability, like in inverse problems:
$$\hat{x}_\lambda \in \arg\min_x \|y - Ax\| + \lambda\, \Omega(x).$$
▶ When and how should you revise your model (likelihood or prior)?
▶ MCMC vs variational Bayes (more in Lectures #2 and #3)
37 / 38
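For the MAP example above, a common concrete instance (an assumption on my part; the slide leaves the data-fit term and $\Omega$ generic) is a squared data-fit with $\Omega(x) = \|x\|^2$, i.e. ridge/Tikhonov regularization, whose minimizer is available in closed form.

```python
# MAP estimate for a linear inverse problem, assuming the quadratic instance
# argmin_x ||y - A x||^2 + lam * ||x||^2 (Gaussian likelihood and Gaussian prior).
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
A = rng.normal(size=(n, d))            # forward operator
x_true = rng.normal(size=d)
y = A @ x_true + 0.1 * rng.normal(size=n)

lam = 1.0
# Closed form: x_hat = (A^T A + lam * I)^{-1} A^T y.
x_hat = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)
print(np.linalg.norm(x_hat - x_true))  # small reconstruction error on this toy problem
```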

References
[1] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. Bayesian Data Analysis. 3rd edition. CRC Press, 2013.
[2] K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[3] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
38 / 38