Insufficient Gibbs sampling (A. Luciano, C.P. Robert and R. Ryder)
xianblog
Aug 27, 2024
About This Presentation
Slides of a presentation at the COMPSTAT 2024 conference in Gießen, Hesse, Germany, on 28 Aug 2024
Insufficient Gibbs sampling
Christian P. Robert
U. Paris Dauphine PSL & Warwick U.
Joint work with Antoine Luciano and Robin Ryder
ERCall for oceaneers
Ongoing 2023-2030 ERC funding for PhD and postdoctoral
collaborations with
▶Michael Jordan (Berkeley)
▶Eric Moulines (Paris)
▶ Gareth O. Roberts (Warwick)
▶myself (Paris)
on OCEAN (On IntelligenCE And Networks) project
Outline
1 Introduction
2 Methodology for different statistics
  Median, MAD
  Quantiles
  Median, IQR
3 Impact on Bayesian Model Choice
4 Experiments
  Gaussian Case
  Cauchy Example
  3-Parameter Weibull Example
  Normal-Laplace Comparison
5 Conclusion
Motivations
▶ Framework: Only robust statistics T(X) of data X ∈ ℝ^N are available, attached with a parametric model F_θ on X.
▶ Goal: Sample from the posterior distribution of the parameters θ given T(X), π(θ | T(X)).
▶ Issue: Posterior intractable for most robust statistics.
▶ State of the art: Approximate Bayesian Computation (ABC) methods with T(X) as a summary statistic only deliver a sample from an approximation of π(θ | X) (that may be close to π(θ | T(X))).
[Marin & al., 2011]
▶ Connections with
Contribution
Turning X into a latent variable:
π(θ | T(x)) ∝ ∫_{ℝ^N} π(θ, x | T(x)) dx ∝ ∫_{ℝ^N} π(θ | x) π(x | T(x)) dx
Two-step simulation scheme:
1. X from π(X | T(X), θ), under the assumption X ∼ F_θ
2. θ from the posterior given the augmented data, π(θ | X)
Gibbs Sampler
Resolution by data completion, since the full posterior can be simulated directly (e.g., conjugate prior) or via a MH step
[Tanner & Wong, 1987]
X^0 ← X_init(N, T(X), F);
for t = 1, ..., T do
    X^t | T(X), θ^{t−1} ∼ π(· | X^{t−1}, T(X), F_{θ^{t−1}});
    θ^t | X^t ∼ π(· | X^t);
end
Gibbs Sampler / Data Augmentation
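To fix ideas, a minimal Python skeleton of this sampler (names are illustrative; the constrained refresh of X given T(X), the actual difficulty, is detailed per statistic in the next section):

```python
import numpy as np

def robust_gibbs(T_obs, init_X, init_theta, sample_X_given_T,
                 sample_theta_given_X, n_iter):
    """Data-augmentation Gibbs sampler targeting pi(theta | T(X)).

    sample_X_given_T(X, T_obs, theta) must refresh the latent data while
    keeping T(X) = T_obs exactly; sample_theta_given_X(X) draws from the
    full-data posterior pi(theta | X), directly (conjugate prior) or by MH."""
    X, theta = np.array(init_X, dtype=float), init_theta
    thetas = []
    for _ in range(n_iter):
        X = sample_X_given_T(X, T_obs, theta)    # X^t | T(X), theta^{t-1}
        theta = sample_theta_given_X(X)          # theta^t | X^t
        thetas.append(theta)
    return thetas
```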
Methodology for different statistics
1 Introduction
2 Methodology for different statistics
  Median, MAD
  Quantiles
  Median, IQR
3 Impact on Bayesian Model Choice
4 Experiments
  Gaussian Case
  Cauchy Example
  3-Parameter Weibull Example
  Normal-Laplace Comparison
5 Conclusion
Median, MAD
Here we have T(X) = (median(X), MAD(X)) where:
(a) the median is the 50% quantile of the empirical distribution, i.e., for a sample X = (X_1, ..., X_N)
median(X) = X_(k+1) if N = 2k+1, and (X_(k) + X_(k+1))/2 if N = 2k,
where X_(i) denotes the ith order statistic of the sample.
(b) the median absolute deviation (MAD) is a measure of statistical dispersion that is commonly used as a robust alternative to the standard deviation, defined as:
MAD(X) = median(|X − median(X)|)
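As a quick sanity check of these definitions, a few lines of NumPy (illustrative only):

```python
import numpy as np

x = np.random.default_rng(0).normal(size=11)   # N = 2k+1 with k = 5
xs = np.sort(x)
med = xs[5]                                    # X_(k+1), at 0-based index k
mad = np.median(np.abs(x - med))               # MAD(X) = median(|X - median(X)|)
assert med == np.median(x)
```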
Median, MAD
Basic Idea:
[Diagram: the cuts med − MAD, med, med + MAD split the data into four zones containing (50−k)%, k%, (50−k)% and k% of the observations, with k ∈ [0, 50]; 50% of the data lie below med and 50% within med ± MAD.]
The distribution of k is unknown, thus a direct sampling of the data vector is impossible.
Gibbs alternative: randomly select (or not) two coordinates (i, j) ∈ {1, ..., N}² for resampling (see the sketch below)
X_i^{t+1}, X_j^{t+1} | T(X^t), X^t_{−(i,j)}, θ^t
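A minimal sketch of the within-zone part of this move, assuming a Gaussian model F_θ = N(μ, σ²) and odd N (function names are illustrative): the observations realizing med and MAD are pinned, and other coordinates are redrawn from F_θ truncated to their current zone, which preserves T(X) exactly. The paper's additional pair moves that swap zone allocations, needed for ergodicity and motivating the pair (i, j) above, are omitted here.

```python
import numpy as np
from scipy.stats import truncnorm

def zone(v, med, mad):
    """Bounds of the zone (cut at med-MAD, med, med+MAD) containing v."""
    cuts = [-np.inf, med - mad, med, med + mad, np.inf]
    k = int(np.searchsorted(cuts, v)) - 1
    return cuts[k], cuts[k + 1]

def refresh_within_zones(x, mu, sigma, n_pairs=1, rng=np.random.default_rng()):
    """Partial refresh of x | (med, MAD), theta for a N(mu, sigma^2) model."""
    x = x.copy()
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    # observations pinned by the statistics (N odd): the median point and the
    # point whose absolute deviation from the median equals the MAD
    pinned = {int(np.argmin(np.abs(x - med))),
              int(np.argmin(np.abs(np.abs(x - med) - mad)))}
    free = [i for i in range(len(x)) if i not in pinned]
    for i in rng.choice(free, size=2 * n_pairs, replace=False):
        lo, hi = zone(x[i], med, mad)
        a, b = (lo - mu) / sigma, (hi - mu) / sigma    # standardized bounds
        x[i] = truncnorm.rvs(a, b, loc=mu, scale=sigma, random_state=rng)
    return x

x = np.random.default_rng(0).normal(size=101)
med0, mad0 = np.median(x), np.median(np.abs(x - np.median(x)))
x_new = refresh_within_zones(x, mu=0.0, sigma=1.0, n_pairs=10)
assert np.isclose(np.median(x_new), med0)
assert np.isclose(np.median(np.abs(x_new - np.median(x_new))), mad0)
```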
Median, MAD: Odd Case
Since median and MAD are both based on order statistics, odd and even N lead to two different settings.
Quantiles
Now assume one only observes sample quantiles (q_1, ..., q_K) of orders (p_1, ..., p_K), with p_k ∈ [0, 1] and q_k ∈ ℝ for k ∈ {1, ..., K}, such that
Q(p_k) = q_k,
where Q denotes the empirical quantile function (the generalized inverse of the empirical cdf P̂(X_i ≤ ·)). So
T(X) = ((q_1, ..., q_K), (p_1, ..., p_K))
Quantiles
General case:
[Diagram: the observed quantiles q_1 < q_2 < ... < q_K cut the data into segments containing p_1%, (p_2 − p_1)%, ..., (p_K − p_{K−1})% and (100 − p_K)% of the observations.]
Example (K = 3): If (p_k)_{k∈{1,...,K}} = (.25, .5, .75) and (q_k)_{k∈{1,...,K}} = (q_1, med, q_3), then we have:
[Diagram: q_1, med, q_3 cut the data into four segments of 25% each.]
Quantile specificity
Definition
Q(X, p) = (1 − g) X_(i) + g X_(i+1), where h = (N − 1)p + 1, i is the integer part of h, i.e. i = ⌊h⌋, and g its fractional part, i.e. g = h − i.
[Hyndman, 1996]
▶ used by default in R, Julia, NumPy, and Excel
▶ If g_k = 0, we have Q(X, p_k) = X_(i_k), a "deterministic" order statistic, where i_k denotes its index.
▶ If g_k > 0, we have Q(X, p_k) = (1 − g_k) X_(i_k) + g_k X_(i_k+1), a linear combination of the order statistics of indexes i_k and i_k + 1. One of the two needs to be "simulated" (e.g., the smallest one) and the other is then deterministic.
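A direct transcription of this definition, checked against NumPy's default quantile rule, which (as noted above) implements the same formula:

```python
import numpy as np

def Q(x, p):
    """Empirical quantile Q(X, p) = (1 - g) X_(i) + g X_(i+1) [Hyndman, 1996]."""
    xs = np.sort(x)
    h = (len(x) - 1) * p + 1       # 1-based position
    i = int(np.floor(h))           # integer part: index of lower order statistic
    g = h - i                      # fractional part
    return xs[i - 1] if g == 0 else (1 - g) * xs[i - 1] + g * xs[i]

x = np.random.default_rng(1).normal(size=11)
for p in (0.25, 0.5, 0.75):
    assert np.isclose(Q(x, p), np.quantile(x, p))   # matches NumPy's default
```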
Quantiles
Recall the joint density of an arbitrary number K of order statistics
f_(i_1,...,i_K)(x_1, ..., x_K) = N! ∏_{k=1}^K f(x_k) ∏_{k=0}^K (F(x_{k+1}) − F(x_k))^{i_{k+1}−i_k−1} / (i_{k+1} − i_k − 1)!
where f and F are the pdf and cdf of F_θ, resp., with the conventions x_0 = −∞, x_{K+1} = +∞, i_0 = 0 and i_{K+1} = N + 1.
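For completeness, the log of this joint density is easy to evaluate; a sketch assuming F_θ is scipy's standard normal (any frozen continuous distribution works):

```python
import numpy as np
from math import lgamma
from scipy import stats

def log_joint_order_stats(xs, idx, N, dist=stats.norm):
    """log f_(i_1,...,i_K)(x_1,...,x_K), with the conventions
    x_0 = -inf, x_{K+1} = +inf, i_0 = 0, i_{K+1} = N + 1."""
    x = np.concatenate(([-np.inf], xs, [np.inf]))
    i = np.concatenate(([0], idx, [N + 1]))
    logp = lgamma(N + 1) + dist.logpdf(xs).sum()     # N! and prod of f(x_k)
    for k in range(len(idx) + 1):                    # gaps between order stats
        gap = i[k + 1] - i[k] - 1
        logp += gap * np.log(dist.cdf(x[k + 1]) - dist.cdf(x[k])) - lgamma(gap + 1)
    return logp

# e.g. the three quartile order statistics of a N(0,1) sample of size N = 999
print(log_joint_order_stats([-0.67, 0.0, 0.67], [250, 500, 750], 999))
```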
Quantiles
Denote
K_D = {k ∈ {1, ..., K} | g_k = 0}
K_S = {k ∈ {1, ..., K} | g_k > 0}
We have {1, ..., K} = K_D ∪ K_S, with
X_(i_k) = q_k for all k ∈ K_D
X_(i_k+1) = (q_k − X_(i_k)(1 − g_k)) / g_k for all k ∈ K_S
We need to sample |K_S| order statistics to set the K + |K_S| order statistics of X that define the values of (Q(X, p_k))_{k∈{1,...,K}}.
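In code, the split into K_D and K_S and the reconstruction of the companion order statistic might look like this (hypothetical helper; x_sim holds the simulated values X_(i_k) for k ∈ K_S):

```python
import numpy as np

def split_and_reconstruct(N, ps, qs, x_sim):
    """For each quantile order p_k: either X_(i_k) = q_k (k in K_D, g_k = 0),
    or X_(i_k) = x_sim[k] is simulated and X_(i_k + 1) follows from q_k."""
    out = []
    for k, (p, q) in enumerate(zip(ps, qs)):
        h = (N - 1) * p + 1
        i = int(np.floor(h))
        g = h - i
        if g == 0:
            out.append(("K_D", i, q))                 # X_(i_k) = q_k
        else:
            upper = (q - (1 - g) * x_sim[k]) / g      # X_(i_k + 1)
            out.append(("K_S", i, x_sim[k], upper))
    return out

# Quartile orders are all deterministic for N = 1001, all simulated for N = 1000
print(split_and_reconstruct(1001, [0.25, 0.5, 0.75], [-0.67, 0.0, 0.67], [0, 0, 0]))
print(split_and_reconstruct(1000, [0.25, 0.5, 0.75], [-0.67, 0.0, 0.67],
                            [-0.7, -0.1, 0.6]))
```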
Quantiles
f_{(X_(i_k))_{k∈K_S} | (Q(X,p_1),...,Q(X,p_K))=(q_1,...,q_K)}(x_1, ..., x_{|K_S|})
∝ f_{(X_(i_k))_{k∈K_S}, (Q(X,p_1),...,Q(X,p_K))=(q_1,...,q_K)}(x_1, ..., x_{|K_S|}, q_1, ..., q_K)
∝ f_{(i)_{i∈I}}(φ^{−1}(x_1, ..., x_{|K_S|}, q_1, ..., q_K))    (1)
Median, IQR
The median is the 50% quantile of a distribution. For a sample X = (X_1, ..., X_N) of random variables, the median is defined as:
median(X) = X_(k+1) if N = 2k+1, and (X_(k) + X_(k+1))/2 if N = 2k,
where X_(i) denotes the ith order statistic of X.
The interquartile range is the difference between the 75% and 25% quantiles of a distribution. For a sample X = (X_1, ..., X_N) of i.i.d. random variables, the IQR is defined as:
IQR = Q(X, 0.75) − Q(X, 0.25)
where Q is the empirical quantile function defined above.
Median, IQR
Basic Idea:
[Diagram: Q_1, med, Q_3 cut the data into four segments of 25% each, with Q_3 − Q_1 = IQR.]
Median, IQR
If N = 4k+1 (the simplest case) we have:
Q_1 = X_(k+1),  m = X_(2k+1),  Q_3 = X_(3k+1),  i = Q_3 − Q_1
⟺ X_(k+1) = Q_1,  X_(2k+1) = m,  X_(3k+1) = Q_1 + i
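A quick numerical check of this correspondence, using the fact that NumPy's default quantile rule returns exact order statistics when N = 4k + 1:

```python
import numpy as np

k = 250                                        # N = 4k + 1 = 1001
x = np.sort(np.random.default_rng(2).normal(size=4 * k + 1))
Q1, m, Q3 = x[k], x[2 * k], x[3 * k]           # X_(k+1), X_(2k+1), X_(3k+1)
assert np.isclose(np.median(x), m)
assert np.isclose(np.quantile(x, 0.75) - np.quantile(x, 0.25), Q3 - Q1)
```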
Impact on Bayesian Model Choice
1 Introduction
2 Methodology for different statistics
  Median, MAD
  Quantiles
  Median, IQR
3 Impact on Bayesian Model Choice
4 Experiments
  Gaussian Case
  Cauchy Example
  3-Parameter Weibull Example
  Normal-Laplace Comparison
5 Conclusion
Bayesian Model Choice
Several models M_1, M_2, ... are considered simultaneously for a dataset y and the model index M is part of the inference.
Use of a prior distribution, π(M = m), plus a prior distribution on the parameter conditional on the value m of the model index, π_m(θ_m).
The goal is to derive the posterior distribution of M, a challenging computational target when models are complex.
Bayesian model choice
Comparison of models M_i by a Bayesian approach
Probabilises (!) the entire model/parameter space:
▶ allocate probabilities p_i to all models M_i
▶ define priors π_i(θ_i) for each parameter space Θ_i
▶ compute
π(M_i | x) = p_i ∫_{Θ_i} f_i(x | θ_i) π_i(θ_i) dθ_i / Σ_j p_j ∫_{Θ_j} f_j(x | θ_j) π_j(θ_j) dθ_j
[Wrinch & Jeffreys, 1919]
Bayesian model choice
Comparison of models M_i by a Bayesian approach
Relies on a central notion: the evidence
ev_k = ∫_{Θ_k} π_k(θ_k) L_k(θ_k) dθ_k,
aka the marginal likelihood.
[Wrinch & Jeffreys, 1919]
Generic ABC for model choice
for t = 1 to T do
    repeat
        Generate m from the prior π(M = m)
        Generate θ_m from the prior π_m(θ_m)
        Generate z from the model f_m(z | θ_m)
    until ρ{η(z), η(y)} < ε
    Set m^(t) = m and θ^(t) = θ_m
end
ABC-MC
[Toni, Welch, Strelkowa, Ipsen & Stumpf, 2009]
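A minimal Python sketch of this ABC-MC loop, instantiated for illustration on the Normal vs Laplace pair used later, with η(y) = (med(y), MAD(y)) and Euclidean ρ; the prior on θ_m and the tolerance ε are arbitrary choices here:

```python
import numpy as np
rng = np.random.default_rng(3)

def eta(z):                                      # summary statistic eta(z)
    med = np.median(z)
    return np.array([med, np.median(np.abs(z - med))])

def abc_model_choice(y, T=200, eps=0.1):
    """Accepted model indices estimate pi(M = m | eta(y))."""
    eta_y, N = eta(y), len(y)
    sample_z = [lambda th, n=N: rng.normal(th, 1.0, n),
                lambda th, n=N: rng.laplace(th, 1.0 / np.sqrt(2), n)]
    ms = []
    for _ in range(T):
        while True:
            m = int(rng.integers(2))             # m ~ pi(M = m), uniform here
            th = rng.normal(0.0, 10.0)           # th ~ pi_m(theta_m)
            z = sample_z[m](th)                  # z ~ f_m(z | theta_m)
            if np.linalg.norm(eta(z) - eta_y) < eps:
                break
        ms.append(m)
    return np.bincount(ms, minlength=2) / T      # ABC posterior of (M1, M2)

print(abc_model_choice(rng.normal(0.0, 1.0, 100)))
```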
Bridge Sampling
Approximation of Bayes factors (and other ratios of integrals)
c_1/c_2 = B_12 = ∫ π̃_1(θ | x) α(θ) π_2(θ | x) dθ / ∫ π̃_2(θ | x) α(θ) π_1(θ | x) dθ for any α(·)
≈ [ (1/n_2) Σ_{i=1}^{n_2} π̃_1(θ_{2i} | x) α(θ_{2i}) ] / [ (1/n_1) Σ_{i=1}^{n_1} π̃_2(θ_{1i} | x) α(θ_{1i}) ],    θ_{ji} ∼ π_j(θ | x)
[Bennett, 1976; Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
Optimal Bridge Sampling
Optimal choice
α⋆(θ) = (n_1 + n_2) / (n_1 π_1(θ | x) + n_2 π_2(θ | x))
leading to
B̂_12 ≈ [ (1/n_2) Σ_{i=1}^{n_2} π̃_1(θ_{2i} | x) / (n_1 π_1(θ_{2i} | x) + n_2 π_2(θ_{2i} | x)) ] / [ (1/n_1) Σ_{i=1}^{n_1} π̃_2(θ_{1i} | x) / (n_1 π_1(θ_{1i} | x) + n_2 π_2(θ_{1i} | x)) ]
(to be iterated).
Insufficient Bridge Sampling
Inclusion of the missing data into the bridge estimate
B̂_12 ≈ [ (1/n_2) Σ_{i=1}^{n_2} π̃_1(θ_{2i}, y_{2i}) / (n_1 π_1(θ_{2i}, y_{2i}) + n_2 π_2(θ_{2i}, y_{2i})) ] / [ (1/n_1) Σ_{i=1}^{n_1} π̃_2(θ_{1i}, y_{1i}) / (n_1 π_1(θ_{1i}, y_{1i}) + n_2 π_2(θ_{1i}, y_{1i})) ]
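The estimator, standard or insufficient, reduces to a few lines once the log density ratios are available; a sketch of the iterated optimal-bridge fixed point, where logr_k holds log(π̃_1/π̃_2) evaluated at draws from π_k — on θ alone for the standard version, or on the augmented (θ, y) for the insufficient one:

```python
import numpy as np

def bridge_B12(logr_1, logr_2, iters=100):
    """Iterated optimal bridge sampling estimate of B12 = c1/c2 given
    logr_k = log(pi1_tilde / pi2_tilde) at draws from pi_k."""
    n1, n2 = len(logr_1), len(logr_2)
    l1, l2 = np.exp(logr_1), np.exp(logr_2)
    B = 1.0
    for _ in range(iters):                        # fixed point in alpha*
        num = np.mean(l2 / (n1 * l2 / B + n2))    # ~ E_pi2[pi1_tilde * alpha]
        den = np.mean(1.0 / (n1 * l1 / B + n2))   # ~ E_pi1[pi2_tilde * alpha]
        B = num / den
    return B
```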
Results
1 Introduction
2 Methodology for different statistics
  Median, MAD
  Quantiles
  Median, IQR
3 Impact on Bayesian Model Choice
4 Experiments
  Gaussian Case
  Cauchy Example
  3-Parameter Weibull Example
  Normal-Laplace Comparison
5 Conclusion
Gaussian Case
Conjugate Normal-InverseGamma prior:
▶ Prior: μ, σ² ∼ Normal-InverseGamma(μ_0, ν, α, β)
▶ Posterior: μ, σ² | X ∼ Normal-InverseGamma(M, C, A, B) where
M = (ν μ_0 + N X̄)/(ν + N) and C = ν + N
A = α + N/2 and B = β + (1/2)( N S² + (N ν)/(ν + N) (X̄ − μ_0)² )
Robust statistics:
▶ μ ≈ med and σ ≈ MAD/Φ⁻¹(3/4) = IQR/(2 Φ⁻¹(3/4)) when N large enough
▶ efficiency: eff_med = 2/π ≈ 0.64 and eff_MAD = eff_IQR ≈ 0.36
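Step 2 of the Gibbs sampler is then exact: the hyperparameter update and the joint draw of (μ, σ²) transcribe directly (illustrative helper names):

```python
import numpy as np

def nig_posterior(x, mu0, nu, alpha, beta):
    """Normal-InverseGamma posterior hyperparameters given full data x;
    S2 is the biased empirical variance, matching the N*S^2 term above."""
    N, xbar = len(x), np.mean(x)
    S2 = np.var(x)                               # (1/N) sum (x_i - xbar)^2
    M = (nu * mu0 + N * xbar) / (nu + N)
    C = nu + N
    A = alpha + N / 2
    B = beta + 0.5 * (N * S2 + N * nu / (nu + N) * (xbar - mu0) ** 2)
    return M, C, A, B

def draw_mu_sigma2(M, C, A, B, rng=np.random.default_rng()):
    sigma2 = 1.0 / rng.gamma(A, 1.0 / B)         # sigma^2 ~ InvGamma(A, B)
    mu = rng.normal(M, np.sqrt(sigma2 / C))      # mu | sigma^2 ~ N(M, sigma^2/C)
    return mu, sigma2
```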
Gaussian Example
For comparison purposes, consider approximating the simulated posterior π̃(μ, σ² | med, MAD) by Normal-InverseGamma(M̃, C̃, Ã, B̃) where
M̃ = (ν μ_0 + N·eff_med·med)/(ν + N·eff_med),  C̃ = ν + N·eff_med,
Ã = α + (N·eff_MAD)/2,
B̃ = β + (1/2)( N·eff_MAD·(C·MAD)² + (N·eff_MAD·ν)/(ν + N·eff_MAD) (med − μ_0)² ),
with C = 1/Φ⁻¹(3/4) the consistency constant turning MAD into an estimate of σ.
Gaussian Example
Figure: μ and σ for a sample of size N = 1000, med = −5, MAD = 3, given T(X) = (med, MAD), with the Robust Gibbs in blue and the approximation in red.
Cauchy Example
▶ Density:
f_θ(x) = 1/(πγ) · 1/(1 + ((x − x_0)/γ)²)
so θ = (x_0, γ) where x_0 ∈ ℝ (location parameter) and γ > 0 (scale parameter)
▶ Heavy-tailed distribution: E_θ(X) is not defined and Var_θ(X) = ∞
▶ Convergent estimators: x_0 ≈ med and γ ≈ MAD when N → ∞
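Since the quartiles of a Cauchy(x_0, γ) distribution sit at x_0 ± γ, the MAD converges to γ exactly; a quick simulation check:

```python
import numpy as np
rng = np.random.default_rng(4)

x0, gamma, N = 0.0, 2.0, 100_000
x = x0 + gamma * rng.standard_cauchy(N)
med = np.median(x)
mad = np.median(np.abs(x - med))
print(med, mad)    # both close to (x0, gamma), despite undefined moments
```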
Cauchy Example
Figure: x_0 and γ for a sample of size N = 1000, med = 0, MAD = 1, given T(X) = (med, MAD) in blue and given X in orange.
3-Parameter Weibull Example
Used to model income data.
Density:
f(x) = (α/β) ((x − x_0)/β)^{α−1} exp(−((x − x_0)/β)^α)
where α > 0 (shape parameter), β > 0 (scale parameter) and x_0 ∈ ℝ (location parameter).
Quantiles method for different values of K.
Weibull Example
Figure: parameter posteriors for a sample of size N = 1000 given T(X) = ((q_k)_{k=1,...,K}, (p_k)_{k=1,...,K}), with the Robust Gibbs for K = 3 (in blue), 4 (in orange) and 9 (in green).
Normal-Laplace Comparison
Comparison of model M_1: y ∼ N(θ_1, 1) against model M_2: y ∼ L(θ_2, 1/√2), where the Laplace distribution has mean θ_2 and scale parameter 1/√2, resulting in a variance of 1.
Since the median and the first- and second-order empirical moments asymptotically match for these models, these summary statistics are useless for model selection.
Model selection is possible when using the MAD as summary statistic.
[Marin & al., 2013]
Normal-Laplace Comparison
Computation of the Bayes factor when
T(y) = (med(y), MAD(y))
by bridge sampling (and not an ABC approximation).
Figure: posterior probability p(1 | T_0) of model M_1 as the number of observations N increases, for Gaussian and Laplace simulated samples, resp. Boxplots based on 100 Monte Carlo runs with 10^4 bridge sampling simulations.
Conclusion
▶ Among the three robust statistics addressed in this paper, two exhibited similarities, yielding results comparable to those of existing methods.
▶ The case of the median absolute deviation (MAD) poses a novel challenge, for which our partial data augmentation ensures ergodicity.
Conclusion
Exact method for Bayesian model choice by extending bridge sampling estimators while relying solely on observed statistics, albeit limited by both the standard conditions of bridge sampling and the size of the simulated data.
While focussed on continuous univariate distributions, the extension to various observed statistics, discrete distributions and multivariate data is formally identical.
The elephant in the ocean
The method falls victim to the curse of dimensionality when the dimension or the size of the data grows too large.
▶ A novel intermediary step, similar to principal components, towards identifying the most discriminating subspace would be necessary for an extension.
▶ Since the best summary statistic for ABC model choice is the pair of evidences ev_k (k = 1, 2), simulating this pair directly sounds like our Holy Grail...
[Prangle & al., 2013]