Reminder - VAE - 2
VAE cost function
[Figure: VAE sampling and training processes; the decoder models the output as a Gaussian or a mixture of Gaussians.]
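As a reminder in standard VAE notation (the slide's figure presumably shows an equivalent form), the cost to minimize is the negative ELBO: a reconstruction term plus a KL regularization term pulling the approximate posterior toward the prior:

$$
\mathcal{L}_{\mathrm{VAE}}(\theta,\phi;x)
  = -\,\mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big]
  \;+\; D_{\mathrm{KL}}\!\big(q_\phi(z\mid x)\,\|\,p(z)\big)
$$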
Reminder - GAN - 1
Two networks in opposition:
- Generator
- Discriminator
Ideal solution:
- Generator ~ P(x|z)
- Discriminator = ½
Adding a supervision concept to an unsupervised task!
[Figure: the generator's outputs, together with real samples, provide data to train a classification network (the discriminator).]
Source: https://www.kdnuggets.com/2017/01/generative-adversarial-networks-hot-topic-machine-learning.html
Reminder - GAN - 2
GAN cost function
[Figure: GAN sampling and training processes; z is drawn from a uniform prior; training alternates between training the discriminator and training the generator.]
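For reference, the standard GAN cost function (Goodfellow et al., 2014) that this alternating training optimizes, with z drawn from the prior (uniform here):

$$
\min_G \max_D \; V(D,G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$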
Reminder - GAN - 3: GAN convergence problems!
- Vanishing gradient: when the discriminator becomes too good, the generator no longer receives a useful gradient and cannot train anymore.
- Mode collapse: the generator learns to reproduce only a few good examples instead of the whole data distribution.
- No convergence, due to the minimax nature of the problem.
[Figure: true data vs. generated data, illustrating mode collapse.]
Source: https://lilianweng.github.io/posts/2017-08-20-gan/
VAE vs GAN
VAE:
- Has more diversity
GAN:
- Generates high-quality data
- Hard to train
How to compare generative models?
VAE vs DPM
VAE: low-dimensional latent representation Z of the input, with a learned encoder and decoder.
DPM: fixed encoder (the forward noising process) and a high-dimensional representation of the input.
Beta scheduling - Cosine scheduling (IDDPM, 2021)
[Figure: with the linear schedule (β_t from 0.0001 to 0.02 over 1000 steps), images are strongly noised early and become almost pure noise well before the end; the cosine schedule degrades information more gradually.]
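A minimal sketch of the IDDPM cosine schedule (function names are ours; the formula and the clipping of β_t are from the paper):

```python
import numpy as np

def cosine_alpha_bar(T=1000, s=0.008):
    """IDDPM cosine schedule: alpha_bar(t) = f(t)/f(0),
    with f(t) = cos(((t/T + s)/(1 + s)) * pi/2)^2."""
    t = np.arange(T + 1)
    f = np.cos(((t / T + s) / (1 + s)) * np.pi / 2) ** 2
    return f / f[0]

def betas_from_alpha_bar(alpha_bar, max_beta=0.999):
    # beta_t = 1 - alpha_bar(t)/alpha_bar(t-1), clipped to avoid
    # singularities near t = T
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)

betas = betas_from_alpha_bar(cosine_alpha_bar())  # 1000 values of beta_t
```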
Variance learning (IDDPM, 2021)
Limitation: "(...) learning reverse process variances (...) leads to unstable training and poorer sample quality compared to fixed variance." (DDPM, 2020)
[Figure: comparison between the DDPM and IDDPM schedules.]
The early stages of diffusion are very important.
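IDDPM's fix: instead of predicting the variance directly, the network outputs an interpolation coefficient v between the two fixed extremes β_t and β̃_t used by DDPM:

$$
\Sigma_\theta(x_t, t) = \exp\!\big(v \log \beta_t + (1 - v) \log \tilde{\beta}_t\big),
\qquad
\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\,\beta_t
$$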
Diffusion & reverse process (DDIM, 2021)
[Figure: the forward diffusion gradually turns "Mr. Data" into "Mr. Gaussian"; the reverse process goes the other way.]
Limitation: "For example, it takes around 20 hours to sample 50k images of size 32 x 32 from a DDPM, but less than a minute to do so from a GAN on a Nvidia 2080 Ti GPU." (DDIM, 2021)
Extension of DDPM (DDIM, 2021)
Generalization to a larger class of inverse (non-Markovian) processes.
Important: same network and training as a DDPM.
Generation process (DDIM, 2021)
[Figure: DDPM vs. DDIM generation processes; from the noise prediction, the denoised estimate of x_0 can be computed!]
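A sketch of the DDIM update in DDPM's ᾱ notation: from the noise prediction ε_θ(x_t), one first computes the predicted x_0, then steps to x_{t-1}:

$$
\hat{x}_0 = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_\theta(x_t)}{\sqrt{\bar{\alpha}_t}},
\qquad
x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\hat{x}_0
  + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2}\;\epsilon_\theta(x_t)
  + \sigma_t\,\epsilon_t
$$

Setting σ_t = 0 gives the deterministic DDIM sampler; a particular non-zero choice of σ_t recovers the DDPM.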
Finding a better inverse process (DDIM, 2021) => DDIM
[Table: FID scores for different numbers of sampling steps.]
Noise interpolation (DDIM, 2021)
[Figure: interpolating between two initial noise vectors produces a smooth semantic interpolation between the corresponding generated images.]
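DDIM interpolates between two initial noise vectors with spherical linear interpolation (slerp); a minimal sketch:

```python
import numpy as np

def slerp(z1, z2, alpha):
    """Spherical linear interpolation between noise vectors z1 and z2."""
    theta = np.arccos(
        np.dot(z1.ravel(), z2.ravel())
        / (np.linalg.norm(z1) * np.linalg.norm(z2))
    )
    return (np.sin((1 - alpha) * theta) * z1
            + np.sin(alpha * theta) * z2) / np.sin(theta)

# Each interpolated noise can then be decoded into an image
# by the deterministic DDIM sampler.
z1, z2 = np.random.randn(32 * 32), np.random.randn(32 * 32)
z_mid = slerp(z1, z2, 0.5)
```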
Question break #4
Latent diffusion: Concept of latent space (reminder)
[Figure: latent space of the MNIST database for an AE and a VAE.]
Source: https://thilospinner.com/towards-an-interpretable-latent-space/
●Similar objects are close to one another in the latent space
●Usually of lower dimension than the original data (so it also performs compression)
●Usually impossible for a human to visualize
Latent diffusion: Concept of latent space (example)
Example: word embeddings, which turn sparse data (for instance, words) into dense vectors.
Actor - Pierre Curie + Marie Curie ≈ Actress
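A sketch of this vector arithmetic with gensim; the pretrained model name is an assumption, and we use single-token words (the slide's Pierre/Marie Curie example would need multi-word embeddings):

```python
import gensim.downloader as api

# Load a pretrained word-embedding model (assumed available via gensim)
vectors = api.load("glove-wiki-gigaword-100")

# Analogy in vector space: actor - man + woman ≈ actress
print(vectors.most_similar(positive=["actor", "woman"],
                           negative=["man"], topn=3))
```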
DPM
[Figure: a plain DPM adds noise and applies diffusion (denoising) directly in image space.]
Latent diffusion model
[Figure: the encoder maps the image space to a latent space; noising and diffusion happen in the latent space; the decoder maps the result back to the image space.]
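A minimal sketch of sampling from a latent diffusion model, assuming a pretrained VAE `decoder` and a noise-prediction network `eps_model` trained in the latent space (both hypothetical names), with a fixed DDPM beta schedule:

```python
import torch

@torch.no_grad()
def sample_ldm(decoder, eps_model, betas, shape):
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    z = torch.randn(shape)                       # pure noise in latent space
    for t in reversed(range(len(betas))):
        eps = eps_model(z, t)                    # predict the added noise
        mean = (z - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps) \
               / torch.sqrt(alphas[t])
        noise = torch.randn_like(z) if t > 0 else torch.zeros_like(z)
        z = mean + torch.sqrt(betas[t]) * noise  # one DDPM reverse step
    return decoder(z)                            # map back to image space
```

The entire diffusion loop runs in the (much smaller) latent space; only the final decode touches image space, which is what makes LDMs cheap.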
Conditional diffusion
"A picture of GENCI's supercomputer Jean Zay. On the storage bays, a picture of the eponymous minister, with a background representing a simulation of a turbulent flow of liquid sodium, and a quote from Jean Zay's memoirs. Alongside the bays, the cooling equipment with the logo of the manufacturer and the owner of the supercomputer."
How to control the output of a diffusion model and make sure it generates what we want?
Conditional diffusion: text → image
[Figure: the text prompt above is embedded and injected into the latent space to condition the denoising process.]
Conditional diffusion: text → image
[Figure: images generated from the prompt.]
Conditional diffusion: cross-attention
Stable Diffusion uses cross-attention to make the denoising process consistent with the provided sentence embedding.
Source: Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
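A minimal single-head sketch of this cross-attention: queries come from the latent image features, keys and values from the text embedding. Names and shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def cross_attention(x, ctx, W_q, W_k, W_v):
    # x:   (batch, n_pixels, d_model) latent image features
    # ctx: (batch, n_tokens, d_ctx)   sentence-embedding tokens
    # W_q: (d_model, d_k); W_k, W_v: (d_ctx, d_k)
    q = x @ W_q                       # (batch, n_pixels, d_k)
    k = ctx @ W_k                     # (batch, n_tokens, d_k)
    v = ctx @ W_v                     # (batch, n_tokens, d_k)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    attn = F.softmax(scores, dim=-1)  # each pixel attends to every token
    return attn @ v                   # text-conditioned latent features
```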
Conditional diffusion: other method
[Figure: denoising U-Net combining spatial self-attention blocks, dense layers, and standard U-Net layers (conv, maxpool, upsample, ...).]
Other tasks
Diffusion models can solve a variety of tasks. We already know about image generation, as well as conditional image generation (for instance, with a short paragraph describing the picture).
Other tasks:
➔Inpainting
➔Outpainting
➔Super-resolution
Source: Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
Inpainting through masking
We can solve many of these tasks through the use of a mask:
➔Thin mask
➔Wide mask
➔Right-side mask for halving the image
➔Outer mask for expanding the image
➔Every second row of pixels for alternating lines
➔Every second pixel in both directions for super-resolution
Source: Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
Inpainting through masking
Step t:
Source: Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
Inpainting through masking: step t
[Figure: the known part of x_0 is re-noised to step t-1, while x_t goes through one diffusion (denoising) step; the two are combined into x_{t-1}.]
New artifacts may be added (in this coarse example, our diffusion model drew a sun), so we force the known background again!
Inpainting through masking: step t
[Figure: the same combination of the noised known region of x_0 and the diffused region of x_t into x_{t-1}.]
However, this operation does not take the generated information into account.
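Formally, with a binary mask m (1 on the known region), each RePaint step combines a forward-noised known image with a reverse-diffused unknown part:

$$
x_{t-1} = m \odot x_{t-1}^{\mathrm{known}} + (1 - m) \odot x_{t-1}^{\mathrm{unknown}},
\quad
x_{t-1}^{\mathrm{known}} \sim q(x_{t-1} \mid x_0),
\quad
x_{t-1}^{\mathrm{unknown}} \sim p_\theta(x_{t-1} \mid x_t)
$$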
Inpainting deharmonization
Picture deharmonization: the generated image has a satisfying texture but is semantically wrong. The suggested solution is to resample.
Source: Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
Inpainting resampling
[Figure: x_0 is noised and multiplied by the mask; x_t goes through one diffusion step and is multiplied by (1 - mask); the sum gives x_{t-1}, which is then re-noised back to step t (resampling).]
This loop is performed several times (a hyperparameter) before moving on to the next step.
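A minimal sketch of this resampling loop at one timestep t, assuming hypothetical helpers `q_sample(x0, t)` (forward noising to step t) and `p_step(model, x, t)` (one reverse denoising step, t → t-1):

```python
import torch

def repaint_step(model, x_t, x0_known, mask, t, betas, n_resample=10):
    for i in range(n_resample):
        x_known = q_sample(x0_known, t - 1)    # noise the known image to t-1
        x_unknown = p_step(model, x_t, t)      # denoise the current sample
        x_tm1 = mask * x_known + (1 - mask) * x_unknown
        if i < n_resample - 1:
            # jump back: one forward noising step from t-1 to t, so known
            # and generated content can harmonize on the next pass
            x_t = torch.sqrt(1 - betas[t]) * x_tm1 \
                  + torch.sqrt(betas[t]) * torch.randn_like(x_tm1)
    return x_tm1
```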
Inpainting resampling
n is the number of times the resampling loop is performed.
Disadvantage: the number of required denoising steps is much higher.
Advantage: it produces much more satisfying results.
Source: Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
Sources
Papers:
-Deep Unsupervised Learning using Nonequilibrium Thermodynamics (DPM) (https://arxiv.org/abs/1503.03585)
-Denoising Diffusion Probabilistic Models (DDPM) (https://arxiv.org/abs/2006.11239)
-Improved Denoising Diffusion Probabilistic Models (IDDPM) (https://arxiv.org/abs/2102.09672)
-Denoising Diffusion Implicit Models (DDIM) (https://arxiv.org/abs/2010.02502)
-Diffusion Models Beat GANs on Image Synthesis (https://arxiv.org/abs/2105.05233)
-High-Resolution Image Synthesis with Latent Diffusion Models (LDM) (https://arxiv.org/abs/2112.10752)
-Repaint: Inpainting using denoising diffusion probabilistic models (https://arxiv.org/pdf/2201.09865)
-Diffusion Models in Vision: A Survey (https://arxiv.org/abs/2209.04747)
-Diffusion Models: A Comprehensive Survey of Methods and Applications (https://arxiv.org/abs/2209.00796)
Other resources:
-Lilian Weng’s article (https://lilianweng.github.io/posts/2021-07-11-diffusion-models)
-Yang Song’s article (https://yang-song.net/blog/2021/score)
-Outlier video (https://www.youtube.com/watch?v=HoKDTa5jHvg)
To be continued...
Next on Fidle:
Sequence 15: AI, law, society and ethics
Thursday, March 23
https://fidle.cnrs.fr
Contact@fidle.cnrs.fr
https://fidle.cnrs.fr/youtube
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
https://creativecommons.org/licenses/by-nc-nd/4.0/
Thank you!