Introduction to Diffusion Models in Deep Learning

angelo119154 148 views 80 slides Sep 29, 2024


1

2

3

4
Program
JDLS
12 May

5
Introduction
➔ Reminder on generative models
➔ Diffusion models vs VAE
Denoising Diffusion Probabilistic Models
➔ Principle
➔ Forward and reverse diffusion
➔ Training and sampling
Example: Fashion MNIST
➔ Generation of Fashion MNIST
DDPM improvements
➔ Beta scheduling and variance learning
➔ Fast sampling
➔ Latent diffusion
DDPM applications
➔ Text-to-image
➔ Other tasks: inpainting / outpainting / super-resolution

Reminder - VAE - 1
Source: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
6

7
Reminder - VAE - 2
VAE cost function
Diagram labels: sampling process, training process, metric, decoder (Gaussian or mixture of Gaussians)
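The slide shows the cost function only as an image. Here is a minimal sketch under stated assumptions: a diagonal Gaussian encoder, a standard normal prior on z, and a fixed-variance Gaussian decoder, so the reconstruction term reduces to a squared error (up to a weight and constants). Under these assumptions the VAE loss is the negative ELBO, which is the reconstruction error plus the closed-form KL divergence. The function names are illustrative.

```python
import math

def kl_diag_gaussian(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), closed form, summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def vae_loss(x, x_rec, mu, log_var):
    """Negative ELBO up to constants, assuming a fixed-variance Gaussian decoder."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    return reconstruction + kl_diag_gaussian(mu, log_var)

# Perfect reconstruction and a posterior equal to the prior give a zero loss.
print(vae_loss([0.2, 0.8], [0.2, 0.8], mu=[0.0, 0.0], log_var=[0.0, 0.0]))  # 0.0
```

The KL term pulls the encoder distribution toward the prior, and the reconstruction term keeps the latent code informative.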

Reminder - GAN - 1
Two networks in opposition:
-Generator
-Discriminator
Ideal solution:
-Generator ~ P(x|z)
-Discriminator = ½

Adding the concept of supervision to an unsupervised task!
8
Figure: data used to train a classification network
Source: https://www.kdnuggets.com/2017/01/generative-adversarial-networks-hot-topic-machine-learning.html

9
Reminder - GAN - 2
GAN cost function
Diagram labels: sampling process (generator fed with noise from a uniform distribution); training process (train the discriminator, train the generator)
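The GAN cost function is the min-max value V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]. The discriminator maximizes it and the generator minimizes it. The sketch below uses illustrative names and estimates V on a mini-batch of discriminator outputs. It also checks the ideal solution from the previous slide: when the discriminator outputs ½ everywhere, V = -log 4.

```python
import math

def gan_value(d_real, d_fake):
    """V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], estimated on a mini-batch.

    d_real: discriminator outputs on real samples x
    d_fake: discriminator outputs on generated samples G(z), with z ~ Uniform
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# At the ideal solution the discriminator cannot tell real from fake: D = 1/2 everywhere.
print(gan_value([0.5, 0.5], [0.5, 0.5]), -math.log(4))
```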

Reminder - GAN - 3: GAN convergence problems!
-Vanishing gradient: the discriminator becomes too perfect, so the generator can no longer train
-Mode collapse: the generator learns only a few good examples instead of the whole data distribution
-No convergence, due to the min-max nature of the problem
10
Figure legends: true data vs generated data
Source: https://lilianweng.github.io/posts/2017-08-20-gan/

11
VAE vs GAN
VAE
GAN
-Generates high-quality data
-Hard to train
-Has more diversity
How to compare generative models?

VAE vs DPM
12
VAE: low-dimensional representation of the input (latent Z), mapped back to the data by a decoder.
DPM: fixed encoder; high-dimensional representation of the input (the noisy images have the same dimension as the input).

DPM - Landscape
13
Source: https://github.com/bentoml/stable-diffusion-bentoml
Dhariwal & Nichol, 2021
Source: DALL-E 2

DDPM - Principle - 1
14
Once trained, the diffusion model generates images from Gaussian noise:

DDPM - Principle - 2
15
There are three processes that characterize Diffusion Models:

1. Forward Diffusion Process
2. Reverse Diffusion Process
3. Sampling Process

DDPM - Principle - 3
16
Forward Diffusion Process
This process gradually adds noise to any image.

0 ≤ t ≤ T; T is a hyperparameter

DDPM - Principle - 4
17
Forward Diffusion Process
Examples of images at different times t

Here we choose T = 1000, but other values can be used (T is a hyperparameter).
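A single forward step samples $x_t \sim q(x_t \mid x_{t-1}) = \mathcal{N}(\sqrt{1-\beta_t}\,x_{t-1}, \beta_t I)$. The sketch below assumes the linear β schedule from the DDPM paper (β from 0.0001 to 0.02 with T = 1000) and represents an image as a flat list of pixel values. After T steps, the result is approximately standard Gaussian noise.

```python
import math
import random

T = 1000  # number of diffusion steps (hyperparameter)
# Linear schedule of the DDPM paper: beta_1 = 1e-4 ... beta_T = 0.02
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]

def forward_step(x_prev, t, rng=random):
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I), for t in 1..T."""
    beta = betas[t - 1]
    return [math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0) for v in x_prev]

# Noise a toy "image" (a flat list of pixel values) one step at a time.
rng = random.Random(0)
x = [1.0, -1.0, 0.5, 0.0]
for t in range(1, T + 1):
    x = forward_step(x, t, rng)
```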

18
DDPM - Principle - 5
Reverse Diffusion Process
We train a model to predict $x_{t-1}$ from $x_t$:
-$x_{t-1}$ is a bit less noisy than $x_t$
-$x_t$ is a bit more noisy than $x_{t-1}$
-$x_0$ is any image from the dataset

19
DDPM - Principle - 6
Reverse Diffusion Process
The same model must predict $x_{t-1}$ from $x_t$ for every t.

20
Sampling Process
From random noise, we can generate an image.

DDPM - Principle - 7

DDPM - Forward Diffusion - 1
21

DDPM - Forward Diffusion - 2
22

DDPM - Forward Diffusion - 3
23

DDPM - Forward Diffusion - 4
24

DDPM - Forward Diffusion - 5
25

DDPM - Forward Diffusion - 6
26
So we can sample a noisy image at any time step t directly from the original image $x_0$.
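With $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$, the closed form is $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. Here is a minimal sketch, again assuming the DDPM linear schedule:

```python
import math
import random

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
# alpha_bar[t-1] = prod_{s=1..t} (1 - beta_s)
alpha_bar = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bar.append(prod)

def q_sample(x0, t, eps):
    """x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps, with eps ~ N(0, I)."""
    ab = alpha_bar[t - 1]
    return [math.sqrt(ab) * a + math.sqrt(1.0 - ab) * e for a, e in zip(x0, eps)]

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x50 = q_sample(x0, 50, eps)  # a single jump to t = 50, no loop over earlier steps
```

With this schedule, $\bar\alpha_T$ is close to 0, so $x_T$ is essentially pure noise.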

Question break #1
27

DDPM - Reverse Diffusion - 1
28
We only predict the mean; the variance is fixed and known.
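Here is the reverse step in the notation of the DDPM paper (the slide's own equations are images). It is a Gaussian whose mean is learned and whose variance is fixed. The paper reports similar results for the two choices of $\sigma_t^2$:

```latex
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\right),
\qquad
\sigma_t^2 = \beta_t \quad \text{or} \quad \sigma_t^2 = \tilde\beta_t = \frac{1 - \bar\alpha_{t-1}}{1 - \bar\alpha_t}\,\beta_t
```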

DDPM - Reverse Diffusion - 2
29
We can predict $x_{t-1}$ by predicting the noise $z_t$.
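The DDPM paper writes ε where these slides write z. It parameterizes the mean through the predicted noise: $\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right)$. The sketch below assumes the linear schedule as before. It shows that when the predicted noise equals the true noise, this mean coincides with the mean of the true posterior $q(x_{t-1} \mid x_t, x_0)$.

```python
import math

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bar, prod = [], 1.0
for a in alphas:
    prod *= a
    alpha_bar.append(prod)

def mean_from_noise(x_t, t, eps_pred):
    """mu_theta(x_t, t) = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_pred) / sqrt(alpha_t)."""
    b, a, ab = betas[t - 1], alphas[t - 1], alpha_bar[t - 1]
    return [(x - b / math.sqrt(1.0 - ab) * e) / math.sqrt(a) for x, e in zip(x_t, eps_pred)]

def posterior_mean(x_t, x0, t):
    """Mean of the true posterior q(x_{t-1} | x_t, x_0), valid for t >= 2."""
    b, a, ab, ab_prev = betas[t - 1], alphas[t - 1], alpha_bar[t - 1], alpha_bar[t - 2]
    c0 = math.sqrt(ab_prev) * b / (1.0 - ab)
    ct = math.sqrt(a) * (1.0 - ab_prev) / (1.0 - ab)
    return [c0 * u + ct * v for u, v in zip(x0, x_t)]
```

This is why predicting the noise is enough: the rest of the formula is known from the schedule.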

DDPM - Reverse Diffusion - 3
30
A little bit of explanation (a tiny bit):
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/

DDPM - Reverse Diffusion - 4
31

DDPM - Reverse Diffusion - 5
32

Question break #2
33

DDPM - Training - 0
34
Source: https://arxiv.org/abs/2006.11239

35
DDPM - Training - 1
Draw an image $x_0$ from the dataset.

36
DDPM - Training - 3
Draw a time step t from a uniform distribution between 1 and T (here t = 50).

37
DDPM - Training - 4
Draw noise $z_t$ (= $\epsilon$) from a Gaussian distribution, with the same shape as $x_0$ (here t = 50).

38
DDPM - Training - 5
Diagram: $x_0$, t = 50, $z_t$

39
DDPM - Training - 5
Compute $x_t$ from $x_0$ and $z_t$, then feed $x_t$ and t = 50 to the model, which predicts $z_\theta$ (= $\epsilon_\theta$).

40
DDPM - Training - 6
Loss: $\| z_t - z_\theta(x_t, t) \|^2$ (here t = 50)

41
DDPM - Training - 7
Backpropagate the loss (backward pass) and update the weights $\theta$.

42
DDPM - Training - 8
And repeat!

That was just one iteration.
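The training loop (Algorithm 1 of the DDPM paper) can be sketched as below. Every assumption here is hypothetical: a toy dataset of three 4-pixel "images", the linear schedule, and a mean-over-pixels version of the squared error. The model is a single-weight $\epsilon_\theta(x_t, t) = w\,x_t$ that ignores t, with its gradient written by hand. A real DDPM uses a U-Net and automatic differentiation.

```python
import math
import random

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alpha_bar, prod = [], 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bar.append(prod)

# Hypothetical toy dataset: tiny 4-pixel "images" with values in [-1, 1].
dataset = [[1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0], [1.0, 1.0, -1.0, -1.0]]

# Hypothetical toy model: eps_theta(x_t, t) = w * x_t (a single weight, ignores t).
w = 0.0
lr = 0.01
rng = random.Random(0)

def training_step(w):
    x0 = rng.choice(dataset)                                # x_0 from the dataset
    t = rng.randint(1, T)                                   # t ~ Uniform{1, ..., T}
    eps = [rng.gauss(0.0, 1.0) for _ in x0]                 # eps ~ N(0, I), same shape as x_0
    ab = alpha_bar[t - 1]
    x_t = [math.sqrt(ab) * a + math.sqrt(1.0 - ab) * e for a, e in zip(x0, eps)]
    pred = [w * v for v in x_t]                             # eps_theta(x_t, t)
    loss = sum((e - p) ** 2 for e, p in zip(eps, pred)) / len(x0)             # squared error
    grad = sum(-2.0 * (e - p) * v for e, p, v in zip(eps, pred, x_t)) / len(x0)  # d loss / d w
    return w - lr * grad, loss                              # gradient-descent weight update

for _ in range(5000):  # ... and repeat!
    w, loss = training_step(w)
```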

43
DDPM - Sampling - 0
Source: https://arxiv.org/abs/2006.11239

44
DDPM - Sampling - 1
Draw $x_T$ from a Gaussian distribution, with the same shape as the training images.

45
DDPM - Sampling - 2
Feed $x_T$ and T to the model: $z_\theta(x_T, T) \approx z_T$

Reminder:

46
DDPM - Sampling - 3
Draw noise z from a Gaussian distribution, with the same shape as the training images.

47
DDPM - Sampling - 4

48
DDPM - Sampling - 5
And repeat!

Do not draw a new $x_T$: replace $x_T$ with $x_{T-1}$ and T with T-1, and repeat until T steps have been performed.

Question break #3
49

DDPM
50
DDPM vs VAE vs GAN
VAE
GAN
-Generates high-quality data
-Hard to train
-Has more diversity
-Long sampling process

51
DDPM improvements
-Improving the log-likelihood (Improved DDPM, 2021)
-Improving image synthesis (Dhariwal & Nichol, 2021)
-Faster sampling (Denoising Diffusion Implicit Models, 2021; Latent Diffusion Models, 2021)

Beta scheduling - Cosine scheduling (IDDPM, 2021)
52
Figure: linear schedule (β from 0.0001 to 0.02 over T = 1000 steps); annotations: “Strong noising”, “Almost pure noise”
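The cosine schedule of Improved DDPM defines $\bar\alpha_t = f(t)/f(0)$ with $f(t) = \cos^2\!\left(\frac{t/T + s}{1 + s}\cdot\frac{\pi}{2}\right)$ and s = 0.008. It then derives $\beta_t = 1 - \bar\alpha_t/\bar\alpha_{t-1}$, clipped to at most 0.999. Here is a minimal sketch:

```python
import math

def cosine_betas(T, s=0.008, max_beta=0.999):
    """Cosine schedule from Improved DDPM: alpha_bar(t) = f(t) / f(0)."""
    def f(t):
        return math.cos(((t / T) + s) / (1.0 + s) * math.pi / 2) ** 2
    betas = []
    for t in range(1, T + 1):
        alpha_bar_t = f(t) / f(0)
        alpha_bar_prev = f(t - 1) / f(0)
        # beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped to avoid singularities near t = T
        betas.append(min(1.0 - alpha_bar_t / alpha_bar_prev, max_beta))
    return betas

betas = cosine_betas(1000)
```

Compared with the linear schedule, $\bar\alpha_t$ changes slowly near t = 0 and t = T. As a result, fewer steps are spent on images that are already almost pure noise.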

Variance learning (IDDPM, 2021)
53
Limitations: “(...) learning reverse process variances (...) leads to unstable training and poorer sample quality compared to fixed variance.” (DDPM, 2020)
Figure: DDPM vs IDDPM
The early stages of diffusion are very important.
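Improved DDPM learns the variance by having the network predict an interpolation coefficient v for each dimension. This coefficient interpolates in log space between the two fixed DDPM choices, $\beta_t$ and $\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\beta_t$: $\Sigma_\theta = \exp(v \log\beta_t + (1-v)\log\tilde\beta_t)$. The sketch below assumes the linear schedule. It also hints at why the early steps matter: the two bounds differ a lot for small t and almost coincide later.

```python
import math

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alpha_bar, prod = [], 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bar.append(prod)

def beta_tilde(t, betas, alpha_bar):
    """Posterior variance (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t, valid for t >= 2."""
    return (1.0 - alpha_bar[t - 2]) / (1.0 - alpha_bar[t - 1]) * betas[t - 1]

def learned_variance(v, beta_t, beta_tilde_t):
    """exp(v log beta_t + (1 - v) log beta_tilde_t): v = 1 gives beta_t, v = 0 gives beta_tilde_t."""
    return math.exp(v * math.log(beta_t) + (1.0 - v) * math.log(beta_tilde_t))

# Ratio beta_tilde_t / beta_t: far from 1 at the start of the chain, close to 1 later.
for t in (2, 10, 100, 500):
    print(t, beta_tilde(t, betas, alpha_bar) / betas[t - 1])
```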

54
Diffusion & reverse process (DDIM, 2021)
Figure labels: “Mr. Gaussian”, “Mr. Data”
Limitations: “For example, it takes around 20 hours to sample 50k images of size 32 x 32 from a DDPM, but less than a minute to do so from a GAN on a Nvidia 2080 Ti GPU.” (DDIM, 2021)

55
Extension of DDPM (DDIM, 2021)
Generalization to a larger class of inverse processes (non-Markovian)
Important: same network and training as a DDPM

56
Generation process (DDIM, 2021)
DDPM vs DDIM
... can be computed!
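With η = 0, DDIM sampling is deterministic and can skip steps. From $x_t$ it predicts $\hat x_0 = (x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t,t))/\sqrt{\bar\alpha_t}$. It then jumps to an earlier time τ with $x_\tau = \sqrt{\bar\alpha_\tau}\,\hat x_0 + \sqrt{1-\bar\alpha_\tau}\,\epsilon_\theta(x_t,t)$. Here is a minimal sketch, assuming the linear schedule and an evenly spaced sub-sequence of 50 steps (the spacing is a choice, not a requirement):

```python
import math

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alpha_bar, prod = [], 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bar.append(prod)

def ddim_sample(x_T, eps_model, num_steps=50):
    """Deterministic DDIM sampling (eta = 0) on a sub-sequence of the T training steps."""
    stride = T // num_steps
    timesteps = list(range(T, 0, -stride))                     # 1000, 980, ..., 20
    x = list(x_T)
    for i, t in enumerate(timesteps):
        t_prev = timesteps[i + 1] if i + 1 < len(timesteps) else 0
        ab = alpha_bar[t - 1]
        ab_prev = alpha_bar[t_prev - 1] if t_prev > 0 else 1.0  # alpha_bar_0 = 1 (clean image)
        eps = eps_model(x, t)
        # Predicted clean image from x_t and the predicted noise
        x0_pred = [(v - math.sqrt(1.0 - ab) * e) / math.sqrt(ab) for v, e in zip(x, eps)]
        # Jump directly to t_prev, reusing the predicted noise direction (no fresh noise)
        x = [math.sqrt(ab_prev) * p + math.sqrt(1.0 - ab_prev) * e for p, e in zip(x0_pred, eps)]
    return x
```

This uses 50 network evaluations instead of 1000, with the same trained network as a DDPM.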

57
Finding a better inverse process (DDIM, 2021)
=> DDIM
FID

58
Noise interpolation (DDIM, 2021)
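The DDIM paper interpolates between two initial noises $x_T$ using spherical linear interpolation (slerp) rather than a straight line. Deterministic DDIM maps each $x_T$ to a single image, so this produces a smooth interpolation between images. Here is a minimal sketch on flat lists:

```python
import math

def slerp(z0, z1, alpha):
    """Spherical linear interpolation between two noise vectors, alpha in [0, 1]."""
    dot = sum(a * b for a, b in zip(z0, z1))
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    theta = math.acos(max(-1.0, min(1.0, dot / (norm0 * norm1))))
    if math.sin(theta) < 1e-8:
        # (Almost) parallel vectors: fall back to linear interpolation
        return [(1.0 - alpha) * a + alpha * b for a, b in zip(z0, z1)]
    w0 = math.sin((1.0 - alpha) * theta) / math.sin(theta)
    w1 = math.sin(alpha * theta) / math.sin(theta)
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]
```

Slerp keeps the interpolated vector at a norm typical of Gaussian noise. By contrast, the midpoint of a straight line between two nearly orthogonal noise vectors is noticeably shorter.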

Question break #4
59

Latent diffusion: concept of latent space (reminder)
60
Latent space of the MNIST database for AE and VAE
Source: https://thilospinner.com/towards-an-interpretable-latent-space/
●Similar objects are close to one another in the latent space
●Usually lower-dimensional than the original data (so it also performs compression)
●Usually impossible for a human to visualize

Latent diffusion: concept of latent space (example)
61
Example: word embedding
Actor - Pierre Curie + Marie Curie ≈ Actress
Turns sparse data (for instance, words) into dense vectors
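The analogy can be reproduced with a nearest-neighbor search by cosine similarity. The vectors below are hand-made toy embeddings (hypothetical; real word embeddings are learned and have hundreds of dimensions), with one dimension each for “female”, “acting” and “physics”:

```python
import math

# Hypothetical hand-made embeddings: dimensions = [female, acting, physics]
embeddings = {
    "actor":        [0.0, 1.0, 0.0],
    "actress":      [1.0, 1.0, 0.0],
    "pierre_curie": [0.0, 0.0, 1.0],
    "marie_curie":  [1.0, 0.0, 1.0],
    "physicist":    [0.5, 0.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Return the word closest to a - b + c, excluding the three query words."""
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

print(analogy("actor", "pierre_curie", "marie_curie"))  # actress
```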

DPM
62
+noise
+diffusion

Latent diffusion model
63
Latent space
Encoder
Decoder
Image space
+noise +diffusion
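Working in latent space makes every denoising step cheaper, because the network processes far fewer values. Here is a back-of-the-envelope sketch, assuming the configuration commonly associated with Stable Diffusion v1: 512×512 RGB images and an autoencoder with downsampling factor 8 and 4 latent channels. The helper name is illustrative.

```python
def num_values(height, width, channels):
    return height * width * channels

pixels = num_values(512, 512, 3)                # image space
f, latent_channels = 8, 4                        # assumed autoencoder settings
latents = num_values(512 // f, 512 // f, latent_channels)
print(pixels, latents, pixels / latents)         # 786432 16384 48.0
```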

Conditional diffusion
A picture of GENCI’s supercomputer Jean Zay. On the storage bays, a picture
of the eponymous minister, with a background representing a simulation of a
turbulent flow of liquid sodium, and a quote from Jean Zay’s memoirs.
Alongside the bays, the cooling equipment with the logo of the manufacturer
and the owner of the supercomputer.
How can we control the output of a diffusion model and make sure it generates what we want?
64

Conditional diffusion: text → image
A picture of GENCI’s supercomputer Jean Zay. On
the storage bays, a picture of the eponymous
minister, with a background representing a
simulation of a turbulent flow of liquid sodium, and
a quote from Jean Zay’s memoirs. Alongside the
bays, the cooling equipment with the logo of the
manufacturer and the owner of the
supercomputer.
Latent space
65

Conditional diffusion: text → image
66

Conditional diffusion: cross-attention
Stable Diffusion uses cross-attention to make the denoising process consistent with the provided sentence embedding.
67
Source: Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
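In the latent diffusion paper, the queries come from the U-Net's intermediate (image) features, and the keys and values come from the encoded text. Attention is computed as $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^T/\sqrt{d})\,V$. Here is a minimal sketch on nested lists, assuming the learned projections $W_Q$, $W_K$, $W_V$ have already been applied:

```python
import math

def matmul(a, b):
    """Plain matrix product of nested lists: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)                                  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V.
    queries: one row per spatial position of the U-Net feature map (image side)
    keys, values: one row per token of the text embedding (conditioning side)
    """
    d = len(queries[0])
    keys_t = [list(col) for col in zip(*keys)]
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(queries, keys_t)]
    weights = [softmax(row) for row in scores]    # each position attends over the text tokens
    return matmul(weights, values)
```

Each output row is a mixture of the text-token values, weighted by how well that image position matches each token. This is how the prompt steers the denoising.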

Conditional diffusion: other method
Diagram blocks: spatial self-attention; dense layer; standard U-Net layer (conv, maxpool, upsample, …)
68

Other tasks
Diffusion models can solve a variety of tasks. We already know about image generation, as well as conditional image generation (for instance, from a short paragraph describing the picture).

Other tasks:
➔ Inpainting
➔ Super-resolution
➔ Outpainting
69
Source: Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

Inpainting through masking
We can solve many of these tasks through the use of a mask:
➔ Thin mask
➔ Wide mask
➔ Right-side mask, for halving the image
➔ Outer mask, for expanding the image
➔ Every second row of pixels, for alternating lines
➔ Every second pixel in both directions, for super-resolution
70
Source: Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
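The structured masks above are easy to build, using the RePaint convention: mask = 1 for known pixels and 0 for pixels to generate. Thin and wide masks are irregular strokes and are omitted here. The helper below is hypothetical, for illustration only:

```python
def make_mask(kind, height, width):
    """Return a height x width mask: 1 = known pixel (kept), 0 = pixel to generate."""
    mask = [[1] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            if kind == "right_half" and j >= width // 2:
                mask[i][j] = 0          # halve the image: generate the right side
            elif kind == "alternate_rows" and i % 2 == 1:
                mask[i][j] = 0          # every second row is missing
            elif kind == "super_resolution" and (i % 2 == 1 or j % 2 == 1):
                mask[i][j] = 0          # only every second pixel in both directions is known
            elif kind == "outer" and not (height // 4 <= i < 3 * height // 4
                                          and width // 4 <= j < 3 * width // 4):
                mask[i][j] = 0          # outpainting: only the centre is known
    return mask

for row in make_mask("super_resolution", 4, 4):
    print(row)
```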

Inpainting through masking
71
Step t:
Source: Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

Inpainting through masking: step t
72
+noise
+diffusion
New artifacts appear (in this coarse example, our diffusion model drew a sun), so we force the known background again!
Diagram: $x_t$, $x_0$, $x_{t-1}$

Inpainting through masking: step t
73
+noise
+diffusion
Diagram: $x_t$, $x_0$, $x_{t-1}$
This operation does not take the generated information into account.

Inpainting deharmonization
74
Picture deharmonization: the generated image has a satisfying texture but is semantically wrong.
The suggested solution is to resample.
Source: Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

Inpainting resampling
75
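Diagram: the known region is obtained by noising $x_0$, the unknown region by a diffusion (denoising) step from $x_t$. They are combined as $x_{t-1} = \text{mask} \odot x_{t-1}^{\text{known}} + (1 - \text{mask}) \odot x_{t-1}^{\text{unknown}}$, followed by resampling.

This loop is performed several times (a hyperparameter) before moving on to the next step.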
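The loop can be sketched as follows, in the spirit of the RePaint algorithm but not the authors' code. Assumptions: the linear schedule, images and masks as flat lists, `resample_steps` (the number of resampling loops per step) set to 5 by default, and a hypothetical `eps_model` stand-in for a trained network. The known part at level t-1 is drawn from $q(x_{t-1} \mid x_0)$, and resampling diffuses back one step with $q(x_t \mid x_{t-1})$.

```python
import math
import random

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bar, prod = [], 1.0
for a in alphas:
    prod *= a
    alpha_bar.append(prod)

def ab(t):
    return 1.0 if t == 0 else alpha_bar[t - 1]    # alpha_bar_0 = 1 (clean image)

def eps_model(x_t, t):
    # Hypothetical stand-in for a trained network (exact for an all-zeros dataset).
    return [v / math.sqrt(1.0 - ab(t)) for v in x_t]

def repaint(x0_known, mask, resample_steps=5, rng=random):
    """mask[i] = 1: pixel known from x0_known; mask[i] = 0: pixel to generate."""
    x = [rng.gauss(0.0, 1.0) for _ in x0_known]                      # x_T ~ N(0, I)
    for t in range(T, 0, -1):
        for u in range(resample_steps if t > 1 else 1):
            # Known part: noise the original image directly to level t-1 (closed form)
            known = [math.sqrt(ab(t - 1)) * v + math.sqrt(1.0 - ab(t - 1)) * rng.gauss(0.0, 1.0)
                     for v in x0_known]
            # Unknown part: one ordinary DDPM reverse step from x_t
            eps = eps_model(x, t)
            noise = [rng.gauss(0.0, 1.0) if t > 1 else 0.0 for _ in x]
            unknown = [(v - betas[t - 1] / math.sqrt(1.0 - ab(t)) * e) / math.sqrt(alphas[t - 1])
                       + math.sqrt(betas[t - 1]) * n for v, e, n in zip(x, eps, noise)]
            # x_{t-1} = mask * known + (1 - mask) * unknown
            x_prev = [m * k + (1 - m) * g for m, k, g in zip(mask, known, unknown)]
            if u < resample_steps - 1 and t > 1:
                # Resampling: diffuse x_{t-1} back to x_t with q(x_t | x_{t-1}) and try again
                x = [math.sqrt(alphas[t - 1]) * v + math.sqrt(betas[t - 1]) * rng.gauss(0.0, 1.0)
                     for v in x_prev]
            else:
                x = x_prev
    return x
```

The known pixels end up exactly equal to the input. The generated pixels have been re-noised and re-denoised several times at each step, which gives them a chance to harmonize with the known region.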

Inpainting resampling
76
n is the number of times the resampling loop was performed
Disadvantage: the number of required denoising steps is much higher

Advantage: it produces much more satisfying results
Source: Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

77
Sources
Papers:
-Deep Unsupervised Learning using Nonequilibrium Thermodynamics (DPM) (https://arxiv.org/abs/1503.03585)
-Denoising Diffusion Probabilistic Models (DDPM) (https://arxiv.org/abs/2006.11239)
-Improved Denoising Diffusion Probabilistic Models (IDDPM) (https://arxiv.org/abs/2102.09672)
-Denoising Diffusion Implicit Models (DDIM) (https://arxiv.org/abs/2010.02502)
-Diffusion Models Beat GANs on Image Synthesis (https://arxiv.org/abs/2105.05233)
-High-Resolution Image Synthesis with Latent Diffusion Models (LDM) (https://arxiv.org/abs/2112.10752)
-RePaint: Inpainting using Denoising Diffusion Probabilistic Models (https://arxiv.org/pdf/2201.09865)
-Diffusion Models in Vision: A Survey (https://arxiv.org/abs/2209.04747)
-Diffusion Models: A Comprehensive Survey of Methods and Applications (https://arxiv.org/abs/2209.00796)

Other resources:
-Lilian Weng’s article (https://lilianweng.github.io/posts/2021-07-11-diffusion-models)
-Yang Song’s article (https://yang-song.net/blog/2021/score)
-Outlier video (https://www.youtube.com/watch?v=HoKDTa5jHvg)

Question break #5 & Practice
78

Episode 15:
AI, law, society and ethics

● Interpretability, reproducibility, bias
● Legal framework
● Privacy
● Interactive session

Duration: 2 h
Next, on Fidle: Thursday 23 March, 14:00

To be continued...
Next on Fidle:
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
https://creativecommons.org/licenses/by-nc-nd/4.0/
Sequence 15:
AI, law, society and ethics
Thursday 23 March
https://fidle.cnrs.fr
Contact@fidle.cnrs.fr
https://fidle.cnrs.fr/youtube
Thank you!