Introduction to Generative Adversarial Networks


About This Presentation

Generative Adversarial Network


Slide Content

Generative Adversarial Networks
Aaron Mishkin
UBC MLRG 2018W2
1

Generative Adversarial Networks
"Two imaginary celebrities that were dreamed up by a random number generator."
https://research.nvidia.com/publication/2017-10Progressive-Growing-of
2

Why care about GANs?
Why spend your limited time learning about GANs:
GANs are achieving state-of-the-art results in a large variety of image generation tasks.
There's been a veritable explosion in GAN publications over the last few years -- many people are very excited!
GANs are stimulating new theoretical interest in min-max optimization problems and "smooth games".
3

Why care about GANs: Hyper-realistic Image Generation
StyleGAN: image generation with hierarchical style transfer [3].
https://arxiv.org/abs/1812.04948
4

Why care about GANs: Conditional Generative Models
Conditional GANs: high-resolution image synthesis via semantic
labeling [8].
Input: Segmentation Output: Synthesized Image
https://research.nvidia.com/publication/2017-12High-Resolution-Image-Synthesis
5

Why care about GANs: Image Super Resolution
SRGAN: Photo-realistic super-resolution [4].
Bicubic Interp. SRGAN Original Image
https://arxiv.org/abs/1609.04802 6

Why care about GANs: Publications
Approximately 500 GAN papers as of September 2018!
See https://github.com/hindupuravinash/the-gan-zoo for the exhaustive list of papers. Image credit: https://github.com/bgavran.
7

Generative Models

Generative Modeling
Generative models estimate the probabilistic process that generated a set of observations $\mathcal{D}$.
$\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$: supervised generative models learn the joint distribution $p(x_i, y_i)$, often to compute $p(y_i \mid x_i)$.
$\mathcal{D} = \{x_i\}_{i=1}^{n}$: unsupervised generative models learn the distribution of $\mathcal{D}$ for clustering, sampling, etc. We can:
directly estimate $p(x_i)$, or
introduce latents $y_i$ and estimate $p(x_i, y_i)$.
8

Generative Modeling: Unsupervised Parametric Approaches
Direct Estimation: choose a parameterized family $p(x \mid \theta)$ and learn $\theta$ by maximizing the log-likelihood
$$\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta).$$
Latent Variable Models: define a joint distribution $p(x, z \mid \theta)$ and learn $\theta$ by maximizing the log-marginal likelihood
$$\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{n} \log \int p(x_i, z_i \mid \theta)\, dz_i.$$
Both approaches require that $p(x \mid \theta)$ is easy to evaluate.
9
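To make direct estimation concrete, here is a minimal NumPy sketch (my own toy example, not from the slides) that fits a univariate Gaussian $p(x \mid \theta)$ with $\theta = (\mu, \sigma)$ by maximum likelihood; for a Gaussian the maximizer has a closed form.

```python
import numpy as np

# Toy direct estimation: fit a univariate Gaussian p(x | theta), theta = (mu, sigma),
# to observations D = {x_i} by maximizing the log-likelihood.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)   # stand-in for D

def log_likelihood(mu, sigma, x):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2))

# For a Gaussian the maximum-likelihood estimate is available in closed form:
mu_hat, sigma_hat = data.mean(), data.std()
print(mu_hat, sigma_hat, log_likelihood(mu_hat, sigma_hat, data))
```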

Generative Modeling: Models for (Very) Complex Data
How can we learn such models for very complex data?
https://www.researchgate.net/figure/Heterogeneousness-and-diversity-of-the-CIFAR-10-entries-in-their-10-image-categories-Theg1322148855
10

Generative Modeling: Normalizing Flows and VAEs
Design parameterized densities with huge capacity!
Normalizing flows: a sequence of non-linear transformations of a simple distribution $p_z(z)$,
$$p(x \mid \theta_{0:k}) = p_z(z) \quad \text{where} \quad z = f_k^{-1} \circ \cdots \circ f_1^{-1} \circ f_0^{-1}(x).$$
The $f_j^{-1}$ must be invertible with tractable log-det. Jacobians.
VAEs: latent-variable models where inference networks specify parameters,
$$p(x, y \mid \theta) = p(x \mid f_\theta(y))\, p_y(y).$$
The marginal likelihood is maximized via the ELBO.
11
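A small sketch of the change-of-variables idea behind normalizing flows, assuming NumPy and SciPy are available; the single affine map $f(z) = az + b$ and the numbers are my own illustration. Note the log-det Jacobian term that makes the density exact.

```python
import numpy as np
from scipy.stats import norm

# A single invertible affine map f(z) = a*z + b applied to a standard normal base p_z.
a, b = 2.0, 1.0
f_inv = lambda x: (x - b) / a

def log_p_x(x):
    # Change of variables: log p(x) = log p_z(f^{-1}(x)) + log|d f^{-1}/dx|.
    log_det_jac = -np.log(abs(a))     # derivative of f^{-1} is 1/a
    return norm.logpdf(f_inv(x)) + log_det_jac

# Matches the exact density of x ~ N(b, a^2):
print(log_p_x(1.0), norm.logpdf(1.0, loc=b, scale=a))
```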

GANs

GANs: Density-Free Models
Generative Adversarial Networks (GANs) instead use an unrestricted generator $G_{\theta_g}(z)$ such that
$$p(x \mid \theta_g) = p_z(\{z\}) \quad \text{where} \quad \{z\} = G_{\theta_g}^{-1}(x).$$
Problem: the inverse image of $G_{\theta_g}(z)$ may be huge!
Problem: it's likely intractable to preserve volume through $G(z; \theta_g)$.
So, we can't evaluate $p(x \mid \theta_g)$ and we can't learn $\theta_g$ by maximum likelihood.
12

GANs: Discriminators
GANs learn by comparing model samples with examples from $\mathcal{D}$.
Sampling from the generator is easy:
$$\hat{x} = G_{\theta_g}(\hat{z}), \quad \text{where} \quad \hat{z} \sim p_z(z).$$
Given a sample $\hat{x}$, a discriminator tries to distinguish it from true examples:
$$D(x) = \Pr(x \sim p_{\text{data}}).$$
The discriminator "supervises" the generator network.
13

GANs: Generator + Discriminator
https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
14

GANs: Goodfellow et al. (2014)
Let $z \in \mathbb{R}^m$ and $p_z(z)$ be a simple base distribution.
The generator $G_{\theta_g}(z) : \mathbb{R}^m \to \tilde{\mathcal{D}}$ is a deep neural network.
$\tilde{\mathcal{D}}$ is the manifold of generated examples.
The discriminator $D_{\theta_d}(x) : \mathcal{D} \cup \tilde{\mathcal{D}} \to (0, 1)$ is also a deep neural network.
https://arxiv.org/abs/1511.06434
15

GANs: Saddle-Point Optimization
Saddle-Point Optimization: learn $G_{\theta_g}(z)$ and $D_{\theta_d}(x)$ jointly via the objective $V(\theta_d, \theta_g)$:
$$\min_{\theta_g} \max_{\theta_d} \underbrace{\mathbb{E}_{p_{\text{data}}}\left[\log D_{\theta_d}(x)\right]}_{\text{likelihood of true data}} + \underbrace{\mathbb{E}_{p_z(z)}\left[\log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right)\right]}_{\text{likelihood of generated data}}$$
16
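As a reading aid, here is a small sketch (assumed PyTorch; `D`, `G`, and the epsilon are my own choices, not from the slides) of how $V(\theta_d, \theta_g)$ is evaluated on a minibatch.

```python
import torch

# Evaluate V(theta_d, theta_g) = E_pdata[log D(x)] + E_pz[log(1 - D(G(z)))]
# on a minibatch; D maps data to probabilities in (0, 1), G maps noise to samples.
def value_fn(D, G, x_real, z, eps=1e-7):
    d_real = D(x_real)                 # D(x) for real examples
    d_fake = D(G(z))                   # D(G(z)) for generated samples
    return torch.log(d_real + eps).mean() + torch.log(1 - d_fake + eps).mean()
```

The discriminator takes gradient ascent steps on this quantity in $\theta_d$, while the generator takes descent steps in $\theta_g$.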

GANs: Optimal Discriminators
Claim: given $G_{\theta_g}$ defining an implicit distribution $p_g = p(x \mid \theta_g)$, the optimal discriminator is
$$D^{*}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}.$$
Proof Sketch:
$$V(\theta_d, \theta_g) = \int_{\mathcal{D}} p_{\text{data}}(x) \log D(x)\, dx + \int_{\tilde{\mathcal{D}}} p(z) \log\left(1 - D(G_{\theta_g}(z))\right) dz$$
$$= \int_{\mathcal{D} \cup \tilde{\mathcal{D}}} p_{\text{data}}(x) \log D(x) + p_g(x) \log(1 - D(x))\, dx.$$
Maximizing the integrand for all $x$ is sufficient and gives the result (see bonus slides).
Previous Slide: https://commons.wikimedia.org/wiki/File:Saddlepoint.svg
17

GANs: Jensen-Shannon Divergence and Optimal Generators
Given an optimal discriminator $D^{*}(x)$, the generator objective is
$$C(\theta_g) = \mathbb{E}_{p_{\text{data}}}\left[\log D^{*}_{\theta_d}(x)\right] + \mathbb{E}_{p_g(x)}\left[\log\left(1 - D^{*}_{\theta_d}(x)\right)\right]$$
$$= \mathbb{E}_{p_{\text{data}}}\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right] + \mathbb{E}_{p_g(x)}\left[\log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right]$$
$$\propto \underbrace{\frac{1}{2}\,\mathrm{KL}\!\left(p_{\text{data}} \,\middle\|\, \frac{p_{\text{data}} + p_g}{2}\right) + \frac{1}{2}\,\mathrm{KL}\!\left(p_g \,\middle\|\, \frac{p_{\text{data}} + p_g}{2}\right)}_{\text{Jensen-Shannon Divergence}}$$
$C(\theta_g)$ achieves its global minimum at $p_g = p_{\text{data}}$ given an optimal discriminator!
18
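A quick numerical sanity check (my own example, using discrete distributions): with the optimal discriminator plugged in, $C(\theta_g) = 2\,\mathrm{JSD}(p_{\text{data}}, p_g) - \log 4$, so it bottoms out at $-\log 4$ exactly when $p_g = p_{\text{data}}$.

```python
import numpy as np

# With D*(x) = pdata(x) / (pdata(x) + pg(x)) plugged in, the discrete analogue of
# C(theta_g) equals 2*JSD(pdata, pg) - log(4), minimized when pg = pdata.
def c_objective(p_data, p_g):
    d_star = p_data / (p_data + p_g)
    return np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1 - d_star))

p_data = np.array([0.5, 0.3, 0.2])
print(c_objective(p_data, p_data))                      # -log(4) ≈ -1.386
print(c_objective(p_data, np.array([0.2, 0.3, 0.5])))   # strictly larger
```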

GANs: Learning Generators and Discriminators
Putting these results to use in practice:
High-capacity discriminators $D_{\theta_d}$ approximate the Jensen-Shannon divergence when close to their global maximum.
$D_{\theta_d}$ is a "differentiable program".
We can use $D_{\theta_d}$ to learn $G_{\theta_g}$ with our favourite gradient descent method.
https://arxiv.org/abs/1511.06434
19

GANs: Training Procedure
for $i = 1 \ldots N$ do
    for $k = 1 \ldots K$ do
        Sample noise samples $\{z_1, \ldots, z_m\} \sim p_z(z)$.
        Sample examples $\{x_1, \ldots, x_m\}$ from $p_{\text{data}}(x)$.
        Update the discriminator $D_{\theta_d}$ (gradient ascent on $V$):
        $$\theta_d = \theta_d + \alpha \nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[\log D(x_i) + \log\left(1 - D(G(z_i))\right)\right].$$
    end for
    Sample noise samples $\{z_1, \ldots, z_m\} \sim p_z(z)$.
    Update the generator $G_{\theta_g}$ (gradient descent on $V$):
    $$\theta_g = \theta_g - \alpha \nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(z_i))\right).$$
end for
20
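The procedure above translates almost line for line into a short training loop. Below is a minimal runnable sketch (assumed PyTorch; the tiny MLPs, the 2-D toy data, Adam, and all hyperparameters are my own choices, not from the slides). It uses the minimax generator loss exactly as written, which the next slides point out can saturate.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim, m = 8, 2, 64     # m = minibatch size
K, N = 1, 2000                         # K discriminator steps per generator step
eps = 1e-7                             # numerical stability inside the logs

# Tiny MLPs standing in for the generator and discriminator.
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

def sample_data(m):
    # Stand-in for p_data(x): a Gaussian blob centred at (2, 2).
    return torch.randn(m, data_dim) * 0.5 + 2.0

for i in range(N):
    for k in range(K):
        z = torch.randn(m, latent_dim)            # z ~ p_z(z)
        x = sample_data(m)                        # x ~ p_data(x)
        # Discriminator: ascend E[log D(x)] + E[log(1 - D(G(z)))] by descending
        # its negative; detach() keeps this step from updating G.
        loss_d = -(torch.log(D(x) + eps).mean()
                   + torch.log(1 - D(G(z).detach()) + eps).mean())
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

    z = torch.randn(m, latent_dim)
    # Generator: descend E[log(1 - D(G(z)))], the minimax loss from the slides.
    loss_g = torch.log(1 - D(G(z)) + eps).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```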

Problems (c. 2016)

Problems with GANs
Vanishing Gradients: the discriminator becomes "too good" and the generator gradient vanishes.
Non-Convergence: the generator and discriminator oscillate without reaching an equilibrium.
Mode Collapse: the generator distribution collapses to a small set of examples.
Mode Dropping: the generator distribution doesn't fully cover the data distribution.
21

Problems: Vanishing Gradients
The minimax objective saturates when $D_{\theta_d}$ is close to perfect:
$$V(\theta_d, \theta_g) = \mathbb{E}_{p_{\text{data}}}\left[\log D_{\theta_d}(x)\right] + \mathbb{E}_{p_z(z)}\left[\log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right)\right].$$
A non-saturating heuristic objective for the generator is
$$J(G_{\theta_g}) = -\mathbb{E}_{p_z(z)}\left[\log D_{\theta_d}(G_{\theta_g}(z))\right].$$
https://arxiv.org/abs/1701.00160
22
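A small sketch (assumed PyTorch; the logit value is my own choice) of why the minimax term saturates: writing $D(G(z)) = \sigma(a)$ for the discriminator's logit $a$, the gradient of $\log(1 - \sigma(a))$ vanishes when the discriminator confidently rejects a sample, while the non-saturating heuristic keeps a usable gradient.

```python
import torch

# a is the discriminator's logit for a generated sample, so D(G(z)) = sigmoid(a).
a = torch.tensor([-8.0], requires_grad=True)    # confident "fake" decision

saturating = torch.log(1 - torch.sigmoid(a))    # minimax generator term
grad_sat, = torch.autograd.grad(saturating.sum(), a)

non_saturating = -torch.log(torch.sigmoid(a))   # heuristic generator term
grad_ns, = torch.autograd.grad(non_saturating.sum(), a)

print(grad_sat.item())   # ≈ -3e-4 (gradient has vanished)
print(grad_ns.item())    # ≈ -1.0  (still informative)
```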

Problems: Addressing Vanishing Gradients
Solutions:
Change Objectives: use the non-saturating heuristic objective, maximum-likelihood cost, etc.
Limit Discriminator: restrict the capacity of the discriminator.
Schedule Learning: try to balance training $D_{\theta_d}$ and $G_{\theta_g}$.
23

Problems: Non-Convergence
Simultaneous gradient descent is not guaranteed to converge for minimax objectives.
Goodfellow et al. only showed convergence when updates are made in function space [2].
The parameterization of $D_{\theta_d}$ and $G_{\theta_g}$ results in a highly non-convex objective.
In practice, training tends to oscillate -- updates "undo" each other.
24

Problems: Addressing Non-Convergence
Solutions:Lots and lots of hacks!
https://github.com/soumith/ganhacks
25

Problems: Mode Collapse and Mode Dropping
One Explanation: SGD may optimize the max-min objective
$$\max_{\theta_d} \min_{\theta_g} \mathbb{E}_{p_{\text{data}}}\left[\log D_{\theta_d}(x)\right] + \mathbb{E}_{p_z(z)}\left[\log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right)\right].$$
Intuition: the generator maps all $z$ values to the $\hat{x}$ that is most likely to fool the discriminator.
https://arxiv.org/abs/1701.00160
26

A Possible Solution

A Possible Solution: Alternative Divergences
There are a large variety of divergence measures for distributions:
f-Divergences: (e.g. Jensen-Shannon, Kullback-Leibler)
$$D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx$$
GANs [2], f-GANs [7], and more.
Integral Probability Metrics: (e.g. Earth Mover's Distance, Maximum Mean Discrepancy)
$$\mathcal{F}(P \,\|\, Q) = \sup_{f \in \mathcal{F}} \left|\int f\, dP - \int f\, dQ\right|$$
Wasserstein GANs [1], Fisher GANs [6], Sobolev GANs [5], and more.
27
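A toy check (my own example, using discrete distributions) that the f-divergence formula recovers a familiar case: choosing $f(t) = t \log t$ gives $\mathrm{KL}(P \,\|\, Q)$.

```python
import numpy as np

# D_f(P || Q) = sum_x q(x) f(p(x)/q(x)); with f(t) = t*log(t) this is KL(P || Q).
def f_divergence(p, q, f):
    return np.sum(q * f(p / q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

kl_via_f = f_divergence(p, q, lambda t: t * np.log(t))
kl_direct = np.sum(p * np.log(p / q))
print(np.isclose(kl_via_f, kl_direct))   # True
```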

A Possible Solution: Wasserstein GANs
Wasserstein GANs: strong theory and excellent empirical results.
"In no experiment did we see evidence of mode collapse for the WGAN algorithm." [1]
https://arxiv.org/abs/1701.07875
28
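For contrast with the minimax losses above, here is a minimal sketch (assumed PyTorch; the shapes and the clipping constant are my own choices) of the WGAN critic and generator losses, with the weight clipping used in [1] to keep the critic approximately 1-Lipschitz.

```python
import torch
import torch.nn as nn

# The WGAN critic f has no sigmoid; it estimates E_pdata[f(x)] - E_pg[f(G(z))].
critic = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

def critic_loss(x_real, x_fake):
    # Negated so that a gradient *descent* step ascends the Wasserstein estimate.
    return -(critic(x_real).mean() - critic(x_fake).mean())

def generator_loss(x_fake):
    # The generator tries to increase the critic's score on its samples.
    return -critic(x_fake).mean()

def clip_critic_weights(clip=0.01):
    # Called after each critic optimizer step in [1]'s algorithm.
    for p in critic.parameters():
        p.data.clamp_(-clip, clip)
```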

Summary

Summary
Recap:
GANs are a class of density-free generative models with (mostly) unrestricted generator functions.
Introducing adversarial discriminator networks allows GANs to learn by minimizing the Jensen-Shannon divergence.
Concurrently learning the generator and discriminator is challenging due to
Vanishing gradients,
Non-convergence due to oscillation,
Mode collapse and mode dropping.
A variety of alternative objective functions are being proposed.
29

Acknowledgements and References
There are lots of excellent references on GANs:
Sebastian Nowozin's
NIPS 2016
A
30

Bonus: Optimal Discriminators Cont.
The integrand
$$h(D(x)) = p_{\text{data}}(x) \log D(x) + p_g(x) \log(1 - D(x))$$
is concave for $D(x) \in (0, 1)$. We take the derivative and compute a stationary point in the domain:
$$\frac{\partial h(D(x))}{\partial D(x)} = \frac{p_{\text{data}}(x)}{D(x)} - \frac{p_g(x)}{1 - D(x)} = 0 \;\Rightarrow\; D(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}.$$
This maximizes the integrand over the domain of the discriminator, completing the proof.
31

References
Martin Arjovsky, Soumith Chintala, and Leon Bottou.
Wasserstein GAN.
arXiv preprint arXiv:1701.07875, 2017.
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio.
Generative adversarial networks.
arXiv preprint arXiv:1406.2661, 2014.
Tero Karras, Samuli Laine, and Timo Aila.
A style-based generator architecture for generative adversarial networks.
arXiv preprint arXiv:1812.04948, 2018.
32

References
Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew
Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes
Totz, Zehan Wang, et al.
Photo-realistic single image super-resolution using a generative
adversarial network.
InProceedings of the IEEE conference on computer vision and pattern
recognition, pages 4681{4690, 2017.
Youssef Mroueh, Chun-Liang Li, Tom Sercu, Anant Raj, and Yu Cheng.
Sobolev gan.
arXiv preprint arXiv:1711.04894, 2017.
Youssef Mroueh and Tom Sercu.
Fisher gan.
InAdvances in Neural Information Processing Systems, pages 2513{2523,
2017.
33

References
Sebastian Nowozin, Botond Cseke, and Ryota Tomioka.
f-GAN: Training generative neural samplers using variational divergence minimization.
In Advances in Neural Information Processing Systems, pages 271-279, 2016.
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro.
High-resolution image synthesis and semantic manipulation with conditional GANs.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8798-8807, 2018.
34