Generative Adversarial Networks
"Two imaginary celebrities that were dreamed up by a random number generator."
https://research.nvidia.com/publication/2017-10Progressive-Growing-of
Why care about GANs?
Why spend your limited time learning about GANs:
GANs are achieving state-of-the-art results in a large variety
of image generation tasks.
There's been a veritable explosion in GAN publications over the last few years – many people are very excited!
GANs are stimulating new theoretical interest in min-max optimization problems and "smooth games".
Why care about GANs: Hyper-realistic Image Generation
StyleGAN: image generation with hierarchical style transfer [3].
https://arxiv.org/abs/1812.04948
Why care about GANs: Conditional Generative Models
Conditional GANs: high-resolution image synthesis via semantic
labeling [8].
Input: segmentation map. Output: synthesized image.
https://research.nvidia.com/publication/2017-12High-Resolution-Image-Synthesis
Why care about GANs: Image Super Resolution
SRGAN: Photo-realistic super-resolution [4].
Panels: bicubic interpolation, SRGAN output, original image.
https://arxiv.org/abs/1609.04802
Why care about GANs: Publications
Approximately 500 GAN papers as of September 2018!
See https://github.com/hindupuravinash/the-gan-zoo for the exhaustive list of papers. Image credit: https://github.com/bgavran.
Generative Models
Generative Modeling
Generative Models estimate the probabilistic process that generated a set of observations $\mathcal{D}$.
$\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n$: supervised generative models learn the joint distribution $p(x_i, y_i)$, often to compute $p(y_i \mid x_i)$.
$\mathcal{D} = \{x_i\}_{i=1}^n$: unsupervised generative models learn the distribution of $\mathcal{D}$ for clustering, sampling, etc. We can:
directly estimate $p(x_i)$, or
introduce latents $y_i$ and estimate $p(x_i, y_i)$.
Generative Modeling: Unsupervised Parametric Approaches
Direct Estimation: choose a parameterized family $p(x \mid \theta)$ and learn $\theta$ by maximizing the log-likelihood
$$\theta^* = \arg\max_\theta \sum_{i=1}^n \log p(x_i \mid \theta).$$
Latent Variable Models: define a joint distribution $p(x, z \mid \theta)$ and learn $\theta$ by maximizing the log-marginal likelihood
$$\theta^* = \arg\max_\theta \sum_{i=1}^n \log \int p(x_i, z_i \mid \theta)\, dz_i.$$
Both approaches require that $p(x \mid \theta)$ is easy to evaluate.
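To make direct estimation concrete, here is a minimal sketch (not from the slides; the data, model, and step size are illustrative) that fits a univariate Gaussian $p(x \mid \theta) = \mathcal{N}(x; \mu, \sigma^2)$ by gradient ascent on the average log-likelihood.

```python
import numpy as np

# Toy observations, assumed here purely for illustration.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)

# Parameters theta = (mu, log_sigma); optimizing log_sigma keeps sigma positive.
mu, log_sigma = 0.0, 0.0
lr = 0.01

for _ in range(2000):
    sigma = np.exp(log_sigma)
    # Gradients of the average Gaussian log-likelihood w.r.t. mu and log_sigma.
    grad_mu = np.mean(data - mu) / sigma**2
    grad_log_sigma = np.mean((data - mu) ** 2) / sigma**2 - 1.0
    mu += lr * grad_mu                 # gradient ascent on the log-likelihood
    log_sigma += lr * grad_log_sigma

print(mu, np.exp(log_sigma))           # should approach the sample mean and std
```

The same recipe works for any family whose density is tractable to evaluate; the next slides ask what to do when it is not.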
Generative Modeling: Models for (Very) Complex Data
How can we learn such models for very complex data?
https://www.researchgate.net/figure/Heterogeneousness-and-diversity-of-the-CIFAR-10-entries-in-their-10-image-categories-The_fig1_322148855
Generative Modeling: Normalizing Flows and VAEs
Design parameterized densities with huge capacity!
Normalizing flows: a sequence of invertible non-linear transformations to a simple base distribution $p_z(z)$,
$$p(x \mid \theta_{0:k}) = p_z(z)\left|\det \tfrac{\partial z}{\partial x}\right| \quad \text{where } z = f_k^{-1} \circ \cdots \circ f_1^{-1} \circ f_0^{-1}(x).$$
Each $f_j^{-1}$ must be invertible with a tractable log-determinant Jacobian.
VAEs: latent-variable models where inference networks specify the variational parameters and the generative model is
$$p(x, y \mid \theta) = p(x \mid f_\theta(y))\, p_y(y).$$
The marginal likelihood is maximized via the ELBO.
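As a small illustration of the change-of-variables formula above (my own sketch, not from the slides), the snippet evaluates $\log p(x)$ under a single affine flow $x = f(z) = az + b$ with a standard-normal base density; the parameters are arbitrary.

```python
import numpy as np

# One affine flow layer: x = f(z) = a*z + b, so z = f^{-1}(x) = (x - b) / a.
a, b = 2.0, 1.0                                   # illustrative flow parameters
x = np.array([0.5, 1.0, 3.0])

z = (x - b) / a                                   # invert the flow
log_pz = -0.5 * (z**2 + np.log(2 * np.pi))        # standard-normal base log-density
log_det_jac = -np.log(np.abs(a))                  # log |dz/dx| = -log|a|
log_px = log_pz + log_det_jac                     # change of variables

# Cross-check: under this flow, x is exactly Normal(b, a^2).
exact = -0.5 * np.log(2 * np.pi * a**2) - (x - b) ** 2 / (2 * a**2)
print(np.allclose(log_px, exact))                 # True
```

Real normalizing flows stack many such invertible layers with richer parameterizations, but density evaluation follows the same pattern.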
GANs
GANs: Density-Free Models
Generative Adversarial Networks (GANs) instead use an unrestricted generator $G_{\theta_g}(z)$ such that
$$p(x \mid \theta_g) = p_z(\{z\}) \quad \text{where } \{z\} = G_{\theta_g}^{-1}(x).$$
Problem: the inverse image of $G_{\theta_g}(z)$ may be huge!
Problem: it's likely intractable to preserve volume through $G(z; \theta_g)$.
So, we can't evaluate $p(x \mid \theta_g)$ and we can't learn $\theta_g$ by maximum likelihood.
GANs: Discriminators
GANs learn by comparing model samples with examples from $\mathcal{D}$.
Sampling from the generator is easy:
$$\hat{x} = G_{\theta_g}(\hat{z}), \quad \text{where } \hat{z} \sim p_z(z).$$
Given a sample $\hat{x}$, a discriminator tries to distinguish it from true examples:
$$D(x) = \Pr\left(x \sim p_{\text{data}}\right).$$
The discriminator "supervises" the generator network.
GANs: Goodfellow et al. (2014)
Let $z \in \mathbb{R}^m$ and $p_z(z)$ be a simple base distribution.
The generator $G_{\theta_g}(z): \mathbb{R}^m \to \tilde{\mathcal{D}}$ is a deep neural network.
$\tilde{\mathcal{D}}$ is the manifold of generated examples.
The discriminator $D_{\theta_d}(x): \mathcal{D} \cup \tilde{\mathcal{D}} \to (0, 1)$ is also a deep neural network.
https://arxiv.org/abs/1511.06434
GANs: Saddle-Point Optimization
Saddle-Point Optimization: learn $G_{\theta_g}(z)$ and $D_{\theta_d}(x)$ jointly via the objective $V(\theta_d, \theta_g)$:
$$\min_{\theta_g} \max_{\theta_d} \; \underbrace{\mathbb{E}_{p_{\text{data}}}\left[\log D_{\theta_d}(x)\right]}_{\text{likelihood of true data}} + \underbrace{\mathbb{E}_{p_z(z)}\left[\log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right)\right]}_{\text{likelihood of generated data}}$$
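To read the objective concretely, here is an illustrative sketch (not from the slides) that forms a Monte Carlo estimate of $V(\theta_d, \theta_g)$ from minibatches; the toy `D` and `G` callables are placeholders for the neural networks.

```python
import numpy as np

def gan_value(D, G, x_real, z, eps=1e-12):
    """Monte Carlo estimate of V(theta_d, theta_g) from one minibatch."""
    real_term = np.mean(np.log(D(x_real) + eps))        # E_{p_data}[log D(x)]
    fake_term = np.mean(np.log(1.0 - D(G(z)) + eps))    # E_{p_z}[log(1 - D(G(z)))]
    return real_term + fake_term

# Placeholder "networks" for illustration: D must output probabilities in (0, 1).
D = lambda x: 1.0 / (1.0 + np.exp(-x))   # logistic score of the input
G = lambda z: 2.0 * z + 1.0              # affine map of the noise

rng = np.random.default_rng(0)
print(gan_value(D, G, rng.normal(size=128), rng.normal(size=128)))
```

The discriminator takes gradient steps to increase this estimate while the generator takes steps to decrease it.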
GANs: Optimal Discriminators
Claim: given $G_{\theta_g}$ defining an implicit distribution $p_g = p(x \mid \theta_g)$, the optimal discriminator is
$$D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}.$$
Proof Sketch:
$$V(\theta_d, \theta_g) = \int_{\mathcal{D}} p_{\text{data}}(x) \log D(x)\, dx + \int_{\tilde{\mathcal{D}}} p_z(z) \log\left(1 - D(G_{\theta_g}(z))\right) dz$$
$$= \int_{\mathcal{D} \cup \tilde{\mathcal{D}}} p_{\text{data}}(x) \log D(x) + p_g(x) \log\left(1 - D(x)\right) dx.$$
Maximizing the integrand for all $x$ is sufficient and gives the result (see bonus slides).
Previous Slide: https://commons.wikimedia.org/wiki/File:Saddlepoint.svg
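As a quick numerical sanity check of the claim (my own addition, not in the original slides), the snippet below maximizes the integrand $h(D) = p_{\text{data}}\log D + p_g \log(1-D)$ over a grid of discriminator outputs at a single point $x$, using example density values.

```python
import numpy as np

p_data, p_g = 0.7, 0.2                          # example density values at a fixed x
d_grid = np.linspace(1e-4, 1.0 - 1e-4, 100_000)

# Integrand of V at this x, viewed as a function of the discriminator output D(x).
h = p_data * np.log(d_grid) + p_g * np.log(1.0 - d_grid)

print(d_grid[np.argmax(h)])        # numerical maximizer, approximately 0.7778
print(p_data / (p_data + p_g))     # closed-form optimum D*(x) = 0.7778
```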
GANs: Jensen-Shannon Divergence and Optimal Generators
Given an optimal discriminator $D^*(x)$, the generator objective is
$$C(\theta_g) = \mathbb{E}_{p_{\text{data}}}\left[\log D^*(x)\right] + \mathbb{E}_{p_g(x)}\left[\log\left(1 - D^*(x)\right)\right]$$
$$= \mathbb{E}_{p_{\text{data}}}\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right] + \mathbb{E}_{p_g(x)}\left[\log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right]$$
$$\propto \underbrace{\frac{1}{2}\,\mathrm{KL}\!\left(p_{\text{data}} \,\Big\|\, \frac{p_{\text{data}} + p_g}{2}\right) + \frac{1}{2}\,\mathrm{KL}\!\left(p_g \,\Big\|\, \frac{p_{\text{data}} + p_g}{2}\right)}_{\text{Jensen-Shannon divergence}}$$
$C(\theta_g)$ achieves its global minimum at $p_g = p_{\text{data}}$ given an optimal discriminator!
GANs: Learning Generators and Discriminators
Putting these results to use in practice:
High-capacity discriminators $D_{\theta_d}$ approximate the Jensen-Shannon divergence when close to the global maximum.
$D_{\theta_d}$ is a "differentiable program".
We can use $D_{\theta_d}$ to learn $G_{\theta_g}$ with our favourite gradient descent method.
https://arxiv.org/abs/1511.06434
GANs: Training Procedure
for $i = 1, \ldots, N$ do
  for $k = 1, \ldots, K$ do
    Sample noise samples $\{z^{(1)}, \ldots, z^{(m)}\} \sim p_z(z)$.
    Sample examples $\{x^{(1)}, \ldots, x^{(m)}\}$ from $p_{\text{data}}(x)$.
    Update the discriminator $D_{\theta_d}$ (gradient ascent):
    $$\theta_d = \theta_d + \eta_d \nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^m \left[\log D\!\left(x^{(i)}\right) + \log\!\left(1 - D\!\left(G\!\left(z^{(i)}\right)\right)\right)\right].$$
  end for
  Sample noise samples $\{z^{(1)}, \ldots, z^{(m)}\} \sim p_z(z)$.
  Update the generator $G_{\theta_g}$ (gradient descent):
  $$\theta_g = \theta_g - \eta_g \nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^m \log\!\left(1 - D\!\left(G\!\left(z^{(i)}\right)\right)\right).$$
end for
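Below is a minimal, self-contained sketch of this training loop on 1-D toy data. It is my own illustration under assumptions the slides do not make: PyTorch as the framework, tiny MLPs for $G$ and $D$, Adam as the gradient method, and a Gaussian toy data distribution. It uses the saturating generator update shown above; the next slides discuss why a non-saturating variant is often preferred in practice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m, n_steps, k_disc = 64, 2000, 1    # minibatch size, outer steps, D steps per G step

def sample_data(m):
    # Toy p_data: a 1-D Gaussian with mean 3.0 and std 0.5 (illustrative only).
    return 3.0 + 0.5 * torch.randn(m, 1)

G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))                # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())  # discriminator
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
bce = nn.BCELoss()
ones, zeros = torch.ones(m, 1), torch.zeros(m, 1)
eps = 1e-8

for step in range(n_steps):
    for _ in range(k_disc):
        # Discriminator ascent on log D(x) + log(1 - D(G(z))), via BCE minimization.
        x, z = sample_data(m), torch.randn(m, 1)
        loss_d = bce(D(x), ones) + bce(D(G(z).detach()), zeros)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator descent on log(1 - D(G(z))) -- the saturating update from the slide.
    z = torch.randn(m, 1)
    loss_g = torch.log(1.0 - D(G(z)) + eps).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(torch.randn(1000, 1)).mean().item())   # with luck, drifts toward the data mean 3.0
```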
Problems (c. 2016)
Problems with GANs
Vanishing Gradients: the discriminator becomes "too good" and the generator gradient vanishes.
Non-Convergence: the generator and discriminator oscillate without reaching an equilibrium.
Mode Collapse: the generator distribution collapses to a small set of examples.
Mode Dropping: the generator distribution doesn't fully cover the data distribution.
Problems: Vanishing Gradients
The minimax objective saturates when $D_{\theta_d}$ is close to perfect:
$$V(\theta_d, \theta_g) = \mathbb{E}_{p_{\text{data}}}\left[\log D_{\theta_d}(x)\right] + \mathbb{E}_{p_z(z)}\left[\log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right)\right].$$
A non-saturating heuristic objective for the generator is to instead maximize
$$J(G_{\theta_g}) = \mathbb{E}_{p_z(z)}\left[\log D_{\theta_d}(G_{\theta_g}(z))\right].$$
https://arxiv.org/abs/1701.00160
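To see the saturation numerically (an illustrative check of my own, assuming PyTorch), the snippet compares the gradient of the two generator losses with respect to the discriminator's logit when the discriminator confidently rejects a generated sample.

```python
import torch

# Discriminator logits on generated samples; very negative means "confidently fake".
logit = torch.tensor([-6.0, -2.0, 0.0], requires_grad=True)
d_fake = torch.sigmoid(logit)                          # D(G(z))

# Saturating (minimax) generator loss: log(1 - D(G(z))).
grad_sat = torch.autograd.grad(torch.log(1.0 - d_fake).sum(), logit)[0]

logit2 = torch.tensor([-6.0, -2.0, 0.0], requires_grad=True)
# Non-saturating heuristic: -log D(G(z)) is minimized instead.
grad_ns = torch.autograd.grad(-torch.log(torch.sigmoid(logit2)).sum(), logit2)[0]

print(grad_sat)   # ~[-0.0025, -0.12, -0.5]: vanishes as D(G(z)) -> 0
print(grad_ns)    # ~[-0.9975, -0.88, -0.5]: stays O(1), so the generator keeps learning
```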
Problems: Addressing Vanishing Gradients
Solutions:
Change Objectives: use the non-saturating heuristic objective, the maximum-likelihood cost, etc.
Limit Discriminator: restrict the capacity of the discriminator.
Schedule Learning: try to balance training $D_{\theta_d}$ and $G_{\theta_g}$.
Problems: Non-Convergence
Simultaneous gradient descent is not guaranteed to converge for
minimax objectives.
Goodfellow et al. only showed convergence when updates are
made in the function space [2].
The parameterization of $D_{\theta_d}$ and $G_{\theta_g}$ results in a highly non-convex objective.
In practice, training tends to oscillate: updates "undo" each other.
Problems: Addressing Non-Convergence
Solutions: Lots and lots of hacks!
https://github.com/soumith/ganhacks
Problems: Mode Collapse and Mode Dropping
One Explanation: SGD may optimize the max-min objective
$$\max_{\theta_d} \min_{\theta_g} \; \mathbb{E}_{p_{\text{data}}}\left[\log D_{\theta_d}(x)\right] + \mathbb{E}_{p_z(z)}\left[\log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right)\right].$$
Intuition: the generator maps all $z$ values to the $\hat{x}$ that is most likely to fool the discriminator.
https://arxiv.org/abs/1701.00160
A Possible Solution
A Possible Solution: Alternative Divergences
There are a large variety of divergence measures for distributions:
$f$-Divergences (e.g. Jensen-Shannon, Kullback-Leibler):
$$D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx$$
GANs [2], f-GANs [7], and more.
Integral Probability Metrics (e.g. Earth Mover's Distance, Maximum Mean Discrepancy):
$$d_{\mathcal{F}}(P \,\|\, Q) = \sup_{f \in \mathcal{F}} \int f\, dP - \int f\, dQ$$
Wasserstein GANs [1], Fisher GANs [6], Sobolev GANs [5], and more.
A Possible Solution: Wasserstein GANs
Wasserstein GANs: strong theory and excellent empirical results.
"In no experiment did we see evidence of mode collapse for the WGAN algorithm." [1]
https://arxiv.org/abs/1701.07875
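For concreteness, here is a minimal sketch (my own, assuming PyTorch) of a single critic update in the original weight-clipping formulation of WGAN [1]; the network sizes and toy 1-D batches are illustrative, while the RMSProp learning rate and clipping constant follow the paper's defaults.

```python
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # outputs a score, not a probability
G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))       # generator
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
clip = 0.01                                                            # weight-clipping constant c

def critic_step(x_real, z):
    # The critic maximizes E[f(x)] - E[f(G(z))], an estimate of the Earth Mover's
    # distance over (approximately) 1-Lipschitz f, enforced here by weight clipping.
    loss = -(critic(x_real).mean() - critic(G(z).detach()).mean())
    opt_c.zero_grad(); loss.backward(); opt_c.step()
    for p in critic.parameters():
        p.data.clamp_(-clip, clip)
    return -loss.item()            # current estimate of the Wasserstein distance

# Example call with toy 1-D batches.
print(critic_step(3.0 + 0.5 * torch.randn(64, 1), torch.randn(64, 1)))
```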
Summary
Summary
Recap:
GANs are a class of density-free generative models with
(mostly) unrestricted generator functions.
Introducing adversarial discriminator networks allows GANs to learn by minimizing the Jensen-Shannon divergence.
Concurrently learning the generator and discriminator is
challenging due to
Vanishing Gradients,
Non-convergence due to oscillation,
Mode collapse and mode dropping.
A variety of alternative objective functions are being proposed.
Acknowledgements and References
There are lots of excellent references on GANs:
Sebastian Nowozin's NIPS 2016 tutorial
Bonus: Optimal Discriminators Cont.
The integrand
$$h(D(x)) = p_{\text{data}}(x) \log D(x) + p_g(x) \log\left(1 - D(x)\right)$$
is concave for $D(x) \in (0, 1)$. We take the derivative and compute a stationary point in the domain:
$$\frac{\partial h(D(x))}{\partial D(x)} = \frac{p_{\text{data}}(x)}{D(x)} - \frac{p_g(x)}{1 - D(x)} = 0 \;\Rightarrow\; D(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}.$$
This maximizes the integrand over the domain of the discriminator, completing the proof.
References
Martin Arjovsky, Soumith Chintala, and Léon Bottou.
Wasserstein GAN.
arXiv preprint arXiv:1701.07875, 2017.
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio.
Generative adversarial networks.
arXiv preprint arXiv:1406.2661, 2014.
Tero Karras, Samuli Laine, and Timo Aila.
A style-based generator architecture for generative adversarial
networks.
arXiv preprint arXiv:1812.04948, 2018.
References
Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al.
Photo-realistic single image super-resolution using a generative adversarial network.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4681–4690, 2017.
Youssef Mroueh, Chun-Liang Li, Tom Sercu, Anant Raj, and Yu Cheng.
Sobolev GAN.
arXiv preprint arXiv:1711.04894, 2017.
Youssef Mroueh and Tom Sercu.
Fisher GAN.
In Advances in Neural Information Processing Systems, pages 2513–2523, 2017.
References
Sebastian Nowozin, Botond Cseke, and Ryota Tomioka.
f-GAN: Training generative neural samplers using variational divergence minimization.
In Advances in Neural Information Processing Systems, pages 271–279, 2016.
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro.
High-resolution image synthesis and semantic manipulation with conditional GANs.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8798–8807, 2018.