M4L19 Generative Models - Slides v 3.pdf

About This Presentation

Generative Models - Slides v3


Slide Content

Generative Models
Introduction

Spectrum of Low-Labeled Learning
Supervised Learning
⬣Train input: (X, Y)
⬣Learning output: f : X → Y, P(y|x)
⬣e.g. classification (sheep, dog, cat, lion, giraffe)
Unsupervised Learning
⬣Input: X
⬣Learning output: P(x)
⬣Example: Clustering, density estimation, etc.
(Less labels →)

Unsupervised Learning
Supervised Learning:
⬣Classification: x → y (y discrete)
⬣Regression: x → y (y continuous)
Unsupervised Learning:
⬣Clustering: x → c (c discrete)
⬣Dimensionality Reduction: x → z (z continuous)
⬣Density Estimation: x → p(x) (on the simplex)

What to Learn?
Traditional unsupervised learning methods have deep learning counterparts, approached from a neural network/learning perspective:
⬣Modeling P(x): traditionally density estimation → Deep Generative Models
⬣Comparing/Grouping: traditionally clustering → metric learning & clustering
⬣Representation Learning: traditionally Principal Component Analysis → almost all deep learning!

Discriminative models model P(y|x)
⬣Example: Model this via neural network, SVM, etc.
Generative models model P(x)
Generative Models
Discriminative vs. Generative Models
Goodfellow, NeurIPS 2016 Tutorial: Generative Adversarial Networks

Discriminative models model P(y|x)
⬣Example: Model this via neural network, SVM, etc.
Generative models model P(x)
⬣We can parameterize our model as P(x; θ) and use maximum likelihood to optimize the
parameters given an unlabeled dataset: \theta^* = \arg\max_\theta \sum_i \log P(x_i; \theta)
⬣They are called generative because they can often generate samples
⬣Example: Multivariate Gaussian with estimated parameters (μ, Σ)
Generative Models
Goodfellow, NeurIPS 2016 Tutorial: Generative Adversarial Networks
Discriminative vs. Generative Models
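To make the multivariate-Gaussian example concrete, here is a minimal sketch (my own illustration, not from the slides) of fitting P(x; θ) by maximum likelihood with NumPy and then drawing new samples from the fitted model; the dataset X is a placeholder.

```python
import numpy as np

# Unlabeled dataset: N samples, D dimensions (placeholder data).
X = np.random.randn(1000, 2) @ np.array([[2.0, 0.3], [0.3, 0.5]]) + np.array([1.0, -2.0])

# Estimates of the Gaussian parameters (mu, Sigma).
mu_hat = X.mean(axis=0)              # closed-form MLE for the mean
sigma_hat = np.cov(X, rowvar=False)  # sample covariance (near-MLE; MLE divides by N)

# "Generative" part: draw new samples from the fitted model P(x; mu, Sigma).
new_samples = np.random.multivariate_normal(mu_hat, sigma_hat, size=5)
print(mu_hat, sigma_hat, new_samples.shape)
```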

Generative Models
Goodfellow, NeurIPS 2016 Tutorial: Generative Adversarial Networks

PixelRNN & PixelCNN

Generative Models
Goodfellow, NeurIPS 2016 Tutorial: Generative Adversarial Networks

Factorizing P(x)
We can use the chain rule to decompose the joint distribution
⬣Factorizes the joint distribution into a product of conditional distributions
⬣Similar to a Bayesian Network (factorizing a joint distribution)
⬣Similar to language models!
⬣Requires some ordering of variables (edges in a probabilistic graphical
model)
⬣We can estimate this conditional distribution as a neural network
Oord et al., Pixel Recurrent Neural Networks
p(x) = \prod_{i=1}^{n} p(x_i | x_1, \ldots, x_{i-1})

Modeling Language as a Sequence
p(s) = p(w_1, w_2, \ldots, w_n)
     = p(w_1) p(w_2 | w_1) p(w_3 | w_1, w_2) \cdots p(w_n | w_{n-1}, \ldots, w_1)
     = \prod_i p(w_i | w_{i-1}, \ldots, w_1)    (next word given the history)

Language Models as an RNN
⬣Language modeling involves estimating a probability distribution over
sequences of words.
p(s) = \prod_i p(w_i | w_{i-1}, \ldots, w_1)    (next word given the history)
⬣RNNs are a family of neural architectures for modeling sequences.
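As a concrete illustration of the two bullets above, here is a minimal PyTorch sketch (hypothetical sizes and names, not from the slides) of an RNN language model that parameterizes each conditional p(w_i | w_{i-1}, ..., w_1) and is trained by maximum likelihood with teacher forcing.

```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    """Models p(w_i | w_{i-1}, ..., w_1) with a recurrent hidden state."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)      # logits over the next word

    def forward(self, tokens):                 # tokens: (batch, seq_len) of word ids
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)                     # (batch, seq_len, vocab_size)

# Training objective: maximize log-likelihood of each next word (teacher forcing).
model = RNNLanguageModel()
tokens = torch.randint(0, 10000, (4, 20))     # placeholder batch of word ids
logits = model(tokens[:, :-1])                # predict word i from words < i
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 10000), tokens[:, 1:].reshape(-1))
loss.backward()
```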

Factorized Models for Images
!"=!"*$
+,-
.!
!"+"*,…,"+/*)
Oord et al., Pixel Recurrent Neural Networks
!"=$
+,*
.!
!"+"*,…,"+/*)

Factorized Models for Images
!"=!"!!"""!!"#"!$
$%&
'!
!"$"!,…,"$(!)
Training:
⬣We can train it similarly to language models:
Teacher/student forcing
⬣Maximum likelihood approach
Downsides:
⬣Slow sequential generation process
⬣Only considers few context pixels
Oord et al., Pixel Recurrent Neural Networks
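The "slow sequential generation" downside above is easy to see in code. A minimal sketch (assumptions: 8-bit pixel intensities and a hypothetical model that maps the possibly empty prefix of already-generated pixel values to logits over the 256 values of the next pixel):

```python
import torch

@torch.no_grad()
def sample_image(model, height=28, width=28):
    """Sequential (slow) generation: one forward pass per pixel.

    Assumes a hypothetical `model` that parameterizes p(x_i | x_1, ..., x_{i-1})
    and returns logits of shape (batch, 256) for the next pixel value.
    """
    pixels = torch.zeros(1, height * width, dtype=torch.long)
    for i in range(height * width):
        logits = model(pixels[:, :i])                       # condition on x_1..x_{i-1}
        probs = torch.softmax(logits, dim=-1)
        pixels[:, i] = torch.multinomial(probs, num_samples=1).squeeze(-1)
    return pixels.view(height, width)
```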

PixelCNN
Oord et al., Conditional Image Generation with PixelCNN Decoders
⬣Idea: Represent conditional
distribution as a convolution
layer!
⬣Considers larger context
(receptive field)
⬣Practically can be implemented
by applying a mask, zeroing
out “future” pixels
⬣Faster training but still slow
generation
⬣Limited to smaller images
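A minimal sketch of the masking idea described above (my own illustration, not the full PixelCNN architecture): a convolution whose kernel is zeroed over "future" pixels in raster order, so each output depends only on pixels above and to the left (plus, optionally, the current pixel in later layers).

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv layer with 'future' weights zeroed out (raster-scan ordering).

    mask_type 'A' also hides the center pixel (typically the first layer),
    'B' keeps it (typically subsequent layers).
    """
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kh, kw = self.weight.shape
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2 + (mask_type == 'B'):] = 0   # zero center row, right of center
        mask[kh // 2 + 1:, :] = 0                          # zero all rows below center
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask        # zero out connections to "future" pixels
        return super().forward(x)

# Usage: first layer uses mask 'A', later layers use mask 'B'.
layer = MaskedConv2d('A', in_channels=1, out_channels=64, kernel_size=7, padding=3)
out = layer(torch.randn(8, 1, 28, 28))       # (8, 64, 28, 28)
```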

Example Results: Image Completion (PixelRNN)
Oord et al., Conditional Image Generation with PixelCNN Decoders

Example Images (PixelCNN)
Oord et al., Conditional Image Generation with PixelCNN Decoders

Generative Adversarial Networks (GANs)

Generative Models
Goodfellow, NeurIPS 2016 Tutorial: Generative Adversarial Networks

Implicit Models
Implicit generative models do not actually learn an explicit model for P(x)
Instead, they learn to generate samples from P(x)
⬣Learn good feature representations
⬣Perform data augmentation
⬣Learn world models (a simulator!) for reinforcement learning
How?
⬣Learn to sample from a neural network output
⬣Adversarial training that uses one network's predictions to train
the other (dynamic loss function!)
⬣Lots of tricks to make the optimization more stable

Learning to Sample
We would like to sample from P(x) using a neural network
Idea:
⬣Sample from a simple distribution (Gaussian)
⬣Transform the sample into a sample from P(x)
Gaussian samples → Neural Network → samples from P(x)

Generating Images
⬣Input can be a vector with (independent) Gaussian random numbers
⬣We can use a CNN to generate images!
Vector of random numbers (Gaussian) → Generator (neural network) → samples from P(x)
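A minimal sketch of the idea on this slide (hypothetical layer sizes, my own illustration): a CNN generator that maps a vector of Gaussian random numbers to an image.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a latent noise vector z ~ N(0, I) to a 28x28 image."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 128 * 7 * 7), nn.ReLU(),
            nn.Unflatten(1, (128, 7, 7)),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)            # (batch, 1, 28, 28), values in [-1, 1]

z = torch.randn(16, 100)              # vector of Gaussian random numbers
fake_images = Generator()(z)          # samples intended to look like draws from p(x)
```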

Adversarial Networks
⬣Goal: We would like to generate realisticimages. How can we drive the
network to learn how to do this?
⬣Idea: Have another network try to distinguish a real image from a generated
(fake) image
⬣Why? Signal can be used to determine how well it’s doing at generation
Vector of random numbers → Generator (neural network) → fake image; real or fake image → Discriminator → Real or Fake?

Generative Adversarial Networks (GANs)
Vector of random numbers → Generator; mini-batch of real & fake data → Discriminator → cross-entropy (Real or Fake?); we know the answer (self-supervised)
Question: What loss functions can we use (for each network)?
⬣Generator: Update weights to improve
realism of generated images
⬣Discriminator: Update weights to better
discriminate

⬣Since we have two networks competing, this is a mini-max two player game
⬣Ties to game theory
⬣Not clear what (even local) Nash equilibria are for this game
Mini-max Two Player Game
Goodfellow, NeurIPS 2016 Tutorial: Generative Adversarial Networks

⬣Since we have two networks competing, this is a mini-max two player game
⬣Ties to game theory
⬣Not clear what (even local) Nash equilibria are for this game
⬣The full mini-max objective is:
⬣where D(x) is the discriminator's output probability ([0,1]) that the image is real
⬣x is a real image and G(z) is a generated image
Mini-max Two Player Game
Generator minimizes the term measuring how well the discriminator does on fakes (D outputs 0 for fake); z is sampled from the fake (generated) distribution
Goodfellow, NeurIPS 2016 Tutorial: Generative Adversarial Networks
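For reference, the objective referred to above (the standard GAN formulation from Goodfellow et al., 2014, shown as an image on the slide) can be written as:

\min_G \max_D \; E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))]

where the discriminator D maximizes the value and the generator G minimizes it.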

⬣Since we have two networks competing, this is a mini-max two player game
⬣Ties to game theory
⬣Not clear what (even local) Nash equilibria are for this game
⬣The full mini-max objective is:
⬣where D(x) is the discriminator's output probability ([0,1]) that the image is real
⬣x is a real image and G(z) is a generated image
Mini-max Two Player Game
Goodfellow, NeurIPS 2016 Tutorial: Generative Adversarial Networks
Discriminator maximizes both terms: how well it does on real images (outputs 1 for real, x sampled from the real data) and how well it does on fakes (outputs 0 for fake, z sampled from the generator)

Discriminator Perspective
⬣where D(x) is the discriminator's output probability ([0,1]) that the image is real
⬣x is a real image and G(z) is a generated image
⬣The discriminator wants to maximize this:
⬣D(x) is pushed up (to 1) because x is a real image
⬣1 − D(G(z)) is also pushed up to 1 (so that D(G(z)) is pushed down to 0)
⬣In other words, the discriminator wants to classify real images as real (1)
and fake images as fake (0)

Generator Perspective
⬣where D(x) is the discriminator's output probability ([0,1]) that the image is real
⬣x is a real image and G(z) is a generated image
⬣The generator wants to minimize this:
⬣1 − D(G(z)) is pushed down to 0 (so that D(G(z)) is pushed up to 1)
⬣This means that the generator is fooling the discriminator, i.e.
succeeding at generating images that the discriminator can't
discriminate from real

Generative Adversarial Networks (GANs)
Generator Loss / Discriminator Loss
Vector of random numbers → Generator; mini-batch of real & fake data → Discriminator → cross-entropy (Real or Fake?); we know the answer (self-supervised)

Converting to Max-Max Game
The generator part of the objective does not have good gradient properties
⬣High gradient when D(G(z)) is high (that is, the discriminator is wrong)
⬣We want it to improve when samples are bad (discriminator is right)
Alternative objective: instead of minimizing log(1 − D(G(z))), maximize log D(G(z))
Plot from CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung
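A minimal sketch (my own illustration, using binary cross-entropy and the non-saturating generator objective described above) of one GAN training step; the generator, discriminator, and optimizers are assumed to be defined elsewhere, and the discriminator is assumed to output a single logit per image.

```python
import torch
import torch.nn.functional as F

def gan_training_step(generator, discriminator, opt_g, opt_d, real_images, z_dim=100):
    batch = real_images.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z))).
    z = torch.randn(batch, z_dim)
    fake_images = generator(z).detach()            # do not backprop into G here
    d_loss = (F.binary_cross_entropy_with_logits(discriminator(real_images), ones) +
              F.binary_cross_entropy_with_logits(discriminator(fake_images), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step (non-saturating objective): maximize log D(G(z)).
    z = torch.randn(batch, z_dim)
    g_loss = F.binary_cross_entropy_with_logits(discriminator(generator(z)), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```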

Final Algorithm
Goodfellow, NeurIPS 2016, Generative Adversarial Nets

Generative Adversarial Networks (GANs)
At the end, we have:
⬣An implicit generative model!
⬣Features from discriminator
Vector of random numbers → Generator; mini-batch of real & fake data → Discriminator → cross-entropy (Real or Fake?); we know the answer (self-supervised)

Early Results
⬣Low-resolution images, but they look decent!
⬣The last column shows nearest-neighbor matches in the dataset
Goodfellow, NeurIPS 2016, Generative Adversarial Nets

Difficulty in Training
Goodfellow, NeurIPS 2016, Generative Adversarial Nets
GANs are very difficult to train
due to the mini-max objective
Advancements include:
⬣More stable architectures
⬣Regularization methods to
improve optimization
⬣Progressive growing/training
and scaling

DCGAN
Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Regularization
Kodali et al., On Convergence and Stability of GANs (also known as How to Train your DRAGAN)
Training GANs is difficult due to:
⬣Minimax objective – For example, what if the generator learns to
memorize training data (no variety) or only generates part of the
distribution?
⬣Mode collapse – Capturing only some modes of the distribution
Several theoretically-motivated regularization methods
⬣Simple example: Add noise to real samples!

Example Generated Images – BigGAN
Brock et al., Large Scale GAN Training for High Fidelity Natural Image Synthesis

Failure Examples – BigGAN
Brock et al., Large Scale GAN Training for High Fidelity Natural Image Synthesis

Video Generation
https://www.youtube.com/watch?v=PCBTZh41Ris

Summary
Generative Adversarial Networks (GANs)
can produce amazing images!
Several drawbacks
⬣High-fidelity generation is heavy to train
⬣Training can be unstable
⬣No explicit model for the distribution
Large number of extensions:
⬣GANs conditioned on labels or other
information
⬣Adversarial losses for other
applications

Variational Autoencoders (VAEs)

Generative Models
Goodfellow, NeurIPS 2016 Tutorial: Generative Adversarial Networks

Reminder: Autoencoders
Encoder → Decoder
Low dimensional embedding
Minimize the difference (with MSE)
Linear layers with reduced
dimension or Conv-2d
layers with stride
Linear layers with increasing
dimension or Conv-2d layers
with bilinear upsampling
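A minimal sketch of the autoencoder described above (hypothetical layer sizes, my own illustration), trained to minimize the MSE reconstruction error.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: linear layers with reduced dimension -> low-dimensional embedding.
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        # Decoder: linear layers with increasing dimension back to the input size.
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.randn(64, 784)                      # placeholder batch of flattened images
model = Autoencoder()
loss = nn.functional.mse_loss(model(x), x)    # minimize the difference (MSE)
loss.backward()
```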

Formalizing the Generative Model
What is this?
Hidden/latent variables z:
factors of variation that
produce an image
(digit, orientation, scale, etc.)
p(x) = \int p(x | z; \theta) \, p(z) \, dz
⬣We cannot maximize this likelihood directly due to the integral
⬣Instead we maximize a variational lower bound (VLB) that we can
compute
Kingma & Welling, Auto-Encoding Variational Bayes

Variational Autoencoder: Decoder
⬣We can combine the probabilistic view, sampling, autoencoders, and approximate
optimization
⬣Just as before, sample z from a simpler distribution
⬣We can also output the parameters of a probability
distribution!
⬣Example: μ, σ of a Gaussian distribution
⬣For the multi-dimensional version, output a diagonal
covariance
⬣How can we maximize
p(x) = \int p(x | z; \theta) \, p(z) \, dz ?
z → Decoder p(x | z; θ) → μ_x, σ_x

Variational Autoencoder: Encoder
⬣We can combine the probabilistic view, sampling, autoencoders, and approximate
optimization
⬣Given an image, estimate z
⬣Again, output the parameters of a
distribution
X → Encoder q(z | x; φ) → μ_z, σ_z

Putting Them Together
We can tie the encoder and decoder together into a probabilistic autoencoder
⬣Given data x, estimate μ_z, σ_z and sample z from N(μ_z, σ_z)
⬣Given z, estimate μ_x, σ_x and sample a reconstruction from N(μ_x, σ_x)
X → Encoder q(z | x; φ) → μ_z, σ_z → z → Decoder p(x | z; θ) → μ_x, σ_x
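A minimal sketch of the probabilistic autoencoder described above (hypothetical dimensions, my own illustration): the encoder outputs (μ_z, log σ_z²), a latent z is sampled via the reparameterization trick covered later, and the decoder maps z back to a reconstruction.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.enc_mu = nn.Linear(256, latent_dim)        # mu_z
        self.enc_logvar = nn.Linear(256, latent_dim)    # log sigma_z^2
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, input_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        eps = torch.randn_like(mu)                      # reparameterization trick
        z = mu + eps * torch.exp(0.5 * logvar)          # sample z ~ N(mu, sigma^2)
        return self.dec(z), mu, logvar                  # reconstruction + q(z|x) params

x_hat, mu, logvar = VAE()(torch.randn(16, 784))
```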

Maximizing Likelihood
⬣How can we optimize the parameters of the two networks?
Now equipped with our encoder and decoder networks, let’s work
out the (log) data likelihood:
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung

Maximizing Likelihood
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung

KL-Divergence
Aside: KL Divergence (a distance-like measure between distributions), always ≥ 0
D_{KL}(P \| Q) = E_{x \sim P}[\log P(x) - \log Q(x)] = \sum_x P(x) \log P(x) - \sum_x P(x) \log Q(x)
(by the definition of expectation)

Maximizing Likelihood
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung
The expectation w.r.t. z (using the encoder
network) lets us write nice KL terms

Maximizing Likelihood
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung
The decoder network gives p(x|z); we
can compute an estimate of this term
through sampling. (Sampling is made
differentiable through the reparameterization
trick; see paper.)
This KL term (between the
Gaussian for the encoder
and the z prior) has a nice
closed-form solution!
p(z|x) is intractable (saw
earlier), so we can't compute this
KL term. But we know the KL
divergence is always ≥ 0.
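For reference, the closed-form KL term mentioned above, between the diagonal-Gaussian encoder q(z|x) = N(μ, σ²) and the standard-normal prior p(z) = N(0, I), is (summed over latent dimensions j):

D_{KL}( N(\mu, \sigma^2) \| N(0, 1) ) = \tfrac{1}{2} \sum_j ( \mu_j^2 + \sigma_j^2 - 1 - \log \sigma_j^2 )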

Maximizing Likelihood
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung

Forward and Backward Passes
X → Encoder q(z | x; φ) → μ_z, σ_z
Putting it all together: maximizing the likelihood lower bound
Make approximate
posterior distribution
close to prior
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung

Forward and Backward Passes
X → Encoder q(z | x; φ) → μ_z, σ_z
z → Decoder p(x | z; θ) → μ_x, σ_x
Putting it all together: maximizing the likelihood lower bound
Sample z from q(z | x) = N(μ_z, σ_z)
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung

Forward and Backward Passes
X → Encoder q(z | x; φ) → μ_z, σ_z
z → Decoder p(x | z; θ) → μ_x, σ_x → x̂
Putting it all together: maximizing the likelihood lower bound
Sample the reconstruction x̂ from p(x | z; θ) = N(μ_x, σ_x)
Maximize the likelihood of the
original input being
reconstructed
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung
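Putting the two terms of the lower bound together in code, a minimal sketch (assuming the hypothetical VAE from the earlier sketch, an MSE/Gaussian reconstruction term, and the closed-form KL given above):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term: maximize the likelihood of the original input being
    # reconstructed (MSE corresponds to a Gaussian likelihood up to constants).
    recon = F.mse_loss(x_hat, x, reduction='sum')
    # KL term: make the approximate posterior q(z|x) close to the prior N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl        # negative of the variational lower bound
```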

Reparameterization Trick: Problem
Tutorial on Variational Autoencoders
https://arxiv.org/abs/1606.05908 http://gokererdogan.github.io/2016/07/01/reparameterization-trick/
⬣Problem with respect to the VLB:
updating φ
⬣z ~ q(z | x; φ): need to differentiate
through the sampling process w.r.t. φ
(the encoder is probabilistic)

Reparameterization Trick: Solution
⬣Solution: make the randomness
independent of the encoder output, making
the encoder deterministic
⬣Gaussian distribution example:
⬣Previously: encoder output =
random variable z ~ N(μ, σ)
⬣Now: encoder output = distribution
parameters [μ, σ]
⬣z = μ + ε · σ,  ε ~ N(0, 1)
Tutorial on Variational Autoencoders
https://arxiv.org/abs/1606.05908 http://gokererdogan.github.io/2016/07/01/reparameterization-trick/
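The transformation above in isolation, as a tiny sketch (my own illustration): the randomness comes only from ε, so gradients flow to μ and σ through a deterministic function.

```python
import torch

mu = torch.tensor([0.5, -1.0], requires_grad=True)
sigma = torch.tensor([1.2, 0.3], requires_grad=True)

eps = torch.randn(2)            # randomness, independent of the encoder output
z = mu + eps * sigma            # deterministic in (mu, sigma) given eps
z.sum().backward()              # gradients reach mu and sigma
print(mu.grad, sigma.grad)      # mu.grad = [1, 1], sigma.grad = eps
```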

Interpretability of Latent Vector
Kingma & Welling, Auto-Encoding Variational Bayes
(Figure: varying two latent dimensions z_1, z_2)

Summary
Variational Autoencoders (VAEs) provide a
principled way to perform approximate
maximum likelihood optimization
⬣Requires some assumptions (e.g.
Gaussian distributions)
Samples are often not as competitive as
GANs
Latent features (learned in an unsupervised
way!) often good for downstream tasks:
⬣Example: World models for reinforcement
learning (Ha et al., 2018)
Ha & Schmidhuber, World Models, 2018