Spectrum of Low-Labeled Learning
Supervised Learning
⬣ Train input: (x, y)
⬣ Learning output: f : x → y, e.g. P(y|x)
⬣ e.g. classification
[Figure: example images labeled Sheep, Dog, Cat, Lion, Giraffe]
Unsupervised Learning
⬣ Input: x
⬣ Learning output: P(x)
⬣ Example: clustering, density estimation, etc.
[Spectrum: from supervised to unsupervised learning, with fewer labels toward the right]
Unsupervised Learning
Supervised Learning:
⬣ Classification: x → y (y discrete)
⬣ Regression: x → y (y continuous)
Unsupervised Learning:
⬣ Clustering: x → c (c discrete)
⬣ Dimensionality Reduction: x → z (z continuous)
⬣ Density Estimation: x → p(x) (p(x) on simplex)
What to Learn?
Traditional unsupervised learning methods:
⬣ Modeling P(x): density estimation
⬣ Comparing/Grouping: clustering
⬣ Representation Learning: Principal Component Analysis
Similar in deep learning, but from a neural network/learning perspective:
⬣ Modeling P(x): deep generative models
⬣ Comparing/Grouping: metric learning & clustering
⬣ Representation Learning: almost all deep learning!
Discriminative vs. Generative Models
⬣ Discriminative models model P(y|x)
⬣ Example: Model this via neural network, SVM, etc.
⬣ Generative models model P(x)
⬣ We can parameterize our model as P(x; θ) and use maximum likelihood to optimize the parameters given an unlabeled dataset
⬣ They are called generative because they can often generate samples
⬣ Example: Multivariate Gaussian with estimated parameters (μ, Σ) (see the sketch below)
Goodfellow, NeurIPS 2016 Tutorial: Generative Adversarial Networks
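To make the Gaussian example concrete, here is a minimal sketch (not from the slides) of maximum likelihood estimation for a multivariate Gaussian; the dataset is a toy placeholder.

```python
import numpy as np

# Toy unlabeled dataset: N points in d dimensions (placeholder data).
rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.5, size=(1000, 3))

# Maximum likelihood estimates for a multivariate Gaussian are the
# sample mean and the (1/N) sample covariance.
mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / X.shape[0]

# The fitted model is generative: we can draw new samples from it.
new_samples = rng.multivariate_normal(mu, Sigma, size=5)
print(new_samples)
```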
Factorizing P(x)
We can use the chain rule to decompose the joint distribution
⬣ Factorizes the joint distribution into a product of conditional distributions
⬣ Similar to a Bayesian Network (factorizing a joint distribution)
⬣ Similar to language models!
⬣ Requires some ordering of variables (edges in a probabilistic graphical model)
⬣ We can estimate this conditional distribution as a neural network
$$p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})$$
Oord et al., Pixel Recurrent Neural Networks
Modeling Language as a Sequence
$$p(s) = p(w_1, w_2, \ldots, w_T) = p(w_1)\, p(w_2 \mid w_1)\, p(w_3 \mid w_1, w_2) \cdots p(w_T \mid w_{T-1}, \ldots, w_1)$$
$$= \prod_t p(\underbrace{w_t}_{\text{next word}} \mid \underbrace{w_{t-1}, \ldots, w_1}_{\text{history}})$$
Language Models as an RNN
[Figure: an RNN unrolled over time; at each step the hidden state summarizes the history and produces a distribution over the next word]
⬣Language modeling involves estimating a probability distribution over
sequences of words.
$$p(s) = p(w_1, w_2, \ldots, w_T) = \prod_t p(\underbrace{w_t}_{\text{next word}} \mid \underbrace{w_{t-1}, \ldots, w_1}_{\text{history}})$$
⬣RNNs are a family of neural architectures for modeling sequences.
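As a concrete sketch of this idea (assuming PyTorch; the vocabulary size and dimensions are arbitrary placeholders), a minimal recurrent language model trained with teacher forcing might look like:

```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    """Models p(w_t | w_{t-1}, ..., w_1) with a recurrent network."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)  # logits over next word

    def forward(self, tokens):            # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)               # (batch, seq_len, vocab_size)

# Teacher forcing: predict token t+1 from the ground-truth tokens 1..t.
model = RNNLanguageModel()
tokens = torch.randint(0, 1000, (8, 20))  # toy batch of token ids
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
loss.backward()
```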
Factorized Models for Images
!"=!"*$
+,-
.!
!"+"*,…,"+/*)
Oord et al., Pixel Recurrent Neural Networks
!"=$
+,*
.!
!"+"*,…,"+/*)
Factorized Models for Images
!"=!"!!"""!!"#"!$
$%&
'!
!"$"!,…,"$(!)
Training:
⬣ We can train similarly to language models: teacher forcing
⬣ Maximum likelihood approach
Downsides:
⬣ Slow sequential generation process
⬣ Only considers a few context pixels
Oord et al., Pixel Recurrent Neural Networks
PixelCNN
Oord et al., Conditional Image Generation with PixelCNN Decoders
⬣ Idea: Represent the conditional distribution as a convolution layer!
⬣ Considers a larger context (receptive field)
⬣ Practically can be implemented by applying a mask, zeroing out "future" pixels (see the sketch below)
⬣ Faster training but still slow generation
⬣ Limited to smaller images
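A minimal sketch of the masking idea in PyTorch (not the authors' implementation; sizes are placeholders):

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv layer whose kernel is masked so each output pixel only sees
    input pixels above it and to its left (raster-scan ordering)."""
    def __init__(self, *args, mask_type="A", **kwargs):
        super().__init__(*args, **kwargs)
        kh, kw = self.kernel_size
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2 + (mask_type == "B"):] = 0  # center row: "future" cols
        mask[kh // 2 + 1:, :] = 0                         # all rows below center
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Zero out "future" pixels in the kernel before convolving.
        self.weight.data *= self.mask
        return super().forward(x)

# The first layer uses mask "A" (center pixel excluded); later layers use "B".
layer = MaskedConv2d(1, 16, kernel_size=5, padding=2, mask_type="A")
out = layer(torch.randn(1, 1, 28, 28))   # -> (1, 16, 28, 28)
```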
Example Results: Image Completion (PixelRNN)
Oord et al., Conditional Image Generation with PixelCNN Decoders
Example Images (PixelCNN)
Oord et al., Conditional Image Generation with PixelCNN Decoders
Implicit Models
Implicit generative models do not actually learn an explicit model for p(x)
Instead, they learn to generate samples from p(x)
⬣ Learn good feature representations
⬣ Perform data augmentation
⬣ Learn world models (a simulator!) for reinforcement learning
How?
⬣ Learn to sample from a neural network output
⬣ Adversarial training that uses one network's predictions to train the other (a dynamic loss function!)
⬣ Lots of tricks to make the optimization more stable
Learning to Sample
We would like to sample from p(x) using a neural network
Idea:
⬣ Sample from a simple distribution (Gaussian)
⬣ Transform the sample into a sample from p(x)
[Diagram: samples from N(0, 1) → Neural Network → samples from p(x)]
Generating Images
⬣Input can be a vector with (independent) Gaussian random numbers
⬣We can use a CNN to generate images!
[Diagram: vector of random numbers z ~ N(0, 1) → Generator (neural network) → image samples from p(x)]
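A minimal sketch of such a generator in PyTorch (a DCGAN-style stack of transposed convolutions; all sizes are arbitrary placeholders):

```python
import torch
import torch.nn as nn

# Map a noise vector to an image with transposed convolutions.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 128, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 4x4 -> 8x8
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),     # 8x8 -> 16x16
    nn.Tanh(),  # pixel values in [-1, 1]
)

z = torch.randn(8, 100, 1, 1)   # batch of Gaussian noise vectors
fake_images = generator(z)      # -> (8, 1, 16, 16)
```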
Adversarial Networks
⬣ Goal: We would like to generate realistic images. How can we drive the network to learn how to do this?
⬣ Idea: Have another network try to distinguish a real image from a generated (fake) image
⬣ Why? This signal can be used to determine how well the generator is doing at generation
[Diagram: vector of random numbers → Generator (neural network) → fake image → Discriminator → "Real or Fake?"]
Generative Adversarial Networks (GANs)
[Diagram: vector of random numbers → Generator → fake data; a mini-batch of real & fake data → Discriminator → cross-entropy (real or fake?); we know the answer (self-supervised)]
Question: What loss functions can we use (for each network)?
⬣Generator: Update weights to improve
realism of generated images
⬣Discriminator: Update weights to better
discriminate
Mini-max Two Player Game
⬣ Since we have two networks competing, this is a mini-max two player game
⬣ Ties to game theory
⬣ Not clear what (even local) Nash equilibria are for this game
⬣ The full mini-max objective is:
$$\min_G \max_D \;\; \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$$
⬣ where D(x) is the discriminator's output: the probability ([0,1]) that the image is real
⬣ x is a real image and G(z) is a generated image
⬣ The first term (sampled from real data) measures how well the discriminator does on real images (it wants 1 for real); the second term (sampled from fake data) measures how well it does on generated images (it wants 0 for fake). The discriminator maximizes the objective; the generator minimizes the second term.
Goodfellow, NeurIPS 2016 Tutorial: Generative Adversarial Networks
Discriminator Perspective
⬣ The discriminator wants to maximize this:
$$\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$$
⬣ where D(x) is the discriminator's output probability ([0,1]) that the image is real
⬣ x is a real image and G(z) is a generated image
⬣ D(x) is pushed up (to 1) because x is a real image
⬣ 1 − D(G(z)) is also pushed up to 1 (so that D(G(z)) is pushed down to 0)
⬣ In other words, the discriminator wants to classify real images as real (1) and fake images as fake (0)
Generator Perspective
⬣ The generator wants to minimize this:
$$\mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$$
⬣ where D(x) is the discriminator's output probability ([0,1]) that the image is real
⬣ x is a real image and G(z) is a generated image
⬣ 1 − D(G(z)) is pushed down to 0 (so that D(G(z)) is pushed up to 1)
⬣ This means that the generator is fooling the discriminator, i.e. succeeding at generating images that the discriminator can't discriminate from real ones
Generative Adversarial Networks (GANs)
[Diagram: vector of random numbers → Generator (generator loss) → fake data; a mini-batch of real & fake data → Discriminator (discriminator loss) → cross-entropy (real or fake?); we know the answer (self-supervised)]
Converting to Max-Max Game
The generator part of the objective does not have good gradient properties
⬣ High gradient when D(G(z)) is high (that is, when the discriminator is wrong)
⬣ We want it to improve when samples are bad (when the discriminator is right)
Alternative objective for the generator, maximize:
$$\max_G \; \mathbb{E}_{z \sim p(z)}[\log D(G(z))]$$
Plot from CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung
Final Algorithm
Goodfellow, NeurIPS 2016 Generative Adversarial Nets
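A minimal, self-contained sketch of the alternating updates in that algorithm, using the non-saturating ("max-max") generator loss; the networks and data here are toy placeholders, not the paper's setup:

```python
import torch
import torch.nn as nn

# Toy MLP generator and discriminator on flattened 28x28 "images".
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

bce = nn.BCEWithLogitsLoss()
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

for step in range(100):
    real = torch.rand(32, 784) * 2 - 1        # stand-in for a real mini-batch
    fake = G(torch.randn(32, 64))
    ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: maximize log D(G(z)), i.e. get fakes classified as real.
    g_loss = bce(D(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```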
Generative Adversarial Networks (GANs)
At the end, we have:
⬣ An implicit generative model!
⬣Features from discriminator
[Diagram: vector of random numbers → Generator → fake data; a mini-batch of real & fake data → Discriminator → cross-entropy (real or fake?); we know the answer (self-supervised)]
Early Results
⬣ Low-resolution images, but they look decent!
⬣ The last column shows nearest-neighbor matches in the dataset
Goodfellow, NeurIPS 2016 Generative Adversarial Nets
Difficulty in Training
Goodfellow, NeurIPS 2016 Generative Adversarial Nets
GANs are very difficult to train due to the mini-max objective
Advancements include:
⬣ More stable architectures
⬣ Regularization methods to improve optimization
⬣ Progressive growing/training and scaling
DCGAN
Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Regularization
Kodali et al., On Convergence and Stability of GANs (also known as How to Train your DRAGAN)
Training GANs is difficult due to:
⬣ The minimax objective – for example, what if the generator learns to memorize the training data (no variety) or only generates part of the distribution?
⬣ Mode collapse – capturing only some modes of the distribution
There are several theoretically-motivated regularization methods
⬣ Simple example: Add noise to the real samples! (see the sketch below)
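A minimal sketch of that last idea, sometimes called instance noise (assuming PyTorch; the noise scale is an arbitrary placeholder):

```python
import torch

def add_instance_noise(real_images, std=0.1):
    """Perturb real samples with Gaussian noise before they reach the
    discriminator; this smooths the data distribution and can stabilize
    GAN training."""
    return real_images + std * torch.randn_like(real_images)
```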
Example Generated Images - BigGAN
Brock et al., Large Scale GAN Training for High Fidelity Natural Image Synthesis
Failure Examples - BigGAN
Brock et al., Large Scale GAN Training for High Fidelity Natural Image Synthesis
Video Generation
https://www.youtube.com/watch?v=PCBTZh41Ris
Summary
Generative Adversarial Networks (GANs) can produce amazing images!
Several drawbacks:
⬣ High-fidelity generation is heavy to train
⬣ Training can be unstable
⬣ No explicit model for the distribution
A large number of extensions:
⬣ GANs conditioned on labels or other information
⬣ Adversarial losses for other applications
Reminder: Autoencoders
[Diagram: input → Encoder → low-dimensional embedding → Decoder → reconstruction; minimize the difference (with MSE)]
⬣ Encoder: linear layers with reduced dimension, or Conv2d layers with stride
⬣ Decoder: linear layers with increasing dimension, or Conv2d layers with bilinear upsampling
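A minimal sketch of such an autoencoder in PyTorch (layer sizes and the flattened 28x28 input are arbitrary placeholders):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

x = torch.rand(16, 784)                    # toy batch
z = encoder(x)                             # low-dimensional embedding
recon = decoder(z)
loss = nn.functional.mse_loss(recon, x)    # minimize the difference (MSE)
loss.backward()
```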
Formalizing the Generative Model
What is this?
Hidden/Latent variables
Factorsof variation that
produce an image:
(digit, orientation, scale, etc.)
$$p(x) = \int p(x \mid z; \theta)\, p(z)\, dz$$
⬣ We cannot maximize this likelihood due to the integral
⬣ Instead we maximize a variational lower bound (VLB) that we can compute
Kingma & Welling, Auto-Encoding Variational Bayes
Variational Autoencoder: Decoder
⬣ We can combine the probabilistic view, sampling, autoencoders, and approximate optimization
⬣ Just as before, sample z from a simpler distribution
⬣ We can also output the parameters of a probability distribution!
⬣ Example: μ, σ of a Gaussian distribution
⬣ For the multi-dimensional version, output a diagonal covariance
⬣ How can we maximize $p(x) = \int p(x \mid z; \theta)\, p(z)\, dz$?
[Diagram: z → Decoder P(x|z; θ) → μ_x, σ_x]
Variational Autoencoder: Encoder
⬣ We can combine the probabilistic view, sampling, autoencoders, and approximate optimization
⬣ Given an image, estimate z
⬣ Again, output the parameters of a distribution
[Diagram: X → Encoder Q(z|x; φ) → μ_z, σ_z]
Putting Them Together
We can tie the encoder and decoder together into a probabilistic autoencoder
⬣ Given data X, estimate μ_z, σ_z and sample from N(μ_z, σ_z)
⬣ Given z, estimate μ_x, σ_x and sample from N(μ_x, σ_x)
[Diagram: X → Encoder Q(z|x; φ) → μ_z, σ_z → z → Decoder P(x|z; θ) → μ_x, σ_x]
Maximizing Likelihood
⬣How can we optimize the parameters of the two networks?
Now equipped with our encoder and decoder networks, let’s work
out the (log) data likelihood:
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung
KL-Divergence
Aside: KL divergence (a distance measure for distributions) is always ≥ 0:
$$KL(p \,\|\, q) = \mathbb{E}_{x \sim p}[\log p(x) - \log q(x)] = \sum_x p(x) \log p(x) - \sum_x p(x) \log q(x)$$
(using the definition of expectation)
Maximizing Likelihood
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung
⬣ Taking the expectation with respect to z (using the encoder network) lets us write nice KL terms
Maximizing Likelihood
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung
⬣ The decoder network gives p(x|z); we can compute an estimate of this term through sampling. (Sampling is differentiable through the reparameterization trick; see the paper.)
⬣ This KL term (between the Gaussians for the encoder and the z prior) has a nice closed-form solution!
⬣ p(z|x) is intractable (as we saw earlier), so we can't compute this KL term. But we know the KL divergence is always ≥ 0.
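Written out, these three terms give the standard decomposition; the first two form the tractable lower bound that we maximize:

$$\log p(x) = \underbrace{\mathbb{E}_{z \sim q(z|x)}\big[\log p(x \mid z)\big] - KL\big(q(z \mid x) \,\|\, p(z)\big)}_{\text{tractable lower bound (ELBO)}} + \underbrace{KL\big(q(z \mid x) \,\|\, p(z \mid x)\big)}_{\ge 0,\ \text{intractable}}$$
$$\Rightarrow\quad \log p(x) \;\ge\; \mathbb{E}_{z \sim q(z|x)}[\log p(x \mid z)] - KL\big(q(z \mid x) \,\|\, p(z)\big)$$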
Forward and Backward Passes
[Diagram: X → Encoder Q(z|x; φ) → μ_z, σ_z]
Putting it all together: maximizing the likelihood lower bound
⬣ Make the approximate posterior distribution close to the prior
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung
Forward and Backward Passes
[Diagram: X → Encoder Q(z|x; φ) → μ_z, σ_z → z → Decoder P(x|z; θ)]
Putting it all together: maximizing the likelihood lower bound
⬣ Sample z from q(z|x) ~ N(μ_z, σ_z)
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung
Forward and Backward Passes
[Diagram: X → Encoder Q(z|x; φ) → μ_z, σ_z → z → Decoder P(x|z; θ) → μ_x, σ_x → x̂]
Putting it all together: maximizing the likelihood lower bound
⬣ Sample x̂ from p(x|z; θ) ~ N(μ_x, σ_x)
⬣ Maximize the likelihood of the original input being reconstructed
From CS231n, Fei-Fei Li, Justin Johnson, Serena Yeung
Reparameterization Trick: Problem
Tutorial on Variational Autoencoders
https://arxiv.org/abs/1606.05908 http://gokererdogan.github.io/2016/07/01/reparameterization-trick/
⬣ Problem with respect to the VLB: updating φ
⬣ z ~ Q(z|x; φ): we need to differentiate through the sampling process with respect to φ (the encoder is probabilistic)
Reparameterization Trick: Solution
⬣ Solution: make the randomness independent of the encoder output, making the encoder deterministic
⬣ Gaussian distribution example:
⬣ Previously: encoder output = random variable z ~ N(μ, σ)
⬣ Now: encoder output = distribution parameters [μ, σ]
⬣ z = μ + ε · σ, with ε ~ N(0, 1)
Tutorial on Variational Autoencoders
https://arxiv.org/abs/1606.05908 http://gokererdogan.github.io/2016/07/01/reparameterization-trick/
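Putting the trick into a minimal VAE sketch (assuming PyTorch; the layer sizes and the Bernoulli-style reconstruction loss are placeholder choices, not the paper's exact setup):

```python
import torch
import torch.nn as nn

enc = nn.Linear(784, 64)                   # outputs [mu, log_var] jointly
dec = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

x = torch.rand(16, 784)                    # toy batch in [0, 1]
mu, log_var = enc(x).chunk(2, dim=-1)      # encoder outputs distribution params

# Reparameterization trick: z = mu + eps * sigma with eps ~ N(0, 1),
# so the randomness is independent of the encoder parameters.
eps = torch.randn_like(mu)
z = mu + eps * torch.exp(0.5 * log_var)

recon = dec(z)
# Maximize the ELBO = reconstruction term - KL(q(z|x) || p(z)),
# i.e. minimize its negative. The KL below is the Gaussian closed form.
recon_loss = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
loss = recon_loss + kl
loss.backward()
```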
Interpretability of the Latent Vector
Kingma & Welling, Auto-Encoding Variational Bayes
[Figure: images generated while varying two latent dimensions, z1 and z2]
Summary
Variational Autoencoders (VAEs) provide a principled way to perform approximate maximum likelihood optimization
⬣ Requires some assumptions (e.g. Gaussian distributions)
Samples are often not as competitive as GANs'
Latent features (learned in an unsupervised way!) are often good for downstream tasks:
⬣ Example: World models for reinforcement learning (Ha et al., 2018)
Ha & Schmidhuber, World Models, 2018