TTIC 31230, Fundamentals of Deep Learning
David McAllester, April 2017
Convolutional Neural Networks (CNNs)
ImageNet Classification

1000 kinds of objects.
Speech Recognition
Current state-of-the-art speech recognition systems use CNNs (as well as RNNs).
Computer transcription of conversational speech now matches the error rate of professional human transcribers.
(It should be noted that RNNs are also being used in vision.)
Protein Folding
Very recent work of Jinbo Xu (here at TTIC) has used CNNs to fold proteins.
Construct an image I_{x,y} where x and y are positions in the sequence of a protein P and I_{x,y} is the mutual information between variants at position x and variants at position y in the different versions of P in different species.
For proteins with known structure, construct a target image D_{x,y} where D_{x,y} is the physical distance between the residue at position x and the residue at position y.
Train a CNN (a ResNet) to produce output D_{x,y} from input image I_{x,y}.
The result is a revolution in predicting three-dimensional structure from a protein sequence.
Convolution
In deep learning we use the following definition of (1D) convolution.
(x * f)(t) = \sum_{j=0}^{|f|-1} x[t+j] f[j]
This version has the spatial dimensions of the filter reversed relative to the classical definition.
The classical definition yields f * g = g * f, which does not hold for the deep learning definition.
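As a concrete illustration, here is a minimal NumPy sketch of this definition (conv1d is our name, not a library function):

import numpy as np

def conv1d(x, f):
    # Deep learning convolution: slide f over x without flipping f.
    n = x.size - f.size + 1
    return np.array([np.dot(x[t:t+f.size], f) for t in range(n)])

# e.g. conv1d(np.array([1., 2., 3., 4.]), np.array([1., 1.])) -> [3., 5., 7.]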
2D Convnets
1D: Conv(x, f)[t] = \sum_j x[t+j] f[j]

2D: Conv(I, f)[x, y] = \sum_{j,k} I[x+j, y+k] f[j, k]
We will write equations for 1D convnets.
For 2D one just replaces the time dimension with two spatial dimensions in both the signal and the filter.
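Under the same assumptions as the conv1d sketch above, a sketch of the 2D case:

def conv2d(I, f):
    # 2D analogue of conv1d: two spatial indices instead of one time index.
    H = I.shape[0] - f.shape[0] + 1
    W = I.shape[1] - f.shape[1] + 1
    out = np.zeros((H, W))
    for x in range(H):
        for y in range(W):
            out[x, y] = np.sum(I[x:x+f.shape[0], y:y+f.shape[1]] * f)
    return out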
Padding
Conv(x, f)[t] = \sum_{j=0}^{|f|-1} x[t+j] f[j]

|Conv(x, f)| = |x| - |f| + 1
We typically want the filter to slide off the edge of the signal to some extent.

This is done by padding the signal with zeros.
Pad(x, 2) = (0, 0, x[0], x[1], \ldots, x[|x|-1], 0, 0)
Padding in Numpy
import numpy as np

class Pad:
    def __init__(self, x, p):
        # x is a computation-graph node with a .value array; p is the pad width.
        self.x = x
        self.p = p

    def forward(self):
        x = self.x
        p = self.p
        s = x.value.size
        self.value = np.zeros(s + 2*p)
        self.value[p:p+s] = x.value
Incorporating Padding into a Convolution Layer
Typically |f| is odd and we do

Conv(Pad(x, \lfloor |f|/2 \rfloor), f)
For convenience we could define a procedure

Convp(x, f) \equiv Conv(Pad(x, \lfloor |f|/2 \rfloor), f)
Or, for efficiency, we could implement Convp directly as a class.
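A minimal sketch of such a procedure, using np.pad (which pads with zeros by default) and the conv1d sketch above:

def convp(x, f):
    # Pad with floor(|f|/2) zeros on each side; for odd |f| the
    # output has the same length as x.
    return conv1d(np.pad(x, f.size // 2), f)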
Channels
In speech or audition one is typically given a channel vector, such as the Mel-cepstral coefficients, at each time position of the input.
The convolution operation also produces a channel vector at each time position.
In this case the filter has shape (T, C_2, C_1) and we have

Conv(x, f)[t, c] = \sum_{j, c'} x[t+j, c'] f[j, c', c]
Note that the time dimension is handled as before.
Padding can be generalized straightforwardly to handle channels.
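A sketch with channels, assuming the filter array is stored with axis order (T, C1, C2) so that f[j, c', c] matches the indexing above (the axis order is our choice, not fixed by the slides):

def conv1d_channels(x, f):
    # x: (T1, C1) signal; f: (T, C1, C2) filter.
    T1, C1 = x.shape
    T, _, C2 = f.shape
    n = T1 - T + 1
    y = np.zeros((n, C2))
    for t in range(n):
        for c in range(C2):
            # y[t, c] = sum over j and c' of x[t+j, c'] f[j, c', c]
            y[t, c] = np.sum(x[t:t+T, :] * f[:, :, c])
    return y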
Adding An Activation Function
Each convolution operation is typically followed by an activation function (a nonlinearity).
Relu(Conv(x, f))
Note that the activation function is scalar-to-scalar and is applied to each channel at each time (or image) position.
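With the conv1d_channels sketch above, for a multi-channel signal x and filter f, this is just an elementwise maximum:

h = np.maximum(conv1d_channels(x, f), 0.0)  # ReLU applied to every entry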
Max Pooling
Pooling merges a segment of length p into a single channel vector by selecting, for each channel, the maximum value of that channel over the segment.
There are two parameters.

p is the size of the region pooled.

s is the "stride", the amount by which the region shifts on each iteration.
MaxPool(x, p, s)[t, c] = \max_{j \in \{0, \ldots, p-1\}} x[st+j, c]

|MaxPool(x, p, s)| = \lfloor (|x| - p)/s \rfloor + 1
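A NumPy sketch under the same (T, C) layout as above:

def maxpool(x, p, s):
    # Per-channel max over windows of length p, advanced by stride s.
    n = (x.shape[0] - p) // s + 1
    return np.array([x[s*t : s*t + p].max(axis=0) for t in range(n)])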
Max Pooling Handles "Deformation"

The deformable part model (DPM):

In DPM the part filters are at a higher spatial resolution, applied in a region around their nominal position.
Average Pooling
Average pooling is the same as max pooling but takes an average rather than a max.
AvePool(x, p, s)[t, c] = \frac{1}{p} \sum_{j \in \{0, \ldots, p-1\}} x[st+j, c]

|AvePool(x, p, s)| = \lfloor (|x| - p)/s \rfloor + 1
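The corresponding change to the maxpool sketch above is one word:

def avepool(x, p, s):
    # Per-channel mean over windows of length p, advanced by stride s.
    n = (x.shape[0] - p) // s + 1
    return np.array([x[s*t : s*t + p].mean(axis=0) for t in range(n)])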
Example
[Figure: the Stanford CS231 network architecture.]
Convolution with Strides
Instead of advancing the filter one time value at each iteration, it is common to advance the filter by a stride s.

We can add a stride parameter to the convolution operation.
Conv(x, f, s)[t, c] = \sum_{j, c'} x[st+j, c'] f[j, c', c]

|Conv(x, f, s)| = \lfloor (|x| - |f|)/s \rfloor + 1
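A sketch of the strided, multi-channel case, with the same assumed filter layout as conv1d_channels:

def conv1d_strided(x, f, s):
    # x: (T1, C1); f: (T, C1, C2); advance the window by s per output position.
    T1, C1 = x.shape
    T, _, C2 = f.shape
    n = (T1 - T) // s + 1
    y = np.zeros((n, C2))
    for t in range(n):
        for c in range(C2):
            y[t, c] = np.sum(x[s*t : s*t + T, :] * f[:, :, c])
    return y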
Thinking About the Backward Method
Consider y = Conv(x, f, s).
y.value[t, c] += x.value[st+j, c'] * f.value[j, c', c]
Each increment can be backpropagated independently.
x.grad[st+j, c'] += y.grad[t, c] * f.value[j, c', c]

f.grad[j, c', c] += y.grad[t, c] * x.value[st+j, c']
Until someone writes an appropriate compiler, one must still hand-code the appropriate NumPy or CUDA vector operations.
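As a hand-coded example, a NumPy sketch of these two increments, vectorized over channels (shapes as in conv1d_strided above):

def conv1d_strided_backward(x, f, s, y_grad):
    # Accumulate x.grad and f.grad from y.grad, one output position at a time.
    T = f.shape[0]
    x_grad = np.zeros_like(x)
    f_grad = np.zeros_like(f)
    for t in range(y_grad.shape[0]):
        window = x[s*t : s*t + T, :]               # (T, C1)
        # x.grad[st+j, c'] += y.grad[t, c] f.value[j, c', c], summed over c
        x_grad[s*t : s*t + T, :] += f @ y_grad[t]  # (T, C1, C2) @ (C2,) -> (T, C1)
        # f.grad[j, c', c] += y.grad[t, c] x.value[st+j, c']
        f_grad += window[:, :, None] * y_grad[t]
    return x_grad, f_grad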
Image to Column (Im2C)
Matrix multiplication is a highly optimized operation. Using more space, convolution can be reduced to matrix multiplication.
Conv(x, f)[t, c] = \sum_{j, c'} x[t+j, c'] f[j, c', c]
              = \sum_{j, c'} X[t, j, c'] f[j, c', c]

X[t, j, c'] = x[t+j, c']
This uses more space: the same value of x is included multiple times in X. The second line can be computed by a matrix multiplication of reshapings.
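A sketch of this reduction (an im2col-style construction; the names are ours):

def conv1d_im2col(x, f):
    # X[t, j, c'] = x[t+j, c']
    T1, C1 = x.shape
    T, _, C2 = f.shape
    n = T1 - T + 1
    X = np.stack([x[t:t+T, :] for t in range(n)])    # (n, T, C1)
    # Flattening (j, c') turns the double sum into one matrix multiplication.
    return X.reshape(n, T*C1) @ f.reshape(T*C1, C2)  # (n, C2)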