CNN lecture for fundamentals of deep learning

About This Presentation

CNN lecture for fundamentals of deep learning


Slide Content

TTIC 31230, Fundamentals of Deep Learning
David McAllester, April 2017
Convolutional Neural Networks | CNNs

ImageNet Classification
1000 kinds of objects.

Speech Recognition
Current state-of-the-art speech recognition systems use CNNs
(as well as RNNs).
Computer transcription of conversational speech now matches
the error rate of professional human transcribers.
(It should be noted that RNNs are also being used in vision.)

Protein Folding
Very recent work of Jinbo Xu (here at TTIC) has used CNNs
to fold proteins.
Construct an image I[x, y] where x and y are positions in the
sequence of protein P and I[x, y] is the mutual information
between variants at position x and variants at position y in the
different versions of P in different species.
For proteins with known structure, construct a target image
D[x, y] where D[x, y] is the physical distance between the residue
at position x and the residue at position y.
Train a CNN (a ResNet) to produce output D[x, y] from input
image I[x, y].
The result is a revolution in predicting three-dimensional
structure from a protein sequence.

Convolution
In deep learning we use the following definition of (1D) convolution.
(x * f)(t) = sum_{j=0}^{|f|-1} x[t+j] f[j]
This version has the spatial dimensions of the filter reversed
relative to the classical definition.
The classical definition yields f * g = g * f, which does not
hold for the deep learning definition.
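As a concrete illustration (not from the original slides), a minimal NumPy sketch of this definition; the function name conv1d is ours.

import numpy as np

def conv1d(x, f):
    # Deep-learning 1D convolution: (x * f)(t) = sum_j x[t+j] f[j].
    # No padding, stride 1, so the output has length |x| - |f| + 1.
    T = x.size - f.size + 1
    return np.array([np.dot(x[t:t + f.size], f) for t in range(T)])

# Example: |x| = 5 and |f| = 3 give an output of length 3.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
f = np.array([1.0, 0.0, -1.0])
print(conv1d(x, f))                      # [-2. -2. -2.]
print(np.correlate(x, f, mode="valid"))  # same values: this is cross-correlation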

2D Convnets
1D: Conv(s, f)[t] = sum_j s[t+j] f[j]
2D: Conv(I, f)[x, y] = sum_{j,k} I[x+j, y+k] f[j, k]
We will write equations for 1D convnets.
For 2D one just replaces the time dimension with two spatial
dimensions in both the signal and the filter.
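A matching 2D sketch (again our own illustration), replacing the time index with two spatial indices:

import numpy as np

def conv2d(I, f):
    # Conv(I, f)[x, y] = sum_{j,k} I[x+j, y+k] f[j, k]
    H = I.shape[0] - f.shape[0] + 1
    W = I.shape[1] - f.shape[1] + 1
    out = np.zeros((H, W))
    for x in range(H):
        for y in range(W):
            out[x, y] = np.sum(I[x:x + f.shape[0], y:y + f.shape[1]] * f)
    return out

I = np.arange(16.0).reshape(4, 4)
f = np.ones((2, 2))
print(conv2d(I, f).shape)  # (3, 3)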

Padding
Conv(x, f)[t] = sum_{j=0}^{|f|-1} x[t+j] f[j]
|Conv(x, f)| = |x| - |f| + 1
We typically want the filter to slide off the edge of the signal
to some extent.
This is done by padding the signal with zeros.
Pad(x, 2) = (0, 0, x[0], x[1], ..., x[|x|-1], 0, 0)

Padding in Numpy
import numpy as np

class Pad:
    def __init__(self, x, p):
        self.x = x  # input node whose .value is a 1D numpy array
        self.p = p  # number of zeros to add on each side

    def forward(self):
        x = self.x
        p = self.p
        s = x.value.size
        self.value = np.zeros(s + 2*p)
        self.value[p:p+s] = x.value
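A usage sketch, assuming a minimal input-node wrapper with a .value field (the Input class here is a hypothetical stand-in for the course's computation-graph node):

import numpy as np

class Input:
    # Hypothetical stand-in for a graph node whose .value is a numpy array.
    def __init__(self, value):
        self.value = np.asarray(value, dtype=float)

x = Input([1.0, 2.0, 3.0])
pad = Pad(x, 2)
pad.forward()
print(pad.value)  # [0. 0. 1. 2. 3. 0. 0.]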

Incorporating Padding into a Convolution Layer
Typically |f| is odd and we do
Conv(Pad(x, floor(|f|/2)), f)
For convenience we could define a procedure
Convp(x, f) = Conv(Pad(x, floor(|f|/2)), f)
Or, for efficiency, we could implement Convp directly as a class.
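A self-contained NumPy sketch of such a procedure (our naming; assumes |f| is odd, and repeats the unpadded convolution inline):

import numpy as np

def convp(x, f):
    # "Same"-size convolution: pad with floor(|f|/2) zeros on each side,
    # then apply the unpadded convolution, so the output length equals |x|.
    p = f.size // 2
    xp = np.concatenate([np.zeros(p), x, np.zeros(p)])
    T = xp.size - f.size + 1
    return np.array([np.dot(xp[t:t + f.size], f) for t in range(T)])

x = np.array([1.0, 2.0, 3.0, 4.0])
f = np.array([1.0, 1.0, 1.0])
print(convp(x, f))  # [3. 6. 9. 7.] -- same length as x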

Channels
In speech or audition one is typically given a channel vector,
such as the Mel-cepstral coefficients, at each time position of
the input.
The convolution operation also produces a channel vector at
each time position.
In this case the filter has shape (T, C2, C1) and we have
Conv(x, f)[t, c] = sum_{j, c'} x[t+j, c'] f[j, c', c]
Note that the time dimension is handled as before.
Padding can be generalized straightforwardly to handle channels.
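A sketch of the channel version (our illustration, not the course code). To match the index order f[j, c', c] as written in the sum, the filter array here is stored with shape (T, C1, C2); the shape listed on the slide is (T, C2, C1), so this is just a convention choice.

import numpy as np

def conv_channels(x, f):
    # Conv(x, f)[t, c] = sum_{j, c'} x[t+j, c'] f[j, c', c]
    # x: (T_in, C1), f: (T, C1, C2), output: (T_in - T + 1, C2)
    T = x.shape[0] - f.shape[0] + 1
    return np.stack([np.einsum("jc,jcd->d", x[t:t + f.shape[0]], f)
                     for t in range(T)])

x = np.random.randn(10, 4)    # 10 time steps, 4 input channels
f = np.random.randn(3, 4, 8)  # filter width 3, 4 input -> 8 output channels
print(conv_channels(x, f).shape)  # (8, 8)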

Adding An Activation Function
Each convolution operation is typically followed by an activation
function nonlinearity.
Relu(Conv(x, f))
Note that the activation function is scalar-to-scalar and is
applied to each channel at each time (or image) position.

Max Pooling
Pooling merges a segment of length p into a single channel
vector by selecting, for each channel, the maximum value of
that channel over the segment.
There are two parameters.
p is the size of the region pooled.
s is the "stride", the size of the region shift on each iteration.
MaxPool(x, p, s)[t, c] = max_{j in {0, ..., p-1}} x[st+j, c]
|MaxPool(x, p, s)| = floor((|x| - p)/s) + 1
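A NumPy sketch of this operation (our illustration):

import numpy as np

def max_pool(x, p, s):
    # MaxPool(x, p, s)[t, c] = max_{0 <= j < p} x[s*t + j, c]
    # x: (T, C); output has floor((T - p)/s) + 1 time positions.
    T_out = (x.shape[0] - p) // s + 1
    return np.stack([x[s*t:s*t + p].max(axis=0) for t in range(T_out)])

x = np.arange(12.0).reshape(6, 2)  # 6 time steps, 2 channels
print(max_pool(x, p=2, s=2))
# [[ 2.  3.]
#  [ 6.  7.]
#  [10. 11.]]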

Max Pooling Handles "Deformation"
The deformable part model (DPM):
In DPM the part filters are at a higher spatial resolution,
applied in a region around their nominal position.

Average Pooling
Average pooling is the same as max pooling but takes an average
rather than a max.
AvePool(x, p, s)[t, c] = (1/p) sum_{j in {0, ..., p-1}} x[st+j, c]
|AvePool(x, p, s)| = floor((|x| - p)/s) + 1

Example
Stanford CS231 Network

Convolution with Strides
Instead of advancing the filter one time value at each iteration,
it is common to advance the filter by a stride s.
We can add a stride parameter to the convolution operation.
Conv(x, f, s)[t, c] = sum_{j, c'} x[st+j, c'] f[j, c, c']
|Conv(x, f, s)| = floor((|x| - |f|)/s) + 1
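A sketch with the stride parameter added (our illustration); here f is indexed f[j, c, c'] as in the formula just above, i.e. stored with shape (T, C2, C1):

import numpy as np

def conv_stride(x, f, s):
    # Conv(x, f, s)[t, c] = sum_{j, c'} x[s*t + j, c'] f[j, c, c']
    # Output has floor((|x| - |f|)/s) + 1 time positions.
    T_out = (x.shape[0] - f.shape[0]) // s + 1
    return np.stack([np.einsum("jc,jdc->d", x[s*t:s*t + f.shape[0]], f)
                     for t in range(T_out)])

x = np.random.randn(11, 4)    # 11 time steps, 4 input channels
f = np.random.randn(3, 8, 4)  # width 3, 8 output channels, 4 input channels
print(conv_stride(x, f, s=2).shape)  # (5, 8)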

Thinking About the Backward Method
Consider y = Conv(x, f, s).
y.value[t, c] += x.value[st+j, c'] * f.value[j, c, c']
Each increment can be backpropagated independently.
x.grad[st+j, c'] += y.grad[t, c] * f.value[j, c, c']
f.grad[j, c, c'] += y.grad[t, c] * x.value[st+j, c']
Until someone writes an appropriate compiler, one must still
hand-code the appropriate Numpy or CUDA vector operations.
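A loop-level sketch of these two backward updates (deliberately unvectorized, to mirror the increments above; real code would use the NumPy or CUDA vector operations just mentioned):

import numpy as np

def conv_backward(x_value, f_value, y_grad, s):
    # Backpropagate through y = Conv(x, f, s), one increment at a time.
    # x_value: (T_in, C1), f_value: (T_f, C2, C1), y_grad: (T_out, C2)
    x_grad = np.zeros_like(x_value)
    f_grad = np.zeros_like(f_value)
    T_out, C2 = y_grad.shape
    T_f, _, C1 = f_value.shape
    for t in range(T_out):
        for j in range(T_f):
            for c in range(C2):
                for cp in range(C1):
                    x_grad[s*t + j, cp] += y_grad[t, c] * f_value[j, c, cp]
                    f_grad[j, c, cp]    += y_grad[t, c] * x_value[s*t + j, cp]
    return x_grad, f_grad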

The Backward Method with Minibatching
Forward:
y.value[b, t, c] += x.value[b, st+j, c'] * f.value[j, c, c']
Backward:
x.grad[b, st+j, c'] += y.grad[b, t, c] * f.value[j, c, c']
f.grad[j, c, c'] += y.grad[b, t, c] * x.value[b, st+j, c']
Note that the backpropagation to f sums over both b and t.
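For instance, the filter gradient can be written as a single einsum that performs exactly that sum over b and t. This is a sketch under our own assumptions: stride 1, x.value of shape (B, T_in, C1), y.grad of shape (B, T_out, C2), and f of shape (T_f, C2, C1).

import numpy as np

def filter_grad(x_value, y_grad, T_f):
    # f.grad[j, c, c'] = sum_{b, t} y.grad[b, t, c] * x.value[b, t + j, c']
    B, T_out, C2 = y_grad.shape
    # X[b, t, j, c'] = x_value[b, t + j, c']  (stacked input windows)
    X = np.stack([x_value[:, t:t + T_f, :] for t in range(T_out)], axis=1)
    return np.einsum("btc,btjd->jcd", y_grad, X)  # shape (T_f, C2, C1)

x_value = np.random.randn(16, 10, 4)  # batch 16, 10 time steps, 4 channels
y_grad  = np.random.randn(16, 8, 6)   # T_out = 10 - 3 + 1 = 8, 6 output channels
print(filter_grad(x_value, y_grad, T_f=3).shape)  # (3, 6, 4)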

Image to Column (Im2C)
Matrix multiplication is a highly optimized operation. Using
more space, convolution can be reduced to matrix multiplica-
tion.
Conv(x, f)[t, c] = sum_{j, c'} x[t+j, c'] f[j, c', c]
                 = sum_{j, c'} X[t, j, c'] f[j, c', c]
where X[t, j, c'] = x[t+j, c']
This uses more space: the same value of x is included multiple
times in X. The second line can be computed by a matrix
multiplication of reshapings.
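A sketch of the reduction for the 1D case (our illustration; x is (T_in, C1) and f is (T_f, C1, C2), following the indexing f[j, c', c] above):

import numpy as np

def conv_via_matmul(x, f):
    # Build X[t, j, c'] = x[t+j, c'], then reshape X to (T_out, T_f*C1)
    # and f to (T_f*C1, C2) and multiply.
    T_f, C1, C2 = f.shape
    T_out = x.shape[0] - T_f + 1
    X = np.stack([x[t:t + T_f, :] for t in range(T_out)])  # (T_out, T_f, C1)
    return X.reshape(T_out, T_f * C1) @ f.reshape(T_f * C1, C2)

x = np.random.randn(10, 4)
f = np.random.randn(3, 4, 8)
out = conv_via_matmul(x, f)
# Sanity check against the direct summation Conv(x, f)[t, c] = sum_{j,c'} x[t+j,c'] f[j,c',c]:
direct = np.stack([np.einsum("jc,jcd->d", x[t:t + 3], f) for t in range(10 - 3 + 1)])
print(np.allclose(out, direct))  # True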

END