TTIC 31230, Fundamentals of Deep Learning
David McAllester, April 2017
Convolutional Neural Networks (CNNs)
ImageNet Classification

1000 kinds of objects.
Speech Recognition
Current state-of-the-art speech recognition systems use CNNs (as well as RNNs).
Computer transcription of conversational speech now matches the error rate of professional human transcribers.
(It should be noted that RNNs are also being used in vision.)
Protein Folding
Very recent work of Jinbo Xu (here at TTIC) has used CNNs to fold proteins.
Construct an image I_{x,y} where x and y are positions in the sequence of a protein P and I_{x,y} is the mutual information between variants at position x and variants at position y in the different versions of P in different species.
For proteins with known structure, construct a target image D_{x,y} where D_{x,y} is the physical distance between the residue at position x and the residue at position y.
Train a CNN (a ResNet) to produce output D_{x,y} from input image I_{x,y}.
The result is a revolution in predicting three-dimensional structure from a protein sequence.
Convolution
In deep learning we use the following definition of (1D) convolution.
(x * f)(t) = \sum_{j=0}^{|f|-1} x[t+j] f[j]
This version has the spatial dimensions of the filter reversed relative to the classical definition.
The classical definition yields f * g = g * f, which does not hold for the deep learning definition.
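As a concrete illustration, here is a minimal NumPy sketch of this definition (conv1d is our name, not a library function):

import numpy as np

def conv1d(x, f):
    # Deep learning convolution: slide f over x without flipping f.
    n = x.size - f.size + 1
    return np.array([np.dot(x[t:t+f.size], f) for t in range(n)])

# e.g. conv1d(np.array([1., 2., 3., 4.]), np.array([1., 1.])) -> [3., 5., 7.]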
2D Convnets
1D: Conv(x, f)[t] = \sum_j x[t+j] f[j]

2D: Conv(I, f)[x, y] = \sum_{j,k} I[x+j, y+k] f[j, k]
We will write equations for 1D convnets.
For 2D one just replaces the time dimension with two spatial dimensions in both the signal and the filter.
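Under the same assumptions as the conv1d sketch above, a sketch of the 2D case:

def conv2d(I, f):
    # 2D analogue of conv1d: two spatial indices instead of one time index.
    H = I.shape[0] - f.shape[0] + 1
    W = I.shape[1] - f.shape[1] + 1
    out = np.zeros((H, W))
    for x in range(H):
        for y in range(W):
            out[x, y] = np.sum(I[x:x+f.shape[0], y:y+f.shape[1]] * f)
    return out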
Padding
Conv(x, f)[t] = \sum_{j=0}^{|f|-1} x[t+j] f[j]

|Conv(x, f)| = |x| - |f| + 1
We typically want the filter to slide off the edge of the signal to some extent.

This is done by padding the signal with zeros.
Pad(x, 2) = (0, 0, x[0], x[1], \ldots, x[|x|-1], 0, 0)
Padding in Numpy
import numpy as np

class Pad:
    def __init__(self, x, p):
        # x is a computation-graph node with a .value array; p is the pad width.
        self.x = x
        self.p = p

    def forward(self):
        x = self.x
        p = self.p
        s = x.value.size
        self.value = np.zeros(s + 2*p)
        self.value[p:p+s] = x.value
Incorporating Padding into a Convolution Layer
Typically |f| is odd and we do

Conv(Pad(x, \lfloor |f|/2 \rfloor), f)
For convenience we could define a procedure

Convp(x, f) \equiv Conv(Pad(x, \lfloor |f|/2 \rfloor), f)
Or, for efficiency, we could implement Convp directly as a class.
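A minimal sketch of such a procedure, using np.pad (which pads with zeros by default) and the conv1d sketch above:

def convp(x, f):
    # Pad with floor(|f|/2) zeros on each side; for odd |f| the
    # output has the same length as x.
    return conv1d(np.pad(x, f.size // 2), f)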
Channels
In speech or audition one is typically given a channel vector, such as the Mel-cepstral coefficients, at each time position of the input.
The convolution operation also produces a channel vector at each time position.
In this case the filter has shape (T, C_2, C_1) and we have

Conv(x, f)[t, c] = \sum_{j, c'} x[t+j, c'] f[j, c', c]
Note that the time dimension is handled as before.
Padding can be generalized straightforwardly to handle channels.
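A sketch with channels, assuming the filter array is stored with axis order (T, C1, C2) so that f[j, c', c] matches the indexing above (the axis order is our choice, not fixed by the slides):

def conv1d_channels(x, f):
    # x: (T1, C1) signal; f: (T, C1, C2) filter.
    T1, C1 = x.shape
    T, _, C2 = f.shape
    n = T1 - T + 1
    y = np.zeros((n, C2))
    for t in range(n):
        for c in range(C2):
            # y[t, c] = sum over j and c' of x[t+j, c'] f[j, c', c]
            y[t, c] = np.sum(x[t:t+T, :] * f[:, :, c])
    return y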
Adding An Activation Function
Each convolution operation is typically followed by an activation function (a nonlinearity).
Relu(Conv(x, f))
Note that the activation function is scalar-to-scalar and is applied to each channel at each time (or image) position.
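With the conv1d_channels sketch above, for a multi-channel signal x and filter f, this is just an elementwise maximum:

h = np.maximum(conv1d_channels(x, f), 0.0)  # ReLU applied to every entry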
Max Pooling
Pooling merges a segment of length p into a single channel vector by selecting, for each channel, the maximum value of that channel over the segment.
There are two parameters.

p is the size of the region pooled.

s is the "stride", the amount by which the region shifts on each iteration.
MaxPool(x, p, s)[t, c] = \max_{j \in \{0, \ldots, p-1\}} x[st+j, c]

|MaxPool(x, p, s)| = \lfloor (|x| - p)/s \rfloor + 1
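A NumPy sketch under the same (T, C) layout as above:

def maxpool(x, p, s):
    # Per-channel max over windows of length p, advanced by stride s.
    n = (x.shape[0] - p) // s + 1
    return np.array([x[s*t : s*t + p].max(axis=0) for t in range(n)])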
Max Pooling Handles "Deformation"

The deformable part model (DPM):

In DPM the part filters are at a higher spatial resolution, applied in a region around their nominal position.
Average Pooling
Average pooling is the same as max pooling but takes an average rather than a max.
AvePool(x, p, s)[t, c] = \frac{1}{p} \sum_{j \in \{0, \ldots, p-1\}} x[st+j, c]

|AvePool(x, p, s)| = \lfloor (|x| - p)/s \rfloor + 1
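The corresponding change to the maxpool sketch above is one word:

def avepool(x, p, s):
    # Per-channel mean over windows of length p, advanced by stride s.
    n = (x.shape[0] - p) // s + 1
    return np.array([x[s*t : s*t + p].mean(axis=0) for t in range(n)])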
Example
[Figure: the Stanford CS231 network architecture.]
Convolution with Strides
Instead of advancing the filter one time value at each iteration, it is common to advance the filter by a stride s.

We can add a stride parameter to the convolution operation.
Conv(x, f, s)[t, c] = \sum_{j, c'} x[st+j, c'] f[j, c', c]

|Conv(x, f, s)| = \lfloor (|x| - |f|)/s \rfloor + 1
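A sketch of the strided, multi-channel case, with the same assumed filter layout as conv1d_channels:

def conv1d_strided(x, f, s):
    # x: (T1, C1); f: (T, C1, C2); advance the window by s per output position.
    T1, C1 = x.shape
    T, _, C2 = f.shape
    n = (T1 - T) // s + 1
    y = np.zeros((n, C2))
    for t in range(n):
        for c in range(C2):
            y[t, c] = np.sum(x[s*t : s*t + T, :] * f[:, :, c])
    return y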
Thinking About the Backward Method
Consider y = Conv(x, f, s).
y.value[t, c] += x.value[st+j, c'] * f.value[j, c', c]
Each increment can be backpropagated independently.
x.grad[st+j, c'] += y.grad[t, c] * f.value[j, c', c]

f.grad[j, c', c] += y.grad[t, c] * x.value[st+j, c']
Until someone writes an appropriate compiler, one must still hand-code the appropriate NumPy or CUDA vector operations.
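As a hand-coded example, a NumPy sketch of these two increments, vectorized over channels (shapes as in conv1d_strided above):

def conv1d_strided_backward(x, f, s, y_grad):
    # Accumulate x.grad and f.grad from y.grad, one output position at a time.
    T = f.shape[0]
    x_grad = np.zeros_like(x)
    f_grad = np.zeros_like(f)
    for t in range(y_grad.shape[0]):
        window = x[s*t : s*t + T, :]               # (T, C1)
        # x.grad[st+j, c'] += y.grad[t, c] f.value[j, c', c], summed over c
        x_grad[s*t : s*t + T, :] += f @ y_grad[t]  # (T, C1, C2) @ (C2,) -> (T, C1)
        # f.grad[j, c', c] += y.grad[t, c] x.value[st+j, c']
        f_grad += window[:, :, None] * y_grad[t]
    return x_grad, f_grad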
Image to Column (Im2C)
Matrix multiplication is a highly optimized operation. Using more space, convolution can be reduced to matrix multiplication.
Conv(x, f)[t, c] = \sum_{j, c'} x[t+j, c'] f[j, c', c]
              = \sum_{j, c'} X[t, j, c'] f[j, c', c]

X[t, j, c'] = x[t+j, c']
This uses more space: the same value of x is included multiple times in X. The second line can be computed by a matrix multiplication of reshapings.
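A sketch of this reduction (an im2col-style construction; the names are ours):

def conv1d_im2col(x, f):
    # X[t, j, c'] = x[t+j, c']
    T1, C1 = x.shape
    T, _, C2 = f.shape
    n = T1 - T + 1
    X = np.stack([x[t:t+T, :] for t in range(n)])    # (n, T, C1)
    # Flattening (j, c') turns the double sum into one matrix multiplication.
    return X.reshape(n, T*C1) @ f.reshape(T*C1, C2)  # (n, C2)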