AIS302-Artificial Neural Networks-Spr24-lec3.pdf

Slide Content

AIS302
Artificial Neural Networks
Spring 24
Ghada Khoriba
Associate Prof. of Artificial Intelligence

Agenda
• Recap
• RNN, recurrent neural networks
References:
• Princeton University COS 495, Instructor: Yingyu Liang
• https://www.deeplearningbook.org/
• Algorithmic Intelligence Laboratory, alinlab.kaist.ac.kr

Recurrent neural networks
• Dates back to (Rumelhart et al., 1986)
• A family of neural networks for handling sequential data, which involves variable-length inputs or outputs
• Especially useful for natural language processing (NLP)

Sequential data
• Each data point: a sequence of vectors x^(t), for 1 ≤ t ≤ τ
• Batch data: many sequences with different lengths τ
• Label: can be a scalar, a vector, or even a sequence
• Examples:
  • Sentiment analysis
  • Machine translation

Example: machine translation

More complicated sequential data
• Data point: two-dimensional sequences, like images
• Label: different types of sequences, like text sentences
• Example: image captioning

Image captioning
Figure from the paper "DenseCap: Fully Convolutional Localization Networks for Dense Captioning", by Justin Johnson, Andrej Karpathy, Li Fei-Fei

Sequential Data Problems
• Fixed-sized input to fixed-sized output (e.g., image classification)
• Sequence output (e.g., image captioning: takes an image and outputs a sentence of words)
• Sequence input (e.g., sentiment analysis, where a given sentence is classified as expressing positive or negative sentiment)
• Sequence input and sequence output (e.g., machine translation: an RNN reads a sentence in English and then outputs a sentence in French)
• Synced sequence input and output (e.g., video classification, where we wish to label each frame of the video)
Credits: Andrej Karpathy

A typical dynamic system
s^(t+1) = f(s^(t); θ)
Figure from Deep Learning, Goodfellow, Bengio and Courville
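Not on the slide, but useful for reading the figure: unrolling this recurrence for a few steps makes the repeated use of the same function and parameters explicit. A sketch in the Deep Learning book's notation, with s for the state and θ for the shared parameters:

```latex
% Unrolling s^{(t+1)} = f(s^{(t)}; \theta) for two steps:
s^{(3)} \;=\; f\big(s^{(2)}; \theta\big) \;=\; f\big(f(s^{(1)}; \theta); \theta\big)
```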

A system driven by external data
s^(t+1) = f(s^(t), x^(t+1); θ)

Compact view
s^(t+1) = f(s^(t), x^(t+1); θ)

Compact view
s^(t+1) = f(s^(t), x^(t+1); θ)
Key: the same f and θ are used for all time steps
Square: a one-step time delay

Recurrent neural networks
• Use the same computational function and parameters across different time steps of the sequence
• Each time step: takes the input entry and the previous hidden state to compute the output entry
• Loss: typically computed at every time step (see the sketch below)
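The slides contain no code; the following is a minimal NumPy sketch of this per-step computation, assuming the tanh/softmax formulation from the Deep Learning book. The function name rnn_step, the parameter names W, U, V, b, c, and the toy dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rnn_step(x_t, h_prev, W, U, V, b, c):
    """One time step: the same function and parameters are reused for every t."""
    h_t = np.tanh(b + W @ h_prev + U @ x_t)   # new hidden state from input + previous state
    y_t = softmax(c + V @ h_t)                # per-step output distribution
    return h_t, y_t

# Toy dimensions (assumed): 4-dim inputs, 8-dim hidden state, 3 output classes
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)); U = rng.normal(size=(8, 4))
V = rng.normal(size=(3, 8)); b = np.zeros(8); c = np.zeros(3)

xs = [rng.normal(size=4) for _ in range(5)]   # a length-5 input sequence
h = np.zeros(8)                               # initial hidden state
for x_t in xs:
    h, y_t = rnn_step(x_t, h, W, U, V, b, c)  # same parameters at every step
```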

Recurrent neural networks
[Figure: unrolled RNN computational graph showing, at each time step, the input, hidden state, output, loss, and label]

Recurrent neural networks
Math formula:
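The formula on the slide is an image; for reference, the standard vanilla-RNN formulation from the Deep Learning book (the reference these figures are taken from) is:

```latex
\begin{align}
a^{(t)} &= b + W h^{(t-1)} + U x^{(t)} \\
h^{(t)} &= \tanh\!\big(a^{(t)}\big) \\
o^{(t)} &= c + V h^{(t)} \\
\hat{y}^{(t)} &= \operatorname{softmax}\!\big(o^{(t)}\big)
\end{align}
```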

Advantage
• Hidden state: a lossy summary of the past
• Shared functions and parameters: greatly reduce the capacity and are good for generalization in learning
• Explicitly use the prior knowledge that the sequential data can be processed in the same way at different time steps (e.g., NLP)

Training RNN
• Principle: unfold the computational graph, and use backpropagation
• Called the back-propagation through time (BPTT) algorithm
• Can then apply any general-purpose gradient-based techniques
• Conceptually: first compute the gradients of the internal nodes, then compute the gradients of the parameters (see the sketch below)
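Not from the slides: a minimal PyTorch sketch of BPTT by unrolling, where the per-step losses are summed and a single backward pass propagates gradients through time. The model, sizes, and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Assumed toy sizes: 4-dim inputs, 8-dim hidden state, 3 output classes
cell = nn.RNNCell(input_size=4, hidden_size=8)        # shared parameters for every time step
readout = nn.Linear(8, 3)                             # shared output layer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(list(cell.parameters()) + list(readout.parameters()), lr=0.1)

xs = torch.randn(5, 4)                  # a length-5 input sequence
ys = torch.randint(0, 3, (5,))          # one label per time step

h = torch.zeros(1, 8)                   # initial hidden state (batch of 1)
total_loss = torch.zeros(())
for t in range(xs.size(0)):             # unfold the computational graph over time
    h = cell(xs[t].unsqueeze(0), h)     # same cell (same parameters) at every step
    o = readout(h)                      # per-step output
    total_loss = total_loss + loss_fn(o, ys[t].unsqueeze(0))   # loss computed every time step

optimizer.zero_grad()
total_loss.backward()                   # back-propagation through time (BPTT)
optimizer.step()                        # any gradient-based optimizer can be applied
```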

Recurrent neural networks
Math formula:
Figure from Deep Learning, Goodfellow, Bengio and Courville

Recurrent neural networks
Gradient at L^(t): the total loss is the sum of the losses at the different time steps
Figure from Deep Learning, Goodfellow, Bengio and Courville
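In the Deep Learning book's notation (an assumption, since the slide equation is an image), this is simply:

```latex
L = \sum_{t} L^{(t)}, \qquad \frac{\partial L}{\partial L^{(t)}} = 1
```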

Recurrent neural networks
Gradient at o^(t):
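Assuming a softmax output with negative log-likelihood loss, as in the Deep Learning book's derivation, the gradient at the output node is:

```latex
\big(\nabla_{o^{(t)}} L\big)_i \;=\; \frac{\partial L}{\partial o_i^{(t)}}
  \;=\; \hat{y}_i^{(t)} - \mathbf{1}_{\,i = y^{(t)}}
```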

Recurrent neural networks
Gradient at h^(τ): here ∇_{o^(τ)} L is the gradient of the loss L with respect to the output o at the final time step τ.
Gradients are propagated backwards through time to update the parameters of the RNN. This allows the RNN to learn temporal dependencies from sequences of data.
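Following the Deep Learning book's BPTT derivation (an assumption, since the slide formula is an image): at the final step τ the hidden state only feeds the output, so

```latex
\nabla_{h^{(\tau)}} L \;=\; V^{\top} \, \nabla_{o^{(\tau)}} L
```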

Recurrent neural networks
Gradient at h^(t), for t < τ:
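For earlier steps, h^(t) influences both the output o^(t) and the next hidden state h^(t+1); with tanh hidden units, the book's recursion is (again an assumption about what the slide image shows):

```latex
\nabla_{h^{(t)}} L \;=\;
  W^{\top} \operatorname{diag}\!\big(1 - (h^{(t+1)})^{2}\big)\, \nabla_{h^{(t+1)}} L
  \;+\; V^{\top} \, \nabla_{o^{(t)}} L
```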

Recurrent neural networks
Gradient at the parameters θ:
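Since the parameters are shared across time steps, their gradients sum the contributions from every step. In the Deep Learning book's notation (an assumption; tanh hidden units, parameters b, c, U, V, W):

```latex
\begin{align}
\nabla_{c} L &= \sum_{t} \nabla_{o^{(t)}} L \\
\nabla_{b} L &= \sum_{t} \operatorname{diag}\!\big(1 - (h^{(t)})^{2}\big)\, \nabla_{h^{(t)}} L \\
\nabla_{V} L &= \sum_{t} \big(\nabla_{o^{(t)}} L\big)\, h^{(t)\top} \\
\nabla_{W} L &= \sum_{t} \operatorname{diag}\!\big(1 - (h^{(t)})^{2}\big)\, \big(\nabla_{h^{(t)}} L\big)\, h^{(t-1)\top} \\
\nabla_{U} L &= \sum_{t} \operatorname{diag}\!\big(1 - (h^{(t)})^{2}\big)\, \big(\nabla_{h^{(t)}} L\big)\, x^{(t)\top}
\end{align}
```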

RNN
• Use the same computational function and parameters across different time steps of the sequence
• Each time step: takes the input entry and the previous hidden state to compute the output entry
• Loss: typically computed at every time step
• Many variants:
  • Information about the past can be in many other forms
  • Only output at the end of the sequence

Example: only output at the end
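Not from the slides: a minimal PyTorch sketch of this many-to-one pattern (e.g., classifying a whole sentence's sentiment). The sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Assumed toy sizes: 4-dim inputs, 8-dim hidden state, 2 classes (e.g., positive/negative)
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
classifier = nn.Linear(8, 2)

xs = torch.randn(1, 7, 4)          # one input sequence of length 7
_, h_last = rnn(xs)                # h_last: final hidden state, shape (1, 1, 8)
logits = classifier(h_last[-1])    # only the state at the end of the sequence produces output
print(logits.shape)                # torch.Size([1, 2])
```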

Bidirectional RNNs
• Many applications: the output at time t may depend on the whole input sequence
• Example in speech recognition: the correct interpretation of the current sound may depend on the next few phonemes, potentially even the next few words
• Bidirectional RNNs are introduced to address this

BiRNNs
[Figure: bidirectional RNN producing per-time-step tags]
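Not from the slides: a minimal PyTorch sketch of a bidirectional RNN tagger along these lines, where each time step's prediction can depend on the whole input sequence. The sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# bidirectional=True runs one RNN forward in time and one backward, then concatenates states
birnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True, bidirectional=True)
tagger = nn.Linear(2 * 8, 5)       # per-step tag scores use both directions (assume 5 tags)

xs = torch.randn(1, 7, 4)          # one input sequence of length 7
out, _ = birnn(xs)                 # out: (1, 7, 16), forward and backward states concatenated
tag_scores = tagger(out)           # one prediction per time step, informed by the whole sequence
print(tag_scores.shape)            # torch.Size([1, 7, 5])
```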