deep learning UNIT-1 Introduction Part-1.ppt



Slide Content

UNIT-I
• Introduction
• Feedforward neural networks
• Gradient descent and the backpropagation algorithm
• Unit saturation, the vanishing gradient problem, and ways to mitigate it
• ReLU
• Heuristics for avoiding bad local minima
• Heuristics for faster training
• Nesterov's accelerated gradient descent
• Regularization
• Dropout

Introduction to Deep Learning
• Deep learning is a subfield of artificial intelligence (AI) and machine learning that focuses on training artificial neural networks to perform tasks that typically require human intelligence.
• It has gained widespread attention and made significant advancements in various applications, including image recognition, natural language processing, speech recognition, and more.

Here are some common types of deep learning:

Feedforward Neural Networks (FNNs):
• These are the fundamental building blocks of deep learning. FNNs consist of an input layer, one or more hidden layers, and an output layer.
• Each layer contains nodes (neurons) that process and transform the data.
• FNNs are used for various tasks, including regression and classification.

Convolutional Neural Networks (CNNs):
• CNNs are designed for processing grid-like data, such as images and videos.
• They use convolutional layers to automatically learn features from local regions of the input, making them highly effective in tasks like image classification, object detection, and image segmentation.

Common types of deep learning (contd.)

Recurrent Neural Networks (RNNs):
• RNNs are designed for sequential data, such as time series, text, and speech. They have feedback connections, allowing them to maintain a memory of previous inputs.
• RNNs are suitable for tasks like natural language processing (NLP), machine translation, and speech recognition.

Long Short-Term Memory (LSTM):
• LSTMs are a type of RNN architecture designed to capture long-range dependencies in sequential data more effectively.
• They use specialized memory cells to store and update information over longer sequences, making them suitable for tasks requiring understanding of context over time.

Common types of deep learning (contd.)

Gated Recurrent Unit (GRU):
• GRUs are another variant of RNNs that address the vanishing gradient problem, like LSTMs.
• They are computationally more efficient and often used for similar sequence-based tasks in NLP and speech recognition.

Autoencoders:
• Autoencoders are neural networks used for unsupervised learning and dimensionality reduction.
• They consist of an encoder that maps input data to a lower-dimensional representation (encoding) and a decoder that reconstructs the original data from this encoding.
• Autoencoders are used in applications like image denoising and anomaly detection.
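
To make the encoder/decoder structure concrete, here is a minimal numpy sketch of an (untrained) autoencoder. The layer sizes, the tanh activation, and the random weights are assumptions chosen only for illustration; in practice the reconstruction error below is what training would minimize.

import numpy as np

rng = np.random.default_rng(0)
# Illustrative sizes: an 8-dimensional input compressed to a 3-dimensional code.
W_enc, b_enc = rng.normal(size=(3, 8)), np.zeros(3)
W_dec, b_dec = rng.normal(size=(8, 3)), np.zeros(8)

def encode(x):
    return np.tanh(W_enc @ x + b_enc)   # lower-dimensional representation (encoding)

def decode(code):
    return W_dec @ code + b_dec         # reconstruction of the original input

x = rng.normal(size=8)
x_hat = decode(encode(x))
reconstruction_error = np.mean((x - x_hat) ** 2)  # training would minimize this
print(reconstruction_error)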

Common types of deep learning (contd.)

Generative Adversarial Networks (GANs):
• GANs consist of two neural networks, a generator and a discriminator, that compete against each other.
• The generator tries to create data that is indistinguishable from real data, while the discriminator tries to tell real from fake.
• GANs are used for tasks like image generation, style transfer, and data augmentation.

Transformer Models:
• Transformers have revolutionized natural language processing (NLP) and have been adapted to various other domains.
• They use a self-attention mechanism to process input data in parallel, making them highly scalable and effective for sequence-to-sequence tasks.
• Notable transformer-based models include BERT, GPT (Generative Pre-trained Transformer), and T5.

Common types of deep learning (contd.)

Siamese Networks:
• These networks are designed for tasks involving similarity or distance measurement between pairs of inputs.
• Siamese networks have two identical subnetworks that process each input and produce embeddings that can be compared to measure similarity or dissimilarity.

Capsule Networks (CapsNets):
• CapsNets are designed to improve on the shortcomings of traditional CNNs, especially in handling pose variations and hierarchical features in images.
• They use capsules instead of neurons to represent different parts of an object.

Feedforward Neural Networks
• Deep feedforward networks, also called feedforward neural networks or multilayer perceptrons (MLPs), are the quintessential deep learning models.
• The goal of a feedforward network is to approximate some function f*.
• For example, for a classifier, y = f*(x) maps an input x to a category y.
• A feedforward network defines a mapping y = f(x; θ) and learns the value of the parameters θ that result in the best function approximation.
These models are called feedforward because information flows through the function being evaluated from x, through the intermediate computations used to define f, and finally to the output y. There are no feedback connections in which outputs of the model are fed back into itself. When feedforward neural networks are extended to include feedback connections, they are called recurrent neural networks.

Feedforward Neural Networks (Contd.)
• Feedforward neural networks are often referred to as "networks" because they are constructed by combining multiple functions.
• These networks are represented by a directed acyclic graph that illustrates how these functions are interconnected.
• Typically, they are organized in a sequential manner, with functions like f⁽¹⁾, f⁽²⁾, and f⁽³⁾ linked together in a chain, forming an overall function f(x) = f⁽³⁾(f⁽²⁾(f⁽¹⁾(x))).
• These chain-like structures are the most common configuration for neural networks. In this context, each function, such as f⁽¹⁾, f⁽²⁾, etc., is termed a layer of the network, with f⁽¹⁾ being the first layer, f⁽²⁾ the second layer, and so forth. They form the hidden layers.
• The overall length of the chain gives the depth of the model. The name "deep learning" arose from this terminology. The final layer of a feedforward network is called the output layer.
• Feedforward networks use activation functions to compute the hidden layer values.
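
This chain structure can be written directly as code. Below is a minimal numpy sketch; the layer sizes, the random weights, and the ReLU activations for the hidden layers are illustrative assumptions, not values from the slides. It only shows how the layer functions compose.

import numpy as np

def layer(x, W, b, activation):
    # One layer f(i): an affine transformation followed by an activation function.
    return activation(W @ x + b)

relu = lambda z: np.maximum(0.0, z)
identity = lambda z: z

rng = np.random.default_rng(0)
# Illustrative sizes: 3 inputs -> 4 hidden units -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])
h1 = layer(x, W1, b1, relu)       # f(1): first hidden layer
h2 = layer(h1, W2, b2, relu)      # f(2): second hidden layer
y = layer(h2, W3, b3, identity)   # f(3): output layer
print(y)                          # y = f(3)(f(2)(f(1)(x))); the chain length is the depth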

Example: Learning XOR
• An example of a fully functioning feedforward network on a very simple task: learning the XOR function.
• The XOR function ("exclusive or") is an operation on two binary values, x1 and x2.
• When exactly one of these binary values is equal to 1, the XOR function returns 1. Otherwise, it returns 0.
• The XOR function provides the target function y = f*(x) that we want to learn. Our model provides a function y = f(x; θ), and our learning algorithm will adapt the parameters θ to make f as similar as possible to f*.
We want our network to perform correctly on the four points X = {[0, 0], [0, 1], [1, 0], [1, 1]}.
We will train the network on all four of these points.
The only challenge is to fit the training set.
We can treat this as a regression problem and use the MSE loss function. Suppose we first choose a linear model, with θ consisting of w and b.
Our model is defined to be
f(x; w, b) = xᵀw + b.

Evaluated on our whole training set, the MSE loss function is
J(θ) = (1/4) Σ_{x∈X} (f*(x) − f(x; θ))²
Minimizing J(θ) for the linear model gives w = 0 and b = 1/2, so the linear model simply outputs 0.5 everywhere; it cannot represent XOR. We therefore add a hidden layer h computed with a nonlinear activation.
After the hidden layer's linear transformation, all the examples lie along a line with slope 1. As we move along this line, the output needs to begin at 0, then rise to 1, then drop back down to 0; a linear model cannot implement such a function. To finish computing the value of h for each example, we apply the rectified linear transformation, which bends this line so that the examples become separable by a linear output layer.
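
For reference, here is a small numpy sketch of the classic hand-constructed XOR solution with one ReLU hidden layer. The specific weight values below follow the standard textbook solution; they are stated here as an illustration rather than derived on these slides.

import numpy as np

# Hand-picked parameters: h = ReLU(x W + c), y = h w + b
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
c = np.array([0.0, -1.0])
w = np.array([1.0, -2.0])
b = 0.0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

h = np.maximum(0.0, X @ W + c)   # rectified linear transformation of the hidden layer
y = h @ w + b                    # linear output layer
print(y)                         # [0. 1. 1. 0.] -- XOR of each input pair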

GRADIENT DESCENT & BACK PROPAGATION
• Gradient descent and the backpropagation algorithm are fundamental techniques used in training artificial neural networks for various machine learning tasks, including image recognition, natural language processing, and more.
• Gradient Descent:
• Gradient descent is an optimization algorithm used to minimize a loss function by iteratively adjusting the parameters (weights and biases) of a machine learning model. The idea is to find the set of parameters that minimizes the error between the model's predictions and the actual target values.

Here's a simple example of gradient descent with a linear regression model:
• Objective: Minimize the mean squared error (MSE) loss for a linear regression model.
• Linear Regression Model: The model has two parameters, a weight (w) and a bias (b). It predicts an output (y_pred) given an input (x) as follows:
• y_pred = w * x + b
• Loss Function: The MSE loss for linear regression is defined as:
• MSE = (1/n) * Σ(y_i − y_pred_i)²
• Where:
• n is the number of data points.
• y_i is the actual target for the i-th data point.
• y_pred_i is the predicted output for the i-th data point.

Gradient Descent Algorithm:
1. Initialize w and b with random values.
2. Choose a learning rate (α), which scales the magnitude of parameter updates during gradient descent.
3. Repeat until the loss converges to a minimum value:
   • Calculate the gradient of the loss with respect to w and b.
   • Update w and b using the gradients and the learning rate:
     w = w − α * ∂(MSE)/∂w
     b = b − α * ∂(MSE)/∂b
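
A short Python sketch of this procedure follows. The toy data values, the learning rate of 0.01, and the iteration count are assumptions for illustration, not values given on these slides.

import numpy as np

# Toy data roughly following y = 2x + 1 (illustrative values).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

w, b = 0.0, 0.0           # step 1: initialize parameters
alpha = 0.01              # step 2: choose a learning rate

for step in range(1000):  # step 3: repeat until convergence
    y_pred = w * x + b
    error = y_pred - y
    mse = np.mean(error ** 2)
    # Gradients of MSE = (1/n) * Σ(y_i - y_pred_i)^2
    dw = 2.0 * np.mean(error * x)   # ∂(MSE)/∂w
    db = 2.0 * np.mean(error)       # ∂(MSE)/∂b
    w -= alpha * dw
    b -= alpha * db

print(w, b, mse)  # w ≈ 2, b ≈ 1 once the loss has converged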

A simple example of gradient
descent using a one-dimensional
function.
•Suppose we want to minimize the
following quadratic function:
•f(x) = x^2
•The goal is to find the minimum value of
this function using gradient descent.

Gradient Descent on f(x) = x²
• The gradient is:
• ∂f/∂x = 2x
• Update x using the gradient and the learning rate:
• x = x − α * ∂f/∂x
• Repeat the update step for a specified number of iterations or until convergence.
• Let's perform a few iterations of gradient descent:
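
A minimal sketch of these iterations in Python, assuming a starting point of x = 10 and a learning rate of α = 0.1 (the slides do not specify these values):

# Gradient descent on f(x) = x**2
x = 10.0
alpha = 0.1

for i in range(5):
    grad = 2 * x           # ∂f/∂x = 2x
    x = x - alpha * grad   # x = x − α * ∂f/∂x
    print(f"iteration {i + 1}: x = {x:.4f}, f(x) = {x * x:.4f}")
# With these values, x shrinks by a factor of 0.8 each step, moving toward the minimum at x = 0.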

As you can see, with each iteration, x gets closer to 0, which is the minimum of the function.
This process continues until the convergence criteria are met or a specified number of iterations is reached.
In practice, gradient descent is used to optimize more complex functions with high-dimensional parameter spaces, such as training neural networks in deep learning.

Back Propagation Algorithm
• Backpropagation is a fundamental algorithm used for training artificial neural networks, particularly feedforward neural networks with multiple layers (also known as deep neural networks).
• It enables the network to learn from data by iteratively adjusting its parameters (weights and biases) to minimize a predefined loss or error function.

Key Concepts in Backpropagation:
1. Feedforward Pass: In the feedforward pass, input data is propagated through the network layer by layer, resulting in an output prediction. Each neuron in a layer calculates a weighted sum of its inputs, applies an activation function, and passes the result to the next layer.
2. Loss Function: A loss function (also known as a cost function) quantifies the error between the network's predictions and the actual target values. Common loss functions include mean squared error (MSE) for regression tasks and cross-entropy for classification tasks.
3. Backpropagation of Error: After the feedforward pass, the network computes the gradient of the loss with respect to its parameters (weights and biases) using the chain rule from calculus. This gradient information is then used to update the parameters during the optimization process.

4. Gradient Descent: The optimization algorithm (usually gradient descent or one of its variants) adjusts the network's parameters in the opposite direction of the gradient to minimize the loss. The learning rate determines the step size for each parameter update.

Example of Backpropagation:
• Let's consider training a feedforward neural network for binary classification. The network has one hidden layer with two neurons and an output layer with a single neuron. We'll use a simple dataset of two-dimensional points (x1, x2) and binary labels (0 or 1) for the example. The network's architecture is as follows:
• Input layer: 2 neurons (corresponding to x1 and x2)
• Hidden layer: 2 neurons (with sigmoid activation)
• Output layer: 1 neuron (with sigmoid activation)

Steps in Backpropagation:
•Forward Pass:
•Input (x1, x2) is fed into the network.
•Calculate the weighted sum and apply the
sigmoid activation in the hidden layer.
•Calculate the weighted sum and apply the
sigmoid activation in the output layer.

1. Loss Calculation:
   • Compute the loss (e.g., cross-entropy) between the predicted output and the actual target label.
2. Backpropagation:
   • Calculate the gradient of the loss with respect to the output layer's weighted sum and biases.
   • Backpropagate this gradient to the hidden layer and compute gradients for its parameters.
   • Use these gradients to update the weights and biases in both layers using gradient descent.
• Repeat:
   • Repeat the above steps for a batch of training examples (mini-batch) and iterate through the entire dataset for multiple epochs.

Here's a simplified example of a single training iteration:
• Forward Pass:
• Input (x1, x2) = (1.0, 0.5)
• Hidden layer:
• Weighted sum: z1 = w1 * x1 + w2 * x2 + b1
• Activation: h1 = sigmoid(z1)
• Similar calculations give h2 for neuron 2 in the hidden layer.
• Output layer:
• Weighted sum: z3 = w3 * h1 + w4 * h2 + b2
• Activation: y_pred = sigmoid(z3)
• Loss Calculation:
• Calculate the cross-entropy loss between the predicted output y_pred and the actual label (0 or 1).

•Backpropagation:
•Compute gradients for output layer parameters (e.g.,
w3, w4, b2).
•Propagate gradients backward to the hidden layer,
compute gradients for its parameters (e.g., w1, w2,
b1).
•Update all weights and biases using gradient
descent.
•This process is repeated for multiple training
iterations until the network's parameters
converge, and the loss reaches a satisfactory
minimum.
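
Putting the forward pass, loss, backward pass, and update together, here is a compact numpy sketch of one training iteration for the 2-2-1 network described above. The initial weights, the input (1.0, 0.5), the target label, and the learning rate are assumed values chosen only for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.5])                  # input (x1, x2)
t = 1.0                                   # target label (assumed)
W1 = np.array([[0.3, -0.1],               # hidden-layer weights (w1, w2 per neuron)
               [0.2,  0.4]])
b1 = np.array([0.0, 0.0])
w2 = np.array([0.5, -0.3])                # output-layer weights (w3, w4)
b2 = 0.0
alpha = 0.1                               # learning rate

# Forward pass
z1 = W1 @ x + b1
h = sigmoid(z1)                           # hidden activations (h1, h2)
z2 = w2 @ h + b2
y = sigmoid(z2)                           # predicted probability

# Cross-entropy loss
loss = -(t * np.log(y) + (1 - t) * np.log(1 - y))

# Backward pass (chain rule)
dz2 = y - t                               # ∂loss/∂z2 for sigmoid + cross-entropy
dw2 = dz2 * h
db2 = dz2
dh = dz2 * w2
dz1 = dh * h * (1 - h)                    # sigmoid derivative at the hidden layer
dW1 = np.outer(dz1, x)
db1 = dz1

# Gradient descent update
w2 -= alpha * dw2; b2 -= alpha * db2
W1 -= alpha * dW1; b1 -= alpha * db1
print(loss)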

UNIT SATURATION
• Unit saturation, also known as saturation of a neural unit, is a phenomenon that occurs when the activation function of a neuron reaches extreme values, typically 0 or 1, and remains there for most input values.
• In other words, the neuron saturates when its input is very large in magnitude (strongly positive or strongly negative), causing the output of the neuron to become insensitive to further changes in input.
• This can pose problems during training because the gradients with respect to the weights may become very small, leading to slow convergence or vanishing gradients.
• Unit saturation is often associated with activation functions like sigmoid and hyperbolic tangent (tanh).

•Sigmoid Activation Function: The sigmoid function
is defined as follows:
•σ(x) = 1 / (1 + exp(-x))
•When x is very large (positive or negative), σ(x) approaches
1 or 0, respectively.
•When x is close to 0, σ(x) is approximately 0.5.
•Example of Unit Saturation:
•Consider a neural network with a sigmoid activation
function and a weight (w) connected to a neuron. Let's say
that during training, the network encounters an input value
(x) of 10 for this neuron:
•x = 10

• Now, let's compute the output of the neuron using the sigmoid function:
• σ(10) ≈ 0.9999546
• At this point, the neuron has effectively saturated. Even small changes in w or x may not significantly affect the neuron's output because the output is already close to 1.
• As a result:
• The gradient with respect to w (needed for weight updates during training) becomes very small, causing slow learning or convergence issues.
• The neuron is not effectively contributing to the learning process since it responds similarly to large variations in input.
• In practice, this phenomenon can lead to challenges in training deep neural networks, especially when using activation functions like sigmoid or tanh. To mitigate unit saturation, other activation functions such as ReLU (Rectified Linear Unit) or variants like Leaky ReLU and Parametric ReLU are often used.
• These activation functions do not saturate for positive inputs and allow gradients to flow more effectively during training, which can lead to faster convergence and better learning.
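
A short Python sketch of this effect, comparing the sigmoid gradient with the ReLU gradient at the saturating input x = 10 used above (the ReLU comparison is added here purely for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = 10.0
s = sigmoid(x)
sigmoid_grad = s * (1.0 - s)           # sigmoid derivative: σ(x) * (1 − σ(x))
relu_grad = 1.0 if x > 0 else 0.0      # ReLU derivative

print(f"sigmoid(10)  = {s:.7f}")            # ≈ 0.9999546 (saturated output)
print(f"sigmoid'(10) = {sigmoid_grad:.7f}") # ≈ 0.0000454 (vanishing gradient)
print(f"ReLU'(10)    = {relu_grad}")        # 1.0 (gradient still flows)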