deep learning UNIT-1 Introduction Part-1.ppt

UNIT-I
•Introduction
•Feed forward Neural networks
•Gradient descent and the back propagation algorithm
•Unit saturation
•The vanishing gradient problem and ways to mitigate it
•ReLU
•Heuristics for avoiding bad local minima
•Heuristics for faster training
•Nesterov's accelerated gradient descent
•Regularization
•Dropout

Introduction to Deep Learning
•Deep learning is a subfield of artificial
intelligence (AI) and machine learning that
focuses on training artificial neural networks to
perform tasks that typically require human
intelligence.
•It has gained widespread attention and made
significant advancements in various
applications, including image recognition, natural
language processing, speech recognition, and
more.

Here are some common types of deep
learning:
Feedforward Neural
Networks (FNNs):
•These are the fundamental
building blocks of deep
learning. FNNs consist of an
input layer, one or more hidden
layers, and an output layer.
•Each layer contains nodes
(neurons) that process and
transform the data.
•FNNs are used for various
tasks, including regression and
classification.
Convolutional Neural
Networks (CNNs):
•CNNs are designed for
processing grid-like data, such
as images and videos.
•They use convolutional layers
to automatically learn features
from local regions of the input,
making them highly effective in
tasks like image classification,
object detection, and image
segmentation.

Common types of deep learning
(contd..)
Recurrent Neural
Networks (RNNs):
•RNNs are designed for
sequential data, such as time
series, text, and speech. They
have feedback connections,
allowing them to maintain a
memory of previous inputs.
•RNNs are suitable for tasks
like natural language
processing (NLP), machine
translation, and speech
recognition.
Long Short-Term Memory
(LSTM)
•LSTMs are a type of RNN
architecture designed to
capture long-range
dependencies in sequential
data more effectively.
•They use specialized memory
cells to store and update
information over longer
sequences, making them
suitable for tasks requiring
understanding of context over
time.

Common types of deep learning
(contd..)
Gated Recurrent Unit
(GRU):
•GRUs are another variant of
RNNs that, like LSTMs, address
the vanishing gradient problem.
•They are computationally more
efficient than LSTMs and are often
used for similar sequence-based
tasks in NLP and speech
recognition.
Autoencoders:
•Autoencoders are neural
networks used for
unsupervised learning and
dimensionality reduction.
•They consist of an encoder
that maps input data to a
lower-dimensional
representation (encoding) and
a decoder that reconstructs the
original data from this
encoding.
•Autoencoders are used in
applications like image
denoising and anomaly
detection.

Common types of deep learning
(contd..)
Generative Adversarial
Networks (GANs):
•GANs consist of two neural
networks, a generator and a
discriminator, that compete
against each other.
•The generator tries to create
data that is indistinguishable
from real data, while the
discriminator tries to tell real
from fake.
•GANs are used for tasks like
image generation, style
transfer, and data
augmentation.
Transformer Models:
•Transformers have
revolutionized natural
language processing (NLP)
and have been adapted to
various other domains.
•They use a self-attention
mechanism to process input
data in parallel, making them
highly scalable and effective
for sequence-to-sequence
tasks.
•Notable transformer-based
models include BERT, GPT
(Generative Pre-trained
Transformer), and T5.

Common types of deep learning
(contd..)
Siamese Networks:
•These networks are designed
for tasks involving similarity or
distance measurement
between pairs of inputs.
•Siamese networks have two
identical subnetworks that
process each input and
produce embeddings that can
be compared to measure
similarity or dissimilarity.
Capsule Networks
(CapsNets):
•CapsNets are designed to
address the shortcomings of
traditional CNNs, especially in
handling pose variations and
hierarchical features in images.
•They use capsules (groups of
neurons) instead of individual
neurons to represent different
parts of an object.

Feed forward Neural networks
•Deep feedforward networks, also called feedforward neural
networks, or multilayer perceptrons (MLPs), are the
quintessential deep learning models.
•The goal of a feedforward network is to approximate some
function f*.
•For example, for a classifier, y = f*(x) maps an input x to a
category y.
•A feedforward network defines a mapping y = f(x; θ) and learns the
value of the parameters θ that result in the best function
approximation.
These models are called feedforward because information flows through
the function being evaluated from x, through the intermediate computations
used to define f, and finally to the output y. There are no feedback
connections in which outputs of the model are fed back into itself. When
feedforward neural networks are extended to include feedback
connections, they are called recurrent neural networks.

Feed forward Neural networks (Contd.)
•Feedforward neural networks are often referred to as "networks"
because they are constructed by combining multiple functions.
•These networks are represented by a directed acyclic graph that
illustrates how these functions are interconnected.
•Typically, they are organized in a sequential manner, with functions
like f^(1), f^(2), and f^(3) linked together in a chain, forming an overall
function f(x) = f^(3)(f^(2)(f^(1)(x))).
•These chain-like structures are the most common configuration for
neural networks. In this context, each function, such as f^(1), f^(2), etc.,
is termed a layer of the network, with f^(1) being the first layer, f^(2) the
second layer, and so forth. The layers before the final one form the hidden layers.
•The overall length of the chain gives the depth of the model. The name
“deep learning” arose from this terminology. The final layer of a
feedforward network is called the output layer.
•Feedforward networks use activation functions to compute the hidden
layer values.
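The chain f(x) = f^(3)(f^(2)(f^(1)(x))) can be sketched in a few lines of plain Python. This is only an illustration: the helper names (`dense`, `compose`) and all weight values are hypothetical and chosen to keep the arithmetic easy to follow, and ReLU is assumed as the hidden activation.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]

def identity(v):
    """No activation, used here for the output layer."""
    return v

def dense(weights, bias, activation):
    """Build one layer: v -> activation(weights @ v + bias)."""
    def layer(v):
        z = [sum(w * x for w, x in zip(row, v)) + b
             for row, b in zip(weights, bias)]
        return activation(z)
    return layer

def compose(*layers):
    """Chain layers so that compose(f1, f2, f3)(x) == f3(f2(f1(x)))."""
    def network(v):
        for layer in layers:
            v = layer(v)
        return v
    return network

# Hypothetical weights, not taken from the slides.
f1 = dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu)   # first hidden layer
f2 = dense([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.5], relu)   # second hidden layer
f3 = dense([[2.0, -1.0]], [0.1], identity)                # output layer

f = compose(f1, f2, f3)   # depth of this model: 3 layers
x = [1.0, 2.0]
y = f(x)
print(y)
```

Each call to `dense` returns one function in the chain, and the number of functions passed to `compose` is the depth of the model.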

Example: Learning XOR
•An example of a fully functioning feedforward network on a
very simple task: learning the XOR function.
•The XOR function (“exclusive or”) is an operation on two
binary values, x1 and x2.
•When exactly one of these binary values is equal to 1, the
XOR function returns 1. Otherwise, it returns 0.
•The XOR function provides the target function y = f*(x) that we
want to learn. Our model provides a function y = f(x; θ), and
our learning algorithm will adapt the parameters θ to make f
as similar as possible to f*.

We want our network to perform correctly on the four points X = {[0, 0],
[0, 1], [1, 0], [1, 1]}.
We will train the network on all four of these points.
The only challenge is to fit the training set.
Evaluated on our whole training set, the MSE loss function is
J(θ) = (1/4) * Σ_{x ∈ X} (f*(x) - f(x; θ))^2
Suppose we choose a linear model, with θ consisting of w and b.
Our model is defined to be
f(x; w, b) = x^T w + b.
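A minimal sketch of what happens when the linear model f(x; w, b) = x^T w + b is fitted to the four XOR points by minimizing the MSE. The starting weights, learning rate, and iteration count are illustrative assumptions, not values from the slides. Under these assumptions the fit settles near w = 0 and b = 0.5, so the model outputs about 0.5 for every input.

```python
# The four XOR training points and their targets.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]

def predict(x, w, b):
    """Linear model f(x; w, b) = x^T w + b."""
    return x[0] * w[0] + x[1] * w[1] + b

def mse(w, b):
    """Mean squared error over the whole training set."""
    return sum((y - predict(x, w, b)) ** 2 for x, y in zip(X, Y)) / len(X)

# Assumed starting point and learning rate (illustrative only).
w, b = [0.3, -0.2], 0.0
lr = 0.1

for _ in range(5000):
    errors = [predict(x, w, b) - y for x, y in zip(X, Y)]
    grad_w0 = 2 * sum(e * x[0] for e, x in zip(errors, X)) / len(X)
    grad_w1 = 2 * sum(e * x[1] for e, x in zip(errors, X)) / len(X)
    grad_b = 2 * sum(errors) / len(X)
    w = [w[0] - lr * grad_w0, w[1] - lr * grad_w1]
    b = b - lr * grad_b

outputs = [predict(x, w, b) for x in X]
print(w, b, outputs, mse(w, b))
```

The remaining loss of about 0.25 cannot be reduced further by any choice of w and b, which is why a nonlinear hidden layer is needed for this task.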

Before the nonlinearity is applied, all the examples lie along a line with slope
1. As we move along this line, the output needs to begin at 0, then rise to 1,
then drop back down to 0. A linear model cannot implement such a function.
To finish computing the value of h for each example, we apply the rectified
linear transformation, after which the examples no longer lie on a single line.
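A small sketch of a one-hidden-layer ReLU network that reproduces XOR exactly. The weights below are an assumption: the slides do not list them, and these are a commonly used hand-picked solution for this example rather than values learned by training. The variable names (`W`, `c`, `w`, `b`) are likewise illustrative.

```python
X = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Hand-picked parameters (assumed, not from the slides).
W = [[1, 1], [1, 1]]   # W[i][j]: weight from input i to hidden unit j
c = [0, -1]            # hidden-layer biases
w = [1, -2]            # output weights
b = 0                  # output bias

def relu(z):
    return max(0, z)

def pre_activation(x):
    """Affine part of the hidden layer, before the ReLU is applied."""
    return [x[0] * W[0][j] + x[1] * W[1][j] + c[j] for j in range(2)]

def network(x):
    h = [relu(z) for z in pre_activation(x)]   # hidden layer values
    return h[0] * w[0] + h[1] * w[1] + b       # linear output layer

pre = [pre_activation(x) for x in X]
outputs = [network(x) for x in X]
print(pre)       # points before the ReLU: they lie on a line with slope 1
print(outputs)   # XOR of each input pair
```

Before the ReLU, the four examples map to points on one line with slope 1; after the ReLU, a linear output layer is enough to separate them.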

GRADIENT DESCENT & BACK
PROPAGATION
•Gradient descent and the backpropagation algorithm are
fundamental techniques used in training artificial neural
networks for various machine learning tasks, including
image recognition, natural language processing, and
more.
•Gradient Descent:
•Gradient descent is an optimization algorithm used to
minimize a loss function by adjusting the parameters
(weights and biases) of a machine learning model
iteratively. The idea is to find the set of parameters that
minimizes the error between the model's predictions and
the actual target values.

Here's a simple example of gradient
descent with a linear regression model:
•Objective: Minimize the mean squared error (MSE) loss for a
linear regression model.
•Linear Regression Model: The model has two
parameters, a weight (w) and a bias (b). It predicts an output
(y_pred) given an input (x) as follows:
•y_pred = w * x + b
•Loss Function: The MSE loss for linear regression is defined
as:
•MSE = (1/n) * Σ(y_i - y_pred_i)^2
•Where:
•n is the number of data points.
•y_i is the actual target for the i-th data point.
•y_pred_i is the predicted output for the i-th data point.
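The prediction and loss formulas above can be written directly in Python. This is a minimal sketch; the data points and parameter values are made up for illustration.

```python
def predict(xs, w, b):
    """y_pred = w * x + b for every input x."""
    return [w * x + b for x in xs]

def mse(y_true, y_pred):
    """MSE = (1/n) * sum((y_i - y_pred_i)^2)."""
    n = len(y_true)
    return sum((yi - pi) ** 2 for yi, pi in zip(y_true, y_pred)) / n

# Illustrative data that follows y = 2x exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

loss_bad = mse(ys, predict(xs, w=1.0, b=0.0))    # errors are 1, 2, 3
loss_good = mse(ys, predict(xs, w=2.0, b=0.0))   # perfect fit
print(loss_bad, loss_good)
```

With w = 1 the squared errors are 1, 4, and 9, so the loss is 14/3; with w = 2 the predictions match the targets and the loss is zero.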

Gradient Descent Algorithm:
1.Initialize w and b with random values.
2.Choose a learning rate (α), which scales the
magnitude of parameter updates during gradient
descent.
3.Repeat until the loss converges to a minimum value:
1.Calculate the gradient of the loss with respect to w and b.
2.Update w and b using the gradient and learning rate:
•w = w - α * ∂(MSE)/∂w
•b = b - α * ∂(MSE)/∂b
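The algorithm above can be sketched for the linear regression model y_pred = w * x + b. The data set, the learning rate of 0.05, and the fixed count of 2000 iterations (used here in place of a convergence check) are assumptions made for illustration. The gradients follow from differentiating the MSE: ∂(MSE)/∂w = (-2/n) * Σ x_i * (y_i - y_pred_i) and ∂(MSE)/∂b = (-2/n) * Σ (y_i - y_pred_i).

```python
import random

# Illustrative data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

def mse(w, b):
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradients(w, b):
    """Partial derivatives of the MSE with respect to w and b."""
    n = len(xs)
    residuals = [y - (w * x + b) for x, y in zip(xs, ys)]
    dw = -2.0 / n * sum(r * x for r, x in zip(residuals, xs))
    db = -2.0 / n * sum(residuals)
    return dw, db

# Step 1: random initialization (seeded so the run is repeatable).
random.seed(0)
w, b = random.uniform(-1, 1), random.uniform(-1, 1)

# Step 2: learning rate (assumed value).
alpha = 0.05

# Step 3: repeated gradient computation and parameter update.
for step in range(2000):
    dw, db = gradients(w, b)
    w -= alpha * dw
    b -= alpha * db

print(w, b, mse(w, b))
```

With this data the parameters approach w = 2 and b = 1. A learning rate that is too large would make the updates diverge instead.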

A simple example of gradient
descent using a one-dimensional
function.
•Suppose we want to minimize the
following quadratic function:
•f(x) = x^2
•The goal is to find the minimum value of
this function using gradient descent.

GD
1.Initialize x with a starting value and choose a learning rate (α).
2.Compute the gradient:
•∂f/∂x = 2x
3.Update x using the gradient and the learning
rate:
•x = x - α * ∂f/∂x
4.Repeat steps 2 and 3 for a specified
number of iterations or until convergence.
•Let's perform a few iterations of gradient
descent:
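A minimal sketch of these iterations, assuming a starting point x = 3.0 and a learning rate α = 0.1 (both are illustrative choices, not values from the slides):

```python
def f(x):
    return x ** 2

def grad_f(x):
    """Derivative of f(x) = x^2."""
    return 2 * x

x = 3.0        # assumed starting value
alpha = 0.1    # assumed learning rate
history = [x]

for i in range(1, 6):
    x = x - alpha * grad_f(x)   # gradient descent update
    history.append(x)
    print(f"iteration {i}: x = {x:.5f}, f(x) = {f(x):.5f}")
```

With these settings each update multiplies x by 1 - 2α = 0.8, so x moves from 3.0 to roughly 2.4, 1.92, 1.536, and so on toward 0.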

As you can see, with each iteration, x
gets closer to 0, which is the minimum
of the function.
This process continues until the
convergence criteria are met or a
specified number of iterations is
reached.
In practice, gradient descent is used
to optimize more complex functions
with high-dimensional parameter
spaces, such as training neural
networks in deep learning.

Back Propagation Algorithm
•Backpropagation is a fundamental
algorithm used for training artificial neural
networks, particularly feedforward neural
networks with multiple layers (also known
as deep neural networks).
•It enables the network to learn from data
by iteratively adjusting its parameters
(weights and biases) to minimize a
predefined loss or error function.

Key Concepts in
Backpropagation:
1.Feedforward Pass: In the feedforward pass, input data is
propagated through the network layer by layer, resulting in an
output prediction. Each neuron in a layer calculates a weighted
sum of its inputs, applies an activation function, and passes the
result to the next layer.
2.Loss Function: A loss function (also known as a cost function)
quantifies the error between the network's predictions and the
actual target values. Common loss functions include mean
squared error (MSE) for regression tasks and cross-entropy for
classification tasks.
3.Backpropagation of Error: After the feedforward pass, the
network computes the gradient of the loss with respect to its
parameters (weights and biases) using the chain rule from calculus.
This gradient information is then used to update the parameters
during the optimization process.

4.Gradient Descent: The optimization
algorithm (usually gradient descent or its
variants) adjusts the network's
parameters in the opposite direction of
the gradient to minimize the loss. The
learning rate determines the step size for
each parameter update.

Example of Backpropagation:
•Let's consider training a feedforward neural
network for binary classification. The network
has one hidden layer with two neurons and
an output layer with a single neuron. We'll
use a simple dataset of two-dimensional
points (x1, x2) and binary labels (0 or 1) for
the example. The network's architecture is as
follows:
•Input layer: 2 neurons (corresponding to x1
and x2)
•Hidden layer: 2 neurons (with sigmoid
activation)
•Output layer: 1 neuron (with sigmoid
activation)

Steps in Backpropagation:
•Forward Pass:
•Input (x1, x2) is fed into the network.
•Calculate the weighted sum and apply the
sigmoid activation in the hidden layer.
•Calculate the weighted sum and apply the
sigmoid activation in the output layer.

•Loss Calculation:
•Compute the loss (e.g., cross-entropy) between the predicted
output and the actual target label.
•Backpropagation:
•Calculate the gradient of the loss with respect to the output
layer's weighted sum, and from it the gradients for that layer's
weights and bias.
•Backpropagate this gradient to the hidden layer and compute
gradients for its parameters.
•Use these gradients to update the weights and biases in both
layers using gradient descent.
•Repeat:
•Repeat the above steps for a batch of training
examples (mini-batch) and iterate through the entire
dataset for multiple epochs.

Here's a simplified example of a
single training iteration:
•Forward Pass:
•Input (x1, x2) = (1.0, 0.5)
•Hidden layer:
•Weighted sum: z1 = w1 * x1 + w2 * x2 + b1
•Activation: a1 = sigmoid(z1)
•Similar calculations for neuron 2 in the hidden layer, giving a2.
•Output layer:
•Weighted sum: z_out = w3 * a1 + w4 * a2 + b2
•Activation: a_out = sigmoid(z_out)
•Loss Calculation:
•Calculate the cross-entropy loss between the
predicted output a_out and the actual label (0 or 1).

•Backpropagation:
•Compute gradients for output layer parameters (e.g.,
w3, w4, b2).
•Propagate gradients backward to the hidden layer,
compute gradients for its parameters (e.g., w1, w2,
b1).
•Update all weights and biases using gradient
descent.
•This process is repeated for multiple training
iterations until the network's parameters
converge, and the loss reaches a satisfactory
minimum.
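A sketch of one such training iteration for the 2-2-1 sigmoid network, using the input (1.0, 0.5). Several things here are assumptions for illustration: the initial parameter values, the label y = 1, the learning rate of 0.5, and the names `v1`, `v2`, `c1` for the second hidden neuron's parameters (the slides only name those of the first). The output activation is called `a_out` to keep it distinct from the hidden activation `a2`. For a sigmoid output with cross-entropy loss, the gradient with respect to the output's weighted sum simplifies to `a_out - y`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(p, x1, x2):
    """Forward pass: returns hidden activations and the output."""
    a1 = sigmoid(p["w1"] * x1 + p["w2"] * x2 + p["b1"])   # hidden neuron 1
    a2 = sigmoid(p["v1"] * x1 + p["v2"] * x2 + p["c1"])   # hidden neuron 2
    a_out = sigmoid(p["w3"] * a1 + p["w4"] * a2 + p["b2"])
    return a1, a2, a_out

def cross_entropy(a_out, y):
    return -(y * math.log(a_out) + (1 - y) * math.log(1 - a_out))

def backward(p, x1, x2, y):
    """Gradients of the loss for every parameter, via the chain rule."""
    a1, a2, a_out = forward(p, x1, x2)
    d_out = a_out - y                       # dLoss/dz_out
    d1 = d_out * p["w3"] * a1 * (1 - a1)    # dLoss/dz1
    d2 = d_out * p["w4"] * a2 * (1 - a2)    # dLoss/dz2
    return {
        "w3": d_out * a1, "w4": d_out * a2, "b2": d_out,
        "w1": d1 * x1, "w2": d1 * x2, "b1": d1,
        "v1": d2 * x1, "v2": d2 * x2, "c1": d2,
    }

# Assumed initial parameters, label, and learning rate.
params = {"w1": 0.1, "w2": -0.2, "b1": 0.0,
          "v1": 0.3, "v2": 0.4, "c1": 0.0,
          "w3": 0.5, "w4": -0.5, "b2": 0.0}
x1, x2, y = 1.0, 0.5, 1
lr = 0.5

loss_before = cross_entropy(forward(params, x1, x2)[2], y)
grads = backward(params, x1, x2, y)
new_params = {k: params[k] - lr * grads[k] for k in params}   # gradient descent
loss_after = cross_entropy(forward(new_params, x1, x2)[2], y)
print(loss_before, loss_after)
```

A real training loop would repeat this over mini-batches and many epochs; a single step is shown so that each gradient can be traced by hand.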

UNIT SATURATION
•Unit saturation, also known as saturation of a neural unit, is a
phenomenon that occurs when the activation function of a neuron
reaches its extreme values (close to 0 or 1 for sigmoid, -1 or 1 for
tanh) and remains there for most input values.
•In other words, the neuron saturates when its input is very
large in magnitude (positive or negative), causing the
output of the neuron to become insensitive to further changes in
input.
•This can pose problems during training because the gradients
with respect to the weights may become very small, leading to
slow convergence or vanishing gradients.
•Unit saturation is often associated with activation functions like
sigmoid and hyperbolic tangent (tanh).

•Sigmoid Activation Function: The sigmoid function
is defined as follows:
•σ(x) = 1 / (1 + exp(-x))
•When x is very large (positive or negative), σ(x) approaches
1 or 0, respectively.
•When x is close to 0, σ(x) is approximately 0.5.
•Example of Unit Saturation:
•Consider a neural network with a sigmoid activation
function and a weight (w) connected to a neuron. Let's say
that during training, the network encounters an input value
(x) of 10 for this neuron:
•x = 10

•Now, let's compute the output of the neuron using the sigmoid function:
•σ(10) ≈ 0.9999546
•At this point, the neuron has effectively saturated. Further changes in w or x
barely affect the neuron's output because the output is already
close to 1.
•As a result:
•The gradient with respect to w (needed for weight updates during training)
becomes very small, causing slow learning or convergence issues.
•The neuron is not effectively contributing to the learning process since it
responds similarly to large variations in input.
•In practice, this phenomenon can lead to challenges in training deep neural
networks, especially when using activation functions like sigmoid or tanh. To
mitigate unit saturation, other activation functions such as ReLU (Rectified
Linear Unit) or variants like Leaky ReLU and Parametric ReLU are often used.
•These activation functions do not saturate for positive inputs and
allow gradients to flow more effectively during training, which can lead to
faster convergence and better learning.
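The effect can be checked numerically. The sketch below compares the sigmoid's output and its derivative, σ'(x) = σ(x) * (1 - σ(x)), with the derivative of ReLU at a few inputs. The chosen input values are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

for x in (0.0, 2.0, 10.0):
    print(f"x = {x:5.1f}  sigmoid = {sigmoid(x):.7f}  "
          f"sigmoid' = {sigmoid_grad(x):.7f}  relu' = {relu_grad(x):.1f}")
```

At x = 0 the sigmoid gradient is at its maximum of 0.25, while at x = 10 it has dropped to roughly 0.00005. The ReLU gradient stays at 1 for every positive input, which is why it does not suffer from saturation on that side.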
