Derivation of the gradient descent rule.

38 slides · May 03, 2024

About This Presentation

A topic in neural networks, based on machine learning.


Slide Content

DERIVATION OF THE GRADIENT
DESCENT RULE

•To calculate the direction of steepest descent along the error surface, compute the derivative of E with respect to each component of the weight vector $\vec{w}$.
•This vector derivative is called the gradient of E with respect to $\vec{w}$, written $\nabla E(\vec{w})$:

$$\nabla E(\vec{w}) \equiv \left[\frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n}\right]$$

•Since the gradient specifies the direction of steepest increase of E, the training rule for gradient descent is

$$\vec{w} \leftarrow \vec{w} + \Delta\vec{w}, \qquad \Delta\vec{w} = -\eta\,\nabla E(\vec{w})$$

where $\eta$ is the learning rate; the negative sign moves the weight vector in the direction that decreases E.

•The training rule can also be written in its component form:

$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = -\eta\,\frac{\partial E}{\partial w_i}$$

For the linear unit with squared error $E(\vec{w}) = \tfrac{1}{2}\sum_{d\in D}(t_d - o_d)^2$, this yields the final standard GRADIENT DESCENT update:

$$\Delta w_i = \eta \sum_{d\in D} (t_d - o_d)\,x_{id}$$
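The standard (batch) rule can be sketched in plain Python. This is a minimal illustration, not code from the slides; the function name and learning-rate default are my own choices, and the linear-unit output and weight update follow the formulas above.

```python
def batch_gradient_descent(examples, eta=0.05, epochs=100):
    """Standard (batch) GRADIENT DESCENT for a linear unit.

    examples: list of (x, t) pairs; x is a list of inputs
    (use x[0] = 1.0 for the threshold/bias weight) and t is the
    target output.  Implements Delta_w_i = eta * sum_d (t_d - o_d) * x_id.
    """
    n = len(examples[0][0])
    w = [0.0] * n                                     # initial weight vector
    for _ in range(epochs):
        delta = [0.0] * n                             # accumulate over ALL of D
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))  # linear unit: o = w . x
            for i in range(n):
                delta[i] += eta * (t - o) * x[i]
        w = [wi + di for wi, di in zip(w, delta)]     # update after the full pass
    return w
```

Note that the weights change only once per pass over the whole training set D, which is exactly what the summation in the update rule expresses.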

STOCHASTIC APPROXIMATION TO
GRADIENT DESCENT
•Gradient descent is a strategy for searching through a large or infinite hypothesis space that can be applied whenever
(1) the hypothesis space contains continuously parameterized hypotheses, and
(2) the error can be differentiated with respect to these hypothesis parameters.

•The key practical difficulties in applying gradient descent are
(1) converging to a local minimum can sometimes be quite slow, and
(2) if there are multiple local minima in the error surface, there is no guarantee that the procedure will find the global minimum.

•One common variation on gradient descent
intended to alleviate these difficulties is called
incremental gradient descent, or alternatively
stochastic gradient descent.

•Whereas the standard gradient descent training rule computes weight updates after summing over all the training examples in D, the idea behind stochastic gradient descent is to approximate this gradient descent search by updating weights incrementally, following the calculation of the error for each individual example. For the linear unit this gives the delta rule, $\Delta w_i = \eta\,(t - o)\,x_i$.
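The incremental variant differs from the batch sketch only in where the update happens: inside the per-example loop rather than after a full pass over D. A minimal sketch (names are illustrative):

```python
def stochastic_gradient_descent(examples, eta=0.05, epochs=100):
    """Incremental (stochastic) gradient descent for a linear unit.

    Unlike the batch rule, the weight vector is updated after each
    individual example, using that example's error alone:
    Delta_w_i = eta * (t - o) * x_i  (the delta rule).
    """
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))  # output for this example
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    return w
```

Because each step follows the gradient of a single example's error rather than of E over all of D, the updates are noisier, which is precisely what can help the search escape shallow local minima.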

MULTILAYER NETWORKS AND
THE BACKPROPAGATION ALGORITHM
•Single perceptrons can express only linear decision surfaces; multilayer networks trained by the BACKPROPAGATION algorithm can represent highly nonlinear decision surfaces.

A Differentiable Threshold Unit

The sigmoid threshold unit

•The sigmoid unit computes its output o as

$$o = \sigma(\vec{w}\cdot\vec{x}), \qquad \sigma(y) = \frac{1}{1 + e^{-y}}$$

•$\sigma$ is called the sigmoid or logistic function; its derivative has the convenient form $\frac{d\sigma(y)}{dy} = \sigma(y)\,(1 - \sigma(y))$.
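The sigmoid unit and its derivative can be written directly from these formulas. A small sketch (function names are illustrative):

```python
import math

def sigmoid(y):
    """The sigmoid (logistic) function: sigma(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_unit_output(w, x):
    """Output of a sigmoid unit: o = sigma(w . x)."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(net)

def sigmoid_derivative(y):
    """d sigma / dy = sigma(y) * (1 - sigma(y)) -- the property that
    makes the backpropagation weight-update rules so compact."""
    s = sigmoid(y)
    return s * (1.0 - s)
```

The derivative being expressible in terms of the output itself is what allows backpropagation to reuse each unit's forward-pass output when computing gradients.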

The BACKPROPAGATION Algorithm
•We begin by redefining E to sum the errors over all of the network output units:

$$E(\vec{w}) \equiv \frac{1}{2}\sum_{d\in D}\sum_{k\in outputs} (t_{kd} - o_{kd})^2$$
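As a sketch, the redefined error can be computed directly from its formula (the function name is illustrative):

```python
def network_error(targets, outputs):
    """E = 1/2 * sum over examples d and output units k of (t_kd - o_kd)^2.

    targets, outputs: one list per training example, each holding
    one value per network output unit.
    """
    return 0.5 * sum(
        (t - o) ** 2
        for t_d, o_d in zip(targets, outputs)
        for t, o in zip(t_d, o_d)
    )
```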

ADDING MOMENTUM
•One common variation makes the weight update on the nth iteration depend partially on the update that occurred during the (n−1)th iteration:

$$\Delta w_{ji}(n) = \eta\,\delta_j\,x_{ji} + \alpha\,\Delta w_{ji}(n-1)$$

where $0 \le \alpha < 1$ is the momentum term.

LEARNING IN ARBITRARY ACYCLIC
NETWORKS

Derivation of the
BACKPROPAGATION Rule
•The specific problem we address here is
deriving the stochastic gradient descent rule
implemented by the algorithm

•Stochastic gradient descent involves iterating through the training examples one at a time, for each training example d descending the gradient of the error $E_d$ with respect to this single example.

•In other words, for each training example d every weight $w_{ji}$ is updated by adding to it $\Delta w_{ji}$:

$$\Delta w_{ji} = -\eta\,\frac{\partial E_d}{\partial w_{ji}}, \qquad E_d = \frac{1}{2}\sum_{k\in outputs}(t_k - o_k)^2$$
•To begin, notice that weight $w_{ji}$ can influence the rest of the network only through $net_j = \sum_i w_{ji} x_{ji}$, where $x_{ji}$ is the ith input to unit j. Using the chain rule:

$$\frac{\partial E_d}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j}\,\frac{\partial net_j}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j}\,x_{ji}$$

Case 1: Training Rule for Output Unit
Weights.
•$net_j$ can influence the network only through $o_j$, so

$$\frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j}\,\frac{\partial o_j}{\partial net_j}$$

•With $E_d = \frac{1}{2}\sum_k (t_k - o_k)^2$ we get $\frac{\partial E_d}{\partial o_j} = -(t_j - o_j)$, and since $o_j = \sigma(net_j)$, $\frac{\partial o_j}{\partial net_j} = o_j(1 - o_j)$.

•Finally, we have the stochastic gradient descent rule for output units:

$$\Delta w_{ji} = \eta\,(t_j - o_j)\,o_j(1 - o_j)\,x_{ji} = \eta\,\delta_j\,x_{ji}, \qquad \delta_j \equiv (t_j - o_j)\,o_j(1 - o_j)$$

Case 2: Training Rule for Hidden Unit
Weights

•$net_j$ can influence the network outputs (and therefore $E_d$) only through the units in Downstream(j), the set of units whose inputs include the output of unit j.

•So,

$$\frac{\partial E_d}{\partial net_j} = \sum_{k\in Downstream(j)} \frac{\partial E_d}{\partial net_k}\,\frac{\partial net_k}{\partial net_j} = \sum_{k\in Downstream(j)} -\delta_k\,w_{kj}\,o_j(1 - o_j)$$

•Writing $\delta_j \equiv o_j(1 - o_j)\sum_{k\in Downstream(j)} \delta_k\,w_{kj}$, the hidden unit training rule is again $\Delta w_{ji} = \eta\,\delta_j\,x_{ji}$.
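Putting the two cases together, a one-hidden-layer backpropagation pass can be sketched as follows. This is an illustrative implementation, not the slides' code: the names, the bias handling (a fixed 1.0 input appended at each layer), and the initialization range are my own choices; the delta computations follow Case 1 and Case 2 above.

```python
import math
import random

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def train_backprop(examples, n_in, n_hidden, n_out, eta=0.5, epochs=5000, seed=0):
    """Stochastic-gradient backpropagation for a one-hidden-layer network.

    examples: list of (x, t) pairs, where x and t are lists of floats.
    Returns (hidden weights, output weights); each row's last entry
    is that unit's bias weight.
    """
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
             for _ in range(n_out)]

    for _ in range(epochs):
        for x, t in examples:
            # Forward pass (1.0 appended for the bias weight)
            xi = x + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(ws, xi))) for ws in w_hidden]
            hi = h + [1.0]
            o = [sigmoid(sum(w * v for w, v in zip(ws, hi))) for ws in w_out]

            # Case 1 (output units): delta_k = o_k (1 - o_k) (t_k - o_k)
            d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
            # Case 2 (hidden units): delta_j = h_j (1 - h_j) sum_k delta_k w_kj
            d_hid = [h[j] * (1 - h[j]) *
                     sum(d_out[k] * w_out[k][j] for k in range(n_out))
                     for j in range(n_hidden)]

            # Both cases share the update w_ji <- w_ji + eta * delta_j * x_ji
            for k in range(n_out):
                w_out[k] = [w + eta * d_out[k] * v for w, v in zip(w_out[k], hi)]
            for j in range(n_hidden):
                w_hidden[j] = [w + eta * d_hid[j] * v
                               for w, v in zip(w_hidden[j], xi)]
    return w_hidden, w_out
```

The hidden-unit deltas are computed from the output-unit deltas already in hand, which is why the errors are said to propagate backward through the network.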