0321204662_lec07_2.ppt

Slide Content

© Negnevitsky, Pearson Education, 2005 1
Lecture 7
Artificial neural networks: Supervised learning
Introduction, or how the brain works
The neuron as a simple computing element
The perceptron
Multilayer neural networks
Accelerated learning in multilayer neural networks
The Hopfield network
Bidirectional associative memories (BAM)
Summary

© Negnevitsky, Pearson Education, 2005 2
Accelerated learning in multilayer neural networks
A multilayer network learns much faster when the sigmoidal activation function is represented by a hyperbolic tangent:

Y^{\tanh} = \frac{2a}{1 + e^{-bX}} - a

where a and b are constants. Suitable values for a and b are: a = 1.716 and b = 0.667.
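As an illustration, here is a minimal NumPy sketch of this activation (the function name and the toy input are mine, not from the slides):

```python
import numpy as np

# Constants suggested on the slide.
A, B = 1.716, 0.667

def tanh_activation(x):
    """Sigmoid represented as a hyperbolic tangent: Y = 2a / (1 + e^(-bX)) - a."""
    return 2.0 * A / (1.0 + np.exp(-B * x)) - A

# The output is zero-centred and saturates near +/-a instead of 0 and 1.
print(tanh_activation(np.array([-10.0, 0.0, 10.0])))  # approx. [-1.71  0.    1.71]
```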

© Negnevitsky, Pearson Education, 2005 3
We can also accelerate training by including a momentum term in the delta rule:

\Delta w_{jk}(p) = \beta \, \Delta w_{jk}(p-1) + \alpha \, y_j(p) \, \delta_k(p)

where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95. This equation is called the generalised delta rule.
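A minimal sketch of this update, assuming α denotes the learning rate (the variable names and the illustrative call are mine):

```python
import numpy as np

def generalised_delta_rule(dw_prev, y_j, delta_k, alpha=0.1, beta=0.95):
    """Weight correction with momentum:
    delta_w(p) = beta * delta_w(p-1) + alpha * y_j(p) * delta_k(p)."""
    return beta * dw_prev + alpha * np.outer(y_j, delta_k)

# Illustrative call: two hidden-layer outputs, one output-layer error gradient.
dw = generalised_delta_rule(np.zeros((2, 1)), np.array([0.5, -0.3]), np.array([0.1]))
```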

© Negnevitsky, Pearson Education, 2005 4
Learning with momentum for operation Exclusive-OR
[Figure: sum-squared error (log scale) and learning rate plotted against epoch; training converges after 126 epochs.]

© Negnevitsky, Pearson Education, 2005 5
Learning with adaptive learning rate
To accelerate the convergence and yet avoid the danger of instability, we can apply two heuristics:
Heuristic 1
If the change of the sum of squared errors has the same algebraic sign for several consecutive epochs, then the learning rate parameter, η, should be increased.
Heuristic 2
If the algebraic sign of the change of the sum of squared errors alternates for several consecutive epochs, then the learning rate parameter, η, should be decreased.

© Negnevitsky, Pearson Education, 2005 6
Adapting the learning rate requires some changes in the back-propagation algorithm.
If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and thresholds are calculated.
If the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05).
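A minimal sketch of this adjustment rule, using the typical constants quoted above; the function name is mine, and it assumes a caller that tracks the sum of squared errors per epoch:

```python
def adapt_learning_rate(sse, sse_prev, eta, ratio=1.04, decrease=0.7, increase=1.05):
    """Adjust the learning rate eta from the change in sum-squared error."""
    if sse > sse_prev * ratio:
        return eta * decrease   # error grew by more than the ratio: slow down
    if sse < sse_prev:
        return eta * increase   # error decreased: speed up
    return eta                  # otherwise leave the learning rate unchanged
```

In the full back-propagation algorithm the weights and thresholds for that epoch are also recalculated when the rate is decreased; the sketch covers only the rate adjustment itself.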

© Negnevitsky, Pearson Education, 2005 7
Learning with adaptive learning rate
[Figure: sum-squared error (log scale) and learning rate plotted against epoch; training converges after 103 epochs.]

© Negnevitsky, Pearson Education, 2005 8
Learning with momentum and adaptive learning rate
[Figure: sum-squared error (log scale) and learning rate plotted against epoch; training converges after 85 epochs.]

© Negnevitsky, Pearson Education, 2005 9
The Hopfield Network
Neural networks were designed on analogy with the brain. The brain's memory, however, works by association. For example, we can recognise a familiar face even in an unfamiliar environment within 100-200 ms. We can also recall a complete sensory experience, including sounds and scenes, when we hear only a few bars of music. The brain routinely associates one thing with another.

© Negnevitsky, Pearson Education, 2005 10
Multilayer neural networks trained with the back-propagation algorithm are used for pattern recognition problems. However, to emulate the human memory's associative characteristics we need a different type of network: a recurrent neural network.
A recurrent neural network has feedback loops from its outputs to its inputs. The presence of such loops has a profound impact on the learning capability of the network.

© Negnevitsky, Pearson Education, 2005 11
The stability of recurrent networks intrigued several researchers in the 1960s and 1970s. However, none was able to predict which network would be stable, and some researchers were pessimistic about finding a solution at all. The problem was solved only in 1982, when John Hopfield formulated the physical principle of storing information in a dynamically stable network.

© Negnevitsky, Pearson Education, 2005 12
Single-layer n-neuron Hopfield network
[Figure: n neurons labelled 1, 2, ..., i, ..., n; input signals x_1, x_2, ..., x_i, ..., x_n feed the neurons, which produce output signals y_1, y_2, ..., y_i, ..., y_n that are fed back as inputs.]

© Negnevitsky, Pearson Education, 2005 13
The Hopfield network uses McCulloch and Pitts neurons with the sign activation function as its computing element:

Y^{\mathrm{sign}} =
\begin{cases}
+1, & \text{if } X > 0 \\
-1, & \text{if } X < 0 \\
Y,  & \text{if } X = 0
\end{cases}
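A one-function sketch of this activation (illustrative only; the name is mine):

```python
def sign_activation(x, y_prev):
    """Sign activation of a Hopfield neuron: +1 for positive input,
    -1 for negative input, and the previous output y_prev when the input is zero."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return y_prev
```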

© Negnevitsky, Pearson Education, 2005 14
The current state of the Hopfield network is determined by the current outputs of all neurons, y_1, y_2, ..., y_n.
Thus, for a single-layer n-neuron network, the state can be defined by the state vector as:

Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}

© Negnevitsky, Pearson Education, 2005 15
In the Hopfield network, synaptic weights between neurons are usually represented in matrix form as follows:

W = \sum_{m=1}^{M} Y_m Y_m^T - M \, I

where M is the number of states to be memorised by the network, Y_m is the n-dimensional binary vector, I is the n × n identity matrix, and superscript T denotes matrix transposition.

© Negnevitsky, Pearson Education, 2005 16
Possible states for the three-neuron Hopfield network
[Figure: cube in (y_1, y_2, y_3) space whose eight vertices are the possible states (±1, ±1, ±1).]

© Negnevitsky, Pearson Education, 2005 17
The stable state-vertex is determined by the weight matrix W, the current input vector X, and the threshold matrix θ. If the input vector is partially incorrect or incomplete, the initial state will converge into the stable state-vertex after a few iterations.
Suppose, for instance, that our network is required to memorise two opposite states, (1, 1, 1) and (-1, -1, -1). Thus,

Y_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \qquad Y_2 = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}

or

Y_1^T = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \qquad Y_2^T = \begin{bmatrix} -1 & -1 & -1 \end{bmatrix}

where Y_1 and Y_2 are the three-dimensional vectors.

© Negnevitsky, Pearson Education, 2005 18
The 3 × 3 identity matrix I is

I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

Thus, we can now determine the weight matrix as follows:

W = Y_1 Y_1^T + Y_2 Y_2^T - 2I =
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} +
\begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} \begin{bmatrix} -1 & -1 & -1 \end{bmatrix} -
2 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} =
\begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{bmatrix}

Next, the network is tested by the sequence of input vectors, X_1 and X_2, which are equal to the output (or target) vectors Y_1 and Y_2, respectively.

© Negnevitsky, Pearson Education, 2005 19
First, we activate the Hopfield network by applying the input vector X. Then, we calculate the actual output vector Y, and finally, we compare the result with the initial input vector X.

Y_1 = \operatorname{sign}\left(
\begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} -
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\right) = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}

Y_2 = \operatorname{sign}\left(
\begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{bmatrix}
\begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} -
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\right) = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}
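The same test can be reproduced numerically. This sketch assumes zero thresholds, as on the slide, and keeps a neuron's previous output when its net input is exactly zero; the third call anticipates the error-correction behaviour described next:

```python
import numpy as np

W = np.array([[0, 2, 2],
              [2, 0, 2],
              [2, 2, 0]])   # weight matrix derived on the previous slide
theta = np.zeros(3)         # thresholds, taken as zero in this example

def recall(x):
    """One synchronous update Y = sign(W X - theta); where the net input is
    exactly zero the neuron keeps its previous output."""
    net = W @ x - theta
    return np.where(net > 0, 1, np.where(net < 0, -1, x)).astype(int)

print(recall(np.array([1, 1, 1])))      # -> [ 1  1  1]  (stable)
print(recall(np.array([-1, -1, -1])))   # -> [-1 -1 -1]  (stable)
print(recall(np.array([-1, 1, 1])))     # -> [ 1  1  1]  (single-error input corrected)
```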

© Negnevitsky, Pearson Education, 2005 20
The remaining six states are all unstable. However, stable states (also called fundamental memories) are capable of attracting states that are close to them.
The fundamental memory (1, 1, 1) attracts unstable states (-1, 1, 1), (1, -1, 1) and (1, 1, -1). Each of these unstable states represents a single error, compared to the fundamental memory (1, 1, 1).
The fundamental memory (-1, -1, -1) attracts unstable states (-1, -1, 1), (-1, 1, -1) and (1, -1, -1).
Thus, the Hopfield network can act as an error correction network.

© Negnevitsky, Pearson Education, 2005 21
Storage capacity of the Hopfield network
Storage capacity is the largest number of fundamental memories that can be stored and retrieved correctly.
The maximum number of fundamental memories M_max that can be stored in the n-neuron recurrent network is limited by

M_{\max} = 0.15 \, n
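For example, a recurrent network of n = 100 neurons can therefore store and retrieve reliably at most M_max = 0.15 × 100 = 15 fundamental memories.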

© Negnevitsky, Pearson Education, 2005 22
Bidirectional associative memory (BAM)
The Hopfield network represents an autoassociative type of memory: it can retrieve a corrupted or incomplete memory but cannot associate this memory with another different memory.
Human memory is essentially associative. One thing may remind us of another, and that of another, and so on. We use a chain of mental associations to recover a lost memory. If we forget where we left an umbrella, we try to recall where we last had it, what we were doing, and who we were talking to. We attempt to establish a chain of associations, and thereby to restore a lost memory.

© Negnevitsky, Pearson Education, 2005 23
To associate one memory with another, we need a recurrent neural network capable of accepting an input pattern on one set of neurons and producing a related, but different, output pattern on another set of neurons.
Bidirectional associative memory (BAM), first proposed by Bart Kosko, is a heteroassociative network. It associates patterns from one set, set A, to patterns from another set, set B, and vice versa. Like a Hopfield network, the BAM can generalise and also produce correct outputs despite corrupted or incomplete inputs.

© Negnevitsky, Pearson Education, 2005 24
BAM operation
[Figure: (a) forward direction - input layer neurons 1, 2, ..., i, ..., n with signals x_1(p), ..., x_n(p) drive output layer neurons 1, 2, ..., j, ..., m producing y_1(p), ..., y_m(p); (b) backward direction - the output signals y_j(p) are fed back to produce the updated inputs x_1(p+1), ..., x_n(p+1).]

© Negnevitsky, Pearson Education, 2005 25
The basic idea behind the BAM is to store pattern pairs so that when n-dimensional vector X from set A is presented as input, the BAM recalls m-dimensional vector Y from set B, but when Y is presented as input, the BAM recalls X.

© Negnevitsky, Pearson Education, 2005 26
To develop the BAM, we need to create a correlation matrix for each pattern pair we want to store. The correlation matrix is the matrix product of the input vector X and the transpose of the output vector, Y^T. The BAM weight matrix is the sum of all correlation matrices, that is,

W = \sum_{m=1}^{M} X_m Y_m^T

where M is the number of pattern pairs to be stored in the BAM.
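A minimal NumPy sketch of the BAM weight matrix with one recall step in each direction (function names are mine, and mapping a zero net input to +1 is a simplification of the sign rule):

```python
import numpy as np

def bam_weights(pairs):
    """W = sum_m X_m Y_m^T for a list of (X, Y) bipolar pattern pairs."""
    return sum(np.outer(x, y) for x, y in pairs)

def recall_y(W, x):
    """Forward pass: given X on layer A, recall Y = sign(W^T X)."""
    return np.where(W.T @ x >= 0, 1, -1)

def recall_x(W, y):
    """Backward pass: given Y on layer B, recall X = sign(W Y)."""
    return np.where(W @ y >= 0, 1, -1)

# Illustrative pair: a 3-dimensional X associated with a 2-dimensional Y.
W = bam_weights([(np.array([1, -1, 1]), np.array([1, -1]))])
print(recall_y(W, np.array([1, -1, 1])))   # -> [ 1 -1]
print(recall_x(W, np.array([1, -1])))      # -> [ 1 -1  1]
```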

© Negnevitsky, Pearson Education, 2005 27
Stability and storage capacity of the BAM
The BAM is unconditionally stable. This means that any set of associations can be learned without risk of instability.
The maximum number of associations to be stored in the BAM should not exceed the number of neurons in the smaller layer.
The more serious problem with the BAM is incorrect convergence. The BAM may not always produce the closest association. In fact, a stable association may be only slightly related to the initial input vector.