Adaline and Madaline.ppt


About This Presentation

Back propagation network


Slide Content

1
Adaline and Madaline
Adaline: Adaptive Linear Neuron
Madaline: Multiple Adaline

2.1 Adaline (Bernard Widrow, Stanford Univ.)
The Adaline forms a linear combination of its inputs:
    y = w^T x = \sum_{j=0}^{n} w_j x_j
where x_0 = 1 carries the bias term (also called the feedback, error, gain, or adjust term) through the weight w_0.
The output is d = f(y).
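A minimal sketch of this forward computation (the function and variable names are illustrative, not from the slides; the bias is folded in as x_0 = 1):

import numpy as np

def adaline_output(w, x):
    """Adaline forward pass: y = w^T x, with x[0] = 1 carrying the bias w[0]."""
    x = np.concatenate(([1.0], x))   # prepend the bias input x_0 = 1
    y = w @ x                        # linear combination y = sum_j w_j x_j
    return y                         # here f(y) = y (linear output unit)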

2
2.1.1 Least Mean Square (LMS) Learning
◎ Input vectors: \{x_1, x_2, \ldots, x_L\}
   Ideal outputs: \{d_1, d_2, \ldots, d_L\}
   Actual outputs: \{y_1, y_2, \ldots, y_L\}
Mean square error:
    \xi = \langle \varepsilon_k^2 \rangle = \frac{1}{L}\sum_{k=1}^{L} \varepsilon_k^2
        = \langle (d_k - y_k)^2 \rangle = \langle (d_k - w^T x_k)^2 \rangle
        = \langle d_k^2 \rangle - 2 p^T w + w^T R w    -- (2.4)
where p = \langle d_k x_k \rangle and R = \langle x_k x_k^T \rangle is the correlation matrix.
Assume the output function: f(y) = y = d.

3
Setting the gradient to zero,
    \frac{d\xi(w)}{dw} = -2p + 2Rw = 0,
obtain
    w^* = R^{-1} p.
Practical difficulties of the analytical formula:
1. Large dimensions: R^{-1} is difficult to calculate.
2. The expected values \langle \cdot \rangle require knowledge of the probabilities.
Idea: find w^* = \arg\min_w \xi(w), where by (2.4)
    \xi(w) = \langle d_k^2 \rangle - 2 p^T w + w^T R w.
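A small NumPy sketch of the analytical solution above, estimating R and p from a finite batch (sample averages stand in for the expectations; all names are illustrative):

import numpy as np

def lms_closed_form(X, d):
    """Closed-form Adaline weights w* = R^{-1} p.

    X: (L, n+1) input vectors (bias component x_0 = 1 already included)
    d: (L,)     ideal outputs
    """
    L = X.shape[0]
    R = (X.T @ X) / L               # correlation matrix R = <x_k x_k^T>
    p = (X.T @ d) / L               # cross-correlation vector p = <d_k x_k>
    w_star = np.linalg.solve(R, p)  # solve R w = p without forming R^{-1} explicitly
    return w_star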

4
The graph of is a
paraboloid. 2
( ) 2
TT
kdR  w w w p w
2.1.2 Steepest Descent

5
Steps:
1. Initialize the weight values w(0).
2. Determine the steepest descent direction. Let
       \Delta w(t) = -\frac{d\xi(w(t))}{dw(t)} = 2(p - R\,w(t)).
3. Modify the weight values:
       w(t+1) = w(t) + \mu\,\Delta w(t),   \mu: step size.
4. Repeat steps 2~3.
No calculation of R^{-1}.
Drawbacks: i) Knowing R and p is equivalent to knowing the error surface in advance. ii) Steepest descent training is a batch training method.
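A sketch of the batch steepest-descent iteration above, assuming R and p have already been estimated (function and parameter names are illustrative):

import numpy as np

def steepest_descent(R, p, mu=0.01, steps=1000):
    """Batch steepest descent on xi(w) = <d^2> - 2 p^T w + w^T R w."""
    w = np.zeros_like(p)              # step 1: initialize the weights
    for _ in range(steps):
        delta_w = 2.0 * (p - R @ w)   # step 2: -grad xi(w) = 2(p - R w)
        w = w + mu * delta_w          # step 3: w(t+1) = w(t) + mu * delta_w
    return w                          # step 4: repeat 2~3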

6
2.1.3 Stochastic Gradient Descent
Approximate \nabla_w \xi(w(t)) = -2(p - R\,w(t)) by randomly selecting one training example at a time.
1. Apply an input vector x_k.
2. Compute the squared error:
       \varepsilon_k^2(t) = (d_k - y_k)^2 = (d_k - w^T(t)\,x_k)^2.
3. Compute its gradient with respect to w(t):
       \nabla_{w(t)} \varepsilon_k^2(t) = -2(d_k - w^T(t)\,x_k)\,x_k = -2\,\varepsilon_k(t)\,x_k.
4. Update the weights:
       w(t+1) = w(t) + 2\mu\,\varepsilon_k(t)\,x_k.
5. Repeat steps 1~4 with the next input vector.
No calculation of R and p.
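A sketch of this per-example LMS loop (illustrative names; mu is the step size, epochs is the number of passes over the data, and the bias is assumed to be included in each input vector):

import numpy as np

def lms_train(X, d, mu=0.01, epochs=10, rng=None):
    """Stochastic LMS: w(t+1) = w(t) + 2 mu eps_k(t) x_k."""
    rng = np.random.default_rng() if rng is None else rng
    L, n = X.shape
    w = np.zeros(n)
    for _ in range(epochs):
        for k in rng.permutation(L):          # 1. apply an input vector x_k
            eps = d[k] - w @ X[k]             # 2. error eps_k = d_k - w^T x_k
            w = w + 2.0 * mu * eps * X[k]     # 3-4. gradient step; no R or p needed
    return w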

7
○ Practical Considerations:
(a) Number of training vectors, (b) Stopping criteria,
(c) Initial weights, (d) Step size
Drawback: time consuming.
Improvement: mini-batch training method.
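For the mini-batch improvement, a hedged one-step sketch that averages the per-example gradients over a small batch before updating (names are illustrative):

import numpy as np

def lms_minibatch_step(w, X_batch, d_batch, mu=0.01):
    """One mini-batch LMS update: average of -grad eps_k^2 over the batch."""
    eps = d_batch - X_batch @ w               # errors for the whole batch
    grad = -2.0 * X_batch.T @ eps / len(eps)  # mean gradient of the squared error
    return w - mu * grad                      # same rule, lower-variance gradient estimate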

2.1.4 Conjugate Gradient Descent
-- Drawback: can only minimize quadratic functions, e.g.,
       f(w) = \frac{1}{2} w^T A w - b^T w + c.
-- Advantage: guaranteed to find the optimum solution in at most n iterations, where n is the size of the matrix A.
A-Conjugate Vectors:
Let A \in R^{n \times n} be a square, symmetric, positive-definite matrix.
The vectors S = \{s(0), s(1), \ldots, s(n-1)\} are A-conjugate if
       s^T(i)\,A\,s(j) = 0,   \forall i \neq j.
* If A = I (identity matrix), conjugacy = orthogonality.

The set S forms a basis for the space R^n.
The solution w \in R^n can be written as
       w = \sum_{i=0}^{n-1} a_i\,s(i).
• The conjugate-direction method for minimizing f(w) is defined by
       w(i+1) = w(i) + \eta(i)\,s(i),   i = 0, 1, \ldots, n-1,
where w(0) is an arbitrary starting vector and \eta(i) is determined by
       \eta(i) = \arg\min_{\eta} f(w(i) + \eta\,s(i)).
How to determine s(i)?
Define the residual r(i) = b - A\,w(i), which points in the steepest descent direction of f(w) (the negative gradient, since \nabla_w f(w) = A\,w - b).
Let
       s(i) = r(i) + \beta(i)\,s(i-1),   i = 1, 2, \ldots, n-1.   (A)

10
Multiply (A) by s^T(i-1) A:
       s^T(i-1)\,A\,s(i) = s^T(i-1)\,A\,(r(i) + \beta(i)\,s(i-1)).
In order to be A-conjugate (s^T(i)\,A\,s(j) = 0 for i \neq j):
       0 = s^T(i-1)\,A\,r(i) + \beta(i)\,s^T(i-1)\,A\,s(i-1),
so
       \beta(i) = -\frac{s^T(i-1)\,A\,r(i)}{s^T(i-1)\,A\,s(i-1)}.   (B)
The vectors s(1), s(2), \ldots, s(n-1) generated by Eqs. (A) and (B) are A-conjugate.
• Desire that evaluating \beta(i) does not require knowing A.
Polak-Ribiere formula:
       \beta(i) = \frac{r^T(i)\,(r(i) - r(i-1))}{r^T(i-1)\,r(i-1)}.
Fletcher-Reeves formula:
       \beta(i) = \frac{r^T(i)\,r(i)}{r^T(i-1)\,r(i-1)}.
* The conjugate-direction method for minimizing \xi(w) = \langle d_k^2 \rangle - 2 p^T w + w^T R w:
Let
       w(i+1) = w(i) + \eta(i)\,s(i),   i = 0, 1, \ldots, n-1,
where w(0) is an arbitrary starting vector and \eta(i) is determined by
       \eta(i) = \arg\min_{\eta} \xi(w(i) + \eta\,s(i)),
with
       s(i) = r(i) + \beta(i)\,s(i-1),
       r(i) = p - R\,w(i),
       \beta(i) = -\frac{s^T(i-1)\,R\,r(i)}{s^T(i-1)\,R\,s(i-1)}.
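A sketch of the conjugate-direction method above for the quadratic \xi(w), assuming R and p are available. It uses the residual r(i) = p - R w(i), the exact line-search step \eta(i) = s^T r / (s^T R s), and the Fletcher-Reeves \beta so that R does not appear in \beta; the names are illustrative:

import numpy as np

def conjugate_gradient(R, p, w0=None, tol=1e-10):
    """Conjugate-direction minimization of xi(w) = <d^2> - 2 p^T w + w^T R w.

    Converges in at most n steps for an n x n symmetric positive-definite R.
    """
    n = len(p)
    w = np.zeros(n) if w0 is None else w0.copy()
    r = p - R @ w                          # steepest-descent direction of xi at w
    s = r.copy()                           # first search direction s(0) = r(0)
    for _ in range(n):
        if r @ r < tol:
            break
        Rs = R @ s
        eta = (s @ r) / (s @ Rs)           # exact line search along s
        w = w + eta * s                    # w(i+1) = w(i) + eta(i) s(i)
        r_new = p - R @ w                  # new residual
        beta = (r_new @ r_new) / (r @ r)   # Fletcher-Reeves formula
        s = r_new + beta * s               # s(i) = r(i) + beta(i) s(i-1)
        r = r_new
    return w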

Nonlinear Conjugate Gradient Algorithm
Initialize w(0) by an appropriate process

Example: A comparison of the convergence of gradient descent (green) and conjugate gradient (red) for minimizing a quadratic function. Conjugate gradient converges in at most n steps, where n is the size of the matrix of the system (here n = 2).

14
2.3. Applications
2.3.1. Echo Cancellation in Telephone Circuits
n: incoming voice, s: outgoing voice
n': noise (leakage of the incoming voice)
y: the output of the filter, which mimics n'

15
Hybrid circuit: deals with the leakage issue; it attempts to isolate the incoming from the outgoing signals.
Adaptive filter: deals with the choppy-speech issue; it mimics the leakage of the incoming voice so that the leakage can be suppressed from the outgoing signal.
    \langle \varepsilon^2 \rangle = \langle (s + n' - y)^2 \rangle
                                  = \langle s^2 \rangle + \langle (n' - y)^2 \rangle + 2\langle s(n' - y) \rangle
                                  = \langle s^2 \rangle + \langle (n' - y)^2 \rangle
since \langle s(n' - y) \rangle = 0 (s is not correlated with y and n').
    \min \langle \varepsilon^2 \rangle = \langle s^2 \rangle + \min \langle (n' - y)^2 \rangle
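An illustrative sketch of how an LMS adaptive filter could realize this cancellation. The tapped-delay-line structure and the names (incoming, line_out, taps, mu) are my own assumptions for the example: the filter input is the incoming voice n, the desired response is the outgoing line signal s + n', and the transmitted signal is the error s + n' - y.

import numpy as np

def echo_cancel(incoming, line_out, taps=32, mu=0.01):
    """LMS echo canceller sketch; returns the cleaned outgoing signal s + n' - y."""
    w = np.zeros(taps)
    buf = np.zeros(taps)                  # tapped delay line of incoming samples
    cleaned = np.zeros(len(line_out))
    for t in range(len(line_out)):
        buf = np.roll(buf, 1)
        buf[0] = incoming[t]
        y = w @ buf                       # filter output mimics the leakage n'
        eps = line_out[t] - y             # error = s + n' - y (what is transmitted)
        w = w + 2.0 * mu * eps * buf      # LMS update drives y toward n'
        cleaned[t] = eps
    return cleaned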

16
2.3.2 Predict Signal
An adaptive filter is trained to predict a signal.
The input used to train the filter is a delayed version of the actual signal; the expected output is the current signal.
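A brief sketch of such a predictor (the tapped-delay-line structure and the names signal, delay, taps, mu are illustrative assumptions): an LMS filter whose input is a delayed window of the signal and whose target is the current sample.

import numpy as np

def train_predictor(signal, delay=1, taps=8, mu=0.01):
    """Adaptive linear predictor: inputs are delayed samples, target is the current sample."""
    w = np.zeros(taps)
    for t in range(delay + taps, len(signal)):
        x = signal[t - delay - taps + 1 : t - delay + 1][::-1]  # delayed input window
        eps = signal[t] - w @ x                                 # current sample is the target
        w = w + 2.0 * mu * eps * x                              # LMS update
    return w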

17
2.3.3 Reproduce Signal

18
2.3.4. Adaptive Beam-Forming Antenna Arrays
Antenna: a spatial array of sensors that are directional in their reception characteristics.
The adaptive filter learns to steer the antennae so that they respond to incoming signals no matter what their directions are, while reducing the response to unwanted noise signals coming in from other directions.

19
2.4 Madaline: Many Adalines
○ XOR function: can it be realized? A single Adaline is a linear unit and cannot separate XOR, but a Madaline (a layered combination of Adalines) can.
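A hedged sketch of this point: two Adalines plus a fixed OR Adaline realize XOR on bipolar inputs. The particular weights below are hand-picked for illustration and are not from the slides.

import numpy as np

def adaline_bipolar(w, x):
    """Thresholded Adaline: bipolar output of w^T [1, x]."""
    return 1.0 if w @ np.concatenate(([1.0], x)) >= 0 else -1.0

# Hand-picked weights (bias first); inputs and outputs are bipolar {-1, +1}.
w1 = np.array([-1.0,  1.0, -1.0])   # fires only for x = (+1, -1)
w2 = np.array([-1.0, -1.0,  1.0])   # fires only for x = (-1, +1)
w_or = np.array([1.0, 1.0, 1.0])    # OR unit combining the two hidden Adalines

def madaline_xor(x):
    h = np.array([adaline_bipolar(w1, x), adaline_bipolar(w2, x)])
    return adaline_bipolar(w_or, h)

for x in ([-1, -1], [-1, 1], [1, -1], [1, 1]):
    print(x, madaline_xor(np.array(x, dtype=float)))   # -1, +1, +1, -1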

20

21
2.4.2. Madaline Rule II (MRII)
○ Training algorithm: a trial-and-error procedure with a minimum-disturbance principle (those nodes that can affect the output error while incurring the least change in their weights should have precedence in the learning process).
○ Procedure:
1. Input a training pattern.
2. Count the number of incorrect values in the output layer.

22
3. For all units on the output layer:
   3.1. Select the first previously unselected error node whose analog output is closest to zero (this node can reverse its bipolar output with the least change in its weights).
   3.2. Change the weights on the selected unit such that the bipolar output of the unit changes.
   3.3. Input the same training pattern.
   3.4. If the number of errors is reduced, accept the weight change; otherwise, restore the original weights.
4. Repeat Step 3 for all layers except the input layer.

23
5. For all units on the output layer:
   5.1. Select the previously unselected pair of units whose outputs are closest to zero.
   5.2. Apply a weight correction to both units in order to change their bipolar outputs.
   5.3. Input the same training pattern.
   5.4. If the number of errors is reduced, accept the correction; otherwise, restore the original weights.
6. Repeat Step 5 for all layers except the input layer.

24
※ Steps 5 and 6 can be repeated with triplets, quadruplets, or longer combinations of units until satisfactory results are obtained.
The MRII learning rule considers networks with only one hidden layer. For networks with more hidden layers, the backpropagation learning strategy to be discussed later can be employed.
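A simplified, hedged sketch of steps 1-4 (single-unit trial adaptation) for a Madaline with one trainable hidden layer of bipolar Adalines and a fixed output layer of Adalines. The names, the eta margin, and the particular weight perturbation used to flip a unit are my own illustrative choices, not Widrow's exact rule.

import numpy as np

def bipolar(v):
    """Hard-limit to bipolar values {-1, +1}."""
    return np.where(v >= 0, 1.0, -1.0)

def forward(W_hidden, W_out, x):
    """Forward pass; a leading 1 in each layer's input carries the bias weight."""
    a_h = W_hidden @ np.concatenate(([1.0], x))   # analog outputs of hidden Adalines
    h = bipolar(a_h)
    a_o = W_out @ np.concatenate(([1.0], h))      # analog outputs of output Adalines
    return a_h, h, bipolar(a_o)

def count_errors(W_hidden, W_out, x, d):
    """Number of incorrect bipolar values in the output layer (step 2)."""
    return int(np.sum(forward(W_hidden, W_out, x)[2] != d))

def mrii_step(W_hidden, W_out, x, d, eta=0.1):
    """One minimum-disturbance pass over the hidden layer for pattern (x, d)."""
    errors = count_errors(W_hidden, W_out, x, d)          # steps 1-2
    if errors == 0:
        return W_hidden
    a_h, _, _ = forward(W_hidden, W_out, x)
    xb = np.concatenate(([1.0], x))
    for j in np.argsort(np.abs(a_h)):                     # 3.1: analog output closest to zero first
        s = np.sign(a_h[j]) if a_h[j] != 0 else 1.0
        trial = W_hidden.copy()
        # 3.2: a small weight change that reverses this unit's bipolar output
        trial[j] -= (a_h[j] + s * eta) * xb / (xb @ xb)
        # 3.3-3.4: keep the change only if it reduces the error count
        if count_errors(trial, W_out, x, d) < errors:
            W_hidden = trial
            errors = count_errors(W_hidden, W_out, x, d)
            if errors == 0:
                break
    return W_hidden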

25
2.4.3. A Madaline for Translation–Invariant
Pattern Recognition

26
○ Relationships among the weight matrices of the Adalines

27
○ Extension: multiple slabs with different key weight matrices for discriminating more than two classes of patterns.