A course on Machine Learning, Chapter 4, Department of Informatics Engineering, University of Coimbra, Portugal, 2023. Shallow Neural Networks. (The file is split to emulate animation in PowerPoint.)

Biological Neurons

Brain: ~10^11 neurons, ~10^4 connections per neuron - parallel computation
Response time: ~10^-3 s for biological neurons, ~10^-9 s for electrical circuits

Main parts of a neuron: dendrites, axon, synapses, cell body
(from DL Toolbox User's Guide)
Biological Network
[Figure: several interconnected biological neurons - axon, cell body, dendrites, synapses]
Artificial Neuron: mathematical model

[Figure: the biological neuron (dendrites, synapses, axon, cell body) mapped to the artificial neuron: the inputs p1 ... pn correspond to the dendrites, the weights w11 ... w1n to the synapses, the sum plus activation function f (net input n, output a) to the cell body, and the output to the axon] (from Brause)
[Figure: a biological network of neurons side by side with the corresponding artificial neurons, each with inputs p1 ... pm, weights w11 ... w1m, net input n and activation f producing the output a] (from Brause)
4.1. Neuron with a single input

[Diagram: input p, weight w, net input n, activation f, output a]

n = wp
a = f(n) = f(wp)

If p = 0, then n = 0 and a = f(0).

f: activation function (or transfer function)
w: input weight
Neuron with a single input and a bias

[Diagram: input p, weight w, bias b (the weight of a constant input 1), summation giving n, activation f, output a]

n = wp + b = [w  b] [p ; 1]
a = f(n) = f(wp + b)

The bias allows the output to be nonzero even if the input is zero.
4.2. Activation functions

Binary (hard limit):
a = hardlim(n)
a = hardlim(wp + b)

The bias b makes horizontal displacements of the step function.
(from DL Toolbox Manual)
Continuous linear:
a = purelin(n) = n
a = purelin(wp + b) = wp + b

The bias b makes horizontal displacements of the linear function.
(from DL Toolbox Manual)
Continuous nonlinear monotone (differentiable):

Unipolar sigmoid:
a = logsig(n) = 1 / (1 + e^(-n))
a = 1 / (1 + e^(-(wp + b)))

The bias b makes horizontal displacements of the sigmoid function.
(from DL Toolbox Manual)
Bipolar sigmoid (hyperbolic tangent):
a = tansig(n) = (e^n - e^(-n)) / (e^n + e^(-n))   (tansig in MATLAB)

The bias b makes horizontal displacements of the sigmoid function.
4.3. Neuron with several inputs

[Diagram: 2 inputs p1, p2 with weights w11, w12, bias b, summation n, activation f, output a]

n = w11 p1 + w12 p2 + b = [w11  w12] [p1 ; p2] + b = Wp + b
a = f(n) = f(Wp + b)
Neuron with R inputs

n = w11 p1 + w12 p2 + ... + w1R pR + b = Wp + b
a = f(Wp + b)
(from DL Toolbox Manual)
Neuron with R inputs: compact notation (vectors and matrices)

[Figure: abbreviated neuron diagram, input vector p (R×1), weight matrix W (1×R), bias b, a = f(Wp + b)] (from DL Toolbox Manual)
4.4. RBF (Radial Basis Function) neuron. Activation function: continuous, nonlinear, non-monotone

One Gaussian RBF neuron with one input and a scale factor b:

n = ||p - w1|| · b
a = e^(-(||p - w1|| · b)^2)

[Figure: Gaussian activation as a function of p for the case w1 = 0]
One Gaussian RBF neuron with R inputs and a scale factor:

a = radbas(n) = radbas(||w - p|| · b)
radbas(n) = e^(-n^2)
(from DL Toolbox Manual)
4.5. Layer of neurons

[Diagram: a layer of two neurons sharing the inputs. With 2 inputs: p1, p2, weights w11, w12, w21, w22, biases b1, b2, activations f, outputs a1, a2. Extended to 3 inputs by adding p3 with weights w13, w23.]
4.5. Layer of neurons: S neurons

n_i = w_i1 p_1 + w_i2 p_2 + ... + w_iR p_R + b_i
a_i = f(n_i) = f(W_i p + b_i),   i = 1, 2, ..., S
Compact notation (matrices)

p = [p1 ; p2 ; ... ; pR],   b = [b1 ; b2 ; ... ; bS],   a = [a1 ; a2 ; ... ; aS]

W = [ w11  w12  ...  w1R ;
      w21  w22  ...  w2R ;
      ...  ...  ...  ... ;
      wS1  wS2  ...  wSR ]      (row index: neuron; column index: input)

a = f(Wp + b)
(from DL Toolbox Manual)
Compact notation (matrices) with layer index

a1 = f1(IW^{1,1} p + b1)

IW: Input Weight Matrix. IW^{1,1}: from origin 1 (second index, the input) to destination 1 (first index, the layer).
LW: Layer Weight Matrix. LW^{i,j}: from origin layer j to destination layer i.
(from DL Toolbox Manual)
4.6. Multilayer network (3 layers)

a1 = f1(IW^{1,1} p + b1)
a2 = f2(LW^{2,1} a1 + b2)
a3 = f3(LW^{3,2} a2 + b3)

a3 = f3(LW^{3,2} f2(LW^{2,1} f1(IW^{1,1} p + b1) + b2) + b3)

This composition can model any nonlinear relation between the input p and the output a3.
(from DL Toolbox UG)
Compact notation

Layer 1 and Layer 2: hidden layers (a hidden layer's output is not seen from the outside). Layer 3: output layer.

a1 = f1(IW^{1,1} p + b1)
a2 = f2(LW^{2,1} a1 + b2)
a3 = f3(LW^{3,2} a2 + b3)

y = a3 = f3(LW^{3,2} f2(LW^{2,1} f1(IW^{1,1} p + b1) + b2) + b3)

This composition can model any nonlinear relation between the input p and the output y.
(from DL Toolbox Manual)
4.7. RBFNN - Radial Basis Function Neural Network

a1_i = radbas(||IW^{1,1}_i - p|| b1_i)
a2 = purelin(LW^{2,1} a1 + b2)

a1_i: i-th element of a1; IW^{1,1}_i: vector composed by the i-th row of IW^{1,1}

R inputs; layer of S1 RBF neurons; linear layer with S2 neurons.
(from DL Toolbox Manual)
4.8. The binary perceptron: training and learning

Learning rule or training algorithm:
- a systematic procedure to modify the weights and the bias of a NN so that it behaves as required

The most common learning approaches:
- supervised learning
- reinforcement learning
- unsupervised learning
Supervised learning
A set of Q examples of correct behavior of the NN is given (inputs p, targets t):
{p1, t1}, {p2, t2}, ..., {pQ, tQ}

Reinforcement learning
The NN receives only a grade (score) that favors good performance.

Unsupervised learning
The NN has only inputs, no target outputs, and learns to categorize them (dividing the inputs into classes, as in clustering).
Binary Perceptron learning

a1 = hardlim(IW^{1,1} p + b1)

For a single layer one may eliminate the layer index: a = hardlim(Wp + b).

R inputs, S neurons (from DL Toolbox U.G.)

The perceptron was the first ANN for which mathematical developments were made. Its training is still illustrative of many issues in the training of any ANN.
W = IW^{1,1} = [ w11  w12  ...  w1R ;
                 w21  w22  ...  w2R ;
                 ...  ...  ...  ... ;
                 wS1  wS2  ...  wSR ]

a_i = hardlim(n_i) = hardlim(w_i^T p + b_i),   w_i^T = [w_i1  w_i2  ...  w_iR] the row i of W

W = [ w_1^T ; w_2^T ; ... ; w_S^T ]     (column j of W: [w_1j ; w_2j ; ... ; w_Sj])

Each neuron divides the input space into two regions.

For a single layer one may eliminate the layer index.
One neuron, two inputs

a = hardlim(Wp + b) = hardlim(w_1^T p + b) = hardlim(w11 p1 + w12 p2 + b)

Decision boundary, n = 0:
n = w_1^T p + b = w11 p1 + w12 p2 + b = 0
p2 = -(w11/w12) p1 - b/w12

One straight line: p2 = m p1 + d
(from DL Toolbox U.G.)
Examples of decision boundaries for one neuron with two inputs, a = hardlim(w_1^T p + b):

W = [1  1], b = 1:  n = p1 + p2 + 1;  boundary p2 = -p1 - 1
W = [0  1], b = 1:  n = p2 + 1;       boundary p2 = -1
W = [1  1], b = 0:  n = p1 + p2;      boundary p2 = -p1

[Plots: each boundary line in the (p1, p2) plane, the weight vector W perpendicular to it and pointing towards the region n > 0, with n < 0 on the other side; further examples with other choices of W and b are shown graphically]
In any problem:
- Draw a boundary straight line.
- Select the weight vector W perpendicular to that boundary, of any magnitude (what matters is its direction and sense). The vector W points to the region n > 0.
- Now calculate the needed b using the boundary equation at a point of the boundary.
Example 4.1: logical OR

p1  p2  p1 OR p2
0   0   0
0   1   1
1   0   1
1   1   1

[Plot: the four input points in the (p1, p2) plane; the origin is False (0), the other three are True (1); a boundary line crosses the axes at (0.5, 0) and (0, 0.5); W points to the region n > 0, n < 0 on the other side]

Choose W perpendicular to the boundary: W = [0.5  0.5].
Compute b at a boundary point, e.g. [0, 0.5]:  0.5·0 + 0.5·0.5 + b = 0  ⟹  b = -0.25
Example 4.1: logical OR (solution network)

[Diagram: perceptron with inputs p1, p2, weights 0.5 and 0.5, bias -0.25 on the constant input 1, output p1 OR p2]
Perceptron with several neurons

a1 = hardlim(IW^{1,1} p + b1)

One decision boundary per neuron. For neuron i the boundary is w_i^T p + b_i = 0.

With S neurons it can classify into 2^S categories.
(from DL Toolbox U.G.)
Example 4.2
Learning rule (automatic learning)

Given a training dataset: correct pairs of {input, target}:
{p1, t1}, {p2, t2}, ..., {pQ, tQ}

Example 4.3:
{p1 = [1; 2], t1 = 1},  {p2 = [-1; 2], t2 = 0},  {p3 = [0; -1], t3 = 0}
[Plot: the three training points in the (p1, p2) plane; point 1 at (1, 2) with t = 1, point 2 at (-1, 2) with t = 0, point 3 at (0, -1) with t = 0]
Initialization of the weights: random, e.g.
w_1(0)^T = [1.0  -0.8]

Neuron without bias.

[Plot: the three points and the boundary defined by W(0); n > 0 on the side W(0) points to (class t = 1), n < 0 on the other side (class t = 0)]

p1 is badly classified: it should be t = 1 but the output is a = 0.
Since the target is 1 and the output is 0, add p1 to the weight vector:
w_1(1) = w_1(0) + p1

[Plot: new weight vector W(1) = W(0) + p1 and the corresponding boundary]

Now p2 is badly classified: it should be t = 0 but the output is a = 1.
w_1(2) = w_1(1) - p2

[Plot: weight vector W(2) = W(1) - p2 and the corresponding boundary]

Now p3 is badly classified: it should be t = 0 but the output is a = 1.
w_1(3) = w_1(2) - p3

[Plot: weight vector W(3) and the final boundary]

All points are now well classified.
(see the numerical calculations in Hagan, 4-10)
Example 4.3 (summary)

[Four plots showing the evolution of the boundary: W(0), then W(1) after correcting p1, W(2) after correcting p2, and W(3) after correcting p3, when all points are correctly classified]
And if a point, when analyzed, is correct? Nothing is changed.

If t = 1 and a = 0, then   w_1(new) = w_1(old) + p     (e = +1)
If t = 0 and a = 1, then   w_1(new) = w_1(old) - p     (e = -1)
If t = a, then             w_1(new) = w_1(old)          (e = 0)

Defining e = t - a, the three cases combine into
w_1(new) = w_1(old) + e·p
Only some problems can be solved by one neuron without bias.
What happens if the neuron has a bias b? The bias is the weight of an input equal to 1!

z = [p ; 1],   θ_1 = [w_1 ; b]
a = hardlim(θ_1^T · z)

θ_1(new) = θ_1(old) + e·z      Perceptron learning rule

(from DL Toolbox U.G)
Perceptron with several neurons

For neuron i, i = 1, ..., S:
e_i = t_i - a_i
θ_i = [w_i ; b_i],   z = [p ; 1]
θ_i(new) = θ_i(old) + e_i·z,   i.e.   θ_i^T(new) = θ_i^T(old) + e_i·z^T

Stacking the rows Θ = [θ_1^T ; θ_2^T ; ... ; θ_S^T] and e = [e_1 ; e_2 ; ... ; e_S]:
Θ(new) = Θ(old) + e·z^T      Perceptron rule!
Does the procedure always converge?
Yes, if a solution exists: see the proof in Hagan, 4-15.

Limitations of the perceptron
- It solves only linearly separable problems.
- All the inputs have the same importance (problem of outliers).

Normalized perceptron rule:
Θ(new) = Θ(old) + e·z^T / ||z||
Example 4.4: exclusive OR, XOR

p1  p2  p1 XOR p2
0   0   0
0   1   1
1   0   1
1   1   0

[Plot: the four points in the (p1, p2) plane; no single straight line separates the two classes - ???]
Conclusions about the perceptron
- The perceptron learning rule is a supervised one.
- It is simple, but powerful.
- With a single layer, the binary perceptron can solve only linearly separable problems.
- Nonlinearly separable problems can be solved by multilayer architectures.
4.9. General Single Layer Networks

4.9.1. The ADALINE, Adaptive Linear Network

R inputs, S neurons, W is S×R.

a_j = purelin(W p_j + b)
a_ij = purelin(w_i^T p_j + b_i),   w_i^T : the i-th row of W

(from DL Toolbox U.G.)
Notation for one layer of S neurons

W (S×R) = [ w11  w12  ...  w1R ; w21  w22  ...  w2R ; ... ; wS1  wS2  ...  wSR ] = [ w_1^T ; w_2^T ; ... ; w_S^T ]

P (R×Q) = [ p_1  p_2  ...  p_Q ],   element p_ij = i-th component of the j-th input vector;  P^T is Q×R

Z ((R+1)×Q) = [ P ; 1  1  ...  1 ] = [ z_1  z_2  ...  z_Q ]     (each input augmented with a 1 for the bias)

Θ (S×(R+1)) = [ W  b ] = [ w_1^T  b_1 ; w_2^T  b_2 ; ... ; w_S^T  b_S ] = [ θ_1^T ; θ_2^T ; ... ; θ_S^T ]
T (S×Q) = [ t_1  t_2  ...  t_Q ] = [ T_1^T ; T_2^T ; ... ; T_S^T ],   element t_ij

A (S×Q) = [ a_1  a_2  ...  a_Q ] = [ A_1^T ; A_2^T ; ... ; A_S^T ],   element a_ij
p_j   - the j-th input vector, of dimension R
p_ij  - i-th component of the j-th input vector
a_j   - j-th output vector, obtained with the j-th input
A_i^T - row vector of the outputs of the i-th neuron for all the inputs from 1 to Q
a_ij  - i-th component of the j-th output, i.e. the output of the i-th neuron
t_j   - j-th target output, when the input p_j is applied
T_i^T - row vector of the targets of the i-th neuron for all the inputs from 1 to Q
t_ij  - i-th component of the j-th target
w_ij  - weight between neuron i and component j of the input vector
Case of one neuron with two inputs

a_1j = purelin(n_1j) = purelin(w_1^T p_j + b_1) = w11 p_1j + w12 p_2j + b_1     (W = w_1^T)

a > 0 if n > 0;   a = 0 if n = 0;   a < 0 if n < 0

Decision boundary (n = 0): divides the plane into two zones, so it can classify linearly separable patterns. The LMSE training optimizes the position of the boundary with respect to the training patterns (an advantage over the perceptron).

(from DL Toolbox UG)
a_1j = b_1 + w_1^T p_j = [w_1^T  b_1] [p_j ; 1] = θ_1^T z_j

For the 1st input, a_11 should equal t_11:
a_11 = w11 p_11 + w12 p_21 + b_1 = θ_1^T z_1 = t_11

3 unknowns (w11, w12, b_1): the 3 needed equations are obtained by the application of 3 inputs z_1, z_2, z_3.
Applying the 1st, 2nd and 3rd inputs:
A_1^T = [a_11  a_12  a_13] = θ_1^T [z_1  z_2  z_3] = θ_1^T Z
and A_1^T should be equal to the target T_1^T = [t_11  t_12  t_13].
A_1^T = T_1^T = θ_1^T Z   ⟹   θ_1^T = T_1^T Z^{-1}

Does (Z)^{-1} exist? Almost never!
What to do to find a solution? Use more inputs (Q > R+1): more equations than unknowns.

One can use the pseudo-inverse of Z instead of the inverse, for a solution with non-null error:
θ_1^T = T_1^T Z^+ = T_1^T Z^T (Z Z^T)^{-1}

Is it possible? Does (Z Z^T)^{-1} exist? What is the computational cost?
It would be better to process the data iteratively.
Minimization of the mean square error: LMSE (Least Mean Square Error)

Supervised training: the network is given a set of Q input-target pairs; for neuron i
{p_1, t_i1}, {p_2, t_i2}, ..., {p_Q, t_iQ}

The obtained output a_ik is compared with the target t_ik, giving the error
e_ik = t_ik - a_ik,   k = 1, ..., Q

The LMSE algorithm adjusts the weights and the bias in order to minimize the mean square error, mse, penalizing in the same way the positive and the negative errors, i.e. minimizing

mse = (1/Q) Σ_{k=1}^{Q} e_ik² = (1/Q) Σ_{k=1}^{Q} (t_ik - a_ik)²
θ_i = [w_i1 ; w_i2 ; ... ; w_iR ; b_i]    ((R+1)×1)
z_j = [p_1j ; p_2j ; ... ; p_Rj ; 1]

a_ij = w_i1 p_1j + w_i2 p_2j + ... + w_iR p_Rj + b_i = θ_i^T z_j = z_j^T θ_i

e_i = [e_i1 ; e_i2 ; ... ; e_iQ],   Σ_{k=1}^{Q} e_ik² = e_i^T e_i

e_ij² = (t_ij - a_ij)² = (t_ij - θ_i^T z_j)(t_ij - z_j^T θ_i) = t_ij² - 2 t_ij z_j^T θ_i + θ_i^T z_j z_j^T θ_i

For a set of Q training pairs, the sum of all the squared errors is developed next.
Consider the concatenated vectors of inputs and outputs:

Z ((R+1)×Q) = [z_1  z_2  ...  z_Q],   Z^T (Q×(R+1)) = [z_1^T ; z_2^T ; ... ; z_Q^T]
T_i^T (1×Q) = [t_i1  t_i2  ...  t_iQ]
A_i^T (1×Q) = [a_i1  a_i2  ...  a_iQ]
θ_i ((R+1)×1) = [w_i1 ; w_i2 ; ... ; w_iR ; b_i]

A_i^T = θ_i^T Z

Q ≫ R+1: more equations than unknowns, there is no exact solution. We look for the one that minimizes the sum of the squared errors (MSE).

e_i^T = T_i^T - A_i^T = T_i^T - θ_i^T Z
Now
F(θ_i) = e_i^T e_i = (T_i - A_i)^T (T_i - A_i) = (T_i^T - θ_i^T Z)(T_i - Z^T θ_i)
       = T_i^T T_i - T_i^T Z^T θ_i - θ_i^T Z T_i + θ_i^T Z Z^T θ_i
       = T_i^T T_i - 2 T_i^T Z^T θ_i + θ_i^T Z Z^T θ_i

This expression must be minimized with respect to θ_i:

gradient: ∇F(θ_i) = ∇(T_i^T T_i - 2 T_i^T Z^T θ_i + θ_i^T Z Z^T θ_i) = -2 Z T_i + 2 Z Z^T θ_i

-2 Z T_i + 2 Z Z^T θ_i = 0   ⟹   Z Z^T θ_i = Z T_i   ⟹   θ_i = (Z Z^T)^{-1} Z T_i

Note:  ∇_x(a^T x) = ∇_x(x^T a) = a ;   ∇_x(x^T A x) = (A + A^T) x = 2 A x  (if A is symmetric)
Is it a minimum? Is it a maximum? Is it a saddle point?

Second derivative:
∂²(e_i^T e_i) / ∂θ_i ∂θ_i^T = ∂(-2 Z T_i + 2 Z Z^T θ_i) / ∂θ_i = 2 Z Z^T

If Z Z^T > 0, there is a unique global minimum.
If Z Z^T ≥ 0, there is a weak global minimum or no stationary point.
Note about the sign of matrices

The sign of a symmetric matrix A is related to the sign of its eigenvalues:
A > 0, positive definite: all eigenvalues are > 0
A ≥ 0, positive semidefinite: eigenvalues are ≥ 0
A ≤ 0, negative semidefinite: eigenvalues are ≤ 0
A < 0, negative definite: eigenvalues are < 0
A is indefinite if it has positive and negative eigenvalues.

Interpretation of the condition of minimum:
F(θ_i) = T_i^T T_i - 2 T_i^T Z^T θ_i + θ_i^T Z Z^T θ_i

If the inputs of the network are random vectors, then the error is random and the objective function to be minimized is
F(θ_i) = E[T_i^T T_i - 2 T_i^T Z^T θ_i + θ_i^T Z Z^T θ_i] = E[T_i^T T_i] - 2 E[T_i^T Z^T] θ_i + θ_i^T E[Z Z^T] θ_i

A correlation matrix is positive definite (>0) or positive semidefinite (≥0). If the inputs applied to the network are uncorrelated, the correlation matrix Z Z^T is diagonal, its diagonal elements being the squares of the inputs. In these conditions the correlation matrix is positive definite, the global minimum exists, and it is unique.

The existence or not of a unique global minimum depends on the characteristics of the input training set.
For a network with S neurons (Q > R+1):

Θ (S×(R+1)) = [ w_1^T  b_1 ; w_2^T  b_2 ; ... ; w_S^T  b_S ] = [ θ_1^T ; θ_2^T ; ... ; θ_S^T ]

T (S×Q) = [ t_1  t_2  ...  t_Q ] = [ T_1^T ; T_2^T ; ... ; T_S^T ],   row i containing the targets of neuron i
Stacking the equations Z Z^T θ_i = Z T_i for the S neurons:
Z Z^T Θ^T = Z T^T   ⟹   Θ^T = (Z Z^T)^{-1} Z T^T,   i.e.   Θ = T Z^T (Z Z^T)^{-1}

Possible? Does (Z Z^T)^{-1} exist? What is the computational cost?

The computation of the inverse, for a high number R of inputs, is difficult. Is there a (computationally) simpler algorithm?
Iterative LMSE algorithm

Initialize the weights (for example randomly); k = 1 (iteration 1), w_ij^k = w_ij^1.

For q from 1 to Q do (case of one neuron i):

1st - compute the neuron output a_iq for the input p_q

2nd - compute the squared error  e_iq² = (t_iq - a_iq)²

3rd - compute the gradient of the squared error with respect to the weights and the bias:
∂e_iq²/∂w_ij = 2 e_iq ∂e_iq/∂w_ij,   for the weights, j = 1, 2, ..., R
∂e_iq²/∂b_i  = 2 e_iq ∂e_iq/∂b_i,    for the bias
Computing the derivatives (case of a linear neuron):

∂e_iq/∂w_ij = ∂(t_iq - a_iq)/∂w_ij = ∂[t_iq - (Σ_{j=1}^{R} w_ij p_jq + b_i)]/∂w_ij = -p_jq,   j = 1, 2, ..., R
∂e_iq/∂b_i  = -1,   for the bias

so we will have
∇e_iq² = -2 e_iq z_q,   with  z_q = [p_1q ; p_2q ; ... ; p_Rq ; 1]
4th - Apply the gradient method to minimize F(θ_i), k being the iteration index:
θ_i^{k+1} = θ_i^k - α ∇F(θ_i^k),   θ_i = [w_i ; b_i]

In the present case F(θ_i) = e_iq² and ∇e_iq² = -2 e_iq z_q = -2 e_iq [p_q ; 1], resulting in

w_i^{k+1} = w_i^k + 2α e_iq p_q
b_i^{k+1} = b_i^k + 2α e_iq
But
(w_i^{k+1})^T = (w_i^k + 2α e_iq p_q)^T = (w_i^k)^T + 2α e_iq p_q^T

For one layer of S neurons:
[w_1^T ; ... ; w_S^T]^{k+1} = [w_1^T ; ... ; w_S^T]^k + 2α [e_1q ; ... ; e_Sq] p_q^T
[b_1 ; ... ; b_S]^{k+1} = [b_1 ; ... ; b_S]^k + 2α [e_1q ; ... ; e_Sq]

Or, more compactly,
W^{k+1} = W^k + 2α e_q p_q^T
b^{k+1} = b^k + 2α e_q
Taking into account that Θ = [W  b] and z_q^T = [p_q^T  1], a more compact form can be written:

Θ^{k+1} = Θ^k + 2α e_q z_q^T

Remark: this LMSE algorithm is also known as the Widrow-Hoff rule (see more in Hagan).
The LMSE iterative algorithm is an approximation of the gradient method, because the gradient computed in each iteration is an approximation of the true gradient.

Its convergence depends on the learning coefficient α. If the successive input vectors are statistically independent, and if θ(k) and z(k) are statistically independent, it converges.

The learning coefficient must verify
0 < α < 1/λ_max
where λ_max is the maximum eigenvalue of the input correlation matrix R = Z Z^T (see more in Hagan, 10-9).
4.9.2. The particular case of the associative memory (one layer without bias)

a_ij = Σ_{k=1}^{R} w_ik p_kj,   a_j = W p_j

a = n = purelin(W p)     [p: R×1, W: S×R, n, a: S×1]

Associative memory: learns Q prototype pairs of input-output vectors
{p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q}

Given an input prototype, the output is the correct one. Given an input approximate to a prototype, the output will also be approximate to the corresponding output: a small change in the input produces a small change in the output.
Supervised training: W P = A and we want A = T, so W P = T.

If R > Q (more inputs than prototypes), P is rectangular, with more rows than columns. We have a system with more unknowns than equations.

If P has maximum rank (its columns are linearly independent), the pseudo-inverse can be used to find an exact solution of the system of equations:

W P = T   ⟹   W = T P^+ = T (P^T P)^{-1} P^T

P^+ is the Moore-Penrose pseudo-inverse.
Solving for W using P^{-1}:
W P = T   ⟹   W P P^{-1} = T P^{-1}   ⟹   W = T P^{-1},   if P is invertible

If R = Q, the number Q of prototypes is equal to the number R of network inputs, and if the prototypes are linearly independent, then P can be inverted.
If R < Q (more prototypes than inputs), P is rectangular, with more columns than rows. We have a system with more equations than unknowns.

In general there is no exact solution. Only an approximate solution can be found, using the Penrose pseudo-inverse, which minimizes the sum of the squared errors:

W P = T   ⟹   W = T P^+ = T P^T (P P^T)^{-1}

Note that the pseudo-inverse does not have the same formula as in the previous case R > Q. It gives the solution that minimizes the sum of the squared errors (see slides 211 and 213).
The formulae of the associative memory are a particular case of the ADALINE: here there is no bias and, as a consequence, instead of the Θ of the ADALINE one has the W of the associative memory.

The recursive version is the same as for the ADALINE:
W^{k+1} = W^k + 2α e_q p_q^T

There is a historic algorithm, Hebb's rule, that has a different form:
W^{k+1} = W^k + t_q p_q^T,   q = 1, 2, ..., Q

The ADALINE rule comes from the mathematical development of the minimization of the squared error (LMSE). Hebb's rule results from an empirical principle proposed by the neurobiologist Hebb in his book The Organization of Behavior, 1949.
The general Gradient Method

Iterative minimization of a general function g(x) with respect to x: at iteration k,
x^{k+1} = x^k - α ∇g(x^k)
α is a constant to be fixed by the user.

This is the gradient method, and it is the basis of many ML learning algorithms.
4.10. One layer with any activation function

[Diagram: neuron i with inputs p_1 ... p_R, weights w_i1 ... w_iR, bias b (weight of the constant input 1), output a, target t]

a = f(n),   e = t - a,   F = e²  (to be minimized)

For the weight w_ij between neuron i and input p_j:
∂F/∂w_ij = 2e ∂e/∂w_ij = 2e·(-1)·(∂a/∂n)(∂n/∂w_ij) = -2·e·f'·p_j

For the bias b:
∂F/∂b = 2e ∂e/∂b = 2e·(-1)·(∂a/∂n)(∂n/∂b) = -2·e·f'·1
Using the chain rule, at iteration k, for the input p_q, q = 1, ..., Q:

∂e_iq/∂w_ij^k = ∂(t_iq - a_iq)/∂w_ij^k = ∂[t_iq - f(n_iq)]/∂w_ij^k = -(∂f/∂n)·(∂n_iq/∂w_ij^k) = -f'·p_jq,   j = 1, 2, ..., R
∂e_iq/∂b_i^k  = -f'·1,   for the bias

The gradient of the squared error, for all the weights and the bias, then becomes
∇e_iq² = 2 e_iq ∇e_iq = -2·e_iq·f'·z_q,   with  z_q = [p_q ; 1]
and the update formula, for a layer of neurons, will be

Θ^{k+1} = Θ^k + 2α·f'·e_q·z_q^T

The Widrow-Hoff algorithm is the particular case of this gradient method when the activation function is linear, so that its derivative is one.

This is the general iterative gradient method for one layer, also known as the LMSE (Least Mean Square Error) method.

After passing through all the inputs (1 ... Q) we say that an epoch of training is complete. The process restarts with the current parameters, again over the inputs (1 ... Q), ending the second epoch. And so on, until the convergence criterion is reached or a fixed maximum number of epochs is attained.
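A sketch of this general one-layer gradient update Θ^{k+1} = Θ^k + 2α f' e_q z_q^T with a logsig activation, whose derivative is f'(n) = a(1 - a) (NumPy; the data, α and the number of epochs are illustrative assumptions, with targets generated from a known neuron):

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

rng = np.random.default_rng(4)
R, S, Q = 2, 1, 100
P = rng.uniform(-1, 1, size=(R, Q))
T = logsig(1.5 * P[0] - 2.0 * P[1] + 0.5)[None, :]   # targets from a known neuron

Theta = np.zeros((S, R + 1))
alpha = 0.5

for epoch in range(500):
    for q in range(Q):
        z = np.append(P[:, q], 1.0)
        a = logsig(Theta @ z)                # layer output
        e = T[:, q] - a                      # error
        fprime = a * (1.0 - a)               # derivative of logsig at n
        Theta += 2 * alpha * np.outer(e * fprime, z)

print(np.round(Theta, 2))                    # approaches [1.5, -2.0, 0.5]
```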
4.11. Synthesis of the learning techniques of a network with a single layer of linear neurons

Consider the unified notation: Θ the parameter matrix (with or without bias) and Z the input matrix (with or without the appended row of 1s). The similarity of the several learning methods becomes clear:

Θ (S×(R+1)) = [W  b],   θ_i^T = [w_i^T  b_i],   z_j = [p_1j ; p_2j ; ... ; p_Rj ; 1]

(from DL Toolbox UG)
Types of problems

Classification (pattern recognition)
• Number of prototypes not greater than the number of inputs, Q ≤ R: pseudo-inverse rule or Hebb's rule.
• Number of prototypes greater than the number of inputs, Q > R: pseudo-inverse (LMSE), or its recursive version.

Function approximation: LMSE (Widrow-Hoff)
1. Perceptron: binary activation functions (hardlim, hardlims), Θ being the matrix of weights and bias:
Θ^{k+1} = Θ^k + e_q z_q^T
Rule with learning coefficient: Θ^{k+1} = Θ^k + α e_q z_q^T

2. Associative memories: linear activation functions, without bias; Θ is here just W and Z is just P.
(i) If the number of prototypes is not higher than the number of characteristics (inputs), Q ≤ R:
Θ = T (Z^T Z)^{-1} Z^T   (pseudo-inverse, if the prototypes are linearly independent)
Θ^{k+1} = Θ^k + t_q z_q^T,  q = 1, ..., Q   (iterative Hebb's rule, see Hagan)
Θ = T Z^T   (non-iterative, batch, Hebb, if the prototypes are orthonormal)
(ii) If Q > R (the most common situation):
Θ = T Z^T (Z Z^T)^{-1}

3. ADALINE: linear neurons, with bias; the number of prototypes is higher than the number of characteristics (inputs), Q > R:
Θ^{k+1} = Θ^k + 2α e_q z_q^T   (LMSE recursive algorithm, Widrow-Hoff)
Θ = T Z^T (Z Z^T)^{-1},  if (Z Z^T)^{-1} exists   (LMSE non-recursive algorithm)
Θ^{k+1} = Θ^k + 2α e_q z_q^T   (recursive; the batch solution is the pseudo-inverse formula above)

Conclusions (one layer)
- The LMSE (or Widrow-Hoff) learning rule is used in the ADALINE (linear neurons) and is a particular case of the gradient method for the LMSE criterion.
- The main advantage of the LMSE is its recursive implementation.
- Great care must be put into the preparation of the training set: its elements should be statistically independent. In practice this is rarely possible.
- The LMSE is very adequate for function approximation and for parameter learning in dynamical systems.
- In classification, the ADALINE solves only linearly separable problems.
4.11. The multilayer network (MLNN)

Layer 1 and Layer 2: hidden layers; Layer 3: output layer.

a1 = f1(IW^{1,1} p + b1),   a2 = f2(LW^{2,1} a1 + b2),   a3 = f3(LW^{3,2} a2 + b3)

y = a3 = f3(LW^{3,2} f2(LW^{2,1} f1(IW^{1,1} p + b1) + b2) + b3)

(from NN Toolbox Manual)
4.11.1. The MLNN for pattern recognition

Example: exclusive OR, XOR

p1  p2  p1 XOR p2
0   0   0
0   1   1
1   0   1
1   1   0

[Plot: the four points in the (p1, p2) plane; no single straight line separates the classes, two boundary lines are needed]

Note: p1 XOR p2 = (p1 OR p2) AND (p1 NAND p2)
Neuron 1, Layer 1

[Plot: boundary line through (0.5, 0) and (0, 0.5); the weight vector points to the region n > 0 (True, a = 1); on the other side n < 0 (False, a = 0)]

For neuron 1 of layer 1, choose W = [2  2] (perpendicular to the boundary).

Calculation of b, at the boundary point (0.5, 0):
2×0.5 + 2×0 + b = 0   ⟹   b = -1

Equivalently, with W = [1  1]:
1×0.5 + 1×0 + b = 0   ⟹   b = -0.5
Neuron 2, Layer 1

Choose a second boundary line crossing the p1 axis at 1.5 and the p2 axis at 1.5; the weight vector W^1_2 points towards the origin, so n > 0 (true, a^1_2 = 1) on the origin side of the line and n < 0 (false, a^1_2 = 0) on the other side.

With W^1_2 = [-2 -2], calculation of b: the boundary point (1.5, 0) must give n = 0:
-2 x 1.5 - 2 x 0 + b = 0,  so  b = 3.

With W^1_2 = [-1 -1]:
-1 x 1.5 - 1 x 0 + b = 0,  so  b = 1.5.
Neurons 1 and 2, Layer 1 (from Hagan&Coll)

The two boundary lines divide the (p1, p2) plane into regions characterised by the first-layer outputs:

a^1_1 | a^1_2 | region
  0   |   0   | does not exist
  1   |   0   | F (false)
  0   |   1   | F (false)
  1   |   1   | V (true): the strip between the two lines, containing the XOR = 1 patterns

(figure: the two lines in the (p1, p2) plane, and the corresponding points plotted in the (a^1_1, a^1_2) plane)
2nd layer (from Hagan&Coll)

In the (a^1_1, a^1_2) plane the remaining task is an AND: the output must be 1 only for a^1_1 = a^1_2 = 1, which is linearly separable, so a single second-layer neuron completes the XOR network.
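A minimal sketch (not from the slides) that wires up the two first-layer neurons designed above (W^1_1 = [2 2], b^1_1 = -1 and W^1_2 = [-2 -2], b^1_2 = 3) with hard-limit activations, plus an assumed AND neuron in the second layer (W^2 = [1 1], b^2 = -1.5), and checks the XOR truth table:

% Hand-designed XOR network with hard-limit neurons.
% Layer-1 parameters are the ones derived above; the layer-2 AND neuron is an assumed choice.
hard = @(n) double(n >= 0);             % hard-limit activation
W1 = [ 2  2;                            % neuron 1, layer 1
      -2 -2];                           % neuron 2, layer 1
b1 = [-1; 3];
W2 = [1 1];  b2 = -1.5;                 % AND of a1(1) and a1(2) (assumed)
P  = [0 0 1 1;                          % p1
      0 1 0 1];                         % p2
a1 = hard(W1*P + b1);                   % first-layer outputs
a2 = hard(W2*a1 + b2)                   % reproduces p1 XOR p2 = [0 1 1 0]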
4.11.2 The MLNN for functions approximation

Example (from Hagan&Coll): a 1-2-1 network with a log-sigmoid hidden layer and a linear output layer,

f^1(n) = \frac{1}{1+e^{-n}},   f^2(n) = n

and nominal parameters

w^1_{1,1} = 10,  w^1_{2,1} = 10,  b^1_1 = -10,  b^1_2 = 10,  w^2_{1,1} = 1,  w^2_{1,2} = 1,  b^2 = 0.
If the input is a ramp, with p going from -2 to 2, what will the output be?

(figure, from Hagan&Coll: the 1-2-1 network of the example driven by the ramp input; the shape of the output a^2 over the same interval is to be sketched)
(from Hagan&Coll)

The two hidden-neuron responses to the ramp are

a^1_1 = \frac{1}{1+e^{-(10p-10)}},   a^1_2 = \frac{1}{1+e^{-(10p+10)}}

and, with the nominal second-layer parameters (w^2 = [1\; 1], b^2 = 0), the network output is their sum.

(figure: a^1_1 is a sigmoid transition centred at p = 1, a^1_2 a sigmoid transition centred at p = -1, and the output a^2 combines them over p in [-2, 2])
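A minimal sketch (plain MATLAB, not the nnd11fa demo) that evaluates the 1-2-1 network of this example, with the nominal parameters given above, over the ramp p in [-2, 2]:

% 1-2-1 network of the example: logsig hidden layer, linear output.
p    = -2:0.01:2;
a1_1 = 1./(1+exp(-(10*p - 10)));        % hidden neuron 1
a1_2 = 1./(1+exp(-(10*p + 10)));        % hidden neuron 2
a2   = 1*a1_1 + 1*a1_2 + 0;             % linear output layer, w2 = [1 1], b2 = 0
plot(p, a1_1, '--', p, a1_2, '--', p, a2)
xlabel('input p'); ylabel('outputs'); legend('a^1_1','a^1_2','a^2')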
Influence of each parameter:
Run the demo nnd11fa.m to explore the function approximation problem. Download all the demos (Neural Network Design Demonstrations, zip file) from http://hagan.okstate.edu/nnd.html (accessed 11 Sept 2023). The number in each file name is the chapter of the book.
(from Hagan&Coll)
4.11.2 The MLNN for functions approximation
With these 4 degrees of freedom it is possible to obtain many
nonlinear mappings between the input and the output!
Very flexible, with many degrees of freedom.
A NN with a sigmoidal hidden layer and a linear output
layer can approximate any function of interest, with any
precision level, provided that the hidden layer contains a
sufficient number of neurons (theoretical result).
The question is to find the weights and biases that produce
a good mapping between a known input data set and a
known output data set, i.e., training the NN for the given data.
4.11.2 The MLNN for functions approximation
4.11.3 The backpropagation algorithm

(figure: the known input p is applied both to the process or function being modelled, which gives the known process output t, the target, and to the NN, which gives the NN output a; a supervisor compares them and forms the error e = t - a)

The goal is to adjust the weights and the biases in order to minimize the error.
Similarly to LMSE, it is an iterative process.
In each iteration a weight is updated according to the rule

w_{New} = w_{Old} - \alpha \frac{\partial F}{\partial w}

i.e. the old value is corrected with the derivative of the criterion with respect to the weight.
This is equivalent to the gradient method to minimize the criterion.

4.11.3 The backpropagation algorithm
Criterion: squared error (quadratic function of the error)

F(W,b) = (t_k - a^M_k)^T (t_k - a^M_k) = e_k^T e_k

k: index of iteration; W: matrix of weights, W = [w_{ij}] with i = 1..S rows and j = 1..R columns; b: vector of biases; a^M: output of the last layer, a^M = [a^M_1\; a^M_2\; \dots\; a^M_{S^M}]^T.

4.11.3 The backpropagation algorithm
In each iteration k, with input p_k, and for layer m:

w^m_{ij}(k+1) = w^m_{ij}(k) - \alpha \frac{\partial F}{\partial w^m_{ij}}(k)

b^m_j(k+1) = b^m_j(k) - \alpha \frac{\partial F}{\partial b^m_j}(k)

with \alpha the learning coefficient. The derivatives are computed with the values of the weights and biases at iteration k. How to compute the derivatives?

4.11.3 The backpropagation algorithm
Remark about notation: it is assumed that we start at iteration 1 with input 1, and
use incremental learning; then the iteration index is equal to the input index all along
an epoch of training. In the next epoch, the iteration index and input index are
reinitialized at 1.

Example of one output (a 2-2-1 network with inputs p_1 and p_2): e = t - a^3_1 and F = e^2,

F(W,b) = (t_k - a^3_1(k))(t_k - a^3_1(k)) = e(k)\,e(k) = e^2(k),   for the input p = [p_1\; p_2]^T.

(figure: the 2-2-1 network, with weights w^1_{11}, w^1_{12}, w^1_{21}, w^1_{22} and biases b^1_1, b^1_2 in layer 1, weights w^2_{11}, w^2_{12}, w^2_{21}, w^2_{22} and biases b^2_1, b^2_2 in layer 2, weights w^3_{11}, w^3_{12} and bias b^3_1 in the output layer, activation functions f^1, f^2, f^3, and the error e formed by comparing a^3_1 with the target)

How to calculate dF/dw^1_{12} ?
(chain rule)

\frac{dF}{dw^1_{12}} = \frac{\partial F}{\partial e}\frac{\partial e}{\partial a^3_1}\frac{\partial a^3_1}{\partial n^3}\left( \frac{\partial n^3}{\partial a^2_1}\frac{\partial a^2_1}{\partial n^2_1}\frac{\partial n^2_1}{\partial a^1_1}\frac{\partial a^1_1}{\partial n^1_1}\frac{\partial n^1_1}{\partial w^1_{12}} + \frac{\partial n^3}{\partial a^2_2}\frac{\partial a^2_2}{\partial n^2_2}\frac{\partial n^2_2}{\partial a^1_1}\frac{\partial a^1_1}{\partial n^1_1}\frac{\partial n^1_1}{\partial w^1_{12}} \right)

= 2e\,(-1)\,f^{3'}(n^3)\left( w^3_{11}\, f^{2'}(n^2_1)\, w^2_{11}\, f^{1'}(n^1_1)\, p_2 \;+\; w^3_{12}\, f^{2'}(n^2_2)\, w^2_{21}\, f^{1'}(n^1_1)\, p_2 \right)

To compute this expression we need the value of the error, the
weights and biases, the derivatives of the activation functions and the
input.
To have these values, a forward pass is needed: given an input and given
a set of values for the weights and biases, we compute all the
intermediate values in the network until the end.
Then we can compute the error.
Then the backpropagation is made considering the intermediate
values computed in the forward pass, and the weights and biases are
updated with the gradients.
The process is repeated for the next input. And so on.
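To make the chain-rule expression concrete, the following sketch (not from the slides; all numeric values and the use of log-sigmoid activations in every layer are assumptions) evaluates dF/dw^1_{12} for a 2-2-1 network with the formula above and checks it against a finite difference:

% Chain-rule gradient dF/dw1_12 for a 2-2-1 network, checked numerically.
sig = @(n) 1./(1+exp(-n));  dsig = @(n) sig(n).*(1-sig(n));
W1 = [0.3 -0.2; 0.5 0.4]; b1 = [0.1; -0.3];     % assumed values
W2 = [0.7 -0.5; 0.2 0.6]; b2 = [0.05; -0.1];
W3 = [0.4 -0.8];          b3 = 0.2;
p  = [0.5; -1.0];  t = 0.7;
% forward pass
n1 = W1*p  + b1;  a1 = sig(n1);
n2 = W2*a1 + b2;  a2 = sig(n2);
n3 = W3*a2 + b3;  a3 = sig(n3);
e  = t - a3;                                    % F = e^2
% chain rule: dF/dw1_12 = 2e*(-1)*f3'(n3)*( w3_11 f2'(n2_1) w2_11 f1'(n1_1) p2
%                                         + w3_12 f2'(n2_2) w2_21 f1'(n1_1) p2 )
g = 2*e*(-1)*dsig(n3) * ( W3(1)*dsig(n2(1))*W2(1,1)*dsig(n1(1))*p(2) ...
                        + W3(2)*dsig(n2(2))*W2(2,1)*dsig(n1(1))*p(2) );
% finite-difference check on w1_12
dw = 1e-6; W1p = W1; W1p(1,2) = W1p(1,2) + dw;
a3p  = sig(W3*sig(W2*sig(W1p*p+b1)+b2)+b3);
g_fd = ((t-a3p)^2 - e^2)/dw;
disp([g g_fd])                                  % the two values should agree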
Analytic development (adapted from Hagan&Coll.)

\frac{\partial F}{\partial w^m_{ij}} = \frac{\partial F}{\partial n^m_i}\,\frac{\partial n^m_i}{\partial w^m_{ij}},   \frac{\partial F}{\partial b^m_i} = \frac{\partial F}{\partial n^m_i}\,\frac{\partial n^m_i}{\partial b^m_i}

Since  n^m_i = \sum_{j=1}^{S^{m-1}} w^m_{ij}\, a^{m-1}_j + b^m_i,

\frac{\partial n^m_i}{\partial w^m_{ij}} = a^{m-1}_j,   \frac{\partial n^m_i}{\partial b^m_i} = 1.

Defining the sensitivity  s^m_i \equiv \frac{\partial F}{\partial n^m_i},

\frac{\partial F}{\partial w^m_{ij}} = s^m_i\, a^{m-1}_j,   \frac{\partial F}{\partial b^m_i} = s^m_i,

so the update rules become

w^m_{ij}(k+1) = w^m_{ij}(k) - \alpha\, s^m_i\, a^{m-1}_j,   b^m_i(k+1) = b^m_i(k) - \alpha\, s^m_i.

(figure: neurons j and i of two consecutive layers, with the weight w^m_{ij} connecting the output a^{m-1}_j to neuron i of layer m)
In matrix form, with the sensitivity vector

s^m \equiv \frac{\partial F}{\partial n^m} = \left[ \frac{\partial F}{\partial n^m_1}\;\; \frac{\partial F}{\partial n^m_2}\;\; \dots\;\; \frac{\partial F}{\partial n^m_{S^m}} \right]^T,

W^m(k+1) = W^m(k) - \alpha\, s^m (a^{m-1})^T,   b^m(k+1) = b^m(k) - \alpha\, s^m.

The LMSE rule is similar, with s = -2 e(k):

W(k+1) = W(k) + 2\alpha\, e(k)\, p^T(k),   b(k+1) = b(k) + 2\alpha\, e(k).

In fact, in the ADALINE network,

F(k) = e^T(k)\, e(k) = (t - a(k))^T (t - a(k)) = (t - n(k))^T (t - n(k))

\frac{\partial F}{\partial n}(k) = -2 (t - n(k)) = -2 (t - a(k)) = -2 e(k).
How does n^{m+1} depend on n^m ?

(figure: two neurons of layer m feeding two neurons of layer m+1 through the weights w^{m+1}_{ij})

\frac{\partial n^{m+1}}{\partial n^m} is the S^{m+1} \times S^m matrix whose (i,j) entry is \frac{\partial n^{m+1}_i}{\partial n^m_j}.

Taking, for example, the first neuron of layer m+1:

n^{m+1}_1 = w^{m+1}_{11} a^m_1 + w^{m+1}_{12} a^m_2 + b^{m+1}_1 = w^{m+1}_{11} f^m(n^m_1) + w^{m+1}_{12} f^m(n^m_2) + b^{m+1}_1

\frac{\partial n^{m+1}_1}{\partial n^m_1} = w^{m+1}_{11} \frac{\partial f^m(n^m_1)}{\partial n^m_1} = w^{m+1}_{11} f^{m'}(n^m_1),   \frac{\partial n^{m+1}_1}{\partial n^m_2} = w^{m+1}_{12} f^{m'}(n^m_2).
In general,

\frac{\partial n^{m+1}_i}{\partial n^m_j} = w^{m+1}_{ij}\, f^{m'}(n^m_j).

Or, in matrix notation,

\frac{\partial n^{m+1}}{\partial n^m} = W^{m+1} \dot F^m(n^m),   with   \dot F^m(n^m) = diag\big( f^{m'}(n^m_1),\, f^{m'}(n^m_2),\, \dots,\, f^{m'}(n^m_{S^m}) \big).
Writing \frac{\partial F}{\partial n^m} as a column vector and applying the chain rule through the Jacobian of the previous slide,

s^m = \frac{\partial F}{\partial n^m} = \left( \frac{\partial n^{m+1}}{\partial n^m} \right)^T \frac{\partial F}{\partial n^{m+1}} = \dot F^m(n^m)\, (W^{m+1})^T\, s^{m+1}.

The sensitivities are computed retroactively (backwards); for M layers,

s^M \rightarrow s^{M-1} \rightarrow \dots \rightarrow s^2 \rightarrow s^1.
How to compute s^M (the output layer) ?

F = \sum_{j=1}^{S^M} (t_j - a^M_j)^2

s^M_i = \frac{\partial F}{\partial n^M_i} = -2 (t_i - a^M_i) \frac{\partial a^M_i}{\partial n^M_i},   and since   \frac{\partial a^M_i}{\partial n^M_i} = f^{M'}(n^M_i),

s^M_i = -2 (t_i - a^M_i)\, f^{M'}(n^M_i).

In matrix form,

s^M = -2\, \dot F^M(n^M)\,(t - a) = -2\, \dot F^M(n^M)\, e.
Resume of the backpropagation algorithm
(one epoch of incremental learning: update after each input; training set with Q inputs)

Initialize the weights and the biases W(1), b(1).

For k = 1 to Q:

  Forward phase, made with the input p_k:
    a^0 = p_k
    For j = 1, 2, ..., M do:  a^j = f^j( W^j a^{j-1} + b^j )
    a = a^M   (NN output)

  Backward phase:
    s^M = -2 \dot F^M(n^M) (t - a)
    For m = M-1, M-2, ..., 2, 1 do:  s^m = \dot F^m(n^m) (W^{m+1})^T s^{m+1}

  Update, for every layer m:
    W^m(k+1) = W^m(k) - \alpha\, s^m (a^{m-1})^T
    b^m(k+1) = b^m(k) - \alpha\, s^m

End for
The backpropagation algorithm is similar to LMSE (which
can be considered a particular case of backpropagation for
a single layer); it uses the gradient descent method.
To compute the gradient, the sensitivities must be backpropagated.
This backpropagation is made iteratively.
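A compact sketch of one epoch of the algorithm above for a 1-S^1-1 network with a log-sigmoid hidden layer and a linear output layer (the training data, hidden-layer size and learning rate are assumptions for illustration):

% One epoch of incremental backpropagation for a 1-S1-1 network
% (logsig hidden layer, linear output); data and alpha are assumed.
sig = @(n) 1./(1+exp(-n));
P = linspace(-2,2,41); T = 1 + sin(pi/4*P);     % assumed training set (Q = 41)
S1 = 2; alpha = 0.1;
W1 = randn(S1,1); b1 = randn(S1,1);             % initialize W(1), b(1)
W2 = randn(1,S1); b2 = randn;
for k = 1:numel(P)
    % forward phase
    a0 = P(k);
    n1 = W1*a0 + b1;  a1 = sig(n1);
    n2 = W2*a1 + b2;  a2 = n2;                  % linear output layer
    e  = T(k) - a2;
    % backward phase: sensitivities
    s2 = -2*1*e;                                % F2'(n2) = 1 for the linear layer
    s1 = diag(a1.*(1-a1)) * W2' * s2;           % F1'(n1) = diag of logsig derivatives
    % update
    W2 = W2 - alpha*s2*a1';   b2 = b2 - alpha*s2;
    W1 = W1 - alpha*s1*a0';   b1 = b1 - alpha*s1;
end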
4.11.4 Learning styles

a) Incremental learning
One input at a time is presented to the network, and the weights
and biases are updated after each input is presented. There are
several ways to do it:

net.trainFcn = 'trainc'; net = train(net,P,T): "trainc trains a network with weight and bias
learning rules with incremental updates after each presentation of
an input. Inputs are presented in cyclic order."

net.trainFcn = 'trainr'; net = train(net,P,T): "trainr trains a network with weight and bias
learning rules with incremental updates after each presentation of
an input. Inputs are presented in random order."
When these learning methods are used, the algorithms must be iterative
and they are implemented in the toolbox with names starting with learn, as
for example:
learnp – perceptron rule
learngd – gradient rule
learngdm – gradient rule improved with momentum (see help)
learnh – Hebb rule
learnhd – Hebb rule with decaying weight (see help)
learnwh – Widrow-Hoff learning rule
The learning function is specified by
net.adaptFcn = 'learngd', for example.
Incremental learning can also be done by
net = adapt(net, P, T), but it is mandatory in this case that P and T be cell
arrays (not matrices, as in the previous methods).
b) Batch training
net.trainFcn = 'trainb'; net = train(net,P,T): "trainb trains a network with weight and
bias learning rules with batch updates. The weights and biases are
updated at the end of an entire pass through the input data."
net = train(net,P,T): train by default is in batch mode.
In these methods the algorithms are implemented in batch mode, and their names start with train, as for example:
traingd – gradient descent
traingda – gradient descent with adaptive learning rate
trainlm – Levenberg-Marquardt
trainscg – scaled conjugate gradient
Note that learngd and traingd both implement the gradient descent
technique, but in different ways. The same for similar names. However,
trainlm has no incremental implementation, only batch. The training
functions are specified by net.trainFcn = 'trainlm', for example.
W^m(k+1) = W^m(k) - \alpha\, s^m (a^{m-1})^T
b^m(k+1) = b^m(k) - \alpha\, s^m

Note that one can write

\Theta^m(k+1) = \Theta^m(k) - \alpha\, s^m (z^{m-1})^T

where, as before,

\Theta^m = [\, W^m \;\; b^m \,],   z^{m-1} = \begin{bmatrix} a^{m-1} \\ 1 \end{bmatrix}.
Run the demonstration nnd11bc.m (Example 11.14 of Hagan).

Other gradient-based algorithms to prevent local minima
and improve convergence:
Levenberg-Marquardt backpropagation – trainlm
Bayesian regularization backpropagation – trainbr
Scaled conjugate gradient backpropagation – trainscg
Resilient backpropagation – trainrp
See the DL Toolbox User's Guide and Chapter 9 of Hagan&Coll.
4.11.5 Some suggestions for practical implementation

Choice of architecture
How many layers ?
How many neurons in each layer ?
... there is no generic answer.
Demo: Function approximation, nnd11fa.m
An empirical rule: to prevent overfitting, the number of weights +
biases should not be greater than the number of input patterns available for training.
Convergence
The problem of local minima
Demo: Steepest descent backprop #1, nnd12sd1.m
Influence of the learning rate
Demo: Steepest descent backprop #2, nnd12sd2.m
To adapt the learning rate
Demo: Variable learning rate, nnd12vl.m
>> nnd shows a GUI for all the demos
Generalization capability
After training, if it was efficient, the NN reproduces the training data well.
And for other (new) data? Does it generalize the input-output mapping?
For a good compromise between precision in the training
and generalization, the NN should have a number of
parameters (weights + biases) lower than the number of data
points in the training set. This is a guideline.
Demo: Generalization, nnd11gn.m
Good practices:
Divide the available data into three parts:
Training set, the largest, e.g. 70% of the data.
Validation set, e.g. 15% of the data. While training, one
verifies whether the error on this set diminishes; when this error
increases from one iteration to the next, the network is
entering the overfitting condition, and the training should
be stopped.
Test set, e.g. 15%, where the NN performance will be
analyzed after the training is finished.
How to divide the dataset? Randomly (Matlab: dividerand)
or by successive blocks (Matlab: divideblock, divideind).
Create the MLNN with net = feedforwardnet(...) or net = network(...).
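As a sketch of these good practices with the toolbox functions mentioned above (the hidden-layer size, the synthetic data and the division ratios are assumptions):

% Create, divide and train a shallow MLNN following the good practices above.
P = rand(2,200); T = sum(P.^2,1);               % assumed input/target data
net = feedforwardnet(10);                       % one hidden layer, 10 neurons (assumed)
net.divideFcn = 'dividerand';                   % random division of the data
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;              % training stops when the validation error rises
net.divideParam.testRatio  = 0.15;
net.trainFcn = 'trainlm';                       % Levenberg-Marquardt, batch training
net = train(net, P, T);
Y = net(P);                                     % network outputs after training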
Illustration of the usefulness of the validation

(figure from nnet_ug.pdf: training and validation error curves along the epochs; the overtraining starts where the validation error begins to rise)
4.12. Conclusions MLNN
The backpropagation algorithm is the multilayer version of
the gradient method.
Since the surface of the criterion has local minima, the
convergence to a global minimum is a critical issue.
There are several improvements of the algorithm aiming to
improve its convergence properties (see Chap. 12 of Hagan, for
example). Basically, the training methods that improve on the plain
gradient either use second-order information (the second derivative, or
the Hessian), building the family of quasi-Newton methods, or
combine successive gradients in order to improve
convergence, as in the conjugate gradient family.
Comparison of the training algorithms

Single layer, gradient, any function f:
\Theta_{k+1} = \Theta_k + 2\alpha\, e_k\, f'(n_k)\, z_k^T

Multilayer, backpropagation:
\Theta^m(k+1) = \Theta^m(k) - \alpha\, s^m(k)\, (z^{m-1})^T

Widrow-Hoff (gradient, linear function):
\Theta_{k+1} = \Theta_k + 2\alpha\, e_k\, z_k^T

Hebb rule:
\Theta_{k+1} = \Theta_k + t_k\, z_k^T

Perceptron rule:
\Theta_{k+1} = \Theta_k + e_k\, z_k^T

Unified notation: \Theta is the matrix of parameters and z is the vector of inputs of the layer; if there is no
bias, the b disappears from \Theta and the 1 disappears from z.
Note: in this notation the parameters are initialized at \Theta_1 and afterwards updated with
the data z_1, z_2, ... in an incremental learning way.
4.13 RBF Neural Networks
4.13.1. Architecture
4.13.2. Training
4.13.3. Comparison with the backpropagation
4.13.4. Conclusion

4.13.1. Architecture

(figure: R inputs feeding a layer of S^1 radial basis neurons, followed by a linear layer with S^2 neurons)
One RBF neuron with one input (w = 0, b = 1)

a = e^{-p^2}

radbas(p) calculates its output according to: a = exp(-p^2)

p = -3:.1:3;
a = radbas(p);
plot(p,a)
title('Gaussian Radial Basis Function');
xlabel('Input p');
ylabel('Output a');

(figure: the Gaussian bell, output a versus input p, centred at p = 0)

Locality of the RBF function: the neuron responds significantly only for inputs near the centre.
One neuron with a triangular RBF with one input

tribas(n) = 1 - abs(n),  if -1 <= n <= 1
          = 0,           otherwise

(figure: the triangular basis function, output a versus input p; it also illustrates the locality of the RBF function)
One RBF Gaussian neuron not centered (w_1 = 1.5)

The net input is the distance d between the input p and the weight w_1, d = ||p - w_1||, so

a = e^{-(p-1.5)^2} = e^{-d^2}

(figure: the Gaussian bell now centred at p = w_1 = 1.5; the locality of the RBF function is preserved)
One RBF Gaussian neuron with one input and a scale factor b

a = e^{-(\,||p - w_1||\, b\,)^2}

(figure: with w_1 = 0, the Gaussian bell plotted against (input p - w_1) for scale factors b = 0.5, 1 and 2)

The bigger b, the more local the neuron.
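A minimal sketch (plain MATLAB, values assumed) showing the effect of the scale factor b on the locality of one Gaussian neuron centred at w_1 = 0:

% Effect of the scale factor b: bigger b, more local response.
p = -3:0.01:3;  w1 = 0;
for b = [0.5 1 2]
    a = exp(-((abs(p - w1)*b).^2));             % a = exp(-(||p-w1|| b)^2)
    plot(p, a); hold on
end
hold off; xlabel('input p'); ylabel('output a'); legend('b = 0.5','b = 1','b = 2')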
Three RBF Gaussian neurons with one input and one output

Scale factors equal to 1; centres w_1 = -1.5, w_2 = 0, w_3 = 1.5:

a_1 = e^{-((p-(-1.5))\cdot 1)^2},   a_2 = e^{-((p-0)\cdot 1)^2},   a_3 = e^{-((p-1.5)\cdot 1)^2}

(figure: the three Gaussian bells centred at -1.5, 0 and 1.5)
Sum of the three outputs

With the three output weights equal to 1, the network output is a = a_1 + a_2 + a_3.

(figure: the three individual responses a_1, a_2, a_3 and their sum, plotted against the input p)
Weighted sum of the three outputs with weighting factors [1 1 0.5]:  a = a_1 + a_2 + 0.5 a_3

Weighted sum of the three outputs with weighting factors [10 2 -1]:  a = 10 a_1 + 2 a_2 - a_3

Sum of the three outputs with scale factors [1 2 1] (b_1 = 1, b_2 = 2, b_3 = 1):  a = a_1 + a_2 + a_3
How to weight the outputs ?

With a second, linear layer: its weights w^2_1, w^2_2, w^2_3 multiply the RBF outputs and a bias b^2_1 is added.

(figure: RBF layer with three neurons, centres w^1_1, w^1_2, w^1_3 and scale factors b^1_1, b^1_2, b^1_3, feeding a linear layer with weights w^2_1, w^2_2, w^2_3 and bias b^2_1)
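A minimal sketch (plain MATLAB; the centres, scale factors and output weights are taken from the examples above, the rest is assumed) of the RBF layer followed by the linear layer:

% Three Gaussian neurons (centres -1.5, 0, 1.5, scale factors 1) + linear layer.
p  = -4:0.01:4;
w1 = [-1.5; 0; 1.5];  b1 = [1; 1; 1];           % RBF centres and scale factors
A  = exp(-((abs(p - w1)).*b1).^2);              % 3 x numel(p) matrix of RBF outputs
w2 = [1 1 0.5];  b2 = 0;                        % linear-layer weights, e.g. [1 1 0.5]
a  = w2*A + b2;                                 % network output
plot(p, A', '--', p, a); xlabel('input p'); ylabel('output')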
Function of two arguments, a = f(p_1, p_2), one neuron

p = [p_1\; p_2]^T,   W^1 = [0,\,0]^T,   a = e^{-(\,||p - w^1||\, b\,)^2}

(figure: three-dimensional radial basis function, the Gaussian surface over the inputs p_1 and p_2, output a between 0 and 1, centred at the origin)
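A minimal sketch (plain MATLAB, grid assumed) reproducing the surface of one Gaussian neuron with two inputs, centred at W^1 = [0, 0]^T with b = 1:

% One Gaussian RBF neuron of two inputs: a = exp(-(||p - w1|| b)^2)
[P1, P2] = meshgrid(-2:0.1:2, -2:0.1:2);
w1 = [0; 0];  b = 1;
D  = sqrt((P1 - w1(1)).^2 + (P2 - w1(2)).^2);   % distance ||p - w1|| at each grid point
A  = exp(-(D*b).^2);
surf(P1, P2, A); xlabel('input p1'); ylabel('input p2'); zlabel('output a')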
Functions of two arguments, a = f(p_1, p_2), three neurons

(figure: RBF layer with three neurons, centres w^1_1, w^1_2, w^1_3 and scale factors b^1_1, b^1_2, b^1_3, receiving the two-dimensional input p, followed by a linear layer with weights w^2_1, w^2_2, w^2_3 and bias b^2_1; the resulting output surface is a weighted sum of three Gaussian bumps over the (p_1, p_2) plane)
Great flexibility
Capacity to approximate a great variety of functions with
only three neurons.
A NN adequate for approximating relational functions
(explicit or not).
Any function may be approximated by an RBF NN with an
arbitrary precision, if the neurons in the RBF layer are in
sufficient number.
(figure: the complete network, with R inputs, an RBF layer with S^1 neurons and a linear layer with S^2 neurons)
Parameters of the network to be trained:
Centers of the radial functions
Openness of the radial functions (the radius)
Weights of the linear layer
Bias of the linear layer
... several learning algorithms.
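As a sketch of how these parameters can be obtained with the toolbox (the data, error goal and spread below are assumptions; newrb adds RBF neurons one at a time and solves the linear layer until the error goal is met):

% Design an RBF network with the toolbox function newrb (assumed data).
P = -3:0.1:3;  T = sin(P) + 0.05*randn(size(P));    % assumed training data
goal = 0.01;  spread = 0.8;                         % error goal and RBF spread (assumed)
net = newrb(P, T, goal, spread);                    % adds RBF neurons until the goal is met
Y = net(P);                                         % or sim(net, P)
plot(P, T, '.', P, Y)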
(figure: supervised training setup for the RBF network; the input is applied both to the process or function being modelled, giving the target t, and to the RBF network (three Gaussian neurons followed by the linear layer with weights w_1, w_2, w_3 and bias b^2_1), giving the output a; the error e = t - a drives the training)
4.13.2. Training of the RBFNN

RBF layer: clustering techniques
- look for the optimal placement of the centers of the Gaussians in the n-dimensional space;
- determine the convenient openness (variance).

Linear layer: LMSE or RLS (recursive least squares). Once the RBF layer is fixed, one of the single-layer learning algorithms may be applied.
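A minimal sketch of this two-stage training (centres by clustering, linear layer by least squares); kmeans and pdist2 are from the Statistics and Machine Learning Toolbox, and all numeric choices are assumptions:

% Two-stage RBF training: k-means for the centres, least squares for the linear layer.
P = [randn(2,100)*0.3 + [1;1], randn(2,100)*0.3 - [1;1]];   % assumed 2-D input data
T = sum(P.^2,1);                                            % assumed targets
S1 = 6; b = 1.5;                                            % number of RBF neurons, scale factor (assumed)
[~, C] = kmeans(P', S1);                                    % centres: S1 x 2 (rows are centres)
D = pdist2(P', C);                                          % distances ||p - w_i||, N x S1
A = exp(-(D*b).^2);                                         % RBF layer outputs, N x S1
Theta = [A ones(size(A,1),1)] \ T';                         % linear layer by least squares
a = [A ones(size(A,1),1)] * Theta;                          % network outputs on the training data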
Training the RBF layer
Training the centers: clustering

(figure: data points in the (p_1, p_2) plane grouped into clusters; one RBF centre is placed in each cluster)
How many RBF neurons ? Where ? Openness ?

(figure: an RBF layer with R neurons, centres w^1_1 ... w^1_R, followed by the linear layer with weights w^2_1 ... w^2_R and bias b^2_1)
Each input point must activate more than one neuron (to
make the output a weighted sum of the outputs of each
neuron).
(this guarantees a good interpolating and generalization ability)
a
1
+a
2
+0,5a
3
a
1
+a
2
+0,5a
3
(Caso unidimensional)
good
bad
How many RBF neurons ? Where? Openness ?
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
307
How many RBF neurons? Where? What openness?
Coverage: where there is data, neurons should exist. The neurons must spread over all the region of the input space where the output of the ANN can be non-null, meaning that any point in the input space must excite at least one neuron.
[Figure (one-dimensional case, good vs. bad coverage): with complete coverage by neurons a1 ... a9, the output is a weighted sum such as a10 = 0.2·a1 + 0.4·a2 + 0.5·a3 + 0.5·a4 - 0.4·a5 - 2·a6 + 3·a7 - 0.6·a8 + 0.8·a9.]
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
308
In two dimensions (two inputs): the centers spread in the plane (p1, p2).
[Figure: radial functions a(p1, p2) placed over the (p1, p2) plane.]
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
309
Grid partition: the centers are uniformly distributed in the plane (p1, p2).
[Figure: centers placed on a regular grid over the (p1, p2) plane.]
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
310
Disadvantages:
- The number of neurons grows exponentially with the number of inputs (the curse of dimensionality).
- The input space frequently contains regions of sparsity (low density) and regions of high concentration of points.
- The regions of higher density need more detail, which means more neurons.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
311
The c-means clustering (or k-means clustering)
(in Matlab: K-means clustering, kmeans.m)
[Figure: cluster centers found among the data points in the (p1, p2) plane.]
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
312
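As the slide indicates, Matlab's kmeans.m can be used directly to place the centers. A small illustrative sketch (the data matrix P, the number of clusters R and the plotting are assumptions, not part of the course material):
% P: N-by-2 matrix whose columns are p1 and p2
R = 5;                                            % number of clusters = number of RBF neurons (design choice)
[idx, V] = kmeans(P, R, 'Replicates', 5);         % V: R-by-2 matrix of cluster centers
plot(P(:,1), P(:,2), '.', V(:,1), V(:,2), 'rx');  % data points and the chosen centers
xlabel('p_1'); ylabel('p_2');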
The clustering techniques are used to fix the number and the location of the centers of the RBF functions.
4.13.2.1.2. Training of the widths
Theoretically, an RBFNN in which the RBF functions all have the same width is a universal approximator (Hassoun, 290).
a_i = exp(-(b_i·||p - v_i||)^2),  with  b_i = 1/σ_i
The coefficient b_i^2 normalizes the Euclidean distance.
Remark: in the case of Gaussian functions,
a_i = exp(-||p - v_i||^2 / (2σ_i^2)),  i.e.  b_i^2 = 1/(2σ_i^2).
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
313
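A quick numerical check of the relation between the two parameterisations above (plain MATLAB; the distance grid and the value of sigma are arbitrary choices): the general form exp(-(b·d)^2) reproduces the Gaussian form exactly when b^2 = 1/(2σ^2).
d = 0:0.1:3;                           % Euclidean distances ||p - v_i||
sigma = 0.8;                           % an arbitrary width
b = 1/(sqrt(2)*sigma);                 % b^2 = 1/(2*sigma^2)
a_general  = exp(-(b*d).^2);           % a_i = exp(-(b_i*||p - v_i||)^2)
a_gaussian = exp(-d.^2/(2*sigma^2));   % Gaussian radial function
max(abs(a_general - a_gaussian))       % returns 0: the two forms coincide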
Heuristics to compute σ (Hassoun, 290)
1 - If σ is the same for all RBFs:
compute the distance ||c_i - c_j|| between each center and its nearest neighbour;
take σ as the average of the distances between neighbouring centers.
2 - If σ is proper to each RBF:
σ_i = (1.0 to 1.5)·||v_i - v_j||, where v_j is the center closest to v_i.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
314
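A minimal MATLAB sketch of both heuristics (V is assumed to be an R-by-n matrix with one center per row; the factor 1.2 is one possible choice inside the 1.0-1.5 range; pdist2 comes from the Statistics and Machine Learning Toolbox):
D = pdist2(V, V);  D(D == 0) = inf;   % distances between centers, ignoring self-distances
dNN = min(D, [], 2);                  % distance from each center to its nearest neighbour
sigma_common = mean(dNN);             % heuristic 1: one width shared by all RBFs
sigma_i = 1.2 * dNN;                  % heuristic 2: a width per RBF, factor in [1.0, 1.5]
b_i = 1 ./ sigma_i;                   % corresponding openness parameters b_i = 1/sigma_i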
Training of the linear layer
[Figure: the RBF layer (input p, outputs a1, a2, ..., aR) followed by the linear layer (weights w1^2, ..., wR^2 and bias).]
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
315
[Figure: RBF layer (outputs a1, a2, ..., aR) followed by the linear layer.]
a_i = exp(-(b_i·||p - v_i||)^2)   (outputs of the RBF layer)
z = [a_1 a_2 ... a_R 1]^T   (input vector of the linear layer; the 1 accounts for the bias)
e_k = t_k - a_k^o   (error between the target t_k and the network output a_k^o at iteration k)
Widrow-Hoff (LMSE): θ_{k+1} = θ_k + 2·K·e_k·z_k   (K: learning coefficient)
RLS (Recursive Least Squares): θ_{k+1} = θ_k + P_k·z_k·e_k
This method is similar to LMSE; the learning coefficient depends on a matrix P_k that gives information about the statistical quality of the parameters obtained in each iteration.
(A MATLAB sketch of both update rules follows this slide.)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
316
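A minimal MATLAB sketch of the two update rules on this slide, making one pass through the training set. A is assumed to hold the RBF-layer outputs (N-by-R) and t the N-by-1 targets; eta, the initialisation of P and the variable names are illustrative choices, not the course's notation.
[N, R] = size(A);
Z = [A, ones(N, 1)];                % row k is z_k' = [a_1 ... a_R 1]
% Widrow-Hoff (LMSE), with learning coefficient eta:
theta = zeros(R+1, 1);  eta = 0.01;
for k = 1:N
    z = Z(k, :)';
    e = t(k) - theta' * z;          % e_k = t_k - a_k^o
    theta = theta + 2*eta*e*z;      % theta_{k+1} = theta_k + 2*eta*e_k*z_k
end
% RLS, where the matrix P carries the statistical quality of the estimates:
theta = zeros(R+1, 1);  P = 1e3 * eye(R+1);
for k = 1:N
    z = Z(k, :)';
    e = t(k) - theta' * z;
    g = P*z / (1 + z'*P*z);         % RLS gain
    theta = theta + g*e;            % parameter update
    P = P - g*(z'*P);               % update of the P matrix
end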
4.13.3. Comparison with the backpropagation (feedforward) NN
- RBFs have the advantage of locality: in each iteration, only some neurons are modified.
- This allows hybrid training in two phases: unsupervised (clustering) followed by supervised (Widrow-Hoff or RLS), which is faster than backpropagation.
- They require more data for training than the multilayer feedforward NN (according to Hassoun, p. 294, about 10 times more for the same accuracy in function approximation).
- They require more neurons than the multilayer FFNN for the same accuracy, as a consequence of the locality property.
- RBFs are well suited to real-time applications (signal processing, automatic control, etc.).
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
317
4.13.4. RBF in the Deep Learning Toolbox of Matlab
net = newrb(P, T, GOAL, SPREAD, MN, DF)
GOAL: maximum LMSE (mean squared error goal, default 0)
SPREAD: width of the radial functions (default 1)
MN: maximum number of RBF neurons (default = dim(P), the number of input vectors)
DF: number of neurons added between each display (default 25)

X = [1 2 3];
T = [2.0 4.1 5.9];
net = newrb(X,T,0.1);
Y = net(X)
NEWRB, neurons = 0, MSE = 2.54
Y = 2.0000 4.1000 5.9000
view(net)

net = newrbe(P, T, SPREAD)
(adds a neuron centered in each input vector)
net = newrbe(X,T)
view(net)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
318
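A small usage sketch (hypothetical values; requires the Deep Learning Toolbox) illustrating how SPREAD changes the fit obtained with newrb on a 1-D function:
X = -3:0.2:3;                      % training inputs (row vector, one column per sample)
T = sin(X);                        % training targets
net1 = newrb(X, T, 0.01, 0.5);     % narrower radial functions
net2 = newrb(X, T, 0.01, 2.0);     % wider radial functions
Xt = -3:0.05:3;                    % test inputs
plot(Xt, sin(Xt), Xt, net1(Xt), '--', Xt, net2(Xt), ':');
legend('target', 'SPREAD = 0.5', 'SPREAD = 2.0');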
4.13.5. Conclusion
- Convergence
- Choice of the architecture
Some variations: Hassoun, 296.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
319
4.14. Conclusion
High number of degrees of freedom: freedom to choose the activation functions and the architectures:
- types of neurons
- number of neurons per layer
- number of layers
- structure of the internal connections ...
Applicable to a great variety of problems.
The big question: choosing, in each case, the most appropriate architecture.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
320
Bibliography
Hagan, M.T., H.B. Demuth, M. Beale, Neural Network Design, 2nd ed., ebook, 2014. The main book. Freely downloadable from hagan.okstate.edu/nnd.html
Hassoun, M.H., Fundamentals of Artificial Neural Networks, MIT Press, 1995.
Deep Learning Toolbox User's Guide, The MathWorks, 2023a.
Goodfellow, I., Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016 (http://www.deeplearningbook.org, accessed 11 Sept 2023).
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
321