4aMLChapter4Neurones&Networks UC Coimbr PT23ENSplit.pdf

antoniodouradopc · 184 slides · Oct 14, 2024

About This Presentation

A course on Machine Learning, Chapter 4, Department of Informatics Engineering, University of Coimbra, Portugal, 2023: Shallow Neural Networks. The file is split to emulate the PowerPoint animations.


Slide Content

Chapter 4
Neurons, Layers, (Neural) Networks
https://en.wikipedia.org/wiki/Artificial_neuron
11 Sept 2023

Biological Neurons
Brain: ~10^11 neurons, ~10^4 connections per neuron - parallel computation.
Response time: ~10^-3 s for the biological neuron, ~10^-9 s for electrical circuits.
Main parts: dendrites, axon, synapses, cell body.
(from DL Toolbox User's Guide)

Biological Network
[Figure: several interconnected biological neurons - axon, cell body, dendrites, synapses]

Artificial Neuron: mathematical model
Mapping from the biological neuron to the artificial one: dendrites -> inputs, synapses -> weights, cell body -> sum plus activation function f, axon -> output.
[Figure (de Brause): artificial neuron with inputs p_1 ... p_n, weights w_11 ... w_1n, summing junction n, activation f and output a]

[Figure: a network of biological neurons side by side with the corresponding network of artificial neurons (de Brause)]

4.1. Neuron with a single input
input p, weight w, net input n, activation f, output a
a = f(n) = f(wp)
If p = 0, then a = f(0).
f: activation function (or transfer function)
w: input weight

Neuron with a single input and a bias
input p, weight w, bias b, sum n, activation f, output a
n = wp + b = [w  b] [p ; 1]
a = f(n) = f(wp + b); writing w'' = [w  b] and p'' = [p ; 1], a = f(w'' p'').
The bias allows the output to be nonzero even if the inputs are zero.
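As a concrete illustration of the single-input neuron with a bias, here is a minimal NumPy sketch (not from the slides; the unipolar sigmoid is chosen arbitrarily as the activation f):

```python
import numpy as np

def logsig(n):
    """Unipolar sigmoid activation, a = 1 / (1 + e^(-n))."""
    return 1.0 / (1.0 + np.exp(-n))

def single_input_neuron(p, w, b, f=logsig):
    """a = f(n) with n = w*p + b for a single-input neuron."""
    n = w * p + b
    return f(n)

# With p = 0 the output is f(b): the bias lets the output be nonzero for zero input.
print(single_input_neuron(p=0.0, w=2.0, b=0.5))
```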

4.2. Activation functions
Binary (hard limit):
a = hardlim(n)
a = hardlim(wp + b)
The bias b makes horizontal displacements of the step function.
(from DL Toolbox Manual)

Activation functions
Continuous linear:
a = purelin(n) = n
a = purelin(wp + b) = wp + b
The bias b makes horizontal displacements of the linear function.
(from DL Toolbox Manual)

Activation functions
Continuous nonlinear monotone (differentiable) - unipolar sigmoid:
a = logsig(n) = 1 / (1 + e^(-n))
a = 1 / (1 + e^(-(wp + b)))
The bias b makes horizontal displacements of the sigmoid function.
(from DL Toolbox Manual)

Bipolar sigmoid (hyperbolic tangent):
a = tansig(n) = (e^n - e^(-n)) / (e^n + e^(-n)), tansig in Matlab
Activation functions: summary table (from Hagan & coll.)
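The four activation functions named on these slides carry the MATLAB DL Toolbox names hardlim, purelin, logsig and tansig; the following is an illustrative NumPy re-implementation of the same formulas, not the toolbox code:

```python
import numpy as np

def hardlim(n):            # binary step: 1 if n >= 0, else 0
    return np.where(n >= 0, 1.0, 0.0)

def purelin(n):            # continuous linear: a = n
    return n

def logsig(n):             # unipolar sigmoid: a = 1 / (1 + e^(-n))
    return 1.0 / (1.0 + np.exp(-n))

def tansig(n):             # bipolar sigmoid (hyperbolic tangent)
    return np.tanh(n)      # equals (e^n - e^(-n)) / (e^n + e^(-n))

n = np.linspace(-3, 3, 7)
for f in (hardlim, purelin, logsig, tansig):
    print(f.__name__, f(n))
```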

4.3. Neuron with several inputs
2 inputs p1, p2 with weights w11, w12, bias b, sum n, activation f, output a:
n = w11 p1 + w12 p2 + b = [w11  w12] [p1 ; p2] + b = Wp + b
a = f(n) = f(Wp + b)

Neuron with R inputs
n = w11 p1 + w12 p2 + ... + w1R pR + b = Wp + b
a = f(Wp + b)
In compact notation (vectors and matrices) the neuron is drawn with the input vector p, the weight row W, the bias b and the output a.
(from DL Toolbox Manual)

4.4. Neuron RBF (Radial Basis Function)
Activation function: continuous, nonlinear, non-monotone.
One RBF neuron, Gaussian, with one input p, centre w1 and a scale factor b:
a = e^(-(||p - w1|| b)^2)
For w1 = 0 the output is a Gaussian bump centred at the origin.
One RBF neuron, Gaussian, with R inputs and scale factor:
a = radbas(n) = radbas(||w - p|| b),   with radbas(n) = e^(-n^2)
(from DL Toolbox Manual)

4.5. Layer of neurons
2 inputs p1, p2: each neuron has its own weights (w11, w12 for neuron 1; w21, w22 for neuron 2), its own bias (b1, b2), sum, activation f and output (a1, a2).
3 inputs: a third input p3 is added, with weights w13 and w23.
Layer of S neurons with R inputs:
n_i = w_i1 p1 + w_i2 p2 + ... + w_iR pR + b_i
a_i = f(n_i) = f(Wp + b)_i,   i = 1, 2, ..., S

Compact (matrices)
a = f(Wp + b)
p = [p1; p2; ...; pR]     (R x 1 input vector)
b = [b1; b2; ...; bS]     (S x 1 bias vector)
a = [a1; a2; ...; aS]     (S x 1 output vector)
W = [w11 w12 ... w1R; w21 w22 ... w2R; ...; wS1 wS2 ... wSR]     (S x R weight matrix: the row index is the neuron, the column index is the input)
(from DL Toolbox Manual)
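A minimal NumPy sketch of the compact layer equation a = f(Wp + b); the dimensions follow the slide (W is S x R, p is R x 1, b and a are S x 1), and the sizes and tanh activation below are only example choices:

```python
import numpy as np

def layer_forward(W, p, b, f=np.tanh):
    """One layer of S neurons: a = f(W p + b)."""
    return f(W @ p + b)

R, S = 3, 2                           # 3 inputs, 2 neurons (arbitrary example sizes)
rng = np.random.default_rng(0)
W = rng.normal(size=(S, R))           # row i holds the weights of neuron i
b = rng.normal(size=(S, 1))
p = rng.normal(size=(R, 1))
print(layer_forward(W, p, b).shape)   # -> (2, 1)
```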

Compact notation (matrices) with layer index
a^1 = f^1(IW^{1,1} p + b^1)
IW: Input Weight Matrix;  LW: Layer Weight Matrix.
In the index pair i,j the second index j is the origin and the first index i is the destination (IW^{1,1}: from input 1 to layer 1).
(from DL Toolbox Manual)

4.6. Multilayer network
3 layers:
a^1 = f^1(IW^{1,1} p + b^1)
a^2 = f^2(LW^{2,1} a^1 + b^2)
a^3 = f^3(LW^{3,2} a^2 + b^3)
a^3 = f^3(LW^{3,2} f^2(LW^{2,1} f^1(IW^{1,1} p + b^1) + b^2) + b^3)
Such a network can model any nonlinear relation between the input p and the output a^3.
(from DL Toolbox UG)

Compact notation
Layers 1 and 2 are hidden layers; layer 3 is the output layer.
Hidden layer: its output is not seen from the outside.
y = a^3 = f^3(LW^{3,2} f^2(LW^{2,1} f^1(IW^{1,1} p + b^1) + b^2) + b^3)
Such a network can model any nonlinear relation between the input p and the output y.
(from DL Toolbox Manual)
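The three-layer equations can be chained exactly as written. A sketch with assumed layer sizes, tanh hidden activations and a linear output (the slides leave f1, f2, f3 generic):

```python
import numpy as np

def three_layer_forward(p, IW11, b1, LW21, b2, LW32, b3,
                        f1=np.tanh, f2=np.tanh, f3=lambda n: n):
    """y = a3 = f3(LW32 f2(LW21 f1(IW11 p + b1) + b2) + b3)."""
    a1 = f1(IW11 @ p + b1)
    a2 = f2(LW21 @ a1 + b2)
    a3 = f3(LW32 @ a2 + b3)
    return a3

rng = np.random.default_rng(1)
R, S1, S2, S3 = 4, 5, 3, 1            # assumed sizes, for illustration only
p    = rng.normal(size=(R, 1))
IW11 = rng.normal(size=(S1, R)); b1 = rng.normal(size=(S1, 1))
LW21 = rng.normal(size=(S2, S1)); b2 = rng.normal(size=(S2, 1))
LW32 = rng.normal(size=(S3, S2)); b3 = rng.normal(size=(S3, 1))
print(three_layer_forward(p, IW11, b1, LW21, b2, LW32, b3))
```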

4.7. RBFNN - Radial Basis Function Neural Network
R inputs; a first layer of S1 RBF neurons; a linear layer with S2 neurons.
a^1_i = radbas(|| _iIW^{1,1} - p || b^1_i)     (a^1_i is the i-th element of a^1; _iIW^{1,1} is the vector composed by the i-th row of IW^{1,1})
a^2 = purelin(LW^{2,1} a^1 + b^2)
(from DL Toolbox Manual)
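A sketch of the RBF network forward pass of section 4.7: each first-layer neuron measures the distance between its weight row and the input, scales it by its bias (spread factor), and applies the Gaussian radbas; the second layer is linear. This mirrors the slide's equations, not the toolbox implementation, and the sizes below are assumptions:

```python
import numpy as np

def radbas(n):
    """Gaussian radial basis activation: a = e^(-n^2)."""
    return np.exp(-n ** 2)

def rbfnn_forward(p, IW11, b1, LW21, b2):
    """a1_i = radbas(||iIW11 - p|| * b1_i);  a2 = purelin(LW21 a1 + b2)."""
    dist = np.linalg.norm(IW11 - p.T, axis=1, keepdims=True)  # ||row_i - p|| per RBF neuron
    a1 = radbas(dist * b1)
    return LW21 @ a1 + b2                                      # linear (purelin) output layer

rng = np.random.default_rng(2)
R, S1, S2 = 2, 4, 1                                            # assumed sizes
p    = rng.normal(size=(R, 1))
IW11 = rng.normal(size=(S1, R)); b1 = np.full((S1, 1), 0.5)    # b1 acts as the scale factor
LW21 = rng.normal(size=(S2, S1)); b2 = np.zeros((S2, 1))
print(rbfnn_forward(p, IW11, b1, LW21, b2))
```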

4.8. The binary perceptron: training and learning
Learning rule or training algorithm:
- a systematic procedure to modify the weights and the biases of a NN such that it will work as we need.
The most common learning approaches:
- supervised learning
- reinforcement learning
- unsupervised learning
Supervised learning
A set of Q examples of correct behavior of the NN is given (inputs p, targets t):
{p1, t1}, {p2, t2}, ..., {pQ, tQ}
Reinforcement learning
The NN receives only a grade (classification) that favors good performances.
Unsupervised learning
The NN has only inputs, not targets, and learns to categorize them (dividing the inputs into classes, as in clustering).

Binary Perceptron learning
a^1 = hardlim(IW^{1,1} p + b^1),   R inputs, S neurons.
For a single layer one may eliminate the layer index.
The perceptron is the first ANN for which mathematical developments have been made. Its training is still illustrative of many issues in the training of any ANN.
(from DL Toolbox U.G.)

W = IW^{1,1} = [w11 w12 ... w1R; w21 w22 ... w2R; ...; wS1 wS2 ... wSR] = [_1w^T; _2w^T; ...; _Sw^T]
where _iw = [w_i1; w_i2; ...; w_iR] is the i-th row of W written as a column vector.
a_i = hardlim(n_i) = hardlim(_iw^T p + b_i)
Each neuron divides the input space in two regions.

One neuron, two inputs
a = hardlim(Wp + b) = hardlim(_1w^T p + b) = hardlim(w11 p1 + w12 p2 + b)
Decision boundary, n = 0:
n = _1w^T p + b = w11 p1 + w12 p2 + b = 0
p2 = -(w11/w12) p1 - b/w12 ... one straight line, p2 = m p1 + d
(from DL Toolbox U.G.)

Examples of decision boundaries (one neuron, two inputs; in each plot W is drawn perpendicular to the boundary, pointing into the region n > 0):
- w11 = 1, w12 = 1, b = -1: n = p1 + p2 - 1, boundary p1 + p2 = 1.
- w11 = 0, w12 = 1, b = -1: n = p2 - 1, boundary p2 = 1 (a horizontal line).
- w11 = 1, w12 = 1, b = 1: n = p1 + p2 + 1, boundary p1 + p2 = -1 (changing b translates the boundary).
- w11 = 1, w12 = 1, b = 0: n = p1 + p2, boundary through the origin.
- w11 = 1, w12 = 0, b = 1: n = p1 + 1, boundary p1 = -1 (a vertical line; changing W rotates the boundary).
In every case the boundary is the straight line p2 = -(w11/w12) p1 - b/w12 (when w12 is not 0).

In any problem:
- Draw a boundary straight line.
- Select the weight vector W perpendicular to that boundary, of any magnitude (what matters is its direction and sense). The vector W points to the region n > 0.
- Calculate the needed b by making the calculations for a point of the boundary.

Example 4.1: logical OR
Truth table (p1, p2 -> p1 OR p2): (0,0) -> 0, (0,1) -> 1, (1,0) -> 1, (1,1) -> 1.
Only the point (0,0) is False, so a boundary can be drawn through (0.5, 0) and (0, 0.5).
Choose W perpendicular to the boundary, pointing towards the True region (n > 0): W = [0.5  0.5].
Compute b from a point of the boundary, e.g. [0; 0.5]:  0.5*0 + 0.5*0.5 + b = 0  =>  b = -0.25.
Resulting perceptron: a = hardlim(0.5 p1 + 0.5 p2 - 0.25) implements p1 OR p2.
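The OR perceptron designed above can be checked directly; a short NumPy verification of W = [0.5, 0.5] and b = -0.25 against the truth table:

```python
import numpy as np

hardlim = lambda n: (n >= 0).astype(int)

W = np.array([0.5, 0.5])
b = -0.25
P = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # all input pairs (p1, p2)
a = hardlim(P @ W + b)
print(a)   # -> [0 1 1 1], the truth table of p1 OR p2
```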

Perceptron with several neurons
a^1 = hardlim(IW^{1,1} p + b^1)
One decision boundary per neuron: for neuron i it will be _iw^T p + b_i = 0.
With S neurons it can classify into 2^S categories.
(from DL Toolbox U.G.)

Example 4.2
Class 1: {[-2; 3], [0; 2], [1; 2], [2; 3]}
Class 2: {[-3; 2], [-3; 1], [-1; 1]}
Class 3: {[-2; -1], [-1; 0], [1; -2]}
Class 4: {[1; 0], [3; 1], [4; 2]}
[Plot: the four classes in the (p1, p2) plane, with the two decision boundaries N1 and N2 and their weight vectors W1 and W2]
Two neurons give the four output codes [0; 0], [1; 0], [0; 1], [1; 1], one per region.
A solution (as read from the slide): W1 = [-1.8; -2] with b1 = 1, and W2 = [1; -2] with b2 = 2. Boundary N1 separates classes 2 and 3 (n1 > 0) from classes 1 and 4 (n1 < 0); boundary N2 separates classes 3 and 4 (n2 > 0) from classes 1 and 2 (n2 < 0).
Checking the network on the input [-2; 3] (a Class 1 point): n1 = -1.8(-2) - 2(3) + 1 = -1.4 < 0 and n2 = 1(-2) - 2(3) + 2 = -6 < 0, so the output is [0; 0].
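The two-neuron classifier of Example 4.2 can be verified numerically. The weight values below are the ones reconstructed above from the partly garbled slide, so treat them as a plausible solution rather than a certain transcription; the check confirms that every class receives a single, distinct two-bit output code:

```python
import numpy as np

hardlim = lambda n: (n >= 0).astype(int)

W = np.array([[-1.8, -2.0],    # neuron N1
              [ 1.0, -2.0]])   # neuron N2
b = np.array([1.0, 2.0])

classes = {
    1: [(-2, 3), (0, 2), (1, 2), (2, 3)],
    2: [(-3, 2), (-3, 1), (-1, 1)],
    3: [(-2, -1), (-1, 0), (1, -2)],
    4: [(1, 0), (3, 1), (4, 2)],
}
for c, pts in classes.items():
    codes = {tuple(hardlim(W @ np.array(p) + b)) for p in pts}
    print(c, codes)   # each class maps to exactly one output code
```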

Learning rule (automatic learning)
Given a training dataset: correct pairs of {input, target}:
{p1, t1}, {p2, t2}, ..., {pQ, tQ}
Example 4.3:
{p1 = [1; 2], t1 = 1},  {p2 = [-1; 2], t2 = 0},  {p3 = [0; -1], t3 = 0}

[Plot of the three training points: p1 = [1; 2] with t = 1, p2 = [-1; 2] with t = 0, p3 = [0; -1] with t = 0]

Initialization of the weights: random, e.g. 1w(0)^T = [1.0  -0.8].
Neuron without bias, so the decision boundary passes through the origin, perpendicular to W(0); n > 0 gives a = 1 (for t = 1) and n < 0 gives a = 0 (for t = 0).
p1 is badly classified!!! It should be t = 1 and the output is a = 0.

First correction: 1w(1) = 1w(0) + p1.
[Plot: the new weight vector W(1) and its boundary]
Now p2 is badly classified!!! It should be t = 0 and the output is a = 1.

Second correction: 1w(2) = 1w(1) - p2.
[Plot: the new weight vector W(2) = W(1) - p2 and its boundary]
Now p3 is badly classified!!! It should be t = 0 and the output is a = 1.

Third correction: 1w(3) = 1w(2) - p3.
[Plot: the new weight vector W(3) = W(2) - p3 and its boundary]
All points are now well classified (see the numerical calculations in Hagan 4-10).

Example 4.3:
[Plots: the decision boundary at each step, for W(0), W(1), W(2) and W(3), together with the three training points]

And if a point, when analyzed, is correct? Nothing is changed.
If t = 1 and a = 0, then do  1w(new) = 1w(old) + p     (e = +1)
If t = 0 and a = 1, then do  1w(new) = 1w(old) - p     (e = -1)
If t = a, then do            1w(new) = 1w(old)         (e = 0)
Defining e = t - a, the three cases reduce to the single rule
1w(new) = 1w(old) + e p
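The unified rule 1w(new) = 1w(old) + e·p can be run directly on Example 4.3. A minimal sketch (neuron without bias, initial weights from the slide), cycling through the patterns until all are correctly classified:

```python
import numpy as np

P = np.array([[1, 2], [-1, 2], [0, -1]], dtype=float)   # p1, p2, p3
t = np.array([1, 0, 0])
w = np.array([1.0, -0.8])                                # 1w(0) from the slide

changed = True
while changed:
    changed = False
    for p, target in zip(P, t):
        a = 1 if w @ p >= 0 else 0        # hardlim, neuron without bias
        e = target - a                    # e in {-1, 0, +1}
        if e != 0:
            w = w + e * p                 # perceptron rule
            changed = True
print(w)   # ends at [3.0, 0.2], matching the corrections W(1), W(2), W(3) above
```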

Only some problems can be solved by one neuron without bias.
What happens if the neuron has a bias b? The bias is a weight of an input equal to 1!
z = [p; 1],   θ1 = [1w; b],   a = hardlim(θ1^T z)
Perceptron learning rule:   θ1(new) = θ1(old) + e z
(from DL Toolbox U.G)

Perceptron with several neurons
For neuron i, i = 1, ..., S:
θi(new) = θi(old) + ei z,   with  ei = ti - ai,   z = [p; 1],   θi = [iw; bi]
Transposing and stacking the rows θi^T into Θ and the errors into e = [e1 e2 ... eS]^T:
Θ(new) = Θ(old) + e z^T
The perceptron rule!!!
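One sweep of the multi-neuron rule Θ(new) = Θ(old) + e·z^T can be written compactly; a minimal sketch, assuming hardlim neurons and a data set given as the columns of P (inputs) and T (targets):

```python
import numpy as np

def perceptron_epoch(Theta, P, T):
    """One pass over the data. Theta is S x (R+1); P is R x Q; T is S x Q."""
    Z = np.vstack([P, np.ones((1, P.shape[1]))])   # augmented inputs z = [p; 1]
    for q in range(Z.shape[1]):
        z = Z[:, q:q+1]                             # (R+1) x 1
        a = (Theta @ z >= 0).astype(float)          # hardlim layer output
        e = T[:, q:q+1] - a                         # error vector, entries in {-1, 0, +1}
        Theta = Theta + e @ z.T                     # Theta(new) = Theta(old) + e z^T
    return Theta
```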

Does the procedure always converge?
Yes, if there exists a solution: see the proof in Hagan, 4-15.
Limitations of the perceptron:
- It solves only linearly separable problems.
- All the inputs have the same importance (problem of outliers).
Normalized perceptron rule:
θ(new) = θ(old) + e z / ||z||

Example 4.4: exclusive OR, XOR
Truth table (p1, p2 -> p1 XOR p2): (0,0) -> 0, (0,1) -> 1, (1,0) -> 1, (1,1) -> 0.
??? No single straight line separates the True points from the False points, so one perceptron neuron cannot implement XOR.

Conclusions about the perceptron The perceptron learning rule is a supervised one
It is simple, but powerful
With one single layer, the binary perceptron can solve
only linearly separable problems.
The problems nonlinearly separable can be solved by
multilayer architectures.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
202

4.9 General Single Layer Networks
4.9.1. The ADALINE (Adaptive Linear Network)
R inputs, S neurons:
a_j = purelin(W p_j + b)
a_ij = purelin(_iw^T p_j + b_i),   where _iw^T is the i-th row of W
(from DL Toolbox U.G.)

Notation for one layer of S neurons
W = [w11 w12 ... w1R; w21 w22 ... w2R; ...; wS1 wS2 ... wSR] = [_1w^T; _2w^T; ...; _Sw^T]     (S x R)
P = [p1 p2 ... pQ]     (R x Q: one column per input vector, p_j = [p1j; p2j; ...; pRj])
Z = [P; 1 1 ... 1] = [z1 z2 ... zQ]     ((R+1) x Q: each input is augmented with a 1 for the bias)
Θ = [W  b] = [_1w^T b1; _2w^T b2; ...; _Sw^T bS] = [θ1^T; θ2^T; ...; θS^T]     (S x (R+1))

Notation for one layer of S neurons (continued)
T = [t1 t2 ... tQ] = [T1; T2; ...; TS]     (S x Q target matrix): row i, Ti = [t_i1 t_i2 ... t_iQ], holds the targets of neuron i for all the inputs.
A = [a1 a2 ... aQ] = [A1; A2; ...; AS]     (S x Q output matrix), with the same structure.

p_j - the j-th input vector, of dimension R
p_ij - the i-th component of the j-th input vector
a_j - the j-th output vector, obtained with the j-th input
A_i - row vector of the outputs of the i-th neuron for all the inputs from 1 to Q
a_ij - the i-th component of the j-th output a_j: the output of the i-th neuron
t_j - the j-th target output, when the input p_j is applied
T_i - row vector of the targets of the i-th neuron for all the inputs from 1 to Q
t_ij - the i-th component of the j-th target t_j
w_ij - weight between the neuron i and the component j of the input vector

Case of one neuron with two inputs
a_1j = purelin(n_1j) = purelin(_1w^T p_j + b_1) = w11 p1j + w12 p2j + b1     (here W = _1w^T)
a > 0 if n > 0,  a = 0 if n = 0,  a < 0 if n < 0.
Decision boundary (n = 0): it divides the plane into two zones and can classify patterns that are linearly separable.
The LMSE training optimizes the position of the boundary with respect to the training patterns (advantage over the perceptron).
(from DL Toolbox UG)

Collect the parameters of neuron 1 and the augmented input in
θ1 = [w11; w12; b1],     z_j = [p1j; p2j; 1]
a_1j = b1 + w11 p1j + w12 p2j = _1w^T p_j + b1 = [w11 w12 b1] [p1j; p2j; 1] = θ1^T z_j
There are 3 unknowns (w11, w12, b1); the 3 needed equations are obtained by the application of 3 inputs z1, z2, z3:
1st:  a11 = t11 = w11 p11 + w12 p21 + b1 = θ1^T z1
2nd:  a12 = t12 = w11 p12 + w12 p22 + b1 = θ1^T z2
3rd:  a13 = t13 = w11 p13 + w12 p23 + b1 = θ1^T z3
In matrix form (supervised training: the output A1 should be equal to the target T1):
A1 = [a11 a12 a13] = T1 = [t11 t12 t13] = θ1^T [z1 z2 z3] = θ1^T Z

A1 = T1 = θ1^T Z   =>   θ1^T = T1 Z^{-1}
Does Z^{-1} exist? Almost never!!! What to do to find a solution?
With more inputs (Q > R + 1) there are more equations than unknowns. One can use the pseudo-inverse of Z instead of the inverse, for a solution with non-null error:
θ1^T = T1 Z^T (Z Z^T)^{-1}
Is it possible? Does (Z Z^T)^{-1} exist? Computational cost?
It would be better to process the data iteratively.

Minimization of the mean square error: LMSE (Least Mean Square Error)
Supervised training: one gives to the network a set of Q input-target pairs; for neuron i,
{p1, t_i1}, {p2, t_i2}, ..., {pQ, t_iQ}
The obtained output a_ik is compared with the target t_ik, and one obtains the error e_ik = t_ik - a_ik, k = 1, ..., Q.
The LMSE algorithm will adjust the weights and the bias in order to minimize the mean square error, mse, penalizing in the same way the positive and the negative errors, i.e., minimizing
mse = (1/Q) Σ_{k=1..Q} e_ik^2 = (1/Q) Σ_{k=1..Q} (t_ik - a_ik)^2

For neuron i (with R weights w_i1 ... w_iR and bias b_i), collect the parameters and the augmented input:
θ_i = [w_i1; w_i2; ...; w_iR; b_i],     z_j = [p_1j; p_2j; ...; p_Rj; 1]
a_ij = _iw^T p_j + b_i = w_i1 p_1j + w_i2 p_2j + ... + w_iR p_Rj + b_i = θ_i^T z_j = z_j^T θ_i
For a set of Q training pairs, the sum of all the squared errors is
Σ_{k=1..Q} e_ik^2 = [e_i1 e_i2 ... e_iQ] [e_i1; e_i2; ...; e_iQ] = e_i e_i^T,   with e_i = [e_i1 e_i2 ... e_iQ]
e_ij^2 = (t_ij - a_ij)^2 = (t_ij - z_j^T θ_i)(t_ij - θ_i^T z_j) = t_ij^2 - 2 t_ij z_j^T θ_i + θ_i^T z_j z_j^T θ_i

Consider the concatenated matrices of inputs, targets and outputs:
Z = [z1 z2 ... zQ] = [p11 p12 ... p1Q; ...; pR1 pR2 ... pRQ; 1 1 ... 1]     ((R+1) x Q)
T_i = [t_i1 t_i2 ... t_iQ]     (1 x Q),     A_i = [a_i1 a_i2 ... a_iQ]     (1 x Q),     θ_i = [w_i1; w_i2; ...; w_iR; b_i]     ((R+1) x 1)
A_i = θ_i^T Z
With Q > R + 1 there are more equations than unknowns, so there is no exact solution; we look for the one that minimizes the squared errors (MSE).
e_i = T_i - A_i = T_i - θ_i^T Z

Now
F(θ_i) = e_i e_i^T = (T_i - A_i)(T_i - A_i)^T = (T_i - θ_i^T Z)(T_i - θ_i^T Z)^T
       = T_i T_i^T - T_i Z^T θ_i - θ_i^T Z T_i^T + θ_i^T Z Z^T θ_i
       = T_i T_i^T - 2 T_i Z^T θ_i + θ_i^T Z Z^T θ_i
This expression must be minimized with respect to θ_i.
Gradient:  ∇F(θ_i) = ∇(T_i T_i^T - 2 T_i Z^T θ_i + θ_i^T Z Z^T θ_i) = -2 Z T_i^T + 2 Z Z^T θ_i
Setting it to zero:  -2 Z T_i^T + 2 Z Z^T θ_i = 0  =>  Z Z^T θ_i = Z T_i^T  =>  θ_i = (Z Z^T)^{-1} Z T_i^T
Note: ∇_x(a^T x) = ∇_x(x^T a) = a;  ∇_x(x^T A x) = (A + A^T) x = 2 A x if A is symmetric.

Is it a minimum? Is it a maximum? Is it a saddle point?
Second derivative:
$$\frac{\partial^2(\mathbf{e}_i\mathbf{e}_i^T)}{\partial\boldsymbol{\theta}_i\,\partial\boldsymbol{\theta}_i^T} = \frac{\partial}{\partial\boldsymbol{\theta}_i}\!\left(-2\,\mathbf{Z}\mathbf{T}_i^T + 2\,\mathbf{Z}\mathbf{Z}^T\boldsymbol{\theta}_i\right) = 2\,\mathbf{Z}\mathbf{Z}^T$$
If $\mathbf{Z}\mathbf{Z}^T > 0$, F has a unique global minimum.
If $\mathbf{Z}\mathbf{Z}^T \geq 0$, F has a weak global minimum or has no stationary point.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
215

Note about the sign (definiteness) of matrices
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
216
The sign of a symmetric matrix A is related to the sign of its eigenvalues:
A > 0, positive definite: all eigenvalues are > 0
A ≥ 0, positive semidefinite: all eigenvalues are ≥ 0
A ≤ 0, negative semidefinite: all eigenvalues are ≤ 0
A < 0, negative definite: all eigenvalues are < 0
A is indefinite if it has both positive and negative eigenvalues.
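A quick MATLAB illustration of this check for the matrix ZZᵀ (a minimal sketch; the input matrix Z is made up here, any augmented input matrix built as before can be used):

Z = [randn(2, 50); ones(1, 50)];   % illustrative augmented input matrix
lam = eig(Z * Z');                 % eigenvalues of the symmetric matrix ZZ^T are real
if all(lam > 0)
    disp('ZZ^T is positive definite: unique global minimum');
elseif all(lam >= 0)
    disp('ZZ^T is positive semidefinite: weak minimum or no stationary point');
end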

Interpretations of the condition of minimum:
$$F(\boldsymbol{\theta}_i) = \mathbf{T}_i\mathbf{T}_i^T - 2\,\mathbf{T}_i\mathbf{Z}^T\boldsymbol{\theta}_i + \boldsymbol{\theta}_i^T\mathbf{Z}\mathbf{Z}^T\boldsymbol{\theta}_i$$
If the inputs of the network are random vectors, then the error will be random and the objective function to be minimized will be:
$$F(\boldsymbol{\theta}_i) = E\!\left[\mathbf{T}_i\mathbf{T}_i^T - 2\,\mathbf{T}_i\mathbf{Z}^T\boldsymbol{\theta}_i + \boldsymbol{\theta}_i^T\mathbf{Z}\mathbf{Z}^T\boldsymbol{\theta}_i\right] = E[\mathbf{T}_i\mathbf{T}_i^T] - 2\,E[\mathbf{T}_i\mathbf{Z}^T]\,\boldsymbol{\theta}_i + \boldsymbol{\theta}_i^T E[\mathbf{Z}\mathbf{Z}^T]\,\boldsymbol{\theta}_i$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
217

A correlation matrix is positive definite (>0) or positive semidefinite (≥0). If the inputs applied to the network are uncorrelated, the correlation matrix ZZᵀ is diagonal, its diagonal elements being the sums of squares of the inputs. In these conditions the correlation matrix is positive definite, the global minimum exists, and it is unique.
The existence or not of a unique global minimum depends on the characteristics of the input training set.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
218

For a network with S neurons (Q > R+1),
$$\boldsymbol{\Theta} = \begin{bmatrix} \boldsymbol{\theta}_1^T\\ \vdots\\ \boldsymbol{\theta}_S^T \end{bmatrix} = \begin{bmatrix} w_{11} & \ldots & w_{1R} & b_1\\ w_{21} & \ldots & w_{2R} & b_2\\ \vdots & & \vdots & \vdots\\ w_{S1} & \ldots & w_{SR} & b_S \end{bmatrix} \;\; S\times(R+1)$$
$$\mathbf{T} = \begin{bmatrix} \mathbf{T}_1\\ \mathbf{T}_2\\ \vdots\\ \mathbf{T}_S \end{bmatrix} = \begin{bmatrix} t_{11} & t_{12} & \ldots & t_{1Q}\\ t_{21} & t_{22} & \ldots & t_{2Q}\\ \vdots & \vdots & & \vdots\\ t_{S1} & t_{S2} & \ldots & t_{SQ} \end{bmatrix} \;\; S\times Q \quad (\text{row } i: \text{targets of neuron } i), \qquad \mathbf{T}^T = [\mathbf{T}_1^T\; \mathbf{T}_2^T\; \ldots\; \mathbf{T}_S^T]$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
219

For each neuron, $\boldsymbol{\theta}_i = (\mathbf{Z}\mathbf{Z}^T)^{-1}\mathbf{Z}\mathbf{T}_i^T$. Stacking all S neurons,
$$\boldsymbol{\Theta}^T = [\boldsymbol{\theta}_1\; \boldsymbol{\theta}_2\; \ldots\; \boldsymbol{\theta}_S] = (\mathbf{Z}\mathbf{Z}^T)^{-1}\mathbf{Z}\,[\mathbf{T}_1^T\; \mathbf{T}_2^T\; \ldots\; \mathbf{T}_S^T] = (\mathbf{Z}\mathbf{Z}^T)^{-1}\mathbf{Z}\mathbf{T}^T$$
Possible? Does $(\mathbf{Z}\mathbf{Z}^T)^{-1}$ exist? What is the computational cost?
$$\boldsymbol{\Theta} = \left((\mathbf{Z}\mathbf{Z}^T)^{-1}\mathbf{Z}\mathbf{T}^T\right)^T = \mathbf{T}\mathbf{Z}^T(\mathbf{Z}\mathbf{Z}^T)^{-1}$$
The computation of the inverse, for a high number R of inputs, is difficult. Is there a (computationally) simpler algorithm?
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
220

Iterative LMSE algorithm

Initialize the weights (for example randomly); k = 1 (iteration 1): $w_{ij}^k = w_{ij}^1$.

For q from 1 to Q do (case of one neuron i), for each training pair $(\mathbf{p}_q, t_{iq})$:

1st: compute the neuron output $a_{iq}$ for the input $\mathbf{p}_q$: $a_{iq} = \mathbf{w}_i^T\mathbf{p}_q + b_i$.

2nd: compute the squared error $e_{iq}^2 = (t_{iq} - a_{iq})^2$.

3rd: compute the gradient of the squared error with respect to the weights and the bias:
$$\frac{\partial e_{iq}^2}{\partial w_{ij}} = 2\,e_{iq}\frac{\partial e_{iq}}{\partial w_{ij}}, \;\; j = 1, 2, \ldots, R \;\;(\text{weights}), \qquad \frac{\partial e_{iq}^2}{\partial b_i} = 2\,e_{iq}\frac{\partial e_{iq}}{\partial b_i} \;\;(\text{bias})$$
Computing the derivatives (case of linear neuron),
$$\frac{\partial e_{iq}}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}}\Big[t_{iq} - \big(\textstyle\sum_{j=1}^{R} w_{ij}p_{jq} + b_i\big)\Big] = -p_{jq}, \qquad \frac{\partial e_{iq}}{\partial b_i} = -1$$
so, with $\mathbf{z}_q = [p_{1q}\; p_{2q}\; \ldots\; p_{Rq}\; 1]^T$, we will have
$$\nabla e_{iq}^2 = -2\,e_{iq}\,\mathbf{z}_q$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
222

4th: apply the gradient method to minimize $F(\boldsymbol{\theta}_i)$, k being the iteration index:
$$\boldsymbol{\theta}_i^{k+1} = \boldsymbol{\theta}_i^{k} - \alpha\,\nabla F(\boldsymbol{\theta}_i^{k}), \qquad \boldsymbol{\theta}_i = \begin{bmatrix}\mathbf{w}_i\\ b_i\end{bmatrix}$$
In the present case $F(\boldsymbol{\theta}_i) = e_{iq}^2$ and
$$\nabla e_{iq}^2 = 2\,e_{iq}\,\nabla e_{iq} = -2\,e_{iq}\begin{bmatrix}\mathbf{p}_q\\ 1\end{bmatrix} = -2\,e_{iq}\,\mathbf{z}_q$$
resulting in
$$\mathbf{w}_i^{k+1} = \mathbf{w}_i^{k} + 2\alpha\,e_{iq}\,\mathbf{p}_q, \qquad b_i^{k+1} = b_i^{k} + 2\alpha\,e_{iq}$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
223

But
$$\mathbf{w}_i^{k+1} = \mathbf{w}_i^{k} - \alpha\,(-2\,e_{iq}\,\mathbf{p}_q^T)^T = \mathbf{w}_i^{k} + 2\alpha\,e_{iq}\,\mathbf{p}_q$$
For one layer of S neurons this gives
$$\begin{bmatrix}\mathbf{w}_1^T\\ \vdots\\ \mathbf{w}_S^T\end{bmatrix}^{k+1} = \begin{bmatrix}\mathbf{w}_1^T\\ \vdots\\ \mathbf{w}_S^T\end{bmatrix}^{k} + 2\alpha\begin{bmatrix}e_{1q}\\ \vdots\\ e_{Sq}\end{bmatrix}\mathbf{p}_q^T, \qquad \begin{bmatrix}b_1\\ \vdots\\ b_S\end{bmatrix}^{k+1} = \begin{bmatrix}b_1\\ \vdots\\ b_S\end{bmatrix}^{k} + 2\alpha\begin{bmatrix}e_{1q}\\ \vdots\\ e_{Sq}\end{bmatrix}$$
Or, more compact,
$$\mathbf{W}^{k+1} = \mathbf{W}^{k} + 2\alpha\,\mathbf{e}_q\,\mathbf{p}_q^T, \qquad \mathbf{b}^{k+1} = \mathbf{b}^{k} + 2\alpha\,\mathbf{e}_q$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
224

Taking into account that
$$\boldsymbol{\Theta} = [\mathbf{W}\;\; \mathbf{b}], \qquad \mathbf{z}_q = \begin{bmatrix}\mathbf{p}_q\\ 1\end{bmatrix}$$
a more compact form can be written:
$$\boldsymbol{\Theta}^{k+1} = \boldsymbol{\Theta}^{k} + 2\alpha\,\mathbf{e}_q\,\mathbf{z}_q^T$$
Remark: this LMSE algorithm is also known as the Widrow-Hoff rule (see more in Hagan).
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
225
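A minimal MATLAB sketch of this Widrow-Hoff update for one layer of linear neurons (the example data, the number of epochs and the learning coefficient are illustrative choices, nothing here is toolbox-specific):

P = randn(2, 50);  T = [1 -2] * P + 0.5;     % illustrative data (any compatible Z, T works)
Z = [P; ones(1, 50)];                        % (R+1) x Q augmented inputs
[Rp1, Q] = size(Z);
S = size(T, 1);
alpha = 0.01;                                % learning coefficient
Theta = 0.1 * randn(S, Rp1);                 % random initialization of [W b]
for epoch = 1:50
    for q = 1:Q
        a = Theta * Z(:, q);                 % layer output for input q (linear neurons)
        e = T(:, q) - a;                     % error vector
        Theta = Theta + 2 * alpha * e * Z(:, q)';   % Theta^{k+1} = Theta^k + 2*alpha*e*z^T
    end
end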

The LMSE iterative algorithm is an approximation of the gradient method, because the gradient computed in each iteration is an approximation of the true gradient.
Its convergence depends on the learning coefficient α.
If the successive input vectors are statistically independent, and if Θ(k) and z(k) are statistically independent, it converges.
The learning coefficient must verify
$$0 < \alpha < \frac{1}{\lambda_{\max}}$$
where $\lambda_{\max}$ is the maximum eigenvalue of the input correlation matrix $\mathbf{R} = \mathbf{Z}\mathbf{Z}^T$ (see more in Hagan, 10-9).
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
226
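A small MATLAB check of this bound (a sketch; the input matrix Z is illustrative):

Z = [randn(2, 50); ones(1, 50)];     % illustrative augmented input matrix
lambda_max = max(eig(Z * Z'));       % maximum eigenvalue of the input correlation matrix
alpha_max  = 1 / lambda_max;         % upper bound for the learning coefficient
alpha      = 0.5 * alpha_max;        % a conservative choice inside (0, 1/lambda_max)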


4.9.2. The particular case of the associative memory (one layer without bias)
$$a_i = \sum_{k=1}^{R} w_{ik}\,p_k, \qquad \mathbf{a} = \mathbf{W}\mathbf{p}$$
(Network: input p, R×1; weight matrix W, S×R; n and a, S×1; a = n = purelin(Wp).)

Associative memory: learns Q prototype pairs of input-output vectors
$$\{\mathbf{p}_1, \mathbf{t}_1\},\ \{\mathbf{p}_2, \mathbf{t}_2\},\ \ldots,\ \{\mathbf{p}_Q, \mathbf{t}_Q\}$$
Giving an input prototype, the output is the correct one. Giving an input approximate to a prototype, the output will also be approximate to the corresponding output: a small change in the input will produce a small change in the output.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
227

Supervised training: WP = A and we want A = T, so WP = T.

If R > Q (more inputs than prototypes), P is rectangular, with more rows than columns. We have a system with more unknowns than equations.
If P has maximum rank (if its columns are linearly independent), the pseudo-inverse can be used to find an exact solution for the system of equations:
$$\mathbf{W}\mathbf{P} = \mathbf{T} \;\;\Rightarrow\;\; \mathbf{W} = \mathbf{T}\mathbf{P}^{+} = \mathbf{T}(\mathbf{P}^T\mathbf{P})^{-1}\mathbf{P}^T$$
$\mathbf{P}^{+}$ is the Moore-Penrose pseudo-inverse.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
228

Solving in order to W, using $\mathbf{P}^{-1}$:
$$\mathbf{W}\mathbf{P} = \mathbf{T} \;\;\Rightarrow\;\; \mathbf{W}\mathbf{P}\mathbf{P}^{-1} = \mathbf{T}\mathbf{P}^{-1} \;\;\Rightarrow\;\; \mathbf{W} = \mathbf{T}\mathbf{P}^{-1}, \quad \text{if } \mathbf{P} \text{ is invertible}$$
If R = Q, the number Q of prototypes is equal to the number R of network inputs, and if the prototypes are linearly independent, then P can be inverted.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
229

If R < Q (more prototypes than inputs), P is rectangular, with more columns than rows. We have a system with more equations than unknowns.
In general there is no exact solution. Only an approximate solution can be found, using the Moore-Penrose pseudo-inverse, which minimizes the sum of the squared errors:
$$\mathbf{W}\mathbf{P} = \mathbf{T} \;\;\Rightarrow\;\; \mathbf{W} = \mathbf{T}\mathbf{P}^{+} = \mathbf{T}\mathbf{P}^T(\mathbf{P}\mathbf{P}^T)^{-1}$$
Note that the pseudo-inverse does not have the same formula as in the previous case R > Q.
It gives the solution that minimizes the sum of the squared errors (see slides 211 and 213).
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
230
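A small MATLAB illustration of this case using pinv (the prototype inputs P and targets T are made up for the sketch):

R = 4; Q = 6; S = 2;                 % here R < Q: only an approximate solution exists
P = randn(R, Q);                     % prototype inputs (columns)
T = randn(S, Q);                     % prototype targets (columns)
W = T * pinv(P);                     % pseudo-inverse solution, W = T*P^T*(P*P^T)^{-1}
disp(norm(W * P - T, 'fro'))         % residual of the least-squares fit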

The formulae of the associative memory are a particular case of the ADALINE; here there is no bias, and as a consequence one has the W of the associative memory instead of the Θ of the ADALINE.
The recursive version will be the same as for the ADALINE:
$$\mathbf{W}^{k+1} = \mathbf{W}^{k} + 2\alpha\,\mathbf{e}_q\,\mathbf{p}_q^T$$
There is a historic algorithm, Hebb's rule, that has a different form:
$$\mathbf{W}^{k+1} = \mathbf{W}^{k} + \mathbf{t}_q\,\mathbf{p}_q^T, \qquad q = 1, 2, \ldots, Q$$
The ADALINE rule comes from the mathematical development for the minimization of the squared error (LMSE). Hebb's rule results from an empirical principle proposed by the neurobiologist Hebb in his book The Organization of Behavior, 1949.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
231
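A sketch of Hebb's rule in MATLAB, with made-up orthonormal prototypes (for orthonormal prototypes the rule reproduces the targets exactly):

P = eye(3);                          % three orthonormal prototype inputs (columns)
T = [1 0 1; 0 1 1];                  % desired outputs for each prototype
W = zeros(size(T,1), size(P,1));
for q = 1:size(P,2)
    W = W + T(:,q) * P(:,q)';        % W <- W + t_q p_q^T
end
disp(W * P - T)                      % zero error because the prototypes are orthonormal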

@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
232
The general Gradient Method

Iterative minimization of a general function g(x) with respect to x: at iteration k,
$$x^{k+1} = x^{k} - \alpha_k \left.\frac{\partial g(x)}{\partial x}\right|_{x = x^{k}}$$
$\alpha_k$ is a constant to be fixed by the user.
This is the gradient method, and it is the base of many ML learning algorithms.

@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
233
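A tiny MATLAB sketch of this iteration on a scalar example (the function and the step size are illustrative):

g_grad = @(x) 2 * (x - 3);           % derivative of g(x) = (x - 3)^2, minimum at x = 3
x = 0; alpha = 0.1;                  % initial guess and fixed step size
for k = 1:100
    x = x - alpha * g_grad(x);       % x^{k+1} = x^k - alpha * dg/dx
end
disp(x)                              % close to 3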
4.10. One layer with any activation function

For neuron i with inputs $p_j$, output $a = f(n)$, target $t_j$ and error $e = t - a$, the criterion to be minimized is $F = e^2$.
For the weight $w_{ij}$ between neuron i and input $p_j$:
$$\frac{\partial F}{\partial w_{ij}} = 2e\,\frac{\partial e}{\partial w_{ij}} = 2e\,(-1)\,\frac{\partial a}{\partial n}\,\frac{\partial n}{\partial w_{ij}} = -2\,e\,\dot f\,p_j$$
For the bias b:
$$\frac{\partial F}{\partial b} = 2e\,\frac{\partial e}{\partial b} = 2e\,(-1)\,\frac{\partial a}{\partial n}\,\frac{\partial n}{\partial b} = -2\,e\,\dot f\cdot 1 = -2\,e\,\dot f$$
(Figure: neuron i with weights $w_{i1}, \ldots, w_{iR}$, bias b and target $t_j$.)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
234
Computing the derivatives by the chain rule, for input $\mathbf{p}_q$ ($q = 1, \ldots, Q$) at iteration k:
$$e_{iq} = t_{iq} - a_{iq} = t_{iq} - f(n_{iq}), \qquad n_{iq} = \sum_{j=1}^{R} w_{ij}^k\,p_{jq} + b_i^k$$
$$\frac{\partial e_{iq}}{\partial w_{ij}^k} = -\dot f(n_{iq})\,p_{jq}, \;\; j = 1, 2, \ldots, R, \qquad \frac{\partial e_{iq}}{\partial b_i^k} = -\dot f(n_{iq})$$
The gradient of the squared error, for all weights and bias, will then be
$$\nabla e_{iq}^2 = 2\,e_{iq}\,\nabla e_{iq} = -2\,e_{iq}\,\dot f(n_{iq})\begin{bmatrix}\mathbf{p}_q\\ 1\end{bmatrix} = -2\,e_{iq}\,\dot f(n_{iq})\,\mathbf{z}_q$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
235
and the update formula will be, for a layer of neurons,
$$\boldsymbol{\Theta}^{k+1} = \boldsymbol{\Theta}^{k} + 2\alpha\,\dot f\,\mathbf{e}_q\,\mathbf{z}_q^T$$
The Widrow-Hoff algorithm is the particular case of the gradient method when the activation function is linear, and so its derivative is one.
This is the general iterative gradient method for one layer, also known as LMSE (least mean squared error).
After passing through all the inputs (1 ... Q) we say that an epoch of training is complete.
The process restarts with the current parameters and goes again through the inputs (1 ... Q); then the second epoch is ended.
And so on, until the convergence criterion is reached or a fixed maximum number of epochs is attained.
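A minimal MATLAB sketch of this rule for one logistic neuron (the data and the learning rate are illustrative; for the logistic function, f'(n) = a(1-a)):

R = 2; Q = 200;
P = randn(R, Q);
T = double(sum(P, 1) > 0);                       % illustrative 0/1 targets
theta = zeros(R + 1, 1);                         % [w; b]
alpha = 0.1;
for epoch = 1:100
    for q = 1:Q
        z = [P(:, q); 1];
        a = 1 / (1 + exp(-theta' * z));          % logistic output
        e = T(q) - a;
        fprime = a * (1 - a);                    % derivative of the logistic function
        theta = theta + 2 * alpha * e * fprime * z;   % Theta^{k+1} = Theta^k + 2*alpha*e*f'*z
    end
end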

4.11. Synthesis of the learning techniques of a network with a single layer of linear neurons

Consider the unified notation for the parameter matrix Θ (having or not bias) and Z the input matrix (with or without bias). The similarity of the several learning methods becomes clear:
$$\boldsymbol{\Theta} = [\mathbf{W}\;\; \mathbf{b}] \;\; S\times(R+1), \qquad \mathbf{Z} = \begin{bmatrix}\mathbf{P}\\ \mathbf{1}^T\end{bmatrix}, \qquad \boldsymbol{\theta}_i^T = [\mathbf{w}_i^T\;\; b_i], \qquad \mathbf{z}_j = \begin{bmatrix}p_{1j}\\ p_{2j}\\ \vdots\\ p_{Rj}\\ 1\end{bmatrix}$$
(from DL Toolbox UG)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
236

Types of problems
Classification (pattern recognition)
• Number of prototypes not greater than the number of
inputs, Q≤R : pseudo-inverse rule or Hebb’s rule.
• Number of prototypes greater than the number of
inputs , Q>R: pseudo-inverse (LMSE), or its recursive
version.
Function approximation: LMSE (Widrow-Hoff)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
237

1. Perceptron: binary activation functions (hardlim, hardlims), where Θ is the matrix of weights and bias. Rule with learning coefficient:
$$\boldsymbol{\Theta}^{k+1} = \boldsymbol{\Theta}^{k} + \alpha\,\mathbf{e}_q\,\mathbf{z}_q^T$$

2. Associative memories: linear activation functions, without bias; Θ is here just W and Z is just P.
(i) If the number of prototypes is not higher than the number of characteristics (inputs), Q < R:
$$\boldsymbol{\Theta}^{k+1} = \boldsymbol{\Theta}^{k} + \alpha\,\mathbf{t}_q\,\mathbf{z}_q^T, \;\; q = 1, \ldots, Q \qquad \text{(iterative Hebb's rule, see Hagan)}$$
$$\boldsymbol{\Theta} = \mathbf{T}(\mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T = \mathbf{T}\mathbf{Z}^{+} \qquad \text{(pseudo-inverse, if the prototypes are linearly independent)}$$
$$\boldsymbol{\Theta} = \mathbf{T}\mathbf{Z}^{T} \qquad \text{(non-iterative, batch, Hebb, if the prototypes are orthonormal)}$$
(ii) If Q > R (the most common situation):
$$\boldsymbol{\Theta} = \mathbf{T}\mathbf{Z}^{T}(\mathbf{Z}\mathbf{Z}^{T})^{-1} = \mathbf{T}\mathbf{Z}^{+}$$

3. ADALINE: linear neurons, with bias. The number of prototypes is higher than the number of characteristics (inputs), Q > R:
$$\boldsymbol{\Theta}^{k+1} = \boldsymbol{\Theta}^{k} + 2\alpha\,\mathbf{e}_q\,\mathbf{z}_q^T \qquad \text{(LMSE recursive algorithm, W-H)}$$
$$\boldsymbol{\Theta} = \mathbf{T}\mathbf{Z}^{T}(\mathbf{Z}\mathbf{Z}^{T})^{-1}, \;\text{if } (\mathbf{Z}\mathbf{Z}^{T})^{-1} \text{ exists} \qquad \text{(LMSE non-recursive, batch, algorithm)}$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
238
Conclusions (one layer)

The LMSE (or Widrow-Hoff) learning rule, $\boldsymbol{\Theta}^{k+1} = \boldsymbol{\Theta}^{k} + 2\alpha\,\mathbf{e}_q\,\mathbf{z}_q^T$ (recursive) or $\boldsymbol{\Theta} = \mathbf{T}\mathbf{Z}^{+}$ (batch), is used in the ADALINE network (linear neurons), and is a particular case of the gradient method for the LMSE criterion.
The main advantage of the LMSE is its recursive implementation.
Great care must be put in the preparation of the training set: its elements should be statistically independent. In practice this is rarely possible.
The LMSE is very adequate for function approximation and for parameter learning in dynamical systems.
In classification, the ADALINE solves only linearly separable problems.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
239

4.11. The multilayer network (MLNN)

Layers 1 and 2 are hidden layers; Layer 3 is the output layer. Layer by layer,
$$\mathbf{a}^1 = \mathbf{f}^1(\mathbf{IW}^{1,1}\mathbf{p} + \mathbf{b}^1), \qquad \mathbf{a}^2 = \mathbf{f}^2(\mathbf{LW}^{2,1}\mathbf{a}^1 + \mathbf{b}^2), \qquad \mathbf{a}^3 = \mathbf{f}^3(\mathbf{LW}^{3,2}\mathbf{a}^2 + \mathbf{b}^3)$$
so the network output is
$$\mathbf{y} = \mathbf{a}^3 = \mathbf{f}^3\!\left(\mathbf{LW}^{3,2}\,\mathbf{f}^2\!\left(\mathbf{LW}^{2,1}\,\mathbf{f}^1(\mathbf{IW}^{1,1}\mathbf{p} + \mathbf{b}^1) + \mathbf{b}^2\right) + \mathbf{b}^3\right)$$
(from NN Toolbox Manual)
4.11. The multilayer network (MLNN)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
241

4.11.1 The MLNN for pattern recognition

Example: exclusive OR, XOR

  p1   p2   p1 XOR p2
   0    0       0
   0    1       1
   1    0       1
   1    1       0

(Figure: the four training points {p, t} in the (p1, p2) plane, with reference marks at 0.5 on each axis; the two classes are not linearly separable.)

Note: p1 XOR p2 = (p1 OR p2) AND (p1 NAND p2)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
242


Neuron 1, Layer 1

Neuron 1 of layer 1 implements the OR-type boundary, passing through the points (0.5, 0) and (0, 0.5). Taking the weight vector perpendicular to the boundary, W¹₁ = [2 2]:

Calculation of b: on the boundary n = 0, so 2×0.5 + 2×0 + b = 0, giving b = −1.
The same boundary is obtained with W¹₁ = [1 1]: 1×0.5 + 1×0 + b = 0, giving b = −0.5.

On the side where n > 0 (region V) the neuron outputs a¹₁ = 1; on the side where n < 0 (region F) it outputs a¹₁ = 0.

(Figure: decision boundary of neuron 1 in the (p1, p2) plane, crossing the axes at 0.5.)

4.11.1 The MLNN for pattern recognition
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
243

Neuron 2, Layer 1

Neuron 2 of layer 1 implements the NAND-type boundary, passing through the points (1.5, 0) and (0, 1.5). With W¹₂ = [−2 −2]:

Calculation of b: −2×1.5 − 2×0 + b = 0, giving b = 3.
The same boundary is obtained with W¹₂ = [−1 −1]: −1×1.5 − 1×0 + b = 0, giving b = 1.5.

On the side where n > 0 (region V) the neuron outputs a¹₂ = 1; on the side where n < 0 (region F) it outputs a¹₂ = 0.

(Figure: decision boundary of neuron 2 in the (p1, p2) plane, crossing the axes at 1.5.)

4.11.1 The MLNN for pattern recognition
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
244

Neurons 1 and 2, Layer 1 (from Hagan&Coll)

(Figure: the two layer-1 decision boundaries in the (p1, p2) plane and the layer-1 outputs in each region: a¹₁ = 0, a¹₂ = 1 (F); a¹₁ = 1, a¹₂ = 1 (V); a¹₁ = 1, a¹₂ = 0 (F).)

In the (a¹₁, a¹₂) plane:

  a¹₁  a¹₂
   0    0    does not exist
   0    1    F
   1    0    F
   1    1    V

4.11.1 The MLNN for pattern recognition
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
245

2nd layer (from Hagan&Coll)

The 2nd layer receives (a¹₁, a¹₂) and implements the AND of the two layer-1 outputs (recall that p1 XOR p2 = (p1 OR p2) AND (p1 NAND p2)), producing output 1 only in the region where a¹₁ = 1 and a¹₂ = 1, i.e., exactly for the inputs where XOR = 1.

(Figure: the three regions of the (p1, p2) plane with their layer-1 outputs and the final decision region of the 2nd layer.)

4.11.1 The MLNN for pattern recognition
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
246
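A quick MATLAB check of this hand-designed two-layer XOR network (the hard-limit activation is written inline; the layer-2 weights [1 1] and bias −1.5, which implement the AND, are assumed values consistent with the construction above, not taken from the slides):

hardlim1 = @(n) double(n >= 0);          % hard-limit activation
W1 = [1 1; -1 -1];  b1 = [-0.5; 1.5];    % layer 1: OR-type and NAND-type boundaries
W2 = [1 1];         b2 = -1.5;           % layer 2: AND of the two layer-1 outputs (assumed)
P  = [0 0 1 1; 0 1 0 1];                 % the four input patterns
a1 = hardlim1(W1 * P + b1 * ones(1, 4)); % layer-1 outputs
a2 = hardlim1(W2 * a1 + b2);             % network output
disp(a2)                                 % expected: 0 1 1 0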

4.11.2 The MLNN for functions approximation

Example: a 1-2-1 network with a logsig hidden layer and a linear output layer,
$$f^1(n) = \frac{1}{1+e^{-n}}, \qquad f^2(n) = n$$
with nominal parameter values: hidden-layer weights 10 and 10, hidden biases −10 and 10, output weights 1 and 1, output bias 0.
(from Hagan&Coll)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
247

The network output is
$$a^2 = f^2\!\left(w^2_{11}a^1_1 + w^2_{12}a^1_2 + b^2\right) = w^2_{11}\,\frac{1}{1+e^{-(w^1_{11}p + b^1_1)}} + w^2_{12}\,\frac{1}{1+e^{-(w^1_{21}p + b^1_2)}} + b^2$$
4.11.2 The MLNN for functions approximation
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
248
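A minimal MATLAB sketch of this 1-2-1 response with the nominal parameters above (nothing toolbox-specific; the plotting range is illustrative):

p  = -2:0.01:2;                                  % input range
a1 = 1 ./ (1 + exp(-(10*p - 10)));               % hidden neuron with weight 10, bias -10
a2 = 1 ./ (1 + exp(-(10*p + 10)));               % hidden neuron with weight 10, bias +10
a  = 1*a1 + 1*a2 + 0;                            % linear output layer: weights [1 1], bias 0
plot(p, a); xlabel('p'); ylabel('a^2');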

(Figure: the 1-2-1 network with the nominal parameters and its response a² versus p on the interval [−2, 2].)
(from Hagan&Coll)
4.11.2 The MLNN for functions approximation
If the input is a ramp, how will the output be?

(from Hagan&Coll)

With the nominal parameters, the two hidden-layer outputs are
$$\frac{1}{1+e^{-(10p-10)}} \qquad \text{and} \qquad \frac{1}{1+e^{-(10p+10)}}$$
each one a sigmoid step (centred at p = +1 and p = −1 respectively), and the network output is their sum, rising from 0 to 2 over the interval [−2, 2].

(Figure: the two hidden-layer outputs versus p and the resulting network output a².)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
250

Influence of each parameter:
Run the demo nnd11fa.m to exercise the function approximation problem. Download all the demos from http://hagan.okstate.edu/nnd.html (accessed 11 Sept 2023), Neural Network Design Demonstrations, in a zip file. The number in the file name is the chapter of the book.
(from Hagan&Coll)
4.11.2 The MLNN for functions approximation
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
251
With these 4 degrees of freedom it is possible to obtain many nonlinear mappings between the input and the output!!
Very flexible, with many degrees of freedom.
A NN with a sigmoidal hidden layer and a linear output layer can approximate any function of interest, with any precision level, provided that the hidden layer has a sufficient number of neurons. (Theoretical result.)
The question is to find the weights and bias that produce a good mapping between a known input data set and a known output data set, i.e., training the NN for the given data.
4.11.2 The MLNN for functions approximation
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
252

4.11.3 The backpropagation algorithm

(Block diagram: a known input p is applied both to the process/function being modelled, which produces the known process output t, the target, and to the NN, which produces the output a. The difference t − a is the error, used by a supervisor to adjust the weights and the bias in order to minimize the error.)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
253

Similarly to the LMSE, it is an iterative process.
In each iteration a weight is updated according to the rule
$$w_{\text{New}} = w_{\text{Old}} - \alpha\,\frac{\partial F}{\partial w}$$
(the derivative of the criterion with respect to the weight).
This is equivalent to the gradient method to minimize the criterion.
4.11.3 The backpropagation algorithm
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
254

Criterion: squared error (quadratic function of the error)
$$F(\mathbf{W}, \mathbf{b}) = (\mathbf{t}_k - \mathbf{a}^M_k)^T(\mathbf{t}_k - \mathbf{a}^M_k) = \mathbf{e}_k^T\mathbf{e}_k$$
k: index of iteration; W: matrix of weights; b: vector of biases; $\mathbf{a}^M$: output of the last layer,
$$\mathbf{W} = \begin{bmatrix} w_{11} & w_{12} & \ldots & w_{1R}\\ w_{21} & w_{22} & \ldots & w_{2R}\\ \vdots & \vdots & & \vdots\\ w_{S1} & w_{S2} & \ldots & w_{SR} \end{bmatrix}_{S\times R}, \qquad \mathbf{a} = [a_1\; a_2\; \ldots\; a_S]^T$$
4.11.3 The backpropagation algorithm
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
255

In each iteration k, with input p_k, and for layer m,
$$w^m_{ij}(k+1) = w^m_{ij}(k) - \alpha\,\frac{\partial F}{\partial w^m_{ij}}(k), \qquad b^m_{j}(k+1) = b^m_{j}(k) - \alpha\,\frac{\partial F}{\partial b^m_{j}}(k)$$
with the learning coefficient α. The derivatives are computed with the values of the weights and bias at iteration k. How to compute these derivatives???

4.11.3 The backpropagation algorithm

Remark about notation: it is assumed that we start at iteration 1 with input 1, and use incremental learning; then the iteration index is equal to the input index all along an epoch of training. In the next epoch, the iteration index and the input index are reinitialized at 1.

Example with one output: a 2-2-1 network with inputs p₁, p₂, activation functions f¹ (layer 1), f² (layer 2) and f³ (output layer), weights w^m_{ij} and biases b^m_i. The output a³₁ is compared with the target, e = t − a³₁, and the criterion is F = e².
$$F(\mathbf{W},\mathbf{b}) = (\mathbf{t}(k) - \mathbf{a}(k))^T(\mathbf{t}(k) - \mathbf{a}(k)) = \mathbf{e}(k)^T\mathbf{e}(k); \quad \text{if scalar, } F = e^2(k), \text{ for the input } \mathbf{p}_k = [p_1\; p_2]^T$$
How to calculate $\dfrac{dF}{dw^1_{12}}$ ???

(Figure: the 2-2-1 network used in the backpropagation example, from the inputs p₁, p₂ through layers f¹, f², f³ to the error e = t − a³₁.)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
257
Applying the chain rule along the two paths that connect $w^1_{12}$ to the output (example of one output, e = t − a³₁, F = e²),
$$\frac{dF}{dw^1_{12}} = \frac{\partial F}{\partial e}\,\frac{\partial e}{\partial a^3_1}\,\frac{\partial a^3_1}{\partial n^3}\!\left(\frac{\partial n^3}{\partial a^2_1}\,\frac{\partial a^2_1}{\partial n^2_1}\,\frac{\partial n^2_1}{\partial a^1_1}\,\frac{\partial a^1_1}{\partial n^1_1}\,\frac{\partial n^1_1}{\partial w^1_{12}} + \frac{\partial n^3}{\partial a^2_2}\,\frac{\partial a^2_2}{\partial n^2_2}\,\frac{\partial n^2_2}{\partial a^1_1}\,\frac{\partial a^1_1}{\partial n^1_1}\,\frac{\partial n^1_1}{\partial w^1_{12}}\right)$$
$$= 2e\,(-1)\,\dot f^3(n^3)\left(w^3_{11}\,\dot f^2(n^2_1)\,w^2_{11}\,\dot f^1(n^1_1)\,p_2 \;+\; w^3_{12}\,\dot f^2(n^2_2)\,w^2_{21}\,\dot f^1(n^1_1)\,p_2\right)$$
To compute this expression we need the value of the error, the weights and bias, the derivatives of the activation functions and the input.
To have these values, a forward pass is needed: given an input and a set of values for the weights and bias, we compute all the intermediate values in the network until the end.
Then we can compute the error.
Then the backpropagation is made considering the intermediate values computed in the forward pass, and the weights and bias are updated with the gradients.
The process is repeated for the next input. And so on.

backpropagation
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
259


Analytic development (adapted from Hagan&Coll.)

For layer m,
$$\frac{\partial F}{\partial w^m_{ij}} = \frac{\partial F}{\partial n^m_i}\,\frac{\partial n^m_i}{\partial w^m_{ij}}, \qquad \frac{\partial F}{\partial b^m_{i}} = \frac{\partial F}{\partial n^m_i}\,\frac{\partial n^m_i}{\partial b^m_{i}}$$
Since
$$n^m_i = \sum_{j=1}^{S^{m-1}} w^m_{ij}\,a^{m-1}_j + b^m_i \;\;\Rightarrow\;\; \frac{\partial n^m_i}{\partial w^m_{ij}} = a^{m-1}_j, \qquad \frac{\partial n^m_i}{\partial b^m_{i}} = 1$$
defining the sensitivity
$$s^m_i \equiv \frac{\partial F}{\partial n^m_i}$$
we obtain
$$\frac{\partial F}{\partial w^m_{ij}} = s^m_i\,a^{m-1}_j, \qquad \frac{\partial F}{\partial b^m_{i}} = s^m_i$$
and the update equations
$$w^m_{ij}(k+1) = w^m_{ij}(k) - \alpha\,s^m_i\,a^{m-1}_j, \qquad b^m_{i}(k+1) = b^m_{i}(k) - \alpha\,s^m_i$$
(Figure: neurons i and j of layers m−1 and m, with the weights $w^m_{ij}$, biases $b^m_i$, net inputs $n^m_i$ and outputs $a^m_i$.)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
260

In matrix form,
$$\mathbf{W}^m(k+1) = \mathbf{W}^m(k) - \alpha\,\mathbf{s}^m(\mathbf{a}^{m-1})^T, \qquad \mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\,\mathbf{s}^m$$
with the sensitivity vector
$$\mathbf{s}^m = \frac{\partial F}{\partial \mathbf{n}^m} = \begin{bmatrix} \partial F/\partial n^m_1\\ \partial F/\partial n^m_2\\ \vdots\\ \partial F/\partial n^m_{S^m} \end{bmatrix}$$
The LMSE is similar, with $\mathbf{s} = -2\,\mathbf{e}(k)$:
$$\mathbf{W}(k+1) = \mathbf{W}(k) + 2\alpha\,\mathbf{e}(k)\,\mathbf{p}^T(k), \qquad \mathbf{b}(k+1) = \mathbf{b}(k) + 2\alpha\,\mathbf{e}(k)$$
In fact, in the ADALINE network,
$$F = \mathbf{e}(k)^T\mathbf{e}(k) = (\mathbf{t}(k)-\mathbf{a}(k))^T(\mathbf{t}(k)-\mathbf{a}(k)) = (\mathbf{t}(k)-\mathbf{n}(k))^T(\mathbf{t}(k)-\mathbf{n}(k))$$
$$\frac{\partial F}{\partial \mathbf{n}(k)} = -2(\mathbf{t}(k)-\mathbf{n}(k)) = -2(\mathbf{t}(k)-\mathbf{a}(k)) = -2\,\mathbf{e}(k)$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
261

(Figure: layers m and m+1, with the weights $w^{m+1}_{ij}$ connecting the outputs $a^m_j$ of layer m to the net inputs $n^{m+1}_i$ of layer m+1.)

To backpropagate the sensitivities we need the Jacobian matrix of the net inputs of layer m+1 with respect to the net inputs of layer m:
$$\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}} = \begin{bmatrix} \dfrac{\partial n^{m+1}_1}{\partial n^m_1} & \dfrac{\partial n^{m+1}_1}{\partial n^m_2} & \ldots & \dfrac{\partial n^{m+1}_1}{\partial n^m_{S^m}}\\ \dfrac{\partial n^{m+1}_2}{\partial n^m_1} & \dfrac{\partial n^{m+1}_2}{\partial n^m_2} & \ldots & \dfrac{\partial n^{m+1}_2}{\partial n^m_{S^m}}\\ \vdots & \vdots & & \vdots\\ \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_1} & \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_2} & \ldots & \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_{S^m}} \end{bmatrix}$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
263

For example, for i = 1,
$$n^{m+1}_1 = w^{m+1}_{11}a^m_1 + w^{m+1}_{12}a^m_2 + b^{m+1}_1 = w^{m+1}_{11}f^m(n^m_1) + w^{m+1}_{12}f^m(n^m_2) + b^{m+1}_1$$
so
$$\frac{\partial n^{m+1}_1}{\partial n^m_1} = w^{m+1}_{11}\,\frac{\partial f^m(n^m_1)}{\partial n^m_1} = w^{m+1}_{11}\,\dot f^m(n^m_1), \qquad \frac{\partial n^{m+1}_1}{\partial n^m_2} = w^{m+1}_{12}\,\frac{\partial f^m(n^m_2)}{\partial n^m_2} = w^{m+1}_{12}\,\dot f^m(n^m_2) \quad (i = 1,\ j = 2)$$
(Figure: layers m and m+1 as before.)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
264

In general,
$$\frac{\partial n^{m+1}_i}{\partial n^m_j} = w^{m+1}_{ij}\,\frac{\partial f^m(n^m_j)}{\partial n^m_j} = w^{m+1}_{ij}\,\dot f^m(n^m_j)$$
Or, in matrix notation,
$$\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}} = \mathbf{W}^{m+1}\,\dot{\mathbf{F}}^m(\mathbf{n}^m), \qquad \dot{\mathbf{F}}^m(\mathbf{n}^m) = \mathrm{diag}\!\left(\dot f^m(n^m_1),\ \dot f^m(n^m_2),\ \ldots,\ \dot f^m(n^m_{S^m})\right)$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
265

Written element by element, the Jacobian $\partial\mathbf{n}^{m+1}/\partial\mathbf{n}^{m}$ therefore has entries $w^{m+1}_{ij}\,\dot f^m(n^m_j)$, and the sensitivity of layer m is the column vector
$$\mathbf{s}^m = \frac{\partial F}{\partial \mathbf{n}^m} = \left[\frac{\partial F}{\partial n^m_1}\;\; \frac{\partial F}{\partial n^m_2}\;\; \ldots\;\; \frac{\partial F}{\partial n^m_{S^m}}\right]^T$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
266

Using the chain rule through layer m+1,
$$\mathbf{s}^m = \frac{\partial F}{\partial \mathbf{n}^m} = \left(\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^{m}}\right)^T \frac{\partial F}{\partial \mathbf{n}^{m+1}} = \left(\mathbf{W}^{m+1}\dot{\mathbf{F}}^m(\mathbf{n}^m)\right)^T \mathbf{s}^{m+1} = \dot{\mathbf{F}}^m(\mathbf{n}^m)\,(\mathbf{W}^{m+1})^T\,\mathbf{s}^{m+1}$$
The sensitivities are computed retroactively (backpropagated); for M layers,
$$\mathbf{s}^M \rightarrow \mathbf{s}^{M-1} \rightarrow \ldots \rightarrow \mathbf{s}^{2} \rightarrow \mathbf{s}^{1}$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
267

How to compute s^M (output layer)??
$$F = (\mathbf{t}-\mathbf{a})^T(\mathbf{t}-\mathbf{a}) = \sum_{j=1}^{S^M}(t_j - a^M_j)^2$$
$$s^M_i = \frac{\partial F}{\partial n^M_i} = \frac{\partial}{\partial n^M_i}\sum_{j=1}^{S^M}(t_j - a^M_j)^2 = -2\,(t_i - a^M_i)\,\frac{\partial a^M_i}{\partial n^M_i} = -2\,(t_i - a^M_i)\,\dot f^M(n^M_i)$$
since $a^M_i = f^M(n^M_i)$ and $\partial a^M_i/\partial n^M_i = \dot f^M(n^M_i)$.
In matrix form,
$$\mathbf{s}^M = -2\,\dot{\mathbf{F}}^M(\mathbf{n}^M)\,(\mathbf{t}-\mathbf{a}) = -2\,\dot{\mathbf{F}}^M(\mathbf{n}^M)\,\mathbf{e}$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
268

Summary of the backpropagation algorithm: one epoch of incremental learning (update after each input), for a training set with Q inputs.

Initialize the weights and the bias W(1), b(1).
For k = 1 to Q do (with the input p_k):

Forward phase:
$$\mathbf{a}^0 = \mathbf{p}_k; \qquad \text{for } j = 1, 2, \ldots, M:\;\; \mathbf{a}^j = \mathbf{f}^j(\mathbf{W}^j\mathbf{a}^{j-1} + \mathbf{b}^j); \qquad \mathbf{a} = \mathbf{a}^M \;\;(\text{NN output})$$

Backward phase:
$$\mathbf{s}^M = -2\,\dot{\mathbf{F}}^M(\mathbf{n}^M)(\mathbf{t}-\mathbf{a}); \qquad \text{for } m = M-1, M-2, \ldots, 2, 1:\;\; \mathbf{s}^m = \dot{\mathbf{F}}^m(\mathbf{n}^m)(\mathbf{W}^{m+1})^T\mathbf{s}^{m+1}$$

Update, for each layer m:
$$\mathbf{W}^m(k+1) = \mathbf{W}^m(k) - \alpha\,\mathbf{s}^m(\mathbf{a}^{m-1})^T, \qquad \mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\,\mathbf{s}^m$$

End for.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
269
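A self-contained MATLAB sketch of this algorithm for a 1-S1-1 network with a logsig hidden layer and a linear output layer, trained to approximate a sine (all sizes, data and the learning rate are illustrative choices, not taken from the slides):

rng(0);
P = linspace(-2, 2, 100);            % training inputs (1 x Q)
T = 1 + sin(pi/4 * P);               % training targets (1 x Q)
S1 = 10;                             % hidden-layer size
W1 = 0.5*randn(S1, 1); b1 = 0.5*randn(S1, 1);
W2 = 0.5*randn(1, S1); b2 = 0.5*randn(1, 1);
alpha = 0.05;
logsig1 = @(n) 1 ./ (1 + exp(-n));
for epoch = 1:2000
    for k = 1:numel(P)
        % forward phase
        a0 = P(k);
        n1 = W1*a0 + b1;  a1 = logsig1(n1);
        n2 = W2*a1 + b2;  a2 = n2;            % linear output layer
        e  = T(k) - a2;
        % backward phase (sensitivities)
        s2 = -2 * 1 * e;                      % f2'(n) = 1 for the linear layer
        s1 = (a1 .* (1 - a1)) .* (W2' * s2);  % F1'(n1) = diag(a1.*(1-a1))
        % update
        W2 = W2 - alpha * s2 * a1';  b2 = b2 - alpha * s2;
        W1 = W1 - alpha * s1 * a0';  b1 = b1 - alpha * s1;
    end
end
% after training, W1, b1, W2, b2 approximate the target mapping on [-2, 2]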

The backpropagation algorithm is similar to the LMSE (which can be considered a particular case of backpropagation for a single layer) – it uses the gradient descent method.
To compute the gradient, one needs to backpropagate the sensitivities.
This backpropagation is made iteratively.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
270

a) Incremental learning
One input at a time is presented to the network, and the weights and bias are updated after each input is presented. There are several ways to do it:
net.trainFcn='trainc'
net=train(net,P,T): "trainc trains a network with weight and bias learning rules with incremental updates after each presentation of an input. Inputs are presented in cyclic order."
net.trainFcn='trainr'
net=train(net,P,T): "trainr trains a network with weight and bias learning rules with incremental updates after each presentation of an input. Inputs are presented in random order."
4.11.4 Learning styles
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
271

When these learning methods are used, the algorithms must be iterative, and they are implemented in the toolbox with names starting with learn, as for example:
learnp – perceptron rule
learngd – gradient rule
learngdm – gradient rule improved with momentum (see help)
learnh – Hebb rule
learnhd – Hebb rule with decaying weight (see help)
learnwh – Widrow-Hoff learning rule
The learning function is specified by net.adaptFcn='learngd', for example.
Incremental learning can also be done by net=adapt(net, P, T), but it is mandatory in this case that P and T be cell arrays (not matrices, as in the previous methods).
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
272

b) Batch training
net.trainFcn='trainb': "trainb trains a network with weight and bias learning rules with batch updates. The weights and biases are updated at the end of an entire pass through the input data."
net=train(net,P,T): train by default is in batch mode.
In these methods the algorithms are in batch implementation, and their names start with train, as for example:
traingd – gradient descent
traingda – gradient descent with adaptive learning rate
trainlm – Levenberg-Marquardt
trainscg – scaled conjugate gradient
Note that learngd and traingd both implement the gradient descent technique, but in different ways. The same applies for similar names. However, trainlm has no incremental implementation, only batch. The training functions are specified by net.trainFcn='trainlm', for example.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
273

Note that one can write
$$\boldsymbol{\Theta}^m(k+1) = \boldsymbol{\Theta}^m(k) - \alpha\,\mathbf{s}^m(\mathbf{z}^{m-1})^T$$
where, as before,
$$\boldsymbol{\Theta}^m = [\mathbf{W}^m\;\; \mathbf{b}^m], \qquad \mathbf{z}^{m-1} = \begin{bmatrix}\mathbf{a}^{m-1}\\ 1\end{bmatrix}$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
274

Run the demonstration: nnd11bc.m Example Hagan, 11.14.
Other gradient based algorithms to prevent local minima
and improve convergence:
Levenberg-Marquardt backpropagation trainlm
Bayesian regularization backpropagation trainbr
Scaled conjugate gradient backpropagation trainscg
Resilient backpropagation trainrp
See DL Toolbox User’s Guide and Chapter 9 of Hagan&Coll.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
275

4.11.5 Some suggestions for practical implementation
Choice of architecture
How many layers ?
How many neurons in each layer ?
... there is no generic response
Demo: Function approximation, nnd11fa.m
An empiric rule: to prevent overfitting, the number of weights + biases should not be greater than the number of training examples.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
276

Convergence
The problem of local minima
Demo: Steepest descent backprop#1 nnd12sd1.m
Influence of the learning rate
Demo: Steepest descent backprop#2, nnd12sd2.m
Demo: Variable learning rate, nnd12vl.m
To adapt the learning rate
>nnd
shows a GUI for the
demos
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
277

Generalization capability
After training, if efficient, the NN reproduces well the
training data.
And for other (new) data ? Does it generalize the input-
output mapping ?
For a good compromise between precision in the training
and generalization: the NN should have a number of
parameters (weights+bias) lower than the number of data
points in the training set. This is a guideline.
Demo: Generalization, nnd11gn.m
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
278

Good practices:
Divide the available data into three parts:
Training set, the largest, e.g. 70% of the data.
Validation set, e.g. 15% of the data. While training, one verifies if the error on this set diminishes; when this error increases from one iteration to another, the network is entering the overfitting condition, and the training should be stopped.
Test set, e.g. 15%, where the NN performance will be analyzed after the training is finished.
How to divide the dataset? Randomly (Matlab: dividerand) or by successive blocks (Matlab: divideblock, divideind).
Create the MLNN: net=feedforwardnet(...) or net=network(...)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
279
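A hedged MATLAB sketch of these good practices with the toolbox functions cited above (the layer size, ratios and data are illustrative; the divideParam property names follow the toolbox documentation referenced in these slides):

P = rand(2, 200);                      % illustrative inputs (R x Q)
T = sin(P(1,:)) + cos(P(2,:));         % illustrative targets (1 x Q)
net = feedforwardnet(10);              % one hidden layer with 10 neurons
net.divideFcn = 'dividerand';          % random division of the data
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
[net, tr] = train(net, P, T);          % training stops early when the validation error rises
a = net(P);                            % network outputs after training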

Illustration of the usefulness of the validation set
(Figure: error curves during training; the overtraining starts where the validation error begins to rise.)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
280
(from
nnet_ug.pdf)

4.12. Conclusions MLNN
The backpropagation algorithm is the multilayer version of
the gradient method.
Since the surface of the criterion has local minima, the
convergence to a global minimum is a critical issue.
There are several improvements of the algorithm aiming to improve its convergence properties (see Chapter 12 of Hagan, for example). Basically, the training methods that improve the gradient either use second-order information (the second derivative, or the Hessian), building the family of quasi-Newton methods, or combine successive gradients in order to improve convergence, such as the conjugate gradient family.
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
281

Comparison of the training algorithms

Unified notation: Θ is the matrix of parameters, z is the vector of inputs of the layer; if there is no bias, the b disappears from Θ and the 1 disappears from z. In this notation the parameters are initialized at Θ¹ and afterwards updated with the data z₁, z₂, … in an incremental learning way.

Perceptron rule:
$$\boldsymbol{\Theta}^{k+1} = \boldsymbol{\Theta}^{k} + \mathbf{e}_k\,\mathbf{z}_k^T$$
Hebb rule:
$$\boldsymbol{\Theta}^{k+1} = \boldsymbol{\Theta}^{k} + \mathbf{t}_k\,\mathbf{z}_k^T$$
Widrow-Hoff (gradient, linear function):
$$\boldsymbol{\Theta}^{k+1} = \boldsymbol{\Theta}^{k} + 2\alpha\,\mathbf{e}_k\,\mathbf{z}_k^T$$
Single layer, gradient, any function f:
$$\boldsymbol{\Theta}^{k+1} = \boldsymbol{\Theta}^{k} + 2\alpha\,\dot f\,\mathbf{e}_k\,\mathbf{z}_k^T$$
Multilayer, backpropagation:
$$\boldsymbol{\Theta}^m(k+1) = \boldsymbol{\Theta}^m(k) - \alpha\,\mathbf{s}^m\,(\mathbf{z}^{m-1})^T$$
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
282


4.13 RBF Neural Networks
4.13.1. Architecture
4.13.2. Training
4.13.3. Comparison with the backpropagation
4.13.4. Conclusion
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
283

4.13.1. Architecture
(Figure: R inputs, a first layer of S¹ radial basis neurons, and a linear output layer with S² neurons.)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
284

One RBF neuron with one input (w = 0, b = 1)

radbas(p) calculates its output according to a = exp(-p^2):
$$a = e^{-p^2}$$

p = -3:.1:3;
a = radbas(p);
plot(p,a)
title('Gaussian Radial Basis Function');
xlabel('Input p');
ylabel('Output a');

(Figure: the Gaussian radial basis function, output a versus input p; note the locality of the RBF function.)
@ADC/DEI/FCTUC/MEI/MEB/2023/MachineLearning/Chapt.4. Shallow NNets
285

One neuron with a triangular RBF with one input
tribas(n) = 1 - abs(n), if -1 <= n <= 1
          = 0, otherwise
Locality of the RBF function (figure: output a as a function of input p).
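A quick comparison of the two local basis functions, reusing the toolbox functions shown above (the input range is illustrative):

p = -3:.1:3;
plot(p, radbas(p), p, tribas(p));
legend('radbas: exp(-p^2)', 'tribas: 1-|p| on [-1,1]');
xlabel('Input p'); ylabel('Output a');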

One RBF Gaussian neuron not centered
The weight w1 acts as the center: with w1 = 1.5,
$a = e^{-(p-1.5)^2} = e^{-d^2}$
where d = ||p - w1|| is the distance between p and w1.
Locality of the RBF function (figure: the Gaussian bump is now centered at p = 1.5).

One RBF Gaussian neuron with one input and a scale factor
With weight (center) w1 and scale factor b,
$a = e^{-(\|p - w_1\|\, b)^2}$
i.e. radbas(n) with n = ||p - w1|| b.
With w1 = 0: the bigger b, the more local the function (figure: Gaussian curves of decreasing width for increasing b, plotted against the input p - w1).
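A short sketch of the scale-factor effect just described (the center w1 and the values of b are illustrative):

p  = -3:.1:3;
w1 = 0;                              % center of the Gaussian
for b = [0.5 1 2]
    a = radbas(abs(p - w1) * b);     % a = exp(-(||p - w1||*b)^2)
    plot(p, a); hold on
end
hold off
legend('b = 0.5', 'b = 1', 'b = 2'); % bigger b gives a narrower, more local bump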

Three RBF Gaussian neurons with one input and one output
Scale factors equal to 1; each neuron i computes its output from the distance of p to its center w_i:
$a_1 = e^{-(\|p-(-1.5)\|\cdot 1)^2}$, $a_2 = e^{-(\|p-0\|\cdot 1)^2}$, $a_3 = e^{-(\|p-1.5\|\cdot 1)^2}$
(figure: three Gaussian bumps centered at -1.5, 0 and 1.5)

Sum of the three outputs
The three neurons have centers -1.5, 0 and 1.5 and scale factors equal to 1; the linear layer adds their outputs:
a = a1 + a2 + a3
(figures: the individual bumps a1, a2, a3 and the resulting sum as a function of p)

Weighted sum of the three outputs with weighting factors [1 1 0.5]:
a = a1 + a2 + 0.5 a3
Weighted sum of the three outputs with weighting factors [10 2 -1]:
a = 10 a1 + 2 a2 - a3
Sum of the three outputs with scale factors [1 2 1] (the middle Gaussian is narrower):
a = a1 + a2 + a3
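A sketch reproducing these combinations (centers, scale factors and linear-layer weights taken from the slides above):

p = -3:.05:3;
c = [-1.5 0 1.5];                    % centers w1, w2, w3
b = [1 1 1];                         % scale factors of the RBF layer
A = zeros(3, numel(p));
for i = 1:3
    A(i,:) = radbas(abs(p - c(i)) * b(i));
end
plot(p, [1 1 1]*A, p, [1 1 0.5]*A, p, [10 2 -1]*A);   % three choices of linear-layer weights
legend('a1+a2+a3', 'a1+a2+0.5a3', '10a1+2a2-a3');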

How to weight the outputs?
The RBF layer (centers w^1_i, scale factors b^1_i) produces a1, a2, a3; a linear layer with weights w^2_1, w^2_2, w^2_3 and bias b^2_1 computes the weighted sum (figure: RBF layer followed by the linear layer).

Function of two arguments a = f(p1, p2), one neuron
The input is the vector p = [p1; p2] and the weight vector W^1 = [w^1_{1,1}, w^1_{1,2}]^T = [0, 0]^T is the center:
$a = e^{-(\|\mathbf{p} - \mathbf{w}^1\|\, b)^2}$
(figure: three-dimensional radial basis function; axes: input p1, input p2, output a)
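A sketch of this two-input case (one neuron centered at the origin; the grid and the scale factor are illustrative):

[p1, p2] = meshgrid(-2:.1:2, -2:.1:2);
w1 = [0; 0];  b = 1;                               % center and scale factor
d = sqrt((p1 - w1(1)).^2 + (p2 - w1(2)).^2);       % ||p - w1|| at every grid point
a = radbas(d * b);                                 % a = exp(-(||p - w1||*b)^2)
surf(p1, p2, a); xlabel('Input p1'); ylabel('Input p2'); zlabel('Output a');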

Functions of two arguments a = f(p1, p2), three neurons
The RBF layer has three neurons with centers w^1_1, w^1_2, w^1_3 and scale factors b^1_1, b^1_2, b^1_3; the linear layer with weights w^2_1, w^2_2, w^2_3 and bias b^2_1 combines the outputs a1, a2, a3 (figures: the three radial surfaces and the combined output surface).

Great flexibility.
Capacity to approximate a great variety of functions with only three neurons.
A NN adequate for approximating relational functions (explicit or not).
Any function may be approximated by an RBF NN with arbitrary precision, provided the RBF layer has a sufficient number of neurons.

R inputs, an RBF layer with S1 neurons, and a linear layer with S2 neurons (figure: general architecture of the RBF network).

Parameters of the network to be trained:
Centers of the radial functions
Openness of the radial functions (the radius)
Weights of the linear layer
Bias of the linear layer
... several learning algorithms exist for these parameters.

Training setup: the same input is applied to the process (the function to be approximated), which gives the target t, and to the RBF network, which gives the output a; the error e between t and a drives the adaptation of the network parameters (figure: RBF network in parallel with the process, with the error computed at the output).

4.13.2. Training of the RBFNN
RBF layer: clustering techniques, which look for the optimal placement of the centers of the Gaussians in the n-dimensional input space, and determination of a convenient openness (variance).
Linear layer: LMSE or RLS (recursive least squares); once the RBF layer is fixed, any of the single-layer learning algorithms may be applied.

Training the RBF layer
Training the centers: clustering (figure: input data in the (p1, p2) plane grouped into clusters).

How many RBF neurons? Where? Which openness?
(figure: RBF layer with R neurons a1 ... aR feeding a linear layer with weights w^2_1 ... w^2_R and bias b^2_1)

How many RBF neurons? Where? Which openness?
Each input point must activate more than one neuron, so that the output is a weighted sum of the outputs of several neurons; this guarantees good interpolation and generalization ability.
(one-dimensional case, output a1 + a2 + 0.5 a3: well-overlapping bumps are good; isolated, non-overlapping bumps are bad)

How many RBF neurons? Where? Which openness?
Coverage: where there is data, neurons should exist. The neurons must spread over all the region of the input space where the output of the ANN can be non-null, meaning that at least one neuron must be excited by any point of the input space.
(one-dimensional case: compare an incomplete placement of three neurons, a1 + a2 + 0.5 a3, with a complete coverage such as
a10 = 0.2 a1 + 0.4 a2 + 0.5 a3 + 0.5 a4 - 0.4 a5 - 2 a6 + 3 a7 - 0.6 a8 + 0.8 a9,
where nine overlapping neurons a1 ... a9 cover the whole interval)

In two dimensions (two inputs): the centers spread in the (p1, p2) plane (figure: radial bumps a over the (p1, p2) plane).

Grid partition: the centers are uniformly distributed in the (p1, p2) plane (figure: a regular grid of centers covering the plane).

Disadvantages:
The number of neurons grows exponentially with the number of inputs - the curse of dimensionality.
It happens frequently that the input space has regions of sparsity (low density) and regions of high concentration of points.
In the regions of higher density more detail is needed, which means more neurons.

The c-means clustering (or k-means clustering) (in Matlab: K-means clustering, kmeans.m) (figure: data points and cluster centers in the (p1, p2) plane).
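A minimal sketch of this step, assuming the Statistics and Machine Learning Toolbox is available and that P is the usual R-by-Q input matrix (one input vector per column, here with R = 2 as in the figure):

K = 5;                        % chosen number of RBF neurons
[~, V] = kmeans(P', K);       % kmeans expects one observation per row, hence the transpose
                              % V is K-by-R: one center (one RBF neuron) per row
plot(P(1,:), P(2,:), '.', V(:,1), V(:,2), 'x');   % data and the centers found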

The clustering techniques are used to fix the number and the localization of the centers of the RBF functions.

4.13.2.1.2. Training of the widths
Theoretically, an RBFNN in which the RBF functions all have the same width is a universal approximator (Hassoun, 290).
$a_{ij} = e^{-(b_i \|\mathbf{p}_j - \mathbf{v}_i\|)^2} = e^{-\|\mathbf{p}_j - \mathbf{v}_i\|^2/\sigma_i^2}$, with $b_i = 1/\sigma_i$.
The coefficient σ² normalizes the Euclidean distance.
Remark: in the case of Gaussian functions,
$a_{ij} = e^{-\|\mathbf{p}_j - \mathbf{v}_i\|^2/(2\sigma_i^2)}$, i.e. $b_i^2 = 1/(2\sigma_i^2)$.

Heuristics to compute σ (Hassoun, 290)
1 - if σ is the same for all RBF:
compute the distance between each center c_i and its nearest neighbour c_j, $\|c_i - c_j\|$;
σ = average of the distances between neighbouring centers.
2 - if σ is proper to each RBF:
$\sigma_i = \gamma \|\mathbf{v}_i - \mathbf{v}_j\|$, with $1.0 \le \gamma \le 1.5$,
where $\mathbf{v}_j$ is the center closest to $\mathbf{v}_i$.
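A sketch of the second heuristic, assuming V is the K-by-R matrix of centers obtained above and gamma is chosen in [1.0, 1.5]:

gamma = 1.2;
K = size(V, 1);
sigma = zeros(K, 1);
for i = 1:K
    d = sqrt(sum((V - V(i,:)).^2, 2));   % distances from center i to all centers
    d(i) = inf;                          % ignore the distance to itself
    sigma(i) = gamma * min(d);           % gamma times the nearest-neighbour distance
end
b = 1 ./ sigma;                          % corresponding scale factors of the RBF layer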

Training of the linear layer
(figure: RBF layer with outputs a1 ... aR feeding the linear layer with weights w^2_1 ... w^2_R and bias b^2_1)

With the RBF layer fixed, $a_i = e^{-(b_i \|\mathbf{p} - \mathbf{v}_i\|)^2}$ and the input vector of the linear layer is $\mathbf{z} = [a_1\; a_2\; \dots\; a_R\; 1]^T$ (the 1 accounts for the bias). The error is $e_k = t_k - a^o_k$, where $a^o$ is the network output. The linear layer can then be trained with:

Widrow-Hoff (LMSE):
$\theta_{k+1} = \theta_k + 2\alpha\, e_k \mathbf{z}_k^T$

RLS (recursive least squares):
$\theta_{k+1} = \theta_k + \mathbf{P}_k \mathbf{z}_k e_k$

The RLS method is similar to the LMSE; the learning coefficient depends on a matrix $\mathbf{P}_k$ that gives information about the statistical quality of the parameters θ obtained in each iteration.
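Putting the two phases together, a minimal batch sketch (V and b are the centers and scale factors computed as above; P is R-by-Q with one input vector per column and T is 1-by-Q with the targets; the batch least-squares solve plays the role of the incremental LMSE/RLS updates):

Q = size(P, 2);  K = size(V, 1);
Z = zeros(K + 1, Q);
for j = 1:Q
    d = sqrt(sum((V' - P(:, j)).^2, 1))';   % distances from input j to every center
    Z(:, j) = [radbas(d .* b); 1];          % RBF outputs plus the constant 1 for the bias
end
theta = T * pinv(Z);                        % [w2, b2]: least-squares weights of the linear layer
a = theta * Z;                              % network output on the training data
mse_train = mean((T - a).^2)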

4.13.3. Comparison with the backpropagation NN (FFNN)
RBF have the advantage of locality: in each iteration, only some neurons are modified.
They allow a hybrid training in two phases: unsupervised (clustering) for the RBF layer and supervised (Widrow-Hoff or RLS) for the linear layer, which is faster than backpropagation.
They require more data for training than the multilayer feedforward NN (according to Hassoun, p. 294, about 10 times more for the same accuracy in function approximation).
They require more neurons than the multilayer FFNN for the same accuracy, as a consequence of the locality property.
RBF are more adequate for real-time applications (signal processing, automatic control, etc.).

4.13.4. RBF in the Deep Learning Toolbox of Matlab
net = newrb(P, T, GOAL, SPREAD, MN, DF)
GOAL: maximum LMSE (default 0)
SPREAD: width of the radial basis functions (default 1)
MN: maximum number of RBF neurons (default dim(P))
DF: number of neurons added between each display (default 25)

X = [1 2 3];
T = [2.0 4.1 5.9];
net = newrb(X,T,0.1);
Y = net(X)
NEWRB, neurons = 0, MSE = 2.54
Y = 2.0000 4.1000 5.9000
view(net)

net = newrbe(P, T, SPREAD)
net = newrbe(X,T)
view(net)
(newrbe adds a neuron centered on each input vector)
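A slightly larger sketch of newrb on a function-approximation task (the GOAL and SPREAD values and the target function are illustrative):

X = -3:.2:3;
T = sin(X) + 0.05*randn(size(X));       % noisy samples of the function to approximate
net = newrb(X, T, 0.01, 1.0, 20, 5);    % GOAL = 0.01, SPREAD = 1.0, at most 20 neurons
Y = net(-3:.05:3);
plot(X, T, 'o', -3:.05:3, Y, '-');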

4.13.5. Conclusion
Convergence.
Choice of the architecture.
Some variations: Hassoun, 296.

4.14. Conclusion
High number of degrees of freedom. Freedom to choose the activation functions and the architectures:
- types of neurons
- number of neurons per layer
- number of layers
- structure of the internal connections...
Application to a great variety of problems.
The big question: to choose, in each case, the most appropriate architecture to use.

Bibliography
Hagan, M.T., H.B. Demuth, M. Beale, Neural Network Design, 2nd ed., ebook, 2014. The main book. Freely downloadable from hagan.okstate.edu/nnd.html
Hassoun, M.H., Fundamentals of Artificial Neural Networks, MIT Press, 1995.
Deep Learning Toolbox User's Guide, The MathWorks, 2023a.
Goodfellow, I., Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016 (http://www.deeplearningbook.org, 11 Sept 2023).