Adaptive Resonance Theory (ART)


About This Presentation

Introduction to Adaptive Resonance Theory (ART) neural networks including:
Introduction (Stability-Plasticity Dilemma)
ART Network
ART Types
Basic ART network Architecture
ART Algorithm and Learning
ART Computational Example
ART Application
Conclusion
Main References


Slide Content

Adaptive Resonance Theory (ART)
Shahid Rajaee Teacher Training University
Faculty of Computer Engineering
PRESENTED BY:
Amir Masoud Sefidian

OUTLINE
•Introduction (Stability-Plasticity Dilemma)
•ART Network
•ART Types
•Basic ART network Architecture
•ART Algorithm and Learning
•ART Computational Example
•ART Application
•Conclusion
•Main References

Stability-Plasticity Dilemma (SPD)
•Stability: system behaviour doesn’t change after irrelevant events.
•Plasticity: system adapts its behaviour according to significant events.
•Dilemma: how to achieve stability without rigidity and plasticity without chaos?
▫Ongoing learning capability
▫Preservation of learned knowledge
The real world confronts us with situations where data is continuously changing.
Every learning system faces the plasticity-stability dilemma.
How can we create a machine that can act and navigate in a world that is constantly changing?

SPD (Contd.)
Every learning system faces the plasticity-stability dilemma.
•The plasticity-stability dilemma poses a few questions:
▫How can we continue to quickly learn new things about the environment without forgetting what we have already learned?
▫How can a learning system remain plastic (adaptive) in response to significant input yet stable in response to irrelevant input?
▫How can a neural network remain plastic enough to learn new patterns and yet maintain the stability of the already learned patterns?
▫How does the system know when to switch between its plastic and stable modes?


Back-propagation Drawback
•The back-propagation algorithm suffers from a stability problem.
•Once a back-propagation network is trained, the number of hidden neurons and the weights are fixed.
•The network cannot learn from new patterns unless it is retrained from scratch.
•Thus we consider back-propagation networks to lack plasticity.
•Assuming that the number of hidden neurons can be kept constant, the plasticity problem can be solved by retraining the network on the new patterns using an on-line learning rule.
•However, this causes the network to forget old knowledge rapidly; we say that such an algorithm is not stable.

•Gail Carpenter and Stephen Grossberg (Boston University) developed the “Adaptive Resonance Theory” in 1976 as a learning model to answer this dilemma.
•ART networks tackle the stability-plasticity dilemma:
▫They maintain the plasticity required to learn new patterns, while preventing the modification of patterns that have been learned previously.
▫No stored pattern is modified if the input does not match any existing pattern.
▫Plasticity: they can always adapt to unknown inputs (by creating a new cluster with a new weight vector) if the given input cannot be classified by existing clusters (a computational corollary of the biological model of neural plasticity).
▫Stability: existing clusters are not deleted by the introduction of new inputs (new clusters are simply created in addition to the old ones).
▫The basic ART system is an unsupervised learning model.

The key innovation: “expectations”
As each input is presented to the network, it is compared with the prototype vector that it most closely matches (the expectation).
If the match between the prototype and the input vector is NOT adequate, a new prototype is selected.
In this way, previously learned memories (prototypes) are not eroded by new learning.

ART Networks
The ART family (Grossberg, 1976) covers both unsupervised and supervised learning:
•Unsupervised ART learning:
▫ART1, ART2 (Carpenter & Grossberg, 1987)
▫Fuzzy ART (Carpenter, Grossberg, et al., 1991)
▫Simplified ART (Baraldi & Alpaydin, 1998)
•Supervised ART learning:
▫ARTMAP (Carpenter, Grossberg, et al., 1991)
▫Fuzzy ARTMAP (Carpenter, Grossberg, et al., 1991)
▫Gaussian ARTMAP (Williamson, 1992)
▫Simplified ARTMAP (Kasuba, 1993)
▫Mahalanobis-Distance-Based ARTMAP (Vuskovic & Du, 2001; Vuskovic, Xu & Du, 2002)

•ART 1:
▫The simplest variety of ART networks, accepting only binary inputs.
•ART 2:
▫Supports continuous inputs.
•ART 3 is a refinement of both models.
•Fuzzy ART incorporates fuzzy logic into ART’s pattern recognition.
•ARTMAP, also known as Predictive ART, combines two slightly modified ART-1 or ART-2 units into a supervised learning structure.
•Fuzzy ARTMAP is simply ARTMAP using Fuzzy ART units, resulting in a corresponding increase in efficacy.

Basic ART network Architecture
The basic ART system is an unsupervised learning model. It typically consists of
1. a comparison field,
2. a recognition field composed of neurons,
3. a vigilance parameter, and
4. a reset module.
Each F2 (recognition-field) node is in one of three states:
•Active
•Inactive, but available to participate in the competition
•Inhibited, and prevented from participating in the competition

ART Subsystems
Layer 1 (F1): comparison of the input pattern and the expectation.
L1-L2 connections: perform the clustering operation; each row of weights is a prototype pattern.
Layer 2 (F2): competition (contrast enhancement) with a winner-take-all learning strategy.
L2-L1 connections: perform pattern recall (the expectation).
Orienting subsystem: causes a reset when the expectation does not match the input pattern.
The degree of similarity required for patterns to be assigned to the same cluster unit is controlled by a user-defined gain control known as the vigilance parameter (a small structural sketch follows below).
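To make these roles concrete, here is a minimal sketch of the state such a network carries, assuming the binary ART-1 variant described in the following slides; the class and attribute names are illustrative, not taken from the slides.

```python
import numpy as np

class Art1State:
    """Illustrative container for the ART subsystems listed above."""

    def __init__(self, n_inputs, vigilance):
        self.n_inputs = n_inputs        # size of Layer 1 (comparison field)
        self.vigilance = vigilance      # user-defined gain control (rho)
        # L1 -> L2 weights: one row per F2 node, used in the clustering competition.
        self.bottom_up = np.zeros((0, n_inputs))
        # L2 -> L1 weights: one row per F2 node, the stored prototype (expectation).
        self.top_down = np.zeros((0, n_inputs), dtype=int)
        # F2 nodes currently inhibited by the orienting subsystem (reset).
        self.inhibited = set()
```

The worked example later instead starts from a single uncommitted node with t_{ji}(0) = 1 and b_{ij}(0) = 1/(1 + n); either convention works, since an all-ones template always passes the vigilance test.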

ART Algorithm
A new pattern enters the categorisation cycle: the incoming pattern is matched against the stored cluster templates (recognition), and the best match is then checked against the input (comparison).
•Known: if the pattern is close enough to a stored template, it joins the best-matching cluster and the winner node's weights are adapted.
•Unknown: if not, an uncommitted node is initialised with the pattern as its template, i.e., a new cluster is created.
(A minimal code sketch of this loop is given below.)
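The sketch below follows this flow for a single presentation of a binary pattern, assuming the ART-1 fast-learning rules used in the worked example later; the function name and signature are my own shorthand, not the slides' code.

```python
import numpy as np

def art1_step(x, top_down, bottom_up, rho=0.7):
    """One presentation of a binary pattern x: adapt the winner or create a new cluster.
    Expects float bottom-up weights (e.g. initialized to 1/(1+n))."""
    x = np.asarray(x)
    order = np.argsort(-(bottom_up @ x))          # categorisation: best-matching nodes first
    for j in order:                               # recognition, then comparison (with resets)
        match = top_down[j] * x                   # expectation AND input (binary patterns)
        if match.sum() / x.sum() >= rho:          # vigilance test: pattern is "known"
            top_down[j] = match                   # adapt winner node (fast learning)
            bottom_up[j] = match / (0.5 + match.sum())
            return j, top_down, bottom_up
    # "unknown": initialise an uncommitted node with the pattern as its template
    top_down = np.vstack([top_down, x])
    bottom_up = np.vstack([bottom_up, x / (0.5 + x.sum())])
    return len(top_down) - 1, top_down, bottom_up
```

Calling art1_step once per input, and reassigning the returned matrices, reproduces the adapt-or-initialise behaviour of the flow chart.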

Recognition Phase (in Layer 2)
•Forward transmission via the bottom-up weights (inner product).
•The best-matching node fires (winner-take-all layer).

Comparison Phase (in Layer 1)
•Backward transmission via the top-down weights.
•Vigilance test: the class template is matched against the input pattern.
•If the pattern is close enough to the template, categorisation was successful and “resonance” is achieved.
•If it is not close enough, the winning neuron is reset and the next-best match is tried.
•(The reset inhibits the current winning neuron, and the current expectation is removed.)
•A new competition is then performed in Layer 2 while the previous winning neuron is disabled.
•The new winning neuron in Layer 2 projects a new expectation onto Layer 1 through the L2-L1 connections.
•This process continues until the L2-L1 expectation provides a close enough match to the input pattern.
•The process of matching, and the subsequent adaptation, is referred to as resonance. (A sketch of this search loop follows below.)
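A minimal sketch of the combined recognition/comparison search, assuming binary ART-1 patterns with at least one active unit; the function name is illustrative, and the vigilance measure is the one used in the worked example (|template AND input| / |input|).

```python
import numpy as np

def resonance_search(x, bottom_up, top_down, rho):
    """Return the index of the resonating F2 node, or None if no committed node matches."""
    x = np.asarray(x)
    inhibited = set()                               # neurons knocked out by a reset
    while len(inhibited) < len(bottom_up):
        y = (bottom_up @ x).astype(float)           # recognition: bottom-up inner products
        y[list(inhibited)] = -np.inf                # reset neurons sit out the competition
        j = int(np.argmax(y))                       # winner-take-all in Layer 2
        expectation = top_down[j] * x               # L2 -> L1 expectation compared with input
        if expectation.sum() / x.sum() >= rho:      # vigilance test
            return j                                # resonance: the match is close enough
        inhibited.add(j)                            # reset: inhibit winner, re-run competition
    return None
```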

Step 1:
Send the input from the F1 layer to the F2 layer for processing.
The first node within the F2 layer is chosen as the closest match to the input and a hypothesis is formed.
This hypothesis represents what the node will look like after learning has occurred, assuming it is the correct node to be updated.

Step 2:
Assume node j is the winner.
If T_j(I*) > ρ, then the hypothesis is accepted and the input is assigned to that node. Otherwise, the process moves on to Step 3.

Step 3:
If the hypothesis is rejected, a “reset” command is sent back to the F2 layer.
In this situation, the jth node within F2 is no longer a candidate, so the process repeats for node j+1.

Learning in ART1
Updates for both the bottom-up and top-down weights are controlled by differential equations. Assuming the Jth node is the winner and its connected weights are to be updated, solving these equations in the fast-learning limit gives two separate learning laws:
L2-L1 (top-down) connections:
 t_{Ji}(new) = t_{Ji}(old) · x_i
L1-L2 (bottom-up) connections (each bottom-up weight should be smaller than or equal to this value):
 b_{iJ}(new) = t_{Ji}(new) / (0.5 + Σ_{l=1}^{n} t_{Jl}(new))
Initial weights for ART1:
 top-down weights are initialized to 1: t_{ji}(0) = 1
 bottom-up weights are initialized to b_{ij}(0) = 1/(1 + n), where n is the number of input units.
(A code sketch of these updates follows below.)
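As a sketch, the fast-learning form of these two laws, together with the initialization used in the example that follows, might be coded like this (function names are illustrative assumptions):

```python
import numpy as np

def init_weights(n_inputs, n_nodes=1):
    """Initial ART-1 weights: t_ji(0) = 1 and b_ij(0) = 1 / (1 + n)."""
    top_down = np.ones((n_nodes, n_inputs))
    bottom_up = np.full((n_nodes, n_inputs), 1.0 / (1 + n_inputs))
    return bottom_up, top_down

def update_winner(x, j, bottom_up, top_down):
    """Fast-learning updates for the winning node J:
    t_Ji <- t_Ji * x_i  and  b_iJ <- t_Ji / (0.5 + sum_l t_Jl)."""
    top_down[j] = top_down[j] * np.asarray(x)
    bottom_up[j] = top_down[j] / (0.5 + top_down[j].sum())
    return bottom_up, top_down
```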

Vigilance Threshold
•A vigilance parameter ρ determines the tolerance of the matching process.
•The vigilance threshold sets the granularity of clustering.
•It defines the amount of attraction of each prototype.
•Low threshold:
▫Large mismatches are accepted.
▫Few, large clusters.
▫Misclassifications are more likely.
▫The algorithm is more willing to accept input vectors into existing clusters (i.e., the definition of similarity is LESS strict).
•High threshold:
▫Only small mismatches are accepted.
▫Many, small clusters.
▫Higher precision.
▫The algorithm is more “picky” about assigning input vectors to clusters (i.e., the definition of similarity is MORE strict).
(A small numeric illustration follows below.)
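The toy check below shows the effect of ρ on a single accept/reject decision; the prototype and input values are made up for illustration and are not from the slides.

```python
import numpy as np

prototype = np.array([1, 1, 1, 1, 1, 0, 0])     # stored template
x         = np.array([1, 1, 1, 0, 0, 1, 1])     # incoming binary pattern
match = (prototype * x).sum() / x.sum()         # |prototype AND x| / |x| = 3/5 = 0.6

for rho in (0.5, 0.8):
    verdict = "accepted (joins this cluster)" if match >= rho else "rejected (reset, try another/new cluster)"
    print(f"rho = {rho}: similarity {match:.2f} -> {verdict}")
# Low vigilance (0.5) tolerates the large mismatch; high vigilance (0.8) rejects it.
```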

ART Example Computation

For this example, let us assume that we have an ART-1 network with 7 input neurons (n = 7) and, initially, one output neuron.
Our input vectors are
{(1, 1, 0, 0, 0, 0, 1),
(0, 0, 1, 1, 1, 1, 0),
(1, 0, 1, 1, 1, 1, 0),
(0, 0, 0, 1, 1, 1, 0),
(1, 1, 0, 1, 1, 1, 0)},
and the vigilance parameter is ρ = 0.7.
Initially, all top-down weights are set to t_{j,i}(0) = 1, and all bottom-up weights are set to b_{i,j}(0) = 1/(1 + n) = 1/8.

For the first input vector, (1, 1, 0, 0, 0, 0, 1), we get:
y_1 = (1/8)·1 + (1/8)·1 + (1/8)·0 + (1/8)·0 + (1/8)·0 + (1/8)·0 + (1/8)·1 = 3/8.
Clearly, y_1 is the winner (there are no competitors).
Since we have
Σ_{l=1}^{7} t_{1,l} x_l / Σ_{l=1}^{7} x_l = 3/3 = 1 ≥ 0.7,
the vigilance condition is satisfied and we get the following new weights:
b_{1,1}(1) = b_{2,1}(1) = b_{7,1}(1) = 1/(0.5 + 3) = 1/3.5,
b_{3,1}(1) = b_{4,1}(1) = b_{5,1}(1) = b_{6,1}(1) = 0.

Also, we have:
t_{1,l}(1) = t_{1,l}(0) · x_l.
We can express the updated weights as matrices:
B(1) = (1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5)^T,
T(1) = (1, 1, 0, 0, 0, 0, 1).
Now we have finished the first learning step and proceed by presenting the next input vector.
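As a quick numerical check of this first step, the short NumPy sketch below (illustrative, not part of the slides) recomputes y_1, the vigilance value, and the updated weights B(1) and T(1):

```python
import numpy as np

n, rho = 7, 0.7
b = np.full((1, n), 1 / (1 + n))      # bottom-up weights b_ij(0) = 1/8
t = np.ones((1, n))                   # top-down weights t_ji(0) = 1
x = np.array([1, 1, 0, 0, 0, 0, 1])   # first input vector

y = b @ x                             # [0.375] = 3/8, so node 1 wins
match = t[0] * x
print(y, match.sum() / x.sum())       # 0.375 and vigilance value 1.0 >= 0.7

t[0] = match                          # T(1) = (1, 1, 0, 0, 0, 0, 1)
b[0] = match / (0.5 + match.sum())    # B(1): 1/3.5 at positions 1, 2, 7; 0 elsewhere
print(t[0], np.round(b[0], 4))
```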

For the second input vector, (0, 0, 1, 1, 1, 1, 0), we get:
y_1 = (1/3.5)·0 + (1/3.5)·0 + 0·1 + 0·1 + 0·1 + 0·1 + (1/3.5)·0 = 0.
Of course, y_1 is still the winner.
However, this time we do not reach the vigilance threshold:
Σ_{l=1}^{7} t_{1,l} x_l / Σ_{l=1}^{7} x_l = 0/4 = 0 < 0.7.
This means that we have to generate a second node in the output layer that represents the current input.
Therefore, the top-down weights of the new node will be identical to the current input vector.

The new unit’s bottom-up weights are set to zero in the positions where the input has zeroes as well.
The remaining weights are set to
1/(0.5 + 0 + 0 + 1 + 1 + 1 + 1 + 0) = 1/4.5.
This gives us the following updated weight matrices (written with one row per output node; B itself is the transpose):
B(2)^T = [ 1/3.5  1/3.5  0      0      0      0      1/3.5
           0      0      1/4.5  1/4.5  1/4.5  1/4.5  0     ],
T(2)   = [ 1  1  0  0  0  0  1
           0  0  1  1  1  1  0 ].

For the third input vector, (1, 0, 1, 1, 1, 1, 0), we have:
y_1 = 1/3.5 ; y_2 = 4/4.5.
Here, y_2 is the clear winner.
This time we exceed the vigilance threshold again:
Σ_{l=1}^{7} t_{2,l} x_l / Σ_{l=1}^{7} x_l = 4/5 = 0.8 ≥ 0.7.
Therefore, we adapt the second node’s weights.
Each top-down weight is multiplied by the corresponding element of the current input.

The winning unit’s bottom-up weights are set to its updated top-down weights divided by
(0.5 + 0 + 0 + 1 + 1 + 1 + 1 + 0) = 4.5.
It turns out that, in the current case, these updates do not result in any weight changes at all:
B(3)^T = [ 1/3.5  1/3.5  0      0      0      0      1/3.5
           0      0      1/4.5  1/4.5  1/4.5  1/4.5  0     ],
T(3)   = [ 1  1  0  0  0  0  1
           0  0  1  1  1  1  0 ].

For the fourth input vector, (0, 0, 0, 1, 1, 1, 0), it is:
y_1 = 0 ; y_2 = 3/4.5.
Again, y_2 is the winner.
The vigilance test succeeds once again:
Σ_{l=1}^{7} t_{2,l} x_l / Σ_{l=1}^{7} x_l = 3/3 = 1 ≥ 0.7.
Therefore, we adapt the second node’s weights.
As usual, each top-down weight is multiplied by the corresponding element of the current input.

The winning unit’s bottom-up weights are set to its updated top-down weights divided by
(0.5 + 0 + 0 + 0 + 1 + 1 + 1 + 0) = 3.5.
This gives us the following new weight matrices:
B(4)^T = [ 1/3.5  1/3.5  0  0      0      0      1/3.5
           0      0      0  1/3.5  1/3.5  1/3.5  0     ],
T(4)   = [ 1  1  0  0  0  0  1
           0  0  0  1  1  1  0 ].

Finally, the fifth input vector, (1, 1, 0, 1, 1, 1, 0), gives us:
y_1 = 2/3.5 ; y_2 = 3/3.5.
Once again, y_2 is the winner.
The vigilance test fails this time:
Σ_{l=1}^{7} t_{2,l} x_l / Σ_{l=1}^{7} x_l = 3/5 = 0.6 < 0.7.
This means that the active set A is reduced to contain only the first node, which becomes the uncontested winner.

The vigilance test fails for the first unit as well:
Σ_{l=1}^{7} t_{1,l} x_l / Σ_{l=1}^{7} x_l = 2/5 = 0.4 < 0.7.
We thus have to create a third output neuron, which gives us the following new weight matrices:
B(5)^T = [ 1/3.5  1/3.5  0  0      0      0      1/3.5
           0      0      0  1/3.5  1/3.5  1/3.5  0
           1/5.5  1/5.5  0  1/5.5  1/5.5  1/5.5  0     ],
T(5)   = [ 1  1  0  0  0  0  1
           0  0  0  1  1  1  0
           1  1  0  1  1  1  0 ].

In the second epoch, the first input vector, (1, 1, 0, 0, 0, 0, 1), gives us:
y_1 = 3/3.5 ; y_2 = 0 ; y_3 = 2/5.5.
Here, y_1 is the winner, and the vigilance test succeeds:
Σ_{l=1}^{7} t_{1,l} x_l / Σ_{l=1}^{7} x_l = 3/3 = 1 ≥ 0.7.
Since the current input is identical to the winner’s top-down weights, no weight update happens.

The second input vector, (0, 0, 1, 1, 1, 1, 0), results in:
y_1 = 0 ; y_2 = 3/3.5 ; y_3 = 3/5.5.
Now y_2 is the winner, and the vigilance test succeeds:
Σ_{l=1}^{7} t_{2,l} x_l / Σ_{l=1}^{7} x_l = 3/4 = 0.75 ≥ 0.7.
Because the intersection of the input with the winner’s top-down weights equals those weights, the update leaves them unchanged, and no weight change occurs.

The third input vector, (1, 0, 1, 1, 1, 1, 0), gives us:
y_1 = 1/3.5 ; y_2 = 3/3.5 ; y_3 = 4/5.5.
Once again, y_2 is the winner, but this time the vigilance test fails:
Σ_{l=1}^{7} t_{2,l} x_l / Σ_{l=1}^{7} x_l = 3/5 = 0.6 < 0.7.
This means that the active set is reduced to A = {1, 3}.
Since y_3 > y_1, the third node is the new winner.

The third node does satisfy the vigilance threshold:
Σ_{l=1}^{7} t_{3,l} x_l / Σ_{l=1}^{7} x_l = 4/5 = 0.8 ≥ 0.7.
This gives us the following updated weight matrices:
B(8)^T = [ 1/3.5  1/3.5  0  0      0      0      1/3.5
           0      0      0  1/3.5  1/3.5  1/3.5  0
           1/4.5  0      0  1/4.5  1/4.5  1/4.5  0     ],
T(8)   = [ 1  1  0  0  0  0  1
           0  0  0  1  1  1  0
           1  0  0  1  1  1  0 ].

For the fourth vector, (0, 0, 0, 1, 1, 1, 0), the second node wins and passes the vigilance test, but no weight changes occur.
For the fifth vector, (1, 1, 0, 1, 1, 1, 0), the third output neuron wins (y_3 = 4/4.5 exceeds y_2 = 3/3.5); it passes the vigilance test (4/5 = 0.8 ≥ 0.7) but does not lead to any weight modifications.
Further presentations of the five sample vectors do not lead to any weight changes; the network has thus stabilized.
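To check the whole computation end to end, the sketch below (illustrative, assuming the fast-learning rules used above) presents the five vectors for two epochs and prints the final templates; it ends with the same three stable clusters as T(8).

```python
import numpy as np

def present(x, b, t, rho=0.7):
    """One pattern presentation with reset search and fast learning (sketch)."""
    x = np.asarray(x, dtype=float)
    inhibited = set()
    while len(inhibited) < len(b):
        y = (b @ x).astype(float)
        y[list(inhibited)] = -np.inf               # reset nodes sit out the competition
        j = int(np.argmax(y))
        match = t[j] * x
        if match.sum() / x.sum() >= rho:           # vigilance test
            t[j] = match                           # fast-learning updates
            b[j] = match / (0.5 + match.sum())
            return b, t
        inhibited.add(j)                           # reset and try the next-best node
    t = np.vstack([t, x])                          # no match: commit a new output node
    b = np.vstack([b, x / (0.5 + x.sum())])
    return b, t

patterns = [(1, 1, 0, 0, 0, 0, 1), (0, 0, 1, 1, 1, 1, 0), (1, 0, 1, 1, 1, 1, 0),
            (0, 0, 0, 1, 1, 1, 0), (1, 1, 0, 1, 1, 1, 0)]
n = 7
b = np.full((1, n), 1 / (1 + n))                   # b_ij(0) = 1/8
t = np.ones((1, n))                                # t_ji(0) = 1
for epoch in range(2):
    for x in patterns:
        b, t = present(x, b, t)
print(t)   # three templates: (1,1,0,0,0,0,1), (0,0,0,1,1,1,0), (1,0,0,1,1,1,0)
```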

Adaptive Resonance Theory
Illustration of the categories (or clusters) formed in input space by ART networks.
Increasing ρ leads to narrower cones, not to wider ones as the figure might suggest.

•A problem with ART-1 is the need to determine the vigilance parameter for a given problem, which can be tricky.
•Furthermore, ART-1 always builds clusters of the same size, regardless of the distribution of samples in the input space.
•Nevertheless, ART is one of the most important and successful attempts at simulating incremental learning in biological systems.

Applications of ART
•Face recognition
•Image compression
•Mobile robot control
•Target recognition
•Medical diagnosis
•Signature verification

Conclusion
ART is an artificial neural network model built for environments that keep changing; without such a mechanism, constant change can make a learning system unstable, because the system may learn new information only by forgetting everything it has learned so far.

Main References:
•S. Rajasekaran, G. A. V. Pai, “Neural Networks, Fuzzy Logic and Genetic Algorithms”, Prentice Hall of India, Adaptive Resonance Theory, Chapter 5.
•Jacek M. Zurada, “Introduction to Artificial Neural Systems”, West Publishing Company, Matching & Self-Organizing Maps, Chapter 7.
•Carpenter, G. A., & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37(1), 54-115.
•Adaptive Resonance Theory, Soft Computing lecture notes, http://www.myreaders.info/html/soft_computing.html
•Fausett, L. V. (1994). Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Englewood Cliffs, NJ: Prentice-Hall.

QUESTIONS?
Where to find us:
[email protected]