Pattern Recognition Revisited, ICVSS 2016 presentation


About This Presentation

Another Story of Pattern Recognition.
Is Deep Learning the only way? International Computer Vision Summer School (ICVSS) 2016 presentation, held in Sicily on 22 July 2016.


Slide Content

Pattern Recognition Revisited
Another Story of Pattern Recognition
Is Deep Learning the only way?
Ken-ichi Maeda
ICVSS: Sicily, 22 July 2016

Deep Learning and Convolutional Neural Network
The state-of-the-art technology of pattern recognition.
https://en.wikipedia.org/wiki/Convolutional_neural_network

Neocognitron (1980)
Fukushima, K. (1980). Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position, Biological Cybernetics, 36, 193–202.

Back Propagation (Rumelhart 1986, Amari 1967)
http://sig.tsg.ne.jp/ml2015/ml/2015/06/08/stochastic-gradient-descent.html

Framework of Pattern Recognition
Given by K. S. Fu, first president of IAPR.
(Diagram: recognition runs Pattern → Feature Extraction → Similarity Calculation against a Dictionary; the Dictionary is built in a Training phase, also via Feature Extraction.)
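As an illustration, here is a minimal sketch of this framework in Python; the function names and the nearest-reference dictionary are illustrative assumptions, not the original systems discussed later in the deck.

```python
# Minimal sketch of Fu's framework (illustrative, not an original system).
import numpy as np

def extract_features(pattern):
    # Placeholder feature extraction: a normalised flat vector.
    v = np.asarray(pattern, dtype=float).ravel()
    return v / (np.linalg.norm(v) + 1e-12)

def train(patterns_by_class):
    # Training phase: build the dictionary of reference features per class.
    return {label: np.mean([extract_features(p) for p in ps], axis=0)
            for label, ps in patterns_by_class.items()}

def recognise(pattern, dictionary):
    # Recognition phase: feature extraction, then similarity calculation.
    f = extract_features(pattern)
    return max(dictionary, key=lambda label: float(f @ dictionary[label]))
```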

Framework of Pattern Recognition
Similar to a 3-layer Neural Network?
(Diagram: Input → Hidden Layer → Output, aligned with Pattern → Feature Extraction → Similarity.)

What is a Feature?
Edge, corner, whiteness/blackness, direction of a vector (correlation of meshes)

Neocognitron (1980)
Fukushima, K. (1980). Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position, Biological Cybernetics, 36, 193–202.

Layered Features
3-Layer Neural Network
(Diagram: Pattern → Feature Extraction → Similarity.)

Layered Features
4-Layer Neural Network
(Diagram: Pattern → Feature Extraction 1 → Feature Extraction 2 → Similarity.)

ASPET 70/71 (1970, 1971)
Analog Spatial Processor, developed by the Electro-Technical Laboratory and Toshiba
OCR prototype
http://museum.ipsj.or.jp/en/heritage/ASPET/71.html

Analog Spatial Processor
Composed of analog ICs and a resistor network
(Circuit diagram: an op amp with resistors R1–R4.)

Feature Extraction
Geometric Feature: whiteness/blackness, convolution with a Gaussian function (pooling):
$f(\mathbf{x}, \sigma) = \int G(\mathbf{x} - \mathbf{x}', \sigma)\, f(\mathbf{x}')\, d\mathbf{x}'$
$G(\mathbf{x}, \sigma) = \dfrac{1}{2\pi\sigma^2} \exp\left(-\dfrac{|\mathbf{x}|^2}{2\sigma^2}\right)$
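A short numerical sketch of this geometric feature, assuming a sampled 2-D pattern with unit pixel spacing; the 3σ truncation radius is a common convention, not from the slides.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(sigma, radius):
    # Sampled G(x, sigma) = exp(-|x|^2 / (2 sigma^2)) / (2 pi sigma^2).
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()  # renormalise the truncated kernel

def geometric_feature(image, sigma=2.0):
    # f(x, sigma): convolution of the pattern with G(., sigma).
    k = gaussian_kernel(sigma, radius=int(3 * sigma))
    return convolve2d(image, k, mode='same', boundary='symm')
```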

Feature Extraction
Statistical Feature: vector directions $\varphi_m$ calculated using PCA:
$K \varphi_m = \lambda_m \varphi_m, \qquad K = \sum_s f^{(s)} {f^{(s)}}^{\mathsf{T}}$
($K$: autocorrelation matrix of the training patterns $f^{(s)}$)
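A sketch of this computation; the row-stacking convention and the dimension M are illustrative choices.

```python
import numpy as np

def class_subspace(samples, M):
    # Rows of `samples` are the feature vectors f^(s) of one class.
    F = np.asarray(samples, dtype=float)
    K = F.T @ F / len(F)          # autocorrelation matrix K
    lam, phi = np.linalg.eigh(K)  # solves K phi_m = lambda_m phi_m
    order = np.argsort(lam)[::-1] # largest eigenvalues first
    return phi[:, order[:M]]      # phi_1 ... phi_M span the class subspace
```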

Similarity
Multiple Similarity Measure: angle between a vector and a subspace:
$S[f] = \cos^2\theta = \sum_{m=1}^{M} \dfrac{\langle f, \varphi_m \rangle^2}{\|f\|^2}$
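The measure is directly computable; a sketch, assuming `phi` holds an orthonormal basis of the class subspace as columns.

```python
import numpy as np

def multiple_similarity(f, phi):
    # S[f] = cos^2(theta) = sum_m <f, phi_m>^2 / ||f||^2.
    f = np.asarray(f, dtype=float)
    c = phi.T @ f                 # projection coefficients <f, phi_m>
    return float(c @ c) / float(f @ f)
```

S[f] lies in [0, 1], equals 1 exactly when f lies in the subspace, and, being quadratic in both numerator and denominator, is invariant to scaling of f.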

Similarity Visualisation
Angle between a vector and a subspace ($f^*$: nearest vector to $f$ in the subspace)
(Figure: the vector $f$, its projection $f^*$ onto the subspace spanned by $\varphi_1$ and $\varphi_2$, and the angle $\theta$ between them; $S[f] = \cos^2\theta$.)

Problems in Features
Is convolution with a Gaussian function effective enough to recognize Kanji (Chinese characters)?

Extended Features
We need more complex features, e.g., edges
› Gaussian-weighted Hermite Polynomials
Differentiating the Gaussian-smoothed pattern yields them:
$f_{mn}(\mathbf{x}, \sigma) = \dfrac{\partial^{m+n}}{\partial x^m \partial y^n} \int G(\mathbf{x} - \mathbf{x}', \sigma)\, f(\mathbf{x}')\, d\mathbf{x}' = \dfrac{\sqrt{m!\, n!}}{(-\sqrt{2}\,\sigma)^{m+n}}\, c_{mn}(\mathbf{x}, \sigma)$
$c_{mn}(\mathbf{x}, \sigma) = \int \dfrac{1}{\sqrt{m!\, n!}}\, H_m\!\left(\dfrac{x - x'}{\sqrt{2}\,\sigma}\right) H_n\!\left(\dfrac{y - y'}{\sqrt{2}\,\sigma}\right) G(\mathbf{x} - \mathbf{x}', \sigma)\, f(\mathbf{x}')\, d\mathbf{x}'$
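A sketch that samples these kernels, assuming physicists' Hermite polynomials and unit pixel spacing; the Gaussian normalisation constant is omitted, so the kernels are defined up to scale.

```python
import numpy as np
from math import factorial

def hermite(n, x):
    # Physicists' Hermite polynomial H_n(x) via the standard recursion.
    h0, h1 = np.ones_like(x), 2 * x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, 2 * x * h1 - 2 * k * h0
    return h1

def hermite_gaussian_kernel(m, n, sigma, radius):
    # Kernel of c_mn: Hermite polynomials in scaled coordinates times a
    # Gaussian window, i.e. (up to scale) the (m, n)-th Gaussian derivative.
    ax = np.arange(-radius, radius + 1) / (np.sqrt(2) * sigma)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2))  # exp(-|x|^2 / (2 sigma^2)) in pixel units
    return hermite(m, xx) * hermite(n, yy) * g / np.sqrt(factorial(m) * factorial(n))
```

Correlating a pattern with the (1, 0) kernel, for instance, gives a smoothed first derivative in x, i.e. an edge response.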

Basic Equation of Figure and Scale Space
Proposed by Iijima (1959), before scale space by Witkin (1983).
$f(\mathbf{x}, \sigma) = \int G(\mathbf{x} - \mathbf{x}', \sigma)\, f(\mathbf{x}')\, d\mathbf{x}'$ satisfies the diffusion equation
$\nabla^2 f - \dfrac{\partial f}{\partial \tau} = 0, \qquad \tau = \dfrac{\sigma^2}{2}$
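The relation can be checked numerically; a small sanity-check sketch, assuming unit pixel spacing and a finite difference in σ (the test pattern and step sizes are arbitrary choices).

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

rng = np.random.default_rng(0)
f0 = gaussian_filter(rng.random((64, 64)), 1.0)  # smooth test pattern

sigma, dsigma = 3.0, 0.01
f1 = gaussian_filter(f0, sigma)
f2 = gaussian_filter(f0, sigma + dsigma)

dtau = ((sigma + dsigma) ** 2 - sigma ** 2) / 2  # tau = sigma^2 / 2
lhs = (f2 - f1) / dtau                           # df / dtau
rhs = laplace(f1)                                # discrete Laplacian of f
print(np.abs(lhs - rhs).max())                   # small: the equation holds
```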

Gaussian-weighted Hermite Polynomials (Maeda 1982)
They look like Gabor functions.

Similar Feature (Gabor)
Equally spaced zero crossings

Similar Feature (Rubner 1990)
Made using Oja's equation.
They look like Gaussian-weighted Hermite polynomials.

Similar Feature (Linsker 1988)
Layered Linsker Network

Similar Feature (MacKay 1990)
Polar Representation

Deep Learning Features
Result of deep learning

Hebbian Learning
A basic concept of correlation learning presented by Hebb (1949)
› $x_i$ input, $y$ output, $w_i$ connection, $\alpha$ learning weight
$\Delta w_i = \alpha\, y\, x_i, \qquad y = \sum_i w_i x_i$
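In code, the rule is a one-liner; a sketch with an illustrative learning weight.

```python
import numpy as np

def hebbian_step(w, x, alpha=0.01):
    # Hebb's rule: Delta w_i = alpha * y * x_i, with y = sum_i w_i * x_i.
    y = w @ x
    return w + alpha * y * x
```

Iterated as-is, the weights grow without bound, which is what the modified (normalised) learning on the next slide addresses.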

Modified Learning
Oja (1982) showed that a neuron model could generate a subspace $\{\varphi_m\}$.
› $x_i$ input, $w_i$ connection, $y$ output; $\mathbf{w} = (w_1, w_2, \cdots, w_n)^{\mathsf{T}}$
› Modified learning equation, assuming $\alpha$ (positive learning parameter) is small:
$w_i(t+1) = w_i(t) + \alpha\, y(t) \left[ x_i(t) - y(t)\, w_i(t) \right] + O(\alpha^2)$
$w_i(t+1) = \dfrac{w_i(t) + \alpha\, y(t)\, x_i(t)}{\left[ \sum_{j=1}^{n} \left( w_j(t) + \alpha\, y(t)\, x_j(t) \right)^2 \right]^{1/2}}$ (original modification)
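A sketch of the small-α form, assuming a stream of sample vectors.

```python
import numpy as np

def oja_step(w, x, alpha=0.01):
    # w(t+1) = w(t) + alpha * y * (x - y * w): Hebbian growth plus the
    # decay term that keeps ||w|| near 1.
    y = w @ x
    return w + alpha * y * (x - y * w)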

Modified Learning
von der Malsburg (1985) showed that a set of learning neurons forms a columnar structure, also discarding higher-order terms.
› $w_{ij}$ connection
$\dot{w}_{ij} = w_{ij}\, y_{ij}^2 - \dfrac{w_{ij}}{2} \left[ \sum_{i'} w_{i'j}\, y_{i'j}^2 + \sum_{j'} w_{ij'}\, y_{ij'}^2 \right]$
$y^2 = \text{const}$: a static solution

What is Learning?
Learning is used to find geometric features.
Learning is the training phase for recognition.
› To find correlation features.

Back Propagation (Rumelhart 1986, Amari 1967)
http://sig.tsg.ne.jp/ml2015/ml/2015/06/08/stochastic-gradient-descent.html

Learning Subspace Method (1979)
Maeda, K. (1990). Dimension Selection by Learning for Class Discrimination and Information Representation. AIAI Technical Reports, AIAI-TR-75.
To learn an input $f$ ($A$: projection, $E$: unit matrix):
$A' = \left( E \pm \alpha\, \dfrac{f f^{\mathsf{T}}}{\|f\|^2} \right) A$
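A sketch of one rotation step, assuming A is the projection matrix onto the class subspace; re-orthonormalisation of the subspace after the update is left out.

```python
import numpy as np

def lsm_step(A, f, alpha=0.01, reinforce=True):
    # A' = (E +/- alpha * f f^T / ||f||^2) A: rotate the class projection A
    # towards f for the correct class (+), away for a rival class (-).
    s = alpha if reinforce else -alpha
    E = np.eye(len(f))
    return (E + s * np.outer(f, f) / (f @ f)) @ A
```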

Averaged Learning Subspace Method (Kuusela 1982, Maeda 1980)
Maeda, K. (1990). Dimension Selection by Learning for Class Discrimination and Information Representation. AIAI Technical Reports, AIAI-TR-75.
To learn an input $f$ ($K$: PCA correlation matrix):
$K' = (1 \mp \alpha)\, K \pm \alpha\, \dfrac{f f^{\mathsf{T}}}{\|f\|^2}$
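The averaged variant updates the correlation matrix instead, with the subspace then re-derived from its leading eigenvectors as in the PCA slide; a sketch.

```python
import numpy as np

def alsm_step(K, f, alpha=0.01, reinforce=True):
    # K' = (1 -/+ alpha) K +/- alpha * f f^T / ||f||^2.
    s = 1.0 if reinforce else -1.0
    return (1 - s * alpha) * K + s * alpha * np.outer(f, f) / (f @ f)
```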

Conclusion
Deep Learning is the state-of-the-art technique in pattern recognition and machine learning, but similar concepts and results existed before.
It is quite a powerful method, but it is not the only solution.
We should sometimes return to first principles, so that we can continue making progress.

Guidance on the future
Learn from the past, but never stick only to it.
Back to the principle, back to what you see.
Everything is useful if you can see it correctly.
The future is yours!

References of Historical Works
Amari, S. (1967). Theory of Adaptive Pattern Classifiers, IEEE Transactions on Electronic Computers, EC-16, 299–307.
Fukushima, K. (1980). Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position, Biological Cybernetics, 36, 193–202.
Hebb, D. O. (1949). The Organization of Behavior, Wiley.
Hubel, D. H. and Wiesel, T. N. (1962). Receptive Fields, Binocular Interaction, and Functional Architecture in the Cat's Visual Cortex, J. Physiol., 160, 106–154.
Iijima, T. (1959). Basic Theory of Pattern Observation, Technical Group on Automata and Automatic Control, IECE, 1–37. In Japanese.
Iijima, T. (1963). Basic Theory of Feature Extraction from Visual Patterns, J. IECE, 46 (11), 1714. In Japanese.
Iijima, T., et al. (1972). A Theoretical Study of Pattern Identification by Matching Method, Proc. of First USA-Japan Computer Conf., 42–48.
Iijima, T., et al. (1973). A Theory of Character Recognition by Pattern Matching Method, Proc. of First Int. Joint Conf. on Pattern Recognition, 50–56.
Irie, B. and Miyake, S. (1988). Capabilities of Three-Layered Perceptrons, Proc. of ICNN, Vol. 1, 641–648.
Kohonen, T., et al. (1979). Spectral Classification of Phoneme by Learning Subspace, Proc. of Int. Conf. on Acoustics, Speech, and Signal Processing, 807–809.
Kuusela, M. and Oja, E. (1982). Averaged Learning Subspace Method for Spectral Pattern Recognition, Proc. of the 6th Int. Conf. on Pattern Recognition (ICPR '82), 134–137.
Linsker, R. (1988). Self-Organization in a Perceptual Network, Computer, 21 (3), 105–117.
MacKay, D. J. C., et al. (1990). Analysis of Linsker's Simulations of Hebbian Rules, Neural Computation, 2 (2), 173–187.
Maeda, K. (1980). Pattern Recognition Apparatus, Japanese Patent Public Disclosure, 137483/81.
Maeda, K., et al. (1982). Hand-printed Kanji Recognition by Pattern Matching Method, Proc. of the 6th Int. Conf. on Pattern Recognition (ICPR '82), 789–792.
Maeda, K. (1990). Dimension Selection by Learning for Class Discrimination and Information Representation, AIAI Technical Reports, AIAI-TR-75.
von der Malsburg, C. (1985). Nervous Structures with Dynamical Links, Ber. Bunsenges. Phys. Chem., 89, 703–710.
Oja, E. (1982). A Simplified Neuron Model as a Principal Component Analyzer, J. Math. Biology, 15 (3), 267–273.
Rubner, J., et al. (1990). A Self-Organizing Network for Complete Feature Extraction, Proc. of Int. Conf. on Parallel Processing in Neural Systems and Computers, 365–368.
Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning Representations by Back-propagating Errors, Nature, 323 (6088), 533–536.
Widrow, B., et al. (1960). Adaptive Switching Circuits, IRE WESCON Convention Record, 4, 96–104.
Witkin, A. P. (1983). Scale-space Filtering, Proc. 8th Int. Joint Conf. on Artificial Intelligence, 1019–1022.