Emergence Berkeley presentation for devices

amirovski340 5 views 18 slides May 27, 2024
Emergent Functions of
Simple Systems
J. L. McClelland
Stanford University

Topics
Emergent probabilistic optimization in neural
networks
Relationship between competence/rational
approaches and mechanistic (including
connectionist) approaches
Some models that bring connectionist and
probabilistic approaches into proximal contact

Connectionist Units Calculate Posteriors
based on Priors and Evidence
Given
A unit representing hypothesis h_i, with binary inputs j representing
the state of various elements of evidence e, where for all j, p(e_j)
is assumed conditionally independent given h_i
A bias on the unit equal to log(prior_i/(1 − prior_i))
Weights to the unit from each input equal to log(p(e_j|h_i)/p(e_j|not h_i))
If
the output of the unit is computed from the logistic function
a_i = 1/[1 + exp(−(bias_i + Σ_j a_j w_ij))]
Then
a_i = p(h_i|e)
[Diagram: input from unit j projects to unit i via weight w_ij]
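The correspondence above can be checked numerically. The sketch below uses illustrative probabilities (not from the slides), builds the bias and weights as defined, applies the logistic function, and compares the result with a direct Bayes computation. Note the equality holds when every evidence input is present (a_j = 1), since absent-evidence likelihoods are not folded into the bias in this simple formulation.

```python
import math

# A single "connectionist unit" for hypothesis h, with binary evidence
# inputs whose likelihoods are conditionally independent given h.
# (Illustrative numbers, not taken from the slides.)
prior_h = 0.3
p_e_given_h = [0.8, 0.6]      # p(e_j = 1 | h)
p_e_given_not_h = [0.2, 0.5]  # p(e_j = 1 | not h)
evidence = [1, 1]             # both evidence elements present

# Bias and weights as defined on the slide
bias = math.log(prior_h / (1 - prior_h))
weights = [math.log(p / q) for p, q in zip(p_e_given_h, p_e_given_not_h)]

# Logistic activation: a = 1 / (1 + exp(-(bias + sum_j a_j w_j)))
net = bias + sum(a * w for a, w in zip(evidence, weights))
a = 1.0 / (1.0 + math.exp(-net))

# Direct Bayes computation for comparison
like_h = prior_h * math.prod(p for p in p_e_given_h)
like_not = (1 - prior_h) * math.prod(q for q in p_e_given_not_h)
posterior = like_h / (like_h + like_not)

print(abs(a - posterior) < 1e-12)  # the unit's activation equals p(h | e)
```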

Choosing one of N alternatives
A collection of connectionist units representing
mutually exclusive alternative hypotheses can assign the
posterior probability to each in a similar way, using the
softmax activation function:
net_i = bias_i + Σ_j a_j w_ij
a_i = exp(g·net_i)/Σ_i' exp(g·net_i')
If g = 1, this constitutes probability matching.
As g increases, more and more of the activation goes to
the most likely alternative(s).
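A minimal sketch of the gain-modulated softmax (the net-input values are illustrative): with g = 1 the activations are the posterior probabilities; with larger g, activation concentrates on the most likely alternative.

```python
import math

def softmax(nets, g=1.0):
    """a_i = exp(g*net_i) / sum_i' exp(g*net_i'), with gain parameter g."""
    exps = [math.exp(g * n) for n in nets]
    z = sum(exps)
    return [e / z for e in exps]

# Net inputs for three mutually exclusive hypotheses (made-up values).
nets = [1.0, 0.5, 0.0]

print(softmax(nets, g=1.0))   # g = 1: probability matching
print(softmax(nets, g=10.0))  # larger g: winner takes nearly all
```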

Emergent Outcomes from Local Computations
(Hopfield, ’82, Hinton & Sejnowski, ’83)
If w_ij = w_ji and units are updated asynchronously, setting
a_i = 1 if net_i > 0, a_i = 0 otherwise,
a network will settle to a state s which is a local
maximum in a measure Rumelhart et al. (1986) called G:
G(s) = Σ_{i<j} w_ij a_i a_j + Σ_i a_i (bias_i + ext_i)
If each unit instead sets its activation to 1 with probability
logistic(g·net_i), then
p(s) = exp(g·G(s))/Σ_s' exp(g·G(s'))
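The deterministic settling claim can be demonstrated directly: each asynchronous threshold update changes G by (a_i_new − a_i_old)·net_i, which is never negative. A sketch with a made-up three-unit network (weights, biases, and the helper names `goodness` and `net_input` are illustrative, not from the slides):

```python
import random

# Symmetric weights (w_ij = w_ji), zero self-connections; illustrative values.
W = [[0.0, 1.0, -1.0],
     [1.0, 0.0, 0.5],
     [-1.0, 0.5, 0.0]]
bias = [0.1, -0.2, 0.0]
ext = [0.0, 0.0, 0.0]

def goodness(a):
    """G(s) = sum_{i<j} w_ij a_i a_j + sum_i a_i (bias_i + ext_i)."""
    n = len(a)
    g = sum(W[i][j] * a[i] * a[j] for i in range(n) for j in range(i + 1, n))
    return g + sum(a[i] * (bias[i] + ext[i]) for i in range(n))

def net_input(a, i):
    return bias[i] + ext[i] + sum(W[i][j] * a[j] for j in range(len(a)) if j != i)

# Deterministic asynchronous updates: a_i = 1 if net_i > 0 else 0.
# Goodness never decreases, so the net settles to a local maximum of G.
random.seed(0)
a = [random.randint(0, 1) for _ in range(3)]
for _ in range(20):
    g_before = goodness(a)
    i = random.randrange(3)
    a[i] = 1 if net_input(a, i) > 0 else 0
    assert goodness(a) >= g_before  # each update is goodness-non-decreasing
print(a)
```

Replacing the threshold with a logistic coin flip turns the same dynamics into Gibbs sampling from p(s) ∝ exp(g·G(s)).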

A Tweaked Connectionist Model (McClelland &
Rumelhart, 1981) that is Also a Graphical Model
Each pool of units in the IA model is equivalent to
a Dirichlet variable (c.f. Dean, 2005).
This is enforced by using softmax to set one of the
a_i in each pool to 1 with probability:
p_j = exp(g·net_j)/Σ_j' exp(g·net_j')
Weight arrays linking the variables are equivalent
to the ‘edges’ encoding conditional relationships
between states of these different variables.
Biases at the word level encode the prior p(w).
Weights are bi-directional, but encode generative
constraints (p(l|w), p(f|l)).
At equilibrium with g = 1, the network’s probability of
being in state s equals p(s|I).
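The pool-wise sampling step can be sketched as follows. This is a minimal illustration, not the IA model itself: a single pool whose net inputs are just log-prior biases (illustrative values), sampled with the softmax rule; the helper name `sample_pool` is hypothetical.

```python
import math
import random

def sample_pool(nets, g=1.0, rng=random):
    """Set one unit in a pool to 1 with probability
    p_j = exp(g*net_j) / sum_j' exp(g*net_j')."""
    exps = [math.exp(g * n) for n in nets]
    z = sum(exps)
    r, acc = rng.random() * z, 0.0
    for j, e in enumerate(exps):
        acc += e
        if r <= acc:
            return [1 if k == j else 0 for k in range(len(nets))]
    return [1 if k == len(nets) - 1 else 0 for k in range(len(nets))]

# A 'word' pool with two alternatives; biases encode log priors p(w).
random.seed(1)
nets = [math.log(0.7), math.log(0.3)]
counts = [0, 0]
for _ in range(10000):
    state = sample_pool(nets, g=1.0)
    counts[state.index(1)] += 1
print(counts)  # with g = 1, sampling frequencies match the encoded priors
```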

But that’s not the true PDP approach
to Perception/Cognition/etc…
We want to learn how to represent the
world and constraints among its
constituents from experience, using (to
the fullest extent possible) a domain-general
approach.
In this context, the prototypical
connectionist learning rules correspond
to probability maximization or matching.
Back Propagation Algorithm:
Treats output units (or n-way pools) as
conditionally independent given the Input
Maximizes p(o_i|I) for each output unit.
[Diagram: input layer I projecting to output layer o]
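The conditional-independence point can be made concrete: with sigmoid output units and a cross-entropy loss, the loss is exactly −log of the product of independent Bernoulli probabilities p(o_i|I), so gradient descent maximizes each p(o_i|I) separately. A sketch with made-up weights and targets:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One input pattern driving two sigmoid output units (illustrative weights).
inputs = [1.0, 0.0]
W = [[0.5, -0.3],   # W[i][j]: weight from input j to output unit i
     [0.2, 0.8]]
targets = [1, 0]

acts = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in W]

# Product of independent Bernoulli probabilities p(o_i | I)
p_independent = 1.0
for a, t in zip(acts, targets):
    p_independent *= a if t == 1 else (1 - a)

# Cross-entropy loss over the output units
loss = -sum(t * math.log(a) + (1 - t) * math.log(1 - a)
            for a, t in zip(acts, targets))

print(abs(loss - (-math.log(p_independent))) < 1e-12)  # loss = -log prod_i p(o_i|I)
```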

Overcoming the Independence
Assumption
The Boltzmann Machine learning algorithm learns to match probabilities of
entire output states o given the current Input.
That is, it minimizes
∫ p(o|I) log(p(o|I)/q(o|I)) do
where: p(o|I) is sampled from the environment (plus phase)
q(o|I) is the net’s estimate of p(o|I) obtained by settling with the
input only (minus phase)
The algorithm is beautifully simple and local:
Δw_ij = ε(a_i⁺ a_j⁺ − a_i⁻ a_j⁻)
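The learning rule only needs pairwise coactivation statistics from the two phases. A minimal sketch, with made-up sampled states standing in for the clamped (plus) and free-running (minus) phases; `eps` and `coactivation` are hypothetical names:

```python
# Boltzmann machine learning rule sketch:
#   Dw_ij = eps * (<a_i a_j>+  -  <a_i a_j>-)
# using made-up binary states for the two phases.
plus_samples = [[1, 1], [1, 1], [1, 0], [1, 1]]   # sampled with outputs clamped
minus_samples = [[0, 1], [1, 0], [0, 0], [1, 1]]  # sampled free-running

def coactivation(samples, i, j):
    """Average product <a_i a_j> over sampled states."""
    return sum(s[i] * s[j] for s in samples) / len(samples)

eps = 0.1
dw = eps * (coactivation(plus_samples, 0, 1) - coactivation(minus_samples, 0, 1))
print(dw)  # positive here: units 0 and 1 co-occur more in the plus phase
```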

Recent Developments
Hinton’s deep belief
networks are fully distributed
learned connectionist models
that use a restricted form of
the Boltzmann machine (no
intra-layer connections) and
learn state-of-the-art models
very fast.
Generic constraints (sparsity,
locality) allow such networks
to learn efficiently and
generalize very well in
demanding task contexts.
Hinton, Osindero, and Teh (2006). A fast
learning algorithm for deep belief networks.
Neural Computation, 18, 1527-54.

Topics
Emergent probabilistic optimization in neural
networks
Relationship between competence/rational
approaches and mechanistic (including
connectionist) approaches
Some models that bring connectionist and
probabilistic approaches into proximal contact

One take on the relationship between rational
analysis and human behavior
Characterizing what’s optimal is always a great thing to do
Optimality is always relative to some framework; what that framework should
be isn’t always obvious.
It is possible to construct a way of seeing virtually anything as optimal post hoc
(c.f. Voltaire’s Candide).
Optimization is also relative to a set of constraints
Time
Memory
Processing speed
Available mechanisms
Simplifying assumptions …
The question of whether people do behave optimally (according to some
framework and constraints) in any particular situation is an empirical
question.
The question of why and how people can/do so behave in some situations
and not in others is worth understanding more thoroughly.

Two perspectives
People are rational, their behavior is optimal.
They seek explicit internal models of the structure
of the world, within which to reason.
Optimal structure type for each domain
Optimal structure instance within type
Resource limits and implementation constraints
are unknown, and should be ignored in
determining what is rational/optimal.
Inference is still hard, and prior domain-specific
constraints are therefore essential.
People evolved through an optimization process,
and are likely to approximate optimality/rationality
within limits.
Many aspects of natural/intuitive cognition may
depend largely on implicit knowledge.
Natural structure (e.g. language) does not exactly
correspond to any specific structure type.
Culture/School encourages us to think and reason
explicitly, and gives us tools for this; we do so
under some circumstances.

Same experienced structure leads to different
outcomes under different performance conditions
(Sternberg & McClelland, in prep)
Box appears…
Then one or two objects appear
Then a dot may or may not appear
RT condition: Respond as fast as
possible when dot appears
Prediction condition: Predict whether a
dot will appear, get feedback after
prediction.
Each event in the box occurs several times,
interleaved, with reversal of outcome
on 10% of trials.
Half of participants are instructed in the
Causal Powers model, half not.
All participants learn explicit relations.
Only Instructed Prediction subjects
show Blocking and Screening.
Design: AB+, A+; CD+, C−; EF+; GH−, G−; fillers

Two perspectives
People are rational, their behavior is optimal.
They seek explicit internal models of the structure
of the world, within which to reason.
Optimal structure type for each domain
Optimal structure instance within type
Resource limits and implementation constraints
are unknown, and should be ignored in
determining what is rational/optimal.
Inference is still hard, and prior domain-specific
constraints are therefore essential.
People evolved through an optimization process,
and are likely to approximate optimality/rationality
within limits.
Many aspects of natural/intuitive cognition may
depend largely on implicit knowledge.
Natural structure (e.g. language) does not exactly
correspond to any specific structure type.
Culture/School encourages us to think and reason
explicitly, and gives us tools for this; we do so
under some circumstances.
Many connectionist models do not directly
address this kind of thinking; eventually they
should be elaborated to do so.
Human behavior won’t be understood without
considering the constraints it operates under.
Determining what is optimal sans constraints is
always useful, even so.
Such an effort should not presuppose individual
humans intend to derive an explicit model.
Inference is hard, and domain specific priors can
help, but domain-general mechanisms subject to
generic constraints deserve full exploration.
In some cases such models may closely
approximate what might be the optimal explicit
model.
But that model might only be an approximation and
the domain-specific constraints might not be
necessary.

What is happening here?
Prediction participants have both a causal framework
and the time to reason explicitly about which objects
have the power to make the dot appear and which do
not.
Recall of (e.g.) C− during a CD prediction trial, in
conjunction with the causal powers story, licenses the
inference to D+.
This inference does not occur without both the time
to think and the appropriate cover story.

The Rumelhart Semantic Attribution Model is Approximated
by a Gradually Changing Mixture of Increasingly Specific
Naïve Bayes Classifiers (Roger Grosse, 2007)
[Figure: correlation of the network’s attributions with the
indicated classifier at three stages of training: very young,
still young, older]

Topics
Emergent probabilistic optimization in neural
networks
Relationship between competence/rational
approaches and mechanistic (including
connectionist) approaches
Some models that bring connectionist and
probabilistic approaches into proximal contact

Some models that bring connectionist and
probabilistic approaches into proximal contact
Graphical IA model of Context Effects in Perception
In progress; see Movellan & McClelland, 2001.
Leaky Competing Accumulator Model of Decision
Dynamics
Usher and McClelland, 2001, and the large family of related
decision making models
Models of Unsupervised Category Learning
Competitive Learning, OME, TOME (Lake et al., ICDL08).
Subjective Likelihood Model of Recognition Memory
McClelland and Chappell, 1998 (c.f. REM, Steyvers and
Shiffrin, 1997), and a forthcoming variant using distributed
item representations.