Alan Turing and the Turing Machine (1936)
https://www.felienne.com/archives/2974
Turing Test (1950), a.k.a. “the Imitation Game”
https://en.wikipedia.org/wiki/Turing_test
McCulloch-Pitts Model (1943)
The first formal model of the computational mechanisms of (artificial) neurons
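As a toy illustration (mine, not from the slides), a McCulloch-Pitts unit outputs 1 exactly when a weighted sum of binary inputs reaches a threshold; here it is wired as an AND gate:

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: fire (1) iff the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Wired as a two-input AND gate: fires only when both inputs are on.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', mp_neuron((x1, x2), weights=(1, 1), threshold=2))
```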
Basis of Modern Artificial Neural Networks
Multilayer perceptron (Rosenblatt 1958)
Backpropagation (Rumelhart, Hinton & Williams 1986)
Deep learning
https://commons.wikimedia.org/wiki/File:Example_of_a_deep_neural_network.png
Cybernetics (1940s-80s)
“Cybernetics” as a Precursor to “AI”
Norbert Wiener
(This is where the word “cyber-” came from!)
Good Old-Fashioned AI: Symbolic Computation and Reasoning
▪Herbert Simon et al.’s “Logic Theorist” (1956)
▪Functional programming, list processing (e.g., LISP (1958-))
▪Logic-based chatbots (e.g., ELIZA (1966))
▪Expert systems
▪Fuzzy logic (Zadeh, 1965)
Regression
▪Legendre, Gauss (early 1800s)
▪Representing the behavior of a dependent variable (DV) as a function of independent variable(s) (IV)
▪Linear regression, polynomial regression, logistic regression, etc.
▪Optimization (minimization) of errors between model and data (see the sketch below)
https://en.wikipedia.org/wiki/Regression_analysis
https://en.wikipedia.org/wiki/Polynomial_regression
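A minimal sketch of the idea (assuming NumPy; the synthetic data are mine): represent the DV as a linear function of the IV and minimize the squared errors.

```python
import numpy as np

# Synthetic data: a noisy linear relationship between IV x and DV y.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# np.polyfit minimizes the sum of squared errors between model and data.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"fitted slope={slope:.2f}, intercept={intercept:.2f}")  # close to 2 and 1
```

Swapping deg=1 for a higher degree gives polynomial regression over the same data.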
Hypothesis Testing
▪Original idea dates back to the 1700s
▪Pearson, Gosset, Fisher (early 1900s)
▪Set up one or more hypotheses and see how (un)likely the observed data would be under them
▪Type-I error (false positive), Type-II error (false negative) (see the sketch below)
https://en.wikibooks.org/wiki/Statistics/Testing_Statistical_Hypothesis
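A minimal sketch, assuming SciPy (the simulated samples are mine): a two-sample t-test, where the significance level alpha is the tolerated Type-I (false positive) error rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(loc=0.0, scale=1.0, size=100)   # group with mean 0
treated = rng.normal(loc=0.5, scale=1.0, size=100)   # group with mean 0.5

# Null hypothesis: both groups share the same mean.
t, p = stats.ttest_ind(control, treated)
alpha = 0.05  # tolerated Type-I error rate
print(f"t={t:.2f}, p={p:.4f}, reject H0: {p < alpha}")
```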
Bayesian Inference
▪Bayes & Price (1763), Laplace (1774)
▪Probability as a degree of belief that an event or a proposition is true
▪Estimated likelihoods updated as additional data are obtained
▪Empowered by Markov Chain Monte Carlo (MCMC) numerical integration methods (Metropolis 1953; Hastings 1970)
https://en.wikipedia.org/wiki/Bayes%27_theorem
https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo
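The engine behind all of this is Bayes' theorem, which converts a prior belief P(H) in hypothesis H into a posterior belief P(H|D) after observing data D:

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
```

Each new batch of data turns the current posterior into the prior for the next update; MCMC methods step in when the normalizing term P(D) cannot be computed analytically.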
Key Ingredient II: Optimization
Least Squares Method
▪Legendre, Gauss (early 1800s)
▪Find the formula that minimizes the sum of squared errors (residuals) analytically (see the normal equations below)
https://en.wikipedia.org/wiki/Least_squares
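In matrix form (X holding the IV values, y the DV vector), the least-squares estimate has the well-known closed-form (analytic) solution via the normal equations:

```latex
\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert^2 = (X^\top X)^{-1} X^\top y
```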
Gradient Methods
▪Find a local minimum of a function computationally
▪Gradient descent (Cauchy 1847) and its variants
▪More than 150 years later, this is still what modern AI/ML/DL systems are essentially doing!
▪Error minimization (see the sketch below)
https://commons.wikimedia.org/wiki/File:Gradient_descent.gif
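The whole idea fits in a few lines of plain Python (the toy function is mine, not from the slides): repeatedly step against the gradient until the error stops shrinking.

```python
# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
def grad(x):
    return 2.0 * (x - 3.0)

x = 0.0      # initial guess
eta = 0.1    # learning rate (step size)
for _ in range(100):
    x -= eta * grad(x)   # step downhill along the negative gradient
print(x)     # converges toward the minimum at x = 3
```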
Linear/Nonlinear/Integer/Dynamic Programming
▪Extensively studied and used in Operations Research
▪Practical optimization algorithms under various constraints (example below)
https://en.wikipedia.org/wiki/Linear_programming
https://en.wikipedia.org/wiki/Integer_programming
https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm
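A minimal linear-programming sketch, assuming SciPy's linprog (the toy objective and constraints are mine):

```python
from scipy.optimize import linprog

# Maximize 3x + 2y subject to x + y <= 4, x <= 3, and x, y >= 0.
# linprog minimizes, so we negate the objective coefficients.
res = linprog(c=[-3, -2],
              A_ub=[[1, 1], [1, 0]],
              b_ub=[4, 3],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # optimum at x=3, y=1 with objective value 11
```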
Other Population-Based Learning & Optimization
▪Ant colony optimization (Dorigo 1992)
▪Particle swarm optimization (Kennedy & Eberhart 1995) (see the sketch below)
▪And various other metaphor-based metaheuristic algorithms
https://en.wikipedia.org/wiki/List_of_metaphor-based_metaheuristics
https://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms
https://en.wikipedia.org/wiki/Particle_swarm_optimization
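A bare-bones particle swarm sketch in NumPy (the coefficients are conventional textbook choices, not from the slides): each particle mixes inertia with pulls toward its personal best and the swarm's global best.

```python
import numpy as np

def f(p):  # toy objective with its minimum at (1, -2)
    return (p[..., 0] - 1) ** 2 + (p[..., 1] + 2) ** 2

rng = np.random.default_rng(2)
n = 30
pos = rng.uniform(-5, 5, size=(n, 2))    # particle positions
vel = np.zeros((n, 2))
pbest = pos.copy()                       # each particle's best-so-far position
gbest = pos[np.argmin(f(pos))]           # the swarm's best-so-far position

for _ in range(200):
    r1, r2 = rng.random((2, n, 1))
    # Inertia + pull toward personal bests + pull toward the global best.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    better = f(pos) < f(pbest)
    pbest[better] = pos[better]
    gbest = pbest[np.argmin(f(pbest))]

print(gbest)   # close to (1, -2)
```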
Machine Learning
Pattern Discovery, the Modern Way
▪Unsupervised learning: find patterns in the data
▪Supervised learning: find patterns in the input-output mapping
▪Reinforcement learning: learn the world by taking actions and receiving rewards from the environment
Unsupervised Learning
▪Clustering: k-means, agglomerative clustering, DBSCAN, Gaussian mixture, community detection, Jarvis-Patrick, etc.
▪Anomaly detection
▪Feature extraction/selection
▪Dimension reduction: PCA, t-SNE, etc. (see the sketch below)
https://reference.wolfram.com/language/ref/FindClusters.html
https://commons.wikimedia.org/wiki/File:T-SNE_and_PCA.png
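A minimal sketch, assuming scikit-learn and NumPy (the toy blobs are mine): clustering with k-means and dimension reduction with PCA.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
# Two well-separated blobs in 5-D space.
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(5, 1, (50, 5))])

labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)   # clustering
X2 = PCA(n_components=2).fit_transform(X)                 # dimension reduction
print(labels[:5], X2.shape)
```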
Supervised Learning
▪Regression: linear regression, Lasso, polynomial regression, nearest neighbors, decision tree, random forest, Gaussian process, gradient boosted trees, neural networks, support vector machine, etc.
▪Classification: logistic regression, decision tree, gradient boosted trees, naive Bayes, nearest neighbors, support vector machine, neural networks, etc.
▪Risk of overfitting: addressed by model selection, cross-validation, etc. (see the sketch below)
https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html
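A minimal sketch, assuming scikit-learn: cross-validation scores show how tree depth trades off underfitting against overfitting.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A deeper tree fits the training data better but can overfit;
# 5-fold cross-validation estimates out-of-sample accuracy instead.
for depth in (1, 3, None):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    print(depth, cross_val_score(clf, X, y, cv=5).mean())
```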
Reinforcement Learning
▪Environment typically formulated as a Markov decision process (MDP)
▪State of the world + agent’s action → next state of the world + reward
▪Monte Carlo methods
▪TD learning, Q-learning (see the sketch below)
https://en.wikipedia.org/wiki/Markov_decision_process
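A minimal tabular Q-learning sketch (the 5-state chain MDP is my toy example, explored with a purely random behavior policy, which off-policy Q-learning tolerates):

```python
import numpy as np

# Tiny MDP: states 0..4 on a line; action 0 = step left, 1 = step right.
# Reaching state 4 yields reward 1 and ends the episode.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9                  # learning rate, discount factor
rng = np.random.default_rng(4)

for _ in range(200):                     # episodes under a random policy
    s = 0
    while s != 4:
        a = rng.integers(n_actions)
        s2 = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s2 == 4 else 0.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.round(2))   # "step right" (action 1) scores highest in every state
```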
Artificial Neural Networks
Hopfield Networks
▪Hopfield (1982)
▪A.k.a. “attractor networks”
▪Fully connected networks with symmetric weights can recover imprinted patterns from imperfect initial conditions (see the sketch below)
▪“Associative memory”
https://github.com/nosratullah/hopfieldNeuralNetwork
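A minimal sketch in NumPy (synchronous updates for brevity; the classical formulation updates units asynchronously): imprint patterns with the Hebbian rule, then recover one from a corrupted copy.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100
patterns = rng.choice([-1, 1], size=(3, N))      # +-1 patterns to imprint

# Hebbian imprinting: symmetric weights, no self-connections.
W = sum(np.outer(p, p) for p in patterns) / N
np.fill_diagonal(W, 0)

# Start from pattern 0 with 20% of its units flipped.
state = patterns[0].copy()
state[rng.choice(N, size=20, replace=False)] *= -1

for _ in range(10):                              # synchronous updates
    state = np.sign(W @ state)
    state[state == 0] = 1

print((state == patterns[0]).mean())             # 1.0 when fully recovered
```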
Boltzmann Machines
▪Hinton & Sejnowski (1983), Hinton & Salakhutdinov (2006)
▪Stochastic, learnable variants of Hopfield networks
▪The restricted (bipartite) Boltzmann machine was at the core of the HS 2006 Science paper that ignited the current boom of “Deep Learning” (example below)
https://en.wikipedia.org/wiki/Boltzmann_machine
https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine
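A minimal sketch, assuming scikit-learn's BernoulliRBM (the toy binary data are mine): train the bipartite model without labels and read out hidden-unit activations as learned features.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(6)
X = (rng.random((200, 16)) > 0.5).astype(float)   # toy binary data

rbm = BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=20)
rbm.fit(X)               # unsupervised training
H = rbm.transform(X)     # hidden-unit activation probabilities
print(H.shape)           # (200, 8)
```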
Feed-Forward NNs and Backpropagation
▪Multilayer perceptron (Rosenblatt 1958)
▪Backpropagation (Werbos 1974; Rumelhart, Hinton & Williams 1986)
▪Minimization of errors by the gradient descent method (see the sketch below)
▪Note that this is NOT how our brain learns
▪“Vanishing gradient” problem
[Figure: computation flows forward from input to output; error corrections flow backward.]
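A minimal backpropagation sketch in NumPy (the XOR task, architecture, seed, and learning rate are my choices, and convergence depends on them):

```python
import numpy as np

rng = np.random.default_rng(7)
# Learn XOR with one hidden layer, trained by gradient descent on squared error.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)                 # forward pass
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate output errors through the chain rule.
    d2 = (out - y) * out * (1 - out)
    d1 = (d2 @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d2;  b2 -= lr * d2.sum(axis=0)   # gradient descent steps
    W1 -= lr * X.T @ d1;  b1 -= lr * d1.sum(axis=0)

print(out.round(2).ravel())   # approaches [0, 1, 1, 0] for most seeds
```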
Autoencoders
▪Rumelhart, Hinton & Williams (1986) (again!)
▪Feed-forward ANNs that try to reproduce the input
▪Smaller intermediate layers → dimension reduction, feature learning (see the sketch below)
▪The HS 2006 Science paper also used restricted Boltzmann machines as stacked autoencoders
https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798
https://doi.org/10.1126/science.1127647
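A minimal sketch (a judgment call on my part: scikit-learn's MLPRegressor can stand in for an autoencoder when the inputs serve as their own targets; the toy data are built to lie on a 2-D subspace so the 2-unit bottleneck can succeed):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(8)
# 10-D data that secretly lives on a 2-D linear subspace.
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 10))

# An autoencoder reproduces its own input; the narrow 2-unit middle
# layer forces a compressed internal representation.
ae = MLPRegressor(hidden_layer_sizes=(8, 2, 8), max_iter=3000)
ae.fit(X, X)            # the targets are the inputs themselves
print(ae.score(X, X))   # near 1.0 when reconstruction succeeds
```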
Recurrent Neural Networks
▪Hopfield (1982); Rumelhart, Hinton & Williams (1986) (again!!)
▪ANNs that contain feedback loops
▪Have internal states and can, in principle, learn temporal behaviors with long-term dependencies
▪But with practical problems of vanishing or exploding gradients over long time spans
https://commons.wikimedia.org/wiki/File:Neuronal-Networks-Feedback.png
https://en.wikipedia.org/wiki/Recurrent_neural_network
[Figure: an RNN with a feedback loop, unfolded in time: hidden states h_{t-1}, h_t, h_{t+1} produce outputs o_{t-1}, o_t, o_{t+1} through shared weights V.]
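In the notation of the figure (V being the output weights; W, U, and b are additional parameter names I am assuming), one common form of the recurrence is:

```latex
h_t = \tanh\left( W h_{t-1} + U x_t + b \right), \qquad o_t = V h_t
```

The feedback through h_{t-1} is what gives the network its internal state, and repeated multiplication by W is what makes long-term gradients vanish or explode.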
Long Short-Term Memory (LSTM)
▪Hochreiter & Schmidhuber (1997)
▪An improved neural module for RNNs that can learn long-term dependencies effectively
▪Vanishing gradient problem resolved by hidden states and error flow control (see the equations below)
▪“The most cited NN paper of the 20th century”
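In the now-standard formulation (forget gates were a slightly later refinement), gates control what the cell state c_t forgets, admits, and exposes; σ is the logistic sigmoid and ⊙ the element-wise product:

```latex
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of c_t is what lets error signals flow across many time steps without vanishing.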
Reservoir Computing
▪Actively studied since the 2000s
▪Use inherent behaviors of complex dynamical systems (usually a random RNN) as a “reservoir” of various solutions
▪Learning takes place only at the readout layer (i.e., no backpropagation needed) (see the sketch below)
▪Discrete-time and continuous-time versions
https://doi.org/10.1515/nanoph-2016-0132
https://doi.org/10.1103/PhysRevLett.120.024102
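A minimal echo-state-network sketch in NumPy (reservoir size, spectral radius, and the sine-wave task are my choices): the recurrent weights stay fixed and random; only a linear readout is fit, with no backpropagation.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200                                           # reservoir size
W = rng.normal(size=(n, n))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius < 1
w_in = rng.normal(size=n)

# Drive the fixed random reservoir with a sine wave.
T = 1000
u = np.sin(np.arange(T) * 0.1)
states = np.zeros((T, n))
x = np.zeros(n)
for t in range(T - 1):
    x = np.tanh(W @ x + w_in * u[t])              # untrained dynamics
    states[t + 1] = x

# Train only the readout (ridge regression) to predict the next value.
S, y = states[200:-1], u[201:]                    # discard the transient
w_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(n), S.T @ y)
print(np.abs(S @ w_out - y).max())                # small prediction error
```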
Deep Neural Networks
▪Ideas originally around since the beginning of ANNs
▪Became feasible and popular in the 2010s because of:
▪Huge increase in available computational power thanks to GPUs
▪Wide availability of training data over the Internet
https://commons.wikimedia.org/wiki/File:Example_of_a_deep_neural_network.png
https://www.techradar.com/news/computing-components/graphics-cards/best-graphics-cards-1291458
Convolutional Neural Networks
▪Fukushima (1980), Homma et al. (1988), LeCun et al. (1989, 1998)
▪DNNs with convolution operations between layers (see the sketch below)
▪Layers represent spatial (and/or temporal) patterns
▪Many great applications to image/video/time series analyses
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
https://cs231n.github.io/convolutional-networks/
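A minimal sketch of the convolution operation itself, assuming SciPy (the edge-detection kernel is a hand-picked classic; a CNN learns its kernels from data):

```python
import numpy as np
from scipy.signal import convolve2d

image = np.random.default_rng(10).random((8, 8))   # toy grayscale "image"

# A 3x3 edge-detecting kernel.
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

feature_map = convolve2d(image, kernel, mode="same")
print(feature_map.shape)   # (8, 8): one feature map of a convolutional layer
```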
Adversarial Attacks and Generative Adversarial Networks (GAN)
https://arxiv.org/abs/1412.6572
https://en.wikipedia.org/wiki/Generative_adversarial_network
▪Goodfellow et al. (2014a,b)
▪DNNs are vulnerable to adversarial attacks
▪GANs exploit this vulnerability to create co-evolutionary systems of a generator and a discriminator (see the objective below)
https://commons.wikimedia.org/wiki/File:A-Standard-GAN-and-b-conditional-GAN-architecturpn.png
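The generator G and discriminator D co-evolve through the minimax objective of Goodfellow et al. (2014):

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\left[ \log D(x) \right]
+ \mathbb{E}_{z \sim p_z}\left[ \log\left( 1 - D(G(z)) \right) \right]
```

D is trained to tell real samples from generated ones, while G is trained to fool D.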
Graph Neural Networks
▪Scarselli et al. (2008), Kipf & Welling (2016)
▪Non-regular graph structure used as the network topology within each layer of a DNN (see the layer rule below)
▪Applications to graph-based data modeling, e.g., social networks, molecular biology, etc.
https://tkipf.github.io/graph-convolutional-networks/
https://towardsdatascience.com/how-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780
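Kipf & Welling's graph convolutional layer propagates node features H^{(l)} through the self-loop-augmented, degree-normalized adjacency matrix:

```latex
H^{(l+1)} = \sigma\left( \hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)} \right),
\qquad \hat{A} = A + I
```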
History of “Chatbots”
▪ELIZA (Weizenbaum 1966)
▪A.L.I.C.E. (Wallace 1995)
▪Jabberwacky (Carpenter 1997)
▪Cleverbot (Carpenter 2008)
(and many others)
https://en.wikipedia.org/wiki/ELIZA#/media/File:ELIZA_conversation.png
http://chatbots.org/
https://www.youtube.com/watch?v=WnzlbyTZsQY (by Cornell CCSL)
Language Models
“With great power comes great _____”
The probability of the next word depends on the context. The function P(next word | context) can be defined as an explicit dataset, a heuristic algorithm, a simple statistical distribution, a (deep) neural network, or anything else (a minimal example follows below).
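The simplest concrete instance (plain Python; the toy corpus is mine): estimate P(next word | previous word) from bigram counts.

```python
from collections import Counter, defaultdict

corpus = "with great power comes great responsibility".split()

# Count bigrams: how often each word follows each context word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def p_next(context):
    """P(next word | context) as relative bigram frequencies."""
    c = counts[context]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

print(p_next("great"))   # {'power': 0.5, 'responsibility': 0.5}
```

A modern LLM replaces the one-word context and count table with a deep network conditioned on thousands of preceding tokens.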
“Large” Language Models
▪Language models meet (1) a massive amount of data and (2) “transformers”!
▪Vaswani et al. (2017): DNNs with a self-attention mechanism for natural language processing (see the formula below)
▪Enhanced parallelizability, leading to shorter training times than LSTMs
▪BERT (2018) for Google search
▪OpenAI’s GPT (2020-) and many others
https://arxiv.org/abs/1706.03762
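The transformer's core operation is scaled dot-product self-attention over query, key, and value matrices (Vaswani et al. 2017), computable in parallel across all positions:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^\top}{\sqrt{d_k}} \right) V
```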
GPT/LLM Architecture Details
https://www.youtube.com/watch?v=wjZofJX0v4M
https://www.youtube.com/watch?v=eMlx5fFNoYc
3Blue1Brown offers some great video explanations!
Promising Applications
▪Coding aid
▪Personalized tutoring
▪Conversation partners
▪Modality conversion for people with disabilities
▪Analysis of qualitative scientific data
(… and many others)
“Foundation” Models
▪General-purpose AI models “that are trained on broad data at scale and are adaptable to a wide range of downstream tasks”
– Stanford Institute for Human-Centered Artificial Intelligence (2021); https://arxiv.org/abs/2108.07258
https://philosophyterms.com/the-library-of-babel/
Consciousness in LLMs?
Challenges (Especially from Systems Science Perspectives)
Various Societal Concerns About AI
▪“Artificial General Intelligence” (AGI) and the “existential crisis of humanity”
▪Significant job loss caused by AI
▪Fake information generated by AI
▪Biases and social (in)justice
▪Lack of transparency and over-concentration of AI power
▪Huge energy costs of deep learning and LLMs
▪Rights of AI and machines
AI as a Threat to Humanity?
But Some Simple Tasks Are Still Difficult for AI
▪Words, numbers, facts
▪Simple logic and reasoning
▪Maintaining stability and plasticity
▪Catastrophic forgetting
https://spectrum.ieee.org/openai-dall-e-2
https://www.invistaperforms.org/getting-ahead-forgetting-curve-training/
“Hallucination” (B.S.-ing)
Wrong Use Cases of AI
Contamination of AI-Generated Data
Another “AI Winter” Coming?
System-Level Challenge: Idea Homogenization and Social Fragmentation
▪Widespread use of common AI tools may homogenize human ideas
▪Over-consumption of catered, AI-generated information may accelerate echo chamber formation and social fragmentation
▪How can we prevent these negative outcomes?
(Centola et al. 2007)
System-Level Challenge: Critical Decision Making in the Absence of Data
Fall 2020: “How to safely reopen the campus”
How can we make informed decisions in a critical situation when no prior data are available?
System-Level Challenge: Open-Endedness
https://en.wikipedia.org/wiki/Tree_of_life_(biology)
How can we make AI able to keep producing new things?
Are We Getting Any Closer to the Understanding of True “Intelligence”?
Final Remarks
▪Don’t drown in the vast ocean of methods and tools
▪Hundreds of years of history
▪Buzzwords and fads keep changing
▪Keep the big picture in mind: focus on what your real problem is and how you will solve it
▪Being able to think and develop unique, original, creative solutions is key to differentiating your intelligence from AI/LLMs/machines