The importance of sustainable and efficient computational practices in artificial intelligence (AI) and deep learning has become increasingly critical. This webinar focuses on the intersection of sustainability and AI, highlighting the significance of energy-efficient deep learning, innovative randomization techniques in neural networks, the potential of reservoir computing, and the cutting-edge realm of neuromorphic computing. This webinar aims to connect theoretical knowledge with practical applications and provide insights into how these innovative approaches can lead to more robust, efficient, and environmentally conscious AI systems.
Webinar Speaker: Prof. Claudio Gallicchio, Assistant Professor, University of Pisa
Claudio Gallicchio is an Assistant Professor at the Department of Computer Science of the University of Pisa, Italy. His research involves merging concepts from Deep Learning, Dynamical Systems, and Randomized Neural Systems, and he has co-authored over 100 scientific publications on the subject. He is the founder of the IEEE CIS Task Force on Reservoir Computing, and the co-founder and chair of the IEEE Task Force on Randomization-based Neural Networks and Learning Systems. He is an associate editor of IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
Size: 9.17 MB
Language: en
Added: Jun 14, 2024
Slides: 54 pages
Slide Content
1
Sustainable Futures in AI
Exploring Efficiency and Emerging Paradigms
Claudio Gallicchio
University of Pisa, Italy
This Webinar is provided to you by
IEEE Computational Intelligence Society
https://cis.ieee.org
2
The current path of deep
learning research
is largely unsustainable.
3
Doubling time
DL training
≈ 3 months
Moore’s law
≈ 2 years
Sevilla, Jaime, et al. "Compute Trends Across Three Eras of Machine Learning." arXiv e-prints (2022): arXiv-2202.
4
https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/
Strubell, Emma, Ananya Ganesh, and Andrew McCallum. "Energy and policy considerations for deep learning in NLP." arXiv preprint arXiv:1906.02243 (2019).
Energy: ≈ the yearly energy consumption of 54 US households
CO₂: ≈ 125 round-trip flights on a Boeing 737
5
GPT-4 has 1.7 trillion trainable parameters.
Training costs
▶ 25k Nvidia A100 GPUs for 90+ days
$100+ million.
▶ 50k-60k MWh
▪ more than 5 years of electricity use for 1,000 average US households.
Inference costs
▶ as much electricity per month as 26k US
households.
It continues to get worse
6
AI risks significantly accelerating the climate crisis.
7
Paris Agreement
Hold “the increase in the global average temperature to well below 2°C above pre-industrial levels” and pursue efforts “to limit the temperature increase to 1.5°C above pre-industrial levels.”
https://unfccc.int/process-and-meetings/the-paris-agreement
European Green Deal
Make Europe climate neutral by 2050. To make this objective legally binding, the Commission proposed the European Climate Law, which also sets a new, more ambitious net greenhouse gas emissions reduction target of at least -55% by 2030, compared to 1990 levels.
https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/european-green-deal/climate-action-and-green-deal_en
8
Compute: up to ≈10^… FLOPS; Energy: in the MW to GW range
Compute: ≈10^… FLOPS; Energy: ≈20 W
9
Deep Learning models have achieved tremendous success over the years.
This comes at a very high cost.
Do we really need this all the time?
11
Deep neural networks: powerful representations obtained by applying multiple levels of non-linear transformation.
Deep Learning =
Architectural biases + Learning algorithms
12
Desirable AI systems’ properties
Fast Learning: quickly adapts to new data
and tasks with minimal retraining.
13
Desirable AI systems’ properties
Fast Learning: quickly adapts to new data and tasks with minimal retraining.
Low Training Cost: economical in terms of
computational resources and energy
consumption.
14
Desirable AI systems’ properties
Fast Learning: quickly adapts to new data and tasks with minimal retraining.
Low Training Cost: economical in terms of computational resources and energy
consumption.
Efficient in Modeling Dynamic
Information: handle time-series data and
dynamic environments effectively.
15
Desirable AI systems’ properties
Fast Learning: quickly adapts to new data and tasks with minimal retraining.
Low Training Cost: economical in terms of computational resources and energy
consumption.
Efficient in Modeling Dynamic Information: handle time-series data and
dynamic environments effectively.
Ease of Hardware Implementation: suitable
for deployment on a wide range of hardware
platforms, including low-power devices.
16
Desirable AI systems’ properties
Fast Learning: quickly adapts to new data and tasks with minimal retraining.
Low Training Cost: economical in terms of computational resources and energy
consumption.
Efficient in Modeling Dynamic Information: handle time-series data and
dynamic environments effectively.
Ease of Hardware Implementation: suitable for deployment on a wide
range of hardware platforms, including low-power devices.
17
Nakajima, Kohei, and Ingo Fischer. Reservoir Computing. Springer Singapore, 2021.
“Reservoir Computing
seems simple but is difficult,
feels new but is old, opens
horizons, and is brutally
limiting”
- H. Jaeger
18
RNN architecture
recurrent layer: input u(t), state x(t); output layer (readout): y(t)
state transition function: x(t) = tanh(W_in u(t) + W_h x(t−1) + b_h)
readout: y(t) = W_out x(t) + b_out
trainable parameters: W_in, W_h, W_out (all weights are learned)
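For reference, here is a minimal NumPy sketch of the state transition and readout equations above; the matrix names (W_in, W_h, W_out) mirror the slide's notation and all sizes and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h, n_out = 3, 50, 2

# In a standard RNN, all of these matrices are trainable parameters.
W_in = rng.normal(size=(n_h, n_in))
W_h = rng.normal(size=(n_h, n_h))
b_h = np.zeros(n_h)
W_out = rng.normal(size=(n_out, n_h))
b_out = np.zeros(n_out)

def rnn_step(u_t, x_prev):
    """State transition: x(t) = tanh(W_in u(t) + W_h x(t-1) + b_h)."""
    return np.tanh(W_in @ u_t + W_h @ x_prev + b_h)

def readout(x_t):
    """Readout: y(t) = W_out x(t) + b_out."""
    return W_out @ x_t + b_out

x = np.zeros(n_h)
for u_t in rng.normal(size=(10, n_in)):  # a toy input sequence
    x = rnn_step(u_t, x)
y = readout(x)
```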
21
The Philosophy
“Randomization is computationally
cheaper than optimization”
Rahimi, A. and Recht, B., 2008. Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning.
Advances in neural information processing systems, 21, pp.1313-1320.
Rahimi, A. and Recht, B., 2007. Random features for large-scale kernel machines. Advances in neural information processing systems,
20, pp. 1177-1184.
22
Basic Idea
Exploit as much as possible the intrinsic computational capabilities of RNNs,
even prior to (or in the absence of) learning of the recurrent connections.
23
Echo State Network
“reservoir” layer (recurrent layer): randomly initialized and left untrained
state transition function: x(t) = tanh(W_in u(t) + W_h x(t−1) + b_h)
output layer (readout): y(t) = W_out x(t) + b_out
trainable parameters: only the readout weights W_out
the representation layer is fixed, only the output layer is trained
Jaeger, Herbert, and Harald Haas. Science 304.5667 (2004): 78-80.
24
Echo State Network
“reservoir” layer (recurrent layer): randomly initialized and left untrained
state transition function: x(t) = tanh(W_in u(t) + W_h x(t−1) + b_h)
Random, ok… but stable.
Control the asymptotic stability by constraining the eigenvalues of W_h (echo state property)
the representation layer is fixed, only the output layer is trained
Jaeger, Herbert, and Harald Haas. Science 304.5667 (2004): 78-80.
Yildiz, Izzet B., Herbert Jaeger, and Stefan J. Kiebel. "Re-visiting the echo state property." Neural Networks 35 (2012): 1-9.
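A minimal NumPy sketch of the Echo State Network recipe on these slides: draw the reservoir at random, rescale W_h so its spectral radius stays below 1 (a common practical proxy for the echo state property), and train only the linear readout, here in closed form by ridge regression. The task, sizes, and hyperparameters are illustrative, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(42)
n_in, n_res = 1, 200

# Random, untrained reservoir; rescale to spectral radius < 1 (echo state property proxy).
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W_h = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
W_h *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_h)))

def run_reservoir(U):
    """Collect reservoir states for an input sequence U of shape (T, n_in)."""
    x = np.zeros(n_res)
    states = []
    for u_t in U:
        x = np.tanh(W_in @ u_t + W_h @ x)
        states.append(x)
    return np.array(states)

# Toy task: one-step-ahead prediction of a sine wave.
t = np.arange(0, 2000)
u = np.sin(0.05 * t)[:, None]
X = run_reservoir(u[:-1])
Y = u[1:]

# Train only the readout, in closed form (ridge regression).
lam = 1e-6
W_out = np.linalg.solve(X.T @ X + lam * np.eye(n_res), X.T @ Y).T

pred = X @ W_out.T
print("train MSE:", np.mean((pred - Y) ** 2))
```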
25
Echo State Property (ESP)
Gallicchio, Claudio. "Euler State Networks: Non-dissipative Reservoir Computing." Neurocomputing (2024).
27
Advantages: clean mathematical analysis, efficiency, hardware implementation
MotionSense HAR: accuracy 0.93 / 0.97, 2000x faster, 1700x more efficient
Gallicchio, Claudio. "Euler State Networks: Non-dissipative Reservoir Computing." Neurocomputing (2024).
28
Advantages: clean mathematical analysis, efficiency, hardware implementation
Dragone, Mauro, et al. "A cognitive robotic ecology approach to self-configuring and evolving AAL systems." Engineering Applications of Artificial Intelligence 45 (2015): 269-280.
Neural learning in 8Kb of
memory + deployment
over-the-air
29
Advantages: clean mathematical analysis, efficiency, hardware implementation
the reservoir can be implemented by any (controllable) physical substrate
● dynamics
● non-linearity
Yan, M., et al. "Emerging opportunities and challenges for the future of reservoir computing." Nature Communications 15, 2056 (2024).
Nakajima, Kohei, et al. "Information processing via physical soft body." Scientific Reports 5.1 (2015): 10487.
30
Advantages: clean mathematical analysis, efficiency, hardware implementation
the reservoir can be implemented by any (controllable) physical substrate
● dynamics
● non-linearity
Yan, M., et al. "Emerging opportunities and challenges for the future of reservoir computing." Nature Communications 15, 2056 (2024).
Nakajima, Kohei, et al. "Information processing via physical soft body." Scientific Reports 5.1 (2015): 10487.
31
Advantages: clean mathematical analysis, efficiency, hardware implementation
hardware–software co-design of graph echo state networks
random resistive memory arrays: low-cost, nanoscale and stackable resistors for efficient in-memory computing
state-of-the-art performance + up to 40x improvements in energy efficiency
Wang, Shaocong, et al. "Echo state graph neural networks with analogue random resistive memory arrays." Nature Machine Intelligence 5.2 (2023): 104-113.
32
Deep reservoirs
Reservoir = set of nested non-linear dynamical systems
x^(1)(t) = tanh(W^(1) x^(1)(t−1) + V^(1) u(t) + b^(1))
…
x^(l)(t) = tanh(W^(l) x^(l)(t−1) + V^(l) x^(l−1)(t) + b^(l))
driving input: the external input u(t) for the first layer, the previous layer's state for the layers above
Gallicchio, Claudio, Alessio Micheli, and Luca Pedrelli. "Deep reservoir computing: A critical experimental analysis." Neurocomputing 268 (2017): 87-99.
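A minimal sketch of the stacked (deep) reservoir above, assuming the layered formulation of Gallicchio, Micheli & Pedrelli (2017): layer 1 is driven by the external input, each higher layer by the states of the layer below, and all weights remain random and untrained; sizes and scalings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_layers = 1, 100, 3

def random_matrix(n_rows, n_cols, rho=0.9):
    """Random matrix rescaled to spectral radius rho (square case) or to a fixed input scale."""
    W = rng.uniform(-1, 1, size=(n_rows, n_cols))
    if n_rows == n_cols:
        W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
    else:
        W *= 0.5
    return W

# One recurrent and one input matrix per layer; layer l > 1 is fed by layer l-1.
W_rec = [random_matrix(n_res, n_res) for _ in range(n_layers)]
W_in = [random_matrix(n_res, n_in)] + [random_matrix(n_res, n_res) for _ in range(n_layers - 1)]

def deep_reservoir_step(u_t, states):
    """One time step: x^(1) is driven by u(t), x^(l) by x^(l-1)(t)."""
    new_states = []
    feed = u_t
    for l in range(n_layers):
        x_new = np.tanh(W_rec[l] @ states[l] + W_in[l] @ feed)
        new_states.append(x_new)
        feed = x_new
    return new_states

states = [np.zeros(n_res) for _ in range(n_layers)]
for u_t in np.sin(0.05 * np.arange(50))[:, None]:
    states = deep_reservoir_step(u_t, states)
```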
33
• Multiple time-scales
• Multiple frequencies
• Develop richer dynamics even without training of the recurrent connections
Gallicchio, Claudio, Alessio Micheli, and Luca Pedrelli. "Deep reservoir computing: A critical experimental analysis." Neurocomputing 268 (2017): 87-99.
Gallicchio, C., Micheli, A. and Pedrelli, L., 2018. "Design of deep echo state networks." Neural Networks, 108, pp. 33-47.
Deep reservoirs
34
Euler State Networks (EuSN)
Non-dissipative reservoir computing
Underlying ODE: dh(t)/dt = tanh(W_in u(t) + W_h h(t) + b)
1. impose an antisymmetric recurrent weight matrix to enforce critical dynamics
2. discretize the ODE with a forward Euler step:
x(t) = x(t−1) + ε tanh(W_in u(t) + (W − Wᵀ − γ I) x(t−1) + b)
ε: step size; γ: diffusion coefficient; recurrent and input weights are untrained
dynamics are arbitrarily close to the edge-of-stability
Gallicchio, Claudio. "Euler State Networks: Non-dissipative Reservoir Computing." Neurocomputing (2024).
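A minimal NumPy sketch of the EuSN update reconstructed above: build an antisymmetric recurrent matrix (minus a small diffusion term) and apply a forward Euler step of size ε. The hyperparameter values are illustrative; the cited Neurocomputing 2024 paper and the EulerStateNetworks repository remain the reference:

```python
import numpy as np

rng = np.random.default_rng(7)
n_in, n_res = 1, 100
eps, gamma = 0.01, 0.001      # step size and diffusion coefficient (illustrative values)

W = rng.uniform(-1, 1, size=(n_res, n_res))
W_rec = (W - W.T) - gamma * np.eye(n_res)   # antisymmetric part minus diffusion
W_in = rng.uniform(-1, 1, size=(n_res, n_in))
b = rng.uniform(-0.1, 0.1, size=n_res)

def eusn_step(u_t, x_prev):
    """Forward-Euler step: x(t) = x(t-1) + eps * tanh(W_in u(t) + W_rec x(t-1) + b)."""
    return x_prev + eps * np.tanh(W_in @ u_t + W_rec @ x_prev + b)

x = np.zeros(n_res)
for u_t in np.sin(0.05 * np.arange(200))[:, None]:
    x = eusn_step(u_t, x)
```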
35
Euler State Networks (EuSN)
Non-dissipative reservoir computing
Gallicchio, Claudio. "Euler State Networks: Non-dissipative Reservoir Computing." Neurocomputing (2024).
High accuracy vs
SOTA fully trainable
models & ESNs
36
Euler State Networks (EuSN)
Non-dissipative reservoir computing
Gallicchio, Claudio. "Euler State Networks: Non-dissipative Reservoir Computing." Neurocomputing (2024).
Far more efficient (up to 1750x faster) than fully trainable models
37
Is it possible to create a
reservoir that is better than
just a random one?
38
Intrinsic Plasticity
y = f(a·x + b), with gain a and bias b of the activation function
• Adapt the gain and bias of the activation function
• Tune the probability density of the reservoir neurons toward maximum entropy
Kullback–Leibler divergence minimization
Schrauwen, B., Wardermann, M., Verstraeten, D., Steil, J.J. and Stroobandt, D., 2008. "Improving reservoirs using intrinsic plasticity." Neurocomputing, 71(7-9), pp. 1159-1171.
G.B. Morales, C. Mirasso, M.C. Soriano, 2021. "Unveiling the role of plasticity rules in reservoir computing." Neurocomputing.
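The exact intrinsic-plasticity update is the KL-divergence gradient rule of Schrauwen et al. (2008); the sketch below only illustrates where the per-neuron gain and bias enter the reservoir update, with a deliberately simplified moment-matching adaptation standing in for that rule. All names, constants, and target values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_res = 1, 100
eta = 1e-3                    # adaptation rate (illustrative)
mu_t, sigma_t = 0.0, 0.2      # target mean / spread of each neuron's output

W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W_h = rng.uniform(-1, 1, size=(n_res, n_res))
W_h *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_h)))

gain = np.ones(n_res)         # per-neuron gain a
bias = np.zeros(n_res)        # per-neuron bias b

def ip_step(u_t, x_prev):
    """Reservoir update with a gain/bias-modulated activation: x = tanh(a * net + b)."""
    global gain, bias
    net = W_in @ u_t + W_h @ x_prev
    x = np.tanh(gain * net + bias)
    # Crude moment-matching adaptation (a simplified stand-in for the KL-gradient IP rule):
    # push the bias toward the target mean and the gain toward the target spread.
    bias += eta * (mu_t - x)
    gain += eta * (sigma_t - np.abs(x - mu_t))
    return x

x = np.zeros(n_res)
for u_t in np.sin(0.05 * np.arange(500))[:, None]:
    x = ip_step(u_t, x)
```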
39
Federated and Continual scenarios in Pervasive AI
Centralized setting:
all data is available
in advance on a
single machine
Federated stationary
scenario: each client
has its own private
dataset in advance
Continual scenario: a
single machine gets
the data from clients in
a streaming fashion
Federated Continual
scenario: each client
has its own private
data stream
De Caro, Valerio, Claudio Gallicchio, and Davide Bacciu. "Continual adaptation of federated reservoirs in pervasive environments." Neurocomputing (2023).
40
Each client k keeps its data local and shares only the sufficient statistics of its readout problem, A_k and B_k (computed from the local reservoir states X_k and targets Y_k); the server aggregates them and solves the ridge-regression readout in closed form, W_out = (Σ_k A_k)(Σ_k B_k + λ I)⁻¹; newly arriving local data updates the statistics incrementally, A_k ← A_k + ΔA_k, B_k ← B_k + ΔB_k.
(exact) Federated Learning at the readout level
De Caro, Valerio, Claudio Gallicchio, and Davide Bacciu. "Continual adaptation of federated reservoirs in pervasive environments." Neurocomputing (2023).
Bacciu, Davide, et al. "Federated reservoir computing neural networks." 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021.
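A minimal sketch of the exact federated readout idea above, in the spirit of Bacciu et al. (2021): each client shares only the sufficient statistics of its local ridge-regression problem rather than raw data, and the server aggregates them and solves the readout in closed form. Names and shapes are illustrative:

```python
import numpy as np

def client_statistics(states, targets):
    """Per-client sufficient statistics for the ridge readout: A_k = Y_k^T X_k, B_k = X_k^T X_k."""
    return targets.T @ states, states.T @ states

def server_readout(stats, lam=1e-6):
    """Aggregate client statistics and solve W_out = (sum A_k)(sum B_k + lam I)^(-1)."""
    A = sum(a for a, _ in stats)
    B = sum(b for _, b in stats)
    n = B.shape[0]
    return A @ np.linalg.inv(B + lam * np.eye(n))

# Toy example: 3 clients, each with its own reservoir states X_k and targets Y_k.
rng = np.random.default_rng(0)
n_res, n_out = 50, 1
stats = []
for _ in range(3):
    X_k = rng.normal(size=(200, n_res))      # locally collected reservoir states
    Y_k = rng.normal(size=(200, n_out))      # local targets
    stats.append(client_statistics(X_k, Y_k))

W_out = server_readout(stats)                # same solution as training on the pooled data
print(W_out.shape)                           # (n_out, n_res)
```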
41
Continual & Federated
Intrinsic Plasticity Learning
at the reservoir level
De Caro, Valerio, Claudio Gallicchio, and Davide Bacciu. "Continual adaptation of federated reservoirs in pervasive environments." Neurocomputing (2023).
CPSoS applications with humans in the loop: teaching-h2020.eu
• Stress Recognition (LM): stress prediction from EDA sensing
• Driving-style Personalization (LM): driving profile prediction
De Caro, Valerio, et al. "AI-as-a-Service Toolkit for Human-Centered Intelligence in Autonomous Driving." arXiv preprint arXiv:2202.01645, PERCOM 2022.
42
(Neural & Physical) Computing with Reservoirs
Archetype Computing System: an engine to run dynamical systems enriched with lifelong and evolutionary learning
https://eic-emerge.eu
43
NEURONE: extremely efficient NEUromorphic
Reservoir cOmputing in Nanowire network hardwarE
Advanced
Reservoir
Computing nets
Nanowire
networks
memristive HW
Ultra energy
efficient
learning
Milano, Gianluca, et al. "In materia reservoir computing with a fully memristive architecture based on self-organizing nanowire networks." Nature Materials 21.2 (2022): 195-202.
45
Reservoir Computing: principled sustainable AI, designing training-
efficient sequence models.
Dynamics constrained in a “smart” way to
offer useful architectural biases.
46
Dynamics constrained in a “smart” way to offer useful architectural
biases.
Great potential in pervasive AI scenarios
where reduced computational resources
are available, and neuromorphic HW.
Reservoir Computing: principled sustainable AI, designing training-
efficient sequence models.
47
Dynamics constrained in a “smart” way to offer useful architectural
biases.
Great potential in pervasive AI scenarios where reduced
computational resources are available, and neuromorphic HW.
Reservoir Computing: principled sustainable AI, designing training-
efficient sequence models.
Enable the development of smarter neural
architectures that can be trained end-to-end.
48
Dynamics constrained in a “smart” way to offer useful architectural biases.
Great potential in pervasive AI scenarios where reduced computational
resources are available, and neuromorphic HW.
Reservoir Computing: principled sustainable AI, designing training-efficient
sequence models.
Enable the development of smarter neural architectures that can be trained end-to-end.
A greener path for AI development without the need for extensive parameter training.
49
Resources
50
https://github.com/reservoirpy/reservoirpy
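A short usage sketch of ReservoirPy, following the library's documented quickstart (parameter names and values here are illustrative; check the repository for the current API):

```python
# Echo State Network with ReservoirPy: random reservoir + trained ridge readout.
import numpy as np
from reservoirpy.nodes import Reservoir, Ridge
from reservoirpy.datasets import mackey_glass

X = mackey_glass(n_timesteps=2000)                 # a classic chaotic time-series benchmark

reservoir = Reservoir(units=100, lr=0.3, sr=0.9)   # leak rate and spectral radius
readout = Ridge(ridge=1e-6)                        # linear readout with L2 regularization
esn = reservoir >> readout

esn = esn.fit(X[:-1], X[1:], warmup=100)           # one-step-ahead prediction
pred = esn.run(X[:-1])
print("train MSE:", np.mean((pred - X[1:]) ** 2))
```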
51
AI-Toolkit
https://github.com/EU-TEACHING/teaching-ai-toolkit
Lomonaco, Vincenzo, et al. "AI-Toolkit: A Microservices Architecture for Low-Code Decentralized Machine Intelligence." ICASSPW 2023.
Deep Echo State Networks
Euler State Networks
https://github.com/gallicch/DeepRC-TF
https://github.com/gallicch/EulerStateNetworks
53
IEEE Task Force on Reservoir Computing
Promote and stimulate the development of Reservoir Computing
research under both theoretical and application perspectives.
https://sites.google.com/view/reservoir-computing-tf/
IEEE Task Force on Randomization-based Neural Networks and Learning
Systems
Promote the research and applications of deep randomization-based neural networks and learning systems.
https://sites.google.com/view/randnn-tf/
54
Sustainable Futures in AI
Exploring Efficiency and
Emerging Paradigms
Claudio Gallicchio
University of Pisa, Italy
This Webinar is provided to you by
IEEE Computational Intelligence Society
https://cis.ieee.org [email protected]
https://www.linkedin.com/in/claudio-gallicchio-05a47038/
https://twitter.com/claudiogallicc1
collaborations welcome!