IEEE CIS Webinar Sustainable futures.pdf

Claudio Gallicchio · 54 slides · Jun 14, 2024

About This Presentation

Sustainable and efficient computational practices in artificial intelligence (AI) and deep learning have become increasingly critical. This webinar focuses on the intersection of sustainability and AI, highlighting the significance of energy-efficient deep learning, innovative rando...


Slide Content

1
Sustainable Futures in AI
Exploring Efficiency and Emerging Paradigms
Claudio Gallicchio
University of Pisa, Italy
This Webinar is provided to you by
IEEE Computational Intelligence Society
https://cis.ieee.org

2
The current path of deep learning research is largely unsustainable.

3
Doubling time
DL training compute: ≈ 3 months
Moore's law: ≈ 2 years
Sevilla, Jaime, et al. "Compute Trends Across Three Eras of Machine Learning." arXiv preprint arXiv:2202.05924 (2022).

4
Strubell, Emma, Ananya Ganesh, and Andrew McCallum. "Energy and policy considerations for deep learning in NLP." arXiv preprint arXiv:1906.02243 (2019).
https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/
Energy: ≈ 54 US households in 1 year
CO₂: ≈ 125 round-trip flights on a Boeing 737

5
GPT-4 has 1.7 trillion trainable parameters.
Training costs
▶ 25k Nvidia A100 GPUs for 90+ days
▪ $100+ million
▶ 50k-60k MWh
▪ more than 5 years of electricity for 1,000 average US households
Inference costs
▶ as much electricity per month as 26k US households
It continues to get worse.

6
AI risks significantly accelerating the climate crisis.

7
Paris Agreement
Hold "the increase in the global average temperature to well below 2°C above pre-industrial levels" and pursue efforts "to limit the temperature increase to 1.5°C above pre-industrial levels."
https://unfccc.int/process-and-meetings/the-paris-agreement
European Green Deal
Make Europe climate neutral by 2050. To make this objective legally binding, the Commission proposed the European Climate Law, which also sets a new, more ambitious net greenhouse gas emissions reduction target of at least -55% by 2030, compared to 1990 levels.
https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/european-green-deal/climate-action-and-green-deal_en

8
Large-scale AI computing
Compute: up to ≈10^18 FLOPS
Energy: in the MW to GW range
The human brain
Compute: ≈10^15 FLOPS
Energy: ≈20 W

9
Deep learning models have achieved tremendous success over the years.
This comes at a very high cost.
Do we really need all of this, all the time?

10
Example: embedded applications
Source: https://bitalino.com/en/freestyle-kit-bt
Source: https://www.eenewsembedded.com/news/raspberry-pi-3-now-compute-module-format

11
Deep neural networks: powerful representations built by applying multiple levels of non-linear transformation.
Deep Learning = Architectural biases + Learning algorithms

12
Desirable AI systems’ properties
Fast Learning: quickly adapts to new data and tasks with minimal retraining.

13
Desirable AI systems’ properties
Fast Learning: quickly adapts to new data and tasks with minimal retraining.
Low Training Cost: economical in terms of computational resources and energy consumption.

14
Desirable AI systems’ properties
Fast Learning: quickly adapts to new data and tasks with minimal retraining.
Low Training Cost: economical in terms of computational resources and energy consumption.
Efficient in Modeling Dynamic Information: handles time-series data and dynamic environments effectively.

15
Desirable AI systems’ properties
Fast Learning: quickly adapts to new data and tasks with minimal retraining.
Low Training Cost: economical in terms of computational resources and energy consumption.
Efficient in Modeling Dynamic Information: handles time-series data and dynamic environments effectively.
Ease of Hardware Implementation: suitable for deployment on a wide range of hardware platforms, including low-power devices.

16
Desirable AI systems’ properties
Fast Learning: quickly adapts to new data and tasks with minimal retraining.
Low Training Cost: economical in terms of computational resources and energy consumption.
Efficient in Modeling Dynamic Information: handles time-series data and dynamic environments effectively.
Ease of Hardware Implementation: suitable for deployment on a wide range of hardware platforms, including low-power devices.

17
"Reservoir Computing seems simple but is difficult, feels new but is old, opens horizons, and is brutally limiting"
- H. Jaeger
Nakajima, Kohei, and Ingo Fischer. Reservoir Computing. Springer Singapore, 2021.

18
RNN architecture
A recurrent layer maps the input u(t) into the state x(t); an output layer (readout) produces y(t).
state transition function: x(t) = tanh(W_in u(t) + W_h x(t-1) + b_h)
readout: y(t) = W_out x(t) + b_out
All weight matrices and biases are trainable parameters.
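As an illustration (not part of the original slides), here is a minimal NumPy sketch of this state transition; the dimensions and random initialization are arbitrary assumptions:

import numpy as np

n_in, n_h, n_out = 3, 100, 2                  # assumed sizes
rng = np.random.default_rng(0)
W_in = rng.uniform(-1, 1, (n_h, n_in))        # input weights
W_h = rng.uniform(-1, 1, (n_h, n_h))          # recurrent weights
W_out = rng.uniform(-1, 1, (n_out, n_h))      # readout weights
b_h, b_out = np.zeros(n_h), np.zeros(n_out)

def rnn_step(u_t, x_prev):
    # x(t) = tanh(W_in u(t) + W_h x(t-1) + b_h)
    x_t = np.tanh(W_in @ u_t + W_h @ x_prev + b_h)
    # y(t) = W_out x(t) + b_out
    return x_t, W_out @ x_t + b_out

In a standard RNN, all of these matrices are adjusted by gradient-based training.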

19
Propagation Issues: fading / exploding memory
[Figure: RNN unrolled over time steps u(1)...u(5), x(1)...x(5)]
Instabilities in the forward propagation of the input.

20
Propagation Issues: fading / exploding gradients
[Figure: RNN unrolled over time steps u(1)...u(5), x(1)...x(5)]
Instabilities in the backward propagation of the gradients.
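A small, hypothetical experiment (all values assumed) that makes these instabilities visible: rescale the recurrent matrix to different spectral radii and iterate the state update without input.

import numpy as np

rng = np.random.default_rng(0)

def final_state_norm(rho, T=200, n=100):
    W = rng.uniform(-1, 1, (n, n))
    W *= rho / max(abs(np.linalg.eigvals(W)))  # rescale to spectral radius rho
    x = rng.standard_normal(n)
    for _ in range(T):
        x = np.tanh(W @ x)                     # autonomous state update, no input
    return np.linalg.norm(x)

print(final_state_norm(0.5))  # contractive regime: the state (memory) fades toward 0
print(final_state_norm(2.0))  # unstable regime: the state stays large (tanh saturates)

The same spectral properties drive the backward pass: products of Jacobians along the unrolled network shrink or grow at a comparable rate, which is the fading/exploding gradient problem.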

21
The Philosophy
"Randomization is computationally cheaper than optimization"
Rahimi, A. and Recht, B., 2008. Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning. Advances in Neural Information Processing Systems, 21, pp. 1313-1320.
Rahimi, A. and Recht, B., 2007. Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 20, pp. 1177-1184.

22
Basic Idea
Exploit as much as possible the intrinsic computational capabilities of RNNs, even prior to (or in the absence of) any training of the recurrent connections.

23
Echo State Network
The recurrent layer (the "reservoir") is randomly initialized and left untrained: the representation layer is fixed, and only the output layer (readout) is trained.
state transition function: x(t) = tanh(W_in u(t) + W_h x(t-1) + b_h)
readout (the only trainable parameters): y(t) = W_out x(t) + b_out
Jaeger, Herbert, and Harald Haas. Science 304.5667 (2004): 78-80.

24
Echo State Network
Random, ok... but stable.
state transition function: x(t) = tanh(W_in u(t) + W_h x(t-1) + b_h)
Control the asymptotic stability by constraining the eigenvalues of the recurrent matrix W_h (echo state property).
The representation layer is fixed; only the output layer is trained.
Jaeger, Herbert, and Harald Haas. Science 304.5667 (2004): 78-80.
Yildiz, Izzet B., Herbert Jaeger, and Stefan J. Kiebel. "Re-visiting the echo state property." Neural Networks 35 (2012): 1-9.
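A minimal end-to-end ESN sketch (hyperparameters, the delay task, and all names are assumptions for illustration): the reservoir is random and fixed, rescaled to spectral radius < 1 (a common rule of thumb associated with the echo state property), and only the linear readout is fit by ridge regression.

import numpy as np

rng = np.random.default_rng(42)
n_in, n_res = 1, 300
rho, washout, reg = 0.9, 100, 1e-6             # assumed hyperparameters

W_in = rng.uniform(-1, 1, (n_res, n_in))
W = rng.uniform(-1, 1, (n_res, n_res))
W *= rho / max(abs(np.linalg.eigvals(W)))      # rescale spectral radius to rho < 1

def reservoir_states(U):
    x, X = np.zeros(n_res), []
    for u in U:                                # U: (T, n_in) input sequence
        x = np.tanh(W_in @ u + W @ x)          # untrained state transition
        X.append(x)
    return np.array(X)

# Toy task: reproduce the input with a 5-step delay (wrap-around hidden by the washout)
U = rng.uniform(-1, 1, (1000, n_in))
Y = np.roll(U, 5, axis=0)

X = reservoir_states(U)
# Ridge regression on the readout, the only trained parameters
W_out = np.linalg.solve(X[washout:].T @ X[washout:] + reg * np.eye(n_res),
                        X[washout:].T @ Y[washout:])
pred = X @ W_out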

25
Echo State Property (ESP)
Gallicchio, Claudio. "Euler state networks: Non-dissipative Reservoir Computing." Neurocomputing (2024).

26
Advantages
Clean mathematical analysis
Efficiency
Hardware implementation

27
Advantages
Clean mathematical analysis
Efficiency
Hardware implementation
MotionSenseHAR: accuracies 0.93 / 0.97; 2000x faster; 1700x more efficient.
Gallicchio, Claudio. "Euler state networks: Non-dissipative Reservoir Computing." Neurocomputing (2024).

28
Advantages
Clean mathematical analysis
Efficiency
Hardware implementation
Neural learning in 8 Kb of memory + deployment over-the-air.
Dragone, Mauro, et al. "A cognitive robotic ecology approach to self-configuring and evolving AAL systems." Engineering Applications of Artificial Intelligence 45 (2015): 269-280.

29
Advantages
Clean mathematical analysis
Efficiency
Hardware implementation
The reservoir can be implemented by any (controllable) physical substrate providing:
● dynamics
● non-linearity
Yan, M., et al. "Emerging opportunities and challenges for the future of reservoir computing." Nat Commun 15, 2056 (2024).
Nakajima, Kohei, et al. "Information processing via physical soft body." Scientific Reports 5.1 (2015): 10487.


31
Advantages
Clean mathematical analysis
Efficiency
Hardware implementation
Hardware-software co-design of graph echo state nets.
Random resistive memory arrays: low-cost, nanoscale, and stackable resistors for efficient in-memory computing.
State-of-the-art performance + up to 40x improvements in energy efficiency.
Wang, Shaocong, et al. "Echo state graph neural networks with analogue random resistive memory arrays." Nature Machine Intelligence 5.2 (2023): 104-113.

32
Deep reservoirs
Reservoir = set of nested non-linear dynamical systems.
The first layer is driven by the external input; each deeper layer is driven by the state of the layer below:
x^(1)(t) = tanh(W_in u(t) + W^(1) x^(1)(t-1) + b^(1))
x^(l)(t) = tanh(W_il^(l) x^(l-1)(t) + W^(l) x^(l)(t-1) + b^(l)),  l > 1
Gallicchio, Claudio, Alessio Micheli, and Luca Pedrelli. "Deep reservoir computing: A critical experimental analysis." Neurocomputing 268 (2017): 87-99.
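A hypothetical sketch of this layered update (sizes assumed): layer 1 is driven by the external input, and each deeper layer is driven by the state of the layer below.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_layers, rho = 1, 100, 3, 0.9    # assumed sizes

def random_matrix(shape, spectral_radius=None):
    M = rng.uniform(-1, 1, shape)
    if spectral_radius is not None:            # rescale square recurrent matrices
        M *= spectral_radius / max(abs(np.linalg.eigvals(M)))
    return M

W_in = [random_matrix((n_res, n_in))] + \
       [random_matrix((n_res, n_res)) for _ in range(n_layers - 1)]
W = [random_matrix((n_res, n_res), rho) for _ in range(n_layers)]

def deep_step(u, xs):
    new_states, drive = [], u
    for l in range(n_layers):
        x = np.tanh(W_in[l] @ drive + W[l] @ xs[l])  # layer l state update
        new_states.append(x)
        drive = x                              # this state drives layer l+1
    return new_states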

33
Deep reservoirs
• Multiple time-scales
• Multiple frequencies
• Develop richer dynamics even without training of the recurrent connections
Gallicchio, Claudio, Alessio Micheli, and Luca Pedrelli. "Deep reservoir computing: A critical experimental analysis." Neurocomputing 268 (2017): 87-99.
Gallicchio, C., Micheli, A. and Pedrelli, L., 2018. Design of deep echo state networks. Neural Networks, 108, pp. 33-47.

34
Euler State Networks (EuSN)
Non-dissipative reservoir computing
1. Impose an antisymmetric recurrent weight matrix to enforce critical dynamics of the underlying ODE:
   ẋ(t) = tanh(W_in u(t) + (W - W^T) x(t) + b)
2. Discretize the ODE with the forward Euler method:
   x(t) = x(t-1) + ε tanh(W_in u(t) + (W - W^T - γI) x(t-1) + b)
Here ε is the step size, γ a small diffusion term, and all weights are untrained; the resulting dynamics are arbitrarily close to the edge-of-stability.
Gallicchio, Claudio. "Euler state networks: Non-dissipative Reservoir Computing." Neurocomputing (2024).
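A minimal sketch of the EuSN update above (ε, γ, and the sizes are arbitrary assumed values):

import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 100
eps, gamma = 0.1, 0.01                          # step size and diffusion (assumed)

W = rng.uniform(-1, 1, (n_res, n_res))
A = W - W.T - gamma * np.eye(n_res)             # antisymmetric matrix minus diffusion
W_in = rng.uniform(-1, 1, (n_res, n_in))
b = np.zeros(n_res)

def eusn_step(u, x):
    # x(t) = x(t-1) + eps * tanh(W_in u(t) + (W - W^T - gamma I) x(t-1) + b)
    return x + eps * np.tanh(W_in @ u + A @ x + b)

An antisymmetric matrix has purely imaginary eigenvalues, so the underlying ODE neither dissipates nor amplifies state information; the small diffusion term γ stabilizes the forward Euler discretization.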

35
Euler State Networks (EuSN)
Non-dissipative reservoir computing
High accuracy vs. SOTA fully trainable models & ESNs.
Gallicchio, Claudio. "Euler state networks: Non-dissipative Reservoir Computing." Neurocomputing (2024).

36
Euler State Networks (EuSN)
Non-dissipative reservoir computing
Dramatically more efficient (up to 1750x faster) than fully trainable models.
Gallicchio, Claudio. "Euler state networks: Non-dissipative Reservoir Computing." Neurocomputing (2024).

37
Is it possible to create a reservoir that is better than just a random one?

38
Intrinsic Plasticity
Apply a gain and a bias to each reservoir neuron's activation: y = f(a x + b).
• Adapt the gain a and bias b of the activation function
• Tune the probability density of reservoir neurons toward maximum entropy
The parameters of the target distribution are hyperparameters; learning proceeds by Kullback–Leibler divergence minimization.
Schrauwen, B., Wardermann, M., Verstraeten, D., Steil, J.J. and Stroobandt, D., 2008. Improving reservoirs using intrinsic plasticity. Neurocomputing, 71(7-9), pp. 1159-1171.
Morales, G.B., Mirasso, C., Soriano, M.C., 2021. Unveiling the role of plasticity rules in reservoir computing. Neurocomputing.
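A sketch of one IP step for tanh neurons with a Gaussian target distribution, following the update rule reported in Schrauwen et al. (2008); the learning rate and target moments below are assumed values, so verify the formula against the paper before reuse.

import numpy as np

eta, mu, sigma = 1e-4, 0.0, 0.2                 # learning rate, target mean/std (assumed)

def ip_step(a, b, net):
    # a, b: per-neuron gain and bias; net: pre-activation (reservoir net input)
    y = np.tanh(a * net + b)
    db = -eta * (-(mu / sigma**2)
                 + (y / sigma**2) * (2 * sigma**2 + 1 - y**2 + mu * y))
    da = eta / a + db * net
    return a + da, b + db                       # online KL-divergence minimization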

39
Federated and Continual scenarios in Pervasive AI
Centralized setting: all data is available in advance on a single machine.
Federated stationary scenario: each client has its own private dataset in advance.
Continual scenario: a single machine gets the data from clients in a streaming fashion.
Federated Continual scenario: each client has its own private data stream.
De Caro, Valerio, Claudio Gallicchio, and Davide Bacciu. "Continual adaptation of federated reservoirs in pervasive environments." Neurocomputing (2023).

40
(Exact) Federated Learning at the readout level
Each client k computes sufficient statistics of its private data from its reservoir states X_k and targets Y_k: A_k = X_k^T X_k and B_k = X_k^T Y_k. The server aggregates them and solves the ridge-regression readout:
W_out = (Σ_k A_k + λI)^(-1) Σ_k B_k
New local data simply updates the client statistics incrementally: A_k ← A_k + ΔA_k, B_k ← B_k + ΔB_k.
De Caro, Valerio, Claudio Gallicchio, and Davide Bacciu. "Continual adaptation of federated reservoirs in pervasive environments." Neurocomputing (2023).
Bacciu, Davide, et al. "Federated reservoir computing neural networks." 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021.
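A minimal sketch of this scheme (function and variable names are illustrative). Because the ridge-regression readout is linear in the statistics A_k and B_k, the aggregated solution is identical to training on the pooled data, which is what makes the federation exact:

import numpy as np

def client_statistics(X, Y):
    # Each client shares only sufficient statistics, never its raw private data
    return X.T @ X, X.T @ Y                     # A_k, B_k

def server_readout(stats, reg=1e-6):
    A = sum(a for a, _ in stats)
    B = sum(b for _, b in stats)
    # W_out = (sum_k A_k + reg*I)^(-1) sum_k B_k
    return np.linalg.solve(A + reg * np.eye(A.shape[0]), B)

New local data only updates A_k and B_k incrementally, which also covers the continual scenario.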

41
Continual & Federated Intrinsic Plasticity Learning at the reservoir level
Per-client IP parameters (a_k, b_k) are aggregated by a sample-weighted average across clients: a = Σ_k (n_k/N) a_k, b = Σ_k (n_k/N) b_k.
CPSoS applications with a human in the loop (teaching-h2020.eu):
• Stress Recognition (LM): EDA sensing → stress prediction
• Driving-style Personalization (LM): → driving profile
De Caro, Valerio, Claudio Gallicchio, and Davide Bacciu. "Continual adaptation of federated reservoirs in pervasive environments." Neurocomputing (2023).
De Caro, Valerio, et al. "AI-as-a-Service Toolkit for Human-Centered Intelligence in Autonomous Driving." arXiv preprint arXiv:2202.01645, PERCOM 2022.
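A possible aggregation sketch, assuming a FedAvg-style average of the per-client IP parameters weighted by local sample counts (the exact rule used in the paper may differ):

import numpy as np

def federated_average(params, sizes):
    # params: list of per-client gain (or bias) vectors; sizes: local sample counts
    weights = np.asarray(sizes) / np.sum(sizes)
    return sum(w * p for w, p in zip(weights, params))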

42
(Neural & Physical) Computing with Reservoirs
Archetype Computing System: an engine to run dynamical systems enriched with lifelong and evolutionary learning.
https://eic-emerge.eu

43
NEURONE: extremely efficient NEUromorphic Reservoir cOmputing in Nanowire network hardwarE
Advanced Reservoir Computing nets + nanowire networks (memristive HW) → ultra energy-efficient learning.
Milano, Gianluca, et al. "In materia reservoir computing with a fully memristive architecture based on self-organizing nanowire networks." Nature Materials 21.2 (2022): 195-202.

44
Reservoir Computing: principled sustainable AI, designing training-efficient sequence models.

45
Reservoir Computing: principled sustainable AI, designing training-efficient sequence models.
Dynamics constrained in a "smart" way to offer useful architectural biases.

46
Reservoir Computing: principled sustainable AI, designing training-efficient sequence models.
Dynamics constrained in a "smart" way to offer useful architectural biases.
Great potential in pervasive AI scenarios where reduced computational resources are available, and in neuromorphic HW.

47
Reservoir Computing: principled sustainable AI, designing training-efficient sequence models.
Dynamics constrained in a "smart" way to offer useful architectural biases.
Great potential in pervasive AI scenarios where reduced computational resources are available, and in neuromorphic HW.
Enable the development of smarter neural architectures that can be trained end-to-end.

48
Reservoir Computing: principled sustainable AI, designing training-efficient sequence models.
Dynamics constrained in a "smart" way to offer useful architectural biases.
Great potential in pervasive AI scenarios where reduced computational resources are available, and in neuromorphic HW.
Enable the development of smarter neural architectures that can be trained end-to-end.
A greener path for AI development without the need for extensive parameter training.

49
Resources

50
https://github.com/reservoirpy/reservoirpy
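A minimal usage sketch based on ReservoirPy's documented API (v0.3-style; node names and arguments may change between versions, so check the repository's README):

import numpy as np
from reservoirpy.nodes import Reservoir, Ridge

# Toy task: one-step-ahead prediction of a sine wave
X = np.sin(np.linspace(0, 20 * np.pi, 2000)).reshape(-1, 1)

reservoir = Reservoir(units=300, sr=0.9, lr=0.3)   # spectral radius, leaking rate
readout = Ridge(ridge=1e-6)                        # ridge-regression readout
esn = reservoir >> readout                         # compose reservoir and readout

esn.fit(X[:-1], X[1:], warmup=100)                 # train only the readout
y_pred = esn.run(X[:-1])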

51
AI-Toolkit
https://github.com/EU-TEACHING/teaching-ai-toolkit
Lomonaco, Vincenzo, et al. "AI-Toolkit: A Microservices Architecture for Low-Code Decentralized Machine Intelligence." ICASSPW 2023.
Deep Echo State Networks
https://github.com/gallicch/DeepRC-TF
Euler State Networks
https://github.com/gallicch/EulerStateNetworks

52
Reservoir Computing NNs
IJCNN 2021 Tutorial: Reservoir Computing: Randomized Recurrent Neural Networks
https://www.youtube.com/watch?v=XJg7VdN7g-0
WCCI IJCNN 2024 Special Session
https://dyn.web.nitech.ac.jp/en/wcci2024_ss_rc

53
IEEE Task Force on Reservoir Computing
Promote and stimulate the development of Reservoir Computing research under both theoretical and application perspectives.
https://sites.google.com/view/reservoir-computing-tf/
IEEE Task Force on Randomization-based Neural Networks and Learning Systems
Promote the research and applications of deep randomized neural networks and learning systems.
https://sites.google.com/view/randnn-tf/

54
Sustainable Futures in AI
Exploring Efficiency and Emerging Paradigms
Claudio Gallicchio
University of Pisa, Italy
This Webinar is provided to you by
IEEE Computational Intelligence Society
https://cis.ieee.org
[email protected]
https://www.linkedin.com/in/claudio-gallicchio-05a47038/
https://twitter.com/claudiogallicc1
collaborations welcome!