[DSC DACH 24] Data Engineering for Sustainable AI: Optimizing Energy Usage and Real-World Impact - Jesse Anderson

DataScienceConferenc1 | 23 slides | Sep 16, 2024
About This Presentation

AI and data engineering are poised to revolutionize how we address global challenges. This talk explores the synergistic relationship between these disciplines in optimizing energy usage and driving sustainability. By leveraging advanced data engineering techniques, we can create AI models capable o...


Slide Content

Data Engineering for
Sustainable AI
Optimizing Energy Usage and Real-World Impact

How can we make
AI more efficient
and sustainable?

We’re going to focus on data engineering
There are different ways to improve efficiency:
●Optimizing training
●Optimizing scoring
●Improving architectures

I’ve dealt with or already have data engineers, and it didn’t…

Data Engineer: A software engineer who has specialized in creating data systems

How good is a data engineer at
a data science task?
How good is a data scientist at
a data engineering task?

A data engineer is not the data scientist with the best programming skills, or the data warehouse engineer who learned a little programming

Engineers and scientists think about problems differently

Optimizing Systems
Engineers excel at architecting and optimizing systems to be more efficient
Examples:
●Efficient data retrieval (databases)
●Efficient data movement (pub/sub systems)
●Right tool for the job
●Understanding nuances and tradeoffs

Optimizing Training
Is training using the right systems?
Examples:
●Are the right systems being used?
●Is the code efficient?
●Is more time being spent in an unexpected code path?
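The question about unexpected code paths is usually answered with a profiler rather than by reading code. The sketch below uses Python's standard `cProfile` and `pstats` modules on a hypothetical training step (`train_step`, `normalize`, and `normalize_fast` are illustrative names, not from the talk) whose preprocessing hides an O(n^2) cost.

```python
import cProfile
import io
import pstats


def normalize(rows):
    # Hypothetical preprocessing step with a hidden cost:
    # max(rows) is re-evaluated for every element, making this O(n^2).
    return [r / max(rows) for r in rows]


def normalize_fast(rows):
    # Same result, but the maximum is computed once: O(n).
    peak = max(rows)
    return [r / peak for r in rows]


def train_step(rows):
    # Hypothetical stand-in for a training step; the "model" is just a sum.
    return sum(normalize(rows))


def profile_call(func, *args, limit=5):
    """Run func under cProfile; return its result and a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


data = [float(i) for i in range(1, 1001)]
result, report = profile_call(train_step, data)
print(report)  # check which functions dominate cumulative time
```

In a report like this, time attributed to `normalize` and the built-in `max` rather than to the "model" is the kind of surprise the slide is pointing at; swapping in `normalize_fast` removes it without changing the result.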

Optimizing Scoring
Is the scoring system optimized?
Examples:
●Is the scoring system optimized for the use case?
●Is external data accessed or cached efficiently?
●Can the process be sped up with concurrency?
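Two of the scoring questions, caching external data and adding concurrency, can be sketched together with the standard library. This assumes a hypothetical external lookup (`lookup_traffic`, with `time.sleep` standing in for network latency) and a toy route score; `functools.lru_cache` provides an in-process cache and `ThreadPoolExecutor` overlaps the I/O-bound calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=1024)
def lookup_traffic(segment_id: str) -> float:
    # Hypothetical external call (e.g. a traffic service); the sleep
    # simulates network latency. Results are cached per segment_id.
    time.sleep(0.05)
    return float(len(segment_id))  # placeholder value


def score_route(segments: tuple[str, ...]) -> float:
    # Hypothetical scoring: sum a per-segment cost fetched from external data.
    return sum(lookup_traffic(s) for s in segments)


def score_many(routes: list[tuple[str, ...]], workers: int = 8) -> list[float]:
    # Threads let the waiting on I/O-bound lookups overlap; results keep input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_route, routes))
```

Threads fit here because the work is waiting on I/O; CPU-bound scoring would more likely need processes. `lru_cache` is safe to call from several threads, though two threads missing on the same key at once may both perform the lookup.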

A lot of a little or a little of a lot: a 0.1% improvement on a trillion, or 2 hours off an eight-hour job.
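The arithmetic behind the two examples is simple but worth making explicit. The talk does not name the unit of "a trillion", so the sketch leaves it generic.

```python
def savings(baseline: float, fraction: float) -> float:
    """Absolute amount saved when `fraction` of `baseline` is eliminated."""
    return baseline * fraction


# Scenario 1: a 0.1% improvement on a trillion (unit not specified in the talk).
trillion_case = savings(1_000_000_000_000, 0.001)

# Scenario 2: cutting 2 hours from an 8-hour job is a 25% reduction.
job_fraction = 2 / 8
job_hours_saved = savings(8, job_fraction)

print(f"{trillion_case:,.0f} units saved")                     # 1,000,000,000 units saved
print(f"{job_fraction:.0%} of the job, {job_hours_saved} h")   # 25% of the job, 2.0 h
```

A tiny fraction of a huge base still removes a billion units, while a large fraction of one job removes a quarter of its runtime; both are worth pursuing.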

Optimizing Car Directions Using AI
Let’s work through a concrete example. Energy usage will be reduced:
●By the car driving less
●By using less power at the data center for training and scoring
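The two levers in this example can be combined into one net-energy calculation. This is a back-of-the-envelope sketch; every field in `RoutingScenario` is a hypothetical input, and the talk gives no figures for any of them.

```python
import math
from dataclasses import dataclass


@dataclass
class RoutingScenario:
    # Every field is a hypothetical input, not a figure from the talk.
    trips: int                 # number of routed trips
    km_saved_per_trip: float   # distance the better route avoids
    kwh_per_km: float          # vehicle energy intensity
    kwh_per_inference: float   # data-center energy to score one route
    training_kwh: float        # one-off training energy for the model


def net_energy_saved_kwh(s: RoutingScenario) -> float:
    """Energy the car no longer uses minus energy spent in the data center."""
    car_savings = s.trips * s.km_saved_per_trip * s.kwh_per_km
    data_center_cost = s.trips * s.kwh_per_inference + s.training_kwh
    return car_savings - data_center_cost


def break_even_trips(s: RoutingScenario) -> float:
    """Trips needed before per-trip savings repay the training energy."""
    per_trip_net = s.km_saved_per_trip * s.kwh_per_km - s.kwh_per_inference
    if per_trip_net <= 0:
        return math.inf  # scoring costs more energy than the route saves
    return math.ceil(s.training_kwh / per_trip_net)
```

Driving less raises `car_savings`; cheaper training and scoring lower `data_center_cost`. Both terms move the net result, and a cheaper scoring path also lowers the break-even point.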

Three-Pronged Approach
Looking for efficiency through the eyes of a data engineer
1. Identify the inefficient parts of the system
2. Estimate the difficulty and efficiency gains of optimizing each part
3. Gradually improve the specific parts of the system
4. Repeat
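The loop above can be sketched as a ranking by estimated gain per unit of effort. This is one illustrative way to do step 2, not the author's method; `Candidate`, its fields, and the `apply_fix` callback are hypothetical, and step 1 is represented by the list of candidates passed in (for example, findings from profiling).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    est_gain: float        # estimated efficiency gain (e.g. kWh or hours saved)
    est_difficulty: float  # estimated effort, e.g. engineer-days (must be > 0)


def prioritize(candidates: list[Candidate]) -> list[Candidate]:
    """Step 2: rank parts of the system by estimated gain per unit of effort."""
    return sorted(candidates, key=lambda c: c.est_gain / c.est_difficulty, reverse=True)


def improvement_loop(candidates, apply_fix, rounds=3):
    """Steps 3 and 4: improve the best-ranked part, then re-rank and repeat.

    Re-ranking every round means any estimates that apply_fix updates
    on the remaining candidates are taken into account.
    """
    remaining = list(candidates)
    done = []
    for _ in range(rounds):
        if not remaining:
            break
        best = prioritize(remaining)[0]
        apply_fix(best)  # hypothetical callback that makes the change
        done.append(best.name)
        remaining.remove(best)
    return done
```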

Data Acquisition
Improving how we acquire data
●Sustainability systems often have real-time ingestion requirements
●Real-time systems have different levels of engineering rigor
●Disseminating data is difficult
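One common way to make real-time ingestion cheaper is micro-batching: buffering readings and writing them in groups, so per-write overhead is paid once per batch. This is a swapped-in illustration, not something the talk prescribes; `MicroBatcher` and its parameters are hypothetical, and the `sink` could be any bulk writer.

```python
import time
from typing import Callable, Optional


class MicroBatcher:
    """Buffer incoming readings and hand them to `sink` in batches.

    A batch is flushed when it reaches max_items, or when max_wait_s has
    passed since its first reading. The age check only runs inside add(),
    so a production version would also need a timer for idle periods.
    """

    def __init__(self, sink: Callable[[list], None], max_items: int = 500, max_wait_s: float = 1.0):
        self.sink = sink
        self.max_items = max_items
        self.max_wait_s = max_wait_s
        self._buffer: list = []
        self._first_at: Optional[float] = None

    def add(self, reading) -> None:
        if not self._buffer:
            self._first_at = time.monotonic()
        self._buffer.append(reading)
        too_full = len(self._buffer) >= self.max_items
        too_old = time.monotonic() - self._first_at >= self.max_wait_s
        if too_full or too_old:
            self.flush()

    def flush(self) -> None:
        # Hand off the current batch and start a fresh buffer.
        if self._buffer:
            self.sink(self._buffer)
            self._buffer = []
            self._first_at = None
```

`max_wait_s` bounds the extra latency batching introduces, which is where the level of real-time rigor a system needs comes into play.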

Preprocessing
Getting the data ready
●Use the right distributed system for compute and storage
●Use binary formats
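In practice "binary formats" usually means something like Parquet or Avro. The stdlib-only sketch below just shows the size difference on hypothetical sensor readings: floats stored as CSV text versus the same values packed as 64-bit doubles with `array`.

```python
import array
import csv
import io
import random


def text_size(values):
    """Bytes needed to store the values as a one-column CSV file."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for v in values:
        writer.writerow([v])  # csv converts floats with str(), keeping full precision
    return len(buf.getvalue().encode("utf-8"))


def binary_size(values):
    """Bytes needed to store the same values as packed 64-bit doubles."""
    return len(array.array("d", values).tobytes())


random.seed(0)
readings = [random.random() for _ in range(10_000)]  # hypothetical readings
print(text_size(readings), binary_size(readings))
```

The binary form takes exactly 8 bytes per value and reloads without parsing, while the text form is larger and has to be parsed back into numbers on every read.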

Feature Engineering
Improving data preparation
●Make sure queries and data preparation are efficient
●Ensure queries and data paths are understood
●Use best practices for running queries
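A small illustration of these query practices using Python's built-in `sqlite3`: filter and aggregate inside the database instead of pulling every row into the application, use a parameterized query, add an index on the filter column, and inspect the plan to confirm the data path. The `trips` table and its columns are hypothetical.

```python
import sqlite3


def build_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (region TEXT, distance_km REAL)")
    conn.executemany("INSERT INTO trips VALUES (?, ?)", rows)
    # An index on the filter column lets the engine avoid a full table scan.
    conn.execute("CREATE INDEX idx_trips_region ON trips (region)")
    return conn


def avg_distance(conn, region):
    # Push filtering and aggregation into the database; the ? placeholder
    # also avoids building SQL strings by hand. Returns None if no rows match.
    (avg,) = conn.execute(
        "SELECT AVG(distance_km) FROM trips WHERE region = ?", (region,)
    ).fetchone()
    return avg


def query_plan(conn, region):
    # EXPLAIN QUERY PLAN shows the data path; the last column is the detail text.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT AVG(distance_km) FROM trips WHERE region = ?",
        (region,),
    ).fetchall()
    return [row[-1] for row in rows]
```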

Model Development
Distribute and optimize
●Keep up with the latest technologies in distributed systems
●Check for performance and memory-loading issues
●A little of a lot or a lot of a little
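Memory-loading issues often come from materializing a whole dataset when it could be streamed. This sketch uses the standard `tracemalloc` module to compare peak memory for a hypothetical aggregation done both ways; `load_all` and `stream` are illustrative names, and real pipelines would apply the same idea through their own data loaders.

```python
import tracemalloc


def load_all(n):
    # Materializes every record in memory before aggregating.
    records = [float(i) for i in range(n)]
    return sum(records)


def stream(n):
    # Generator expression: records are produced and consumed one at a time.
    return sum(float(i) for i in range(n))


def peak_bytes(func, *args):
    """Peak memory traced by tracemalloc while func runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak
```

Both versions compute the same sum, but the list-based one needs memory proportional to the dataset, which is the kind of issue worth checking before scaling out.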

Who
The Right People at the Right Ratios
We need data scientists, data engineers, and operations engineers. Each role is important, in the right ratio.

Data Teams
All Three Teams Are Required For Success

Questions?

Thank you
bigdatainstitute.io