“Testing Cloud-to-Edge Deep Learning Pipelines: Ensuring Robustness and Efficiency,” a Presentation from Instrumental

embeddedvision 36 views 32 slides Sep 30, 2024

About This Presentation

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/09/testing-cloud-to-edge-deep-learning-pipelines-ensuring-robustness-and-efficiency-a-presentation-from-instrumental/

Rustem Feyzkhanov, Staff Machine Learning Engineer at Instrumental, presents the “Testin...

Added: Sep 30, 2024

Slides: 32 pages

Slide Content

Testing Cloud-to-Edge Deep Learning Pipelines: Ensuring Robustness and Efficiency
Rustem Feyzkhanov
Senior Staff Machine Learning Engineer
Instrumental

Agenda
•Introduction to cloud-to-edge deep learning pipelines
•Testing strategies for cloud-to-edge pipelines
•ML pipeline dev tests
•ML pipeline internal tests
•Stage tests
•Summary
© 2024 Instrumental

Example – corgi/not corgi model
[Diagram: Business understanding → Data acquisition → Modeling → Deployment → Customer acceptance; Model outputs: corgi, not corgi]

Example – corgi/not corgi model
[Diagram: Business understanding → Data acquisition → Modeling → Deployment → Customer acceptance; two prediction services, outputs: corgi, …, …]
Why did it happen?

Possible edge cases with model on edge
•Data preprocessing differs between cloud and edge
•The inference framework on the edge is a different version and doesn’t support the model
•NPU drivers are not backward compatible and drop some of the tensors
•Data drift happened on the edge
•The training set wasn’t comprehensive enough
=> Proper testing can catch most of these issues, though not all
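
The first item above (preprocessing that differs between cloud and edge) can be caught with a parity check. The following is a minimal sketch, not the presenter's code: `preprocess_cloud` and `preprocess_edge` are hypothetical stand-ins, and the edge version is assumed to use a slightly different normalization constant, as might happen in a fixed-point port.

```python
# Hypothetical stand-ins: neither function comes from the presentation.
def preprocess_cloud(pixels):
    """Cloud-side reference: map 8-bit pixels to [-1, 1] using 127.5."""
    return [(p - 127.5) / 127.5 for p in pixels]


def preprocess_edge(pixels):
    """Assumed edge-side port: same idea, but with a power-of-two constant."""
    return [(p - 128) / 128 for p in pixels]


def check_preprocessing_parity(samples, tolerance):
    """Compare both implementations on the same inputs.

    Returns (ok, worst), where worst is the largest absolute difference
    seen across all samples and ok is True if it is within tolerance.
    """
    worst = 0.0
    for pixels in samples:
        cloud = preprocess_cloud(pixels)
        edge = preprocess_edge(pixels)
        worst = max(worst, max(abs(c - e) for c, e in zip(cloud, edge)))
    return worst <= tolerance, worst


samples = [[0, 64, 128, 255], [10, 20, 30, 40]]
ok_loose, worst = check_preprocessing_parity(samples, tolerance=0.01)
ok_strict, _ = check_preprocessing_parity(samples, tolerance=0.001)
```

With these inputs the loose tolerance passes and the strict one fails, which is the point: the acceptable gap is a product decision, and the test makes it explicit.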

Cloud-to-edge deep learning pipeline evolution

Cloud-to-edge deep learning pipeline – phase 1
[Diagram, spanning Research and Production: Data extraction and analysis → Data preparation → Model training → Model evaluation and validation → Trained model → Model registry → Model serving → Prediction service]
From https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Cloud-to-edge deep learning pipeline – phase 2
[Diagram, ML training pipeline: Data extraction → Data validation → Data preparation → Model training → Model evaluation → Model validation]
From https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Cloud-to-edge deep learning pipeline – phase 2
[Diagram components: Research, Production, Data, Source code, Experimental ML training pipeline, Automated ML training pipeline, Model registry, Model serving, Prediction service]
From https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Cloud-to-edge deep learning pipeline – phase 3
[Diagram components: Research, Production, Data, Source code, Experimental ML training pipeline, Automated ML training pipeline, CI: Build/test/package, Packages/artifacts, Model registry, Model serving, Prediction service, Performance monitoring, Trigger]
From https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Cloud-to-edge deep learning pipeline – phase 3
•Pros
-Fully automated model training
-Quick releases/model retraining
•Cons
-New model may run into errors on the edge device
-New model may perform differently on the edge than in the cloud

Cloud-to-edge deep learning pipeline – model lifecycle
Let’s unwrap the pipeline into the ML model lifecycle
[Diagram: Development/experiment → Source code → Pipeline continuous integration → Packages → Pipeline continuous delivery → Automated pipeline → Continuous training (with new data) → Trained model → Model continuous delivery → Prediction service]
From https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Cloud-to-edge deep learning pipeline – model lifecycle
[Diagram: Development/experiment → Source code → Pipeline continuous integration → Packages → Pipeline continuous delivery → Automated pipeline → Continuous training (with new data) → Trained model → Model continuous delivery → Prediction service; grouped into Pipeline development, Training pipeline, and Model serving]
From https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Cloud-to-edge deep learning pipeline – model lifecycle
[Diagram: Development/experiment → Source code → Pipeline continuous integration → Packages → Pipeline continuous delivery → Automated pipeline → Continuous training (with new data) → Trained model → Model continuous delivery → Prediction service; annotated with test stages: ML pipeline dev tests, Pre-train tests, Post-train tests, Stage tests (training), Stage tests (prediction)]
From https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Categories of tests
•ML pipeline tests: run in local dev and CI; test pipeline code before packages are generated
•Pre-train tests: run before training in the cloud; check our assumptions about the data
•Post-train tests: run after training in the cloud; check our assumptions about the model
•Stage test – training: run in the stage environment after the training package is deployed; check for regressions in the cloud
•Stage test – prediction: run in the stage environment after the inference package is deployed; check for regressions on the edge

ML pipeline tests
•Unit tests: check individual components of the pipeline
•Regression tests: check that a change didn’t affect existing functionality and artifacts
•Smoke tests: cheap tests to check that the model performs as expected on a simple dataset
•Integration tests: check that components work together in the expected way

ML pipeline tests: examples
•Unit tests
-Data split/preparation
-Naive cases for training
-Training-serving skew
•Regression tests
-Backward compatibility with old models
-Data shift due to a preprocessing library change
•Smoke tests
-Does the model correctly predict a sample from the training set?
-Does the model return results in the expected format?
•Integration tests
-Test that the pipeline components work together
-Test that there is no discrepancy between the training and serving environments
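
The data split unit test listed above can be sketched as follows. This is an illustrative example under stated assumptions, not the pipeline from the talk: `assign_split` and `split_dataset` are hypothetical helpers that split by a hash of the sample ID, so the same sample always lands in the same split across runs and machines.

```python
import hashlib


def assign_split(sample_id, val_fraction=0.2):
    """Deterministically map a sample ID to 'train' or 'val' (hypothetical helper)."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return "val" if bucket < val_fraction else "train"


def split_dataset(sample_ids, val_fraction=0.2):
    """Split IDs into (train, val) lists, preserving input order."""
    train, val = [], []
    for sample_id in sample_ids:
        target = val if assign_split(sample_id, val_fraction) == "val" else train
        target.append(sample_id)
    return train, val


IDS = [f"img_{i:04d}" for i in range(1000)]


def test_split_is_disjoint_and_complete():
    train, val = split_dataset(IDS)
    assert set(train).isdisjoint(val)          # no leakage between splits
    assert sorted(train + val) == sorted(IDS)  # nothing lost or duplicated


def test_split_is_deterministic():
    assert split_dataset(IDS) == split_dataset(IDS)


def test_split_ratio_is_roughly_respected():
    _, val = split_dataset(IDS, val_fraction=0.2)
    assert 0.1 < len(val) / len(IDS) < 0.3     # loose bounds around 0.2
```

The functions follow the `test_*` naming that pytest collects, but they are plain functions and can also be called directly.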

Cloud-to-edge deep learning pipeline
[Diagram, ML training pipeline: Data extraction → Data validation → Data preparation → Model training → Model evaluation → Model validation]
From https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Pre-train and post-train tests
•Pre-train tests: data validation
-Explicit checks for the data we will use for training
-Tests that identify data/pipeline assertion failures and fail early
•Post-train tests: model evaluation
-Performance on a validation or test dataset
-Tests that the model was able to converge and reach acceptable performance
•Post-train tests: model validation
-Invariance tests: test that specific data perturbations don’t affect model output
-Directional expectation tests: test that specific data perturbations do affect model output
-Minimum functionality tests: ensure that the model works in critical scenarios, failure modes, and edge cases
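
A pre-train data validation step of the kind described above might look like the sketch below. The record schema (`label`, `width`, `height`), the label set, and the function names are all assumptions made for illustration; the idea is only that explicit checks run before training and fail early with a readable message.

```python
# Assumed label set for the corgi example; not taken from the slides.
ALLOWED_LABELS = {"corgi", "not_corgi"}


def validate_record(record):
    """Return a list of problems found in one training record."""
    problems = []
    if record.get("label") not in ALLOWED_LABELS:
        problems.append(f"unexpected label: {record.get('label')!r}")
    width, height = record.get("width"), record.get("height")
    sizes_ok = isinstance(width, int) and isinstance(height, int)
    if not (sizes_ok and width > 0 and height > 0):
        problems.append("width and height must be positive integers")
    return problems


def validate_dataset(records, min_per_class=1):
    """Raise ValueError before training if any check fails; else return class counts."""
    errors = []
    counts = {label: 0 for label in ALLOWED_LABELS}
    for index, record in enumerate(records):
        errors.extend(f"record {index}: {p}" for p in validate_record(record))
        if record.get("label") in counts:
            counts[record["label"]] += 1
    for label, count in sorted(counts.items()):
        if count < min_per_class:
            errors.append(f"class {label!r} has {count} samples, need {min_per_class}")
    if errors:
        raise ValueError("; ".join(errors))
    return counts


good = [
    {"label": "corgi", "width": 224, "height": 224},
    {"label": "not_corgi", "width": 224, "height": 224},
]
counts = validate_dataset(good)
```

Libraries such as Great Expectations, listed in the resources of this deck, cover the same ground with a richer set of checks.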

Pre-train and post-train tests
•Pre-train tests: data validation (explicit checks for the data we will use for training)
-Incorrect labels
-Incorrect features
•Post-train tests: model evaluation (performance on a validation or test dataset)
-Low accuracy
•Post-train tests: model validation
-Invariance tests: rotated/shifted images have similar predictions
-Directional expectation tests: apples are at the top when searching for apples
-Minimum functionality tests: blurry images, overexposed images
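
The invariance and directional expectation tests above can be illustrated with a toy stand-in for the model. Everything here is an assumption for the sake of a runnable sketch: `toy_score` is not a real classifier, and the perturbations (a circular shift and a brightness increase) are chosen so that the expected behavior is easy to reason about.

```python
def toy_score(image, threshold=128):
    """Stand-in model: fraction of pixels brighter than the threshold."""
    pixels = [p for row in image for p in row]
    return sum(p > threshold for p in pixels) / len(pixels)


def shift_right(image, k=1):
    """Circularly shift every row by k pixels (content preserved)."""
    return [row[-k:] + row[:-k] for row in image]


def brighten(image, delta=20):
    """Add delta to every pixel, clipping at 255."""
    return [[min(255, p + delta) for p in row] for row in image]


def check_invariance(model, image, perturb, tolerance=1e-6):
    """Invariance test: the perturbation should not change the output."""
    return abs(model(image) - model(perturb(image))) <= tolerance


def check_directional(model, image, perturb):
    """Directional expectation test: the perturbation should raise the output."""
    return model(perturb(image)) > model(image)


image = [[10, 200, 120], [130, 90, 250]]
shift_ok = check_invariance(toy_score, image, shift_right)
brighter_ok = check_directional(toy_score, image, brighten)
```

For a real model the same two helpers apply unchanged; only `model` and the perturbations are swapped, and the invariance tolerance is loosened to whatever prediction drift is acceptable.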

What the cloud-to-edge deep learning pipeline looks like
[Diagram: Development/experiment → Source code → Pipeline continuous integration → Packages → Pipeline continuous delivery → Automated pipeline → Continuous training (with new data) → Trained model → Model continuous delivery → Prediction service; annotated with test stages: ML pipeline dev tests, Pre-train tests, Post-train tests, Stage tests (training), Stage tests (prediction)]
From https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning

Stage tests
•Run on stage (could be a dev environment)
•Tests should expose bugs with more realistic data and load
•Tests can be more complex
•Examples
-Shift in prediction scores
-Shift in predictions themselves
-CPU/RAM utilization (to catch memory leaks)
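
The first two examples above (a shift in prediction scores and a shift in the predictions themselves) can be checked by running the current and the candidate model on the same stage dataset and comparing outputs. The sketch below assumes both models return one score per sample; the function names, thresholds, and sample scores are made up for illustration.

```python
from statistics import mean


def score_shift_report(baseline_scores, candidate_scores, decision_threshold=0.5):
    """Summarize how far candidate scores moved from the baseline."""
    if len(baseline_scores) != len(candidate_scores):
        raise ValueError("both models must be scored on the same samples")
    pairs = list(zip(baseline_scores, candidate_scores))
    diffs = [abs(b - c) for b, c in pairs]
    # A flip is a sample whose thresholded prediction changed.
    flips = sum((b >= decision_threshold) != (c >= decision_threshold) for b, c in pairs)
    return {
        "mean_abs_shift": mean(diffs),
        "max_abs_shift": max(diffs),
        "flip_rate": flips / len(pairs),
    }


def passes_stage_gate(report, max_mean_shift=0.02, max_flip_rate=0.0):
    """Assumed gate: small average drift and no changed predictions."""
    return (
        report["mean_abs_shift"] <= max_mean_shift
        and report["flip_rate"] <= max_flip_rate
    )


baseline = [0.91, 0.12, 0.55, 0.48]   # illustrative scores, not real data
candidate = [0.90, 0.13, 0.56, 0.52]
report = score_shift_report(baseline, candidate)
gate_ok = passes_stage_gate(report)
```

Here the average drift is small, but the last sample crosses the decision threshold, so the gate fails. Tracking both numbers separately is what distinguishes a score shift from a shift in predictions.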

Summary
•Tests are a way to ensure predictable and transparent model behavior
•Tests are the best way to catch ML bugs early
•ML tests ≠ classic software engineering tests, but a similar mindset can be applied:
-Unit tests to check for bugs
-Testing specific scenarios
-Testing expected edge cases

Resources
•Posts and presentations
-https://arseny.info/reliable_ML
-https://www.jeremyjordan.me/testing-ml/
-https://eugeneyan.com/writing/testing-ml/
-https://krokotsch.eu/cleancode/2020/08/11/Unit-Tests-for-Deep-Learning.html
-https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
•GitHub repositories
-https://github.com/marcotcr/checklist
-https://github.com/great-expectations/great_expectations
-https://github.com/HypothesisWorks/hypothesis
