AI/ML Infra Meetup | ML explainability in Michelangelo

Alluxio 383 views 33 slides May 24, 2024

Slide 1 of 33

About This Presentation

AI/ML Infra Meetup
May. 23, 2024
Organized by Alluxio

For more Alluxio Events: https://www.alluxio.io/events/

Speaker:
- Eric Wang (Software Engineer, @Uber)

Uber has numerous deep learning models, most of which are highly complex with many layers and a vast number of features. Understanding how...

Size: 2.57 MB

Language: en

Added: May 24, 2024

Slides: 33 pages

Slide Content

ML Explainability in
Michelangelo
Eric Wang
Michelangelo - Uber ML Platform Team

01 Challenges and Needs
02 Importance of ML explainability
03 Explainers
04 Architecture
05 User workﬂow and case studies
06 Future opportunities and Q&A

Contents

Challenges and Needs
Needs
-Understand and interpret how models make decisions.
-Provide transparency and understanding for ML practitioners
and stakeholders
-Model exploration and understanding
-Eﬃciency of their features

Why this is important
●Uber operates ~1000 ML pipelines
●No oﬀer of feature importance insight for DL models
●Time consuming to explore features by training new models
(Eﬃciency/Resource)

Challenges and Needs
Model performance
Feature null rate monitoring
Michelangelo provided visual interfaces for:

Importance of Model Explainability
model_1
score
model_2
score
… ...
better
model?
better score?
??
Summary stats (AUC, MAE…) are informative, but not instructive for debugging

Questions:
1. Some features drift/quality
changes, are they important
enough?
2. Why two models performs
diﬀerently which features
drive more to the outcome?
3. Provide explanations for
operations and legals.

User’s Request
Making models more transparent and
interpretable
Needs to implement Explainable AI (XAI) for their Keras model to provide
clear explanations for model decisions, Users are investigating replacing a
formulaic model with the DNN model. Obviously the formulaic model is
more interpretable, Hence the team is looking for the DNN model to roughly
be explained by the same features.
Needs to provide explanation for business owners
regarding how a feed is promoted, this is also involved in
understanding how the decision is made for legal and
marketing teams.
A need to provide explanation for DL model in the online
prediction which is the same process for existing XGBoost
model. This requires us to develop a solution that can
integrate in Training so we can have baselines for the
explainer during realtime.
“
“
“
”
”
”

Importance of Model Explainability
Model debugging in Michelangelo: to make the 80% eﬀort more eﬃcient and eﬀective.
Explainability in model debugging: Transparency and Trust, Feature importance
analysis,, Comparison
ML is a widely used technology for Uber’s business
However, developing successful models is a long and non-trivial process
80/20 rule in machine learning: 20% of percent eﬀort building the initial working
model, 80% eﬀort to improve its performance to the ideal level.
From https://cornellius.substack.com/p/pareto-principle-in-machine-learning

Explanation methods

TreeShap
Interactive tree ensemble model visualizer on frontend
Data source: any serialized tree model (Spark GBDT, XGBoost, ...)

KernelShap
The good
Model Agnostic
Local explanation support
Captures Feature Interactions
Comprehensive Explanations

The bad
Computational Complexity
Scalability Issues
Independence Assumption

Integrated gradients
Why?

●Gradient based with baselines comparison
●a popular interpretability technique to any
diﬀerentiable model (e.g. images, text,
structured data)
●Scalable with large computation needs
●Many machine learning libraries
(TensorFlow, PyTorch) provide
implementations of IG.

Integrated gradients
Beneﬁts

●Completeness
●Interpretability
●Feature dependency
agnostic
●Eﬃciency
Feature values
Predicted score
Average prediction
Effect of feature on prediction:
0 + 0.17 +0.06 - 0.06 - 0.07 - 0.08 + 0.09 - 0.1 - 0.1 - 0.13 - 0.36 = -0.58 ~ -0.6

Integrated gradients
Notes

●Flatten Input features
●Choose the right layers (especially with categorical features)
●Use model wrapper to aggregate all outputs if possible

Using integration gradients
Model, model and model

Explainer
Model packaging for serving
Basis Feature
set
Feature joins
Prediction
(serving model)
Aggregated
Feature set
Feature
transformation
Decision
threshold
Post
processing
Not a raw model!

Using integration gradients
Save DL model separately

Explainer
Train Serving model
Raw model
Deploy to
endpoint
[keras.model, torch.nn.model, lightning…]
[torchscript, tf.compat.v1…]

Using integration gradients
Flatten input features

-Entity - an Uber business entity
such as city, rider, and driver (ex:
city, rider, driver, store)
-Feature Group - a feature group
for a given entity maps to a Hive
table and has features that are
related and convenient to
compute together
Entity
Feature
Group 1
Feature
Group 2
Feature
Group 3
Feature 1 Feature 2 Feature 3

Importance
level
Using integration gradients
Flatten input features

Entity
Feature
Group 1
Feature
Group 2
Feature
Group 3
Feature 1 Feature 2 Feature 3
Input to
model
Vectorized BucketizedVectorized
Feature 4

Importance
level
Using integration gradients
Flatten input features

Feature 1 Feature 2 Feature 3
Input to
model
Vectorized BucketizedVectorized
Feature 4

Using integration gradients
Choose the right layers
●Support both pyTorch
and Keras
●Support gradients on
input or output
●Ideal to pick the layer
for categorical features

Explainer
Using integration gradients
Use model wrapper to aggregate all outputs if possible

Model prediction pipelines
Basis Feature
set
Feature joins Prediction
Aggregated
Feature set
Feature
transformation
Decision
threshold
Post
processing

Using integration gradients
Use model wrapper to aggregate all outputs if possible

Model prediction pipelines
Basis
Feature set
Feature joins Prediction
Aggregate
d Feature
set
Feature
transformation
Decision
threshold
Post
processing
Calibrated

ML explainer in Michelangelo
Notebooks
1.Model debugging
2.Feature importance comparing
3.Visualization

Enabled for users
1. Diﬀerent explainers (IG, TreeShap, KernelShap, etc)
2. Data conversion among diﬀerent formats
3. Plotting
4. Model wrapper for calibration

●Backed in intuitive notions of what a good explanation
●Allows for both local and global reasoning, and it is
●Model agnostic
●Good adoption from popular explanation techniques
Visualize using Shapely value
Feature_0
Feature_1
Feature_2
Feature_3
Feature_4
Feature_5
Feature_6

ML explainer in Michelangelo
Generate feature importance in training pipeline

Model training pipelines
Basis Feature
set
Feature joins Trainer
Aggregated
Feature set
Feature
transformation
Explainer Packaging

ML explainer in Michelangelo

Monitoring Pipelines

1.Generate features importance during training time
2.Diﬀerent thresholds based on importance
3.Reduce noise from feature quality null rate

Case Studies
Case 1. Identifying Useful Features

suburb Non-suburb
compare
Scenario
-A team at Uber is evaluating the order
conversion rate is very different
between suburb and non-suburb areas.
-Adding new features did not change
model’s overall performance
-What feature affect the most between
the different datasets.

Method: compared different datasets in the
same model

Findings: The location feature is more
important than engagement features such as
historical orders in the non-suburb dataset

Conclusion: should zoom-in the location
feature more to make it bit more accurate.
Smaller hexagon size helps

Scenario
-A team at Uber is evaluating the order
conversion rate is very different
between suburb and non-suburb areas.
-Adding new features did not change
model’s overall performance
-What feature affect the most between
the different datasets.

Method: compared different datasets in the
same model

Findings: The location feature is more
important than engagement features such as
historical orders in the non-suburb dataset

Conclusion: should zoom-in the location
feature more to make it bit more accurate.
Smaller hexagon size helps
Case Studies
Case 1. Identifying Useful Features

Scenario
A team want to see what photos the model
predicted incorrectly. Since our action involves
a cost associated with a wrong prediction

Method:
●generate importance for all low
prediction score features
●generate our features by calling the
label/object detection models from
external

Findings: Some object names not categorized
properly

Conclusion:
Created one hot encoded features from
dropped objects
Created features that look at whether the string
contains certain words
Case Studies
Case 2. Identifying false positive/negative

Architecture
XAI framework

Architecture
Components:

1.Data processing
a.Converting from pySpark to numpy
b.Feature ﬂatten for calculating gradients
2.Explainer
a.Support multiple explainers (TreeShap/Kernel/IG…)
3.Model wrapper
a.Support diﬀerent model caller function or forward
function. (keyword based or array based)
b.Support calibration and aggregation or speciﬁc output
layer
4.Importance aggregation
a.Aggregate importance from multiple dimensions
b.Feature mapping from output to input.
XAI framework

●Support LLM explanation in prompting engineer
●Feature selection assistant
●Interactive visualization tools

Future opportunities

AI/ML Infra Meetup | ML explainability in Michelangelo

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

AI/ML Infra Meetup | ML explainability in Michelangelo

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

8-top-ai-courses-for-customer-support-representatives-in-2025.pptx

7-essential-ai-courses-for-call-center-supervisors-in-2025.pptx

25-essential-ai-courses-for-user-support-specialists-in-2025.pptx

8-essential-ai-courses-for-insurance-customer-service-representatives-in-2025.pptx

Know for Certain

PPT OPD LES 3ertt4t4tqqqe23e3e3rq2qq232.pptx