THE BALANCING ACT BETWEEN BIAS AND FAIRNESS
AI should be fair. We all agree on that. Accordingly, "bias" in your own AI solution has to be avoided.
Easier said than done, because bias can creep in at various points within the AI/ML lifecycle, from the initial design all the way to the productive use of the model. These points need to be identified and understood in detail, because not every kind of bias is automatically bad or unfair.
The session shows how the potential occurrence of unwanted bias in your own AI solution can be detected and avoided.
Slide Content
Der Spagat zwischen Bias und Fairness
#WISSENTEILEN
Lars Röwekamp | @mobileLarson
CIO New Technologies, OPEN KNOWLEDGE
(Architecture, Microservices, Cloud, AI & ML)
„The good, the bad, and the ugly?“
https://www.bmc.com
Bias
„... is a phenomenon that skews the result of an algorithm in favor or against an idea.“
Bias-Variance Tradeoff
BIAS: Difference between the average prediction of the model and the correct value.
VARIANCE: The variability of model prediction for a given data point.
Bias-Variance Tradeoff
HIGH BIAS & LOW VARIANCE: Oversimplified and under-fitted model. High error on training data and high error on test data.
Bias-Variance Tradeoff
LOW BIAS & HIGH VARIANCE: Complex and over-fitted model. Performs well on training data but shows high error rates on test data.
Bias-Variance Tradeoff
LOW BIAS & LOW VARIANCE: Sweet spot where the model achieves the optimal error rate without being overly complex.
Bias-Variance Tradeoff
[Figure: prediction error vs. model complexity for training and test samples. The under-fitted region on the left shows high bias and low variance, the over-fitted region on the right shows low bias and high variance. „You want to be here“: the minimum of the total error curve on the test sample.]
( Total Error = Bias² + Variance + Irreducible Error )
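Not part of the deck: a minimal numpy sketch of this tradeoff on a noisy 1-D regression task (all values and degrees are illustrative). Low polynomial degrees under-fit (high bias), very high degrees over-fit (high variance), and the test error is lowest somewhere in between.

```python
# Illustrative bias-variance experiment (not from the talk):
# fit polynomials of increasing degree and compare training vs. test error.
import numpy as np

rng = np.random.default_rng(42)

def make_data(n=60):
    x = rng.uniform(-3, 3, n)
    y = np.sin(x) + rng.normal(scale=0.3, size=n)   # true signal + irreducible noise
    return x, y

x_train, y_train = make_data()
x_test, y_test = make_data()

for degree in (1, 4, 12):
    coeffs = np.polyfit(x_train, y_train, degree)    # model complexity = polynomial degree
    mse_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={mse_train:.3f}  test MSE={mse_test:.3f}")

# Typical pattern: degree 1 is under-fitted (high error on both sets),
# degree 12 is over-fitted (low training error, higher test error),
# and a moderate degree sits near the "you want to be here" sweet spot.
```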
Bias-Variance Tradeoff
Conclusion: Bias is not bad in general, but …
Bias in the sense of „Fairness“
“… simply means inclination or prejudice for or against one person or group, especially in a way considered to be unfair.“
https://preply.com/en/question/what-does-bias-mean-in-english-49251#
Bias in the sense of „Fairness“
„A model should not make its decisions based on sensitive attributes!“*
*Ethnic or social origin, gender, age, income, marital status, sexual orientation, educational background or religious affiliation.
Also be aware of proxy attributes, e.g. zip code or height & weight!
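Not in the slides: one quick way to screen for such proxies, sketched with pandas on a made-up table (all column names and values are hypothetical). A feature that correlates strongly with the sensitive attribute is a proxy candidate even when the sensitive attribute itself is removed.

```python
# Illustrative proxy-attribute check (hypothetical column names, not from the talk):
# features that correlate strongly with a sensitive attribute are proxy candidates.
import pandas as pd

df = pd.DataFrame({
    "zip_code":  [10115, 10115, 80331, 80331, 20095, 20095],
    "height_cm": [182, 178, 165, 160, 175, 168],
    "income":    [52000, 61000, 38000, 35000, 47000, 40000],
    "gender":    ["m", "m", "f", "f", "m", "f"],   # sensitive attribute
})

sensitive = df["gender"].astype("category").cat.codes    # encode f/m as 0/1
candidates = df.drop(columns=["gender"])

# Crude screen: absolute correlation of every feature with the sensitive attribute;
# values close to 1 flag potential proxies (here e.g. height).
proxy_scores = candidates.apply(lambda col: col.corr(sensitive)).abs()
print(proxy_scores.sort_values(ascending=False))
```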
Rule #1: „Be aware of the possibility of bias.“
Bias
“… is an anomaly in the output of machine learning algorithms, due to the prejudiced assumptions made during the algorithm development process or prejudices in the training data.“
https://research.aimultiple.com/ai-bias/
ML-based process: AI/ML Model → Outcome → Action
Is it fair?
Fair compared to … ? The current (human) decision and its outcome / action.
Sources of Bias
World → Data → AI/ML Pipeline → Human Review → Action
Example: grant / deny loan
Goal: Provide loans while balancing repayment rates for bank loans
Adult income data*
•size: 48,843
•task: income > $50K / year
•sensitive attributes: gender, race
*https://archive.ics.uci.edu/ml/datasets/Adult
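As a hedged aside (not on the slide), the same census data can be pulled through scikit-learn's OpenML mirror; the dataset name, version and column names below follow the OpenML copy and may differ slightly from the UCI original.

```python
# Sketch: load the Adult income data referenced on the slide via OpenML
# (assumes scikit-learn with network access; column names per the OpenML copy).
from sklearn.datasets import fetch_openml

adult = fetch_openml("adult", version=2, as_frame=True)
X, y = adult.data, adult.target            # y: '>50K' / '<=50K'

print(X.shape)                             # roughly 48,800 rows
print(X["sex"].value_counts())             # sensitive attribute: gender
print(X["race"].value_counts())            # sensitive attribute: race
```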
Sources of Bias
World → Data → AI/ML Pipeline → Human Review → Action
Goal: Provide loans while balancing repayment rates for bank loans
World: current and potential bank customers
Data: historical loans and payments, credit reporting data, background checks
AI/ML Pipeline: build a model to predict the risk of not repaying on time
Human Review: manual review by the personal customer advisor
Action: deny the loan or increase interest rate and/or penalties
Sources of Bias: World
Historical Bias: In the past, the husband kept the salary account and the wife the household account.
Sources of Bias: Data
Sample Bias: Grouping of nationality into German/EU and others.
Measurement Bias: Overdraft as a proxy for credit risk.
Label Bias: Same scale for the feature “income" (low/medium/high) for men and women.
Sources of Bias: AI/ML Pipeline
Feature Bias: The feature „income“ is less meaningful for people with irregular income.
Learning Bias: The model is not suitable for the data and amplifies the bias effect.
Evaluation Bias: The data is not representative for small online loans (wrong benchmark).
Sources of Bias: Human Review
Supportive Bias: Clerk overrules the AI based on "experience" or "I know him, he's creditworthy".
Conservatism Bias: Clerk cannot evaluate the risk of a new business idea and rejects the loan.
Zero-Risk Bias: Clerk only approves loans with zero risk because his bonus depends on it.
Sources of Bias: Action
Action Bias: Not approving a credit can have different implications for different income groups.
Intervention Bias: A delay in loan disbursement can cause problems.
Deployment Bias: The credit AI is also used to decide individually on account opening / account fees.
Lots of Bias ;-(
How do we make the overall system and outcomes (more) fair?
DEFINE (equity) → DETECT (bias) → MITIGATE (impact) → MONITOR (effect)
Rule #2: „Define fairness in the context of your goal.“
There are many different ways of defining what constitutes a fair machine learning (ML) model.
Typically, an ML model cannot be fair in all aspects at the same time.
Define Fairness
ACCURACY Parity
DEMOGRAPHIC Parity
EQUAL Opportunity
EQUALIZED Odds
GROUP Unaware
Define Fairness
[Figure: loan decisions over income. POSITIVE = grant loan, NEGATIVE = don't grant loan; legend: paid loan in full vs. defaulted.]
Define Fairness
TP (True Positive): people who should be approved and are approved by the model.
FN (False Negative): people who should be approved and are denied by the model.
FP (False Positive): people who should be denied and are approved by the model.
TN (True Negative): people who should be denied and are denied by the model.

Confusion matrix (rows = TRUE, columns = PREDICTED):
                     predicted approved   predicted denied
  true approved              TP                  FN
  true denied                FP                  TN
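Not on the slide: a tiny scikit-learn sketch showing how these four cells are obtained for a toy set of loan decisions; the label encoding (1 = approve, 0 = deny) is an assumption for illustration.

```python
# Toy confusion matrix for loan decisions (1 = approve, 0 = deny).
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]   # who should be approved
y_pred = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]   # what the model predicts

# With labels=[1, 0] the matrix is laid out exactly as on the slide:
# rows = TRUE (approved, denied), columns = PREDICTED (approved, denied).
tp, fn, fp, tn = confusion_matrix(y_true, y_pred, labels=[1, 0]).ravel()
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"ACC = {(tp + tn) / (tp + fn + fp + tn):.0%}")
```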
Accuracy Parity
P(Ŷ=Y | A=0) = P(Ŷ=Y | A=1)
Group A (TP=6, FN=3, FP=0, TN=1): ACC = 7/10 = 70%
Group B (TP=2, FN=0, FP=3, TN=5): ACC = 7/10 = 70%
Accuracy Parity
P(Ŷ=Y | A=0) = P(Ŷ=Y | A=1)
„What could go wrong?“ → „Lost opportunities in group A vs. high risk in group B!“
Demographic Parity
P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1)
Group A (TP=3, FN=0, FP=1, TN=4): positive rate PR = 4/8 = 50%
Group B (TP=1, FN=0, FP=3, TN=4): positive rate PR = 4/8 = 50%
Demographic Parity
P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1)
„What could go wrong?“ → „Oops! Do we have a plan for this?“
Equal Opportunity
P(Ŷ=1 | A=0, Y=1) = P(Ŷ=1 | A=1, Y=1)
Group A (TP=2, FN=2, FP=1, TN=4): TPR = 2/4 = 50%
Group B (TP=1, FN=1, FP=2, TN=3): TPR = 1/2 = 50%
Equal Opportunity
P(Ŷ=1 | A=0, Y=1) = P(Ŷ=1 | A=1, Y=1)
„What could go wrong?“ → Equal TPRs (9/12 = 75% and 3/4 = 75%) can still mean: „Wow! Lots of False Positives!“
Equalized Odds
P(Ŷ=1 | A=0, Y=y) = P(Ŷ=1 | A=1, Y=y), y ∈ {0,1}
Group A (TP=2, FN=2, FP=1, TN=3): TPR = 2/4 = 50%, FPR = 1/4 = 25%
Group B (TP=1, FN=1, FP=1, TN=3): TPR = 1/2 = 50%, FPR = 1/4 = 25%
Equalized Odds
P(Ŷ=1 | A=0, Y=y) = P(Ŷ=1 | A=1, Y=y), y ∈ {0,1}
„What could go wrong?“
What is fair?
Accuracy Parity? Demographic Parity? Equal Opportunity? Equalized Odds!
What is fair?
When to use Equalized Odds …
•strong emphasis on predicting the positive outcome correctly
e.g.: correctly identifying who should get a loan drives profit
•strongly care about minimising costly False Positives
e.g.: reducing the granting of loans to people who would not be able to pay back
•the reward function of the model is not heavily compromised
e.g.: revenue or profit function for the business remains high
•the target variable is not considered subjective
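To make the four criteria concrete, here is a small sketch (not from the deck) that computes the quantities they compare from per-group confusion matrices; the counts reuse the Equalized Odds example above.

```python
# Sketch: quantities behind the four fairness criteria, per group.
# Counts reuse the Equalized Odds example (Group A: 2/2/1/3, Group B: 1/1/1/3).
def group_metrics(tp, fn, fp, tn):
    total = tp + fn + fp + tn
    return {
        "accuracy":      (tp + tn) / total,   # Accuracy Parity compares this
        "positive_rate": (tp + fp) / total,   # Demographic Parity compares this
        "tpr":           tp / (tp + fn),      # Equal Opportunity compares this
        "fpr":           fp / (fp + tn),      # Equalized Odds compares TPR and FPR
    }

groups = {
    "A": group_metrics(tp=2, fn=2, fp=1, tn=3),
    "B": group_metrics(tp=1, fn=1, fp=1, tn=3),
}

for name, metrics in groups.items():
    print(name, {k: round(v, 2) for k, v in metrics.items()})

# A criterion holds when its quantity matches across groups: here TPR and FPR are
# equal (Equalized Odds holds), while accuracy and positive rate differ slightly.
```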
Mitigate Bias
Quantitative Approach: Making adjustments to data, model, or predictions.
Non-Quantitative Approach: Stepping back from the computer and looking at bias / fairness with a wider lens.
Quantitative Approach
Pre-Processing: modify the data before training
In-Processing: modify the algorithm that is trained
Post-Processing: modify the predictions of the model
Quantitative Approach: Pre-Processing
Pre-Processing methods try to remove bias in the data before it is used to train the model.
Quantitative Approach: Pre-Processing
#1: Handle unbalanced datasets
(see also … for implementations)
Quantitative Approach: Pre-Processing
#1: … via Undersampling / Oversampling
[Figure: undersampling removes samples of the majority class from the original dataset; oversampling adds copies or synthetics of the minority class.]
Oversampling with different Methods | Oversampling Pitfalls
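A minimal sketch of both resampling routes (not from the slides), assuming the imbalanced-learn package is installed; RandomUnderSampler drops majority-class samples, RandomOverSampler duplicates minority-class samples, and SMOTE-style methods would generate synthetic ones instead.

```python
# Sketch: rebalance a skewed dataset with imbalanced-learn (assumed to be installed).
from collections import Counter
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# Toy dataset: roughly 90% majority class, 10% minority class.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("original:", Counter(y))

X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
print("undersampled:", Counter(y_under))   # majority class shrunk to minority size

X_over, y_over = RandomOverSampler(random_state=0).fit_resample(X, y)
print("oversampled:", Counter(y_over))     # minority class duplicated up to majority size
```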
Quantitative Approach: Pre-Processing
#1: … via Grouping
(e.g. White / Black / Asian-Pacific / Hispanic → White / Others)
Quantitative Approach: Pre-Processing
#2: Disparate impact removal (DIR)
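The slide only names the technique; as a hedged illustration, the sketch below computes the underlying disparate impact ratio with the common 0.8 ("four-fifths") reading, on made-up decisions. For the actual repair step, toolkits such as AIF360 provide ready-made pre-processors.

```python
# Sketch: disparate impact ratio = P(positive outcome | unprivileged) / P(positive | privileged).
# Values below ~0.8 are commonly read as disparate impact (the "four-fifths rule").
import numpy as np

y_pred    = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # model decisions (1 = grant loan)
sensitive = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])   # 0 = privileged, 1 = unprivileged group

p_priv   = y_pred[sensitive == 0].mean()
p_unpriv = y_pred[sensitive == 1].mean()

di_ratio = p_unpriv / p_priv
print(f"privileged rate={p_priv:.2f}, unprivileged rate={p_unpriv:.2f}, DI ratio={di_ratio:.2f}")
```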
Quantitative Approach: Pre-Processing
#3: Elimination of proxy variables
[Figure: learning a fair representation Z from (X, A). A predictor maps Z → Ŷ while an adversary tries to recover Â from Z; training maximizes I(X;Z) and minimizes I(A;Z) via the negative gradient from the adversary.]
-- Rich Zemel --
https://www.cs.toronto.edu/~toni/Papers/icml-final.pdf
Quantitative Approach: In-Processing
In-Processing methods work by adjusting the objective to also consider fairness. This can be done by changing the cost function or by imposing constraints on the model.
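Not from the slides: a bare-bones numpy sketch of the "change the cost function" route, adding a demographic-parity penalty (squared gap between the groups' mean predicted scores) to a plain logistic-regression loss on synthetic data. Real projects would more likely reach for a library such as fairlearn or AIF360.

```python
# Sketch: in-processing via a modified cost function (illustrative, numpy only).
# Loss = logistic loss + lam * (mean score of group 0 - mean score of group 1)^2
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 5
X = rng.normal(size=(n, d))
A = (rng.random(n) < 0.5).astype(int)            # sensitive attribute (group 0 / 1)
X[:, 1] = A + 0.3 * rng.normal(size=n)           # proxy feature correlated with the group
y = ((X[:, 0] + 0.8 * A) > 0.5).astype(float)    # toy labels that partly depend on the group

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(lam, lr=0.1, steps=2000):
    w = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n                              # logistic-loss gradient
        gap = p[A == 0].mean() - p[A == 1].mean()             # demographic-parity gap
        d_p = (p * (1 - p))[:, None] * X                      # d p_i / d w
        grad += 2 * lam * gap * (d_p[A == 0].mean(axis=0) - d_p[A == 1].mean(axis=0))
        w -= lr * grad
    p = sigmoid(X @ w)
    return (p[A == 0] > 0.5).mean(), (p[A == 1] > 0.5).mean()

print("positive rates without penalty:", train(lam=0.0))
print("positive rates with penalty:   ", train(lam=10.0))
# With the penalty the approval rates of the two groups move closer together,
# usually at some cost in raw accuracy.
```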
Quantitative Approach: Post-Processing
Post-Processing methods work by changing the predictions made by a model if they are indicated to be unfair, e.g. by setting different thresholds for privileged and unprivileged groups.
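A hedged sketch of that threshold idea on synthetic scores: one model, but a separate decision threshold per group, chosen here so that the approval rates roughly line up; all numbers are illustrative.

```python
# Sketch: post-processing with group-specific decision thresholds (illustrative values).
import numpy as np

rng = np.random.default_rng(1)
scores = rng.random(1000)                         # model scores in [0, 1]
group = rng.integers(0, 2, 1000)                  # sensitive attribute: group 0 / group 1
scores[group == 1] *= 0.8                         # group 1 systematically scores lower

single_threshold = 0.5
thresholds = {0: 0.5, 1: 0.4}                     # chosen so approval rates roughly match

for g in (0, 1):
    s = scores[group == g]
    print(f"group {g}: "
          f"single threshold -> {(s > single_threshold).mean():.0%} approved, "
          f"group threshold  -> {(s > thresholds[g]).mean():.0%} approved")
```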
Non-Quantitative Approach
•Awareness of the problem
•Don't use ML at all
•Limit the use of ML
•Address the root cause
•Understand the model
•Give explanations
•Give opportunities
•Support team diversity
AIF360 (AI Fairness 360 open-source toolkit)
Rule #5: „Monitor for silent failures.“
Monitoring
What to look for?
•Performance Drift
Monitoring
Best case: ground truth is immediately accessible
Monitoring
Not-so-good case: ground truth arrives with a delay
Monitoring
Worst case: ground truth is absent
Monitoring
What to look for?
•Performance Drift*
•Input Data Drift
•Prediction Drift
•Concept Drift
*ground truth required
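Closing sketch (not from the slides): when ground truth is delayed or absent, input data drift and prediction drift can still be watched by comparing a live window against a reference window, for example with a two-sample Kolmogorov-Smirnov test from scipy; feature names and the alert threshold are illustrative.

```python
# Sketch: monitor input-data / prediction drift without ground truth,
# by comparing a live window against a reference window feature by feature.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)

reference = {                                   # distributions captured at training time
    "income":     rng.normal(3500, 800, 5000),
    "prediction": rng.beta(2, 5, 5000),         # model scores on the reference window
}
live = {                                        # current production window (income has drifted)
    "income":     rng.normal(3100, 900, 1000),
    "prediction": rng.beta(2, 5, 1000),
}

for name in reference:
    result = ks_2samp(reference[name], live[name])
    drifted = result.pvalue < 0.01              # illustrative alerting threshold
    print(f"{name:10s}  KS={result.statistic:.3f}  p={result.pvalue:.4f}  "
          f"drift={'YES' if drifted else 'no'}")
```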