Fairness and Bias in AI Ethics and Explainability


About This Presentation

These slides cover fairness and bias in AI ethics and explainability: types of fairness, sources of bias, and an exploratory data analysis process for detecting dataset limitations. A few case studies are discussed.


Slide Content

Responsible AI and Ethics
Module 2
Fairness and Bias
Dr. Shiwani Gupta

Learning Outcomes
Students will be able to:
•Identify different sources of bias in AI systems.
•Perform EDA to detect dataset limitations.
•Apply fairness interventions (pre/in/post processing).
•Distinguish between group fairness, individual fairness, and counterfactual fairness.
•Critically analyze real-world case studies on fairness in AI.

Introduction to Fairness & Bias in AI
Why fairness matters in AI & ML
1. Ethical & Moral Responsibility
AI systems often make or support decisions in hiring, lending, healthcare, education, and law.
If biased, they may unfairly disadvantage groups based on gender, race, age, or socio-economic status.
Ensuring fairness aligns with values of justice, equality, and human dignity.
2. Legal & Regulatory Compliance
Many governments have regulations (e.g., GDPR, the EU AI Act, India’s DPDP Act) that require non-discrimination in automated systems.
Unfair models can lead to lawsuits, fines, or bans on deployment.
3. Trust & Adoption
Fair AI builds trust among users, customers, and society.
If people feel an AI system is biased (e.g., only recommending loans to certain groups), they will reject it and resist adoption.
4. Business & Societal Impact
Biased AI can harm brand reputation and reduce customer base.
Fair systems, on the other hand, expand access and opportunities → e.g., fair loan models can bring financial inclusion.
5. Technical Robustness
Bias is often linked with imbalanced data or hidden correlations.
By auditing fairness, we also improve data quality, generalization, and robustness of models.
Example:
•A recruitment AI trained mostly on past resumes of male candidates may favor men for tech jobs.
•By enforcing fairness (e.g., gender-neutral embeddings, reweighing), the system ensures skills, not gender, drive hiring decisions.

Real-world failures
1. Amazon Hiring Tool (Recruitment Bias)
Amazon developed an AI tool to screen resumes for technical jobs.
The system was trained on historical resumes, which were mostly from men (reflecting gender imbalance in the tech industry).
As a result, the model downgraded resumes containing the word “women’s” (e.g., “women’s chess club captain”) and favored male-dominated patterns.
Failure: It unintentionally discriminated against women candidates.
Lesson: Biased training data → biased predictions.

Real-world failures
2. COMPAS Tool (Criminal Justice Bias)
COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) was used in the US to predict recidivism risk (likelihood of reoffending).
Investigations (ProPublica, 2016) showed it was biased against Black defendants:
False positives (predicting high risk when they were not) were more common for Black individuals.
False negatives (predicting low risk when they were high risk) were more common for White individuals.
Failure: Led to unfair sentencing and parole decisions.
Lesson: Lack of fairness testing can perpetuate systemic injustice.

Real-world failures
3. Facial Recognition Bias
Multiple studies (MIT Media Lab, NIST) showed commercial facial recognition systems have higher error rates for women and people with darker skin tones.
Example: Some systems misclassified darker-skinned women up to 34% of the time, while achieving near-perfect accuracy for lighter-skinned men.
Failure: Biased algorithms used in law enforcement can result in wrongful arrests and reinforce inequality.
Lesson: Poorly diverse training datasets lead to disproportionate harm in critical applications.

Types of fairness: procedural, distributive, outcome-based
1. Procedural Fairness
Focuses on the processes and rules used by the AI system.
Ensures that decisions are made using consistent, transparent, and unbiased procedures.
Example: In a loan approval system, every applicant should be evaluated with the same set of features and algorithms, without hidden criteria.
2. Distributive Fairness
Focuses on the allocation of resources or benefits among groups.
Ensures that the AI system distributes opportunities, risks, or outcomes equitably.
Example: An AI system for job recommendations should not disproportionately favor one gender or community over others.
3. Outcome-Based Fairness
Focuses on the final results of AI decisions.
Ensures that outcomes do not unfairly disadvantage or advantage certain individuals or groups.
Example: A healthcare diagnostic AI should not have systematically worse accuracy for underrepresented populations.

“Should AI be allowed to decide parole?”
https://www.youtube.com/watch?v=6G0vQ7kwz14&utm_source=chatgpt.com
"AI in the courtroom –Fair judge or biased machine?“
Imagine a judge replaced by an AI system. Data about past crimes, behavior in prison, and re-
offense risk are fed into an algorithm. The AI then decides: Should the prisoner be granted
parole?
Supporters argue that AI can reduce human bias, ensure consistency, and process thousands of
cases quickly.
But here’s the concern: AI learns from historical data. If past decisions were biased, the AI could
repeat —or even amplify —unfairness. For example, studies show some parole algorithms
unfairly rated minorities as “high risk.”
Pros:Consistency, speed, reduced human subjectivity.
Cons:Hidden biases, lack of transparency, accountability issues.
So, should AI decide parole? Maybe as a tool to assist judges, but not as the final authority.
Because when freedom is at stake, fairness must come before efficiency.
"AI should support justice —not replace it."

Sources of Bias
1. Data Collection Bias (sampling bias, historical bias)
Meaning: Bias introduced during how data is gathered.
Types:
•Sampling bias – collected data doesn’t represent the real population.
•Historical bias – data reflects outdated or unfair social patterns.
Example: If a facial recognition dataset contains mostly lighter-skinned faces, the model will perform poorly on darker-skinned faces.
2. Measurement Bias (proxy variables, flawed labels)
Meaning: Bias from how features/labels are measured or defined.
Causes:
•Proxy variables – using an indirect feature that doesn’t truly represent the target.
•Flawed labels – errors or inconsistencies in ground truth.
Example: Using zip codes as a proxy for income can unfairly penalize certain communities because it reflects segregation rather than true wealth.
3. Representation Bias (under-represented groups)
•Meaning: Some groups or categories are under-represented or over-represented in the dataset.
•Example: A health diagnostic model trained mostly on male patient data may fail to detect diseases in women.

Sources of Bias
4. Algorithmic Bias (model assumptions, inductive bias)
Meaning: Bias built into the algorithm itself.
Causes:
•Model assumptions – e.g., linear models oversimplify relationships.
•Inductive bias – the mathematical/structural preference of an algorithm.
Example: A credit scoring model designed to maximize accuracy may systematically deny loans to minorities because it inherits biased correlations.
5. Evaluation Bias (benchmark datasets, skewed test sets)
Meaning: Bias from how the model is tested and validated.
Causes:
•Using benchmark datasets that don’t match real-world diversity.
•Skewed or small test sets.
Example: An NLP model benchmarked only on English Wikipedia may fail on dialects or low-resource languages.

Exploratory Data Analysis (EDA) for Bias Detection
1. Handling Missing Values
Missing data can bias your analysis, so it's important to identify and handle it.
Techniques:
•Check for missing values:
import pandas as pd
df.isnull().sum()
df.info()
Handle missing values:
•Drop rows/columns: df.dropna(), df.dropna(axis=1)
•Impute:
•Numerical: mean/median (df['age'].fillna(df['age'].mean()))
•Categorical: mode (df['gender'].fillna(df['gender'].mode()[0]))
•Advanced: KNN or iterative imputation (via sklearn.impute)
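For the advanced option above, a minimal sketch using scikit-learn's KNNImputer; the column names are hypothetical and df is assumed to already exist:
from sklearn.impute import KNNImputer
num_cols = ['age', 'income']                       # hypothetical numeric columns
imputer = KNNImputer(n_neighbors=5)                # fill gaps from the 5 most similar rows
df[num_cols] = imputer.fit_transform(df[num_cols])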

Exploratory Data Analysis (EDA) for Bias Detection
2. Checking for Imbalance
Especially important for classification tasks.
Techniques:
Count class distribution:
df['target'].value_counts()
df['target'].value_counts(normalize=True) # percentages
Visualize imbalance:
import seaborn as sns
sns.countplot(x='target', data=df)
Address imbalance (if needed):
Oversampling (SMOTE)
Undersampling
Class weighting in models
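As an alternative to resampling, class weighting can be applied directly in the model; a minimal sketch assuming features X and a binary target y are already prepared (LogisticRegression is just one possible estimator):
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(class_weight='balanced', max_iter=1000)
clf.fit(X, y)   # errors on the minority class are weighted more heavily during training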

•Bar plots represent categorical data as rectangular bars, with the height of each bar proportional to the value it represents.
•Example: data on the height of persons grouped as ‘Tall’, ‘Medium’, ‘Short’, etc.
•Used to compare values of different categories in the data; categorical data is simply a grouping of data into different logical groups.
•Types include: simple, horizontal, grouped, and stacked.
https://www.machinelearningplus.com/plots/bar-plot-in-python/

Exploratory Data Analysis (EDA) for Bias Detection
3. Distribution Plots
Understanding the distribution of variables helps in preprocessing and feature engineering.
Variables like gender, age, region, income group:
Categorical: bar plots, count plots
Numerical: histograms, density plots
Examples:
# Histogram for numerical variable
df['age'].hist(bins=20)
# Count plot for categorical variable
sns.countplot(x='gender', data=df)

•Histograms visualize the frequency distribution of a numeric array by splitting it into small equal-sized bins.
•A histogram is drawn on large arrays: it computes the frequency distribution of an array and plots it.
•Types include: basic, grouped, density curve, and facets.
https://www.machinelearningplus.com/plots/matplotlib-histogram-python-examples/

Exploratory Data Analysis (EDA) for Bias Detection
4. Correlation with Outcome Variable
Correlation helps identify features most related to the target.
For numerical variables:
df.corr()['target'].sort_values(ascending=False)
•For categorical variables:
•Chi-square test (scipy.stats.chi2_contingency)
•Group-wise mean/median of the outcome per category (see the sketch below)
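A minimal sketch of both checks, assuming hypothetical columns 'gender' (categorical) and 'target' (binary outcome) in df:
import pandas as pd
from scipy.stats import chi2_contingency
# Chi-square test of independence between a categorical feature and the target
contingency = pd.crosstab(df['gender'], df['target'])
chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
# Group-wise mean of the outcome; large gaps between groups hint at possible bias
print(df.groupby('gender')['target'].mean())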

Exploratory Data Analysis (EDA) for Bias Detection
5. Visualizations
Key for spotting trends, outliers, and patterns.
Common plots:
Histograms → distribution of numerical features
Boxplots → outliers and spread
sns.boxplot(x='gender', y='income', data=df)
Group-wise statistics → mean/median of features by target or category
df.groupby('income_group')['target'].mean().plot(kind='bar')
Heatmap for correlations:
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')

•Correlation between the variables indicates how the variables are inter-related.
•Correlation is not causation.
1. Each cell in the grid represents the value of the correlation coefficient between two variables.
2. It is a square and symmetric matrix.
3. All diagonal elements are 1.
4. The axis ticks denote the feature each of them represents.
5. A large positive value (near 1.0) indicates a strong positive correlation.
6. A large negative value (near -1.0) indicates a strong negative correlation.
7. A value near 0 (positive or negative) indicates the absence of any correlation between the two variables; those variables are independent of each other.
8. Each cell in the matrix is also represented by shades of a color: darker shades indicate smaller values, while brighter shades correspond to larger values (near 1).
9. This scale is given with the help of a color bar on the right side of the plot.

•Box plots visualize how a given data (variable) is distributed using quartiles.
•They show the minimum, maximum, median, first quartile, and third quartile of the dataset.
•A method to graphically show the spread of a numerical variable through quartiles.
•Middle 50% of all data points: IQR = Q3 - Q1.
•The upper and lower whiskers mark 1.5 times the IQR from the top (and bottom) of the box.
•Points that lie outside the whiskers, i.e. beyond 1.5 × IQR in either direction, are generally considered outliers (< Q1 - 1.5*IQR or > Q3 + 1.5*IQR).
•Types include: basic, notched, violin plot.
https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/box-whisker-plots/a/box-plot-review
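A minimal sketch of the IQR rule above, for a hypothetical numeric column 'income' of df:
q1 = df['income'].quantile(0.25)
q3 = df['income'].quantile(0.75)
iqr = q3 - q1
# Points beyond 1.5 * IQR from the quartiles are flagged as potential outliers
outliers = df[(df['income'] < q1 - 1.5 * iqr) | (df['income'] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} potential outliers")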

Limitations of a Dataset
1. Small Sample Size Problem
Description: When a dataset has very few examples, the model may fail to generalize to unseen data.
Impact:
Overfitting → model memorizes training data instead of learning patterns.
High variance → predictions fluctuate widely for new inputs.
Mitigation:
Data augmentation
Transfer learning
Cross-validation
2. Skewed Class Distribution
Description: One or more classes dominate the dataset, while others are underrepresented.
Impact:
•Model becomes biased towards majority classes.
•Poor performance on minority classes (e.g., fraud detection).
Mitigation:
•Oversampling minority classes (SMOTE)
•Undersampling majority classes
•Class weighting in the model loss function (a combined sketch follows)
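A combined sketch of two of the mitigations above, stratified cross-validation and class weighting; X and y are assumed to be prepared, and the estimator choice is only illustrative:
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)   # preserves class ratios per fold
clf = LogisticRegression(class_weight='balanced', max_iter=1000)  # up-weights the minority class
scores = cross_val_score(clf, X, y, cv=cv, scoring='f1')          # F1 is more informative than accuracy here
print(scores.mean())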

Limitations of a Dataset
3. Out-of-Distribution (OOD) Samples
Description: Test or real-world data differs from the training data distribution.
Impact: Model may make unpredictable or incorrect predictions; reduces reliability in production systems.
Example: Training on clear daytime traffic images, testing on foggy night images.
Mitigation: Domain adaptation; data augmentation to simulate diverse conditions.
4. Synthetic vs. Real-World Data
Description: Synthetic datasets are artificially generated, while real-world datasets are collected from actual observations.
Impact: Synthetic data may not capture all nuances → reduced generalization; real-world data may contain noise or missing values.
Example: Synthetic facial images for training AI vs. real-world faces with varying lighting, occlusion, or ethnicity.
Mitigation: Combine synthetic and real data; validate models on real-world datasets.

Limitations of a Dataset
5. Ethical Implications of Dataset Selection
Description: Dataset biases can propagate harmful stereotypes or discriminatory behavior in AI models.
Examples:
ImageNet → had inappropriate, offensive labels in some categories.
Face recognition → models biased towards certain races or genders due to underrepresentation.
Impact:
Reinforces societal biases
Potential legal and reputational risks
Mitigation:
Diverse and representative data collection
Bias auditing of datasets
Transparency about dataset limitations

Fairness Techniques & Frameworks
In-processing & Post-processing Techniques
1. Pre-processing Techniques
Pre-processing methods modify the data before training to reduce bias.
Techniques:
1. Resampling
•Adjust the dataset to balance class representation.
•Examples: oversampling minority groups, undersampling majority groups.
2. Reweighting
•Assign different weights to samples based on group membership to mitigate bias (see the sketch after the code below).
•Ensures that the model gives fair consideration to underrepresented groups.
3. Data Augmentation
•Generate synthetic examples for underrepresented groups.
•Useful in image, text, or speech datasets to improve diversity.
Python Example (conceptual):
from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_res, y_res = smote.fit_resample(X, y)
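For reweighting, one rough approach in the spirit of Kamiran and Calders' reweighing is to weight each (group, label) combination inversely to its frequency; 'gender', X and y are hypothetical inputs:
import numpy as np
import pandas as pd
groups = df['gender']
# Weight each (group, label) combination inversely to how often it occurs,
# so under-represented combinations count more during training.
counts = pd.crosstab(groups, y)
weights = np.array([1.0 / counts.loc[g, label] for g, label in zip(groups, y)])
weights = weights * len(y) / weights.sum()   # normalize so the weights average to 1
# Many estimators accept these via sample_weight, e.g. clf.fit(X, y, sample_weight=weights)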

Fairness Techniques & Frameworks
In-processing & Post-processing Techniques
2. In-processing Techniques
In-processing methods modify the learning algorithm or loss function to incorporate fairness during training.
Techniques:
Adding fairness constraints to loss function
Example: Penalize predictions that violate demographic parity or equal opportunity.
Adversarial debiasing
Train a model to predict the target while an adversary tries to predict sensitive attributes (e.g., gender, race).
Model learns representations that are predictive but independent of protected attributes.
Python Example (conceptual):
# pseudo-code
loss_total = loss_prediction + lambda * loss_adversary
# lambda controls the trade-off between accuracy and fairness
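A minimal, framework-agnostic sketch of the fairness-constrained-loss idea: binary cross-entropy plus a demographic-parity gap penalty. y_true, y_prob and the sensitive attribute a are hypothetical NumPy arrays for one batch:
import numpy as np
def fair_loss(y_true, y_prob, a, fairness_weight=1.0):
    eps = 1e-7
    # Standard binary cross-entropy
    bce = -np.mean(y_true * np.log(y_prob + eps) + (1 - y_true) * np.log(1 - y_prob + eps))
    # Demographic-parity gap: difference in mean predicted score between the two groups
    gap = abs(y_prob[a == 0].mean() - y_prob[a == 1].mean())
    return bce + fairness_weight * gap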

Fairness Techniques & Frameworks
In-processing & Post-processing Techniques
3. Post-processing Techniques
Post-processing methods modify the model outputs after training to satisfy fairness criteria.
Techniques:
1.Calibration
Adjust predicted probabilities so they are accurate for different demographic groups.
2.Equalized Odds
Adjust decisions so that true positive and false positive rates are similar across groups.
3.Rejection Option Classification
For uncertain predictions, defer the decision or request human review, especially for sensitive cases.
Python Example (using fairlearn):
from fairlearn.postprocessing import ThresholdOptimizer
postprocessor = ThresholdOptimizer(estimator=model, constraints="equalized_odds")
postprocessor.fit(X_train, y_train, sensitive_features=gender)
y_pred = postprocessor.predict(X_test, sensitive_features=gender)

Stage | Techniques | Goal
Pre-processing | Resampling, Reweighting, Data Augmentation | Reduce bias in data before training
In-processing | Fairness-constrained loss, Adversarial debiasing | Learn fair representations during training
Post-processing | Calibration, Equalized odds, Rejection option | Adjust predictions to satisfy fairness

Fairness Definitions (Group vs. Individual)
1. Group Fairness
Group fairness focuses on ensuring that entire demographic groups receive fair treatment, regardless of individual differences.
Key Concepts:
Demographic Parity (Statistical Parity)
The decision rate should be independent of sensitive attributes (e.g., gender, race).
Formula:
P(Ŷ = 1 | A = 0) = P(Ŷ = 1 | A = 1), where A is the protected attribute and Ŷ is the predicted outcome.
Example: Same loan approval rate for men and women.
Equalized Odds
Predictions should have equal true positive and false positive rates across groups.
Formula:
P(Ŷ = 1 | Y = y, A = 0) = P(Ŷ = 1 | Y = y, A = 1), for y ∈ {0, 1}
Example: For a recidivism model, the proportion of correctly predicted re-offenders is equal across races.
Predictive Parity (Positive Predictive Value Parity)
The probability that a positive prediction is correct should be the same across groups.
Formula:
P(Y = 1 | Ŷ = 1, A = 0) = P(Y = 1 | Ŷ = 1, A = 1)
Example: If a loan is approved, the chance of repayment should be similar for all groups.
Trade-offs:
Not all group fairness metrics can be satisfied simultaneously; you often need to choose based on context and regulatory requirements.
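A minimal sketch that computes the three group metrics above from NumPy arrays y_true, y_pred (0/1 labels) and a binary sensitive attribute a (all hypothetical); fairlearn.metrics also offers ready-made versions such as demographic_parity_difference:
import numpy as np
def group_metrics(y_true, y_pred, a):
    for g in (0, 1):
        m = (a == g)
        sel = y_pred[m].mean()                    # selection rate → demographic parity
        tpr = y_pred[m & (y_true == 1)].mean()    # true positive rate → equalized odds
        fpr = y_pred[m & (y_true == 0)].mean()    # false positive rate → equalized odds
        ppv = y_true[m & (y_pred == 1)].mean()    # precision → predictive parity
        print(f"A={g}: selection={sel:.2f}, TPR={tpr:.2f}, FPR={fpr:.2f}, PPV={ppv:.2f}")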

Fairness Definitions (Group vs. Individual)
2. Individual Fairness
Individual fairness ensures that similar individuals receive similar treatment, regardless of group membership.
Key Concepts:
•Principle:
“Similar individuals should be treated similarly.”
•Requires defining a similarity metric between individuals.
Technique:
Metric Learning for Fairness
•Learn a distance function d(x_i, x_j) such that individuals close in the metric space get similar predictions.
•Example: Two applicants with similar income, credit score, and debt levels should receive similar loan decisions, even if they belong to different demographic groups.
Challenges:
•Defining a fair similarity metric is non-trivial.
•May conflict with group fairness constraints.
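A minimal sketch of an individual-fairness consistency check: individuals that are close under a chosen distance metric should receive similar scores. X (already-scaled features), y_prob (model scores) and both thresholds are hypothetical:
import numpy as np
from scipy.spatial.distance import pdist, squareform
dist = squareform(pdist(X, metric='euclidean'))       # pairwise distances between individuals
score_gap = np.abs(y_prob[:, None] - y_prob[None, :])
# Flag pairs that are very similar yet receive very different predictions
similar = dist < 0.5
inconsistent = similar & (score_gap > 0.2)
print(f"{inconsistent.sum() // 2} potentially unfair pairs")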

Fairness Definitions (Group vs. Individual)
Fairness Type | Definition | Key Metric / Example
Group Fairness | Groups with shared attributes treated fairly | Demographic parity, Equalized odds, Predictive parity
Individual Fairness | Similar individuals treated similarly | Metric learning, similarity-based constraints

Counterfactual Fairness
Idea:
A decision is fair if it would not change for an individual had their sensitive attribute (e.g., gender, race) been different.
Focuses on the individual level, combining ideas from causal inference with fairness.
Key Question:
“Would the decision for this individual be different if their sensitive attribute had been different, all else equal?”
Example:
Predicting loan approval:
Applicant is female and gets rejected.
Counterfactual test: If this same applicant were male, would the decision still be rejection?
If yes → decision is counterfactually fair.
If no → potential bias exists.

Counterfactual Fairness
Formal Definition Using Causal Graphs
Uses Structural Causal Models (SCM) to model relationships between variables.
Components:
Sensitive attribute: A (e.g., gender)
Other features: X (e.g., income, credit score)
Outcome: Y (e.g., loan approval)
Counterfactual Fairness Criterion:
Y_{A←a}(U) = Y_{A←a′}(U)
Y_{A←a}(U) → predicted outcome for the individual under the actual attribute a
Y_{A←a′}(U) → predicted outcome if the sensitive attribute is changed to a′
U → unobserved background variables
The prediction is counterfactually fair if it does not change for different values of A.

Counterfactual Fairness
Steps to Implement Counterfactual Fairness
Model causal relationships between sensitive attributes, features, and outcomes.
Generate counterfactual scenarios by changing the sensitive attribute.
Compare predicted outcomes under the original vs. counterfactual attribute.
Adjust the model or features if disparities are found.
Example Workflow (Loan Approval):
Build SCM: Gender → Income, Credit Score → Loan Approval
Compute counterfactual outcome: Change Gender from Female → Male
Check if the model output changes; if it does → adjust the model (e.g., remove the sensitive attribute’s influence, use a fairness-aware algorithm).
Key Insight:
•Counterfactual fairness is stronger than group fairness because it tests individual-level fairness, not just statistics across groups.
•Requires a causal model, not just correlation-based methods.
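A naive counterfactual probe along the lines of the workflow above, assuming a trained pipeline clf that handles categorical encoding and a single applicant row x (a pandas Series) with a hypothetical 'gender' column. A full SCM would also update features causally downstream of gender (e.g., income); simply flipping the attribute is only a first approximation:
x_cf = x.copy()
x_cf['gender'] = 'male' if x['gender'] == 'female' else 'female'       # flip the sensitive attribute
original = clf.predict_proba(x.to_frame().T)[0, 1]
counterfactual = clf.predict_proba(x_cf.to_frame().T)[0, 1]
print(f"original={original:.2f}, counterfactual={counterfactual:.2f}")  # a large gap suggests bias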

Wrap-up, Case Studies, and Discussion
1. Autonomous Cars: The Trolley Problem in AI
Scenario:
An autonomous car must choose between two outcomes in an imminent accident:
Hit a pedestrian
Swerve and risk passengers’ lives
Ethical Questions:
Should the car minimize total harm (utilitarian approach)?
Should it prioritize passengers over pedestrians?
Who decides the moral rules: the manufacturer, government, or society?
Discussion Prompt:
Can we design a set of rules that is universally acceptable for all autonomous vehicles?

Wrap-up, Case Studies, and Discussion
2. Healthcare Robots: Priority in Treatment
Scenario:
AI-driven healthcare robot allocates limited resources (e.g., ICU beds, organ transplants).
Ethical Questions:
Should priority be given to patients with higher survival probability?
Should social factors (age, occupation, dependents) influence decisions?
How do we avoid discrimination against marginalized groups?
Discussion Prompt:
How can AI balance efficiency with equity in healthcare?

Wrap-up, Case Studies, and Discussion
3. Open Discussion: Is Complete Fairness Possible?
Points to consider:
Conflicting fairness definitions: Demographic parity vs. equalized odds vs. counterfactual fairness
Trade-offs with accuracy: Fairness constraints can reduce predictive performance.
Societal biases: AI often reflects biases in historical data or societal norms.
Ethical pluralism: Different cultures may prioritize fairness differently.
Discussion Questions:
Can AI ever be truly fair if human society is biased?
Should AI adopt one fairness metric universally, or context-dependent rules?
Is transparency in AI decisions as important as fairness itself?