A Talk on Deep Causal Representation Learning

Yusuf Brima · 26 slides · Jul 02, 2024

About This Presentation

This presentation explores the emerging field of causal representation learning, which aims to combine causal modeling with deep learning techniques.
It discusses:
Differences between natural and machine intelligence
Challenges in current AI approaches
The importance of causal reasoning and discovery


Slide Content

Causal Representation Learning
Yusuf Brima
CV Colloquium, Osnabrueck University
March 13, 2024

Natural Intelligence
● Systematic generalization
● Causal reasoning
● Planning/action
● etc.

Sidebar
● ~86 billion neurons in the human brain
● ~20 watts of energy usage
● Capable of zero/few-shot learning
● Sparse connectivity and activation
2

Machine Intelligence
● Relies on the i.i.d. assumption about data distributions
● Requires extensive datasets (labelled and/or unlabelled), mostly observational, to learn statistical dependencies
● Huge-capacity models (e.g., foundation models)

Sidebar
● ~1.7 trillion parameters for GPT-4
● ~$100M to train
● Able to zero/few-shot and in-context learn
● Claimed to have passed the Turing Test
● Hallucination is a big challenge
● Toxicity is another challenge
3

Challenges of Machine Intelligence
● Robustness
○ Despite data augmentation, pre-training, self-supervision, and architectural priors
● Learning reusable mechanisms
○ Humans and other animals learn an intuitive physics and psychology of the world
● Huge capacity and compute requirements (e.g., LLMs and VLLMs)
● Interpretability
● etc.
4
https://llm-attacks.org/

Missing Link
● Causal discovery and factorization of the underlying independent mechanisms

Sidebar
This is the cornerstone of the scientific method.
5

Levels of Modelling Systems
6

Statistical Modelling
Given inputs $x \in \mathcal{X}$ to a model s.t. $x \sim P(X)$,
and responses/labels $y \in \mathcal{Y}$ s.t. $y \sim P(Y \mid X)$,
with the i.i.d. assumption, i.e., $(x_i, y_i) \overset{\text{i.i.d.}}{\sim} P(X, Y)$:
For regression: $y = f(x) + \epsilon$
For classification: $\hat{y} = \arg\max_{y} P(Y = y \mid X = x)$
The joint distribution $P(X, Y)$ is assumed to exist and to be unchanging.
7
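The i.i.d. setting above can be sketched in a few lines: draw samples from a fixed joint distribution and fit a regression by ordinary least squares. The data-generating function (y = 2x + 1 plus Gaussian noise) is an illustrative assumption, not from the slides.

```python
import random

# Minimal sketch of statistical learning under the i.i.d. assumption:
# draw (x, y) pairs from a fixed joint distribution P(X, Y) and fit a
# regression line by ordinary least squares.
random.seed(0)

def sample(n):
    # Illustrative ground truth (an assumption): y = 2x + 1 + noise
    xs = [random.random() for _ in range(n)]
    ys = [2.0 * x + 1.0 + random.gauss(0, 0.1) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    # Ordinary least squares estimates of slope and intercept
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx

slope, intercept = fit_line(*sample(2000))
# Because samples are i.i.d. from an unchanging P(X, Y), the estimates
# concentrate around the true parameters as n grows.
```

This is exactly the regime the slides critique: the fit is only trustworthy while $P(X, Y)$ stays unchanged.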

Statistical Modelling
Diagram: Statistical Learning infers a Probabilistic Model from Observations & Outcomes; Probabilistic Reasoning uses the model to draw conclusions about observations and outcomes.
8

From Statistics to Causality

Reichenbach’s Common Cause Principle: if $X$ and $Y$ are statistically dependent, there exists a variable $Z$ that causally influences both and explains all the dependence, making them independent conditioned on $Z$: $X \perp Y \mid Z$.

Classic example: $X$ = frequency of storks, $Y$ = human birth rate.
If storks bring babies: $X \rightarrow Y$
If babies attract storks: $Y \rightarrow X$
If there is some other unobserved variable that causes both, we have $X \leftarrow Z \rightarrow Y$
E.g., of confounders: land area, urbanization, seasonality, etc.
9
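The principle can be checked numerically; a hedged sketch in which a hidden common cause drives both variables (the Gaussian data-generating process and the variable interpretations are illustrative assumptions):

```python
import random

# A hidden common cause Z drives both X and Y, so they are strongly
# correlated -- but the dependence vanishes once we condition on
# (here: stratify by) Z, as Reichenbach's principle predicts.
random.seed(1)

def pearson(a, b):
    # Plain Pearson correlation coefficient
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = sum((p - ma) ** 2 for p in a) ** 0.5
    sb = sum((q - mb) ** 2 for q in b) ** 0.5
    return cov / (sa * sb)

n = 5000
z = [random.gauss(0, 1) for _ in range(n)]      # confounder, e.g. urbanization
x = [zi + random.gauss(0, 0.3) for zi in z]     # "frequency of storks"
y = [zi + random.gauss(0, 0.3) for zi in z]     # "human birth rate"

marginal = pearson(x, y)                        # strongly dependent

# Crude conditioning: restrict to a thin stratum where Z is nearly constant
idx = [i for i in range(n) if abs(z[i]) < 0.05]
conditional = pearson([x[i] for i in idx], [y[i] for i in idx])
```

The marginal correlation is large while the within-stratum correlation is near zero, illustrating $X \perp Y \mid Z$.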

Structural Equations
Structural equation for A as a cause of B:
DAG representation: $A \rightarrow B$, with exogenous noise $U$ acting on $B$
Deterministic functional form: $B := f(A)$
Stochastic functional form: $B := f(A, U)$
10

Structural Equations
A more complex dependency structure would be a DAG over A, B, C, and D, each non-root node with its own exogenous noise $U$, where each variable is assigned from its parents:
$X_i := f_i(\mathrm{PA}_i, U_i)$
11

Structural Equations
Given a set of observables $X_1, \dots, X_n$, assuming each is the result of a structural assignment
$X_i := f_i(\mathrm{PA}_i, U_i)$,
the noise terms $U_i$ ensure that we can represent general conditional distributions $p(X_i \mid \mathrm{PA}_i)$, and we can achieve the causal factorization
$p(X_1, \dots, X_n) = \prod_{i=1}^{n} p(X_i \mid \mathrm{PA}_i)$.
The set of noises is jointly independent, thus:
$p(U_1, \dots, U_n) = \prod_{i=1}^{n} p(U_i)$
12

Structural Equations and Causal Factorization
A simple illustration with Rain (R), Car Wash (C), Wet (W), and Slip (S), and the DAG $R \rightarrow W$, $C \rightarrow W$, $W \rightarrow S$.
Structural causal equations:
$R := f_R(U_R)$, $C := f_C(U_C)$, $W := f_W(R, C, U_W)$, $S := f_S(W, U_S)$
Causal factorization:
$p(R, C, W, S) = p(R)\, p(C)\, p(W \mid R, C)\, p(S \mid W)$
13
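The structural assignments above can be simulated directly. The specific functional forms and probabilities below are illustrative assumptions; only the graph structure follows the slide:

```python
import random

# Simulation of the Rain (R), Car Wash (C), Wet (W), Slip (S) SCM with
# DAG R -> W <- C and W -> S. Each variable is a function of its parents
# and an independent exogenous noise U, as in the structural assignments.
random.seed(2)

def sample_scm():
    u_r, u_c, u_w, u_s = (random.random() for _ in range(4))
    r = u_r < 0.3                    # R := f_R(U_R): rain 30% of the time
    c = u_c < 0.2                    # C := f_C(U_C): car wash 20% of the time
    w = (r or c) and u_w < 0.95      # W := f_W(R, C, U_W)
    s = w and u_s < 0.4              # S := f_S(W, U_S)
    return r, c, w, s

samples = [sample_scm() for _ in range(10_000)]
wet_frac = sum(1 for r, c, w, s in samples if w) / len(samples)
slips_when_dry = sum(1 for r, c, w, s in samples if s and not w)
# Because S depends only on W, slipping never occurs when the ground is dry,
# mirroring the factor p(S | W) in the causal factorization.
```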

Causal Learning and Reasoning
The conceptual basis of statistical learning is a joint distribution $P(X_1, \dots, X_n)$, where often one of the $X_i$ is a response variable, and we make assumptions about the function classes used to approximate, say, a regression.
Causal learning considers a richer class of assumptions, and seeks to exploit the fact that the joint distribution possesses a causal conditional factorization.
Once a causal model is available, we can draw conclusions about the effect of interventions, counterfactuals, and potential outcomes. Statistical models only allow us to reason about the outcome of i.i.d. experiments.
14

Causal Modelling
Diagram: Causal Learning infers a Causal Model from Observations & Outcomes (including changes & interventions); Causal Reasoning uses the model to draw conclusions about interventions and counterfactuals.
15

The Causal Discovery Process

Conventional approach:
1. Data Collection: observational (and interventional)
2. Causal Discovery: constraint-based methods (e.g., the Peter-Clark (PC) algorithm) or score-based methods (e.g., Greedy Equivalence Search)
3. Evaluation: causal reasoning, such as counterfactual prediction, transfer learning, or causal effect estimation

Deep Learning-based approach:
1. Data Collection: observational (and interventional)
2. (Disentangled) Representation Learning
3. Causal Discovery: constraint-based methods (e.g., the PC algorithm) or score-based methods (e.g., Greedy Equivalence Search)
4. Evaluation: causal reasoning, such as counterfactual prediction, transfer learning, or causal effect estimation
16
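The constraint-based idea can be sketched as a toy PC-style skeleton search using thresholded partial correlation as the independence test. This is a simplified illustration, not a faithful PC implementation (which would use proper hypothesis tests such as Fisher's z, larger conditioning sets, and an edge-orientation phase); the three-variable chain and the threshold are assumptions for the example:

```python
import random
import itertools

# Toy constraint-based skeleton search: start fully connected, then delete
# an edge whenever the pair looks (conditionally) independent.
random.seed(3)

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = sum((p - ma) ** 2 for p in a) ** 0.5
    sb = sum((q - mb) ** 2 for q in b) ** 0.5
    return cov / (sa * sb)

def partial_corr(a, b, c):
    # Correlation of a and b after controlling for a single variable c
    rab, rac, rbc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (rab - rac * rbc) / (((1 - rac**2) * (1 - rbc**2)) ** 0.5)

# Ground-truth chain X -> Y -> Z, so X and Z are independent given Y
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 0.5) for xi in x]
z = [yi + random.gauss(0, 0.5) for yi in y]
data = {"X": x, "Y": y, "Z": z}

edges = set(itertools.combinations(data, 2))  # start fully connected
threshold = 0.05
for a, b in sorted(edges):
    if abs(pearson(data[a], data[b])) < threshold:
        edges.discard((a, b))
        continue
    for c in (v for v in data if v not in (a, b)):
        if abs(partial_corr(data[a], data[b], data[c])) < threshold:
            edges.discard((a, b))
            break
# Recovered skeleton: X - Y and Y - Z survive; X - Z is removed because
# X and Z are independent conditioned on Y.
```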

Deep Causal Representation Learning Framework
Diagram: from an unknown underlying causal graph, through the data space (observational and interventional), to learned latent causal variables (and DAG), yielding a learned causal graph.
17

Deep Causal Representation Learning Framework
From the causal conditional factorization
$p(X_1, \dots, X_n) = \prod_{i=1}^{n} p(X_i \mid \mathrm{PA}_i)$,
if $\mathrm{PA}_i = \emptyset$ for all $i$ (i.e., for disentanglement), the result is a disentangled factorization:
$p(X_1, \dots, X_n) = \prod_{i=1}^{n} p(X_i)$
18

Application areas: Healthcare
● Learning causal representations of medical data (e.g., electronic health records, medical images) to discover causal relationships between risk factors, diseases, and treatments.
● Predicting counterfactual outcomes under different interventions (e.g., medication changes, lifestyle modifications) for personalized treatment recommendations.
● Improving the robustness and fairness of medical diagnostic models by accounting for causal confounders.
19

Application areas: Computer Vision
● Disentangling causal factors in images (e.g., shape, texture, lighting) for better transfer learning and domain adaptation.
● Learning causal representations for robust object recognition and detection under varying conditions.
● Generating counterfactual samples by intervening on specific causal factors (e.g., changing the object's color, viewing angle, or pose).
20

Application areas: Speech Processing
● Source Separation
○ Learn representations that causally factorize speaker identity, speech content, audio sources, and acoustic environments
○ Enable applications like speaker diarization, audio source separation, and controllable speech synthesis
● Robustness and Generalization
○ Account for causal factors: accents, recording conditions, acoustic environments
○ Improve robustness and generalization for speech recognition, acoustic scene analysis, and transfer learning
● Editing and Manipulation
○ Intervene on causal factors to generate counterfactual speech/audio
○ Applications: audio editing, speech conversion (change identity/accent), controllable synthesis
21

Research Directions
●Causal Discovery from Complex, High-Dimensional Data
○Developing methods for high-dimensional, multimodal data (images, genomic, video, multi-sensor)
●Scalable & Efficient Learning Algorithms
●Benchmarking & Comprehensive Evaluation
○Reliable performance assessment across tasks/domains
●Advancing Theory & Foundations
○Connections to causality, invariance and robustness
●Applications & Real-World Impact
○Novel applications in healthcare, robotics, computer vision, climate studies, science
●…
22

Summary
● Causal Representation Learning is a promising path toward bridging the gap between natural and machine intelligence
● Leverages causal modeling and representation learning
● Promises enhanced robustness, generalization, and interpretability
● Emerging applications in healthcare, computer vision, speech, and beyond
● Key challenges:
○ Complex data and unobserved confounders
○ Scalability and evaluation
○ Theoretical foundations
○ Real-world application
● It is an exciting interdisciplinary research direction at the intersection of causality and representation learning
23

Research Question
Can we learn causal variables and causal structure from sensory signals with weak
supervision?
24
*How beneficial are causal representations, really, for robustness?

Next Steps
*Demonstrating usefulness on downstream tasks
Identifiability theorems have already been established
25

Thank you for your attention
26