PTIT, Apr. 1st, 2024
Deep Unsupervised Learning using
Nonequilibrium Thermodynamics
INTRODUCTION
- Title: Deep Unsupervised Learning using Nonequilibrium Thermodynamics
- Authors: Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, Surya Ganguli
- Published at ICML 2015 (arXiv version last revised November 18th, 2015)
Objective
Introduce a novel approach for modeling complex datasets using highly flexible families of probability distributions, while ensuring that learning, sampling, inference, and evaluation remain computationally tractable.
Background
Addresses a significant challenge in machine learning, particularly within the domain of unsupervised learning.
Problem
The difficulty of balancing flexibility and tractability in models of complex data, which this research seeks to overcome through its methodology.
Gaussian (Normal) Distribution: Often called the Normal distribution, perhaps the most well-known probability distribution, characterized by its bell-shaped curve.
Laplace Distribution: Also known as the double exponential distribution, similar to the Gaussian but with heavier tails.
However, such simple analytic distributions are unable to adequately describe the structure in rich datasets.
The paper proposes a new approach: diffusion probabilistic models.
Other prominent deep generative model families include Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs).
Deep generative models have gained visibility in recent years with the rise of image and video generation models capable of outstanding results:
DALL-E 2 (OpenAI)
Imagen (Google)
Make-A-Scene (Meta)
Diffusion probabilistic models (DPMs)
Learn the reverse of a well-defined stochastic process that progressively destroys information,
taking data from our complex target distribution and bringing them to a simple Gaussian distribution.
The reverse process is then expected to take the path in the opposite direction, taking Gaussian noise as input and generating data from the distribution of interest.
Extreme flexibility in model structure
Exact sampling
Easy multiplication with other distributions (e.g., to compute a posterior)
Cheap evaluation of the model log-likelihood
The distribution of 100×100 images is a very complex, high-dimensional distribution that can be progressively turned into a very simple distribution of the same dimensionality: 100×100 isotropic noise.
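A minimal NumPy sketch of this idea (illustrative only; the "image" and the noise schedule here are placeholders, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "image": any 100x100 array standing in for a data sample.
x = rng.random((100, 100))

T = 1000                              # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)    # assumed noise schedule (illustrative)

for beta in betas:
    noise = rng.standard_normal(x.shape)
    # One forward diffusion step: shrink the signal slightly and add Gaussian noise.
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise

# After many steps, x is statistically indistinguishable from 100x100
# isotropic Gaussian noise: the original structure has been destroyed.
print(round(x.mean(), 3), round(x.std(), 3))
```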
The obvious downside of DPMs is that sampling requires many steps, meaning the generative process is slower than for GANs or VAEs.
Wake-Sleep Algorithm
First introduced in the mid-90s by Hinton and Dayan,
proposing a method to train inference and generative probabilistic models against each other.
The initial concept of the wake-sleep algorithm has evolved into a foundational
element of modern machine learning, illustrating the progress and potential of
generative models in understanding and synthesizing high-dimensional data.
Related Techniques
Reweighted Wake-Sleep (Bornschein & Bengio, 2015): Develops extensions and improved learning rules for the original wake-sleep algorithm, optimizing the training process through reweighted sampling.
Generative Stochastic Networks (Bengio & Thibodeau-Laufer, 2013; Yao et al., 2014): Train a Markov kernel to match its equilibrium distribution to the data distribution, creating an efficient stochastic generative process.
Neural Autoregressive Distribution Estimators (NADE) (Larochelle & Murray, 2011): Decompose a joint distribution into a sequence of tractable conditional distributions over each dimension.
Adversarial Networks (Goodfellow et al., 2014): Trained against a classifier that attempts to distinguish generated samples from true data, representing a novel approach to model training.
Bijective Deterministic Maps (Rippel & Adams, 2013; Dinh et al., 2014): Learn deterministic maps to a latent representation with a factorial density function, optimizing the transformation from latent space to data space.
Stochastic Inverses for Bayesian Networks (Stuhlmüller et al., 2013): Introduce the learning of stochastic inverses for Bayesian networks, offering a new perspective on modeling data complexity.
Mixtures of Conditional Gaussian Scale Mixtures (MCGSMs) (Theis et al., 2012): Describe a dataset using Gaussian scale mixtures with parameters dependent on a sequence of causal neighborhoods, enhancing the representation of the data distribution.
The intersection of ideas from physics and machine learning, particularly in the development and understanding of generative models, highlights a rich avenue of interdisciplinary research:
Jarzynski equality
Annealed Importance Sampling (AIS)
Langevin dynamics
Kolmogorov forward and backward equations
These offer both theoretical insights and practical algorithms for machine learning.
Main issue
THE INHERENT TRADE-OFF BETWEEN FLEXIBILITY AND TRACTABILITY IN DEEP GENERATIVE MODELS FOR UNSUPERVISED LEARNING
Proposed solution
A DIFFUSION PROBABILISTIC MODEL INSPIRED BY NON-EQUILIBRIUM THERMODYNAMICS AND STATISTICAL PHYSICS
Forward Diffusion
Adding noise to an image until it becomes pure static.
The diffusion model learns this process mathematically, like a series of steps that corrupt the data.
Reverse Diffusion
Learns to reverse the corruption process.
It starts with pure noise and step-by-step denoises it to recreate a realistic image.
Goals: Capture Complexity, Maintain Efficiency
Forward Trajectory
Takes a structured data distribution and systematically adds noise to it, gradually transforming it into a simpler, more tractable distribution, typically a Gaussian.
Guided by a Markov chain, akin to simulating a physical process in which a system's structure gets destroyed over time.
π(y): the simple, analytically tractable distribution we want to convert the original data into.
q(x^(t) | x^(t−1)): probability of one forward diffusion step, from state x^(t−1) to state x^(t).
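Written out, the full forward trajectory and the Gaussian diffusion kernel used in the paper take (up to notation) the following form, with β_t the per-step diffusion rate:

\[
q\!\left(x^{(0 \ldots T)}\right) = q\!\left(x^{(0)}\right) \prod_{t=1}^{T} q\!\left(x^{(t)} \mid x^{(t-1)}\right),
\qquad
q\!\left(x^{(t)} \mid x^{(t-1)}\right) = \mathcal{N}\!\left(x^{(t)};\; x^{(t-1)}\sqrt{1-\beta_t},\; I\beta_t\right).
\]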
Reverse Trajectory
Effectively learns to remove the noise and restore structure, from the simpler distribution back to the original complex distribution.
The reverse process, when properly trained, acts as a generative model, allowing one to sample new data points that mimic the structure of the original dataset.
π(x^(T)): the simple distribution the original data was converted into.
p(x^(t−1) | x^(t)): probability of one reverse diffusion step, from state x^(t) to state x^(t−1).
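A schematic sketch of sampling along the reverse trajectory; `predicted_mean` is a stand-in for the learned network, and using β_t as the reverse variance is a simplifying assumption rather than the paper's learned covariance:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # assumed noise schedule (illustrative)

def predicted_mean(x_t, t):
    # Stand-in for the learned mean function; in the paper this would be
    # produced by a trained model, not a fixed formula.
    return np.sqrt(1.0 - betas[t]) * x_t

# Start from pure Gaussian noise, the end point of the forward trajectory.
x = rng.standard_normal((100, 100))

# Walk backwards through the chain, removing a little noise at each step.
for t in reversed(range(T)):
    noise = rng.standard_normal(x.shape) if t > 0 else 0.0
    # Sample x^(t-1) ~ N(predicted mean, beta_t * I).
    x = predicted_mean(x, t) + np.sqrt(betas[t]) * noise

# x is now a sample from the (placeholder) model distribution.
```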
Visualize the processes
The first (top) of the three rows in the figure shows "time slices" from the forward trajectory q(x^(0...T)): the data distribution (left) undergoes a Gaussian diffusion process that gradually transforms it into an identity-covariance Gaussian distribution (right).
The middle row shows the corresponding "time slices" from the trained reverse trajectory p(x^(0...T)): the images trace the reverse process from noise (at t = T) back to the original explicit structure (at t = 0), simulating information recovery.
The authors parameterize the reverse process to obtain an explicit formula for the probability model.
From this we can conclude that, for infinitesimal β, the forward and reverse distributions over trajectories share the same functional form.
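Concretely, for Gaussian diffusion the reverse kernel is therefore also modeled as a Gaussian, with learned mean and covariance functions (notation follows the paper):

\[
p\!\left(x^{(t-1)} \mid x^{(t)}\right) = \mathcal{N}\!\left(x^{(t-1)};\; f_\mu\!\left(x^{(t)}, t\right),\; f_\Sigma\!\left(x^{(t)}, t\right)\right).
\]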
Example
(a) Example holdout data (similar to training data)
(b) Holdout data corrupted with Gaussian noise
(c) Denoised images, generated by sampling from the posterior distribution over denoised images conditioned on the images in (b)
(d) Samples generated by the diffusion model
Major achievement
The utilization of concepts from nonequilibrium thermodynamics to inform the
development of deep generative models, particularly through the iterative process of
diffusion and reverse diffusion to model complex data distributions.
The specific methodology and its comprehensive application to deep unsupervised learning, as detailed in this paper, represented a novel contribution at the time of publication.
This includes the combination of flexibility and tractability in modeling, the depth of the
models with thousands of layers or time steps, and the practical application to a wide
range of datasets with high efficiency in sampling and evaluation.
Major achievements
Innovative Framework Inspired by Physics
Deep Generative Models with Thousands of Layers
High Log-Likelihood Models for Various Datasets
Efficient Evaluation and Sampling
Handling Posterior Computations
Innovative Framework Inspired by Physics
In simpler terms, they simulate a process where they first add noise to data (forward diffusion process)
to a point where its structure is completely lost (similar to reaching a thermal equilibrium in physics)
Then, they learn how to reverse this process (reverse diffusion process), effectively learning to
regenerate the original data from the noise
Allows the creation of a flexible and powerful model that can learn complex data distributions and
generate new data samples that are similar to the original dataset.
Deep Generative Models with Thousands of Layers
Traditional models often struggle with depth due to computational and practical limitations.
The authors leverage the iterative nature of their diffusion process to construct generative models that
can effectively have thousands of layers or time steps.
This depth significantly enhances the model's ability to capture and reproduce complex data patterns,
a substantial improvement over shallower models.
High Log-Likelihood Models for Various Datasets
The quality of a probabilistic model can be evaluated based on its log-likelihood, which measures
how well it predicts unseen data
The authors demonstrate their model's effectiveness across several image datasets, including MNIST and CIFAR-10.
CIFAR-10, divided into 50,000 training images and 10,000 testing images, poses a more challenging problem than MNIST due to its color images and more complex objects.
Notably, they achieve state-of-the-art performance on the dead leaves dataset, indicative of the model's capability to understand and recreate the statistical properties of natural images.
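For reference, the log-likelihood is optimized and evaluated through a variational lower bound of roughly the following form (paraphrasing the paper's derivation; the KL and entropy terms can be computed analytically for Gaussian or binomial diffusion):

\[
L = \int dx^{(0)}\, q\!\left(x^{(0)}\right) \log p\!\left(x^{(0)}\right) \;\ge\; K,
\]
\[
K = -\sum_{t=2}^{T} \mathbb{E}_{q\left(x^{(0)},\, x^{(t)}\right)}\!\left[ D_{\mathrm{KL}}\!\left( q\!\left(x^{(t-1)} \mid x^{(t)}, x^{(0)}\right) \,\middle\|\, p\!\left(x^{(t-1)} \mid x^{(t)}\right) \right) \right] + H_q\!\left(X^{(T)} \mid X^{(0)}\right) - H_q\!\left(X^{(1)} \mid X^{(0)}\right) - H_p\!\left(X^{(T)}\right).
\]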
Efficient Evaluation and Sampling
A practical challenge in probabilistic modeling is efficiently evaluating the likelihood of new data and
sampling new data points from the model.
Their method allows for rapid evaluation of probabilities and efficient sampling from the model
This efficiency is particularly beneficial for applications requiring real-time data generation or analysis.
Handling Posterior Computations
Many tasks require computing posterior distributions, which involves combining the learned model with new observations.
The framework provides a straightforward method to perform these computations.
The authors showcase this capability through examples of image denoising and inpainting, demonstrating the model's practical utility in tasks that require inference based on partial or noisy data.
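A hedged sketch of how inpainting can be done with a diffusion model: at every reverse step, pixels that are actually observed are clamped to a noised version of their known values, so only the missing region is generated. This is one common recipe, not necessarily the paper's exact multiplication-of-distributions procedure; `predicted_mean` is again a placeholder for a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)

def predicted_mean(x_t, t):
    # Placeholder for the learned reverse-step mean (a trained network in practice).
    return x_t

# Observed image with a missing square; mask == 1 where pixels are known.
observed = rng.random((100, 100))
mask = np.ones((100, 100))
mask[30:70, 30:70] = 0.0

x = rng.standard_normal((100, 100))    # start the missing region from pure noise

for t in reversed(range(T)):
    # One (placeholder) reverse diffusion step.
    noise = rng.standard_normal(x.shape) if t > 0 else 0.0
    x = predicted_mean(x, t) + np.sqrt(betas[t]) * noise

    # Clamp the known pixels: forward-diffuse the observation to step t and
    # overwrite the corresponding entries, so only the hole is generated.
    alpha_bar = np.prod(1.0 - betas[: t + 1])
    observed_t = np.sqrt(alpha_bar) * observed + np.sqrt(1.0 - alpha_bar) * rng.standard_normal(observed.shape)
    x = mask * observed_t + (1.0 - mask) * x

# Final clamp: keep the observed pixels exactly, keep the generated hole.
x = mask * observed + (1.0 - mask) * x
```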
Drawbacks
Dependence on large amounts of data
A large amount of training data is crucial for a diffusion model to achieve optimal performance. Without sufficient data, the model may suffer from underfitting.
--> This can result in poor performance when the model is applied to new, unseen data.
Computationally resource-intensive
Training a diffusion model can take a significant amount of time, especially for complex models with high-resolution outputs. Running these models often necessitates powerful hardware like GPUs or TPUs to achieve reasonable processing speeds.
--> This can be a barrier for researchers and individuals with limited access to such resources.
Summary
Presents a groundbreaking approach to deep
unsupervised learning that combines theoretical
physics concepts with advanced computational
techniques.
Achievements in creating deep, flexible models
capable of high-quality data generation and
efficient inference mark a significant
advancement in the field of machine learning.
Applications
TEXT TO IMAGE
TEXT TO VIDEO
IMAGE TO IMAGE
Applications in Image Recognition
Deep unsupervised learning has made significant contributions to image recognition tasks.
By training deep neural networks on large datasets, these models can learn to extract high-level features and
representations from images, allowing them to accurately classify and recognize objects, scenes, and patterns.
This has revolutionized fields like autonomous driving, facial recognition, and medical imaging, where accurate and efficient image recognition is critical.
Text to Image
Transforms written text into detailed, contextually accurate visual art
Encoding textual prompts using advanced language understanding models (like transformers) to
capture the nuanced semantics
Through a forward diffusion process, the model gradually adds noise to these encodings, moving
towards a high-entropy state
In the reverse diffusion stage, it systematically removes noise, using learned parameters to guide
the transformation of abstract textual features into coherent and detailed images, step-by-step
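A very schematic sketch of text-conditioned sampling: the prompt is mapped to an embedding, and the (placeholder) denoiser receives that embedding at every reverse step. Both `embed_text` and `predicted_mean` are hypothetical stand-ins, not any particular system's API.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)

def embed_text(prompt):
    # Hypothetical stand-in for a transformer text encoder: hashes characters
    # into a fixed-size vector so the example stays self-contained.
    vec = np.zeros(64)
    for i, ch in enumerate(prompt):
        vec[i % 64] += ord(ch)
    return vec / (np.linalg.norm(vec) + 1e-8)

def predicted_mean(x_t, t, text_emb):
    # Placeholder for a learned, text-conditioned denoising network; a real
    # model would use text_emb to steer the denoising at every step.
    return x_t + 0.0 * text_emb.mean()

text_emb = embed_text("a cat wearing a top hat")   # hypothetical prompt

x = rng.standard_normal((64, 64))                  # start from pure noise
for t in reversed(range(T)):
    noise = rng.standard_normal(x.shape) if t > 0 else 0.0
    # Every reverse step is conditioned on the same text embedding.
    x = predicted_mean(x, t, text_emb) + np.sqrt(betas[t]) * noise
# x would be the generated image if predicted_mean were a trained model.
```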
Text to Video
Converts textual narratives into dynamic video sequences, embodying the narrative and emotional tones of the text.
Encoding the text using embeddings or transformer-based models to understand its semantic and emotional content
For the video, it initially represents frames as a sequence of noisy images
The model then undergoes a forward diffusion process, in which it adds structured noise based on the text encoding, progressively moving towards a distribution that represents the video content
In reverse diffusion, it refines these noisy frames into a coherent video sequence that aligns with the original text,
ensuring each frame transitions smoothly to reflect the narrative flow
Image to Image
Translates images from one domain to another, such as style changes or colorization, while preserving semantic content.
Conditioning on the source image and aiming to reach a target domain representation, both encoded in formats
that the model can interpret (e.g., pixel values or embeddings).
The forward diffusion phase involves gradually introducing noise into the source image, effectively distorting it
towards a more generic state that retains some encoded target domain attributes.
In the reverse phase, it applies learned transformations to this noised image, meticulously removing the noise
and adjusting the image's attributes to align with the target domain characteristics.
This iterative refinement continues until the image convincingly represents the target domain, ensuring visual
coherence and semantic integrity throughout the transformation.
How DALL-E 2 generates realistic and creative images from text prompts
Text encoding: DALL-E 2 uses CLIP, a contrastive language-image model, to encode the text prompt into a semantic vector space.
Image encoding: A diffusion model called the "prior" predicts the encoding for the target image based on the text encoding.
Image generation: A modified version of GLIDE, another diffusion model, decodes the image encoding into a realistic image reflecting the text prompt.
→ Large language models like CLIP allow it to understand relationships between text and images. Diffusion models like the prior and GLIDE enable controllable image generation.