Boltzmann Machines in deep learning and machine learning, and how they are used to train models
Boltzmann Machines
Boltzmann Machines
Boltzmann machines (BMs) were introduced to learn probability distributions over binary vectors. A Boltzmann machine is defined over a d-dimensional binary random vector x ∈ {0, 1}^d. It is an energy-based model, meaning it defines the joint probability distribution using an energy function. Energy-based probabilistic models define a probability distribution as P(x) = exp(−E(x)) / Z, where E(x) is the energy function and Z = Σ_x exp(−E(x)) is the partition function that normalizes the distribution.
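As an illustration (not from the slides), the sketch below enumerates all states of a tiny Boltzmann machine and turns its energy function into a normalized distribution; the weights U and biases b are arbitrary placeholder values.

```python
import numpy as np
from itertools import product

# Minimal sketch (assumed, not from the slides): turn an energy function into a
# probability distribution P(x) = exp(-E(x)) / Z for a tiny binary vector x.
def energy(x, U, b):
    # Boltzmann machine energy: E(x) = -x^T U x - b^T x
    return -(x @ U @ x + b @ x)

d = 3                                            # small d keeps the partition function tractable
rng = np.random.default_rng(0)
U = np.triu(rng.normal(size=(d, d)), k=1)        # placeholder pairwise weights (each pair counted once)
b = rng.normal(size=d)                           # placeholder biases

states = np.array(list(product([0, 1], repeat=d)), dtype=float)
unnormalized = np.exp([-energy(x, U, b) for x in states])
Z = unnormalized.sum()                           # partition function: sums over all 2^d states
probs = unnormalized / Z                         # P(x) = exp(-E(x)) / Z, sums to 1
```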
Introduction
Decompose the units x into two subsets: the visible units v and the latent (or hidden) units h. The energy function becomes
Energy(state) = − (sum of weights connecting visible and hidden nodes) − (sum of weights connecting visible nodes to each other) − (sum of weights connecting hidden nodes to each other).
The energy of a particular state is thus determined by the combination of visible and hidden node values.
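A minimal sketch of that decomposition (the matrix names W, L, J are illustrative, not from the slides):

```python
import numpy as np

# Sketch of the decomposition above:
#   E(v, h) = -(v^T W h) - (v^T L v) - (h^T J h)
# L and J are assumed upper-triangular so each pair of units is counted once.
def joint_energy(v, h, W, L, J):
    visible_hidden = v @ W @ h      # sum over weights connecting visible and hidden nodes
    visible_visible = v @ L @ v     # sum over weights connecting visible nodes to each other
    hidden_hidden = h @ J @ h       # sum over weights connecting hidden nodes to each other
    return -(visible_hidden + visible_visible + hidden_hidden)
```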
Components
Nodes (neurons): Nodes are organized into two layers, visible and hidden. Each node represents a binary state, meaning it can be in one of two states, typically denoted 0 and 1.
Connections (edges): The network has weighted connections (edges) between nodes; the weights represent the strength of the connections. Connections exist between visible-visible, hidden-hidden, and visible-hidden nodes.
Energy function: The Boltzmann machine defines an energy function that depends on the states of the nodes and the weights of the connections. The energy function determines how well the network's current state agrees with the data it has learned.
Training and Applications
Training: Learning in Boltzmann machines involves adjusting the weights of the connections to minimize the energy of the system. Energy quantifies how well the network's current configuration aligns with the patterns present in the training data.
Applications: Boltzmann machines and RBMs have been used in various machine learning tasks, including collaborative filtering, feature learning, dimensionality reduction, and generative modeling.
Example: movie ratings (m1, m2, m3, m4, m5) marked as like (1) or dislike (0):
User 1: [1, 0, 1, 0, 0]
User 2: [0, 1, 0, 1, 1]
User 3: [1, 1, 1, 0, 1]
Build a BM with visible nodes representing movies and hidden nodes representing user preferences. The connections between nodes have weights that are learned during training.
Training
Example:
User 1: [1, 0, 1, 0, 0]
User 2: [0, 1, 0, 1, 1]
User 3: [1, 1, 1, 0, 1]
For example, if we have a state where User 1's preferences match Movie 1 and Movie 3, and User 2's preferences match Movie 2, is the energy of this state lower or higher? It is lower, because the state aligns with the observed data. The network uses Gibbs sampling to explore different states and adjusts the weights of the connections to minimize the energy of the system using contrastive divergence. Lower-energy states are more likely to be sampled because they represent better matches to the training data. Once trained, the BM can be used for generative modeling: given a user's partial preferences (e.g., some movie ratings), the network can generate a completion of those preferences that aligns with the training data.
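A minimal, hedged sketch of that procedure, using one-step contrastive divergence (CD-1) on the toy ratings; the hidden-unit count, learning rate, and iteration count are arbitrary choices, not from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy ratings from the slide: users x movies (visible units)
V = np.array([[1, 0, 1, 0, 0],
              [0, 1, 0, 1, 1],
              [1, 1, 1, 0, 1]], dtype=float)

n_visible, n_hidden, lr = V.shape[1], 2, 0.1      # illustrative hyperparameters
rng = np.random.default_rng(0)
W = 0.01 * rng.normal(size=(n_visible, n_hidden)) # movie-to-preference weights
b = np.zeros(n_visible)                            # visible biases
c = np.zeros(n_hidden)                             # hidden biases

for epoch in range(1000):
    # Positive phase: sample hidden units conditioned on the data
    ph = sigmoid(V @ W + c)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase: one Gibbs step (reconstruct visibles, then hiddens)
    pv = sigmoid(h @ W.T + b)
    v_neg = (rng.random(pv.shape) < pv).astype(float)
    ph_neg = sigmoid(v_neg @ W + c)
    # Contrastive divergence update: lower the energy of data states,
    # raise the energy of sampled (model) states
    W += lr * (V.T @ ph - v_neg.T @ ph_neg) / V.shape[0]
    b += lr * (V - v_neg).mean(axis=0)
    c += lr * (ph - ph_neg).mean(axis=0)
```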
Restricted Boltzmann Machine
RBMs are undirected probabilistic graphical models containing a layer of observable variables and a single layer of latent variables. The classic RBM architecture assumes a bipartite graph over the visible units and hidden units. A bipartite graph is a graph whose vertices can be divided into two disjoint and independent sets V and H, such that every edge connects a vertex in V to one in H. The hidden units learn more abstract features of the data.
RBM
Inference
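Because the graph is bipartite, the conditional distributions factorize over units, which is what makes RBM inference tractable. A minimal sketch of the standard sigmoid conditionals, with illustrative parameter names:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sketch (standard RBM conditionals; W, b, c are illustrative parameters):
def p_hidden_given_visible(v, W, c):
    # p(h_j = 1 | v) = sigmoid(c_j + sum_i W_ij * v_i)
    return sigmoid(v @ W + c)

def p_visible_given_hidden(h, W, b):
    # p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij * h_j)
    return sigmoid(h @ W.T + b)
```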
DBM vs DBN
DBN Issues
Inference in a deep belief network is intractable due to the explaining-away effect within each directed layer and the interaction between the two hidden layers that have undirected connections. Maximizing the standard evidence lower bound (ELBO) on the log-likelihood is also intractable. Because exact inference is intractable, approximate inference techniques are needed.
DBN Issues
Explaining-away effect within each directed layer: Each layer in a DBN contains directed connections (it is a Bayesian network). During inference in a layer, all possible configurations of hidden units within that layer must be considered, which leads to a combinatorial explosion in the number of configurations and makes exact inference intractable. The explaining-away effect occurs when the states of hidden units in the layer become dependent given the observations, making it challenging to compute the posterior probabilities efficiently.
DBN Issues
Interaction between two hidden layers with undirected connections: DBNs consist of multiple hidden layers with both directed and undirected connections. The interaction between these layers creates complex dependencies that are challenging to model and compute; the undirected connections often result in high-dimensional distributions that are difficult to handle analytically.
Maximizing the standard evidence lower bound (ELBO) on the log-likelihood is intractable: DBNs are trained using techniques like variational inference or expectation-maximization (EM) to maximize the ELBO on the log-likelihood. However, computing the ELBO often involves sums or integrals over high-dimensional spaces, which becomes computationally intractable, especially for deep networks with many layers.
Trained DBN
The trained DBN may be used directly as a generative model. Most of the interest in DBNs, however, arose from their ability to improve classification models: take the weights from the DBN and use them to define an MLP, then train that MLP on the labeled classification task. This additional training of the MLP is an example of discriminative fine-tuning.
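A minimal sketch of that reuse, assuming two pretrained weight matrices W1, W2 (with biases c1, c2) taken from the DBN plus a new task-specific output layer; all names are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sketch (illustrative names): an MLP whose hidden layers are initialized with the
# weights learned generatively by the DBN; backprop on labeled data then performs
# the discriminative fine-tuning described above.
def mlp_forward(x, W1, c1, W2, c2, W_out, b_out):
    h1 = sigmoid(x @ W1 + c1)       # first hidden layer: weights taken from the DBN's first RBM
    h2 = sigmoid(h1 @ W2 + c2)      # second hidden layer: weights taken from the second RBM
    return h2 @ W_out + b_out       # new, randomly initialized output layer for the classifier
```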