EM Algorithm


About This Presentation

ML


Slide Content

Slide 1: Probabilistic Models with Latent Variables. Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.

Slide 2: Density Estimation Problem. Learning from unlabeled data (unsupervised learning, density estimation). The empirical distribution typically has multiple modes.

Slide 3: Density Estimation Problem (figures). Sources: http://courses.ee.sun.ac.za/Pattern_Recognition_813 and http://yulearning.blogspot.co.uk.

Slide 4: Density Estimation Problem. A convex combination of unimodal pdfs gives a multimodal pdf, p(x) = Σ_k π_k p_k(x) with π_k ≥ 0 and Σ_k π_k = 1. Physical interpretation: sub-populations in the data.

Slide 5: Latent Variables. Introduce a new variable z_n for each observation x_n. Latent / hidden: not observed in the data. Probabilistic interpretation: mixing weights π_k = p(z = k); mixture densities p(x) = Σ_k π_k p(x | z = k).
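
As a concrete illustration of the mixture density above, here is a minimal numpy/scipy sketch (my own addition, not part of the slides) that evaluates a 1-D two-component Gaussian mixture; the weights, means, and standard deviations are made-up example values.

    import numpy as np
    from scipy.stats import norm

    # Example (made-up) mixture parameters; the weights sum to 1.
    pis = np.array([0.3, 0.7])        # mixing weights pi_k
    mus = np.array([-2.0, 3.0])       # component means
    sigmas = np.array([1.0, 0.5])     # component standard deviations

    def mixture_pdf(x):
        """Evaluate p(x) = sum_k pi_k N(x | mu_k, sigma_k^2)."""
        comps = np.array([pi * norm.pdf(x, mu, s)
                          for pi, mu, s in zip(pis, mus, sigmas)])
        return comps.sum(axis=0)

    print(mixture_pdf(np.array([-2.0, 0.0, 3.0])))  # density is highest near the two modes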

Slide 6: Generative Mixture Model. Sample z_n ~ Categorical(π), then x_n from the component selected by z_n; marginalizing over z_n recovers the mixture distribution. (Plate notation shown on the slide.)
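
A hedged sketch of this generative (ancestral sampling) process, continuing the illustrative 1-D parameters from the previous snippet: first draw the latent label, then draw the observation from the chosen component.

    import numpy as np

    rng = np.random.default_rng(0)
    pis = np.array([0.3, 0.7])        # mixing weights
    mus = np.array([-2.0, 3.0])       # component means
    sigmas = np.array([1.0, 0.5])     # component standard deviations

    def sample_mixture(n):
        """Ancestral sampling: z_n ~ Categorical(pi), x_n ~ N(mu_{z_n}, sigma_{z_n}^2)."""
        z = rng.choice(len(pis), size=n, p=pis)   # latent component labels
        x = rng.normal(mus[z], sigmas[z])         # observations
        return z, x

    z, x = sample_mixture(1000)
    # Marginally over z, x follows the mixture density from the previous slide.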

Slide 7: Tasks in a Mixture Model. Inference of the latent variables, and parameter estimation: find parameters that, e.g., maximize the likelihood. The likelihood does not decouple according to classes and is non-convex, with many local optima.

Slide 8: Example: Gaussian Mixture Model. Model: p(x | z = k) = N(x | μ_k, Σ_k). Inference: the posterior p(z = k | x) (the responsibility) takes the form of a soft-max over the components.
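
The soft-max inference step can be sketched as follows (an illustrative numpy/scipy function I am adding, not the slides' code): the responsibility r_nk = p(z_n = k | x_n) is obtained by exponentiating and normalizing the per-component log joint probabilities.

    import numpy as np
    from scipy.stats import multivariate_normal

    def responsibilities(X, pis, mus, Sigmas):
        """r[n, k] = p(z_n = k | x_n) for a Gaussian mixture (soft-max of log joints)."""
        N, K = X.shape[0], len(pis)
        log_joint = np.zeros((N, K))
        for k in range(K):
            log_joint[:, k] = np.log(pis[k]) + multivariate_normal.logpdf(X, mus[k], Sigmas[k])
        # Soft-max row-wise; subtract the max for numerical stability.
        log_joint -= log_joint.max(axis=1, keepdims=True)
        r = np.exp(log_joint)
        return r / r.sum(axis=1, keepdims=True)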

Slide 9: Example: Gaussian Mixture Model. Log-likelihood: log p(X) = Σ_n log Σ_k π_k N(x_n | μ_k, Σ_k). We do not know which training instance comes from which component, and there is no closed-form solution for maximizing the log-likelihood. Possibility 1: gradient descent, etc. Possibility 2: Expectation Maximization.
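
Because the log sits outside the sum over components, there is no closed-form maximizer; evaluating the objective itself is straightforward. A small sketch (my addition, assuming the same parameterization as above) using the log-sum-exp trick:

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import multivariate_normal

    def gmm_loglik(X, pis, mus, Sigmas):
        """log p(X) = sum_n log sum_k pi_k N(x_n | mu_k, Sigma_k)."""
        log_joint = np.stack(
            [np.log(pis[k]) + multivariate_normal.logpdf(X, mus[k], Sigmas[k])
             for k in range(len(pis))], axis=1)   # shape (N, K)
        return logsumexp(log_joint, axis=1).sum()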

Slide 10: Expectation Maximization Algorithm. Observation: if the values of the latent variables were known, the likelihood would be easy to maximize. Key idea: iterative updates. Given parameter estimates, "infer" all latent variables; given the inferred variables, maximize with respect to the parameters. Questions: does this converge, and what does it maximize?

Slide 11: Expectation Maximization Algorithm. Complete log-likelihood: log p(X, Z | θ). Problem: Z is not known. Possible solution: replace it with its conditional expectation, giving the expected complete log-likelihood Q(θ, θ_old) = E_{Z | X, θ_old}[log p(X, Z | θ)], where θ_old are the current parameters.
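
For concreteness, a hedged numpy sketch of Q(θ, θ_old) for a GMM, written in terms of the responsibilities r_nk computed under θ_old (this helper is an illustration I am adding, not taken from the slides):

    import numpy as np
    from scipy.stats import multivariate_normal

    def expected_complete_loglik(X, r, pis, mus, Sigmas):
        """Q = sum_n sum_k r_nk [log pi_k + log N(x_n | mu_k, Sigma_k)]."""
        K = len(pis)
        log_joint = np.stack(
            [np.log(pis[k]) + multivariate_normal.logpdf(X, mus[k], Sigmas[k])
             for k in range(K)], axis=1)   # shape (N, K)
        return np.sum(r * log_joint)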

Slide 12: Expectation Maximization Algorithm. The expected complete log-likelihood expands in terms of the responsibilities r_nk = p(z_n = k | x_n, θ_old), which play the role of soft class labels; compare with the likelihood of a generative classifier, where the class labels are observed.

Slide 13: Expectation Maximization Algorithm. Expectation step: update the responsibilities based on the current parameters. Maximization step: maximize the expected complete log-likelihood with respect to the parameters. Overall algorithm: initialize all latent variables, then iterate the M step and E step until convergence.
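
Putting the two steps together, here is a generic EM loop skeleton (my own illustrative sketch, not the slides' code); e_step, m_step, and loglik are placeholders for the model-specific functions described on the surrounding slides.

    def em(X, params, e_step, m_step, loglik, tol=1e-6, max_iter=200):
        """Generic EM: alternate E and M steps until the log-likelihood stops improving."""
        prev = -float("inf")
        for _ in range(max_iter):
            resp = e_step(X, params)    # E step: posterior over the latent variables
            params = m_step(X, resp)    # M step: maximize the expected complete log-likelihood
            cur = loglik(X, params)
            if cur - prev < tol:        # the log-likelihood is non-decreasing, so this terminates
                break
            prev = cur
        return params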

Slide 14: Example: EM for GMM. The E step (computing responsibilities) remains the same for all mixture models. The M step re-estimates the mixing weights, means, and covariances from the responsibilities; compare with the estimates for a generative classifier.
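
A minimal sketch of the GMM M-step referred to here, using the responsibility matrix from the E step; weights, means, and covariances are re-estimated as responsibility-weighted averages (illustrative code, not from the slides).

    import numpy as np

    def m_step_gmm(X, r):
        """M step for a GMM: r is the (N, K) responsibility matrix from the E step."""
        N, D = X.shape
        Nk = r.sum(axis=0)                       # effective counts per component
        pis = Nk / N                             # mixing weights
        mus = (r.T @ X) / Nk[:, None]            # responsibility-weighted means
        Sigmas = []
        for k in range(r.shape[1]):
            diff = X - mus[k]
            Sigmas.append((r[:, k, None] * diff).T @ diff / Nk[k])   # weighted covariance
        return pis, mus, np.array(Sigmas)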

Slide 15: Analysis of EM Algorithm. The expected complete log-likelihood is a lower bound on the log-likelihood; EM iteratively maximizes this lower bound and converges to a local maximum of the log-likelihood.

Slide 16: Bayesian / MAP Estimation. EM can overfit; it is possible to perform MAP estimation instead of MLE in the M step. EM is partially Bayesian: it maintains a posterior distribution over the latent variables but only a point estimate of the parameters. The fully Bayesian approach is called Variational Bayes.

Slide 17: (Lloyd's) K-Means Algorithm. Hard EM for a Gaussian mixture model: point estimates of the parameters (as usual) and point estimates of the latent variables, with spherical Gaussian mixture components (Σ_k = σ² I). The most popular "hard" clustering algorithm.
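
Lloyd's algorithm viewed as hard EM can be sketched like this (illustrative numpy code I am adding): the E-like step makes a hard assignment of each point to its nearest mean, and the M-like step re-estimates the means.

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        """Lloyd's algorithm: hard assignments (E-like step) + mean updates (M-like step)."""
        rng = np.random.default_rng(seed)
        mus = X[rng.choice(len(X), size=k, replace=False)]   # initialize means from the data
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)   # squared distances (N, k)
            z = d2.argmin(axis=1)                                        # hard assignments
            new_mus = np.array([X[z == j].mean(axis=0) if np.any(z == j) else mus[j]
                                for j in range(k)])
            if np.allclose(new_mus, mus):
                break
            mus = new_mus
        return z, mus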

Slide 18: K-Means Problem. Given data x_1, ..., x_N, find k "means" μ_1, ..., μ_k and assignments z_n that minimize Σ_n Σ_j z_nj ||x_n - μ_j||². Note: each z_n is a k-dimensional binary (one-hot) vector.
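
The objective on this slide (the sum of squared distances from each point to its assigned mean) can be computed directly; a small helper I am adding for illustration, with z given as integer labels rather than one-hot vectors:

    import numpy as np

    def kmeans_objective(X, z, mus):
        """J = sum_n ||x_n - mu_{z_n}||^2 (equivalent to the one-hot formulation on the slide)."""
        return float(((X - mus[z]) ** 2).sum())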

Slide 19: Model Selection: Choosing K for GMM. Cross-validation: plot the likelihood on the training set and on a validation set for increasing values of k. The training likelihood keeps improving, while the validation likelihood drops after the "optimal" k. This does not work for k-means! Why?
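
One way to carry out this cross-validation recipe is sketched below using scikit-learn's GaussianMixture (my choice of tool, not the slides'); score returns the average per-sample log-likelihood, so we look for the k at which the validation score stops improving.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.model_selection import train_test_split

    def choose_k(X, ks=range(1, 11), seed=0):
        X_tr, X_val = train_test_split(X, test_size=0.3, random_state=seed)
        scores = []
        for k in ks:
            gm = GaussianMixture(n_components=k, random_state=seed).fit(X_tr)
            scores.append((k, gm.score(X_tr), gm.score(X_val)))   # train vs validation log-lik
        # The training score keeps rising with k; pick the k where the validation score peaks.
        return max(scores, key=lambda t: t[2])[0]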

Slide 20: Principal Component Analysis: Motivation. Dimensionality reduction reduces the number of parameters to estimate; data often resides in a much lower dimension, e.g., on a line in a 3D space, and the reduction provides "understanding". Mixture models are very restricted: the latent variable is confined to a small discrete set. Can we "relax" the latent variable?

Slide 21: Classical PCA: Motivation. Revisit K-means: W is the matrix containing the means and Z is the matrix of (one-hot) cluster-membership vectors, so that X ≈ Z Wᵀ. How can we relax Z and W?

Slide 22: Classical PCA: Problem. Given the N x D data matrix X, allow an arbitrary Z of size N x L and an orthonormal W of size D x L, and minimize the reconstruction error of X ≈ Z Wᵀ.

Slide 23: Classical PCA: Optimal Solution. Form the empirical covariance matrix of the scaled and centered data; the optimal W contains the L eigenvectors corresponding to the L largest eigenvalues of this matrix. An alternative solution is via the Singular Value Decomposition (SVD). W contains the "principal components" that capture the largest variance in the data.
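
The eigen/SVD solution described here can be sketched in a few lines of numpy (illustrative, not the slides' code): center the data, take the top-L eigenvectors of the empirical covariance, and project.

    import numpy as np

    def pca(X, L):
        """Classical PCA: W = top-L eigenvectors of the empirical covariance; Z = projections."""
        Xc = X - X.mean(axis=0)              # center the data
        C = Xc.T @ Xc / len(Xc)              # empirical covariance (D x D)
        evals, evecs = np.linalg.eigh(C)     # eigenvalues in ascending order
        W = evecs[:, ::-1][:, :L]            # D x L, largest eigenvalues first
        Z = Xc @ W                           # N x L latent coordinates
        return W, Z

    # Equivalent route via SVD: Xc = U S Vt, with W given by the first L columns of Vt.T.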

Slide 24: Probabilistic PCA. Generative model: z ~ N(0, I), x | z ~ N(W z + μ, Ψ), with the noise covariance Ψ forced to be diagonal. Such latent linear models are known as factor analysis; the special case with isotropic noise, Ψ = σ² I, gives probabilistic PCA.
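
A hedged sketch of this generative process (the dimensions and parameter values below are made-up illustrations); note that z is now continuous and Gaussian, which is the "relaxation" motivated earlier.

    import numpy as np

    rng = np.random.default_rng(0)
    D, L, N = 5, 2, 1000                  # observed dim, latent dim, sample count (illustrative)
    W = rng.normal(size=(D, L))           # factor loading matrix
    mu = rng.normal(size=D)               # data mean
    sigma2 = 0.1                          # isotropic noise variance (PPCA special case)

    Z = rng.normal(size=(N, L))           # z_n ~ N(0, I)
    X = Z @ W.T + mu + rng.normal(scale=np.sqrt(sigma2), size=(N, D))   # x_n = W z_n + mu + eps

    # Marginally, x ~ N(mu, W W^T + sigma2 * I), which connects to the next slide.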

Slide 25: Visualization of the Generative Process (figure from Bishop, PRML).

Slide 26: Relationship with the Gaussian Density. Why does Ψ need to be restricted? The model gives an intermediate low-rank parameterization of the Gaussian covariance matrix, between full rank and diagonal. Compare the number of parameters.
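
To make the parameter-count comparison concrete, a small arithmetic sketch (the dimensions are made-up examples): a full D x D covariance has D(D+1)/2 free parameters, a diagonal one has D, and the low-rank form W Wᵀ + Ψ has roughly D·L + D.

    D, L = 100, 10                  # example dimensions (illustrative)
    full = D * (D + 1) // 2         # full covariance: 5050
    diag = D                        # diagonal covariance: 100
    low_rank = D * L + D            # W (D x L) plus a diagonal Psi: 1100
    print(full, diag, low_rank)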

Slide 27: EM for PCA: Rod and Springs (figure from Bishop, PRML).

Slide 28: Advantages of EM. Simpler than gradient methods with constraints; handles missing data; offers an easy path to handling more complex models. It is, however, not always the fastest method.

Slide 29: Summary of Latent Variable Models. Learning from unlabeled data with latent variables: discrete latent variables give clustering / mixture models (GMM); continuous latent variables give dimensionality reduction (PCA). These models provide a summary / "understanding" of the data and are fit with the Expectation Maximization algorithm.