Linear Discriminant Analysis of machine learning.pptx
Slide Content
Firat University
Linear Discriminant Analysis
Student name: Aydil Jomaa Bapir
Supervisor: Doç. Dr. BURHAN ERGEN
M.Sc. in Computer Engineering
Student number: 191129114
Linear Discriminant Analysis LDA
What is Linear Discriminant Analysis?
Linear Discriminant Analysis (LDA) is most commonly used as a dimensionality-reduction technique in the pre-processing step for pattern-classification and machine learning applications. The goal is to project a dataset onto a lower-dimensional space with good class separability, in order to avoid overfitting (the "curse of dimensionality") and to reduce computational cost. It is a very common technique for supervised classification problems. Like other dimensionality-reduction techniques, its aim is to reduce the number of dimensions by removing redundant and dependent features, transforming the data from a higher-dimensional space to a space with fewer dimensions.
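To make this concrete, here is a minimal sketch of LDA used as a supervised dimensionality-reduction step in scikit-learn; the library choice, the iris dataset, and the variable names are assumptions made for illustration and are not part of the original slides:

```python
# Minimal sketch: LDA as a supervised dimensionality-reduction / preprocessing step.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                 # 150 samples, 4 features, 3 classes

# LDA can keep at most (n_classes - 1) components; for 3 classes that is 2.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)                   # supervised: the class labels y are used

print(X.shape, "->", X_lda.shape)                 # (150, 4) -> (150, 2)
```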
Linear Discriminant Analysis LDA
Let's see what it does. We are going to take an example showing why we might need Linear Discriminant Analysis, and then we will talk about the details of how it works.
Imagine we have a cancer drug that works great for some people, but for other people it just makes them feel worse. We want to figure out who to give the drug to: we want to give it to the people it is going to help, and not to the people it might harm. Let's take gene expression as an example to understand LDA better.
Here is an example of using one gene to decide who gets the drug and who doesn't. (Legend: green = the drug works, red = the drug does not work. Axis: fewer transcripts on the left, more transcripts on the right.)
We have a number line: on the left side there are fewer transcripts and on the right side there are more transcripts. The dots represent individual people; the green dots are people the drug works for, and the red dots are people the drug just makes feel worse.
Linear Discriminant Analysis LDA
We can see that, for the most part, the drug works for people with low transcription of Gene X, and, for the most part, the drug does not work for people with high transcription of Gene X. (Legend: green = the drug works, red = the drug does not work.)
In the middle there is overlap, and there is no obvious cutoff for who to give the drug to. In summary, Gene X does an okay job of telling us who should take the drug and who shouldn't. Can we do better? Let's see.
Linear Discriminant Analysis LDA
What if we use more than one gene to make the decision? Here is an example of using two genes to decide who gets the drug and who doesn't. (Legend: green = the drug works, red = the drug does not work. Axes: more X transcripts, more Y transcripts.)
On the x-axis we have Gene X, and on the y-axis we have Gene Y. Now that we have two genes, we can draw a line that separates the two categories: the green (the drug works) and the red (the drug doesn't work). Using two genes does a better job of separating the two categories than using just one gene.
Linear Discriminant Analysis LDA
However, it's not perfect; using three genes would be even better. Here is an example where we try to use three genes to decide who gets the drug and who doesn't. (Legend: green = the drug works, red = the drug does not work. Axes: more X transcripts, more Y transcripts, Gene Z in depth.)
Gene Z is on the z-axis, which represents depth: imagine a line going through your computer screen and into the wall behind it. The big circles are the samples that are closer to you, and the smaller circles are the samples that are farther away along the z-axis. When we have three dimensions, we use a plane to try to separate the two categories.
Linear Discriminant Analysis LDA
What if we need four (or more) genes to separate the two categories? The first problem is that we can't draw a 4-D (or higher-dimensional) graph. This is the same problem we run into with PCA. PCA reduces dimensions by focusing on the variables with the most variation, which is useful when plotting data with many dimensions (many variables) on a simple XY plot. However, here we are not interested in the variables with the most variation; instead we are interested in maximizing the separability between the two groups so that we can make the best decisions. LDA, like PCA, reduces dimensions, but it focuses on maximizing the separability among the categories.
Linear Discriminant Analysis LDA
Let's repeat that to emphasize the point.
Linear Discriminant Analysis LDA
Here we're going to start with a very simple example: reducing a two-dimensional graph to a one-dimensional graph. That is, we want to take this two-dimensional graph (an XY graph) and reduce it to a one-dimensional graph (a number line) in a way that maximizes the separability of the two categories.
2-D graph (aka X/Y graph) → 1-D graph (aka number line)
Linear Discriminant Analysis LDA
What's the best way to reduce the dimensions? To answer that, let's start by looking at a bad way and understanding its flaws. One bad option would be to ignore Gene Y; if we did that, we would just project the data down onto the x-axis. This is bad because it ignores the useful information that Gene Y provides. Projecting the data onto the y-axis (i.e., ignoring Gene X) isn't any better.
Linear Discriminant Analysis LDA
LDA provides a better way. Here we're going to reduce this two-dimensional graph to a one-dimensional graph using LDA. LDA uses the information from both genes to create a new axis, and it projects the data onto that new axis in a way that maximizes the separation of the two categories.
Linear Discriminant Analysis LDA
So the general concept is that LDA creates a new axis and projects the data onto that new axis in a way that maximizes the separation of the two categories. Now let's look at the nitty-gritty details and figure out how LDA does that. How does LDA create the new axis? The new axis is created according to two criteria that are considered simultaneously.
Linear Discriminant Analysis LDA
The first criterion is that, once the data is projected onto the new axis, we want to maximize the distance between the two means. Here we have a green µ (the Greek letter mu) representing the mean of the green category and a red µ representing the mean of the red category.
The second criterion is that we want to minimize the variation within each category, which LDA calls scatter and represents by s². On the left side we see the scatter around the green dots, and on the right side we see the scatter around the red dots.
Linear Discriminant Analysis LDA
And this is how we consider those two criteria simultaneously: we take the ratio of the squared difference between the two means to the sum of the scatters,
(µ_green - µ_red)² / (s_green² + s_red²).
The numerator is squared because we don't know whether the green mean will be larger than the red mean or the red mean larger than the green mean, and we don't want that number to be negative; whatever it is, negative or positive to begin with, we square it and it becomes a positive number. Ideally the numerator is very large (a big difference, or distance, between the two means) and the denominator is very small (the scatter, the variation of the data around each mean within each category, is small).
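As a hedged illustration of this ratio (often called the Fisher criterion), here is a small numpy sketch that scores a candidate 1-D projection direction; the toy data, the vector w, and the function name are assumptions made only for this example:

```python
import numpy as np

def fisher_score(X1, X2, w):
    """(difference of projected means)^2 / (sum of projected scatters)."""
    w = w / np.linalg.norm(w)                    # use a unit-length direction
    p1, p2 = X1 @ w, X2 @ w                      # project each class onto w
    d2 = (p1.mean() - p2.mean()) ** 2            # squared distance between the projected means
    scatter = ((p1 - p1.mean()) ** 2).sum() + ((p2 - p2.mean()) ** 2).sum()
    return d2 / scatter

# Hypothetical 2-D data for the two categories.
X_green = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.5], [2.0, 2.5]])
X_red   = np.array([[6.0, 6.0], [7.0, 7.5], [8.0, 7.0], [7.0, 6.5]])

print(fisher_score(X_green, X_red, np.array([1.0, 0.0])))  # project onto gene X only
print(fisher_score(X_green, X_red, np.array([1.0, 1.0])))  # a diagonal axis scores higher here
```

A higher score means a larger gap between the projected means relative to the spread within each category; LDA looks for the direction that maximizes this score.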
Linear Discriminant Analysis LDA
To keep things simple later in this discussion, let's call the difference between the two means D (D for distance), so we can replace the difference between the two means with D. Now let's take an example to understand why both the distance between the two means and the scatter are important. Here's a new dataset; we still have just two categories, green and red. In this case there is a little bit of overlap on the y-axis but lots of spread along the x-axis.
Linear Discriminant Analysis LDA
If we only maximize the distance between the means, then we get something like this: a lot of overlap in the middle, which isn't great separation. However, if we optimize both the distance between the means and the scatter, then we get nice separation. Here the means are a little closer to each other than they were in the graph at the top, but the scatter is much less.
Linear Discriminant Analysis LDA
So if we optimize both criteria at the same time, we can get good separation. What if we have more than two genes, that is, more than two dimensions? The good news is that the process is the same: we create a new axis that maximizes the distance between the means of the two categories while minimizing the scatter.
Linear Discriminant Analysis LDA
Similarity between PCA and LDA: both methods rank the new axes they create in order of importance. PC1 (the first new axis that PCA creates) accounts for the most variation in the data; PC2 (the second new axis) does the second-best job, and so on for however many axes are created from the data. LD1 (the first new axis that LDA creates) accounts for the most variation between the categories; LD2 (the second new axis) does the second-best job, etc. Both methods also let you dig in and see which genes are driving the new axes.
Linear Discriminant Analysis LDA
Summary: LDA is like PCA in that both try to reduce dimensions. PCA does this by looking at the genes with the most variation; in contrast, LDA tries to maximize the separation of known categories.
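To make the contrast concrete, here is a brief sketch comparing PCA (unsupervised, variance-driven) with LDA (supervised, separation-driven) on the same data; the scikit-learn calls and the iris dataset are assumptions for illustration, not part of the slides:

```python
# PCA uses only X (no labels); LDA also uses the class labels y.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)     # axes of maximum overall variation
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # axes of maximum class separation

print(X_pca.shape, X_lda.shape)                  # both map 4 features down to 2, by different criteria
```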
Linear Discriminant Analysis LDA
Mathematics Behind LDA
Now let's learn more about LDA and the mathematics behind it, and then solve an example problem mathematically, which will give us a better understanding of it.
Linear Discriminant Analysis LDA
Mathematics Behind LDA
Now let's take an example and work it out step by step. Take a 2-D dataset with two classes, where each sample is a pair (x1, x2):
C1 = { (4,1), (2,4), (2,3), (3,6), (4,4) }   (representing class 1)
C2 = { (9,10), (6,8), (9,5), (8,7), (10,8) }   (representing class 2)
Step 1: Compute the within-class scatter matrix (SW):
SW = S1 + S2
S1: the covariance matrix of class 1
S2: the covariance matrix of class 2
Linear Discriminant Analysis LDA
Mathematics Behind LDA
So now let's find the covariance matrix of each class, starting from the class means.
µ1 is the mean of class C1, computed by averaging its samples:
C1 = { (4,1), (2,4), (2,3), (3,6), (4,4) }   →   µ1 = [3.00 3.60]
µ2 is the mean of class C2, computed the same way:
C2 = { (9,10), (6,8), (9,5), (8,7), (10,8) }   →   µ2 = [8.40 7.60]
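A quick numpy check of these means (the array names C1, C2, mu1, mu2 are my own, not from the slides):

```python
import numpy as np

C1 = np.array([[4, 1], [2, 4], [2, 3], [3, 6], [4, 4]], dtype=float)
C2 = np.array([[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], dtype=float)

mu1 = C1.mean(axis=0)    # [3.0, 3.6]
mu2 = C2.mean(axis=0)    # [8.4, 7.6]
print(mu1, mu2)
```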
Linear Discriminant Analysis LDA
Mathematics Behind LDA
Finding the covariance matrix of class 1.
C1 = { (4,1), (2,4), (2,3), (3,6), (4,4) },   µ1 = [3.00 3.60]
The deviations (x - µ1) for the five samples of class 1 are:
[1, -2.6], [-1, 0.4], [-1, -0.6], [0, 2.4], [1, 0.4]
Now, for each x, we calculate the outer product (x - µ1)(x - µ1)^T, so we will have 5 such matrices (written below row by row, rows separated by ";"). We will go one by one:
First matrix: [ 1  -2.6 ; -2.6  6.76 ]
Linear Discriminant Analysis LDA
Mathematics Behind LDA
Similarly, for all the samples we can find these outer products, and we get the 5 matrices:
First matrix:  [ 1  -2.6 ; -2.6  6.76 ]
Second matrix: [ 1  -0.4 ; -0.4  0.16 ]
Third matrix:  [ 1   0.6 ;  0.6  0.36 ]
Fourth matrix: [ 0   0   ;  0    5.76 ]
Fifth matrix:  [ 1   0.4 ;  0.4  0.16 ]
Adding (1) + (2) + (3) + (4) + (5) and dividing by the number of samples (5), we get the covariance matrix S1:
S1 = [ 0.8  -0.4 ; -0.4  2.64 ]
Linear Discriminant Analysis LDA
Mathematics Behind LDA
Now we do the same steps to find the covariance matrix for class 2 (using µ2 = [8.40 7.60]); it is given by:
S2 = [ 1.84  -0.04 ; -0.04  2.64 ]
Then the within-class scatter matrix is
SW = S1 + S2 = [ 0.8  -0.4 ; -0.4  2.64 ] + [ 1.84  -0.04 ; -0.04  2.64 ] = [ 2.64  -0.44 ; -0.44  5.28 ]
Step 2: Compute the between-class scatter matrix SB:
SB = (µ1 - µ2)(µ1 - µ2)^T = [ -5.4 ; -4 ] [ -5.4  -4 ] = [ 29.16  21.6 ; 21.6  16.00 ]
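The scatter matrices above can be reproduced with a short numpy sketch (reusing the C1, C2, mu1, mu2 arrays from the previous sketch, and dividing by the class size as the slides do):

```python
def class_scatter(X, mu):
    """Average of the outer products (x - mu)(x - mu)^T over the class samples."""
    D = X - mu
    return D.T @ D / len(X)

S1 = class_scatter(C1, mu1)            # [[0.8, -0.4], [-0.4, 2.64]]
S2 = class_scatter(C2, mu2)            # [[1.84, -0.04], [-0.04, 2.64]]
SW = S1 + S2                           # within-class scatter: [[2.64, -0.44], [-0.44, 5.28]]

d = (mu1 - mu2).reshape(-1, 1)
SB = d @ d.T                           # between-class scatter: [[29.16, 21.6], [21.6, 16.0]]
print(SW)
print(SB)
```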
Linear Discriminant Analysis LDA
Mathematics Behind LDA
Step 3: Find the best LDA projection vector. Similar to principal component analysis, we find it as the eigenvector with the largest eigenvalue:
SW^-1 SB V = λ V ................ (a)
(Note: V is the projection vector.)
The eigenvalues solve | SW^-1 SB - λI | = 0. Here
SW^-1 SB = [ 11.89  8.81 ; 5.08  3.76 ], so
| 11.89 - λ   8.81 ; 5.08   3.76 - λ | = 0
Solving for λ, we get λ = 15.65.
(For a clear explanation of how to calculate eigenvectors and eigenvalues, there is a good video; you can find it at this link.)
Linear Discriminant Analysis LDA
Mathematics Behind LDA
Step 3 (continued): Find the best LDA projection vector.
Substituting λ = 15.65 into (a), SW^-1 SB V = λ V:
[ 11.89  8.81 ; 5.08  3.76 ] [ V1 ; V2 ] = 15.65 [ V1 ; V2 ]
Solving, we get the normalized eigenvector
[ V1 ; V2 ] ≈ [ 0.91 ; 0.39 ]
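Continuing the same sketch, the projection vector is the leading eigenvector of SW^-1 SB (variable names carried over from the previous sketches and chosen for this example):

```python
# Leading eigenvector of SW^{-1} SB gives the LDA projection direction.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(SW) @ SB)
w = eigvecs[:, np.argmax(eigvals.real)].real    # eigenvector for the largest eigenvalue
w = w / np.linalg.norm(w)
if w[0] < 0:                                    # fix the overall sign for readability
    w = -w
print(eigvals.real)   # the two eigenvalues, approximately 15.65 and 0 (in some order)
print(w)              # roughly [0.92, 0.39], matching the slide's [0.91, 0.39] up to rounding
```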
Linear Discriminant Analysis LDA
Mathematics Behind LDA
Step 4: Dimension reduction.
Y = V^T X
where X contains the input data samples and V is the projection vector corresponding to the highest eigenvalue.
Projecting the data, the dimensionality is reduced and the discrimination between the classes can be visualized: the red points represent class 1 (C1) and the green points represent the other class when projected onto the projection vector. The separation between the classes is also maximized.
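And a last short continuation of the sketch to carry out this projection, mapping every 2-D sample to a single number on the new axis:

```python
# Project each class onto the 1-D LDA axis: y = w^T x for every sample.
Y1 = C1 @ w          # projections of class 1
Y2 = C2 @ w          # projections of class 2
print(Y1)            # class 1 ends up at the low end of the number line...
print(Y2)            # ...class 2 at the high end, with a clear gap between them
```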
Linear Discriminant Analysis LDA
Thanks for watching!