Tariku Bokila SVMA Presentation.pptx

Slide Content

Support Vector Machine Algorithm (SVMA) Tariku Bokila, 26th of May 2024

Outline of presentation: Introduction to Support Vector Machines (SVM); Basic Concepts of SVM; Linear SVM; Kernel Trick and Non-linear SVM; Applications of SVM; Comparison with Other Methods; Summary; References

Objective of the session: Explain the basic concepts of Support Vector Machines, including separating hyperplanes, margins, and support vectors. Discuss the formulation of linear SVM and the use of the kernel trick to handle non-linear data through various kernel functions. Highlight practical applications of SVM. Compare SVM with other machine learning methods.

Introduction to SVMA. Classifying or grouping things: imagine you have a bunch of things, like fruits, and you want to sort them into two groups, say apples and oranges. But these fruits are all mixed up, and you can't tell them apart just by looking.

Finding the Best Line: SVM helps you draw a line between these fruits, but not just any line. It tries to find the best line that separates apples from oranges. This line is called the "decision boundary" or "hyperplane." Creating Space: The SVM doesn't just draw any line; it tries to create as much space as possible between the apples and oranges. This space is called the "margin." The bigger the margin, the better the line because it means there's less chance of making mistakes.
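To make the idea of the best line and its margin concrete, here is a minimal sketch (not part of the original slides) using scikit-learn, which is only an assumed library choice: it fits a linear SVM on toy 2-D data and reads off the learned hyperplane and the margin width.

```python
# Minimal sketch (not from the slides): fit a linear SVM on toy 2-D data with
# scikit-learn and inspect the learned hyperplane and margin.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters standing in for "apples" and "oranges".
X, y = make_blobs(n_samples=40, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0)   # linear decision boundary
clf.fit(X, y)

w = clf.coef_[0]                    # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)    # margin width for a (near) hard-margin fit

print("hyperplane: %.3f*x1 + %.3f*x2 + %.3f = 0" % (w[0], w[1], b))
print("margin width: %.3f" % margin)
```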

Picking the Right Fruits: SVM also looks for the most important fruits to help draw the line. These special fruits are called "support vectors." They're the ones closest to the decision boundary and play a crucial role in defining it.
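Continuing the same hypothetical setup, scikit-learn exposes those special points directly once the model is fitted:

```python
# Continuing the hypothetical example above: a fitted scikit-learn SVC exposes
# the support vectors, i.e. the training points closest to the decision boundary.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("support vectors per class:", clf.n_support_)  # counts for each class
print(clf.support_vectors_)                          # the points that define the margin
```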

Handling Curvy Lines: Sometimes, the fruits aren't neatly separated by a straight line. In such cases, SVM can use tricks to transform the space, making it possible to draw curved lines or more complex shapes to separate them. If the fruit data is too mixed up to be separated by a simple line, SVM can transform the data into a higher dimension where it is easier to separate. This allows SVM to handle even the most complex data sets.
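As a hedged illustration of those "tricks" (again assuming scikit-learn, and using a synthetic concentric-circles dataset), a straight line fails on data like this while an RBF-kernel SVM separates it almost perfectly:

```python
# Sketch of the kernel trick: concentric circles cannot be split by a straight
# line, but an RBF-kernel SVM separates them easily.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_clf = SVC(kernel="linear").fit(X_train, y_train)
rbf_clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

print("linear kernel accuracy:", linear_clf.score(X_test, y_test))  # roughly chance level
print("RBF kernel accuracy:   ", rbf_clf.score(X_test, y_test))     # close to 1.0
```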

Making Predictions: Once the line is drawn, you can use it to guess the type of new fruits you haven't seen before. If a new fruit falls on one side of the line, SVM predicts it belongs to one group; if it's on the other side, it predicts it belongs to the other group. So, in simple terms, SVM is like a smart tool that helps you draw the best line to separate different things into groups, making it easier to understand and classify them.
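A short sketch of that prediction step under the same assumptions; the two test points are made-up "new fruits":

```python
# Assigning made-up new points to a class with a previously fitted linear SVM
# (scikit-learn is an assumed choice; the points are purely illustrative).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

new_fruits = np.array([[0.0, 5.0], [3.0, 1.0]])  # hypothetical unseen points
print(clf.predict(new_fruits))                   # one class label per point
```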

What are support vectors? In SVM, support vectors are the data points that lie closest to the decision boundary or hyperplane. They are the critical data points that define the margin, which is the distance between the hyperplane and the nearest data points from each class. Support vectors play a crucial role in determining the optimal decision boundary and maximizing the margin. The term "machine" in SVM refers to the algorithm itself. Combining these concepts, the name "Support Vector Machine" indicates an algorithm that constructs a decision boundary (hyperplane) based on the support vectors to classify data points into different classes. By maximizing the margin and using the support vectors, SVM aims to create an effective and efficient machine learning model for classification tasks.

What is a hyperplane? As a simple example, for a classification task with only two features, you can think of a hyperplane as a line that linearly separates and classifies a set of data. Intuitively, the further from the hyperplane our data points lie, the more confident we are that they have been correctly classified. We therefore want our data points to be as far away from the hyperplane as possible, while still being on the correct side of it. So when new testing data are added, whichever side of the hyperplane they land on decides the class we assign to them.
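That intuition about confidence can be read off the signed distance to the hyperplane; a minimal sketch, again assuming scikit-learn:

```python
# The signed output of decision_function is proportional to the distance from
# the hyperplane: its sign gives the predicted side, its magnitude the confidence.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

scores = clf.decision_function(X[:5])            # w.x + b for the first five points
distances = scores / np.linalg.norm(clf.coef_)   # geometric distance to the hyperplane
print(np.column_stack([scores, distances]))
```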

How do we find the right hyperplane? How do we best segregate the two classes within the data? The distance between the hyperplane and the nearest data point from either set is known as the margin. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly. There will never be any data point inside the margin.
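Formally (standard SVM background, not spelled out on this slide), for training points $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, the maximum-margin (hard-margin) hyperplane $w \cdot x + b = 0$ is found by solving

$$\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^2 \quad \text{subject to}\quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{for all } i,$$

and the resulting margin width is $2/\lVert w\rVert$, which is why minimizing $\lVert w\rVert$ maximizes the margin.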

But what happens when there is no clear hyperplane? Data are rarely ever as clean as our simple example above. A dataset will often look more like the jumbled balls below, which represent a linearly non-separable dataset. In order to classify a dataset like the one presented here, it's necessary to move from a 2D view of the data to a 3D view. Explaining this is easiest with another simplified example. Imagine that our two sets of colored balls are sitting on a sheet and this sheet is lifted suddenly, launching the balls into the air. While the balls are up in the air, you use the sheet to separate them. This "lifting" of the balls represents the mapping of the data into a higher dimension. This is known as kernelling.

Because we are now in three dimensions, our hyperplane can no longer be a line. It must now be a plane, as shown in the example above. The idea is that the data will continue to be mapped into higher and higher dimensions until a hyperplane can be formed to segregate it.

How does it work? How can we identify the right hyperplane? You need to remember a rule of thumb for identifying the right hyperplane: "Select the hyperplane which segregates the two classes better."

Identify the right hyperplane (Scenario-1): Here, we have three hyperplanes (A, B and C). Now, identify the right hyperplane to classify the stars and circles. Hyperplane B performs this job well.

Identify the right hyperplane (Scenario-2): Here, we have three hyperplanes (A, B and C), and all of them segregate the classes well. Now, how can we identify the right hyperplane? Here, maximizing the distance between the nearest data point (from either class) and the hyperplane helps us decide on the right hyperplane.

Scenario-2: This distance is called the margin. Looking at the snapshot below, we can see that the margin for hyperplane C is higher compared to both A and B. Hence, we pick C as the right hyperplane. Another compelling reason for selecting the hyperplane with the higher margin is robustness: if we select a hyperplane with a low margin, there is a high chance of misclassification.

Identify the right hyperplane (Scenario-3): Some of you may have selected hyperplane B, as it has a higher margin compared to A. But here is the catch: SVM selects the hyperplane which classifies the classes accurately before maximizing the margin. Here, hyperplane B has a classification error while A has classified everything correctly. Therefore, the right hyperplane is A.

Can we classify two classes (Scenario-4)? We are unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier. The star at the other end is like an outlier for the star class. SVM has a feature to ignore outliers and find the hyperplane that has the maximum margin. Hence, we can say that SVM is robust to outliers.
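The mechanism behind this tolerance is the soft-margin formulation; in scikit-learn (an assumed library choice) it is exposed as the regularization parameter C, which the slide does not name explicitly. A small C lets the SVM ignore an outlier in favour of a wide margin, while a very large C forces it to chase every point:

```python
# Sketch of soft-margin behaviour (assumed scikit-learn): a low C lets the SVM
# ignore an outlying point, a very high C forces it to fit that outlier.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [2, 1], [8, 8], [9, 10], [10, 9], [3, 8]])  # last point is an outlier of class 1
y = np.array([0, 0, 0, 1, 1, 1, 1])

soft = SVC(kernel="linear", C=0.1).fit(X, y)     # tolerant: wide margin, outlier may be misclassified
hard = SVC(kernel="linear", C=1000.0).fit(X, y)  # strict: narrow margin, tries to classify every point

print("support vectors, C=0.1:   ", len(soft.support_vectors_))
print("support vectors, C=1000.0:", len(hard.support_vectors_))
```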

Find the hyperplane to segregate two classes (Scenario-5): In the scenario below, we can't have a linear hyperplane between the two classes, so how does SVM classify them? Until now, we have only looked at the linear hyperplane. SVM solves this problem by introducing an additional feature. Here, we will add a new feature z = x² + y².

Scenario-5: Now, let's plot the data points on the x and z axes. In this plot, the points to consider are: all values of z will always be positive, because z is the squared sum of both x and y; and in the original plot, the red circles appear close to the origin of the x and y axes, leading to lower values of z, while the stars lie relatively far from the origin, resulting in higher values of z.
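A tiny numeric sketch of that transformation (the points are made up, not the slide's figure): after computing z = x² + y², the circles near the origin and the stars far from it become separable by a simple threshold on z.

```python
# Hypothetical points illustrating the z = x^2 + y^2 trick from Scenario-5:
# circles sit near the origin (small z), stars sit further out (large z).
import numpy as np

circles = np.array([[0.5, 0.2], [-0.4, 0.6], [0.1, -0.7]])  # near the origin
stars   = np.array([[2.5, 1.8], [-2.2, 2.4], [1.9, -2.6]])  # far from the origin

z_circles = (circles ** 2).sum(axis=1)  # z = x^2 + y^2, always non-negative
z_stars   = (stars ** 2).sum(axis=1)

print("z for circles:", z_circles)      # small values
print("z for stars:  ", z_stars)        # clearly larger: a threshold on z separates them
```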

Applications of the support vector machine algorithm. Disease diagnosis and prediction: SVMs are used for disease diagnosis and prediction based on medical data such as patient demographics, clinical measurements, and diagnostic tests. They can classify patients into different disease categories (e.g., cancer vs. non-cancer) or predict the likelihood of developing a specific disease (e.g., diabetes, cardiovascular disease) based on risk factors. Medical image analysis: SVMs are applied in medical image analysis tasks such as tumor detection, segmentation, and classification from MRI, CT, and PET scans. They can learn to distinguish between different tissues, identify abnormalities, and assist radiologists in early disease detection and treatment planning.

Difference between SVMA and Logistic Regression. SVM: SVM aims to find the hyperplane that maximizes the margin between different classes in the feature space; the decision boundary is determined by the support vectors, the data points closest to the hyperplane. SVM maximizes the margin between classes while minimizing the classification error. SVM outputs class labels directly, based on which side of the decision boundary a new data point falls on; it does not provide probabilistic outputs. Logistic Regression (LR): LR models the probability that a given input belongs to a particular class using a logistic (sigmoid) function. The decision boundary is a linear function of the input features, and it separates the classes based on the predicted probabilities. LR aims to maximize the likelihood function (or minimize the logistic loss) of the observed data under the assumed logistic model. LR outputs probabilities of class membership for each class; these can be converted into class labels using a threshold (e.g., 0.5), where data points with probabilities above the threshold are assigned to one class and those below to the other.
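The difference in outputs is easy to see side by side; a hedged sketch assuming scikit-learn, where SVC returns hard labels while LogisticRegression also returns class probabilities:

```python
# Contrast of outputs (assumed scikit-learn): SVC gives hard labels by default,
# LogisticRegression gives class-membership probabilities.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)

svm = SVC(kernel="linear").fit(X, y)
lr = LogisticRegression().fit(X, y)

print("SVM labels:       ", svm.predict(X[:3]))      # labels only, no probabilities
print("LR probabilities: ", lr.predict_proba(X[:3])) # one probability per class per point
print("LR labels (>=0.5):", lr.predict(X[:3]))
```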

Summary: The Support Vector Machine Algorithm (SVMA) is a powerful supervised machine learning technique used for both classification and regression tasks. It works by finding the best boundary (hyperplane) that separates the different classes in the data. The main goal of SVMA is to maximize the margin between classes, ensuring that the closest data points from each class (called support vectors) are as far apart as possible. In cases where the data is not linearly separable, SVMA employs a technique called the kernel trick, which transforms the data into a higher-dimensional space where it can be separated by a hyperplane. Common kernel functions include the linear, polynomial, and radial basis function (RBF) kernels. Key features of SVMA: maximizes the margin, ensuring the widest possible separation between classes; uses support vectors, relying on a few critical data points to define the decision boundary; applies the kernel trick, transforming data to higher dimensions for better separability; and is versatile, proving effective in tasks like classification, regression, and anomaly detection.

References
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Bottou, L., & Lin, C.-J. (2007). Support Vector Machines: Theory and Applications. Springer.
Burges, C. J. C. (1998). A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2), 121-167.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
Cristianini, N., & Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. European Conference on Machine Learning. Springer.
Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press.
Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14(3), 199-222.
Vapnik, V. N. (1998). Statistical Learning Theory. Wiley.

Thank You