Support Vector Machine
SOFT COMPUTING TECHNIQUES AND APPLICATIONS
Soumitra Ghorai, Enrollment No: MT20EE007
National Institute of Technology, Mizoram
Date: 17/03/2021
TOPICS
- Introduction
- Separators
- Support Vectors
- Support Vector Machine
- Linear SVM
- Nonlinear SVM
- Kernel Method
- KNN Method
- Conclusion
- References
- Acknowledgement
Weight Value and Bias Value (Artificial Neural Network)
In an artificial neural network the weight W and the bias B are the values that change during training; the trained network then acts as a separator between Class "A" and Class "B".
Fig. – 2A: Artificial Neural Network; Fig. – 2B: Use of a Separator
Separator: Linear Separator
Fig. – 3: Linear Separator (the best hyperplane separating the two classes in the x–y plane)
Separator: Nonlinear Separator
Fig. – 4: Nonlinear Separator (the best hyperplane is a curved boundary in the x–y plane)
Support Vectors
Fig. – 5: Support Vectors (the data points lying closest to the best hyperplane)
Support Vector Machine
Fig. – 6: Margin Width (the width of the band between the support vectors on either side of the hyperplane)
Support Vector Machine
The separating hyperplane is wᵀx + b = 0; points with wᵀx + b > 0 lie on one side and points with wᵀx + b < 0 on the other. The hyperplane lies at distance b/||w|| from the origin and the margin is ρ.
Fig. – 7: Support Vector Machine
Support Vector Machine
Decision rule: a point x is assigned to one class if wᵀx + b > 0 and to the other if wᵀx + b < 0; ρ is the margin between the two classes in the x₁–x₂ plane.
Fig. – 7: Support Vector Machine (decision regions)
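A minimal sketch in Python (not part of the original slides) of this decision rule: a new point is classified by the sign of wᵀx + b. The values of w and b below are hypothetical, standing in for a trained linear SVM.

```python
# Decision rule of a linear SVM: sign(w^T x + b).
import numpy as np

w = np.array([2.0, -1.0])   # hypothetical weight vector from a trained SVM
b = -0.5                    # hypothetical bias

def classify(x):
    """Return +1 if w^T x + b > 0, else -1."""
    return 1 if np.dot(w, x) + b > 0 else -1

print(classify(np.array([1.0, 0.0])))   # 2*1 - 1*0 - 0.5 =  1.5 > 0 -> +1
print(classify(np.array([0.0, 1.0])))   # 2*0 - 1*1 - 0.5 = -1.5 < 0 -> -1
```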
Linear SVM Mathematically
Margin width: ρ = 2/||w||
For every training example (xᵢ, yᵢ):
  wᵀxᵢ + b ≤ −ρ/2  if yᵢ = −1
  wᵀxᵢ + b ≥ +ρ/2  if yᵢ = +1
After rescaling so that the margin hyperplanes pass through the support vectors x₁ and x₂:
  w·x₂ + b = +1
  w·x₁ + b = −1
Subtracting: (w·x₂ + b) − (w·x₁ + b) = 1 − (−1), so w·(x₂ − x₁) = 2.
Projecting onto the unit normal: (w/||w||)·(x₂ − x₁) = 2/||w|| = ρ.
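As a hedged illustration of the margin formula ρ = 2/||w||, the sketch below fits a linear SVM with scikit-learn on a hypothetical toy dataset (the data, and the choice of a very large C to approximate a hard margin, are assumptions made only for this example) and reports the resulting margin width and support vectors.

```python
# Fit a linear SVM on separable toy data and compute rho = 2 / ||w||.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],    # class -1
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]
print("margin width rho =", 2.0 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```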
Not Linearly Separable
Hard margin: so far we required that all data points be classified correctly.
Soft margin: slack variables ξᵢ are added to the margin constraints to allow misclassification of difficult or noisy examples.
Fig. – 8: Not Linearly Separable (slack variables ξᵢ, ξⱼ measure how far the offending points lie on the wrong side of the margin)
Mathematical Representation
Hard margin:
  Σₖ wₖxᵢₖ + b ≥ +1  when yᵢ = +1
  Σₖ wₖxᵢₖ + b ≤ −1  when yᵢ = −1
i.e. yᵢ(w·xᵢ + b) ≥ 1 for all 1 ≤ i ≤ n, and the optimization problem is
  minimize over (w, b):  ½||w||²
  subject to yᵢ(w·xᵢ + b) ≥ 1 for all i = 1, …, n.
Soft margin:
  Σₖ wₖxᵢₖ + b ≥ +1 − ξᵢ  when yᵢ = +1
  Σₖ wₖxᵢₖ + b ≤ −1 + ξᵢ  when yᵢ = −1
i.e. yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ with ξᵢ ≥ 0 for all 1 ≤ i ≤ n, and the optimization problem changes to
  minimize over (w, b):  ½||w||² + C Σᵢ ξᵢ
  subject to yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0, for all i = 1, …, n.
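The soft-margin problem can equivalently be written as the unconstrained minimization of ½||w||² + C Σᵢ max(0, 1 − yᵢ(w·xᵢ + b)). The following is a rough sketch, not the method used in the slides, that minimizes this hinge-loss form by plain subgradient descent on a hypothetical toy dataset; the learning rate and number of epochs are arbitrary choices for illustration.

```python
# Soft-margin SVM via subgradient descent on the hinge-loss objective.
import numpy as np

def train_soft_margin(X, y, C=1.0, lr=0.01, epochs=2000):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                     # points that violate the margin
        grad_w = w - C * (y[mask, None] * X[mask]).sum(axis=0)
        grad_b = -C * y[mask].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[1.0, 1.0], [2.0, 1.5], [5.0, 5.0], [6.0, 5.5]])
y = np.array([-1, -1, 1, 1])
w, b = train_soft_margin(X, y)
print("learned w, b:", w, b)
```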
Regularization Parameter
The parameter C is the regularization parameter:
- Small C makes the constraints easy to ignore, which leads to a large margin.
- Large C makes the constraints hard to ignore, which leads to a small margin.
- C = ∞ enforces all the constraints, which gives a hard margin.
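A minimal sketch of this effect, assuming scikit-learn is available: the same noisy two-class data are fit with small, medium, and large C, and the margin width and number of support vectors are compared. The dataset and the specific C values are assumptions chosen only for illustration.

```python
# Effect of the regularization parameter C on margin width.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C:>6}: margin={2 / np.linalg.norm(w):.3f}, "
          f"support vectors={len(clf.support_vectors_)}")
```

Smaller C typically reports a wider margin and more support vectors; larger C narrows the margin as it tries harder to satisfy every constraint.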
Common Kernel Methods
- Homogeneous polynomial: k(xᵢ, xⱼ) = (xᵢ·xⱼ)ᵈ
- Inhomogeneous polynomial: k(xᵢ, xⱼ) = (xᵢ·xⱼ + 1)ᵈ
- Gaussian radial basis function: k(xᵢ, xⱼ) = exp(−||xᵢ − xⱼ||² / (2σ²))
- Hyperbolic tangent: k(xᵢ, xⱼ) = tanh(κ xᵢ·xⱼ + c)
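The listed kernels can be written directly as small functions. A minimal sketch in Python/NumPy follows; the hyperparameters d, sigma, kappa and c are hypothetical defaults chosen only for the example.

```python
# The four common kernels from the slide, as plain NumPy functions.
import numpy as np

def poly_homogeneous(xi, xj, d=2):
    return np.dot(xi, xj) ** d

def poly_inhomogeneous(xi, xj, d=2):
    return (np.dot(xi, xj) + 1) ** d

def gaussian_rbf(xi, xj, sigma=1.0):
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))

def hyperbolic_tangent(xi, xj, kappa=1.0, c=-1.0):
    return np.tanh(kappa * np.dot(xi, xj) + c)

xi, xj = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(poly_inhomogeneous(xi, xj), gaussian_rbf(xi, xj))
```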
Kernel Method
A feature map Φ : x → φ(x) transforms the data so that classes which are not separable in the original space become separable after the mapping.
Fig. – 9A: Different Classes Before Applying Kernel Method; Fig. – 9B: Different Classes After Applying Kernel Method
Mapping using Kernel Method
The map Φ : x → φ(x) takes the points (x_a, x_b) from the low-dimensional input space into a higher-dimensional feature space.
Fig. – 10A: Low-D Space; Fig. – 10B: High-D Space
Kernel Function Mathematically
If ϕ is the feature map that takes xᵢ to ϕ(xᵢ), the constraints become
  yᵢ(w·ϕ(xᵢ) + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0, for all 1 ≤ i ≤ n,
and the optimization problem is
  minimize over (w, b):  ½||w||² + C Σᵢ ξᵢ
  subject to yᵢ(w·ϕ(xᵢ) + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0, for all 1 ≤ i ≤ n.
We will not go into the solution of these optimization problems; the most common approach used to solve them is convex optimization.
Example of Kernel Method
Assume we measure x₁, x₂ and we use the mapping:
  Φ : <x₁, x₂> → {x₁², x₂², √2·x₁x₂, √2·x₁, √2·x₂, 1}
Consider the function K(x, z) = (x·z + 1)². Then,
  ϕ(x)ᵀϕ(z) = x₁²z₁² + x₂²z₂² + 2x₁x₂z₁z₂ + 2x₁z₁ + 2x₂z₂ + 1
            = (x₁z₁ + x₂z₂ + 1)²
            = (x·z + 1)²
            = K(x, z)
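A small numeric check of the identity above: the sketch below (an illustration with arbitrary test vectors) computes ϕ(x)ᵀϕ(z) via the explicit mapping and compares it with K(x, z) = (x·z + 1)² computed directly.

```python
# Verify that the explicit feature map and the kernel agree.
import numpy as np

def phi(v):
    x1, x2 = v
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

def K(x, z):
    return (np.dot(x, z) + 1) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, -0.5])
print(np.dot(phi(x), phi(z)))   # inner product in the mapped space -> 9.0
print(K(x, z))                  # kernel computed directly         -> 9.0
```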
K Nearest Neighbors Algorithm
Example: k = 6. To classify a new point, take its 6 nearest training points and assign it the majority class among them.
Fig. – 11: K Nearest Neighbors Algorithm (two classes; the query point receives the class that dominates its 6 nearest neighbours)
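A minimal sketch of the k-nearest-neighbours rule with k = 6, as in the figure; the training points, their class labels "A"/"B", and the query point are hypothetical.

```python
# k-nearest-neighbours classification by majority vote.
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=6):
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                      # indices of k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6], [7, 7]])
y_train = np.array(["A", "A", "A", "B", "B", "B", "B"])
print(knn_classify(X_train, y_train, np.array([6.5, 6.0])))   # -> "B"
```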
Characteristics of SVM
- High-dimensional input space;
- Few irrelevant features (dense concept);
- Sparse document vectors (sparse instances);
- Text categorization problems are linearly separable;
- It can separate both linear and nonlinear spaces;
- It is less likely to result in overfitting;
- Its complexity in the number of dimensions (k) is linear;
- It can work even with smaller datasets;
- It is not very suitable for nonlinear separation of very large datasets in their original format: its complexity in the number of records (n) in the dataset is of order n³.
Applications
SVM has been used successfully in many real-world problems:
1. Face detection;
2. Text and hypertext categorization;
3. Classification of images;
4. Bioinformatics (protein classification, cancer classification);
5. Handwriting recognition;
6. Protein fold and remote homology detection;
7. Generalized predictive control (GPC).
Conclusion
SVM can be used for both classification and regression problems. It uses the kernel trick to transform the data and, based on these transformations, finds an optimal boundary between the possible outputs. It also works effectively in high-dimensional spaces when there is a clear margin of separation.
References
- Support Vector Machines, lecture by Dr. Geoffrey Hinton.
- Support Vector Machines, Andrew Moore (CMU), Andrew Zisserman (Oxford), Mingyue Tan (UBC).
- Support Vector Machines, Aliferis & Tsamardinos, Vanderbilt University. http://discover1.mc.vanderbilt.edu/discover/public/ml_tutorial_old/index.html
- Rong Jin, Language Technology Institute. www.contrib.andrew.cmu.edu/~jin/ir_proj/svm.ppt
- Support Vector Machines, lecture from the University of Texas at Austin.
- https://www.hackerearth.com/blog/
- https://intellipaat.com/
- http://www.mlfactor.com/preface.html
Acknowledgement
I would like to thank our respected faculty, Prof. (Dr.) Saibal Chatterjee Sir, for giving me this platform to share my knowledge. Last but not least, I am very thankful to all my classmates for their help throughout the process. I feel very fortunate to have had this opportunity, which helped me gain knowledge.