Machine Learning Techniques - Linear Model.pptx


About This Presentation

This presentation covers the Multi-Layer Perceptron network and the SVM model.


Slide Content

Machine Learning Techniques. Dr. M. Lilly Florence, Professor, Adhiyamaan College of Engineering (Autonomous), Hosur, Tamil Nadu

Linear Models
• Multi-layer Perceptron
• Going Forwards
• Going Backwards: Back-Propagation of Error
• Multi-layer Perceptron in Practice
• Examples of Using the MLP
• Overview – Deriving Back-Propagation
• Radial Basis Functions and Splines – Concepts, RBF Network
• Curse of Dimensionality
• Interpolation and Basis Functions
• Support Vector Machines
https://www.kaggle.com/scratchpad/notebook4d655d3c18/edit

Multi-layer Perceptron The Multi-layer Perceptron (MLP) is still one of the most commonly used machine learning methods, and one of the most common neural networks in use. It is often treated as a 'black box', in that people use it without understanding how it works.

Multi-layer Perceptron – Going Forwards Training the MLP consists of two parts: working out what the outputs are for the given inputs and the current weights, and then updating the weights according to the error, which is a function of the difference between the outputs and the targets. These two stages are generally known as going forwards and going backwards through the network. Each layer also includes a bias node.

Multi-layer Perceptron – Going Backwards: Back-Propagation of Error The method that we are going to look at is called back-propagation of error, which makes it clear that the errors are sent backwards through the network. It is a form of gradient descent. What we did was to choose an error function for each neuron k, E_k = y_k − t_k, and try to make it as small as possible; summing the squares of these per-neuron errors gives the sum-of-squares error function.

Multi-layer Perceptron – Going Backwards The most commonly used form of the error function is the sum-of-squares error, shown below. We will compute the updates first for the nodes connected to the output layer, and after we have updated those, we will work backwards through the network until we get back to the inputs again. There are just two problems: • for the output neurons, we don't know the inputs; • for the hidden neurons, we don't know the targets; for extra hidden layers, we know neither the inputs nor the targets.
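
The formula itself appears only in the slide image; the standard statement of the sum-of-squares error over the N output nodes, consistent with the per-neuron error E_k = y_k − t_k above, is:

```latex
E(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{N} (y_k - t_k)^2
```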

MLP Algorithm We will assume that there are L input nodes, plus the bias, M hidden nodes, also plus a bias, and N output nodes, so that there are (L+1)×M weights between the input and the hidden layer and (M+1)×N between the hidden layer and the output. The sums that we write will start from 0 if they include the bias nodes and 1 otherwise, and run up to L, M, or N, so that x_0 = −1 is the bias input and a_0 = −1 is the bias hidden node.
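
As a concrete illustration of this layout, here is a minimal NumPy sketch of the forward pass, assuming sigmoid activations and the −1 bias convention above; the array shapes mirror the (L+1)×M and (M+1)×N weight matrices. The function and variable names are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def forward(x, V, W):
    """Forward pass of a one-hidden-layer MLP.

    x : (L,) input vector
    V : (L+1, M) input-to-hidden weights (row 0 belongs to the bias)
    W : (M+1, N) hidden-to-output weights (row 0 belongs to the bias)
    """
    x = np.concatenate(([-1.0], x))   # x_0 = -1 is the bias input
    a = sigmoid(x @ V)                # (M,) hidden activations
    a = np.concatenate(([-1.0], a))   # a_0 = -1 is the bias hidden node
    y = sigmoid(a @ W)                # (N,) network outputs
    return a, y

# Example: L=2 inputs, M=3 hidden nodes, N=1 output
rng = np.random.default_rng(0)
V = rng.normal(scale=0.5, size=(3, 3))   # (L+1) x M
W = rng.normal(scale=0.5, size=(4, 1))   # (M+1) x N
a, y = forward(np.array([0.2, 0.7]), V, W)
print(y)
```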

MLP Algorithm

MLP in Practice
• Amount of training data
• Number of hidden layers
• When to stop: a fixed number of iterations, or when the (sum-of-squares) error reaches a minimum

Examples of Using the MLP
1. Regression Problem
2. Classification Problem
3. Classification Example – Iris Dataset (https://www.kaggle.com/scratchpad/notebook4d655d3c18/edit)
4. Time Series Prediction
5. Data Compression – Auto-Associative Network

Deriving Backpropagation We need three pieces of mathematics. One is the derivative (with respect to x) of ½x², which is x; another is the chain rule, which says that dy/dx = (dy/dt)(dt/dx). The third is very simple: dy/dx = 0 if y is not a function of x. 1. The Network Output and the Error The output of the neural network (the end of the forward phase of the algorithm) is a function of three things: • the current input (x) • the activation function g(·) of the nodes of the network • the weights of the network (v for the first layer and w for the second)

Deriving Backpropagation 2. Error of the Network

Deriving Backpropagation 3. Requirements of the Activation Function

Deriving Backpropagation 4. Back-Propagation of Error

Deriving Backpropagation 5. Output Activation Function https://datamapu.com/posts/deep_learning/backpropagation/
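
The derivation itself appears only in the slide images. As a hedged reconstruction of where it ends up, assuming the sum-of-squares error above and sigmoid activations g(h) with g′(h) = g(h)(1 − g(h)), the standard update equations are as follows (η is the learning rate; x, a, y, v, w as defined in the MLP algorithm above):

```latex
\begin{aligned}
\delta_{o,k} &= (y_k - t_k)\,y_k(1 - y_k) && \text{(output-layer errors)}\\
\delta_{h,j} &= a_j(1 - a_j)\sum_{k=1}^{N} w_{jk}\,\delta_{o,k} && \text{(hidden-layer errors)}\\
w_{jk} &\leftarrow w_{jk} - \eta\,\delta_{o,k}\,a_j && \text{(hidden-to-output update)}\\
v_{ij} &\leftarrow v_{ij} - \eta\,\delta_{h,j}\,x_i && \text{(input-to-hidden update)}
\end{aligned}
```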

Radial Basis Function A radial basis function (RBF) is defined as a mathematical function that takes a real-valued input and produces a real-valued output determined by the distance between the input and a fixed point in the space. This fixed point, called the center, is positioned at a chosen location in the space and need not coincide with any data point. The most commonly used radial basis function in machine learning is the Gaussian RBF. One primary reason for the need for RBFs is their ability to efficiently capture complex, non-linear relationships within data.

Radial Basis Function The RBF calculates the Euclidean distance between the input vector and the center, squares it, divides it by 2σ², and applies an exponential function. The result is a weighted output that reflects the proximity of the input to the center.
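
The computation just described can be written directly. A minimal sketch of the Gaussian RBF (the function name is illustrative):

```python
import numpy as np

def gaussian_rbf(x, center, sigma):
    """Gaussian radial basis function.

    Computes exp(-||x - center||^2 / (2 * sigma^2)): the squared
    Euclidean distance between input and center, divided by 2*sigma^2,
    then passed through an exponential.
    """
    dist_sq = np.sum((np.asarray(x) - np.asarray(center)) ** 2)
    return np.exp(-dist_sq / (2.0 * sigma ** 2))

# Output approaches 1 as x nears the center, and decays toward 0 far away
print(gaussian_rbf([1.0, 2.0], [0.0, 0.0], sigma=1.5))
```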

Types of Radial Basis Function

Radial Basis Network

Radial Basis Network - Training RBF
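
The training procedure itself appears only in the slide image. A common two-stage scheme, sketched here under the assumption of fixed Gaussian centers chosen from the training data and output weights fitted by linear least squares (all names illustrative):

```python
import numpy as np

def train_rbf(X, T, centers, sigma):
    """Fit the linear output weights of an RBF network by least squares.

    X: (n_samples, n_features) inputs, T: (n_samples,) or (n_samples, n_outputs)
    targets, centers: (n_centers, n_features) fixed hidden-unit centers.
    """
    # Hidden-layer design matrix: one Gaussian activation per center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-d2 / (2.0 * sigma ** 2))
    G = np.hstack([G, np.ones((len(X), 1))])      # bias column
    W, *_ = np.linalg.lstsq(G, T, rcond=None)     # linear output layer
    return W

def predict_rbf(X, centers, sigma, W):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    G = np.hstack([np.exp(-d2 / (2.0 * sigma ** 2)), np.ones((len(X), 1))])
    return G @ W
```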

References
https://towardsdatascience.com/radial-basis-functions-neural-networks-all-we-need-to-know-9a88cc053448
https://www.hackerearth.com/blog/developers/radial-basis-function-network/
https://hackernoon.com/radial-basis-functions-types-advantages-and-use-cases

RBF and Spline In order to overcome the disadvantages of polynomial regression, we can use an improved regression technique which, instead of building one model for the entire dataset, divides the dataset into multiple bins and fits each bin with a separate model. Such a technique is known as a regression spline. Regression splines are one of the most important nonlinear regression techniques. In polynomial regression, we generated new features by applying various polynomial functions to the existing features, which imposed a global structure on the dataset. To overcome this, we can divide the distribution of the data into separate portions and fit linear or low-degree polynomial functions on each of these portions.

RBF and Spline

RBF and Spline The points where the division occurs are called knots. The functions we use for modelling each piece/bin are known as piecewise functions. There are various piecewise functions that we can use to fit these individual bins; a brief sketch follows below. https://www.analyticsvidhya.com/blog/2018/03/introduction-regression-splines-python-codes/
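
As one possible illustration of fitting piecewise cubics to noisy data, here is a sketch using SciPy's UnivariateSpline (a smoothing spline; the smoothing factor s, the toy data, and all parameter values are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)  # noisy samples of sin(x)

spline = UnivariateSpline(x, y, k=3, s=2.0)  # cubic pieces, smoothing factor s
print("knots:", spline.get_knots())          # the points where the pieces join
print(spline(np.array([2.5, 5.0])))          # evaluate the fitted spline
```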

Curse of Dimensionality Curse of Dimensionality refers to a set of problems that arise when working with high-dimensional data. The dimension of a dataset corresponds to the number of attributes/features that exist in the dataset. A dataset with a large number of attributes, generally of the order of a hundred or more, is referred to as high-dimensional data. Some of the difficulties that come with high-dimensional data manifest while analyzing or visualizing the data to identify patterns, and some manifest while training machine learning models. The difficulties related to training machine learning models on high-dimensional data are what the 'curse of dimensionality' refers to. Two popular aspects of the curse of dimensionality are 'data sparsity' and 'distance concentration'.

Curse of Dimensionality The training samples do not capture all combinations, is referred to as ‘ Data   sparsity ’ or simply ‘ sparsity’  in high dimensional data. Data sparsity is one of the facets of the curse of dimensionality. Training a model with sparse data could lead to high-variance or overfitting condition. This is because while training the model, the model has learnt from the frequently occurring combinations of the attributes and can predict the outcome accurately . In real-time when less frequently occurring combinations are fed to the model, it may not predict the outcome accurately. 

Curse of Dimensionality 'Distance concentration' refers to the problem that all the pairwise distances between different samples/points in the space converge to the same value as the dimensionality of the data increases. Several machine learning models, such as clustering or nearest-neighbour methods, use distance-based metrics to measure the similarity or proximity of samples.
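
A quick way to see distance concentration empirically (a small sketch, not from the slides): sample random points and compare the nearest and farthest pairwise distances as the dimension grows.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((200, d))      # 200 uniform random points in d dimensions
    dists = pdist(X)              # all pairwise Euclidean distances
    # Relative contrast shrinks toward 0 as d grows: distances "concentrate"
    print(d, (dists.max() - dists.min()) / dists.min())
```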

Interpolation and Basis Function 1. Bases and Basis Functions

Interpolation and Basis Function 2. Cubic Spline 3. Fitting the Spline to the Data

Interpolation and Basis Function 4. Smoothing Spline

Interpolation and Basis Function 5. Higher Dimensions One common approach is to take a set of independent basis functions in each coordinate (x, y, and z in 3D) and then combine them in all possible combinations, φ_i(x)φ_j(y)φ_k(z). This is known as the tensor product basis. Because the number of combinations grows exponentially with the number of coordinates, this construction runs into the curse of dimensionality.
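
A small sketch of the tensor product construction in 2D, assuming simple monomial bases in each coordinate (the function name and basis choice are illustrative):

```python
import numpy as np

def tensor_product_basis(x, y, degree=2):
    """All products phi_i(x) * phi_j(y) of 1-D monomial bases.

    With (degree+1) basis functions per coordinate, the tensor product
    has (degree+1)**2 functions in 2D; the count grows exponentially
    with the number of coordinates (the curse of dimensionality).
    """
    phi_x = np.array([x ** i for i in range(degree + 1)])  # 1, x, x^2, ...
    phi_y = np.array([y ** j for j in range(degree + 1)])
    return np.outer(phi_x, phi_y).ravel()                  # all combinations

print(tensor_product_basis(2.0, 3.0))  # 9 combined basis values for degree 2
```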

Support Vector Machine Optimal Separation

Support Vector Machine Margin and support vector

Support Vector Machine A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are more commonly used in classification problems, and that is what we will focus on here. SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes.

Support Vector Machine What is a hyperplane? As a simple example, for a classification task with only two features, you can think of a hyperplane as a line that linearly separates and classifies a set of data. The distance between the hyperplane and the nearest data point from either class is known as the margin. The goal is to choose the hyperplane with the greatest possible margin between it and any point within the training set, giving a greater chance of new data being classified correctly. Mapping data into a higher-dimensional space is referred to as kernelling.

Support Vector Machine Kernels SVM algorithms use a set of mathematical functions defined as the kernel. The function of the kernel is to take data as input and transform it into the required form. Different SVM algorithms use different types of kernel functions, for example linear, nonlinear, polynomial, radial basis function (RBF), and sigmoid. The most used type of kernel function is the RBF, because it has a localized and finite response along the entire x-axis. A kernel function returns the inner product between two points in a suitable feature space, thereby defining a notion of similarity with little computational cost, even in very high-dimensional spaces.
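
A minimal scikit-learn sketch comparing kernel choices on a toy nonlinear dataset (the dataset and parameters are illustrative assumptions, not from the slides):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    # The RBF kernel usually scores highest on this nonlinear data
    print(kernel, round(clf.score(X_te, y_te), 3))
```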

Exercise
Create an RBF network that solves the XOR function (a starting sketch follows below).
Back-propagation network design:
https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
https://neptune.ai/blog/backpropagation-algorithm-in-neural-networks-guide
SVM:
https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
https://www.youtube.com/watch?v=ivPoCcYfFAw
https://axon.cs.byu.edu/Dan/678/miscellaneous/SVM.example.pdf
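
One possible starting point for the exercise, as a hedged sketch: an RBF network with Gaussian hidden units centered on the four XOR inputs and a least-squares linear output layer (the center and width choices are illustrative, not prescribed by the slides):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 0.0])         # XOR targets

centers, sigma = X.copy(), 0.5             # one Gaussian per training point
d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
G = np.hstack([np.exp(-d2 / (2 * sigma**2)), np.ones((4, 1))])  # + bias column

w, *_ = np.linalg.lstsq(G, t, rcond=None)  # linear output weights
print(np.round(G @ w, 3))                  # should reproduce [0, 1, 1, 0]
```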