This presentation covers the Multi-layer Perceptron network and the Support Vector Machine (SVM) model.
Slide Content
Machine Learning Techniques. Dr. M. Lilly Florence, Professor, Adhiyamaan College of Engineering (Autonomous), Hosur, Tamil Nadu
Linear Models
Multi-layer Perceptron
Going Forwards
Going Backwards: Back-Propagation of Error
Multi-layer Perceptron in Practice
Examples of Using the MLP
Overview – Deriving Back-Propagation
Radial Basis Functions and Splines – Concepts, RBF Network
Curse of Dimensionality
Interpolation and Basis Functions
Support Vector Machines
https://www.kaggle.com/scratchpad/notebook4d655d3c18/edit
Multi-layer Perceptron The Multi-layer Perceptron (MLP) is still one of the most commonly used machine learning methods and one of the most common neural networks in use. It is often treated as a ‘black box’, in that people use it without understanding how it works.
Multi-layer Perceptron – Going Forward Training the MLP consists of two parts: working out what the outputs are for the given inputs and the current weights, and then updating the weights according to the error, which is a function of the difference between the outputs and the targets. These are generally known as going forwards and backwards through the network. Biases are handled by adding an extra input and an extra hidden node that are fixed at −1.
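To make "going forwards" concrete, here is a minimal NumPy sketch of one forward pass, assuming a single hidden layer, sigmoid activations, and the bias convention x0 = a0 = −1 described on the MLP algorithm slide; the function and variable names are illustrative, not taken from the slides.

```python
import numpy as np

def sigmoid(h):
    # Logistic activation applied elementwise
    return 1.0 / (1.0 + np.exp(-h))

def forward(x, v, w):
    """One forward pass through a one-hidden-layer MLP.
    x : (L,)      input vector
    v : (L+1, M)  first-layer weights (row 0 is the bias weight)
    w : (M+1, N)  second-layer weights (row 0 is the bias weight)
    """
    x_b = np.concatenate(([-1.0], x))            # prepend bias input x0 = -1
    hidden = sigmoid(x_b @ v)                    # hidden activations a_1..a_M
    hidden_b = np.concatenate(([-1.0], hidden))  # prepend bias hidden node a0 = -1
    return sigmoid(hidden_b @ w)                 # network outputs y_1..y_N
```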
Multi-layer Perceptron – Going Backward: Back-Propagation of Error The method that we are going to look at is called back-propagation of error, which makes it clear that the errors are sent backwards through the network. It is a form of gradient descent. Previously we chose an error function for each neuron k: E_k = y_k − t_k, and tried to make it as small as possible; here we use the sum-of-squares error function.
Multi-layer Perceptron – Going Backward The most commonly used form of the error function is the sum-of-squares error, E = ½ Σ_k (y_k − t_k)². We will do this first for the nodes connected to the output layer, and after we have updated those, we will work backwards through the network until we get back to the inputs again. There are just two problems:
• for the output neurons, we don’t know the inputs;
• for the hidden neurons, we don’t know the targets; for extra hidden layers, we know neither the inputs nor the targets.
MLP Algorithm We will assume that there are L input nodes, plus the bias, M hidden nodes, also plus a bias, and N output nodes, so that there are (L+1)×M weights between the input and the hidden layer and (M+1)×N between the hidden layer and the output. The sums that we write will start from 0 if they include the bias nodes and 1 otherwise, and run up to L, M, or N, so that x0 = −1 is the bias input and a0 = −1 is the bias hidden node.
MLP Algorithm
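The algorithm slide itself is an image, so the following is a hedged NumPy sketch of one training iteration (forward pass, then back-propagation of the error and gradient-descent weight updates) for sigmoid activations and the sum-of-squares error. The shapes follow the (L+1)×M and (M+1)×N convention above; the learning rate eta and the code itself are illustrative, not the slides' own implementation.

```python
import numpy as np

def train_step(x, t, v, w, eta=0.1):
    """One iteration of MLP training by back-propagation of error.
    v is (L+1, M), w is (M+1, N); bias nodes are fixed at -1."""
    # --- forward pass ---
    x_b = np.concatenate(([-1.0], x))
    a = 1.0 / (1.0 + np.exp(-(x_b @ v)))            # hidden activations
    a_b = np.concatenate(([-1.0], a))
    y = 1.0 / (1.0 + np.exp(-(a_b @ w)))            # network outputs

    # --- backward pass ---
    delta_o = (y - t) * y * (1.0 - y)               # output-layer error terms
    delta_h = a * (1.0 - a) * (w[1:, :] @ delta_o)  # hidden-layer error terms (skip bias row)

    # --- gradient-descent weight updates ---
    w -= eta * np.outer(a_b, delta_o)
    v -= eta * np.outer(x_b, delta_h)
    return v, w, 0.5 * np.sum((y - t) ** 2)         # updated weights and current error
```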
MLP in Practice
Amount of training data
Number of hidden layers
When to stop: a fixed number of iterations, or when the minimum error (sum of squares) is reached
Examples of Using MLP 1. Regression Problem
2. Classification Problem
3. Classification Example – Iris Dataset https://www.kaggle.com/scratchpad/notebook4d655d3c18/edit 4. Time Series Prediction 5. Data Compression – Auto-Associative Network
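For the Iris classification example, a minimal sketch using scikit-learn's MLPClassifier is shown below; the hidden-layer size, iteration count, and train/test split are illustrative choices, not settings taken from the linked notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Load the Iris dataset (4 features, 3 classes)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A small MLP with one hidden layer of 10 neurons
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```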
Deriving Backpropagation Three things from calculus are needed. One is the derivative (with respect to x) of ½x², which is x; another is the chain rule, which says that dy/dx = (dy/dt)(dt/dx). The third thing is very simple: dy/dx = 0 if y is not a function of x. 1. The Network Output and the Error The output of the neural network (the end of the forward phase of the algorithm) is a function of three things:
• the current input (x)
• the activation function g(·) of the nodes of the network
• the weights of the network (v for the first layer and w for the second)
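As a reminder of that notation (a sketch consistent with the slide above, not copied from it), the hidden activations and network outputs can be written as:

```latex
% Forward pass: v = first-layer weights, w = second-layer weights,
% g = activation function, x_0 = a_0 = -1 are the bias terms
\[
  a_j = g\!\left(\sum_{i=0}^{L} v_{ij}\, x_i\right), \qquad
  y_k = g\!\left(\sum_{j=0}^{M} w_{jk}\, a_j\right)
\]
```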
Deriving Backpropagation 2. Error of the Network
Deriving Backpropagation 2. Error of the Network
Deriving Backpropagation 3. Requirements of activation function
Deriving Backpropagation 3. Requirements of activation function
Deriving Backpropagation 4. Back Propagation of Error
Deriving Backpropagation 4. Back Propagation of Error
Deriving Backpropagation 4. Back Propagation of Error
Deriving Backpropagation 4. Back Propagation of Error
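The four "Back Propagation of Error" slides above are images; the standard result of the derivation for sigmoid activations and the sum-of-squares error (a reconstruction, not the slides' own rendering) is:

```latex
% Output-layer and hidden-layer error terms
\[
  \delta_{o(k)} = (y_k - t_k)\, y_k (1 - y_k), \qquad
  \delta_{h(j)} = a_j (1 - a_j) \sum_{k=1}^{N} w_{jk}\, \delta_{o(k)}
\]
% Gradient-descent weight updates with learning rate \eta
\[
  w_{jk} \leftarrow w_{jk} - \eta\, \delta_{o(k)}\, a_j, \qquad
  v_{ij} \leftarrow v_{ij} - \eta\, \delta_{h(j)}\, x_i
\]
```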
5. Output Activation Function https://datamapu.com/posts/deep_learning/backpropagation/
Radial Basis Function A Radial Basis Function (RBF) is defined as a mathematical function that takes a real-valued input and produces a real-valued output determined by the distance between the input value and a fixed point in space. This fixed point, the center, is positioned at a chosen location in the plane. The most commonly used radial basis function in machine learning is the Gaussian RBF. One primary reason RBFs are needed is their ability to efficiently capture complex, non-linear relationships within data.
Radial Basis Function The Gaussian RBF calculates the Euclidean distance between the input vector and the center, squares it, divides it by 2σ², negates it, and applies the exponential function. The result is a weighted output that reflects the proximity of the input to the center.
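A minimal sketch of that computation, assuming a Gaussian RBF with a single center c and width σ (both values below are chosen purely for illustration):

```python
import numpy as np

def gaussian_rbf(x, c, sigma):
    """Gaussian radial basis function: exp(-||x - c||^2 / (2 * sigma^2))."""
    dist_sq = np.sum((np.asarray(x) - np.asarray(c)) ** 2)  # squared Euclidean distance
    return np.exp(-dist_sq / (2.0 * sigma ** 2))

# Output approaches 1 as x nears the center and falls towards 0 far away
print(gaussian_rbf([1.0, 2.0], c=[1.0, 2.5], sigma=1.0))
```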
RBF and Spline In order to overcome the disadvantages of polynomial regression, we can use an improved regression technique which, instead of building one model for the entire dataset, divides the dataset into multiple bins and fits each bin with a separate model. Such a technique is known as a regression spline. Regression splines are one of the most important non-linear regression techniques. In polynomial regression, we generated new features by applying various polynomial functions to the existing features, which imposed a global structure on the dataset. To overcome this, we can divide the distribution of the data into separate portions and fit linear or low-degree polynomial functions on each of these portions.
RBF and Spline
RBF and Spline The points where the division occurs are called Knots . Functions which we can use for modelling each piece/bin are known as Piecewise functions. There are various piecewise functions that we can use to fit these individual bins. https://www.analyticsvidhya.com/blog/2018/03/introduction-regression-splines-python-codes/
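As an illustration of the idea (knots dividing the data into bins, with a low-degree polynomial fitted per bin), here is a hedged sketch using scikit-learn's SplineTransformer inside a linear model; the synthetic data, number of knots, and degree are all illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

# Synthetic non-linear data
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 100)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)

# Cubic regression spline: 5 knots split the range into bins,
# and a piecewise degree-3 polynomial is fitted across them
model = make_pipeline(SplineTransformer(n_knots=5, degree=3), LinearRegression())
model.fit(X, y)
print(model.predict([[2.5]]))   # prediction at a new point
```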
Curse of Dimensionality The curse of dimensionality refers to a set of problems that arise when working with high-dimensional data. The dimension of a dataset corresponds to the number of attributes/features that exist in the dataset. A dataset with a large number of attributes, generally of the order of a hundred or more, is referred to as high-dimensional data. Some of the difficulties that come with high-dimensional data manifest when analysing or visualising the data to identify patterns, and some manifest while training machine learning models. The difficulties related to training machine learning models on high-dimensional data are referred to as the ‘curse of dimensionality’. Two popular aspects of the curse of dimensionality are ‘data sparsity’ and ‘distance concentration’.
Curse of Dimensionality The situation where the training samples do not capture all combinations of attribute values is referred to as ‘data sparsity’, or simply ‘sparsity’, in high-dimensional data. Data sparsity is one of the facets of the curse of dimensionality. Training a model with sparse data can lead to a high-variance or overfitting condition, because the model learns from the frequently occurring combinations of the attributes and can predict those outcomes accurately, but when less frequently occurring combinations are fed to the model at prediction time, it may not predict the outcome accurately.
Curse of Dimensionality ‘Distance concentration’ refers to the problem of all pairwise distances between different samples/points in the space converging to the same value as the dimensionality of the data increases. Several machine learning methods, such as clustering and nearest-neighbour methods, use distance-based metrics to identify the similarity or proximity of samples.
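A small numerical illustration of distance concentration (a sketch with an arbitrary sample size and arbitrary dimensions): as the dimensionality grows, the gap between the smallest and largest pairwise distances shrinks relative to the distances themselves.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    points = rng.uniform(size=(200, d))   # 200 random points in d dimensions
    dists = pdist(points)                 # all pairwise Euclidean distances
    spread = (dists.max() - dists.min()) / dists.mean()
    print(f"d={d:5d}  relative spread of distances = {spread:.3f}")
```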
Interpolation and Basis Function 1. Bases and Basis Functions
Interpolation and Basis Function 2. Cubic Spline 3. Fitting Spline to the Data
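As an example of fitting a cubic spline to data (a sketch with made-up sample points, using SciPy's CubicSpline rather than anything specific from the slides):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Known data points (the knots of the interpolating spline)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8])

spline = CubicSpline(x, y)   # piecewise cubic with continuous first and second derivatives
print(spline(2.5))           # interpolated value between the known points
print(spline(2.5, 1))        # first derivative at the same point
```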
Interpolation and Basis Function 4. Smoothing Spline
Interpolation and Basis Function 5. Higher Dimensions One common thing that is done is to take a set of independent basis functions in each different coordinate (x, y, and z in 3D) and then to combine them in all possible combinations (x_i(x) y_j(y) z_k(z)). This is known as the tensor product basis. (See also: Curse of Dimensionality; Smoothing Spline.)
Support Vector Machine Optimal Separable
Support Vector Machine Margin and support vector
Support Vector Machine A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are more commonly used in classification problems, and this is what we will focus on here. SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes, as shown in the image below.
Support Vector Machine What is a hyperplane? As a simple example, for a classification task with only two features (like the image above), you can think of a hyperplane as a line that linearly separates and classifies a set of data. The distance between the hyperplane and the nearest data point from either set is known as the margin. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly. Mapping the data into a higher-dimensional space is referred to as kernelling.
Support Vector Machine Kernels SVM algorithms use a set of mathematical functions that are defined as the kernel. The function of a kernel is to take data as input and transform it into the required form. Different SVM algorithms use different types of kernel functions: for example linear, nonlinear, polynomial, radial basis function (RBF), and sigmoid. The most used type of kernel function is the RBF, because it has a localised and finite response along the entire x-axis. Kernel functions return the inner product between two points in a suitable feature space, thus defining a notion of similarity with little computational cost, even in very high-dimensional spaces.
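A minimal classification sketch with an RBF-kernel SVM in scikit-learn; the dataset, kernel width gamma, and regularisation parameter C are illustrative choices rather than values from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# The RBF kernel maps the data implicitly into a higher-dimensional feature space
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
print("Support vectors per class:", clf.n_support_)
```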
Exercise Create an RBF network that solves the XOR function.
Back-propagation network design:
https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
https://neptune.ai/blog/backpropagation-algorithm-in-neural-networks-guide
SVM:
https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
https://www.youtube.com/watch?v=ivPoCcYfFAw
https://axon.cs.byu.edu/Dan/678/miscellaneous/SVM.example.pdf