ACTIVATION FUNCTIONS IN SOFT COMPUTING

sssmrockz · Aug 16, 2024

About This Presentation

This presentation covers the activation functions required in soft computing, which are also found in deep learning and machine learning and are used in artificial intelligence (AI).


Slide Content

ACTIVATION FUNCTIONS
An activation function in a neural network defines how the weighted sum of the input is transformed into an output from a node or nodes in a layer of the network. Sometimes the activation function is called a "transfer function." If the output range of the activation function is limited, then it may be called a "squashing function." Activation functions can be either linear or nonlinear, depending on the function they represent, and are used to control the output of neural networks across different domains. Many activation functions are nonlinear and may be referred to as the "nonlinearity" in the layer or the network design.

Technically, the activation function is used within or after the internal processing of each node in the network, although networks are designed to use the same activation function for all nodes in a layer. All hidden layers typically use the same activation function. The output layer will typically use a different activation function from the hidden layers, depending on the type of prediction required by the model.

Activation functions are also typically differentiable, meaning the first-order derivative can be calculated for a given input value. This is required because neural networks are typically trained using the backpropagation of error algorithm, which requires the derivative of the prediction error in order to update the weights of the model. There are many different types of activation functions used in neural networks, although only a small number are used in practice for hidden and output layers.

Why Activation Functions Are Needed
Activation functions convert linear input signals and models into non-linear output signals, which aids the learning of high-order functions in deeper networks. Without them, a stack of layers collapses into a single linear map (see the NumPy sketch below).

Properties of activation functions:
- Non-linearity
- Continuously differentiable
- Bounded range
- Monotonic
- Approximates the identity near the origin

Types of Activation Functions
Activation functions can be broadly classified into 3 categories:
- Binary Step Function
- Linear Activation Function
- Non-Linear Activation Functions

What is a Good Activation Function?
A proper choice of activation function has to be made to improve the results of neural network computing. For optimization purposes, an activation function should be monotonic, differentiable, and quickly converging with respect to the weights.
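To make the need for non-linearity concrete, the following NumPy sketch (with arbitrary, illustrative weights and shapes not taken from the slides) shows that two stacked purely linear layers collapse into a single equivalent linear layer, so depth adds no expressive power without a non-linear activation.

```python
import numpy as np

# Illustrative sketch: two stacked purely linear layers collapse into one
# equivalent linear layer, which is why hidden layers need non-linearity.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                     # 4 samples, 3 features
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

two_linear_layers = (x @ W1 + b1) @ W2 + b2       # "deep" but purely linear
single_layer = x @ (W1 @ W2) + (b1 @ W2 + b2)     # equivalent single layer

print(np.allclose(two_linear_layers, single_layer))  # True: no extra expressive power
```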

Binary Step Function
A binary step function is generally used in the Perceptron linear classifier. It thresholds the input to 1 or 0, depending on whether it is greater than or less than zero. The step function is mainly used in binary classification problems and works well for linearly separable problems, but it cannot classify multi-class problems.

Linear Activation Function
A linear function, also known as a straight-line function, is one where the activation is proportional to the input, i.e. the weighted sum from the neurons. The equation for the linear activation function is f(x) = a·x. When a = 1, f(x) = x, a special case known as the identity function.

Properties:
- The range is -infinity to +infinity.
- It provides a convex error surface, so optimisation can be achieved faster.
- df(x)/dx = a, which is constant, so the function cannot be meaningfully optimised with gradient descent.

Limitations:
- Since the derivative is constant, the gradient has no relation to the input.
- The backpropagated update is constant, regardless of the change Δx in the input.

The linear activation function, also known as "no activation" or the "identity function" (multiplied by 1.0), is where the activation is proportional to the input. The function does nothing to the weighted sum of the input; it simply returns the value it was given. Both functions are sketched in code below.
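As a minimal illustration (the function names and test values are my own, not from the slides), the binary step and linear activations can be written in NumPy as:

```python
import numpy as np

def binary_step(x):
    """Perceptron-style threshold: 1 where x > 0, otherwise 0."""
    return np.where(x > 0, 1.0, 0.0)

def linear(x, a=1.0):
    """Linear activation f(x) = a * x; a = 1 gives the identity function."""
    return a * x

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(binary_step(z))  # [0. 0. 0. 1. 1.]
print(linear(z))       # unchanged: [-2.  -0.5  0.   0.5  2. ]
```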

Activation for Hidden Layers
A hidden layer in a neural network is a layer that receives input from another layer (such as another hidden layer or an input layer) and provides output to another layer (such as another hidden layer or an output layer). A hidden layer does not directly contact input data or produce outputs for a model, at least in general. A neural network may have zero or more hidden layers.

Typically, a differentiable nonlinear activation function is used in the hidden layers of a neural network. This allows the model to learn more complex functions than a network trained using a linear activation function. There are perhaps three activation functions you may want to consider for use in hidden layers:
- Rectified Linear Activation (ReLU)
- Logistic (Sigmoid)
- Hyperbolic Tangent (Tanh)

This is not an exhaustive list of activation functions used for hidden layers, but they are the most commonly used.

Non-Linear Activation Functions
Non-linear functions are the most widely used activation functions. They make it easy for a neural network model to adapt to a variety of data and to differentiate between outcomes. These functions are mainly divided on the basis of their range or curves; the main variants are defined in code after this list.

a) Sigmoid Activation Function
Sigmoid takes a real value as input and outputs a value between 0 and 1. The sigmoid activation function translates input in the range (-∞, ∞) to the range (0, 1).

b) Tanh Activation Function
The tanh function is another function that can be used as a non-linear activation between the layers of a neural network. It shares a few things in common with the sigmoid activation function. Unlike the sigmoid, which maps input values to (0, 1), tanh maps values to (-1, 1). As with the sigmoid, one of the interesting properties of tanh is that its derivative can be expressed in terms of the function itself.

c) ReLU Activation Function
The formula is deceptively simple: max(0, z). Despite its name, Rectified Linear Unit, it is not linear, and it provides the same benefits as the sigmoid but with better performance.

(i) Leaky ReLU
Leaky ReLU is a variant of ReLU. Instead of being 0 when z < 0, a leaky ReLU allows a small, non-zero, constant gradient α (normally α = 0.01). Leaky ReLUs attempt to fix the "dying ReLU" problem, although the consistency of the benefit across tasks is presently unclear.

(ii) Parametric ReLU
PReLU gives the neurons the ability to choose what slope is best in the negative region; it can become ReLU or leaky ReLU for certain values of α.
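A compact NumPy sketch of the functions described above (the helper names and the default α values shown are illustrative assumptions, not from the slides):

```python
import numpy as np

def sigmoid(x):
    """Maps any real input to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Maps any real input to (-1, 1)."""
    return np.tanh(x)

def relu(x):
    """max(0, x): identity for positive inputs, zero otherwise."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Small fixed slope alpha for negative inputs (a "dying ReLU" fix)."""
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    """Parametric ReLU: alpha is a learned slope for the negative region."""
    return np.where(x > 0, x, alpha * x)

z = np.linspace(-3.0, 3.0, 7)
print(sigmoid(z))      # values in (0, 1)
print(tanh(z))         # values in (-1, 1)
print(leaky_relu(z))   # small negative values instead of hard zeros
```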

d) Maxout
The Maxout activation is a generalization of the ReLU and leaky ReLU functions. It is a piecewise linear function that returns the maximum of its inputs and is designed to be used in conjunction with the dropout regularization technique. Both ReLU and leaky ReLU are special cases of Maxout, so the Maxout neuron enjoys all the benefits of a ReLU unit without drawbacks such as dying ReLU. However, it doubles the number of parameters for each neuron, so a higher total number of parameters needs to be trained.

e) ELU
The Exponential Linear Unit (ELU) is a function that tends to converge faster and produce more accurate results. Unlike other activation functions, ELU has an extra constant α, which should be a positive number. ELU is very similar to ReLU except for negative inputs: both are the identity for non-negative inputs, but ELU smoothly saturates toward -α for negative inputs, whereas ReLU has a sharp kink at zero.

f) Softmax Activation Function
The softmax function calculates a probability distribution over 'n' different events. In general, it computes the probability of each target class over all possible target classes, and these probabilities then help determine the target class for the given inputs.
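These remaining functions can be sketched in the same NumPy style; the Maxout helper below assumes the affine pieces are supplied as lists of weight matrices and bias vectors, which is one possible formulation rather than the only one:

```python
import numpy as np

def elu(x, alpha=1.0):
    """Identity for x >= 0; smoothly saturates toward -alpha for x < 0."""
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

def softmax(z):
    """Probability distribution over classes (shifted by max for stability)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def maxout(x, weights, biases):
    """Maxout over k affine pieces: max_k(x @ W_k + b_k)."""
    return np.max(np.stack([x @ W + b for W, b in zip(weights, biases)]), axis=0)

print(elu(np.array([-2.0, 0.0, 2.0])))      # [-0.8647  0.      2.    ]
print(softmax(np.array([2.0, 1.0, 0.1])))   # probabilities that sum to 1.0
```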

Activation Functions And Their Derivatives
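Since this slide pairs each activation with its derivative, here is a minimal NumPy sketch of the first derivatives most often needed for backpropagation (sigmoid, tanh, ReLU, and linear); the function names are my own, and the sigmoid and tanh derivatives reuse the forward value, as noted earlier:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)              # sigma'(x) = sigma(x) * (1 - sigma(x))

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2      # tanh'(x) = 1 - tanh(x)^2

def d_relu(x):
    return (x > 0).astype(float)      # 0 for x <= 0, 1 for x > 0

def d_linear(x, a=1.0):
    return np.full_like(x, a)         # constant derivative a
```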

How To Choose A Hidden Layer Activation Function
A neural network will almost always have the same activation function in all hidden layers; it is most unusual to vary the activation function through a network model. Traditionally, the sigmoid was the default activation function in the 1990s; from roughly the mid-to-late 1990s through the 2010s, tanh was the default for hidden layers. Both the sigmoid and tanh functions can make the model more susceptible to problems during training via the so-called vanishing gradients problem.

The activation function used in hidden layers is typically chosen based on the type of neural network architecture. Modern neural network models with common architectures, such as the MLP and CNN, make use of the ReLU activation function or its extensions. Recurrent networks still commonly use tanh or sigmoid activation functions, or even both; for example, the LSTM commonly uses the sigmoid activation for recurrent connections and the tanh activation for output.
- Multilayer Perceptron (MLP): ReLU activation function.
- Convolutional Neural Network (CNN): ReLU activation function.
- Recurrent Neural Network: Tanh and/or sigmoid activation function.

A minimal sketch of this convention is given below.
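The sketch assumes the tensorflow.keras API and illustrative layer sizes: every hidden layer of the MLP uses the same activation (ReLU), and the output layer's activation is left to be chosen by the prediction task.

```python
import tensorflow as tf

# Hidden layers share one activation (ReLU); the output layer's activation
# is chosen separately, according to the prediction task (see next slides).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),  # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),                     # hidden layer 2
    tf.keras.layers.Dense(1),                                         # output layer
])
model.summary()
```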

Activation for Output Layers
The output layer is the layer in a neural network model that directly outputs a prediction. All feed-forward neural network models have an output layer. There are perhaps three activation functions you may want to consider for use in the output layer:
- Linear
- Logistic (Sigmoid)
- Softmax

This is not an exhaustive list of activation functions used for output layers, but they are the most commonly used.

How to Choose an Output Activation Function
You must choose the activation function for your output layer based on the type of prediction problem you are solving, specifically the type of variable being predicted. For example, prediction problems can be divided into two main groups: predicting a categorical variable (classification) and predicting a numerical variable (regression). If your problem is a regression problem, you should use a linear activation function.
- Regression: One node, linear activation.

If your problem is a classification problem, then there are three main types of classification problem, and each may use a different activation function:
- Binary Classification: One node, sigmoid activation.
- Multiclass Classification: One node per class, softmax activation.
- Multilabel Classification: One node per class, sigmoid activation.

These output configurations are sketched in code below.
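As a hedged Keras-style sketch (assuming the tensorflow.keras API; the class count of 10 is an illustrative placeholder), the output-layer configurations listed above look like this:

```python
import tensorflow as tf

# Regression: one node, linear activation.
regression_output = tf.keras.layers.Dense(1, activation="linear")

# Binary classification: one node, sigmoid activation.
binary_output = tf.keras.layers.Dense(1, activation="sigmoid")

# Multiclass classification: one node per class, softmax activation.
multiclass_output = tf.keras.layers.Dense(10, activation="softmax")

# Multilabel classification: one node per class, sigmoid activation.
multilabel_output = tf.keras.layers.Dense(10, activation="sigmoid")
```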