A Beginner's Approach to Deep Learning Techniques

Dr. Anirban Dasgupta, 24 slides, Sep 29, 2024

About This Presentation

This is an introductory course on deep learning (DL).


Slide Content

A Beginner’s Approach to Deep Learning

What is Deep Learning?
- Provides computers or computing systems with the ability to automatically learn and improve from experience without being explicitly programmed.
- Uses multiple layers to progressively extract higher-level features from the raw input, with both the features and the classification learned from data.
- By contrast, broader AI algorithms mimic human intelligence with some logic rule, which may or may not be trained on data.

Machine Learning vs Deep Learning (Courtesy: Semiconductor Engineering)

Deep Learning in Healthcare
- Disease diagnosis
- Medical imaging
- Smart health records
- Disease prediction
- Personalized medicine

Hierarchy of ML

Neural Networks
- The input layer takes in numerical features.
- Input layers are often connected to hidden layers and finally to the output layer.
- These connections are called edges; edges typically have a weight that adjusts as learning proceeds.
- Each circular unit is called a node.
- At each node, the inputs are multiplied by the corresponding weights, a bias is added, and the result is passed through an activation function to obtain the output (see the sketch below).
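As a concrete illustration of the node computation described above, here is a minimal NumPy sketch; the input values, weights, and bias are arbitrary, and sigmoid is just one possible activation.

```python
import numpy as np

def node_output(inputs, weights, bias):
    """One node: weighted sum of the inputs plus a bias, passed through an activation."""
    z = np.dot(inputs, weights) + bias      # weighted sum + bias
    return 1.0 / (1.0 + np.exp(-z))         # sigmoid activation

# Example: a node with three input features (values chosen arbitrarily)
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])
print(node_output(x, w, bias=0.2))
```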

Perceptron: the Simplest Neural Network (Courtesy: Towards Data Science)
- A perceptron is a single-layer neural network (diagram: inputs -> x weights -> + bias -> activation).
- The perceptron consists of four parts: inputs, weights and bias, net sum, and activation function.
- Weights show the strength of a particular edge.
- A bias value allows you to shift the activation function curve up or down.
- Activation functions map the input to the required range, such as (0, 1) or (-1, 1), as sketched below.
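A minimal sketch of the four perceptron parts in NumPy; the weights and bias used here are illustrative, chosen so the unit behaves like a simple AND-style decision.

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Perceptron: net sum = inputs . weights + bias, then a step activation."""
    net = np.dot(inputs, weights) + bias
    return 1 if net >= 0 else 0             # step function maps the net sum to {0, 1}

# Hypothetical weights and bias for a 2-input AND-like decision
print(perceptron(np.array([1, 1]), np.array([0.5, 0.5]), bias=-0.7))  # -> 1
print(perceptron(np.array([1, 0]), np.array([0.5, 0.5]), bias=-0.7))  # -> 0
```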

Multi-Layer Perceptron (MLP) (Courtesy: Towards Data Science)
- An MLP has more than a single layer; the layers between input and output are hidden layers.
- An MLP is trained with a supervised learning technique called backpropagation.
- It can distinguish data that is not linearly separable.
- As we increase the number of layers in an MLP, we move into deep learning (a minimal Keras sketch follows).
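A minimal MLP sketch in Keras, assuming TensorFlow 2.x is installed; the 784-feature input, hidden-layer sizes, and 10-class output are illustrative assumptions, not taken from the slides. Backpropagation is applied automatically when the compiled model is fit.

```python
import tensorflow as tf

mlp = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),              # input layer: 784 numerical features
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer 2
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer: 10 classes
])
mlp.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
mlp.summary()
```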

Convolutional Neural Networks (CNN) (Courtesy: Towards Data Science)
- ConvNets have the ability to learn their filters during training.
- They have fewer trainable weights than an MLP when the input size is large.
- CNNs have four types of layers: convolution, pooling, dense/fully connected, and activation layers (stacked in the sketch below).
- CNNs are well suited for image classification tasks.
- Popular CNN architectures include LeNet, AlexNet, VGGNet, etc.
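A sketch of the four layer types stacked in a Keras Sequential model; the 32x32 RGB input, filter count, kernel size, and class count are assumptions for illustration only.

```python
import tensorflow as tf

cnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),         # 32x32 RGB input (assumed)
    tf.keras.layers.Conv2D(16, kernel_size=3),        # convolution layer
    tf.keras.layers.Activation("relu"),               # activation layer
    tf.keras.layers.MaxPooling2D(pool_size=2),        # pooling layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),  # dense / fully connected layer
])
cnn.summary()
```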

Convolution Layer (Courtesy: IBM Research)
- The convolutional layer is the core building block of a CNN.
- The filter values are the weights, which are learned during training.
- A convolution kernel (filter) moves across the receptive fields of the image, checking whether the feature is present.
- A dot product is calculated between the input pixels and the filter.
- The filter shifts by a stride, repeating the process until the kernel has swept across the entire image.
- The final output of this series of dot products is known as a feature map (or activation map); a NumPy sketch follows.
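A plain-NumPy sketch of the sliding dot product that produces a feature map; the toy image and filter values are arbitrary (in a real CNN the filter values are learned).

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the filter over the image, taking a dot product at each position."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            feature_map[i, j] = np.sum(patch * kernel)   # dot product with the filter
    return feature_map

image = np.arange(25).reshape(5, 5)    # toy 5x5 "image"
kernel = np.array([[1, 0], [0, -1]])   # toy 2x2 filter
print(conv2d(image, kernel))           # 4x4 feature map
```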

Pooling Layer (Courtesy: Towards Data Science)
- The pooling layer performs dimensionality reduction, reducing the number of parameters in the input.
- The pooling layer has no trainable weights.
- There are two main types of pooling: max pooling and average pooling.
- Pooling layers help to reduce complexity, improve efficiency, and limit the risk of overfitting (see the max-pooling sketch below).
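A minimal max-pooling sketch in NumPy; the 4x4 input and 2x2 window are illustrative.

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """2x2 max pooling: keep the largest value in each window (no trainable weights)."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            pooled[i, j] = window.max()
    return pooled

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [3, 4, 1, 8]])
print(max_pool(fm))   # [[6. 4.] [7. 9.]]
```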

Fully Connected Layer (Courtesy: Towards Data Science)
- In the fully connected layer, each node in the output layer connects directly to a node in the previous layer.
- This layer performs the task of classification based on the features extracted through the previous layers and their different filters (a minimal sketch follows).
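A small sketch of the fully connected classification step, assuming 8 flattened features and 3 classes (both arbitrary): every input node contributes, via a weight, to every output node, and a softmax turns the result into class probabilities.

```python
import numpy as np

features = np.random.rand(8)     # flattened output of the previous layers (assumed size 8)
W = np.random.rand(8, 3)         # one weight per (input node, output node) pair
b = np.zeros(3)                  # one bias per output node

logits = features @ W + b                          # every input feeds every output node
probs = np.exp(logits) / np.sum(np.exp(logits))    # softmax -> class probabilities
print(probs, "predicted class:", probs.argmax())
```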

Activation Functions (Courtesy: Towards Data Science)
- An activation function maps a node's input value to the desired output, e.g. a yes/no decision, or a value between 0 and 1 or between -1 and 1.
Sigmoid (logistic) activation function
- The sigmoid is used mainly because its output lies between 0 and 1, so it is especially useful for models that must predict a probability.
- The function is differentiable, so the slope of the sigmoid curve can be found at any point.
- The logistic sigmoid can, however, cause a neural network to get stuck during training (its gradient is nearly zero for large positive or negative inputs).

Activation Functions (Courtesy: Towards Data Science)
Softmax activation function
- The softmax function is a more generalized logistic activation function, used for multiclass classification.

Activation Functions (Courtesy: Towards Data Science)
Tanh (hyperbolic tangent) activation function
- The range of the tanh function is from -1 to 1.
- The advantage is that negative inputs are mapped strongly negative and zero inputs are mapped near zero.
- The tanh function is mainly used for classification between two classes.

Activation Functions (Courtesy: Towards Data Science)
ReLU (Rectified Linear Unit) activation function
- The range of the ReLU function is from 0 to positive infinity.
- The disadvantage is that any negative input given to ReLU is turned into zero immediately.

Activation Functions (Courtesy: Towards Data Science)
Leaky ReLU activation function
- Leaky ReLU is an attempt to solve the dying ReLU problem.
- For negative inputs it returns a*x, where the slope a is usually 0.01.
- The range of Leaky ReLU is from negative infinity to positive infinity.
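A NumPy sketch of the activation functions covered above (sigmoid, softmax, tanh, ReLU, Leaky ReLU); the sample inputs are arbitrary.

```python
import numpy as np

def sigmoid(x):                # maps inputs to (0, 1); useful for probabilities
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):                # generalizes the logistic function to multiple classes
    e = np.exp(x - np.max(x))  # shift by the max for numerical stability
    return e / e.sum()

def tanh(x):                   # maps inputs to (-1, 1)
    return np.tanh(x)

def relu(x):                   # zero for negative inputs, identity otherwise
    return np.maximum(0, x)

def leaky_relu(x, a=0.01):     # small slope a for negative inputs
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(x))
print("softmax", softmax(x))
```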

Training a CNN (Courtesy: Andrej Karpathy)
- Once the architecture is fixed, training becomes possible.
- Training means finding the optimal weights of the convolutional and fully connected layers.
- Backpropagation of the error is used to update the weights.
- A loss function is optimized with respect to the weights, typically using stochastic gradient descent (SGD).
- Training parameters include learning rate, optimizer, batch size, validation split, metric, and loss.
- Training maps a set of inputs to a set of outputs using the training data (a toy SGD sketch follows).
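A toy sketch of stochastic gradient descent on a single linear neuron with a squared-error loss; the data, learning rate, and epoch count are arbitrary, and in a real CNN the framework computes these gradients by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # toy training inputs
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)   # toy training targets

w, b, lr = np.zeros(3), 0.0, 0.05             # weights, bias, learning rate
for epoch in range(20):
    for xi, yi in zip(X, y):                  # one sample at a time (stochastic)
        err = (xi @ w + b) - yi               # prediction error
        w -= lr * err * xi                    # gradient of the squared error w.r.t. weights
        b -= lr * err                         # gradient w.r.t. bias
print(w, b)                                   # should approach true_w and 0
```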

Losses (Courtesy: Machine Learning Mastery)
- A loss function evaluates the performance of a prediction; it is a measure of the prediction error.
- Binary classification problems: (binary) cross-entropy
- Multi-class classification problems: categorical cross-entropy, sparse categorical cross-entropy
- Regression problems: mean squared error (MSE)
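Minimal NumPy versions of the losses named above, with arbitrary example predictions; frameworks such as Keras provide these ready-made.

```python
import numpy as np

def mse(y_true, y_pred):                        # regression
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred):       # binary classification
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true, y_pred):  # multi-class, one-hot labels
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

print(mse(np.array([1.0, 2.0]), np.array([1.5, 1.5])))
print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.8])))
print(categorical_cross_entropy(np.array([[0, 1, 0]]), np.array([[0.1, 0.7, 0.2]])))
```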

Metrics (Courtesy: Towards Data Science)

Getting Started (Courtesy: Towards Data Science)
- Install Keras and TensorFlow: pip install keras and pip install tensorflow
- Prepare the training and testing data: make separate folders, without repeating data, with each class in its own sub-folder of the training data.
- Build the CNN layers using the TensorFlow library: use a sequential model and keep stacking layers.
- Select the optimizer: the most common one is Adam.
- Train the network: use suitable parameters based on the problem.
- Finally, test the model on the test data (an end-to-end sketch follows).
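An end-to-end sketch of these steps in Keras, assuming TensorFlow 2.x, a train/ and test/ folder with one sub-folder per class, 64x64 images, and two classes; the folder names, image size, and small architecture are all assumptions for illustration.

```python
import tensorflow as tf

# Prepare the data: one sub-folder per class inside train/ and test/ (hypothetical paths)
train_ds = tf.keras.utils.image_dataset_from_directory("train", image_size=(64, 64), batch_size=32)
test_ds = tf.keras.utils.image_dataset_from_directory("test", image_size=(64, 64), batch_size=32)

# Build the CNN layers with a sequential model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Rescaling(1.0 / 255),              # scale pixel values to [0, 1]
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),    # assumes 2 classes
])

# Select the optimizer, loss, and metric, then train and test
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)       # train the network
model.evaluate(test_ds)             # test it on the test data
```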

Other Networks
- Text recognition: RNTN, RNN
- Image recognition: CNN, DBN
- Object recognition: CNN, RNTN
- Time-series analysis: RNN
- Video analysis: RNN
