Deep Learning (Unit 3) by Ms Surbhi Saroha


About This Presentation

BCA V


Slide Content

Deep Learning (Unit 3) BY: SURBHI SAROHA

SYLLABUS: DIMENSIONALITY REDUCTION: Linear (PCA, LDA) and manifolds, metric learning – Autoencoders and dimensionality reduction in networks – Introduction to ConvNets – Architectures: AlexNet, VGG, Inception, ResNet – Training a ConvNet: weight initialization, batch normalization, hyperparameter optimization

DIMENSIONALITY REDUCTION: Linear (PCA, LDA) and manifolds. Dimensionality reduction is the process of reducing the number of features (or dimensions) in a dataset while retaining as much information as possible. This can be done for a variety of reasons, such as to reduce the complexity of a model, to improve the performance of a learning algorithm, or to make it easier to visualize the data. There are several techniques for dimensionality reduction, including principal component analysis (PCA), singular value decomposition (SVD), and linear discriminant analysis (LDA). Each technique uses a different method to project the data onto a lower-dimensional space while preserving important information.
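To make the projection idea concrete, here is a minimal NumPy sketch of dimensionality reduction via truncated SVD; the random data matrix and the choice of k = 2 components are illustrative assumptions, not an example from the slides.

# Truncated SVD: keep only the top-k singular values/vectors of a data matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))            # 100 samples, 10 features (dummy data)

U, S, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
k = 2
X_reduced = U[:, :k] * S[:k]                  # project onto the top-2 components

print(X_reduced.shape)                        # (100, 2)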

What is Dimensionality Reduction? Dimensionality reduction is a technique used to reduce the number of features in a dataset while retaining as much of the important information as possible. In other words, it is a process of transforming high-dimensional data into a lower-dimensional space that still preserves the essence of the original data. In machine learning, high-dimensional data refers to data with a large number of features or variables. The curse of dimensionality is a common problem in machine learning, where the performance of the model deteriorates as the number of features increases. This is because the complexity of the model increases with the number of features, and it becomes more difficult to find a good solution. In addition, high-dimensional data can also lead to overfitting, where the model fits the training data too closely and does not generalize well to new data.

Cont.... Dimensionality reduction can help to mitigate these problems by reducing the complexity of the model and improving its generalization performance. There are two main approaches to dimensionality reduction: feature selection and feature extraction. Feature Selection: Feature selection involves selecting a subset of the original features that are most relevant to the problem at hand. The goal is to reduce the dimensionality of the dataset while retaining the most important features. There are several methods for feature selection, including filter methods, wrapper methods, and embedded methods. Filter methods rank the features based on their relevance to the target variable, wrapper methods use the model's performance as the criterion for selecting features, and embedded methods combine feature selection with the model training process.
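As a concrete illustration of a filter method, here is a minimal scikit-learn sketch that keeps the two features most relevant to the target; the Iris dataset and k = 2 are illustrative assumptions rather than an example from the slides.

# Filter-method feature selection: rank features by an ANOVA F-score against the target.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)             # 4 original features

selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)      # keep the 2 most relevant features

print(X.shape, "->", X_reduced.shape)         # (150, 4) -> (150, 2)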

Cont.... Feature Extraction: Feature extraction involves creating new features by combining or transforming the original features. The goal is to create a set of features that captures the essence of the original data in a lower-dimensional space. There are several methods for feature extraction, including principal component analysis (PCA), linear discriminant analysis (LDA), and t-distributed stochastic neighbor embedding (t-SNE). PCA is a popular technique that projects the original features onto a lower-dimensional space while preserving as much of the variance as possible.


Components of Dimensionality Reduction: There are two components of dimensionality reduction. Feature selection: In this, we try to find a subset of the original set of variables, or features, to get a smaller subset which can be used to model the problem. It usually involves three approaches: filter, wrapper, and embedded methods. Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional space, i.e. a space with a smaller number of dimensions.

Principal Component Analysis (PCA): This method was introduced by Karl Pearson. It works on the principle that when data in a higher-dimensional space is mapped to a lower-dimensional space, the variance of the data in the lower-dimensional space should be maximized.
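A minimal scikit-learn sketch of PCA, projecting the 4-feature Iris data onto its two directions of maximum variance; the dataset and the number of components are illustrative choices, not part of the original slides.

# PCA: keep the top-2 principal components (directions of maximum variance).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                             # (150, 2)
print(pca.explained_variance_ratio_)          # fraction of variance kept per component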

Advantages of Dimensionality Reduction: It helps in data compression, and hence reduces storage space. It reduces computation time. It also helps remove redundant features, if any. Improved Visualization: High-dimensional data is difficult to visualize, and dimensionality reduction techniques can help in visualizing the data in 2D or 3D, which can help in better understanding and analysis. Overfitting Prevention: High-dimensional data may lead to overfitting in machine learning models, which can lead to poor generalization performance. Dimensionality reduction can help in reducing the complexity of the data, and hence prevent overfitting.

Cont.... Feature Extraction: Dimensionality reduction can help in extracting important features from high-dimensional data, which can be useful in feature selection for machine learning models. Data Preprocessing: Dimensionality reduction can be used as a preprocessing step before applying machine learning algorithms to reduce the dimensionality of the data and hence improve the performance of the model. Improved Performance: Dimensionality reduction can help in improving the performance of machine learning models by reducing the complexity of the data, and hence reducing the noise and irrelevant information in the data.

Disadvantages of Dimensionality Reduction: It may lead to some amount of data loss. PCA tends to find linear correlations between variables, which is sometimes undesirable. PCA fails in cases where mean and covariance are not enough to define the dataset. We may not know how many principal components to keep; in practice, some rules of thumb are applied. Interpretability: The reduced dimensions may not be easily interpretable, and it may be difficult to understand the relationship between the original features and the reduced dimensions. Overfitting: In some cases, dimensionality reduction may lead to overfitting, especially when the number of components is chosen based on the training data. Sensitivity to outliers: Some dimensionality reduction techniques are sensitive to outliers, which can result in a biased representation of the data. Computational complexity: Some dimensionality reduction techniques, such as manifold learning, can be computationally intensive, especially when dealing with large datasets.

Linear Discriminant Analysis (LDA): Linear Discriminant Analysis (LDA) is one of the commonly used dimensionality reduction techniques in machine learning, and it can handle classification problems with more than two classes. It is also known as Normal Discriminant Analysis (NDA) or Discriminant Function Analysis (DFA). It can be used to project features from a higher-dimensional space into a lower-dimensional space in order to reduce resource usage and dimensionality costs.

What is Linear Discriminant Analysis (LDA)? Although the logistic regression algorithm is limited to two-class problems, Linear Discriminant Analysis is applicable to classification problems with more than two classes. Linear Discriminant Analysis is one of the most popular dimensionality reduction techniques used for supervised classification problems in machine learning. It is also considered a pre-processing step for modelling differences in ML and for applications of pattern classification. Whenever there is a requirement to separate two or more classes having multiple features efficiently, the Linear Discriminant Analysis model is considered the most common technique to solve such classification problems. For example, if we have two classes with multiple features and need to separate them efficiently, classifying them using a single feature may show overlapping.
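A minimal scikit-learn sketch of LDA as a supervised projection: the three-class Iris data is mapped onto at most (number of classes - 1) = 2 discriminant axes; the dataset is an illustrative assumption, not from the slides.

# LDA: supervised projection that uses the class labels, unlike PCA.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)             # 3 classes, 4 features

lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)               # fit requires the labels y

print(X_lda.shape)                            # (150, 2)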

Metric learning – Autoencoders and dimensionality reduction in networks: At the heart of deep learning lies the neural network, an intricate interconnected system of nodes that mimics the human brain's neural architecture. Neural networks excel at discerning intricate patterns and representations within vast datasets, allowing them to make predictions, classify information, and generate novel insights. Autoencoders emerge as a fascinating subset of neural networks, offering a unique approach to unsupervised learning. Autoencoders are an adaptable and strong class of architectures for the dynamic field of deep learning, where neural networks develop constantly to identify complicated patterns and representations. With their ability to learn effective representations of data, these unsupervised learning models have received considerable attention and are useful in a wide variety of areas, from image processing to anomaly detection.

What are Autoencoders? Autoencoders are a specialized class of algorithms that can learn efficient representations of input data with no need for labels. They are a class of artificial neural networks designed for unsupervised learning. Learning to compress and effectively represent input data without specific labels is the essential principle of an autoencoder. This is accomplished using a two-fold structure that consists of an encoder and a decoder. The encoder transforms the input data into a reduced-dimensional representation, which is often referred to as the "latent space" or "encoding". From that representation, a decoder rebuilds the initial input. This process of encoding and decoding forces the network to capture the essential features and meaningful patterns in the data.

Architecture of an Autoencoder in Deep Learning: The general architecture of an autoencoder includes an encoder, a decoder, and a bottleneck layer.

Encoder: The input layer takes the raw input data. The hidden layers progressively reduce the dimensionality of the input, capturing important features and patterns; these layers compose the encoder. The bottleneck layer (latent space) is the final hidden layer, where the dimensionality is significantly reduced. This layer represents the compressed encoding of the input data. Decoder: The bottleneck layer takes the encoded representation and expands it back towards the dimensionality of the original input. The hidden layers progressively increase the dimensionality and aim to reconstruct the original input. The output layer produces the reconstructed output, which ideally should be as close as possible to the input data.
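A minimal PyTorch sketch of this encoder-bottleneck-decoder structure; the layer sizes (784 -> 128 -> 32 -> 128 -> 784) and the dummy mini-batch are illustrative assumptions, not values from the slides.

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: progressively reduce dimensionality down to the bottleneck
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),           # bottleneck / latent space
        )
        # Decoder: expand the encoding back to the original dimensionality
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)                       # compressed representation
        return self.decoder(z)                    # reconstruction of the input

model = AutoEncoder()
x = torch.randn(16, 784)                          # a dummy mini-batch
loss = nn.MSELoss()(model(x), x)                  # reconstruction loss to minimize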

Introduction to ConvNets – Architectures: AlexNet, VGG, Inception, ResNet. These are some groundbreaking CNN architectures that were proposed to achieve better accuracy and to reduce the computational cost. AlexNet: This network was very similar to LeNet-5 but was deeper, with 8 layers, more filters, stacked convolutional layers, max pooling, dropout, data augmentation, ReLU and SGD. AlexNet was the winner of the ImageNet ILSVRC-2012 competition, designed by Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton. It was trained on two Nvidia GeForce GTX 580 GPUs; therefore, the network was split into two pipelines. AlexNet has 5 convolutional layers and 3 fully connected layers, and consists of approximately 60 M parameters. A major drawback of this network was that it comprises too many hyper-parameters.
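A minimal sketch of instantiating the AlexNet architecture via torchvision's stock implementation (not the slide author's own code) and counting its parameters; the dummy 224x224 input is an illustrative assumption.

import torch
from torchvision import models

alexnet = models.alexnet()                        # 5 conv layers + 3 fully connected layers, untrained
x = torch.randn(1, 3, 224, 224)                   # dummy RGB image batch
logits = alexnet(x)                               # (1, 1000) ImageNet class scores

n_params = sum(p.numel() for p in alexnet.parameters())
print(f"{n_params / 1e6:.1f} M parameters")       # roughly 61 M, matching the ~60 M noted above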

AlexNet Architecture

VGG-16 Net: The major shortcoming of AlexNet's many hyper-parameters was addressed by VGG Net by replacing the large kernel-sized filters (11 and 5 in the first and second convolutional layers, respectively) with multiple 3×3 kernel-sized filters one after another. The architecture, developed by Simonyan and Zisserman, was the first runner-up of the ImageNet Visual Recognition Challenge of 2014. The architecture consists of 3×3 convolutional filters (stride 1, with same padding to preserve the spatial dimensions) and 2×2 max-pooling layers with a stride of 2. In total, there are 16 layers in the network; the input image is in RGB format with dimensions 224×224×3, followed by 5 blocks of convolutions (with 64, 128, 256, 512 and 512 filters) each followed by max pooling. The output of these layers is fed into three fully connected layers and a softmax function in the output layer. In total there are 138 million parameters in VGG Net.
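A minimal PyTorch sketch of the repeated "3×3 convolutions + 2×2 max-pool" pattern VGG-16 is built from, shown as a single hypothetical stage rather than the full 16-layer network; channel counts follow the first VGG stage, the rest is an illustrative assumption.

import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    layers = []
    for _ in range(n_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # 3x3 conv, "same" padding
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))                # halves height and width
    return nn.Sequential(*layers)

block = vgg_block(3, 64, n_convs=2)               # first VGG-16 stage: two 64-filter convs + pool
x = torch.randn(1, 3, 224, 224)
print(block(x).shape)                             # torch.Size([1, 64, 112, 112])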

VGG-16 Architecture

Drawbacks of VGG Net: 1. Long training time 2. Heavy model 3. Computationally expensive 4. Vanishing/exploding gradient problem

Inception Net: The Inception network, also known as GoogLeNet, was proposed by developers at Google in "Going Deeper with Convolutions" in 2014. The motivation for InceptionNet comes from the observation that salient parts of an image can have a large variation in size. Because of this, the selection of the right kernel size becomes extremely difficult: big kernels are needed for globally distributed features and small kernels when the features are locally located. InceptionNet resolves this by stacking multiple kernels at the same level; typically it uses 5×5, 3×3 and 1×1 filters in one go. For better understanding, refer to the Inception module diagram below.

Inception Module of GoogLeNet
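To make the idea concrete, here is a minimal PyTorch sketch of a simplified ("naive") Inception-style module with parallel 1×1, 3×3 and 5×5 branches concatenated on the channel dimension; the channel counts are illustrative and the dimension-reducing 1×1 convolutions of the real GoogLeNet module are omitted.

import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, 16, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, 16, kernel_size=5, padding=2)
        self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(in_ch, 16, kernel_size=1))

    def forward(self, x):
        # Each branch preserves the spatial size, so outputs can be concatenated on channels
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

x = torch.randn(1, 64, 28, 28)
print(NaiveInception(64)(x).shape)                # torch.Size([1, 64, 28, 28])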

ResNet: ResNet, the winner of the ILSVRC-2015 competition, is a family of deep networks of over 100 layers. Residual networks are similar to VGG nets in their sequential approach; however, they also use "skip connections" and "batch normalization", which help to train deep layers without hampering performance. After VGG Nets, as CNNs were going deeper, it was becoming hard to train them because of the vanishing gradient problem, which makes the derivatives vanishingly small; as a result, the overall performance saturates or even degrades. The idea of skip connections came from highway networks, where gated shortcut connections were used.
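A minimal PyTorch sketch of a ResNet-style residual block: the input is added back to the output of two 3×3 convolutions (the skip connection), with batch normalization after each convolution; the channel count and input size are illustrative assumptions.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)                # skip connection: add the input back

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)                 # same shape as the input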

Normal Deep Networks vs Networks with skip connections

Training a ConvNet: weight initialization. While building and training neural networks, it is crucial to initialize the weights appropriately to ensure a model with high accuracy. If the weights are not correctly initialized, it may give rise to the vanishing gradient problem or the exploding gradient problem. Hence, selecting an appropriate weight initialization strategy is critical when training DL models. The following notation must be kept in mind while understanding the weight initialization techniques. The notation may vary across publications; however, the one used here is the most common, usually found in research papers. fan_in = number of input paths towards the neuron; fan_out = number of output paths from the neuron.

Example: Consider the following neuron as a part of a Deep Neural Network.

CONT.... For the above neuron, fan_in = 3 (number of input paths towards the neuron) and fan_out = 2 (number of output paths from the neuron).
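A minimal PyTorch sketch of two common fan-based initialization schemes (Xavier/Glorot and He/Kaiming), applied to a hypothetical linear layer with the same fan_in = 3 and fan_out = 2 as the example above; each call simply overwrites the weights for demonstration.

import torch.nn as nn
import torch.nn.init as init

layer = nn.Linear(3, 2)        # fan_in = 3, fan_out = 2, matching the example above

# Xavier/Glorot: weight variance scaled by 2 / (fan_in + fan_out), suited to tanh/sigmoid
init.xavier_normal_(layer.weight)

# He/Kaiming: weight variance scaled by 2 / fan_in, suited to ReLU activations
init.kaiming_normal_(layer.weight, nonlinearity='relu')

init.zeros_(layer.bias)        # biases are commonly initialized to zero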

Batch normalization: Internal covariate shift is a major challenge encountered while training deep learning models, and batch normalization was introduced to address this issue. Here we cover the fundamentals of and need for batch normalization. What is Batch Normalization? Batch normalization was introduced by Sergey Ioffe and Christian Szegedy in 2015 to mitigate the internal covariate shift problem in neural networks. The normalization process involves calculating the mean and variance of each feature in a mini-batch and then scaling and shifting the features using these statistics. This ensures that the input to each layer remains roughly in the same distribution, regardless of changes in the distribution of earlier layers' outputs. Consequently, batch normalization helps in stabilizing the training process, enabling higher learning rates and faster convergence.
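A minimal NumPy sketch of the computation just described, for a single feature across a mini-batch of four samples; the values and the scale/shift parameters gamma and beta are illustrative assumptions.

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])            # one feature across a mini-batch of 4
gamma, beta, eps = 1.0, 0.0, 1e-5             # learnable scale/shift, small constant for stability

mean = x.mean()                               # batch mean
var = x.var()                                 # batch variance
x_hat = (x - mean) / np.sqrt(var + eps)       # normalized to ~zero mean, unit variance
y = gamma * x_hat + beta                      # scaled and shifted output

print(y)                                      # approx [-1.34, -0.45, 0.45, 1.34]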

Benefits of Batch Normalization: Faster Convergence: Batch Normalization reduces internal covariate shift, allowing for faster convergence during training. Higher Learning Rates: With Batch Normalization, higher learning rates can be used without the risk of divergence. Regularization Effect: Batch Normalization introduces a slight regularization effect that reduces the need for adding regularization techniques like dropout.

Hyperparameter optimization: What are hyperparameters? Hyperparameters are parameters that we set before training. Hyperparameters have a major impact on the accuracy and efficiency of model training, so they need to be set carefully to get better and more efficient results. Hyperparameters are pre-established parameters that are not learned during the training process; they control a machine learning model's general behaviour, including its architecture, regularization strengths, and learning rates. The process of determining the ideal set of hyperparameters for a machine learning model is known as hyperparameter optimization. Usually, strategies like grid search, random search, and more sophisticated ones like genetic algorithms or Bayesian optimization are used to accomplish this.
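A minimal scikit-learn sketch of hyperparameter optimization via grid search, tuning two hyperparameters of an SVM with 5-fold cross-validation; the dataset, the model, and the parameter grid are illustrative assumptions, not from the slides.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {"C": [0.1, 1, 10],              # regularization strength
              "gamma": [0.01, 0.1, 1]}        # RBF kernel width

search = GridSearchCV(SVC(), param_grid, cv=5)  # try every combination with 5-fold CV
search.fit(X, y)

print(search.best_params_, search.best_score_)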

THANK YOU