UNIT-V.pptx for Deep Learning Applications

ImranS18 0 views 9 slides Oct 18, 2025

About This Presentation

Unit 5 Applications of DL


Slide Content

UNIT-V: DEEP LEARNING APPLICATIONS

ImageNet

ImageNet is a huge collection of images, grouped according to the WordNet hierarchy of nouns (like “cat”, “car”, “tree”). Each group (or category) has hundreds or thousands of related pictures. The database has been very important for research in computer vision and deep learning, and researchers can use the data for free as long as it is not for business purposes. Since 2010, ImageNet has hosted a yearly competition called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which computer programs try to correctly recognize and name objects and scenes in images. The competition uses a curated list of 1,000 categories that do not overlap. On September 30, 2012, a deep learning model called AlexNet achieved a top-5 error rate of 15.3%, more than 10.8 percentage points better than the runner-up. This breakthrough was possible in part because AlexNet used GPUs (graphics cards) for faster and better training, a result that helped spark the deep learning revolution. In 2015, a Microsoft model with over 100 layers (ResNet) did even better, winning that year's ImageNet challenge.
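The top-5 error rate mentioned above counts a prediction as correct if the true label appears anywhere among the model's five highest-scoring classes. A minimal sketch of the metric (the labels and scores below are made up for illustration, not ILSVRC data):

```python
def top5_error(true_labels, score_lists):
    """Fraction of examples whose true label is NOT among the
    five highest-scoring predicted classes."""
    misses = 0
    for truth, scores in zip(true_labels, score_lists):
        # rank classes by score, highest first, and keep the top five
        top5 = sorted(scores, key=scores.get, reverse=True)[:5]
        if truth not in top5:
            misses += 1
    return misses / len(true_labels)

# Toy example: 2 images, 6 candidate classes each
scores_a = {"cat": 0.5, "dog": 0.2, "car": 0.1,
            "tree": 0.08, "boat": 0.07, "cup": 0.05}
scores_b = {"cat": 0.9, "dog": 0.04, "car": 0.03,
            "boat": 0.015, "tree": 0.01, "cup": 0.005}
err = top5_error(["cat", "cup"], [scores_a, scores_b])  # 0.5
```

The first image is a hit ("cat" is ranked first); the second is a miss ("cup" is ranked sixth), so the top-5 error over the two examples is 0.5.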

AUDIO – WAVENET

WaveNet was developed by DeepMind and presented in the 2016 paper “WaveNet: A Generative Model for Raw Audio”¹, which explains how the model can be used to generate audio such as realistic human speech. WaveNet is a feed-forward neural network: information only moves forward through the network and does not loop back, as it does in RNNs (recurrent neural networks). The model uses dilated causal convolutional layers, which increase the receptive field of the network, allowing it to reach farther back in time to predict the current audio sample. The more the network knows about the signal's past, the better it can predict the value of the next sample. Audio data is one-dimensional: a numerical value (amplitude) that varies over time.
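The receptive-field arithmetic behind dilation can be sketched directly: each layer of a stack of dilated causal convolutions extends the receptive field by (filter width − 1) × dilation. With a filter width of 2 and dilations doubling from 1 to 512, as described in the WaveNet paper, a single block already sees 1,024 past samples:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked dilated causal
    convolutions: each layer adds (kernel_size - 1) * dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# One WaveNet-style block: dilation doubles at every layer
dilations = [2 ** i for i in range(10)]   # 1, 2, 4, ..., 512
rf = receptive_field(2, dilations)        # 1024 samples
```

Without dilation (all dilations equal to 1), the same ten layers would see only 11 samples, which is why dilated layers are so effective for long audio contexts.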

NLP – Natural Language Processing

Convolutional Neural Network (CNN): The idea of using a CNN to classify text was first presented in the paper “Convolutional Neural Networks for Sentence Classification” by Yoon Kim. The central intuition is to treat a document like an image. However, instead of pixels, the input is a sentence or document represented as a matrix, with one row of word-vector values per word.
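A minimal sketch of that intuition, with made-up embeddings and filter weights (a real model learns both): each filter spans a window of consecutive words, producing one feature per window, and max-over-time pooling keeps the strongest response.

```python
# Toy sentence-as-matrix: one row of (invented) embedding values per word.
embeddings = {
    "the":   [0.1, 0.0, 0.2],
    "movie": [0.7, 0.3, 0.1],
    "was":   [0.0, 0.1, 0.0],
    "great": [0.9, 0.8, 0.4],
}

def convolve(sentence, filt, width=2):
    """Slide a filter over every window of `width` consecutive words,
    producing one feature value (dot product) per window."""
    mat = [embeddings[w] for w in sentence]
    feats = []
    for i in range(len(mat) - width + 1):
        window = [x for row in mat[i:i + width] for x in row]
        feats.append(sum(a * b for a, b in zip(window, filt)))
    return feats

filt = [0.5, 0.2, 0.1, 0.4, 0.3, 0.2]   # one filter spanning 2 words
feats = convolve(["the", "movie", "was", "great"], filt)
pooled = max(feats)   # max-over-time pooling keeps the strongest match
```

Here the "was great" window scores highest, so pooling passes that feature on; a full model would use many filters of several widths, then a dense layer for classification.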

RNNs have also been used to generate mathematical proofs and, in brain-computer-interface research, to decode neural activity into words.

Encoder-decoder sequence-to-sequence: The encoder-decoder seq2seq architecture is an adaptation of the autoencoder, specialized for translation, summarization, and similar tasks. The encoder compresses the information in a text into a single encoded vector. Unlike in an autoencoder, the decoder's task is not to reconstruct the input from that vector but to generate a different desired output, such as a translation or a summary.
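The data flow can be sketched structurally (the recurrences below use hand-picked numbers purely to show the shapes involved; a trained model learns these transformations):

```python
# Structural sketch of encoder-decoder data flow -- not a trained model.
def encode(tokens, dim=4):
    """Encoder: fold the whole input sequence into one fixed-size vector."""
    state = [0.0] * dim
    for t in tokens:
        # toy recurrence: mix the previous state with the current token id
        state = [0.5 * s + 0.1 * (t + i) for i, s in enumerate(state)]
    return state

def decode(state, length):
    """Decoder: generate a *different* output sequence from the encoded
    vector (an autoencoder would instead reconstruct the input)."""
    out = []
    for _ in range(length):
        out.append(round(sum(state), 2))   # emit one output symbol
        state = [0.9 * s for s in state]   # update the decoder state
    return out

context = encode([3, 1, 4])   # whole input sentence -> one vector
output = decode(context, 2)   # vector -> new sequence of another length
```

The key property shown: the input and output sequences can have different lengths and different vocabularies, since all information passes through the fixed-size context vector.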

BIOINFORMATICS – FACE RECOGNITION

Face recognition (FR) has been a prominent biometric technique for identity authentication and is widely used in many areas, such as the military, finance, public security, and daily life. Deep learning methods, such as convolutional neural networks, use a cascade of multiple layers of processing units for feature extraction and transformation. They learn multiple levels of representation that correspond to different levels of abstraction.

A face recognition system typically has four steps:
1. Face Detection: locate one or more faces in the image and mark each with a bounding box.
2. Face Alignment: normalize the face to be consistent with the database, in both geometry and photometrics.
3. Feature Extraction: extract features from the face that can be used for the recognition task.
4. Face Recognition: match the face against one or more known faces in a prepared database.

A number of deep learning methods have been developed and demonstrated for face detection. Perhaps one of the most popular is the “Multi-Task Cascaded Convolutional Neural Network,” or MTCNN for short, described by Kaipeng Zhang et al. in the 2016 paper “Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks.” The proposed network consists of three stages. The first stage quickly produces candidate windows through a shallow CNN. The second refines the windows, rejecting a large number of non-face windows with a more complex CNN. The final stage uses a more powerful CNN to refine the result and output facial landmark positions.
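The four steps above compose into a pipeline; the sketch below is purely hypothetical stand-in code showing how the stages hand data to each other (a real system would use MTCNN for steps 1–2 and a CNN embedding for step 3):

```python
# Hypothetical pipeline sketch: every function is a toy placeholder.
def detect(image):
    """Step 1: return bounding boxes (here: one box covering the image)."""
    return [(0, 0, len(image[0]), len(image))]

def align(image, box):
    """Step 2: crop/normalize the face region (a plain crop here)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def extract(face):
    """Step 3: reduce the face to a feature vector (toy row sums)."""
    return [sum(row) for row in face]

def recognize(features, database):
    """Step 4: return the enrolled identity with the closest features."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(database, key=lambda name: dist(features, database[name]))

image = [[1, 2], [3, 4]]                # tiny fake "image" of pixel values
db = {"alice": [3, 7], "bob": [9, 9]}   # enrolled feature vectors
boxes = detect(image)
face = align(image, boxes[0])
who = recognize(extract(face), db)      # nearest enrolled identity
```

The design point carried over from the text: each stage has a narrow interface (boxes, cropped face, feature vector), which is what lets detection, alignment, and embedding models be swapped independently.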