Human action recognition with CNN: a thesis presentation on background subtraction using Mask R-CNN combined with a 3D CNN, with results evaluated on two base models, ResNet50 and VGG16.

Shahin4220, 42 slides, May 19, 2024

About This Presentation

Human action recognition with CNN


Slide Content

Research Paper Presentation Based On Human Action Recognition

Human Action Recognition with Background Subtraction and 3D CNN
Advisor: Dr. Md. Abu Layek, Associate Professor, Department of Computer Science and Engineering, Jagannath University

Presentation Agenda
- Introduction
  - Problem Statement
  - Motivation
  - Proposed Solution
- Background Study
- Literature Review
- CNN Architecture
  - VGG16
  - ResNet
- Materials
- Methodology
  - Tools
  - Proposed Methodology
- Evaluations and Results
- Conclusion & Possible Improvements
  - Summary
  - Limitations & Future Directions

Introduction Literature Review CNN Architecture Materials Methodology Evaluation Conclusion

Introduction
As described by the author, the lower accuracy in some classes arises because those classes share the same background elements. Our goal is therefore to eliminate background elements using pre-processing techniques.


How does deep learning help recognize human actions?
- Feature extraction: it automates the extraction of relevant features from raw data, which is crucial for recognizing human actions.
- Neural networks: complex neural networks can process large volumes of video data to identify intricate action patterns.
- Spatial-temporal analysis: models such as CNNs and RNNs capture spatial and temporal dependencies, improving recognition accuracy.
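The thesis relies on a 3D CNN for the spatial-temporal part. As an illustration only (not the thesis code), the pure-Python sketch below implements a single-channel 3D convolution with no padding and stride 1, so the kernel slides over time as well as over height and width. The hand-set temporal-difference kernel and the toy clip are assumptions chosen to show that a kernel spanning two frames responds to motion and outputs zeros for a static scene; in a trained 3D CNN the kernel values are learned.

```python
def conv3d_valid(clip, kernel):
    """Single-channel 3D cross-correlation, 'valid' padding, stride 1.

    clip:   nested lists indexed [t][y][x]  (frames x height x width)
    kernel: nested lists indexed [dt][dy][dx]
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                # Weighted sum over a small space-time neighbourhood.
                total = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            total += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical 3-frame, 2x2 clip: one bright pixel moving across the frame.
moving = [[[1, 0], [0, 0]],
          [[0, 1], [0, 0]],
          [[0, 0], [0, 1]]]
# Temporal-difference kernel (2 frames x 1 x 1): next frame minus current frame.
temporal_diff = [[[-1]], [[1]]]

print(conv3d_valid(moving, temporal_diff))
# → [[[-1.0, 1.0], [0.0, 0.0]], [[0.0, -1.0], [0.0, 1.0]]]
```

The output has one frame fewer than the input because the kernel spans two frames; nonzero values mark where the pixel left and where it arrived.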

Problem Statement
- Lower accuracy in a few classes (Biking, Swing, Walking with Dog) because they share the same background elements.
- Low input resolution.

Solution
1. Remove as much background noise as possible.
2. Develop an automatic background removal system to speed up the process.

Motivation
HAR is a significant challenge, and solving it matters for several reasons:
- Camera usage has expanded.
- It can help identify any kind of crime or violence.

Proposed Solution
Data → Data Preprocessing → Background Noise Reduction → Multiple CNN Architectures → Result

Deep Learning
Deep learning is a subfield of machine learning based on artificial neural networks (ANNs). Neural networks come in two kinds:
- Shallow neural network: an input layer, one hidden layer, and an output layer.
- Deep neural network: an input layer, more than one hidden layer, and an output layer.

In deep learning, the hidden units in the hidden layers act like biological neurons; each hidden unit is called a neuron. A neuron takes inputs from the previous layer, processes them, and passes its output on, so information flows from the input layer through successive hidden layers until the network reaches a decision at the output layer.
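To make the layer-to-layer flow concrete, here is a minimal pure-Python sketch of a forward pass through a deep (two-hidden-layer) network. The weights are arbitrary illustrative numbers, and the ReLU activation is an assumption; the slides do not name an activation function.

```python
def relu(values):
    """ReLU activation: negative values become 0."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: output j = sum_i inputs[i] * weights[j][i] + biases[j]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Pass x through every hidden layer (dense + ReLU), then a linear output layer."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return dense(x, weights, biases)


# Illustrative weights only: 2 inputs -> 3 neurons -> 2 neurons -> 1 output.
layers = [
    ([[1, 0], [0, 1], [1, -1]], [0, 0, 0]),  # hidden layer 1
    ([[1, 1, 1], [1, -1, 0]], [0, 0.5]),     # hidden layer 2
    ([[2, 1]], [1]),                         # output layer
]

print(forward([1.0, 2.0], layers))  # → [7.0]
```

Dropping the second hidden layer from `layers` would turn this into the shallow network described on the previous slide.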

CNN (Convolutional Neural Network)
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks most commonly applied to analyzing visual imagery. A CNN model consists of three types of layers:
- Convolutional layer
- Pooling layer
- Fully connected layer

Convolutional layer: a convolutional layer convolves its input with a set of kernels (filters) and passes the result to the next layer. The objective of the convolution operation is to extract features from the input image. The first convolutional layer captures low-level features such as edges, color, and gradient orientation; with added layers, the architecture adapts to high-level features.
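A minimal pure-Python sketch of the operation a convolutional layer performs on one channel, assuming "valid" padding and stride 1. As in CNN libraries, "convolution" here is implemented as cross-correlation (the kernel is not flipped). The kernel is hand-set to a Sobel-style vertical-edge detector for illustration; in a CNN the kernel values are learned.

```python
def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation, 'valid' padding, stride 1."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            # Weighted sum of the kh x kw patch whose top-left corner is (y, x).
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(W - kw + 1)
        ]
        for y in range(H - kh + 1)
    ]


# Toy 4x5 image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 9, 9, 9] for _ in range(4)]
# Sobel-x kernel: responds to left-to-right intensity changes.
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

print(conv2d_valid(image, sobel_x))  # → [[36, 36, 0], [36, 36, 0]]
```

The strong responses sit where the patch straddles the edge, and the flat bright region gives 0: a low-level edge feature of the kind the first convolutional layer picks up.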


Pooling layer: the pooling layer reduces the spatial size of the convolved feature. This dimensionality reduction decreases the computational power required to process the data. There are two types of pooling: max pooling and average pooling.
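Both pooling types can be sketched in a few lines of pure Python. The 2x2 window with stride 2 is an assumption (a common default); the slides do not give pooling parameters.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """2D pooling over one feature map with a square window."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    H, W = len(fmap), len(fmap[0])
    out = []
    for y in range(0, H - size + 1, stride):
        row = []
        for x in range(0, W - size + 1, stride):
            window = [fmap[y + i][x + j] for i in range(size) for j in range(size)]
            # Max pooling keeps the strongest activation; average pooling smooths.
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


fmap = [[1, 3, 2, 0],
        [4, 6, 1, 1],
        [0, 2, 5, 7],
        [1, 1, 3, 9]]

print(pool2d(fmap, mode="max"))  # → [[6, 2], [2, 9]]
print(pool2d(fmap, mode="avg"))  # → [[3.5, 1.0], [1.0, 6.0]]
```

A 4x4 map becomes 2x2, so the next layer processes four times fewer values.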


Literature Review

Reference: Performance Comparison of ResNet50V2 and VGG16 Models for Feature Extraction in Deep Learning
Contribution: compares the performance of ResNet50V2 and VGG16 for feature extraction in image classification tasks.
Drawback: suggests that, while both models are effective, VGG16 may be less efficient due to slower convergence and lower accuracy in certain tasks.
Key contribution: ResNet50V2 outperformed VGG16, converging faster and achieving higher accuracy in masked face recognition.

Reference: Human Action Recognition from Various Data Modalities
Contribution: reviews the use of various data modalities in HAR, including applications of ResNet and VGG16.
Drawback: does not provide a direct comparison between the models.
Key contribution: highlights the importance of multimodal data for improving the accuracy of HAR systems.

Literature Review

Reference: Modern architectures convolutional neural networks in human activity recognition
Contribution: discusses the role of modern CNN architectures such as ResNet and VGG16 in HAR.
Drawback: does not detail the specific drawbacks of each model in the context of HAR.
Key contribution: emphasizes advances in CNN architectures that enhance HAR performance.


CNN Architecture
We used two CNN architectures:
- VGG-16
- ResNet-50
Both architectures performed strongly in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which evaluates algorithms for object detection and image classification at large scale.

VGG16 (Visual Geometry Group): VGG16 was developed by the Visual Geometry Group at the University of Oxford and was a top entry in the ILSVRC 2014 (ImageNet) competition, winning the localization task and placing second in classification. It has 16 weight layers:
- Convolutional layers: 13
- Fully connected layers: 3
- Total: 16
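The "16" counts only layers with weights; the five max-pooling layers are not counted. The pure-Python sketch below walks the standard VGG16 configuration and tallies weight layers and parameters. It assumes the original ImageNet setting (224x224 RGB input, 1000 output classes, biases included); a model fine-tuned for a different number of classes would have a different final-layer count.

```python
# Standard VGG16 layout: numbers are 3x3 conv output channels, "M" is 2x2 max pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_summary(in_channels=3, input_size=224, num_classes=1000):
    """Return (conv layers, fully connected layers, total parameters)."""
    conv_layers, params = 0, 0
    channels, size = in_channels, input_size
    for item in VGG16_CFG:
        if item == "M":
            size //= 2  # each max pool halves height and width
        else:
            params += (3 * 3 * channels + 1) * item  # weights + one bias per filter
            channels = item
            conv_layers += 1
    # Flattened 7x7x512 feature map -> 4096 -> 4096 -> num_classes.
    fc_sizes = [channels * size * size, 4096, 4096, num_classes]
    for n_in, n_out in zip(fc_sizes, fc_sizes[1:]):
        params += (n_in + 1) * n_out
    return conv_layers, len(fc_sizes) - 1, params


print(vgg16_summary())  # → (13, 3, 138357544)
```

Almost 90% of the roughly 138 million parameters sit in the three fully connected layers, which is one reason VGG16 is heavy to train.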

ResNet-50: ResNet won the ILSVRC 2015 (ImageNet) classification challenge. ResNet-50 contains 50 layers.
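The "50" can be checked with simple arithmetic, assuming the standard ImageNet ResNet-50: one 7x7 stem convolution, four stages of bottleneck blocks ([3, 4, 6, 3]) with three convolutions each, and one final fully connected layer. The toy `residual` function below is illustrative only; it shows the skip connection that defines ResNet, where a block outputs F(x) + x, so a branch that learns F = 0 simply passes its input through.

```python
RESNET50_BLOCKS = [3, 4, 6, 3]  # bottleneck blocks per stage (standard ResNet-50)
CONVS_PER_BOTTLENECK = 3        # 1x1 reduce -> 3x3 -> 1x1 expand


def resnet50_weight_layers():
    """Count the layers that give ResNet-50 its name."""
    stem_conv = 1                                   # initial 7x7 convolution
    block_convs = sum(RESNET50_BLOCKS) * CONVS_PER_BOTTLENECK
    classifier = 1                                  # final fully connected layer
    # 1x1 projection shortcuts are conventionally left out of the count.
    return stem_conv + block_convs + classifier


def residual(x, f):
    """Identity shortcut: output = F(x) + x, element-wise on a list."""
    return [fx + xi for fx, xi in zip(f(x), x)]


print(resnet50_weight_layers())                                # → 50
print(residual([1.0, 2.0], lambda v: [0.0] * len(v)))          # → [1.0, 2.0]
print(residual([1.0, 2.0], lambda v: [0.5 * a for a in v]))    # → [1.5, 3.0]
```

Because each block only has to learn a correction to its input, very deep networks like ResNet-50 remain trainable.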


Materials


Methodology
Tools: hardware and software requirements
- CPU: 64-bit
- RAM: 32 GB
- Operating system: Windows 11
- Programming language: Python

Data Collection
The data were collected from a Kaggle data repository; we use the UCF101 dataset. It has 101 human action classes, and each class contains more than 100 videos on average. Frames will be extracted from the videos, and background elements will be removed before we begin processing the data. We will keep the frames at a resolution of 224 × 224 pixels.
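The slides do not say how many frames are taken from each video or how they are chosen. A common choice, assumed here, is a fixed number of evenly spaced frames, one from the middle of each equal-length segment of the clip. `sample_frame_indices` is a hypothetical helper. In practice the chosen frames could be read with OpenCV (`cv2.VideoCapture`, seeking via `cv2.CAP_PROP_POS_FRAMES`) and resized with `cv2.resize(frame, (224, 224))`; that part is left out so the sketch runs without extra dependencies.

```python
def sample_frame_indices(num_frames_in_video, num_samples):
    """Hypothetical helper: pick evenly spaced frame indices covering the clip.

    The clip is split into `num_samples` equal segments and the centre frame
    of each segment is chosen. Short clips return every frame they have.
    """
    if num_frames_in_video <= 0 or num_samples <= 0:
        raise ValueError("both counts must be positive")
    if num_samples >= num_frames_in_video:
        return list(range(num_frames_in_video))
    segment = num_frames_in_video / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


print(sample_frame_indices(100, 4))  # → [12, 37, 62, 87]
print(sample_frame_indices(3, 5))    # → [0, 1, 2]
```

Sampling across the whole clip, rather than taking the first N frames, keeps the beginning, middle, and end of the action in the training data.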

Model Working Process
Extracting frames → Background subtraction with Mask R-CNN → Training a CNN (ResNet) on the frames

Background subtraction using Mask R-CNN
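Mask R-CNN returns, for each detected object instance, a class label, a confidence score, and a per-pixel mask. One plausible way to turn those outputs into background subtraction (an assumption; the slides do not spell out the rule) is to keep the union of confident "person" masks and blank every other pixel. The sketch uses plain lists; the per-detection dict format, the 0.5 thresholds, and the helper names are hypothetical. For reference, torchvision's Mask R-CNN models return per-image dicts with `labels`, `scores`, and `masks` tensors and use label 1 for the COCO "person" class, so real outputs would need reshaping into this format.

```python
PERSON_LABEL = 1  # COCO "person" id in torchvision's label map (assumption for this sketch)


def foreground_mask(detections, height, width, keep_labels=frozenset({PERSON_LABEL}),
                    score_thresh=0.5, mask_thresh=0.5):
    """Hypothetical helper: union of masks of confident detections of the kept classes."""
    mask = [[False] * width for _ in range(height)]
    for det in detections:
        if det["label"] not in keep_labels or det["score"] < score_thresh:
            continue  # skip other classes and low-confidence detections
        for y in range(height):
            for x in range(width):
                if det["mask"][y][x] >= mask_thresh:
                    mask[y][x] = True
    return mask


def remove_background(frame, mask, fill=0):
    """Hypothetical helper: keep pixels inside the mask, replace the rest with `fill`."""
    return [[px if keep else fill for px, keep in zip(row, mrow)]
            for row, mrow in zip(frame, mask)]


frame = [[10, 20, 30],
         [40, 50, 60]]
detections = [
    {"label": 1, "score": 0.9, "mask": [[0.9, 0.1, 0.0], [0.8, 0.7, 0.2]]},   # confident person
    {"label": 3, "score": 0.95, "mask": [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]},  # other class (car)
    {"label": 1, "score": 0.3, "mask": [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]},   # low-confidence person
]
mask = foreground_mask(detections, height=2, width=3)
print(remove_background(frame, mask))  # → [[10, 0, 0], [40, 50, 0]]
```

Whether to keep only people, or also objects the action involves (for example a bicycle for Biking), is a design choice the slides leave open; `keep_labels` accepts several class ids for that reason.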

ResNet-50 Model

Model Development
After gathering the data, we first extracted images (frames) from each video. We then removed the background, keeping only the most crucial components required to detect the target object.


Evaluation
Slide annotations: "80% Training Testing Accuracy"; "More than 90% accuracy in new videos"; "Background element was the issue".
Figure: Training accuracy vs. testing accuracy and training loss vs. testing loss of VGG16.


Figure: Training accuracy vs. testing accuracy and training loss vs. testing loss of ResNet50.


Model comparison:

Model     Accuracy   Precision   Recall   F1 score
ResNet    93.93%     95%         93%      94%
VGG-16    51.68%     47%         56%      52%
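The slides report accuracy, precision, recall, and F1 without stating how per-class scores are averaged over the classes. The sketch below assumes macro averaging (the unweighted mean over classes) and computes all four metrics from lists of true and predicted labels; the toy labels are made up for illustration. Macro F1 is the mean of the per-class F1 scores, which in general differs from the harmonic mean of macro precision and macro recall, so an F1 column need not be recomputable from the precision and recall columns.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # c predicted as something else
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Made-up labels for three of the UCF101 class names mentioned earlier.
y_true = ["walk", "walk", "bike", "bike", "swing", "swing"]
y_pred = ["walk", "bike", "bike", "bike", "swing", "walk"]

print([round(m, 3) for m in macro_metrics(y_true, y_pred)])
# → [0.667, 0.722, 0.667, 0.656]
```

Here the harmonic mean of macro precision (0.722) and macro recall (0.667) would be about 0.693, not the macro F1 of 0.656.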


Conclusion
The same approach can be applied to other video classification problems.

Limitations
- Lack of a large original dataset with a variety of subjects.
- The study relies only on built-in CNN architectures.

Future Directions
- A custom object detector is needed.
- A CNN+LSTM model could be implemented.
- Pose estimation values could be added to the model.

Bibliography
- Lima, T., Fernandes, B. and Barros, P., 2017. Human action recognition with 3D convolutional neural network. In 2017 IEEE Latin American Conference on Computational Intelligence (LA-CCI), pp. 1-6. doi:10.1109/LA-CCI.2017.8285700.
- Saoudi, E.M., Jaafari, J. and Andaloussi, S.J., 2023. Advancing human action recognition: A hybrid approach using attention-based LSTM and 3D CNN. Scientific African, 21, p.e01796.
- de la Torre Frade, F., Martinez Marroquin, E., Santamaria Perez, M.E. and Moran Moreno, J.A., 1997. Moving object detection and tracking system: a real-time implementation.
- LeCun, Y. and Bengio, Y., 1995. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10), p.1995.
- Li, L., Huang, W., Gu, I.Y.H. and Tian, Q., 2003. Foreground object detection from videos containing complex background. In Proceedings of the eleventh ACM international conference on Multimedia, pp. 2-10.
- Zhou, Q., 2001. Tracking and classifying moving objects from videos. In Proc. 2nd IEEE Workshop on Performance Evaluation of Tracking and Surveillance.
- Pham, H.H., Khoudour, L., Crouzil, A., Zegers, P. and Velastin, S.A., 2022. Video-based human action recognition using deep learning: a review. arXiv preprint arXiv:2208.03775.
- Yang, C., Mei, F., Zang, T., Tu, J., Jiang, N. and Liu, L., 2023. Human Action Recognition Using Key-Frame Attention-Based LSTM Networks. Electronics, 12(12), p.2622.

THANKS!