Hybrid Algorithms for Summarization of Video Surveillance Systems 6_3_2023.pptx
SAJJAD253504
35 slides
Mar 08, 2025
About This Presentation
AI project
Size: 236.45 MB
Language: en
Added: Mar 08, 2025
Slides: 35 pages
Slide Content
Hybrid Algorithms for Summarization of Video Surveillance Systems
Supervisors: Prof. Dr. Hazeem B. Taher; Assist. Prof. Dr. Karim Q. Hussein
Student: Sajjad Haitham Hendi
Introduction Video summarization (VS) refers to the process of reducing a long video to a shorter version while retaining its key content and overall meaning. 5/27/2024 PRESENTATION TITLE 2
Video Summarization
Video Summarization Environments (figure): Outdoor Environments, Indoor Environments
Problem Statement
Crucial Role of Surveillance Systems
- Essential for security in various environments: commercial buildings, residential complexes, public spaces, and outdoor facilities.
Challenge of Video Data Volume
- Surveillance systems generate large volumes of video data.
- Security personnel find it difficult to review and analyze extensive footage effectively.
Manual Monitoring Limitations
- Manually monitoring long video streams is time-consuming and impractical.
- Potential for delayed detection of and response to security incidents.
Core Issue: Lengthy Surveillance Videos
- Difficulty in efficiently condensing lengthy surveillance video data.
Objective of Video Summarization
- Create shorter video descriptions.
- Preserve significant events and activities from the original lengthy footage.
Aims
Scholarly Contribution towards Surveillance Systems Advancement
- Objective: Develop an efficient video summarization approach.
- Focus: Autonomously extract significant events and activities.
- Scope: Applicable to both indoor and outdoor surveillance videos.
- Outcome: Produce succinct and informative summaries of surveillance footage.
Efficient Video Content Understanding for Security Personnel
- Goal: Enable security personnel to identify potential threats and respond effectively.
- Method: Minimize the need for costly manual monitoring.
- Impact: Create safer and more secure environments for individuals and communities.
Scope and Limitation
- Surveillance videos
- Semantic-based activity/events
- Visual content-based summarization
- Dynamic summary
- Single-view video summarization
Contributions
1- Proposes a novel approach for extracting facial features in real time, enabling effective identification and summarization of surveillance videos in indoor environments.
2- Proposes a new hybrid method that integrates two different deep learning algorithms to detect and classify significant events in outdoor surveillance videos.
3- Obtains, at the time of writing this dissertation, the best results among the compared methods for summarizing surveillance videos on the VIRAT dataset, and the best face recognition results on the three face datasets.
Proposed Models
Model 1 (Outdoor): Detects abnormal events outside buildings and produces a summary video containing the detected events.
Model 2 (Indoor): Detects unauthorized persons who enter and roam the building and produces a summary video of their movements inside the building.
Datasets
1- Outdoor Model: VIRAT Video Dataset Release 2.0 (VIRAT Ground Dataset). Figure panels: Raw Videos, Event Shots.
2- Indoor Model: three facial datasets (MUCT, ORL, Yale) and case study videos.
Proposed Outdoor Model
Outdoor-Preprocessing
Input video -> Convert to Gray -> Histogram Equalization -> Resize (224,224) -> Output video
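The three preprocessing steps can be sketched per frame as follows. This is a minimal illustration, not the authors' code: NumPy stands in for whatever image library was actually used, the grayscale weights and nearest-neighbour resize are assumptions, and all function names are hypothetical.

```python
import numpy as np

def to_gray(frame_rgb):
    """Luminance-weighted grayscale conversion (assumed BT.601 weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.clip(frame_rgb @ weights, 0, 255).astype(np.uint8)

def equalize_hist(gray):
    """Global histogram equalization via the cumulative distribution."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest level present
    total = gray.size
    if total == cdf_min:               # constant image: nothing to spread
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]                   # apply the lookup table pixel-wise

def resize_nearest(gray, size=(224, 224)):
    """Nearest-neighbour resize (assumed; the slide does not name a method)."""
    h, w = gray.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return gray[np.ix_(rows, cols)]

def preprocess(frame_rgb):
    """Gray -> histogram equalization -> resize to 224x224."""
    return resize_nearest(equalize_hist(to_gray(frame_rgb)))
```

Calling `preprocess` on an RGB frame of any size returns a 224x224 `uint8` array, which matches the 224*224*1 input listed for the feature extractor.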
Outdoor-Features Extraction (Proposed CNN)
Input video frame: 224*224*1
Block 1: Conv1_1 224*224*32, Conv1_2 224*224*32, Max-Pooling 112*112*32
Block 2: Conv2_1 112*112*64, Conv2_2 112*112*64, Max-Pooling 56*56*64
Block 3: Conv3_1 56*56*128, Conv3_2 56*56*128, Max-Pooling 28*28*128
Block 4: Conv4_1 28*28*256, Conv4_2 28*28*256, Max-Pooling 14*14*256
Block 5: Conv5_1 14*14*512, Conv5_2 14*14*512, Conv5_3 14*14*512, Max-Pooling 7*7*512
Block 6: Conv6_1 7*7*1024, Conv6_2 7*7*1024, Conv6_3 7*7*1024, Max-Pooling 3*3*1024
Flatten: 1*9216
Output: CNN features for a single frame
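The layer sizes listed on this slide can be checked with a short shape trace. The sketch below assumes size-preserving convolutions and 2x2 max pooling with stride 2 and no padding (so 7 pools down to 3); these settings are inferred from the listed shapes, not stated on the slide, and the function name is hypothetical.

```python
def trace_cnn_shapes(size=224, channels=1):
    """Trace (name, height, width, channels) through the six conv blocks."""
    # (filters, number of conv layers) per block, as listed on the slide
    blocks = [(32, 2), (64, 2), (128, 2), (256, 2), (512, 3), (1024, 3)]
    shapes = [("input", size, size, channels)]
    for idx, (filters, n_convs) in enumerate(blocks, start=1):
        for j in range(1, n_convs + 1):
            # assumed: padded convolutions keep the spatial size
            shapes.append((f"Conv{idx}_{j}", size, size, filters))
        size //= 2  # assumed: 2x2 max pooling, stride 2, floor division
        shapes.append((f"MaxPool{idx}", size, size, filters))
    flat = size * size * blocks[-1][0]
    return shapes, flat

shapes, flat = trace_cnn_shapes()
for name, h, w, c in shapes:
    print(f"{name:10s} {h}*{w}*{c}")
print("Flatten    1*%d" % flat)
```

Under these assumptions the final pooled map is 3*3*1024, so the flattened vector has 3 * 3 * 1024 = 9216 entries, in agreement with the slide.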
Outdoor-Training and Testing Phase
Training: CNN features (Flatten 1*9216) of the training data -> GRU (32 units) -> GRU (16 units) -> Dropout (0.4) -> Dense ReLU (8 units) -> Dense Softmax (11 units) -> Events Classifier
Testing: testing data -> Events Classifier -> predicted class, compared with the test event class
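To make the layer stack concrete, here is a forward pass through a classifier of the same shape, written in plain NumPy. It is only a sketch: the weights are random and untrained, the input is assumed to be a sequence of per-frame 9216-dimensional CNN vectors (the slide does not say how frames are grouped), dropout is omitted because it is inactive at inference, and all names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(rng, n_in, n_units):
    """Random (untrained) weights for the update, reset, and candidate paths."""
    scale = 1.0 / np.sqrt(n_in)
    return {
        "W": rng.normal(0, scale, (3, n_in, n_units)),   # input weights
        "U": rng.normal(0, 0.1, (3, n_units, n_units)),  # recurrent weights
        "b": np.zeros((3, n_units)),
    }

def gru_forward(params, sequence):
    """Run a GRU over a (time, features) array; return all hidden states."""
    W, U, b = params["W"], params["U"], params["b"]
    h = np.zeros(U.shape[-1])
    states = []
    for x in sequence:
        z = sigmoid(x @ W[0] + h @ U[0] + b[0])           # update gate
        r = sigmoid(x @ W[1] + h @ U[1] + b[1])           # reset gate
        cand = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])  # candidate state
        h = (1.0 - z) * h + z * cand
        states.append(h)
    return np.stack(states)

def classify(sequence, rng):
    """GRU(32) -> GRU(16) -> Dense ReLU(8) -> Dense Softmax(11)."""
    gru1 = init_gru(rng, sequence.shape[1], 32)
    gru2 = init_gru(rng, 32, 16)
    w_dense = rng.normal(0, 0.1, (16, 8))
    w_out = rng.normal(0, 0.1, (8, 11))
    h1 = gru_forward(gru1, sequence)        # (time, 32)
    h2 = gru_forward(gru2, h1)[-1]          # last hidden state, (16,)
    hidden = np.maximum(0.0, h2 @ w_dense)  # Dense + ReLU
    logits = hidden @ w_out
    exp = np.exp(logits - logits.max())     # numerically stable softmax
    return exp / exp.sum()
```

Feeding a `(frames, 9216)` array yields an 11-element probability vector, one entry per event class.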
Outdoor-Validation
Input video -> Convert to Gray -> CLAHE -> Resize (640,480) -> Movement Extraction (KNN) -> Cars & Persons Extraction (Yolo7) -> ROIs -> Resize (224,224) -> Features Extraction (Proposed CNN) -> Vectors of CNN Features -> GRU Prediction (Events Classifier) -> Detected Events
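Once each frame carries a predicted event label, the summary video is built from the frames where an event was detected. The slides do not describe that step, so the following is only one plausible sketch: it groups consecutive frames with the same label into segments. The label names, the `background` marker, and `min_length` are hypothetical.

```python
def event_segments(frame_labels, background="none", min_length=1):
    """Group consecutive frames with the same event label.

    Returns a list of (label, first_frame, last_frame) tuples, skipping
    background frames and runs shorter than min_length.
    """
    segments = []
    start = None
    current = background
    # A trailing background label flushes the final open segment.
    for idx, label in enumerate(list(frame_labels) + [background]):
        if label != current:
            if current != background and idx - start >= min_length:
                segments.append((current, start, idx - 1))
            current, start = label, idx
    return segments

labels = ["none", "none", "run", "run", "run", "none", "enter", "enter"]
print(event_segments(labels))
# -> [('run', 2, 4), ('enter', 6, 7)]
```

The frame ranges returned here would then be copied from the input video into the summary; raising `min_length` is a simple way to suppress one-frame false alarms.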
Proposed Indoor Model
Indoor-Preprocessing
Input images -> Convert to Gray -> CLAHE -> Faces Cropping -> Resize (100,100) -> Output Face Image
Indoor-Features Extraction
Four parallel branches, each applying Uniform LBP and GLCM(1,2,3,4) and producing:
- 10000 LBP features
- 16 GLCM features: 4 Contrast, 4 Correlation, 4 Homogeneity, 4 Energy
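The count of 16 GLCM features follows from four texture properties computed at four settings. The sketch below assumes that (1,2,3,4) are pixel distances along the horizontal direction, which the slide does not confirm, and takes energy as the square root of the angular second moment, one common convention. Function names are hypothetical and NumPy is used in place of any specific image library.

```python
import numpy as np

def glcm_features(gray, distance, levels=256):
    """Contrast, correlation, homogeneity, energy of a horizontal-offset GLCM."""
    left = gray[:, :-distance].ravel()
    right = gray[:, distance:].ravel()
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (left, right), 1.0)   # count co-occurring gray-level pairs
    glcm += glcm.T                        # make the matrix symmetric
    p = glcm / glcm.sum()                 # normalise to joint probabilities
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    energy = np.sqrt(np.sum(p ** 2))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    if sd_i * sd_j == 0:                  # constant image: define correlation as 1
        correlation = 1.0
    else:
        correlation = np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    return contrast, correlation, homogeneity, energy

def glcm_vector(gray, distances=(1, 2, 3, 4)):
    """Concatenate the four properties over all distances: 4 x 4 = 16 values."""
    return np.array([f for d in distances for f in glcm_features(gray, d)])
```

For a 100x100 face crop, `glcm_vector` returns 16 values; a per-pixel LBP code map of the same crop would supply the 100 * 100 = 10000 LBP features listed above.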
Indoor-Features Reduction (1D-CNN)
Applied to the 10000 LBP features of each of the four branches:
Layer | Input Shape | Output Shape
Conv1D (32,3) | (10000, 1) | (9998, 32)
MaxPooling1D | (9998, 32) | (4999, 32)
Conv1D (32,3) | (4999, 32) | (4997, 64)
MaxPooling1D | (4997, 64) | (2498, 64)
Flatten | (2498, 64) | (159872, 1)
Dense (128 Units) | (159872, 1) | (128, 1)
Dense (275 Units) | (128, 1) | (275, 1)
Per branch: 275 LBP features + 16 GLCM features
Total over four branches: 1164 features
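The shapes and the final feature count on this slide can be reproduced with a few lines of arithmetic. The sketch assumes unpadded ('valid') convolutions with kernel size 3 and non-overlapping pooling of size 2; it also assumes the second Conv1D layer has 64 filters, because the listed output shapes carry 64 channels even though the layer is labelled (32,3). Function names are hypothetical.

```python
def reduction_shapes(n_lbp=10000, filters=(32, 64), kernel=3, pool=2):
    """Trace output shapes through the 1D-CNN used for LBP feature reduction."""
    length = n_lbp
    shapes = []
    for n_filters in filters:
        length = length - kernel + 1           # 'valid' convolution, stride 1
        shapes.append(("Conv1D", (length, n_filters)))
        length //= pool                        # non-overlapping max pooling
        shapes.append(("MaxPooling1D", (length, n_filters)))
    shapes.append(("Flatten", (length * filters[-1],)))
    shapes.append(("Dense", (128,)))
    shapes.append(("Dense", (275,)))
    return shapes

def total_features(n_branches=4, n_lbp_reduced=275, n_glcm=16):
    """Features per branch and over all branches after reduction."""
    per_branch = n_lbp_reduced + n_glcm
    return per_branch, n_branches * per_branch

for name, shape in reduction_shapes():
    print(name, shape)
print(total_features())
```

The trace gives 2498 * 64 = 159872 flattened values, and 275 + 16 = 291 features per branch, or 4 * 291 = 1164 in total, matching the figures on the slides.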
Indoor-Classification (ANN)
Input: 291 features (input shape of a single image with a single angle)
Number of classes: 276 (MUCT), 40 (ORL), 10 (Yale)
Epochs: 100
Batch size: 64
Learning rate: 0.001
Indoor-Validation
Input video -> Convert to Gray -> CLAHE -> Resize (640,480) -> Movement Extraction (KNN) -> Person Extraction (Yolo7) -> Face Detection (CNN) -> Detected Faces -> Faces Cropping -> Resize (100,100) -> Features Extraction (URIGLCM-LBP) -> 1D-CNN Features Reduction -> ANN Prediction (Faces Classifier)
Outdoor-Results
- Data Split: 70% training and 30% testing.
- Training: 100 epochs, batch size of 64, Adam optimizer, learning rate of 0.001.
- Loss Function: Categorical cross-entropy loss for weight adjustment.
- Performance Evaluation: Accuracy and runtime metrics, comparison with contemporary methods.
Figure: Distribution of samples of the VIRAT dataset
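Two items in this setup are easy to illustrate: the 70/30 split and the categorical cross-entropy loss. The sketch below uses only the Python standard library; the shuffling, the seed, and the function names are assumptions for illustration and say nothing about how the authors partitioned VIRAT.

```python
import math
import random

def split_70_30(samples, seed=0):
    """Shuffle and split samples into 70% training and 30% testing."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(round(0.7 * len(items)))
    return items[:cut], items[cut:]

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one sample: minus the log of the probability of the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

train, test = split_70_30(range(100))
print(len(train), len(test))

# True class is index 1; the classifier assigns it probability 0.7.
loss = categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
print(loss)
```

The loss is small when the softmax output puts most of its mass on the true class and grows without bound as that probability approaches zero, which is what drives the weight updates during training.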
Outdoor-Results
Metric | Classification Score
Test Accuracy | 79.0178%
Test Precision | 77.2348%
Test Recall | 79.0178%
Test F1 Score | 76.7138%
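The identical accuracy and recall values are what support-weighted averaging over classes produces, since weighted recall always equals overall accuracy. Whether the authors used weighted averaging is an assumption; the sketch below shows the computation on a made-up 3-class confusion matrix, not on the VIRAT results.

```python
def weighted_metrics(confusion):
    """Accuracy and support-weighted precision, recall, F1.

    confusion[t][p] is the number of samples of true class t predicted as p.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[c][c] for c in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = confusion[c][c]
        support = sum(confusion[c])                           # true samples of class c
        predicted = sum(confusion[t][c] for t in range(n_classes))
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support if support else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support / total
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1

# Hypothetical confusion matrix, for illustration only.
toy = [[8, 2, 0],
       [1, 5, 4],
       [0, 1, 9]]
accuracy, precision, recall, f1 = weighted_metrics(toy)
print(accuracy, precision, recall, f1)
```

With weighted averaging the F1 score can fall below both precision and recall, as it does in the table above, because it is an average of per-class F1 values and not the harmonic mean of the averaged precision and recall.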
Outdoor-Comparisons to other models
Method | Accuracy
InceptionV3 | 53.19%
CMRA [152] | 62.9%
DHCM [151] | 66.8%
ResNet50 | 71.94%
ResNet50 [102] | 75.22%
Proposed Model | 79.01%
Indoor-Results: Training Results
Neural network classifier
Dataset | Test size | Accuracy
ORL | 40% | 100%
MUCT | 30% | 99.98%
Extended Yale B | 20% | 99.17%
- Data Split: 70% training and 30% testing.
- Training: 100 epochs, batch size of 64, Adam optimizer, learning rate of 0.001.
- Loss Function: Categorical cross-entropy loss for weight adjustment.
- Performance Evaluation: Accuracy and runtime metrics, comparison with contemporary methods.
Indoor-Results
Figure: loss curves and accuracy curves, MUCT dataset
Indoor-Comparisons to other models
Dataset | Method | Accuracy
ORL | Improved BP NW [153] | 97.40%
ORL | HISTO+CNN [154] | 98.25%
ORL | Proposed Model | 100%
MUCT | LBP | 96.22%
MUCT | HOG+GIST+CCA+FS [37] | 98.99%
MUCT | BBO with (PCA, SVM) [155] | 99%
MUCT | Proposed Model | 99.98%
Extended Yale B | LRR-BSS [156] | 95.50%
Extended Yale B | DCM [157] | 97.79%
Extended Yale B | Proposed Model | 99.17%
Indoor-Validation Results
Video ID | Unauthorized Frames | Detection Frames | False Negative | False Positive | True Positive
Video 1 | 22 | 25 | 1 | 1 | 21
Video 2 | 17 | 18 | 1 | 2 | 16
Video 3 | 28 | 29 | | 1 | 28
Video 4 | | | | |
Total | 67 | 72 | 2 | 6 | 65
Accuracy = 97.01%, Precision = 91.54%, Recall = 97.01%, F-score = 94.19%
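The summary scores can be recomputed from the Total row (TP = 65, FP = 6, FN = 2) with the standard definitions. The sketch below is an independent check under that assumption, not the authors' evaluation script; the function name is hypothetical.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F-score from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

precision, recall, f_score = detection_metrics(tp=65, fp=6, fn=2)
print(f"precision={precision:.4f} recall={recall:.4f} f_score={f_score:.4f}")
# -> precision=0.9155 recall=0.9701 f_score=0.9420
```

These agree with the slide's precision, recall, and F-score to within 0.01 percentage points, and the reported accuracy of 97.01 coincides with the recall value 65/67.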
Conclusions - Outdoor
- A novel hybrid approach is presented for identifying and categorizing significant events in videos.
- The model targets outdoor video summarization.
- The proposed model utilizes a CNN for visual feature extraction and a GRU for temporal modeling and event recognition.
- Evaluation was conducted using the VIRAT dataset.
Conclusions - Outdoor
- Empirical results showed satisfactory performance: the model achieved 79% accuracy in event categorization.
- This demonstrated the model's effectiveness in detecting noteworthy events in outdoor environments.
- The model proved capable of operating in real-world scenarios and producing summaries of significant events in surveillance videos.
Conclusions - Indoor
- An innovative facial recognition model is introduced, combining the feature extraction methods URIGLCM-LBP and 1D-CNN with an ANN for classification.
- The model achieved an accuracy of 99.98% on the MUCT dataset, demonstrating strong class differentiation, precision, and recall.
- Comparative analysis showed the proposed model's superior performance in face recognition, outperforming the compared models on all three datasets.
Published Manuscripts
N | Title | Publisher
1 | Outdoor Surveillance videos Summarization Model Utilizing CNN-GRU | Scopus
2 | Digital Video Summarization: A Survey | Scopus
3 | Advanced Facial Recognition with LBP-URIGL Hybrid Descriptors | Scopus
4 | Automated Video Events Detection and Classification using CNN-GRU Model | Iraq