1. Updated MSc Thesis Presentation.v2.pptx

JamalHussainArman 42 views 31 slides Sep 12, 2024

About This Presentation

Network Intrusion Detection using ML


Slide Content

M.Sc Thesis Defence (2022/9/29)
Machine Learning Based Model for Network Intrusion Detection
Supervised by: Engr. Dr. Fazal Muhammad
Presented by: Jamal Hussain Arman

Publications
- Jamal Hussain Arman, Fazal Muhammad, Bilal Khan et al., “Performance Assessment of Random Forest Based Model for Network Intrusion Detection Systems,” submitted to Intelligent Automation and Soft Computing in August 2022 (under review).
- Fazal Muhammad, Jamal Hussain Arman, Bilal Khan et al., “Random Committee Based Model for Network Intrusion Detection Systems,” submitted to Computers, Materials and Continua in August 2022 (under review).
- Jamal Hussain Arman, Fazal Muhammad, Bilal Khan et al., “Network Intrusion Detection System using Random Forest and Random Committee Models,” accepted and presented at the International Conference on Internet of Thing in September 2022.

Outline
- Introduction
- Literature Review
- Research Gap
- Proposed Solution
- Methodology of Study
- Attack Types
- Types of Intrusion Detection System and Approaches
- Machine Learning Types and Process
- Datasets
- 10-Fold Cross-validation Training Model
- Applied ML Algorithms
- Performance metrics
- Results
- Conclusion and Future work
- References

Introduction
- Demand for cyber security and protection.
- Popularity of the Internet of Things.
- Confidentiality, integrity and availability of network data [1].
- Smart grids are currently using data-driven technology [2].
- Firewalls and encryption.
- For Internet-based cyber-attacks, an intrusion detection system (IDS) is the better option.
- Global ransomware damage costs were projected to exceed $20 billion by 2021 [3].
- Global cybercrime costs are expected to reach $10.5 trillion annually by 2025 [4].

Literature Review (1/2)
Year | Title | Best Result | Compared with
2021 | Machine learning and deep learning methods for intrusion detection systems: recent developments and challenges | The profound DA gets a higher precision of 98.39% compared to other states |
2021 | Machine learning methods for cyber security intrusion detection: Datasets and comparative study | DT classifier gives the best result with 99% for NSL-KDD data set | For the CSE-CIC-IDS 2018 data set, the best SVM result is the SVM Quadratic algorithm with 99.81%
2021 | Analysis, Design, and Comparison of Machine-Learning Techniques for Networking Intrusion Detection | KNN 0.9957 and ANN 0.9923 |
2021 | Detection of Cyber bullying using Machine Learning | SVM with N-grams is the best way to go, 0.6572 Accuracy | NB: 0.5962
2020 | IntruDTree: A Machine Learning Based Cyber Security Intrusion Detection Model | IntruDTree Model 0.982 Accuracy | NB: 0.9, LR: 0.94, SVM: 0.96, KNN: 0.94
2020 | Cyber Intrusion Detection Using Machine Learning Classification Techniques | RT: 94% | BN: 90% and DT: 93%
2020 | Hierarchical Intrusion Detection Using Machine Learning and Knowledge Model | Knowledge based IDS: 0.99 | C4.5: 0.969, RF: 0.964, Forest PA: 0.975, Ensemble Model: 0.976
2019 | Network Intrusion Detection using Supervised Machine Learning Technique with Feature Selection | RT: 99.1 | NB: 83.63, KNN: 95.13, SVM: 98.23

Literature Review (2/2)
Ref No. | Techniques | Dataset | Accuracy | Other
[5] | Decision Tree and k-Nearest Neighbor | CICIDS2017 | 99.49% and 99.52% |
[6] | Decision Tree, Support Vector Machine | NSL-KDD, CSE-CIC-IDS and ISCX | 99.92%, 99.81% & 99.8% |
[7] | KNN and ANN | KDD-99 | 0.9959 and 0.9926 |
[8] | Decision Tree, Decision Stump | KDD99 | 96.218, 93.9811, 95.5413, 92.0629 | SMO, HT, and RF
[9] | Random Tree | NSL-KDD | 94% | BN, NB and DT
[10] | Random Tree | NSL-KDD | 99.81% | NB, KNN, SVM 98.23
[11] | ANN and SVM | NSL-KDD | 94.04 and 82.34 |

Research Gaps
1. Traditional methods like firewall and encryption
2. Usage of old datasets
3. Only one evaluation metric, i.e. accuracy
4. Problems arising in traditional techniques
   4.1 Low accuracy, high error, high false positive rate or low precision
5. Feature reduction

Proposed Solutions
- Advancement of modern methods like Machine Learning
- Usage of up-to-date datasets
- Considered multiple evaluation metrics
- To design an ML-based model for intrusion detection.
- An optimal model for intrusion detection matters most, without changing the networking data.

Methodology (Flowchart of the study)

Attack Types
- Probe
  - Scanning of system
  - Low level attack
- Denial of Service
  - Prevents authorized users from getting access
  - Continuously engages the system
- Remote to User
  - Abuse the privileges of a system
  - Release vulnerabilities within network
- User to Root
  - Attacker or a genuine person with minimal or normal privileges
  - Attacker looks for system flaws

Types of Intrusion Detection System
- Intrusion detection system (IDS)
  - Analyze and monitor the traffic
  - Malicious activity
  - Protect the computer network
- Host Based IDS
  - HIDS relies on a single system
  - Keep an eye on a host's internal environment
  - Resources, file system, and programs
- Network Based IDS
  - Composed of the networks
  - Inbound and outbound traffic patterns [12]

Types of Intrusion Detection Approaches
- Misuse/Signature Based
  - Attacks have signatures
  - File fingerprints
  - Comparing the signatures of every activity
- Anomaly Based
  - Looks for unexpected activity
  - Differs from the normal operational baseline
  - Likelihood of detecting novel (zero-day) threats
  - Preferable
- Hybrid Based
  - Detection rate of zero-day assaults rises
  - Combination of the signature and anomaly approaches
  - Generally deployed as a hybrid arrangement
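The three approaches can be contrasted with a toy sketch. This is an illustration under assumptions, not the detection logic used in the thesis: the fingerprint strings, the bytes-per-second baseline, and the threshold `k` are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical fingerprints of known attacks (assumption, for illustration only).
KNOWN_SIGNATURES = {"deadbeef", "cafebabe"}

def signature_alert(fingerprint):
    """Signature-based: flag activity whose fingerprint matches a known attack."""
    return fingerprint in KNOWN_SIGNATURES

def anomaly_alert(value, baseline, k=3.0):
    """Anomaly-based: flag a value more than k standard deviations from the baseline mean."""
    return abs(value - mean(baseline)) > k * stdev(baseline)

def hybrid_alert(fingerprint, value, baseline, k=3.0):
    """Hybrid: raise an alert if either detector fires."""
    return signature_alert(fingerprint) or anomaly_alert(value, baseline, k)

# Hypothetical baseline of normal traffic volume (bytes per second).
normal_traffic = [100, 102, 98, 101, 99]

known_attack = hybrid_alert("cafebabe", 101, normal_traffic)   # caught by signature
novel_attack = hybrid_alert("unseen", 500, normal_traffic)     # caught by anomaly check
benign = hybrid_alert("unseen", 101, normal_traffic)           # neither detector fires
```

The second call shows why the anomaly check raises the chance of catching zero-day activity: no signature exists for it, yet it deviates from the baseline.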

Machine Learning Types Figure 2. Machine Learning Types

Machine Learning Process Figure 3. Machine Learning Process

Datasets
- NSL-KDD
  - Binary class dataset, either normal or anomaly
  - Reference for NIDS performance evaluation
  - Updated version of KDD Cup 99
  - Absence of redundant records and duplicate records
- UNSW-NB
  - Binary class dataset
  - Contains different types of new attacks
  - Number of real-time normal activities
  - Australian Centre for Cyber Security website
- Kaggle
  - Multiclass dataset
  - 5 categories: normal, denial of service, r2l, probe and u2r [13]

Table 1: Distribution of Statistics of Datasets into Training and Testing sets
Dataset | Class type | Training set (samples) | Testing set (samples)
Kaggle | Multi-class | 113376 | 12597
NSL-KDD | Binary class | 113376 | 12597
UNSW-NB15 | Binary class | 74099 | 8233

10-Fold Cross-Validation Training Model
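A minimal sketch of how a 10-fold split partitions a dataset, in plain Python. The function names are hypothetical, and no shuffling or stratification is applied, which the thesis tooling may well do. The totals used below (125973 and 82332) are an assumption obtained by summing the training and testing counts in Table 1.

```python
def k_fold_sizes(n_samples, k=10):
    """Return a list of (train_size, test_size) pairs, one per fold."""
    base, extra = divmod(n_samples, k)
    sizes = []
    for fold in range(k):
        # The first `extra` folds take one additional test sample.
        test_size = base + (1 if fold < extra else 0)
        sizes.append((n_samples - test_size, test_size))
    return sizes

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) index lists for each fold, in order."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop

nsl_kdd_folds = k_fold_sizes(125973)   # 113376 + 12597 from Table 1
unsw_folds = k_fold_sizes(82332)       # 74099 + 8233 from Table 1
```

Under that assumption, a fold of 113376 training and 12597 testing samples appears in `nsl_kdd_folds`, and one of 74099 and 8233 appears in `unsw_folds`, so the counts in Table 1 are consistent with a single fold of a 10-fold split. Every sample is used for testing exactly once across the ten folds.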

Applied ML Algorithms
Random Forest
- Supervised learning technique
- Regression and classification problems
- Combines a number of decision trees
- Lowers the chance of overfitting
- No need for feature scaling
- Effective on big databases.
- It is an ensemble classification technique, based on the DT algorithm, that provides individual trees as output.
- This algorithm combines random feature selection with the bagging concept to generate a set of DTs having controlled variances [14]
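The combination of bagging and random feature selection can be sketched in plain Python. This is a toy version under loud assumptions: each tree is cut down to a single split (a depth-one tree) to keep the example short, whereas a real Random Forest grows deeper trees. All function names and the two-feature toy data are hypothetical, and this is not the implementation used in the thesis.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, feature_ids):
    """Fit a depth-one tree: the (feature, threshold) split with the fewest errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not right:
                continue  # splitting at the maximum value separates nothing
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # every candidate feature is constant in this sample
        label = majority([y for _, y in rows])
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]

def train_forest(rows, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    total_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bagging: draw with replacement
        feature_ids = rng.sample(range(total_features), n_features)
        forest.append(train_stump(sample, feature_ids))
    return forest

def predict(forest, x):
    """Classify x by majority vote over all trees."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

# Hypothetical toy records: (feature vector, label).
rows = ([((i, 2 * i), "normal") for i in range(10)]
        + [((i + 20, 2 * (i + 20)), "anomaly") for i in range(10)])
forest = train_forest(rows)
low = predict(forest, (0, 0))
high = predict(forest, (29, 58))
```

Because each tree sees a different resample and a different feature subset, individual trees disagree on borderline points, and the vote averages those disagreements out. That is the variance control the slide refers to.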

Random Forest Architecture

Applied ML Techniques for Comparative Analysis
- A1DE
- Naive Bayes
- KNN
- AdaBoost
- Random Tree
- Decision Stump
- Hoeffding Tree

Performance Metrics
The performance evaluation is done in terms of metrics that assess the efficiency of a model using a helpful representation commonly known as a confusion matrix. The confusion matrix defines the performance of classification models [15].
- True positive (TP): the model predicts P as positive
- True negative (TN): the model predicts N as negative
- False positive (FP): the model predicts N as positive
- False negative (FN): the model predicts P as negative
Representation of classification model results in the confusion matrix
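The four confusion-matrix cells can be counted directly from paired labels. The sketch below is illustrative only: the function name and the six example records are hypothetical, and "anomaly" is assumed to be the positive class.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, TN, FP, FN for a binary problem with the given positive label."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1   # predicted negative, actually negative
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        else:
            fn += 1   # predicted negative, actually positive
    return tp, tn, fp, fn

# Hypothetical labels for six connections.
actual = ["anomaly", "anomaly", "normal", "normal", "normal", "anomaly"]
predicted = ["anomaly", "normal", "normal", "anomaly", "normal", "anomaly"]
tp, tn, fp, fn = confusion_counts(actual, predicted, positive="anomaly")
```

For these six records the counts are TP = 2, TN = 2, FP = 1 and FN = 1.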

Performance Metrics and Mathematical Forms
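The formulas on this slide are not reproduced in the transcript. The sketch below uses the standard textbook definitions of the metrics named in the results, which is an assumption about what the slide showed. In particular, error rate is taken here as 1 minus accuracy, which may differ from the error measure reported in the results tables. The function name and the example counts are hypothetical.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate (recall)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0   # false positive rate
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "tpr": tpr,
        "fpr": fpr,
        "error_rate": 1.0 - accuracy,   # assumption: complement of accuracy
        "mcc": mcc,
    }

# Hypothetical counts for illustration.
m = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```

With these counts, accuracy is 185/200 = 0.925, TPR is 0.9 and FPR is 0.05. Reporting several metrics together, as the thesis proposes, exposes cases where high accuracy hides a poor false positive rate.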

RF Technique Results
Performance Metrics | NSL-KDD | Kaggle | UNSW NB
Accuracy (%) | 99.9174 | 99.8857 | 99.9053
Precision | 0.999 | 0.999 | 0.999
TPR | 0.999 | 0.999 | 0.999
FPR | 0.001 | 0.001 | 0.001
Error Rate | 0.0028 | 0.0014 | 0.0108
MCC | 0.998 | 0.998 | 0.998
ROC Area | 1 | 1 | 1

NSL-KDD Dataset Results
Classifier | Accuracy (%) | TPR | FPR | Precision | Error Rate | MCC | ROC Area
Random Forest | 99.9174 | 0.999 | 0.001 | 0.999 | 0.0028 | 0.998 | 1
A1DE | 99.7952 | 0.998 | 0.002 | 0.998 | 0.0022 | 0.996 | 1
Naïve Bayes | 90.3813 | 0.904 | 0.101 | 0.905 | 0.0965 | 0.807 | 0.966
IBK/KNN | 90.3813 | 0.904 | 0.101 | 0.905 | 0.0965 | 0.807 | 0.966
AdaBoostM1 | 94.5044 | 0.945 | 0.057 | 0.945 | 0.079 | 0.89 | 0.988
Random Tree | 99.7658 | 0.998 | 0.002 | 0.998 | 0.0023 | 0.995 | 0.998
Decision Stump | 92.215 | 0.922 | 0.079 | 0.922 | 0.1436 | 0.844 | 0.92
Hoeffding Tree | 98.849 | 0.988 | 0.012 | 0.989 | 0.0161 | 0.977 | 0.995

Kaggle Dataset Results
Classifier | Accuracy (%) | TPR | FPR | Precision | Error Rate | MCC | ROC Area
Random Forest | 99.8857 | 0.999 | 0.001 | 0.999 | 0.0014 | 0.998 | 1
A1DE | 99.792 | 0.998 | 0.001 | 0.998 | 0.0009 | 0.997 | 1
Naïve Bayes | 83.3996 | 0.834 | 0.046 | 0.91 | 0.0665 | 0.786 | 0.966
IBK/KNN | 99.665 | 0.997 | 0.002 | 0.997 | 0.0014 | 0.994 | 0.997
AdaBoostM1 | 83.1519 | 0.832 | 0.12 | 0.984 | 0.153 | 0.9677 | 0.952
Random Tree | 99.7293 | 0.997 | 0.002 | 0.997 | 0.0011 | 0.996 | 0.998
Decision Stump | 83.1519 | 0.832 | 0.12 | 0.984 | 0.1104 | 0.9677 | 0.882
Hoeffding Tree | 97.2573 | 0.973 | 0.018 | 0.971 | 0.0152 | 0.954 | 0.989

UNSW-NB Dataset Results
Classifier | Accuracy (%) | TPR | FPR | Precision | Error Rate | MCC | ROC Area
Random Forest | 99.9053 | 0.999 | 0.001 | 0.999 | 0.0108 | 0.998 | 1
A1DE | 99.4243 | 0.994 | 0.005 | 0.994 | 0.007 | 0.988 | 1
Naïve Bayes | 76.8243 | 0.768 | 0.205 | 0.802 | 0.2335 | 0.572 | 0.864
IBK/KNN | 98.721 | 0.987 | 0.013 | 0.987 | 0.0128 | 0.974 | 0.987
AdaBoostM1 | 99.3575 | 0.994 | 0.008 | 0.994 | 0.0373 | 0.987 | 0.999
Random Tree | 99.2943 | 0.993 | 0.007 | 0.993 | 0.007 | 0.986 | 0.993
Decision Stump | 76.6324 | 0.766 | 0.286 | 0.835 | 0.3287 | 0.579 | 0.738
Hoeffding Tree | 96.6028 | 0.966 | 0.038 | 0.966 | 0.0529 | 0.932 | 0.981

Conclusion
- Random Forest gives the best performance
- 10-Fold Cross-Validation
- 3 datasets: NSL-KDD, UNSW NB15 and Kaggle
- Compared with RT, A1DE, NB, KNN, AdaBoostM1, DS and HT
Future Directions
- Deep learning models
- Newer and real-time datasets

References
[1] K. NandhaKumar and S. Sukumaran, “A hybrid adaptive development algorithm and machine learning based method for intrusion detection and prevention system,” Turkish J. Comput. Math. Educ., vol. 12, no. 5, pp. 1226–1236, 2021.
[2] S. N. Mohan, G. Ravikumar and M. Govindarasu, “Distributed Intrusion Detection System using Semantic-based Rules for SCADA in Smart Grid,” 2020 IEEE/PES Transmission and Distribution Conference and Exposition (T&D), 2020, pp. 1–5.
[3] “Global ransomware damage costs to exceed $265 billion by 2031 - EIN presswire.” https://www.einnews.com/pr_news/542950077/global-ransomware-damage-costs-to-exceed-265-billion-by-2031 (accessed Jun. 03, 2022).
[4] “Cybercrime to cost the world $10.5 trillion annually by 2025.” https://cybersecurityventures.com/cybercrime-damages-6-trillion-by-2021/ (accessed Jan. 17, 2022).
[5] M. Sarnovsky and J. Paralic, “Hierarchical intrusion detection using machine learning and knowledge model,” Symmetry (Basel)., vol. 12, no. 2, pp. 1–14, 2020.
[6] M. Shahzad Haroon and H. Mansoor Ali, “Adversarial training against adversarial attacks for machine learning-based intrusion detection systems,” Comput. Mater. Contin., vol. 73, no. 2, pp. 3513–3527, 2022.
[7] S. A. Hussein, A. A. Mahmood and E. O. Oraby, “Network intrusion detection system using ensemble learning approaches,” Webology, vol. 18, no. Special Issue, pp. 962–974, 2021.

References
[8] S. Razdan, H. Gupta and A. Seth, “Performance analysis of network intrusion detection systems using j48 and naive bayes algorithms,” 2021 6th Int. Conf. Converg. Technol. I2CT 2021, pp. 1–7, 2021.
[9] Z. Ahmad, A. S. Khan, C. W. Shiang, J. Abdullah and F. Ahmad, “Network intrusion detection system: A systematic study of machine learning and deep learning approaches,” Trans. Emerg. Telecommun. Technol., vol. 32, no. 1, pp. 1–29, 2021.
[10] M. Data and M. Aritsugi, “T-DFNN: An incremental learning algorithm for intrusion detection systems,” IEEE Access, vol. 9, pp. 154156–154171, 2021.
[11] R. Panigrahi, S. Borah, A. K. Bhoi, M. F. Ijaz, M. Pramanik et al., “A consolidated decision tree-based intrusion detection system for binary and multiclass imbalanced datasets,” Mathematics, vol. 9, no. 7, 2021.
[12] D. Chou and M. Jiang, “A survey on data-driven network intrusion detection,” ACM Comput. Surv., vol. 54, no. 9, pp. 1–36, 2022.
[13] S. Lee, A. Abdullah, N. Jhanjhi and S. Kok, “Classification of botnet attacks in IoT smart factory using honeypot combined with machine learning,” PeerJ Comput. Sci., vol. 7, pp. 1–23, 2021.
[14] Z. K. Maseer, R. Yusof, N. Bahaman, S. A. Mostafa, and C. F. M. Foozy, “Benchmarking of machine learning for anomaly based intrusion detection systems in the CICIDS2017 dataset,” IEEE Access, vol. 9, pp. 22351–22370, 2021.
[15] P. Dini and S. Saponara, “Analysis, design, and comparison of machine-learning techniques for networking intrusion detection,” Designs, vol. 5, no. 1, pp. 1–22, 2021.

Thank You