Economic impact of phishing detection systems.pptx
EzehSamuelElochukwu
12 slides
Aug 10, 2024
About This Presentation
Economic importance of phishing detection systems in the 21st century
Slide Content
01 ALEX EKWUEME FEDERAL UNIVERSITY NDUFU-ALIKE
Department of Computer Science / Informatics
A PROJECT ON DESIGN AND IMPLEMENTATION OF A URL PHISHING DETECTION SYSTEM USING HYBRID MACHINE LEARNING ALGORITHM
By: EZEH ELOCHUKWU SAMUEL
25th JULY, 2024
2 CONTENTS
* Introduction / Background
* Statement of the Problem
* Aim and Objectives
* Literature Gaps
* Methodology
* Expected Results and Justification
* Conclusion
3 INTRODUCTION Phishing is a type of social engineering attack often used to steal user data, including login credentials and credit card numbers. Phishing attacks are a major threat in today’s digital world. Phishers employ deceptive tactics to trick users into revealing sensitive information or clicking malicious links. These attacks are becoming increasingly sophisticated, making it difficult for traditional methods to detect them.
04 INTRODUCTION (Cont.)
Increasing Sophistication:
* Phishing attacks are constantly evolving, making them harder to detect.
* Attackers use social engineering techniques to target specific individuals and exploit vulnerabilities.
* Legitimate websites are cloned with close attention to detail.
Impact of Phishing:
* Financial losses for individuals and organizations
* Identity theft and data breaches
* Loss of trust in online transactions
5 PROBLEM STATEMENT
The limitations of existing phishing detection methods:
* Existing phishing detection methods often rely on signature-based approaches or simple heuristics.
* Traditional methods such as blacklisting and whitelisting cannot adapt to new phishing techniques as they emerge.
* Users do not have a deep understanding of URL syntax and structure.
* Users do not have much time to look up a URL, or they visit web pages without checking them.
* Users are unable to differentiate between legitimate and phishing websites.
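The blacklisting limitation can be illustrated with a minimal sketch. The blacklist entry, the helper `is_blacklisted`, and both URLs are hypothetical and chosen only for illustration; real blacklists are larger and often match on more than the host name.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist of known phishing hosts (illustrative entry only).
BLACKLIST = {"paypa1-login.example.com"}


def is_blacklisted(url: str) -> bool:
    """Return True only if the URL's host exactly matches a blacklist entry."""
    host = urlsplit(url).hostname or ""
    return host.lower() in BLACKLIST


known = "http://paypa1-login.example.com/signin"
# The attacker registers a slightly different host that is not yet listed.
variant = "http://paypa1-login2.example.com/signin"

print(is_blacklisted(known))    # True
print(is_blacklisted(variant))  # False
```

The second URL passes the check even though it differs from the listed host by one character, which is the adaptation gap that feature-based detection tries to close.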
6 AIM & OBJECTIVES
The aim of this research is to develop a hybrid feature-based phishing detection approach that effectively identifies phishing websites. The objectives are to:
* Gather a comprehensive dataset of labelled URLs from the Kaggle dataset repository, containing both phishing and legitimate websites.
* Extract relevant features from the URLs that are indicative of phishing attempts.
* Implement a hybrid machine learning architecture that combines linear and non-linear ML models (Logistic Regression as the linear model, Random Forest and Gradient Boosting as the non-linear models) for enhanced detection capabilities.
* Train and optimize the chosen hybrid model on the preprocessed dataset.
* Test and evaluate the phishing detection system based on the hybrid machine learning model.
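The feature extraction objective could look roughly like the sketch below. The seven features and the function name `extract_features` are assumptions made for illustration; they are not the feature set of the dataset used in this project, and the subdomain count is a simplification that assumes a two-label registered domain.

```python
import re
from urllib.parse import urlsplit

# Matches a dotted-quad host such as 192.168.10.5 (no range validation).
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(url: str) -> dict:
    """Compute a few lexical features from the URL string alone."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    host_is_ip = bool(IPV4_RE.match(host))
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
        "host_is_ip": int(host_is_ip),
        "uses_https": int(parts.scheme == "https"),
        # Simplification: labels beyond "name.tld" count as subdomains.
        "num_subdomains": 0 if host_is_ip else max(host.count(".") - 1, 0),
    }


features = extract_features("http://192.168.10.5/secure-login/update.php")
for name, value in features.items():
    print(f"{name}: {value}")
```

Each URL becomes a fixed-length numeric record, which is the form a classifier such as Random Forest or Logistic Regression expects as input.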
7 LITERATURE GAPS
One gap identified in previous works is robustness, which remains underexplored in the current literature. To assess the real-world applicability of a detection algorithm, it is crucial to evaluate the resilience of hybrid models against evasion techniques, adversarial attacks, and unforeseen challenges. While hybrid approaches are potentially more robust than single analysis-based approaches, few studies have investigated this to date.
Several works have adopted the hybrid approach, for example WebPhish, presented by Opara et al. (2023), which implements a deep neural network trained on embedded raw URLs and HTML to detect phishing attacks. WebPhish showed an accuracy of 98.1%. However, WebPhish can only detect zero-day phishing attacks containing known HTML and URL content. If the attack involves manipulation of the webpage content, WebPhish cannot recognize it, because the approach is strictly dependent on the training set.
8 METHODOLOGY
The overall methodology adopted for developing the hybrid feature-based URL phishing detection system is CRISP-DM together with object-oriented programming. The Kaggle dataset for phishing websites is used in this work; it includes 11,430 URLs with 87 extracted features. The hybrid machine learning model architecture combines Random Forest, Gradient Boosting, and Logistic Regression.
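The slides do not state how the three models are combined. One common option is soft voting, where the phishing probabilities from the base models are averaged and then thresholded. The sketch below assumes that scheme; the function `soft_vote`, the 0.5 threshold, and the probability values are hypothetical.

```python
from statistics import mean


def soft_vote(probabilities: list[float], threshold: float = 0.5) -> tuple[float, int]:
    """Average per-model phishing probabilities and apply a decision threshold."""
    avg = mean(probabilities)
    return avg, int(avg >= threshold)


# Hypothetical phishing probabilities for one URL from the three base models.
model_outputs = {
    "random_forest": 0.80,
    "gradient_boosting": 0.70,
    "logistic_regression": 0.30,
}

avg, label = soft_vote(list(model_outputs.values()))
print(f"{avg:.2f}", label)  # 0.60 1
```

Here the two tree-based models outweigh the linear model's low score, so the URL is labelled as phishing (1). Stacking or majority voting would be alternative ways to combine the same three models.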
09 RESULTS
The performance metrics of the hybrid model are: accuracy = 84%, precision = 85%, recall = 86%, F1-score = 84%, ROC AUC = 91%.
Strengths and Weaknesses of the Model Based on the Result Metrics
Strengths:
* High overall performance: The system achieves an accuracy of 84%, indicating a good ability to correctly classify URLs.
* Balanced detection: Precision and recall are within one percentage point of each other, suggesting the system handles false positives and false negatives about equally well.
* High ROC AUC (0.91): This metric indicates a high ability to distinguish between phishing and legitimate URLs.
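For reference, accuracy, precision, recall, and F1-score can all be derived from the four confusion-matrix counts. The sketch below uses hypothetical counts that are not taken from this project's evaluation; the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive standard metrics from confusion-matrix counts (phishing = positive)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,  # share of flagged URLs that are phishing
        "recall": recall,        # share of phishing URLs that were flagged
        "f1": f1,                # harmonic mean of precision and recall
    }


# Hypothetical counts for a test set of 200 URLs.
metrics = classification_metrics(tp=80, fp=20, fn=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

ROC AUC is not computable from a single confusion matrix, because it summarizes performance across all decision thresholds.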
11 RESULTS (cont.)
Weakness:
* Potential for false positives and negatives: Although precision and recall are high, an accuracy of 84% means that roughly 16% of URLs are still misclassified.
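False positives and false negatives trade off against each other through the decision threshold. The sketch below shows that trade-off on made-up scores and labels; the data and the helper `count_errors` are hypothetical and do not come from the project's model.

```python
def count_errors(scores, labels, threshold):
    """Return (false positives, false negatives) at the given threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


# Hypothetical phishing scores and true labels (1 = phishing, 0 = legitimate).
scores = [0.95, 0.80, 0.65, 0.40, 0.55, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for threshold in (0.3, 0.5, 0.7):
    fp, fn = count_errors(scores, labels, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

On this toy data, lowering the threshold to 0.3 removes the false negatives but flags two legitimate URLs, while raising it to 0.7 does the opposite. Which side to favour depends on whether a missed phishing URL or a blocked legitimate one is more costly.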
12 CONCLUSION
The proposed method combines three machine learning classifiers: Random Forest, Gradient Boosting, and Logistic Regression. The evaluation indicates a well-functioning model with promising capabilities. The high AUC and balanced metrics demonstrate its effectiveness in differentiating phishing attempts from legitimate URLs while limiting false positives.