DETECTION OF PHISHING WEBSITE FROM
URL'S BY USING CLASSIFICATION
TECHNIQUES project.pptx
PavanSomisetty1
28 slides
Jun 24, 2024
Slide Content
Project Review on
DETECTION OF PHISHING WEBSITES FROM URLS BY USING CLASSIFICATION TECHNIQUES
Under the esteemed guidance of Mr. M. VENKAT RAO (M.Tech), Assistant Professor, IT Department
Department of Information Technology
BATCH: 6
PRODUCED BY:
N. VAMSI KRISHNA 18FE1A1230
P. SUMA 18FE1A1235
R. SRI PAVAN 18FE1A1239
N. SAI MAHESH 18FE1A1229
TITLE (24/05/2021): DETECTION OF PHISHING WEBSITES FROM URLS BY USING CLASSIFICATION TECHNIQUES
CONTENTS: Title, Abstract, Introduction, Literature Survey on Papers, Methodology
ABSTRACT: The internet is growing stronger day by day, and it makes our lives easier through the many applications that run in the cyber world. However, with the development of the internet, cyber attacks have gradually increased and identity theft has emerged. Phishing is a type of fraud in which intruders use fake web pages to obtain people's private information, such as user IDs and passwords. Machine learning has been used to detect and prevent these types of intrusion.
INTRODUCTION: Phishing is the most commonly used social engineering and cyber attack. Through such attacks, the phisher targets naïve online users by tricking them into revealing confidential information, with the purpose of using it fraudulently. To avoid getting phished:
1. Users should be aware of phishing websites.
2. Keep a blacklist of phishing websites, which requires knowing which websites have already been detected as phishing.
3. Detect phishing websites early in their appearance, using machine learning and deep neural network algorithms.
INTRODUCTION: Of the above three, the machine learning based method has proven to be more effective than the others. Even so, online users are still being tricked into revealing sensitive information on phishing websites. Legitimate URLs are collected from the dataset provided by the University of New Brunswick. Phishing URLs are collected from an open-source service called PhishTank, which provides a set of phishing URLs in multiple formats, such as CSV and JSON, and is updated hourly.
INTRODUCTION: Traditional security mechanisms cannot prevent these attacks because they directly target the weakest part of the connection: the end users. The numbers of phishing websites detected in the first, second, and third quarters of 2020 were 165,772, 146,994, and 571,764 respectively. In total, 884,530 unique phishing websites were detected in the first three quarters of 2020. For the same quarters of 2019, the numbers were 180,768, 182,465, and 266,378, a total of 629,611. This is an increase of approximately 40% in phishing websites in one year.
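The year-over-year figure above can be checked with a few lines of arithmetic. This is only a sketch that reuses the quarterly counts quoted on the slide; the variable names are illustrative.

```python
# Quarterly counts of detected phishing websites, as quoted on the slide.
q2020 = [165_772, 146_994, 571_764]
q2019 = [180_768, 182_465, 266_378]

total_2020 = sum(q2020)
total_2019 = sum(q2019)

# Relative increase from 2019 to 2020 over the first three quarters, in percent.
increase_pct = (total_2020 - total_2019) / total_2019 * 100

print(total_2020, total_2019, round(increase_pct, 1))
```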
CONTENTS: Literature Survey Paper-1, Literature Survey Paper-2
LITERATURE SURVEY PAPER 1: Feature Selections for the Machine Learning based Detection of Phishing Websites. Ebubekir Buber and Önder Demir presented this paper, which is about selecting features from phishing websites. In this research they used a blacklist of websites and some URL features as the parameters to predict the output.
LITERATURE SURVEY PAPER 2: Towards detection of phishing websites on client-side using machine learning based approach. Blacklists cannot identify new phishing websites. To overcome this problem, the paper examines various attributes of phishing and legitimate websites in depth and identifies nineteen distinguishing features that separate legitimate websites from phishing websites.
EXISTING METHODS: In recent years, many phishers have targeted naïve online users by tricking them into revealing confidential information, with the purpose of using it fraudulently. Existing methods consist of maintaining blacklists and checking some features of the URL, such as the domain type, subtype, and protocol.
PROPOSED METHODS: Our proposed system detects phishing websites from URLs by using machine learning techniques. We extract features such as the domain of the URL, the IP address of the URL, the '@' symbol in the URL, the length of the URL, and the depth of the URL. We can also detect phishing from the protocol, domain, and subdomain name. We also use hosting time as a parameter to predict the output.
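The URL-based features listed above could be extracted roughly as follows. This is a minimal sketch using only the Python standard library, not the project's actual code: the function name `extract_url_features`, the feature keys, and the reading of "depth" as the number of non-empty path segments are all assumptions made for illustration. Hosting time is left out because it cannot be derived from the URL string alone.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few lexical features of a URL (hypothetical feature names)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # 1 if the host part is a literal IP address instead of a domain name.
    try:
        ipaddress.ip_address(host)
        has_ip = 1
    except ValueError:
        has_ip = 0

    # Depth is taken here as the number of non-empty path segments.
    depth = len([segment for segment in parsed.path.split("/") if segment])

    return {
        "domain": host,
        "protocol": parsed.scheme,
        "has_ip": has_ip,
        "has_at": 1 if "@" in url else 0,
        "url_length": len(url),
        "url_depth": depth,
    }


print(extract_url_features("http://192.168.0.1/login/verify/account.php"))
print(extract_url_features("https://www.example.com/a/b"))
```

The resulting dictionaries can be turned into numeric rows for any of the classifiers discussed later.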
PROPOSED METHOD: Our proposed system consists of the following classifiers:
Random Forest Algorithm
Decision Tree
Logistic Regression
Naive Bayes
k-Nearest Neighbours
Support Vector Classifier
Gradient Boosting
CatBoost
XGBoost Classifier
METHODOLOGY: RANDOM FOREST: Random Forest is an ensemble learning algorithm for classification and regression. It generates a multitude of decision trees and classifies based on the aggregated decision of those trees. We used RandomForestClassifier from sklearn.ensemble, provided by scikit-learn. We experimented with 10 estimators (trees), using both presence and frequency features. Presence features performed better than frequency features, though the improvement was not substantial.
METHODOLOGY: RANDOM FOREST WORKING:
Step-1: Select K random data points from the training set.
Step-2: Build the decision tree associated with the selected data points (subset).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Steps 1 and 2 until N trees have been built.
Step-5: For a new data point, find the prediction of each decision tree and assign the new data point to the category that wins the majority vote.
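The steps above can be sketched in plain Python. To keep the example short, each "tree" here is a one-level decision stump rather than a full decision tree, and the random subset is drawn with replacement; both are simplifications chosen for illustration. All function names and the toy data (URL length, URL depth) are hypothetical, and this is not the scikit-learn implementation.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def build_stump(rows, labels):
    """Step 2 (simplified): pick the single feature/threshold split with fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority label
        label = majority(labels)
        return (0, float("inf"), label, label)
    return best[1:]


def stump_predict(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def train_forest(rows, labels, n_trees=10, k=None, seed=0):
    """Steps 1, 3 and 4: draw K random points and build a tree, N times."""
    rng = random.Random(seed)
    k = k or len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(k)]
        forest.append(build_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def forest_predict(forest, row):
    """Step 5: every tree votes and the majority category wins."""
    return majority([stump_predict(stump, row) for stump in forest])


# Toy data: [url_length, url_depth]; 0 = legitimate, 1 = phishing.
rows = [[20, 1], [25, 1], [30, 2], [90, 5], [110, 6], [120, 7]]
labels = [0, 0, 0, 1, 1, 1]
forest = train_forest(rows, labels, n_trees=10)
print(forest_predict(forest, [10, 1]), forest_predict(forest, [115, 6]))
```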
METHODOLOGY: DECISION TREES: A decision tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. In a decision tree there are two types of node: the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches. The tree is a graphical representation of all the possible solutions to a problem or decision based on given conditions.
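The two node types described above can be modelled directly as a small data structure. The sketch below is illustrative only: the class names `Decision` and `Leaf`, the hand-built tree, and the threshold of 54 characters are hypothetical and are not learned from the project's data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf node: holds the output and has no further branches."""
    label: str


@dataclass
class Decision:
    """Decision node: tests one feature and branches on the result."""
    feature: str
    threshold: float
    if_true: "Node"   # followed when features[feature] <= threshold
    if_false: "Node"  # followed otherwise


Node = Union[Leaf, Decision]


def classify(node: Node, features: dict) -> str:
    """Walk from the root through decision nodes until a leaf is reached."""
    while isinstance(node, Decision):
        if features[node.feature] <= node.threshold:
            node = node.if_true
        else:
            node = node.if_false
    return node.label


# Hand-built example tree with made-up thresholds.
tree = Decision(
    feature="has_at",
    threshold=0,
    if_true=Decision("url_length", 54, Leaf("legitimate"), Leaf("phishing")),
    if_false=Leaf("phishing"),
)

print(classify(tree, {"has_at": 0, "url_length": 30}))
```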
METHODOLOGY: HOW A DECISION TREE WORKS: [figure]
METHODOLOGY: LOGISTIC REGRESSION: Logistic regression is one of the most popular machine learning algorithms and belongs to the supervised learning techniques. It is used to predict a categorical dependent variable from a given set of independent variables, so the outcome must be a categorical or discrete value, such as Yes or No, 0 or 1, or True or False. However, instead of giving an exact value of 0 or 1, it gives a probabilistic value that lies between 0 and 1.
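The probabilistic output described above comes from passing a weighted sum of the inputs through the sigmoid function. The following is a minimal sketch of the prediction step only; the weights, bias, and the 0.5 decision threshold are hypothetical values chosen for illustration, not parameters fitted on the project's data.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features) -> float:
    """Probability of the positive class (here: phishing) for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(weights, bias, features, threshold=0.5) -> int:
    """Turn the probability into a discrete 0/1 outcome."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical weights for two binary features: [has_ip, has_at].
weights, bias = [2.0, 1.5], -1.0
print(predict_proba(weights, bias, [1, 1]), predict_label(weights, bias, [1, 1]))
print(predict_proba(weights, bias, [0, 0]), predict_label(weights, bias, [0, 0]))
```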
K-NEAREST NEIGHBORS: The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning classifier, which uses proximity to make classifications or predictions about the grouping of an individual data point. While it can be used for either regression or classification problems, it is typically used as a classification algorithm, working off the assumption that similar points can be found near one another.
For classification problems, a class label is assigned on the basis of a majority vote, that is, the label most frequently represented around a given data point is used. While this is technically considered "plurality voting", the term "majority vote" is more commonly used in the literature. The distinction is that "majority voting" technically requires a majority of greater than 50%, which primarily works when there are only two categories.
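The voting rule described above can be sketched in a few lines. This is an illustrative version that assumes Euclidean distance and a small in-memory training set; the function name `knn_predict` and the toy data (URL length, URL depth) are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Label a query point by plurality vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the k neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: [url_length, url_depth].
train_rows = [[20, 1], [25, 1], [30, 2], [90, 5], [110, 6], [120, 7]]
train_labels = ["legitimate"] * 3 + ["phishing"] * 3

print(knn_predict(train_rows, train_labels, [28, 2]))
print(knn_predict(train_rows, train_labels, [100, 6]))
```

With two classes and an odd k, the plurality vote is also a true majority vote, which is why odd values of k are a common choice for binary problems.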
NAÏVE BAYES: The simplest solutions are usually the most powerful ones, and Naïve Bayes is a good example of that. Despite the advances in machine learning in recent years, it has proven to be not only simple but also fast, accurate, and reliable. It has been used successfully for many purposes, and it works particularly well with natural language processing (NLP) problems.
NAÏVE BAYES EXAMPLE: [figure]
RANDOM FOREST: Random forest is a supervised learning algorithm. The "forest" it builds is an ensemble of decision trees, usually trained with the "bagging" method. The general idea of bagging is that a combination of learning models improves the overall result. Put simply, random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction. One big advantage of random forest is that it can be used for both classification and regression.
Random forest has nearly the same hyperparameters as a decision tree or a bagging classifier. Fortunately, there is no need to combine a decision tree with a bagging classifier, because you can simply use the random forest classifier class. With random forest, you can also handle regression tasks by using the algorithm's regressor.