OBJECTIVE
The objective of the Phishing Website Detection project is to design, develop, and implement a system that quickly and accurately identifies phishing websites, that is, fake websites that try to steal personal information.
APPROACH
1. Gather datasets containing both phishing and legitimate websites from open source platforms.
2. Write code to extract essential features from the URL database.
3. Use exploratory data analysis (EDA) techniques to analyse and preprocess the dataset.
4. Split the dataset into training and testing sets for model evaluation.
5. Implement various machine learning and deep neural network algorithms, such as SVM, Random Forest, and Autoencoder, and evaluate their performance using accuracy.
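The splitting step can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function name `train_test_split_rows`, the 80/20 ratio, the seed, and the toy rows are all assumptions. In practice a library helper such as scikit-learn's `train_test_split` would usually be used instead.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test) lists.

    Hypothetical helper: the name, default ratio, and seed are illustrative.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy rows of (url, label); label 1 = phishing, 0 = legitimate (made-up data).
rows = [(f"http://example.com/page{i}", i % 2) for i in range(10)]
train_rows, test_rows = train_test_split_rows(rows, test_fraction=0.2)
```

Fixing the seed keeps the split identical between runs, so accuracy figures from different models are compared on the same held-out rows.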
FEATURE SELECTION
The following categories of features are selected:
- Address Bar based Features
- Domain based Features
- HTML & JavaScript based Features

Address Bar based Features include:
- Domain of URL
- IP Address in URL
- Length of URL
- Depth of URL
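The four address bar features listed above can be computed from the URL string alone. The sketch below is one possible reading of them, not the project's actual extraction code: the function name `address_bar_features`, the dictionary keys, the stripping of a leading `www.`, and the definition of depth as the number of non-empty path segments are all assumptions. It also assumes the URL includes a scheme such as `http://`.

```python
import ipaddress
from urllib.parse import urlparse


def address_bar_features(url):
    """Return a dict of address bar features for one URL (illustrative names)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""   # empty when the URL has no scheme/host

    # Domain of URL: the host without a leading "www." (assumed convention).
    domain = host[4:] if host.startswith("www.") else host

    # IP Address in URL: 1 if the host is a literal IPv4/IPv6 address, else 0.
    try:
        ipaddress.ip_address(host)
        has_ip = 1
    except ValueError:
        has_ip = 0

    # Depth of URL: number of non-empty segments in the path (assumed definition).
    depth = sum(1 for part in parsed.path.split("/") if part)

    return {
        "domain": domain,
        "has_ip": has_ip,
        "url_length": len(url),    # Length of URL, in characters
        "url_depth": depth,
    }


features = address_bar_features("http://192.168.0.1/login/verify/index.html")
```

A pipeline could apply this function to every URL in the dataset and collect the resulting dictionaries into a feature table.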
Domain based Features are:
- DNS Record
- Website Traffic
- Age of Domain
- End Period of Domain

HTML & JavaScript based Features are:
- Iframe Redirection
- Disabling Right Click
- Forwarding
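The HTML and JavaScript based features can be approximated by pattern matching on the page source. The following is a rough sketch under stated assumptions, not the project's implementation: the function name `html_js_features`, the regular expressions, and the reading of "Forwarding" as the number of redirects followed while fetching the page are all hypothetical. The redirect count is passed in by the caller because this sketch does not perform any network requests.

```python
import re

# Any <iframe ...> tag in the page source (assumed signal for iframe redirection).
IFRAME_RE = re.compile(r"<\s*iframe\b", re.IGNORECASE)

# Common ways a page blocks the context menu (assumed, not exhaustive).
RIGHT_CLICK_RE = re.compile(
    r"oncontextmenu\s*="
    r"|event\.button\s*={2,3}\s*2"
    r"|addEventListener\(\s*['\"]contextmenu['\"]",
    re.IGNORECASE,
)


def html_js_features(html, redirect_count=0):
    """Return HTML/JavaScript based features for one page (illustrative names)."""
    return {
        "has_iframe": int(bool(IFRAME_RE.search(html))),
        "disables_right_click": int(bool(RIGHT_CLICK_RE.search(html))),
        "redirect_count": redirect_count,   # supplied by whatever fetched the page
    }


# Made-up page source for illustration only.
page = '<html><body oncontextmenu="return false"><IFRAME src="x"></IFRAME></body></html>'
page_features = html_js_features(page, redirect_count=2)
```

Regular expressions are a crude substitute for a real HTML parser, so a production extractor would likely need something more robust.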
MACHINE LEARNING MODELS
This is a supervised machine learning task. There are two major types of supervised machine learning problems: classification and regression. This dataset poses a classification problem, as each input URL is classified as phishing (1) or legitimate (0). The classification models trained on the dataset in this notebook are:
- Decision Tree
- Random Forest
- Multilayer Perceptrons
- XGBoost
- Autoencoder Neural Network
- Support Vector Machines
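As a rough illustration of the binary classification setup, the sketch below fits two of the listed model types with scikit-learn and predicts a 0/1 label for each held-out sample. It is not the notebook's code: the data is synthetic (generated with `make_classification`), and the hyperparameters, split ratio, and variable names are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the extracted URL features; y is 1 (phishing) or 0 (legitimate).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Two of the model families named above, with illustrative hyperparameters.
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0),
}

# Fit each model on the training split and predict labels for the test split.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
```

Keeping the models in a dictionary makes it straightforward to add further classifiers and evaluate them all in the same loop.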
MODEL EVALUATION
The figure shows the train and test accuracy of the ML models. According to the figure, XGBoost gives the best train and test accuracy.
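Accuracy is the fraction of samples whose predicted label matches the true label, and comparing it across models reduces to a sort. The sketch below shows the computation with toy labels. The helper names `accuracy` and `rank_by_test_accuracy`, the model names, and all label values are made up for illustration and are not the project's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rank_by_test_accuracy(results):
    """Sort model names by test accuracy, best first.

    results maps a model name to a (train_accuracy, test_accuracy) tuple.
    """
    return sorted(results, key=lambda name: results[name][1], reverse=True)


# Toy labels only: 1 = phishing, 0 = legitimate.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a_pred = [1, 0, 1, 0, 0, 0, 1, 0]   # one mistake
model_b_pred = [1, 1, 1, 0, 0, 1, 1, 0]   # three mistakes

# For brevity the same toy score is reused as both train and test accuracy.
results = {
    "model_a": (accuracy(y_true, model_a_pred), accuracy(y_true, model_a_pred)),
    "model_b": (accuracy(y_true, model_b_pred), accuracy(y_true, model_b_pred)),
}
ranking = rank_by_test_accuracy(results)
```

Reporting train and test accuracy side by side also exposes overfitting: a model whose train accuracy is much higher than its test accuracy has likely memorized the training set.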