Detecting Fake Accounts on Instagram.pptx

konteu 7 views 9 slides Oct 18, 2025

About This Presentation

Detecting Fake Accounts on Instagram
This presentation outlines a machine learning project aimed at identifying fake and spam accounts on Instagram. We'll cover the methodology from data loading and exploration to model building, evaluation, and hyperparameter tuning.


Slide Content


Project Overview & Objectives
Our goal is to build a classification model that accurately distinguishes genuine Instagram accounts from fake or spam ones. The project analyzes account features to predict authenticity.

1. Import Libraries: Essential Python libraries for data manipulation, visualization, and machine learning.
2. Load Dataset: Loading training and testing data from specified CSV files.
3. Preview Data: Initial inspection of the dataset structure and content.

Data Loading and Initial Preview
We begin by importing pandas, numpy, matplotlib, and seaborn for data handling and visualization, along with scikit-learn modules for model selection, preprocessing, and evaluation. The datasets, df_train and df_test, are loaded from Google Drive. Previewing the head of both dataframes shows the available features, including 'profile pic', 'nums/length username', and 'fullname words', along with the target variable 'fake'.
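The loading and preview step can be sketched roughly as follows. This is a minimal illustration, not the project's notebook: the helper name load_dataset is hypothetical, the real CSV paths are not given in the slides, and the four rows below are placeholder values standing in for the actual data.

```python
import io

import pandas as pd

# Placeholder rows for illustration only; these are NOT rows from the real dataset.
# The column names are the ones quoted on the slide.
SAMPLE_CSV = """profile pic,nums/length username,fullname words,fake
1,0.27,0,0
1,0.00,2,0
0,0.44,1,1
0,0.50,0,1
"""


def load_dataset(source):
    """Read a CSV from a file path or a file-like object into a DataFrame."""
    return pd.read_csv(source)


# In the project, `source` would be the path of a CSV on Google Drive.
# An in-memory buffer stands in for it here so the sketch is self-contained.
df_train = load_dataset(io.StringIO(SAMPLE_CSV))

# Quick preview: first rows and overall shape.
print(df_train.head())
print(df_train.shape)
```

The same helper would be called a second time for df_test. Passing a path string instead of a buffer works unchanged, since pandas.read_csv accepts both.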

Exploratory Data Analysis (EDA)
EDA is crucial for understanding the dataset's characteristics. We checked for missing values and confirmed there are no null entries. Descriptive statistics gave the distribution and range of each feature. The target variable 'fake' is balanced, with an equal number of fake and genuine accounts, which is ideal for training. A correlation heatmap reveals relationships between features and guides subsequent steps.

Figure: Distribution of Fake vs Genuine Accounts.
Figure: Correlation Heatmap of features.
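The checks described above map onto a handful of pandas calls. The sketch below runs them on a synthetic, randomly generated frame (an assumption made so the example is self-contained); only the column names come from the slides, and the values carry no meaning.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_train: random values, balanced target by construction.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "profile pic": rng.integers(0, 2, size=n),
    "nums/length username": rng.random(n),
    "fullname words": rng.integers(0, 5, size=n),
    "fake": np.repeat([0, 1], n // 2),
})

# Missing values per column (all zeros means no null entries).
missing_per_column = df.isnull().sum()

# Descriptive statistics: count, mean, std, min, quartiles, max.
summary = df.describe()

# Class balance of the target variable.
class_counts = df["fake"].value_counts().sort_index()

# Pairwise Pearson correlations; this matrix is what a heatmap would display.
corr = df.corr()

print(missing_per_column)
print(summary)
print(class_counts)
print(corr.round(2))
```

With seaborn and matplotlib installed, sns.countplot(x="fake", data=df) and sns.heatmap(corr, annot=True) would draw the two figures from these same objects.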

Feature Engineering & Model Building
Numerical features are standardized with StandardScaler (zero mean, unit variance), which helps algorithms that are sensitive to feature scale. The 'fake' column, our target, is then re-attached to the scaled data. For model building, the prepared dataset is split into training and testing sets in a 70/30 ratio. A RandomForestClassifier is selected for its robustness across diverse data types and is trained on the scaled training data.

Data Scaling: Standardizing features for optimal model performance.
Data Splitting: Dividing data into training and testing sets.
Random Forest: Training a robust classification model.
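A minimal sketch of this pipeline, in the order the slide describes (scale, re-attach the target, split 70/30, train), might look like the following. The data is generated with make_classification and the feature names and random_state values are placeholders, none of which come from the project.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; feature names are placeholders.
X_raw, y_raw = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=42
)
feature_names = [f"feature_{i}" for i in range(6)]
df = pd.DataFrame(X_raw, columns=feature_names)
df["fake"] = y_raw

# Standardize the feature columns, then re-attach the target column.
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(df[feature_names]), columns=feature_names)
scaled["fake"] = df["fake"].values

# 70/30 train/test split.
X = scaled.drop(columns="fake")
y = scaled["fake"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train the random forest on the scaled training data.
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Fitting the scaler on the training split only, and reusing it to transform the test split, is a common alternative that keeps test-set statistics out of preprocessing. The sketch keeps the order given in the slide.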

Feature Importance
Understanding which features contribute most to the model's predictions is crucial. The Random Forest model provides feature importances, which highlight the attributes most influential in distinguishing fake from genuine accounts. The importance plot helps interpret the model and can inform future feature selection. Features like 'profile pic' and '#followers' often show high importance in such classification tasks.
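Extracting and ranking the importances takes one attribute lookup on the fitted model. The sketch below assumes synthetic data and placeholder feature names, so the resulting ranking says nothing about the real Instagram features.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with placeholder feature names.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=1, random_state=0
)
names = [f"feature_{i}" for i in range(5)]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances, one per feature, normalized to sum to 1.
importances = pd.Series(model.feature_importances_, index=names)
importances = importances.sort_values(ascending=False)
print(importances)
```

With matplotlib available, importances.plot(kind="barh") would give a bar chart similar to the one the slide refers to.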

Model Evaluation
The model's performance is evaluated using several metrics: accuracy, classification report (precision, recall, f1-score), and confusion matrix. The initial model achieved an accuracy of 0.93, indicating strong performance. The confusion matrix visually represents the true positives, true negatives, false positives, and false negatives, providing a clear picture of where the model excels and where it might make errors.

Accuracy: 0.9306

Classification Report:

              precision    recall  f1-score   support

           0       0.92      0.96      0.94        93
           1       0.95      0.90      0.92        80

    accuracy                           0.93       173
   macro avg       0.93      0.93      0.93       173
weighted avg       0.93      0.93      0.93       173

Figure: Confusion Matrix for model evaluation.
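The three metrics can be computed with scikit-learn as sketched below. The slides do not list the confusion-matrix counts, so the values used here (89, 4, 8, 72) are an assumption: they are inferred from the reported recall and support, and they reproduce the reported accuracy and classification report after rounding. The label arrays are built from those counts and are not the model's actual predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# ASSUMED counts, inferred from the reported recall (0.96, 0.90) and support (93, 80).
# Class 0 = genuine, class 1 = fake.
tn, fp, fn, tp = 89, 4, 8, 72

# Rebuild label vectors that produce exactly these counts.
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_pred = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)

print(f"Accuracy: {acc:.4f}")
print(classification_report(y_true, y_pred))
print(cm)  # rows are true classes, columns are predicted classes
```

Under this reading, 161 of the 173 test accounts are classified correctly, with 4 genuine accounts flagged as fake and 8 fake accounts missed.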

Hyperparameter Tuning
To further optimize the model, GridSearchCV was used for hyperparameter tuning. It exhaustively searches a grid of parameter combinations for the Random Forest classifier, scoring each by cross-validation, with the aim of improving accuracy and generalization. The best parameters found were: bootstrap: True, max_depth: None, min_samples_leaf: 4, min_samples_split: 10, n_estimators: 100. With these parameters, the model achieved an accuracy of 0.9133, slightly below the 0.9306 of the initial model, so tuning did not improve test accuracy in this run.
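A grid search of this kind can be sketched as follows. The slides report only the winning parameters, not the grid that was searched, so the param_grid below is hypothetical: it simply includes the reported best values alongside the scikit-learn defaults. The data is synthetic, and cv=3 is likewise an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# HYPOTHETICAL grid: the actual grid used in the project is not given.
param_grid = {
    "bootstrap": [True],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 4],
    "min_samples_split": [2, 10],
    "n_estimators": [50, 100],
}

# Every combination is scored with 3-fold cross-validation on the training set.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# The best estimator is refit on the full training set by default.
best_model = search.best_estimator_
test_accuracy = accuracy_score(y_test, best_model.predict(X_test))

print(search.best_params_)
print(test_accuracy)
```

The search picks the combination with the best cross-validated score on the training data, which does not guarantee a higher score on the held-out test set. That is consistent with the tuned model's 0.9133 falling below the initial 0.9306.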

Conclusion & Future Outlook
We developed and tuned a Random Forest model for detecting fake Instagram accounts. The initial model reached an accuracy of 0.93 and the tuned model 0.91 on the test split, showing that the approach identifies fake accounts effectively.

Refine Feature Set: Integrate advanced behavioral and network-based features for enhanced detection.
Explore Deployment: Implement the model for real-time monitoring and automated account flagging.
Evaluate New Models: Investigate deep learning or anomaly detection algorithms for further improvements.