FAKE SOCIAL MEDIA ACCOUNT DETECTION DOCUMENTATION[6][1] (1).docx

spub1985 12 views 59 slides May 19, 2025

A MAJOR PROJECT PHASE-II REPORT
ON
FAKE SOCIAL MEDIA ACCOUNT DETECTION
This project report submitted in partial fulfillment of the requirement for the
award of degree of
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY
Submitted by
MUPPANI SAI REDDY [21K91A1241]
SARIKONDA GOWTHAMI [21K91A1249]
V.TARUN [21K91A1257]
GUGULOTHU PREM KUMAR [21K91A1225]
Under the guidance of
Mr. V. MURUGAN
Assistant Professor
DEPARTMENT OF INFORMATION TECHNOLOGY
TKR COLLEGE OF ENGINEERING AND TECHNOLOGY
(AUTONOMOUS)
Approved by AICTE, Affiliated to JNTUH, Accredited by NBA,
Accredited by NAAC with ‘A+’ Grade
Medbowli, Meerpet, Balapur (M), Hyderabad-500097
[2024-2025]

CERTIFICATE
Certified that this report entitled “FAKE SOCIAL MEDIA ACCOUNT
DETECTION”, being submitted by “MUPPANI SAI REDDY (21K91A1241),
SARIKONDA GOWTHAMI (21K91A1249), V. TARUN (21K91A1257), G. PREM
KUMAR (21K91A1225)”, is a record of bona fide work carried out by them.
EXTERNAL EXAMINER

SUPERVISOR HEAD OF THE DEPARTMENT
Mr. V. MURUGAN Dr. N. SATYANARAYANA
Assistant Professor Professor
Information Technology Information Technology
Meerpet, TKRCET Meerpet, TKRCET

PLAGIARISM REPORT
This is to certify that the major project VIII semester report entitled “FAKE SOCIAL
MEDIA ACCOUNT DETECTION”,
submitted by
1. MUPPANI SAI REDDY 21K91A1241
2. SARIKONDA GOWTHAMI 21K91A1249
3. V. TARUN 21K91A1257
4. GUGULOTHU PREM KUMAR 21K91A1225
is checked for plagiarism, and the similarity obtained is 10%.
Mr. V. MURUGAN Dr. N. SATYANARAYANA
SUPERVISOR HEAD OF THE DEPARTMENT
ASSISTANT PROFESSOR INFORMATION TECHNOLOGY
INFORMATION TECHNOLOGY MEERPET, TKRCET
MEERPET, TKRCET

PLAGIARISM DOCUMENT

ACKNOWLEDGEMENT
There are many people who helped us directly or indirectly to complete our project successfully.
We would like to take this opportunity to thank one and all.
We are extremely thankful and indebted to our supervisor, Mr. V. MURUGAN, Assistant Professor,
Department of Information Technology, TKR College of Engineering and Technology, for his
constant guidance, encouragement and moral support throughout the major project. We are
extremely thankful to Dr. N. SATYANARAYANA, Head of the Department, Department of
Information Technology, TKR College of Engineering and Technology, for the encouragement and
support throughout the project.
Our sincere thanks and gratitude to Dr. D. V. RAVI SHANKAR, Principal, TKR College of
Engineering and Technology, for all the timely support and valuable suggestions during the period
of our project.
Finally, we would also like to thank all the faculty and staff of the Information Technology
Department who helped us directly or indirectly, and our parents and friends for their cooperation in
completing the project work.
MUPPANI SAI REDDY(21K91A1241)
SARIKONDA GOWTHAMI(21K91A1249)
V. TARUN(21K91A1257)
GUGULOTHU PREM KUMAR(21K91A1225)


DECLARATION

We, the undersigned, solemnly declare that the report of the Major Project Phase-II entitled “FAKE
SOCIAL MEDIA ACCOUNT DETECTION” is based on our own work carried out during the
course of our study under the supervision of Mr. V. MURUGAN. We assert that the statements
made and conclusions drawn are an outcome of the Major Project Phase-I. We further declare that,
to the best of our knowledge and belief, the Major Project Phase-I report does not contain any part of
any work which has been submitted for the award of any other degree/diploma/certificate in this
University or any other University.

MUPPANI SAI REDDY (21K91A1241)
SARIKONDA GOWTHAMI (21K91A1249)
V. TARUN (21K91A1257)
GUGULOTHU PREM KUMAR (21K91A1225)

3.ABSTRACT
This project aims to summarize recent advances in fake account detection
methodology on social networking websites. Over the past decade, social networking websites
have received huge attention from users all around the world. As a result, popular websites such
as Facebook, Twitter, LinkedIn, Instagram, and others saw an unexpected rise in registered
users. However, researchers claim that not all registered accounts are real; many of them are fake
and created for specific purposes. The primary purpose of fake accounts is to spread spam
content, rumors, and other unauthentic messages on the platform.
Hence, fake accounts need to be filtered out, but doing so poses many challenges. In the past
few years, researchers have applied many advanced technologies to identify fake accounts. In the
survey presented in this article, we summarize the recent development of fake account detection
technologies. We briefly discuss the challenges and limitations of the existing models. The
survey may help future researchers to identify the gaps in the current literature and develop a
generalized framework for fake profile detection on social networking websites.
Social networking platforms have become an essential part of today’s human life:
almost every individual is associated with at least one online social networking website
today. Hence, a huge crowd is always active on these platforms, and the large volume of user
engagement has attracted spammers and unauthentic users. Users create fake profiles to spread
unauthentic messages such as rumors, hate speech, and bullying text.
Researchers have proposed several techniques to limit this issue using machine-learning- and
deep-learning-based models, but many fake accounts are still present. For a good social
networking platform, these fake accounts are not acceptable. This project summarizes the recent
advances in fake account detection on social networking platforms.



TABLE OF CONTENTS
S.NO CONTENTS
1 ACKNOWLEDGEMENT
2 DECLARATION
3 ABSTRACT
4 INTRODUCTION
5 LITERATURE SURVEY
6 PROBLEM STATEMENT
7 EXISTING SYSTEM
8 PROPOSED SYSTEM
9 METHODOLOGIES TO IMPLEMENT
9.1 Data Preprocessing and Augmentation
9.2 Feature Extraction Using Convolutional Neural Networks (CNNs)
9.3 Temporal Analysis for Frame Consistency
9.4 Model Training and Optimization
10 REQUIREMENT ANALYSIS
10.1 Hardware Requirements
10.2 Software Requirements
10.3 Functional Requirements
10.4 Non-Functional Requirements
11 TECHNOLOGY USED
11.1 Programming Language
11.2 Machine Learning and Deep Learning Frameworks
11.3 Libraries and Tools
11.4 Specialized Technologies
11.5 Dataset Management and Storage
11.6 Development and Deployment Tools
11.7 Hardware and GPU Technology
12 SYSTEM ARCHITECTURE
13 IMPLEMENTATION OF PROPOSED METHODOLOGY
13.1 Data Collection
13.2 Data Preprocessing
13.3 Feature Extraction
13.4 Evaluation Model
14 DATA FLOW DIAGRAMS
15 UML DIAGRAMS
16 DOMAIN SPECIFICATION
16.1 Supervised Learning
16.2 Unsupervised Learning
17 TESTING
18 IMPLEMENTATION
19 SAMPLE SCREENSHOTS
20 CONCLUSION
21 FUTURE WORK
REFERENCES

LIST OF FIGURES
S.NO FIGURES
14.1 LEVEL 0
14.2 LEVEL 1
15.1 USECASE DIAGRAM
15.2 SEQUENCE DIAGRAM
15.3 ACTIVITY DIAGRAM
15.4 COLLABORATION DIAGRAM
15.5 CLASS DIAGRAM
15.6 ER DIAGRAM
19.1 SAMPLE SCREENSHOT
19.2 SAMPLE SCREENSHOT
19.3 SAMPLE SCREENSHOT

4.INTRODUCTION
The exponential rise in internet usage has transformed social media platforms into vital
hubs for marketing, advertising, and social interaction. Among these platforms, Instagram
stands out as a prominent space with millions of active users engaging daily. However, this
widespread popularity has also led to misuse, with individuals creating false identities for
malicious purposes. Such activities pose serious threats to user security, privacy, and the
integrity of the platform.
The misuse of social networks, including the proliferation of fake accounts, has become a
growing concern. Cyber criminals and spammers exploit these platforms to deceive users,
promote fraudulent activities, and artificially inflate follower counts for personal or
commercial gain. These actions not only harm genuine users but also erode trust within the
community.
To address these challenges, researchers have proposed various methods to enhance the
security and reliability of social media applications. This paper focuses on the automatic
detection of fake Instagram profiles, leveraging supervised machine learning algorithms to
distinguish between genuine and fraudulent accounts. By identifying fake profiles, this
approach aims to safeguard Instagram users and ensure a secure social networking
environment.

5.LITERATURE SURVEY
1.Title: Detecting Fake Accounts in Online Social Networks
Author: Gupta, P., Gupta, N.
Published Year: 2018
Abstract:
This study examines methods to detect fake accounts on social networks by analyzing
user behaviour and activity patterns. Machine learning algorithms such as Random
Forest and SVM are employed for classification, achieving significant accuracy. The
paper highlights the importance of account attributes like profile completeness and
engagement metrics.
2.Title: Fake Profile Detection in Social Media Using Machine Learning
Author: Singh, A., Sharma, R.
Published Year: 2020
Abstract:
This paper presents a machine learning approach to identifying fake profiles on
platforms like Instagram and Facebook. It utilizes features such as post frequency,
follower-following ratio, and account activity to train models like Logistic Regression
and Decision Trees. Experimental results demonstrate effective detection rates.
3.Title: Social Media Security: Fake Profile Detection
Author: Patel, D., Mehta, S.
Published Year: 2019
Abstract:
The research focuses on security issues in social media, particularly fake profile
detection. Techniques such as Naïve Bayes and K-Nearest Neighbour are applied to
classify profiles based on engagement behaviours. The study emphasizes the need for
automated systems to enhance user safety.

4.Title: Machine Learning for Detecting Fake Social Media Accounts
Author: Zhou, J., Wang, L.
Published Year: 2021
Abstract:
This paper proposes a hybrid machine learning model combining SVM and
Neural Networks to identify fake profiles on social media. The study explores the
use of metadata, such as account age and post patterns, and achieves a high accuracy
rate of 92%.

5.Title: A Comprehensive Review of Fake Account Detection Techniques
Author: Kim, H., Park, Y.
Published Year: 2017
Abstract:
This review paper analyses various fake account detection techniques used
across different platforms. It categorizes methods into heuristic-based, machine
learning-based, and hybrid approaches, highlighting their strengths and weaknesses.
6.Title: Spam and Fake Account Detection on Social Networks
Author: Banerjee, S., Das, P.
Published Year: 2016
Abstract:
The study focuses on identifying spam and fake accounts through
behavioural analysis. It employs clustering algorithms and network analysis to identify
anomalies in user interactions and engagement patterns, achieving
promising results in spam detection.

7.Title: Fake Followers and Profile Detection in Instagram
Author: Kaur, G., Malik, P.
Published Year: 2022
Abstract:
This paper investigates the detection of fake followers and profiles on
Instagram. Using supervised machine learning models like Random Forest and
Gradient Boosting, the study identifies key features contributing to account
authenticity, such as engagement-to-follower ratios.
8.Title: Identifying Fake Users in Social Media: Challenges and Solutions
Author: Chatterjee, A., Roy, S.
Published Year: 2019
Abstract:
This research outlines the challenges in identifying fake social media accounts,
including evolving tactics and data volume. It proposes a semi-supervised learning
approach that utilizes limited labeled data to train efficient classifiers for large-scale
applications.
9.Title: Bot and Fake Account Detection on Twitter Using ML Techniques
Author: Ahmed, I., Khan, M.
Published Year: 2020
Abstract:
The paper presents a machine learning-based approach for detecting fake
accounts and bots on Twitter. It leverages features like tweet frequency, retweet
behavior, and network connections, achieving an accuracy of over 90% with
ensemble learning techniques.

10.Title: Enhancing Social Media Security Through Automated Fake Profile Detection
Author: Li, Y., Zhang, X.
Published Year: 2021
Abstract:
This study proposes an advanced fake profile detection system using deep
learning models. It integrates user profile data, activity patterns, and text analysis to
identify fraudulent accounts, with experimental results showing improved precision
and recall rates compared to traditional methods.

6.PROBLEM STATEMENT

"Social media platforms like Instagram have become essential tools for advertising,
marketing, and social interaction, attracting millions of daily users. However, the rapid growth
and popularity of these platforms have given rise to malicious activities, including the creation
of fake profiles. These fraudulent accounts are often used for cybercrimes, spamming,
scamming, and artificially inflating engagement metrics such as followers, likes, and
comments.
Such misuse undermines user trust, compromises data privacy, and threatens the
integrity of online interactions. Despite the implementation of basic detection mechanisms such
as manual reporting, rule-based filtering, and heuristic-based behavioural analysis, these
approaches face significant limitations, including high rates of false positives and negatives,
inability to adapt to evolving tactics, and scalability issues for platforms with millions of users.
Furthermore, the reactive nature of existing solutions allows fake accounts to remain active for
extended periods, causing harm before action is taken. To address these challenges, there is a
critical need for a robust, automated system capable of accurately detecting fake profiles on
Instagram.
By leveraging advanced supervised machine learning algorithms, this system aims to
classify profiles based on key features such as follower-to-following ratios, activity patterns,
profile completeness, and engagement metrics. The proposed solution not only reduces manual
effort and enhances detection accuracy but also provides scalability to handle large datasets,
enabling proactive identification and mitigation of fake accounts. This will contribute to
safeguarding user security, enhancing social media integrity, and fostering a trustworthy online
environment."
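As a rough illustration of how the key features named above might be derived from raw profile fields, the sketch below computes a follower-to-following ratio, a profile completeness score, and a simple engagement metric with pandas. The column names, the sample values, and the exact formulas are assumptions made for this example; the report does not specify them.

```python
import numpy as np
import pandas as pd

# Hypothetical raw profile fields; the column names are illustrative only.
profiles = pd.DataFrame({
    "followers": [1200, 15, 0, 340],
    "following": [300, 2000, 50, 0],
    "posts": [85, 1, 0, 40],
    "likes_received": [4300, 3, 0, 900],
    "has_picture": [True, False, False, True],
    "bio": ["Travel photos", "", "", "Baker"],
})

features = pd.DataFrame(index=profiles.index)

# Follower-to-following ratio; the +1 avoids division by zero (an assumed convention).
features["follower_following_ratio"] = profiles["followers"] / (profiles["following"] + 1)

# Profile completeness: fraction of the two optional fields that are filled in.
has_bio = profiles["bio"].str.len() > 0
features["profile_completeness"] = (profiles["has_picture"].astype(int) + has_bio.astype(int)) / 2

# Engagement metric: average likes per post, defined as 0 for accounts with no posts.
features["likes_per_post"] = np.where(
    profiles["posts"] > 0,
    profiles["likes_received"] / profiles["posts"].clip(lower=1),
    0.0,
)

print(features)
```

Derived columns of this kind can then be passed to a supervised classifier as its input features.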

7.EXISTING SYSTEM
The current systems employed for detecting fake profiles on social media platforms
like Instagram rely primarily on basic heuristic approaches, manual reporting, and
rudimentary rule-based algorithms. These existing methods, while effective to a certain
extent, face several limitations and challenges:
1.Manual Reporting:
Platforms often depend on users to report suspicious accounts. While this approach
can help identify fake profiles, it is highly inefficient, time-consuming, and subjective.
Additionally, many fake accounts may remain undetected due to a lack of user vigilance.
2.Rule-Based Detection:
Some systems utilize predefined rules or filters, such as identifying accounts with
excessive activity, incomplete profile information, or suspicious follower patterns.
However, these methods are limited in their ability to adapt to evolving tactics employed by
cyber criminals and spammers.
3.Behavioral Analysis:
Certain platforms incorporate behavioural analysis techniques to flag unusual
patterns, such as high-frequency posting, excessive follower requests, or abnormal
engagement rates. While this provides additional insights, it may result in a high false-
positive rate, affecting genuine users.
4.Dependency on Platform-Specific Policies:
Social media platforms implement varying degrees of account verification and
security measures. However, these policies often lack consistency and do not address the
root cause of the problem effectively.
5.Scalability Issues:
Existing systems struggle to scale efficiently with the rapid growth of user bases and
the increasing sophistication of fake profile creation methods. Many approaches fail to keep
pace with the volume of data and the complexity of detection requirements.
While these systems offer a foundational framework for detecting fake profiles, they are
largely reactive and inadequate in addressing the dynamic nature of the problem. This
necessitates the development of more robust, automated, and adaptive solutions leveraging
advanced machine learning techniques to enhance the accuracy and efficiency of fake
profile detection.
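To make the rule-based approach described above concrete, here is a minimal sketch of a static filter. The profile fields, the rule names, and every threshold are hypothetical values picked for illustration; they are not taken from any platform's actual policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    # Hypothetical fields, chosen for illustration only.
    followers: int
    following: int
    posts: int
    has_profile_picture: bool
    bio_length: int


def rule_based_flags(p: Profile) -> List[str]:
    """Return the names of the hand-written rules that this profile trips."""
    flags = []
    if not p.has_profile_picture or p.bio_length == 0:
        flags.append("incomplete_profile")
    # Following many accounts while having few followers (assumed cutoff of 0.1).
    if p.following > 0 and p.followers / p.following < 0.1:
        flags.append("suspicious_follower_pattern")
    # No posts at all while following a large number of accounts (assumed cutoff of 500).
    if p.posts == 0 and p.following > 500:
        flags.append("no_content_mass_following")
    return flags


def is_suspicious(p: Profile, min_flags: int = 2) -> bool:
    """Flag a profile when it trips at least min_flags rules."""
    return len(rule_based_flags(p)) >= min_flags


bot_like = Profile(followers=5, following=1000, posts=0, has_profile_picture=False, bio_length=0)
established = Profile(followers=300, following=250, posts=40, has_profile_picture=True, bio_length=60)
new_genuine_user = Profile(followers=3, following=40, posts=2, has_profile_picture=False, bio_length=0)

print(is_suspicious(bot_like))          # True
print(is_suspicious(established))       # False
print(is_suspicious(new_genuine_user))  # True: a false positive on a new but genuine account
```

The last case shows the weakness discussed in this section: fixed thresholds cannot tell a newly registered genuine user from a fake account, and changing them requires manual retuning.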


Disadvantages
1.High False Positives and False Negatives:
Rule-based systems often misclassify genuine accounts as fake (false positives) or
fail to detect sophisticated fake profiles (false negatives). This undermines the reliability of
the system and causes user dissatisfaction.
2.Lack of Adaptability:
Static rules and heuristics cannot keep up with the evolving tactics employed by
cyber criminals and spammers, making the existing systems ineffective against advanced
threats.
3.Scalability Issues:
Existing systems struggle to handle the enormous and continuously growing volume
of user-generated data, leading to delays and reduced detection accuracy on large platforms
like Instagram.
4.Reactive Rather Than Proactive:
Current systems often rely on user reports or observable suspicious activity to act,
which allows fake profiles to operate undetected for extended periods, causing potential
harm before action is taken.

8.PROPOSED SYSTEM
The proposed system is designed to identify fake Instagram profiles using
supervised machine learning algorithms. The system leverages robust methodologies for
data preprocessing, model training, and hyperparameter optimization to ensure high
accuracy and reliability. The implementation focuses on a Random Forest classifier whose
hyperparameters are tuned with GridSearchCV for enhanced performance.
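A minimal sketch of this combination with scikit-learn is shown below. It assumes a synthetic dataset generated with make_classification as a stand-in for labelled profile data, and the parameter grid values are illustrative guesses rather than the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a labelled profile dataset (1 = fake, 0 = genuine).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Assumed search space; a real project would choose values suited to its own data.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 3],
}

# 5-fold cross-validated grid search over the Random Forest hyperparameters.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="f1",
)
search.fit(X_train, y_train)

best_model = search.best_estimator_  # refitted on the full training split
test_accuracy = best_model.score(X_test, y_test)

print("Best parameters:", search.best_params_)
print("Held-out accuracy:", test_accuracy)
```

The held-out test split is kept out of the search so that the reported accuracy is not influenced by the hyperparameter selection.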
Advantages
High Accuracy and Reliability
The system utilizes Random Forest, a powerful ensemble learning algorithm,
which is known for its high accuracy in classification tasks. GridSearchCV further
optimizes the model’s hyperparameters, ensuring the best possible performance.
Automation and Efficiency
By automating the detection of fake profiles, the system significantly reduces
manual effort, making the process faster and more efficient. It can process large
datasets quickly, saving time and resources.
Scalability
The system is designed to handle a large volume of data, making it scalable for
millions of Instagram profiles. It can easily adapt to growing datasets and other social
media platforms.
Improved Social Media Security
The system helps in proactively identifying fake profiles, enhancing the
security of social media platforms by preventing cyber criminals and spammers from
causing harm to users and communities.
Real-Time Detection
The system can be extended for real-time fake profile detection, which enables
timely actions to be taken against fraudulent accounts and ensures the integrity of
user interactions on social media.

9.METHODOLOGIES TO IMPLEMENT
9.1. Data Preprocessing and Augmentation
- Collect a diverse dataset of real and deepfake videos, ensuring it covers variations in
lighting, resolution, and compression.
- Perform preprocessing tasks such as frame extraction, resizing, normalization, and
enhancement of facial regions to improve feature clarity.
- Apply data augmentation techniques, including rotation, flipping, noise addition, and
color jittering, to make the model robust to variations.
9.2. Feature Extraction Using Convolutional Neural Networks (CNNs)
- Design or adapt a CNN architecture capable of extracting spatial features from video
frames, focusing on regions such as the eyes, mouth, and skin textures.
- Integrate attention mechanisms within the CNN to emphasize areas more susceptible to
manipulation, improving detection precision.
- Employ transfer learning by using pre-trained models (e.g., XceptionNet or ResNet) to
reduce training time and improve performance.
9.3. Temporal Analysis for Frame Consistency
- Extend the system by analysing temporal inconsistencies across video frames, such as
unnatural blinking rates, jittery movements, or transitions.
- Combine frame-level CNN outputs using temporal models like Recurrent Neural
Networks (RNNs) or Long Short-Term Memory (LSTM) networks to identify anomalies in
sequences.
9.4. Model Training and Optimization
- Train the CNN model using a large and labelled dataset, ensuring a balanced distribution
of real and deepfake videos.
- Implement optimization techniques like Adam optimizer, learning rate schedulers, and
dropout layers to prevent overfitting and improve model generalization.

10.REQUIREMENT ANALYSIS
The system specification for a project detecting fake Instagram profiles using supervised
learning includes hardware and software requirements, functional components, and system
design aspects. Below is a comprehensive outline:
10.1. Hardware Requirements
• Processor: Intel Core i5 or higher (or equivalent AMD Ryzen)
• RAM: Minimum 8 GB (16 GB recommended for larger datasets)
• Storage: At least 256 GB SSD (512 GB recommended for faster processing and
dataset storage)
• Graphics Card: NVIDIA GTX 1050 or higher for GPU-accelerated training (if
applicable)
• Operating System: Windows 10/11, macOS, or Linux (Ubuntu 20.04+)
• Network: Stable internet connection for fetching live data (if applicable)
10.2. Software Requirements
• Operating System: Windows, macOS, or Linux
• Programming Language: Python (version 3.7 or above)
• Frameworks and Libraries:
o Machine Learning: scikit-learn, TensorFlow, PyTorch (if required)
o Data Processing: pandas, NumPy
o Data Visualization: matplotlib, Seaborn
o Natural Language Processing (if text-based features are used): NLTK, spaCy
• Development Environment: Jupyter Notebook, VS Code, or PyCharm
• Deployment Tools: Flask

10.3. Functional Requirements
1.Data Collection and Input
• The system must be able to collect data related to Instagram profiles, such as the number
of followers, profile bio details, post frequency, follower-to-following ratio, account
creation date, and other relevant attributes.
o 500 GB or 1 TB SSD for handling large datasets and ensuring fast data access.
o External storage (optional) for backup or managing multiple large datasets.
• Input data should be in a structured format (e.g., CSV, JSON) for easy processing.
2. Data Preprocessing
• The system should handle missing or incomplete data and perform necessary data
cleaning.
• It must be able to preprocess the data by encoding categorical variables and normalizing
or scaling numerical values.
• Data should be split into training and testing sets (e.g., 80-20 split or 70-30 split) for
model evaluation.
3. Feature Engineering and Selection
• The system should automatically select and extract relevant features (e.g., profile activity,
engagement, follower count) for model training.
• It must support feature importance evaluation and feature selection to enhance model
performance.
4. Model Training
• The system should allow for the training of machine learning models, specifically Random
Forest, using labeled data (real vs. fake profiles).
• The model training should be accompanied by cross-validation to avoid overfitting.
• GridSearchCV or similar techniques should be used to fine-tune the hyperparameters of the
Random Forest model.
5. Model Evaluation and Performance Metrics
• After training, the system should evaluate the model's performance using metrics such
as precision, recall, F1-score, and ROC-AUC.

10.4. Non-Functional Requirements
1. Performance
• Response Time: The system must provide quick responses for detecting fake profiles,
with a maximum response time of 1-2 seconds for classification after input data is provided.
• Throughput: The system should be capable of handling large datasets efficiently,
processing thousands of profiles in a reasonable amount of time (e.g., within minutes for
bulk processing).
• Scalability: The system should be scalable to handle an increasing number of profiles as
the social media platform grows, ensuring it can manage a high volume of data without
significant performance degradation.
2. Reliability
• Uptime: The system should be available 24/7 with minimal downtime. Any maintenance
or downtime should be communicated clearly to users.
• Fault Tolerance: The system must be resilient to failures, providing fallback mechanisms
in case of issues (e.g., if a model fails to load or if data is corrupted).
• Data Consistency: The system must ensure that the data used for training, testing, and
classification remains consistent and is not corrupted during processing.
3. Development Environment
• IDE (Integrated Development Environment): A good IDE helps in writing,
debugging, and running Python code. Some popular choices:
o PyCharm: A full-featured IDE for Python development, especially useful
for machine learning and data science projects.
o VS Code: A lightweight, free code editor with Python extensions and great
support for Jupyter notebooks.
o Jupyter Notebook: Useful for running and visualizing code in blocks. It’s
excellent for data analysis and experimentation.

11.TECHNOLOGY USED
The project involves detecting fake social media accounts using a Random Forest
classifier. The technologies used span various domains, including programming languages,
frameworks, tools, and libraries, to ensure robust implementation. Below is a detailed
breakdown of the technologies:
11.1. Programming Language
Python:
Python is chosen as the primary language due to its simplicity, extensive library
support, and strong ecosystem for machine learning and deep learning applications. It
enables seamless integration of various tools, frameworks, and pre-trained models.
11.2. Machine Learning and Deep Learning Frameworks
TensorFlow:
TensorFlow provides a flexible platform for building, training, and deploying deep
learning models. It includes tools for creating custom CNN architectures and supports GPU
acceleration for faster computation.
Keras:
Built on TensorFlow, Keras simplifies the development of deep learning models
with its user-friendly API, enabling rapid prototyping and experimentation.
PyTorch (optional alternative):
PyTorch is another powerful deep learning framework that offers dynamic
computation graphs, making it easier to debug and customize models.
11.3. Libraries and Tools
OpenCV:
OpenCV (Open-Source Computer Vision Library) is used for video and image
preprocessing, such as frame extraction, resizing, and normalization. It also supports facial
landmark detection for focusing on manipulated regions.

NumPy:
NumPy is used for numerical operations, including matrix manipulations and dataset
handling during model preprocessing and training.
Pandas:
Pandas assists in managing datasets efficiently, including reading, writing, and
preprocessing metadata for videos.
Matplotlib and Seaborn:
These libraries are used for visualizing data and model performance metrics, such as
loss, accuracy, and confusion matrices.
scikit-learn:
Scikit-learn provides tools for data splitting, model evaluation, and additional
preprocessing like scaling and encoding.
11.4. Specialized Technologies
CNN Architecture:
The project employs CNNs for extracting spatial features from video frames. Pre-
trained architectures such as XceptionNet or ResNet may be utilized for transfer learning to
leverage prior knowledge from large datasets.
Custom CNN layers may be added to focus on specific tasks, such as analyzing facial
regions prone to manipulation.
Attention Mechanisms:
Attention layers can be integrated into the CNN to prioritize regions of interest, such
as the eyes, mouth, and other facial areas, where manipulations are more likely.
11.5. Dataset Management and Storage
FaceForensics++ Dataset:
This dataset provides a variety of real and manipulated videos, commonly used for
training and evaluating deepfake detection systems.

SQLite/MySQL/MongoDB:
Databases are used for managing and storing datasets, including metadata like video
labels, preprocessing logs, and results from model predictions.
11.6. Development and Deployment Tools
Integrated Development Environments (IDEs):
Jupyter Notebook: For iterative development, testing, and visualization of results.
PyCharm/VS Code: For writing, organizing, and debugging code in larger-scale
applications.
TensorBoard:
TensorBoard is used to monitor training progress, visualize model architecture, and
analyze performance metrics like loss and accuracy over epochs.
Git and GitHub/GitLab:
Version control systems are essential for tracking changes, collaborating with team
members, and maintaining a repository for the project.
11.7. Hardware and GPU Technology
CUDA and cuDNN:
NVIDIA's CUDA and cuDNN libraries enable GPU acceleration for training,
significantly reducing computation time compared to CPU-based processing.
NVIDIA GPUs:
GPUs such as NVIDIA RTX 3060 or higher are recommended to support deep
learning tasks efficiently.

12.SYSTEM ARCHITECTURE


13.IMPLEMENTATION OF PROPOSED METHODOLOGY
The implementation of the proposed methodology for “fake social media account
detection” involves a systematic approach broken down into several key phases:
13.1.DATA COLLECTION
The data used in this paper is taken from a dataset. This step is concerned with selecting the
subset of all available data that you will be working with. ML problems start with data,
preferably lots of data (examples or observations) for which you already know the target
answer. Data for which you already know the target answer is called labelled data.

13.2.DATA PRE-PROCESSING
Organize your selected data by formatting, cleaning and sampling from it.
Three common data pre-processing steps are:
• Formatting:
The data you have selected may not be in a format that is suitable for you to work
with. The data may be in a relational database and you would like it in a flat file, or the data
may be in a proprietary file format and you would like it in a relational database or a text
file.
• Cleaning:
Cleaning data is the removal or fixing of missing data. There may be data instances
that are incomplete and do not carry the data you believe you need to address the problem.
These instances may need to be removed. Additionally, there may be sensitive information
in some of the attributes and these attributes may need to be anonymized or removed from
the data entirely.
• Sampling:
There may be far more selected data available than you need to work with. More
data can result in much longer running times for algorithms and larger computational and
memory requirements. You can take a smaller representative sample of the selected data
that may be much faster for exploring and prototyping solutions before considering the
whole dataset.
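The three steps above can be sketched with pandas as follows. The tiny table, its column names, and the 50% sampling fraction are assumptions made for illustration; the report does not state which columns or sample sizes were used.

```python
import pandas as pd

# Hypothetical selected data; "username" stands in for a sensitive attribute.
raw = pd.DataFrame({
    "username": ["a", "b", "c", "d", "e", "f"],
    "followers": [120, None, 15, 3400, 8, 560],
    "following": [200, 310, 900, 150, None, 480],
    "is_fake": [0, 0, 1, 0, 1, 0],
})

# Cleaning: remove the sensitive attribute and drop instances with missing values.
cleaned = raw.drop(columns=["username"]).dropna()

# Formatting: export the cleaned table as flat CSV text (a file path could be passed instead).
csv_text = cleaned.to_csv(index=False)

# Sampling: take a smaller random sample for quick exploration and prototyping.
sample = cleaned.sample(frac=0.5, random_state=0)

print(csv_text)
print(sample)
```

Dropping incomplete rows is only one option; filling missing values (for example with the column median) keeps more data at the cost of introducing estimated values.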

13.3. FEATURE EXTRACTION
Feature extraction is an attribute reduction process. Unlike feature selection, which ranks
the existing attributes according to their predictive significance, feature extraction
actually transforms the attributes. The transformed attributes, or features, are linear
combinations of the original attributes. Finally, our models are trained using a classifier
algorithm. We use the classify module of the Natural Language Toolkit (NLTK) library for
Python. Part of the labelled dataset gathered is used for training; the rest of our labelled
data is used to evaluate the models. Machine learning algorithms were used to classify the
pre-processed data. The chosen classifier was Random Forest, an algorithm that is very
popular in text classification tasks.
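As a rough sketch of the train-then-evaluate split described above, the example below trains
a Random Forest on synthetic data. It assumes scikit-learn is installed and calls its
RandomForestClassifier directly rather than going through the NLTK classify module; the
feature layout, value ranges, and the make_profile helper are hypothetical.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = random.Random(42)


def make_profile(is_fake):
    """Generate one synthetic feature row: [followers, following, posts, has_photo]."""
    if is_fake:
        return [rng.randint(0, 50), rng.randint(1000, 5000), rng.randint(0, 5), rng.randint(0, 1)]
    return [rng.randint(100, 2000), rng.randint(50, 800), rng.randint(10, 500), 1]


labels = [i % 2 for i in range(200)]               # 1 = fake, 0 = genuine
features = [make_profile(bool(y)) for y in labels]

# Hold back a quarter of the labelled data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=42, stratify=labels
)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)        # accuracy on the held-out data only
```

The synthetic classes here are deliberately easy to separate, so the score says nothing about
performance on real profiles; it only shows the mechanics of the split.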

13.4.EVALUATION MODEL
Model Evaluation is an integral part of the model development process. It helps to
find the best model that represents our data and how well the chosen model will work in the
future. Evaluating model performance with the data used for training is not acceptable in
data science because it can easily produce overoptimistic and overfitted models.
- High computational requirements for training and real-time analysis.
- Potential challenges in detecting highly advanced deepfakes generated using
unseen algorithms.
- Limited effectiveness for low-resolution or highly compressed videos.
The performance of each classification model is estimated based on its average score. The
results are presented in visual form, with the classified data represented as graphs.
Accuracy is defined as the percentage of correct predictions for the test data. It can be
calculated easily by dividing the number of correct predictions by the number of total
predictions.
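The accuracy definition above, correct predictions divided by total predictions, can be
written directly in Python. This is a minimal sketch; the label lists are hypothetical and
stand in for a real held-out test set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical held-out test labels (1 = fake, 0 = genuine) and model predictions.
y_test = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

test_accuracy = accuracy(y_test, y_pred)  # 8 of the 10 predictions are correct
```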

14.DATA FLOW DIAGRAMS
DFD is the abbreviation for Data Flow Diagram. A DFD represents the flow of data through a
system or a process. It also gives insight into the inputs and outputs of each entity and of
the process itself. A DFD has no control flow, loops, or decision rules; operations that depend
on the type of data are better explained by a flowchart.
It is a graphical tool that makes communicating with users, managers, and other personnel
easier. It is useful for analysing both the existing and the proposed system.
It provides an overview of what data the system processes, what transformations are
performed, what data is stored, what results are produced, and so on.
A Data Flow Diagram can be represented in several ways. The DFD belongs to the
structured-analysis modelling tools. Data Flow Diagrams are popular because they help us
visualise the major steps and data involved in software-system processes.
A Data Flow Diagram has four components:
Process: The transformation of input to output in a system takes place because of a process
function. The symbol of a process may be a circle, an oval, a rectangle, or a rectangle with
rounded corners. The process is given a short name, a single word or phrase, that expresses
its essence.
Data Flow: A data flow describes the information moving between different parts of the
system. Its symbol is an arrow. A descriptive name should be given to the flow to identify the
information being moved. A data flow can also represent material that is moved along with
information; material movements are modelled in systems that are not merely informational. A
given flow should transfer only a single type of information. The direction of flow is shown
by the arrow, which can also be bi-directional.
Data Store (Warehouse): Data is kept in the data store for later use. Its symbol is two
horizontal lines. The data store is not restricted to being a data file; it can be anything,
such as a folder with documents, an optical disc, or a filing cabinet. The data store can be
viewed independently of its implementation. When data flows from the store it is considered
data reading, and when data flows to the store it is called data entry or data updating.
Terminator: The terminator is an external entity that stands outside the system and
communicates with it. It can be, for example, an organisation such as a bank, a group of
people such as customers, or a different department of the same organisation, which is not
part of the modelled system. Modelled systems also communicate with terminators.

Level 0
Figure-14.1

Level 1
Figure-14.2

15.UML DIAGRAMS
The Unified Modelling Language (UML) is used to specify, visualize, modify, construct
and document the artifacts of an object-oriented software intensive system under development.
UML offers a standard way to visualize a system's architectural blueprints, including elements
such as:
Actors
Business processes
(logical) components
Activities
programming language statements
database schemas, and
Reusable software components.
UML combines best techniques from data modelling (entity relationship diagrams),
business modelling (work flows), object modelling, and component modelling. It can be used
with all processes, throughout the software development life cycle, and across different
implementation technologies. UML has synthesized the notations of the Booch method, the
Object modelling technique (OMT) and Object-oriented software engineering (OOSE) by
fusing them into a single, common and widely usable modelling language. UML aims to be a
standard modelling language which can model concurrent and distributed systems.
UML is a standard language for specifying, visualizing, constructing, and documenting
the artifacts of software systems.
UML was adopted as a standard by the Object Management Group (OMG); the UML 1.0
specification draft was proposed to the OMG in January 1997.
The OMG continues to work on making UML a true industry standard.
UML is a pictorial language used to make software blueprints.
Sequence Diagram:
Sequence diagrams represent the objects participating in the interaction horizontally and
time vertically.
Use Case Diagram:
A use case is a kind of behavioural classifier that represents a declaration of an
offered behaviour. Each use case specifies some behaviour, possibly including variants, that the
subject can perform in collaboration with one or more actors. Use cases define the offered
behaviour of the subject without reference to its internal structure. These behaviours, involving
interactions between the actor and the subject, may result in changes to the state of the subject
and communications with its environment. A use case can include possible variations of its
basic behaviour, including exceptional behaviour and error handling.
Activity Diagram:
Activity diagrams are graphical representations of workflows of stepwise activities and
actions with support for choice, iteration and concurrency. In the Unified Modelling Language,
activity diagrams can be used to describe the business and operational step-by-step workflows
of components in a system. An activity diagram shows the overall flow of control.
Class Diagram:
The class diagram is the main building block of object-oriented modelling. It is used for
general conceptual modelling of the structure of the application, and for detailed modelling
that translates the models into programming code. Class diagrams can also be used for data
modelling. The classes in a class diagram represent the main elements and interactions in
the application, as well as the classes to be programmed.
In the diagram, classes are represented with boxes that contain three compartments:
The top compartment contains the name of the class. It is printed in bold and centered,
and the first letter is capitalized.
The middle compartment contains the attributes of the class. They are left-aligned and
the first letter is lowercase.
The bottom compartment contains the operations the class can execute. They are also
left-aligned and the first letter is lowercase.

15.1. Usecase Diagram
Figure-15.1

15.2. Sequence Diagram
Figure-15.2

15.3. Activity Diagram
Figure-15.3

15.4. Collaboration Diagram

Figure-15.4

15.5. Class Diagram



Figure-15.5

15.6.ER Diagram
Figure-15.6

16.DOMAIN SPECIFICATION
Machine Learning
Machine learning is a system that can learn from examples through self-improvement, without
being explicitly coded by a programmer. The breakthrough comes with the idea that a machine
can learn on its own from the data (i.e., examples) to produce accurate results.
Machine learning combines data with statistical tools to predict an output. This output is
then used by businesses to derive actionable insights. Machine learning is closely related to
data mining and Bayesian predictive modelling. The machine receives data as input and uses an
algorithm to formulate answers.
A typical machine learning task is to provide a recommendation. For those who have a
Netflix account, all recommendations of movies or series are based on the user's historical
data. Tech companies are using unsupervised learning to improve the user experience by
personalizing recommendations. Machine learning is also used for a variety of tasks such as
fraud detection, predictive maintenance, portfolio optimization, task automation, and so on.


How does Machine learning work?
Machine learning is the brain where all the learning takes place. The way the machine
learns is similar to the way a human being learns. Humans learn from experience. The more we
know, the more easily we can predict. By analogy, when we face an unknown situation, the
likelihood of success is lower than in a known situation. Machines are trained the same way.
To make an accurate prediction, the machine sees an example. When we give the machine a
similar example, it can figure out the outcome. However, like a human, if it is fed a
previously unseen example, the machine has difficulty predicting.
The core objective of machine learning is learning and inference. First, the machine learns
by discovering patterns in the data. For instance, the machine is trying to understand the
relationship between the wage of an individual and the likelihood of going to a fancy
restaurant. It turns out the machine finds a positive relationship between wage and going to
a high-end restaurant.

Inferring
When the model is built, it is possible to test how powerful it is on never-before-seen data.
The new data are transformed into a feature vector, passed through the model, and yield a
prediction. This is the beautiful part of machine learning: there is no need to update the
rules or retrain the model. You can use the previously trained model to make inferences
on new data.

Machine learning Algorithms and where they are used?
Machine learning can be grouped into two broad learning tasks: supervised and
unsupervised. There are many algorithms within each group.
16.1. Supervised learning
An algorithm uses training data and feedback from humans to learn the relationship of
given inputs to a given output. For instance, a practitioner can use marketing expense and
weather forecast as input data to predict the sales of cans. You can use supervised learning
when the output data is known. The algorithm will predict new data.
There are two categories of supervised learning:
Classification task
Regression task
Classification:
Imagine you want to predict the gender of a customer for a commercial. You will start
gathering data on the height, weight, job, salary, purchasing basket, etc. from your customer
database. You know the gender of each of your customers; in this example it can only be male
or female. The objective of the classifier is to assign a probability of being male or female
(i.e., the label) based on the information (i.e., the features) you have collected. Once the
model has learned how to recognize male or female, you can use new data to make a prediction.
For instance, you have just received new information from an unknown customer, and you want
to know whether it is a male or a female. If the classifier predicts male = 70%, it means the
algorithm is 70% sure that this customer is a male and 30% sure that it is a female. The
label can have two or more classes. The above example has only two classes, but if a
classifier needs to recognize objects, it may have dozens of classes (e.g., glass, table,
shoes; each object represents a class).
Regression:
When the output is a continuous value, the task is a regression. For instance, a financial
analyst may need to forecast the value of a stock based on a range of features such as equity,
previous stock performance, and macroeconomic indices. The system will be trained to estimate
the price of the stock with the lowest possible error.
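To make "lowest possible error" concrete, the sketch below fits a straight line by ordinary
least squares, which minimises the sum of squared errors. It uses a single made-up feature
and hypothetical numbers; a real forecast would use many features and real market data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: one input feature and a continuous target.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # prediction for an unseen input x = 5
```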

16.2. Unsupervised learning
In unsupervised learning, an algorithm explores input data without being given an explicit
output variable (e.g., it explores customer demographic data to identify patterns). You can
use it when you do not know how to classify the data and you want the algorithm to find
patterns and classify the data for you.

Application of Machine learning
Augmentation:
Machine learning assists humans with their day-to-day tasks, personally or
commercially, without having complete control of the output. Such machine
learning is used in different ways, such as virtual assistants, data analysis, and
software solutions. Its primary use is to reduce errors due to human bias.
Automation:
Machine learning works entirely autonomously in any field without the
need for any human intervention. For example, robots perform the essential
process steps in manufacturing plants.
Finance Industry:
Machine learning is growing in popularity in the finance industry. Banks are
mainly using ML to find patterns inside the data but also to prevent fraud.

Government organization:
The government makes use of ML to manage public safety and utilities. Take the
example of China with its large-scale face recognition. The government uses
artificial intelligence to deter jaywalking.
Healthcare Industry:
Healthcare was one of the first industries to use machine learning, with image
detection.
Marketing:
AI is used broadly in marketing thanks to abundant access to data. Before the
age of mass data, researchers developed advanced mathematical tools like Bayesian
analysis to estimate the value of a customer. With the boom of data, marketing
departments rely on AI to optimize customer relationships and marketing
campaigns.


17.TESTING
Testing for fake social media account detection involves validating the effectiveness and
accuracy of the detection system. Here’s how it can be approached using different testing types:
1.Black Box Testing:
Focus: Input-output validation without knowing internal logic.
Tests:
Submit an account with no profile photo → Expect: “Suspicious account”
Submit an account posting 100 times per day → Expect: “Flagged for spam”
Input account with 10,000 followings and 10 followers → Expect: “Unnatural activity”
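The three black-box cases above can be expressed as input/expected-output pairs run against
the detector as an opaque function. The check_account function below is a hypothetical
rule-based stand-in for the real system, and its thresholds are assumptions chosen only so
that the listed cases have something to run against.

```python
def check_account(profile):
    """Hypothetical rule-based detector returning a verdict string."""
    if not profile.get("has_profile_photo", True):
        return "Suspicious account"
    if profile.get("posts_per_day", 0) >= 100:
        return "Flagged for spam"
    followers = profile.get("followers", 0)
    following = profile.get("following", 0)
    # Assumed rule: many followings and a following/follower ratio above 100.
    if following >= 1000 and following > 100 * max(followers, 1):
        return "Unnatural activity"
    return "No issue detected"


# Black-box cases: only inputs and expected outputs, no knowledge of the rules.
BLACK_BOX_CASES = [
    ({"has_profile_photo": False}, "Suspicious account"),
    ({"posts_per_day": 100}, "Flagged for spam"),
    ({"following": 10_000, "followers": 10}, "Unnatural activity"),
]

results = [check_account(profile) == expected for profile, expected in BLACK_BOX_CASES]
```

Keeping the cases in a table makes it easy to add further inputs without touching the
detector itself.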
2. White Box Testing
Focus: Internal logic, algorithms, and code paths.
Tests:
Verify thresholds in classification algorithm (e.g., post frequency > X).
Unit test scoring functions (e.g., account credibility score).
Validate decision tree or ML model with controlled inputs.
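A white-box test targets a known code path, for example the exact boundary of a threshold.
The sketch below unit-tests a hypothetical credibility_score function; the threshold value
and the penalty weights are assumptions made for illustration, not values from the project.

```python
import unittest

POST_FREQUENCY_THRESHOLD = 50  # hypothetical value for the "X" mentioned above


def credibility_score(posts_per_day, has_bio):
    """Hypothetical score in [0, 1]; higher means more credible."""
    score = 1.0
    if posts_per_day > POST_FREQUENCY_THRESHOLD:
        score -= 0.6
    if not has_bio:
        score -= 0.3
    return round(max(score, 0.0), 2)


class CredibilityScoreTest(unittest.TestCase):
    def test_at_threshold_is_not_penalised(self):
        self.assertEqual(credibility_score(POST_FREQUENCY_THRESHOLD, True), 1.0)

    def test_above_threshold_is_penalised(self):
        self.assertEqual(credibility_score(POST_FREQUENCY_THRESHOLD + 1, True), 0.4)

    def test_both_penalties_apply(self):
        self.assertEqual(credibility_score(POST_FREQUENCY_THRESHOLD + 1, False), 0.1)


# Run the suite programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CredibilityScoreTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```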
3. Grey Box Testing
Focus: Combine knowledge of internal logic with input/output.
Tests:
Simulate bot behaviour while monitoring backend feature extraction.
Track how scoring adjusts with gradually increasing suspicious behaviour.
Partially modify account data and observe detection reaction.
4. Functional Testing
Ensure the system flags:
Accounts without bios
Users with fake or copied content
Sudden follow/unfollow patterns

5. Non-Functional Testing
Performance: Can the system process 10,000 profiles per minute?
Scalability: Does the detection hold under large datasets?
Accuracy: Evaluate precision, recall, F1 score of fake detection.
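The precision, recall, and F1 score named above can be computed from the counts of true
positives, false positives, and false negatives. This is a minimal standard-library sketch
that treats "fake" as the positive class; the label lists are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = fake, 0 = genuine.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Precision answers how many flagged accounts were really fake, and recall answers how many
fake accounts were caught; F1 is their harmonic mean.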

18.IMPLEMENTATION

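from flask import Flask, flash, redirect, render_template, request, session, url_for
from werkzeug.security import check_password_hash, generate_password_hash

# Assumed setup: the excerpt does not show how app and users are defined.
app = Flask(__name__)
app.secret_key = 'change-me'  # placeholder; session and flash() need a secret key
users = {}  # in-memory store mapping username -> password hash


@app.route('/register', methods=['GET', 'POST'])
def register():
    if request.method == 'POST':
        username = request.form['username']
        password = request.form['password']
        if username in users:
            flash('Username already exists. Please choose a different one.')
        else:
            # Store a salted hash instead of the plain-text password.
            users[username] = generate_password_hash(password)
            flash('Registration successful! You can now log in.')
            return redirect(url_for('login'))
    return render_template('register.html')


@app.route('/login', methods=['GET', 'POST'])
def login():
    if request.method == 'POST':
        username = request.form['username']
        password = request.form['password']
        # Compare the submitted password against the stored hash.
        if username in users and check_password_hash(users[username], password):
            session['username'] = username
            flash('Login successful!')
            # The 'home' view is defined elsewhere in the application.
            return redirect(url_for('home'))
        else:
            flash('Invalid username or password. Please try again.')
    return render_template('login.html')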



import mysql.connector

# Connect to the MySQL database; the credentials below are placeholders.
conn = mysql.connector.connect(
    host='localhost',
    user='root',
    password='your_password',
    database='your_db'
)

19.SAMPLE SCREENSHOTS
Figure-19.1
Figure-19.2
Figure-19.3

20.CONCLUSION
The proposed system for detecting fake Instagram profiles using supervised machine
learning algorithms, such as Random Forest, provides an innovative and efficient approach
to enhancing the security and integrity of social media platforms. By utilizing advanced
techniques like GridSearchCV for hyperparameter optimization, the system ensures high
accuracy and reliability in classifying fake profiles, offering a scalable solution capable of
handling large datasets.
This automated approach minimizes human intervention and speeds up detection,
helping to safeguard users from cybercriminals and spammers. The
system’s ability to generate actionable insights and reports on fake profiles further supports
authorities in taking prompt action against fraudulent accounts.
The integration of key machine learning concepts, robust feature engineering, and
model evaluation ensures that the system remains adaptable to future challenges, such as
evolving fake profile tactics. Furthermore, its scalability, security, and user-friendly
interface make it an ideal tool for both small-scale and large-scale social media platforms.
Ultimately, this system contributes to the broader goal of securing online social
interactions and maintaining trust in digital communities, offering a proactive and data-
driven solution to combat the growing issue of fake profiles in social media environments.

21.FUTURE WORK
The future scope of the proposed fake Instagram profile detection system holds
significant potential for further advancements and broader applications in the field of social
media security. Some key areas for future development include:
1. Integration with Multiple Social Media Platforms
The system could be expanded to detect fake profiles across various social media
platforms like Facebook, Twitter, and LinkedIn. This would allow the detection of
fraudulent accounts on a wider scale, improving overall online security and user trust.
2.Real-Time Detection and Monitoring
Currently, the system can be applied to batch-processing data. However, in the
future, real-time monitoring and detection of fake profiles could be implemented, allowing
the system to instantly flag suspicious accounts as they are created or when they exhibit
unusual behavior.
3.Advanced Machine Learning Algorithms
The system could incorporate more sophisticated machine learning models, such as
deep learning-based approaches (e.g., Convolutional Neural Networks, Recurrent Neural
Networks) or ensemble models, to improve detection accuracy and handle complex, high-
dimensional data more effectively.
4.Behavioral Analysis for Fake Profile Detection
In addition to profile attributes like follower count and activity frequency, the
system could be extended to include behavioral analysis. By studying patterns such as the
timing of posts, engagement methods, or the type of content shared, the system could more
accurately differentiate between genuine users and fake profiles.

