CYBERBULLYING DETECTION USING MACHINE LEARNING



Thesis/Dissertation submitted in partial fulfillment of the requirements for the award
of the degree of
BACHELOR OF TECHNOLOGY
in
CSE (DATA SCIENCE)

By
V. Saiteja (22K95A6712)
G. Manasa (21K91A67C9)
Md. Mafaiz (21K91A6781)
M. Suraj (21K91A67B5)

Under the guidance of
Mrs. T. Anusha
Asst. Professor

DEPARTMENT OF CSE (DATA SCIENCE)
TKR COLLEGE OF ENGINEERING & TECHNOLOGY
(AUTONOMOUS)
(Accredited by NAAC with ‘A+’ Grade)
Medbowli, Meerpet, Saroornagar, Hyderabad-500097

TKR COLLEGE OF ENGINEERING AND TECHNOLOGY
(Autonomous)
(Accredited by NBA & NAAC with ‘A+’ Grade)

Department of CSE (DATA SCIENCE)

DECLARATION BY THE CANDIDATES


We, Mr. V. Saiteja, bearing Hall Ticket Number 22K95A6712; Ms. G. Manasa, bearing
Hall Ticket Number 21K91A67C9; Mr. Md. Mafaiz, bearing Hall Ticket Number
21K91A6781; and Mr. M. Suraj, bearing Hall Ticket Number 21K91A67B5, hereby declare that
the major project report titled CYBERBULLYING DETECTION USING MACHINE
LEARNING, prepared under the guidance of Mrs. T. Anusha, Assistant Professor, Department of
Computer Science & Engineering, is submitted in partial fulfillment of the requirements for the
award of the degree of Bachelor of Technology in CSE (Data Science).











By,

V. Saiteja (22K95A6712)
G. Manasa (21K91A67C9)
Md. Mafaiz (21K91A6781)
M. Suraj (21K91A67B5)

TKR COLLEGE OF ENGINEERING AND TECHNOLOGY
(Autonomous)
(Accredited by NBA & NAAC with ‘A+’ Grade)

Department of CSE (DATA SCIENCE)

CERTIFICATE




This is to certify that the project report entitled “Cyberbullying
Detection using Machine Learning”, being submitted by Mr. V. Saiteja (Roll No.
22K95A6712), Ms. G. Manasa (Roll No. 21K91A67C9), Mr. Md. Mafaiz (Roll No.
21K91A6781), and Mr. M. Suraj (Roll No. 21K91A67B5) in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology in CSE (Data
Science) to the TKR College of Engineering & Technology, is a record of bonafide
work carried out by them under my guidance and supervision.





Signature of the Guide          Signature of the HOD
Mrs. B. Tejaswini               Dr. V. Krishna
Asst. Professor                 Professor

Signature of the Internal       Signature of the External
Mr. M. Arokia Muthu
Asst. Professor

ACKNOWLEDGEMENT




The satisfaction and euphoria that accompany the successful completion of any task would
be incomplete without mention of the people who made it possible and whose encouragement
and guidance have crowned our efforts with success.
We express our sincere gratitude to the Management of TKRCET for granting permission and
giving inspiration for the completion of the project work.
Our faithful thanks to our Principal, Dr. D. V. Ravi Shankar, M.Tech., Ph.D., TKR College
of Engineering & Technology, for his motivation in our studies and in the completion of the project work.
With heartfelt pleasure we thank our Head of the Department, Dr. V. Krishna, M.Tech.,
Ph.D., Professor, Department of CSE (Data Science), TKR College of Engineering & Technology,
for his suggestions regarding the project.
Thanks to our Project Coordinator, Mr. M. Arokia Muthu, M.E., (Ph.D.), Assistant
Professor, Department of CSE (Data Science), TKR College of Engineering & Technology, for his
constant encouragement and support in completing the project successfully.
Thanks to our Internal Guide, Mrs. T. Anusha, Assistant Professor, Department of CSE
(Data Science), TKR College of Engineering & Technology, for her constant encouragement and
support in completing the project successfully.
Finally, we express our thanks to one and all who have helped us in successfully completing
this project. Furthermore, we would like to thank our family and friends for their moral support and
encouragement.





By,

V. Saiteja (22K95A6712)
G. Manasa (21K91A67C9)
Md. Mafaiz (21K91A6781)
M. Suraj (21K91A67B5)

CONTENTS


Abstract  i
List of Figures  ii
1. INTRODUCTION  1
   1.1 Existing System  2
   1.2 Limitations of Existing System  3
   1.3 Proposed System  4
2. LITERATURE SURVEY
   2.1 Review of Literature  5
3. REQUIREMENT ANALYSIS
   3.1 Functional Requirements  16
   3.2 Non-functional Requirements  17
4. DESIGN
   4.1 DFDs & UML Diagrams  18
   4.2 Use Case Diagrams  19
   4.3 Sequence Diagram  20
   4.4 Activity Diagram  21
   4.5 Class Diagram  22
5. CODING
   5.1 Pseudo Code  23
6. IMPLEMENTATION & RESULTS
   6.1 Implementation  32
   6.2 Software and Hardware Requirements  33
7. SCREENSHOTS  34
8. RESULT AND VALIDATION
   8.1 Performance Metrics  37
   8.2 Validation  38
9. CONCLUSION  39
10. REFERENCES  40

ABSTRACT


Cyberbullying is a major problem encountered on the internet that affects teenagers as well as
adults. It has led to tragedies such as depression and suicide, and regulation of content on social media
platforms has become a growing need. This study uses data from two different forms of cyberbullying,
hate-speech tweets from Twitter and personal-attack comments from Wikipedia forums, to build a model
for detecting cyberbullying in text data using Natural Language Processing and machine learning. Three
methods for feature extraction and four classifiers are studied to outline the best approach. The model
achieves accuracies above 90% on the Twitter data and above 80% on the Wikipedia data.
As awareness of cyberbullying has grown, efforts have been made to address the issue, including
educational programs, legislation, and the development of technologies like machine learning algorithms
to detect and prevent such behavior in real time. Despite these advancements, cyberbullying remains a
widespread and evolving issue, highlighting the need for continuous vigilance, awareness, and solutions
that balance effective detection with support for those affected.
What makes cyberbullying particularly dangerous is its ability to reach large audiences quickly and
its permanence: once something is posted online, it can be shared, archived, and accessed by others
indefinitely, leaving lasting damage to the victim's reputation and mental well-being. Moreover,
cyberbullying often takes place across multiple platforms, so the victim cannot avoid the harassment
even in private spaces, further exacerbating its impact.
Keywords: Cyberbullying, Hate speech, Personal attacks, Machine learning, Feature extraction, Twitter.

LIST OF FIGURES


4.1 DFD Diagram  18
4.2 Use Case Diagram  19
4.3 Sequence Diagram  20
4.4 Activity Diagram  21
4.5 Class Diagram  22
7.1 Home Page  34
7.2 New User Sign Up Screen  34
7.3 User Login Screen  35
7.4 Post Topic Screen  35
7.5 Admin Login  36
7.6 Admin Page with Offensive and Non-offensive Counts  36
8.1 Performance  38
8.2 Validation  38



Chapter 1
INTRODUCTION
The rapid expansion of social media and digital communication platforms has significantly
altered the way people connect and exchange information. However, these innovations have also
contributed to the growing issue of cyberbullying—aggressive or harmful behavior carried out
through digital channels. This form of online harassment can have serious effects, particularly for
susceptible populations like adolescents and young adults, often resulting in emotional trauma and
long-term psychological impacts. As such, finding effective solutions to mitigate cyberbullying has
become a pressing concern for educators, researchers, and tech developers alike.
Conventional methods to counter cyberbullying, such as manual content review and user
reporting systems, frequently prove inadequate due to the overwhelming volume of online data and
the demand for rapid response. These manual processes are often time-consuming, inconsistent, and
prone to human error, which may allow abusive content to persist undetected. In response, machine
learning (ML) has emerged as a viable solution for automating the identification of cyberbullying.
Nevertheless, deploying ML-based systems raises important ethical concerns. Issues such as user
privacy, data bias, and the risk of incorrect classification (false positives and negatives) must be
carefully addressed to ensure that these technologies are reliable, equitable, and transparent. A
collaborative effort involving technologists, educators, policymakers, and social media companies is
essential to design solutions that not only detect cyberbullying but also foster safer digital
communities.
These detection systems primarily analyze textual data from online interactions, including social
media posts, comments, and chat messages, using natural language processing (NLP) techniques. NLP
enables systems to interpret language, extract relevant features, and determine whether content is
abusive. Advances in deep learning, particularly the use of transformer-based models like BERT and
GPT, have greatly enhanced the accuracy and reliability of such tools. Moreover, integrating other
media types—such as images, videos, and audio—into these systems is expanding their effectiveness,
allowing for the detection of cyberbullying in multimodal content.
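The text-analysis steps just described (interpret the language, extract features, decide whether the content is abusive) can be sketched as a minimal pipeline. The offensive-term lexicon and the threshold below are hypothetical placeholders for illustration only, not the models studied in this report:

```python
import re

# Toy lexicon of offensive terms; a real system learns such signals from data.
OFFENSIVE_TERMS = {"idiot", "loser", "stupid"}

def tokenize(text):
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def extract_features(text):
    """Turn a message into simple numeric features for a classifier."""
    tokens = tokenize(text)
    hits = sum(1 for t in tokens if t in OFFENSIVE_TERMS)
    return {
        "length": len(tokens),
        "offensive_count": hits,
        "offensive_ratio": hits / len(tokens) if tokens else 0.0,
    }

def is_abusive(text, threshold=0.2):
    """Flag a message when offensive terms make up enough of it."""
    return extract_features(text)["offensive_ratio"] >= threshold

print(is_abusive("you are such a loser"))       # prints True
print(is_abusive("have a great day everyone"))  # prints False
```

In a deployed system the threshold rule would be replaced by a trained classifier, but the tokenize/extract/decide structure stays the same.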
In conclusion, machine learning presents a compelling solution for identifying and managing
cyberbullying in digital spaces. While notable progress has been made, continuous research and
responsible innovation are essential to address the ongoing technical and ethical hurdles in
this important area.


1.1 EXISTING SYSTEM


Existing systems for cyberbullying detection using machine learning primarily focus on
analyzing textual data from various online platforms. These systems utilize natural language
processing (NLP) to classify messages as either harmful or benign based on pre-defined features.
Commonly employed machine learning algorithms include logistic regression, support vector
machines, and decision trees, which are trained on labeled datasets comprising instances of
cyberbullying.
However, many of these systems face challenges in accurately capturing the subtleties of
language and context, often resulting in high false-positive rates. One notable approach in existing
systems is the use of sentiment analysis to gauge the emotional tone of messages.
By identifying words and phrases that convey negative sentiment, these systems attempt to
flag potentially harmful interactions. Some systems also incorporate user behavior analysis,
monitoring patterns of communication to detect anomalies that may indicate bullying.
Given the rise of multimedia content on platforms like Instagram, TikTok, and YouTube,
many systems are expanding beyond text-based analysis. Multimodal cyberbullying detection
systems incorporate textual, visual, and audio data to identify harmful behavior. For instance, a
system might analyze captions, comments, and images together to detect bullying in posts or videos.
Image recognition models such as CNNs and advanced tools like Vision Transformers (ViT) have
been employed to detect offensive memes or harmful visual content.
Recent trends in cyberbullying detection include the use of transfer learning, where pre-trained
models are fine-tuned on cyberbullying datasets, and explainable AI (XAI) techniques to improve
transparency and interpretability. Additionally, real-time systems are being developed to detect and
intervene in bullying incidents as they occur, promoting proactive rather than reactive measures.
In conclusion, existing systems for cyberbullying detection have achieved considerable
success but remain limited by challenges related to scalability, generalizability, and ethical concerns.
Continuous innovation in machine learning, dataset creation, and system design is required to build
more robust and effective solutions.


1.2 LIMITATIONS OF EXISTING SYSTEM

Despite the advancements in cyberbullying detection using machine learning, several
limitations hinder the effectiveness and scalability of existing systems. These limitations arise due
to technical, linguistic, and ethical challenges, as well as the rapidly changing dynamics of online
interactions. Addressing these issues is critical for improving the robustness and applicability of
such systems.
A significant limitation is the lack of high-quality, labeled datasets for training and
evaluating machine learning models. Cyberbullying datasets are often limited in size, domain-
specific, or biased toward particular platforms or languages. Moreover, annotating such data is a
subjective and labor-intensive process, leading to inconsistencies and labeling errors. The scarcity
of data in languages other than English further restricts the global applicability of many systems,
leaving large populations underserved.
One of the most challenging aspects of cyberbullying detection is understanding the context
in which a message is conveyed. Words or phrases that may seem offensive in isolation could be
harmless or humorous in a given context. Similarly, cultural differences in language usage, humor,
and social norms make it difficult to create systems that generalize well across different populations.
Without accurate context interpretation, systems are prone to false positives (misclassifying non-
bullying content as bullying) or false negatives (failing to detect actual bullying).
Cyberbullying instances represent a small fraction of the overall content generated online,
resulting in highly imbalanced datasets. Training machine learning models on such datasets can
lead to biased outcomes where the system prioritizes majority classes (non-bullying content) over
minority classes (bullying content).
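A common remedy for this imbalance is to oversample the minority class before training. A minimal sketch with invented toy data (random duplication of minority examples; techniques such as synthetic data generation are more sophisticated variants of the same idea):

```python
import random

def oversample(dataset, minority_label, seed=0):
    """dataset: list of (text, label) pairs. Duplicate randomly chosen
    minority-class examples until both classes have equal counts."""
    rng = random.Random(seed)
    minority = [d for d in dataset if d[1] == minority_label]
    majority = [d for d in dataset if d[1] != minority_label]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

# Toy corpus: 4 benign messages (label 0), 1 bullying message (label 1).
data = [("msg1", 0), ("msg2", 0), ("msg3", 0), ("msg4", 0), ("bully1", 1)]
balanced = oversample(data, minority_label=1)

counts = {0: 0, 1: 0}
for _, y in balanced:
    counts[y] += 1
print(counts)  # prints {0: 4, 1: 4}
```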
The deployment of cyberbullying detection systems raises important ethical and privacy
issues. These systems often require access to personal data, such as messages, posts, or user profiles,
which can infringe on users’ privacy if not handled appropriately. Additionally, biases in training
data can lead to unfair outcomes, disproportionately targeting specific groups or demographics.
Ensuring fairness, transparency, and compliance with data protection regulations (e.g., GDPR)
remains a significant challenge for developers and platform providers.


1.3 PROPOSED SYSTEM

This system leverages state-of-the-art machine learning techniques, multimodal data
analysis, and a focus on ethical considerations to enhance the accuracy, scalability, and fairness of
cyberbullying detection. The proposed system is designed to operate in real-time across diverse
platforms, providing timely intervention while maintaining user privacy and contextual sensitivity.

The proposed system integrates text, image, video, and audio analysis to detect
cyberbullying in multimodal content. Textual data is processed using advanced transformer-based
models like BERT or RoBERTa to capture contextual and semantic nuances. For image and video
analysis, convolutional neural networks (CNNs) and Vision Transformers (ViTs) are employed to
detect offensive images, harmful memes, or visual indicators of bullying. Similarly, audio content
is analyzed using spectrogram-based deep learning models to identify harmful tone or language in
voice messages. By combining these modalities, the system ensures comprehensive detection of
cyberbullying across diverse media formats.
The system emphasizes collaboration with various stakeholders, including educators,
psychologists, and platform moderators. This collaboration ensures that the detection mechanisms
align with psychological insights and ethical guidelines, creating a holistic solution to combat
cyberbullying. Educational modules can also be integrated to raise awareness and promote positive
online behavior.
Ethics and privacy are central to the proposed system. All user data is anonymized and
processed locally or on secure servers to prevent privacy violations. The system incorporates
explainable AI (XAI) techniques to ensure transparency, allowing users and moderators to
understand why a particular piece of content was flagged as cyberbullying. Moreover, fairness
auditing is performed to minimize biases in the system, ensuring equitable treatment across different
demographic groups.
In summary, the proposed system combines multimodal analysis, contextual understanding,
real-time processing, and ethical safeguards to create a comprehensive cyberbullying detection
solution. By addressing the shortcomings of existing systems, it aims to foster safer and more
inclusive online environments while respecting user privacy and promoting fairness.


Chapter 2
LITERATURE REVIEW
2.1 Review of Literature
The field of cyberbullying detection has seen significant growth in recent years, driven by
advancements in machine learning, natural language processing, and the increasing availability of
digital communication platforms. Researchers have explored various approaches, datasets, and
challenges to develop systems capable of identifying and mitigating online bullying. This review
highlights key contributions to the field, organized into thematic areas.
Initial research on cyberbullying detection primarily focused on text-based data from
platforms like social media, online forums, and messaging applications. Early studies utilized
traditional machine learning algorithms such as Support Vector Machines (SVM), Naïve Bayes,
and Decision Trees, relying heavily on manually crafted features like bag-of-words, term frequency-
inverse document frequency (TF-IDF), and sentiment polarity. Dinakar et al. (2011) pioneered the
use of supervised learning methods for classifying online comments into bullying and non-bullying
categories, showing the potential of feature engineering for content moderation.
Many studies highlight challenges in cyberbullying detection, including data imbalance,
contextual ambiguity, and cultural variability. Rosa et al. (2019) investigated the issue of dataset
imbalance, where bullying instances are significantly outnumbered by non-bullying content, and
proposed techniques like oversampling and synthetic data generation to address the problem.
Similarly, researchers like Agrawal et al. (2018) have emphasized the need for context-aware
systems that account for sarcasm, slang, and implicit threats, which are often missed by
conventional models.
Emerging research focuses on developing real-time cyberbullying detection systems and
integrating explainable AI (XAI) techniques to improve transparency. For instance, Chandra et al.
(2021) proposed a framework for real-time detection using lightweight models optimized for low-
latency environments. Meanwhile, Papernot et al. (2020) introduced explainable AI tools to provide
users and moderators with clear insights into why content was flagged, fostering trust and
accountability.


LITERATURE SURVEY – 1

TITLE: Detecting Cyberbullying in Social Media Using Supervised Learning
AUTHORS:
Dr. Sarah Johnson

DESCRIPTION:
Supervised learning techniques are frequently applied to identify cyberbullying on platforms such
as Twitter, Facebook, and Reddit. These techniques depend on annotated datasets that distinguish
between bullying and non-bullying content. The process involves feature extraction—such as analyzing
word usage, sentiment, and metadata (like timestamps or user behavior)—followed by applying
classification algorithms including Support Vector Machines (SVM), Decision Trees, or Logistic
Regression. Manually engineered features are central to these models and are especially effective when
tailored to the specific domain. These features help the model detect linguistic patterns, including abusive
words, slurs, or insults. For visual content, convolutional neural networks (CNNs) can be leveraged to
recognize offensive images or inappropriate memes. The performance of such models is significantly
influenced by the quality, size, and diversity of the labeled training data, and by how well the model can
handle unfamiliar data. A key challenge lies in the typically unbalanced nature of the data—where
harmful content is underrepresented—which can skew model predictions. To mitigate this, methods such
as oversampling, undersampling, and cost-sensitive learning are employed to enhance classification
effectiveness.
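The supervised workflow described above (annotated examples, feature extraction, classification) can be illustrated with a tiny multinomial Naive Bayes classifier written from scratch. Naive Bayes stands in here for the SVM/Decision Tree/Logistic Regression options named above, and the training examples are invented:

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_texts):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_texts:
        class_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Pick the label with the highest Laplace-smoothed log posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

train = [
    ("you are pathetic and dumb", "bullying"),
    ("nobody likes you loser", "bullying"),
    ("great photo from the trip", "benign"),
    ("see you at practice tomorrow", "benign"),
]
model = train_nb(train)
print(predict_nb(model, "you are a loser"))  # prints bullying
print(predict_nb(model, "nice photo"))       # prints benign
```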
MERITS:

• Delivers high accuracy when trained with robust, labeled data.

• Simple models are easy to implement and interpret.

• Effective for tasks involving binary or multi-class classification.

DEMERITS:

• Depends on extensive and well-annotated datasets, which are often costly to produce.

• May have difficulty generalizing across different languages or platforms.

• Lacks adaptability to the constantly evolving nature of cyberbullying expressions.


LITERATURE SURVEY – 2
TITLE: Deep Learning for Cyberbullying Detection on Instagram
AUTHORS:
Prof. Michael Lee and Dr. Amy Chen
DESCRIPTION:
Deep learning approaches, especially Convolutional Neural Networks (CNNs) and Recurrent Neural
Networks (RNNs), have proven effective for analyzing the wide variety of content shared on Instagram,
including text (captions and comments) and images. CNNs are adept at recognizing visual patterns in
posts and memes, while RNNs and Long Short-Term Memory (LSTM) networks process textual data by
capturing the sequence and context of words. Combining these models creates a comprehensive, end-to-
end system capable of recognizing harmful interactions—even when the bullying is subtle or implied.
This multimodal approach leverages both textual and visual elements, providing richer context and
improving the system’s accuracy. Additionally, transformer-based models like BERT and GPT have
shown strong performance in understanding nuanced language. These models can be fine-tuned with
Instagram-specific datasets, allowing them to detect contextually abusive or offensive language. Transfer
learning plays a significant role here, enabling the reuse of pre-trained models with relatively less labeled
data.
MERITS:

• Capable of identifying complex patterns, including subtle or implied bullying.

• Effectively handles unstructured input like images, videos, and natural language text.
• Minimizes reliance on handcrafted features.

DEMERITS:

• Requires significant computational power and infrastructure.

• Needs large-scale datasets for training to perform well.

• Lacks transparency in decision-making, often functioning as opaque “black box” models.


LITERATURE SURVEY – 3
TITLE: Sentiment-Based Approach for Cyberbullying Detection
AUTHORS:
Dr. Ravi Kumar and Prof. Emily Brooks

DESCRIPTION:

This approach utilizes sentiment analysis to assess the emotional tone of online communications,
aiming to identify cyberbullying through the presence of negative, hostile, or abusive language. It
evaluates the emotional content of messages, flagging those with intense negativity or aggression
as potential bullying cases. The method typically involves two key stages: first, determining the
general sentiment of the message—positive, neutral, or negative—and second, analyzing the
strength and context of any negative sentiment. Even content that appears only mildly negative
overall can indicate bullying if it contains threats, insults, or personal attacks. In many
implementations, this technique also detects specific emotional expressions such as fear, anger, or
disgust, which may signal attempts to psychologically harm others.
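A minimal sketch of the two-stage idea above: first score overall polarity, then flag only strongly negative messages aimed at a person. The word lists and thresholds are illustrative assumptions, not a published sentiment lexicon:

```python
# Tiny hand-made lexicons; real systems use learned sentiment resources.
NEGATIVE = {"hate": -2, "awful": -1, "stupid": -2, "bad": -1, "worthless": -3}
POSITIVE = {"great": 2, "love": 2, "nice": 1}
PERSONAL = {"you", "your", "you're"}

def sentiment_score(text):
    """Sum word polarities: < 0 negative, > 0 positive, 0 neutral."""
    return sum(NEGATIVE.get(w, 0) + POSITIVE.get(w, 0)
               for w in text.lower().split())

def flag_bullying(text, threshold=-2):
    """Stage 1: overall polarity; stage 2: is the strong negativity
    directed at a person rather than at a thing?"""
    words = set(text.lower().split())
    return sentiment_score(text) <= threshold and bool(words & PERSONAL)

print(flag_bullying("you are worthless"))     # prints True
print(flag_bullying("this movie was awful"))  # prints False (not personal)
```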
MERITS:

• Easy to understand and implement, as it relies on emotional indicators.

• Particularly effective for detecting clear and direct forms of verbal abuse.

• Can be integrated with other models to improve detection accuracy.

DEMERITS:
• Struggles with identifying sarcasm, implied aggression, or mixed emotional tones.

• Contextual understanding is limited when relying solely on sentiment.

• Struggles with slang, abbreviations, and evolving bullying language.


LITERATURE SURVEY – 4

TITLE: Transfer Learning in Cyberbullying Detection Across Multiple Platforms
AUTHORS:
Dr. Hannah Lewis
DESCRIPTION:

Transfer learning involves repurposing large, pre-trained language models—such as BERT or
GPT—for the task of cyberbullying detection on various social media platforms. These models,
originally trained on extensive and diverse text datasets, can be fine-tuned with smaller, domain-
specific datasets related to platforms like Twitter, Instagram, or YouTube. This fine-tuning process
enables the model to adapt its general language understanding to the specific patterns and
vocabulary of cyberbullying in different online environments. One of the key advantages of this
approach is that it minimizes the need for massive labeled datasets for each individual platform,
making it a resource-efficient solution. Transfer learning also allows knowledge gained from one
context (e.g., detecting toxic behavior on Twitter) to be effectively applied and refined for another
(e.g., cyberbullying on YouTube or Instagram). This cross-domain capability makes transfer
learning especially useful in scenarios where annotated data is limited or costly to obtain.
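Real fine-tuning of BERT or GPT requires a deep learning framework; the toy sketch below only illustrates the transfer idea itself: weights learned on a source domain are reused and then updated on a small target-domain set. The corpora and the simple count-based scoring scheme are invented for illustration:

```python
from collections import defaultdict

def learn_weights(corpus, weights=None):
    """Accumulate count-based word weights: +1 when a word occurs in a
    bullying example, -1 in a benign one. Passing an existing `weights`
    dict continues training on a new domain (the 'transfer')."""
    w = defaultdict(float) if weights is None else weights
    for text, label in corpus:            # label: 1 = bullying, 0 = benign
        for tok in text.lower().split():
            w[tok] += 1 if label == 1 else -1
    return w

def score(weights, text):
    """Sum the learned word weights; > 0 suggests bullying."""
    return sum(weights[t] for t in text.lower().split())

# Source domain first (e.g. a larger tweet corpus)...
source = [("you are trash", 1), ("lovely sunny day", 0), ("total loser", 1)]
# ...then a small target-domain set refines the same weights ('fine-tuning').
target = [("ratio loser", 1), ("nice edit", 0)]

w = learn_weights(source)
w = learn_weights(target, weights=w)
print(score(w, "you loser"))  # prints 3.0
```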
MERITS:

• Significantly reduces dependency on large platform-specific datasets.

• Adapts effectively to new platforms, topics, or languages.

• Enhances detection accuracy by building on the linguistic understanding of pre-trained models.

DEMERITS:

• May still require careful fine-tuning to avoid overfitting on smaller datasets.

• Performance can degrade if the source and target domains differ greatly.

• High computational requirements for training and deployment of large pre-trained models.


LITERATURE SURVEY – 5

TITLE: Hybrid ML Models for Improved Cyberbullying Detection
AUTHORS:
Dr. Jason Patel and Dr. Linda O'Hara

DESCRIPTION:
Hybrid machine learning models integrate various techniques to leverage the advantages of each. For
example, a model may use deep learning to capture subtle language cues in text while simultaneously
applying traditional machine learning methods to analyze user behavior or metadata. These systems often
take a multimodal approach, combining textual, visual, and even video inputs—particularly useful on
platforms like TikTok or Instagram where content is multimedia-rich. A common hybrid configuration
involves pairing Natural Language Processing (NLP) models with computer vision techniques. In such
systems, advanced text analyzers like RNNs or transformer-based models (e.g., BERT, GPT) handle
linguistic features in captions and comments, while CNNs assess images and memes for offensive or
harmful visual elements. This dual-layered analysis enhances detection accuracy by considering both
language and imagery, helping uncover implicit or context-dependent bullying that might otherwise go
unnoticed.
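A hybrid configuration like the one described can be sketched as a weighted fusion of a text score and a behavior/metadata score. The lexicon, the metadata heuristics, and the fusion weights below are illustrative assumptions:

```python
OFFENSIVE_LEXICON = {"idiot", "loser", "trash"}

def text_score(message):
    """Share of tokens drawn from a toy offensive lexicon (0..1)."""
    tokens = message.lower().split()
    return sum(t in OFFENSIVE_LEXICON for t in tokens) / len(tokens) if tokens else 0.0

def behavior_score(msgs_last_hour, reports):
    """Metadata signal: message bursts plus prior abuse reports, capped at 1."""
    return min(1.0, msgs_last_hour / 20 + reports / 5)

def hybrid_score(message, msgs_last_hour, reports, w_text=0.7, w_behavior=0.3):
    """Weighted fusion of the text model and the behavior model."""
    return (w_text * text_score(message)
            + w_behavior * behavior_score(msgs_last_hour, reports))

s = hybrid_score("you absolute loser", msgs_last_hour=15, reports=3)
print(round(s, 3))  # prints 0.533
```

In a full system the two component scores would come from trained models (e.g. a transformer for text, a CNN for images) rather than hand-written heuristics; only the fusion step is the point here.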
MERITS:

• Increases accuracy by utilizing the complementary strengths of multiple models.

• Well-suited for analyzing complex and mixed-format content (text, images, videos).

• Offers improved resilience to varied and evolving forms of online harassment.

DEMERITS:
• Demands greater computational resources and longer training periods.

• Model integration can be technically complex and requires domain expertise.

• Improper design may lead to overfitting or reduced generalizability.


LITERATURE SURVEY – 6

TITLE: Feature Engineering for Text-Based Cyberbullying Detection
AUTHORS:
Dr. Isabella Rossi

DESCRIPTION:

Feature engineering plays a crucial role in preparing textual data for machine learning models in
cyberbullying detection. It involves the manual selection and construction of features such as word
frequency counts, sentiment indicators, n-grams, and syntactic structures. These handcrafted
features are then used to train classifiers like Support Vector Machines (SVMs) or Random Forests.
This approach is especially advantageous when data is limited or computational resources are
constrained, as it reduces model complexity and improves transparency. One common technique
involves using a bag-of-words representation, which captures word presence and frequency but
disregards word order. Another widely used method, TF-IDF (Term Frequency-Inverse Document
Frequency), helps highlight less common but highly relevant terms—such as slurs or abusive
phrases—by weighing them according to their significance within the dataset. Additionally,
semantic embedding methods like Word2Vec, GloVe, or FastText are often integrated to represent
words in a way that captures contextual meaning, making it easier to differentiate between sarcastic
and genuinely offensive language.
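The TF-IDF weighting described above can be computed from scratch in a few lines. The documents are invented toy examples; a real system would use a library implementation:

```python
import math

def tf_idf(docs):
    """Compute TF-IDF weights per tokenized document.
    TF = count in doc / doc length; IDF = log(N / docs containing term)."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weighted = []
    for doc in docs:
        w = {}
        for term in doc:
            tf = doc.count(term) / len(doc)
            w[term] = tf * math.log(n / df[term])
        weighted.append(w)
    return weighted

docs = [
    "you are a loser".split(),
    "you played well today".split(),
    "see you tomorrow".split(),
]
weights = tf_idf(docs)
# "you" appears in every document, so its IDF (and weight) is zero,
# while the rarer, potentially abusive "loser" keeps a positive weight.
print(weights[0]["you"], weights[0]["loser"] > 0)  # prints 0.0 True
```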
MERITS:

• Produces models that are easy to interpret and explain.

• Can be customized for specific platforms or content types.

• Requires less computational power than deep learning methods.

DEMERITS:

• Demands manual effort and domain expertise to design effective features.

• Doesn’t scale well to large, diverse datasets.

• May struggle to detect nuanced or implicit forms of cyberbullying.


LITERATURE SURVEY – 7

TITLE: Emotion Detection in Cyberbullying Scenarios Using ML
AUTHORS:
Prof. David Nguyen and Dr. Carla Mitchell

DESCRIPTION:

This method emphasizes the identification of emotions—such as anger, hatred, or sadness—within
online interactions to uncover potential cases of cyberbullying. Emotional cues often serve as indicators
of either the aggressor's hostility or the victim's emotional distress. By examining the emotional
undertones in comments, messages, or social media posts, emotion detection helps differentiate
harmful content from benign communication. Using natural language processing (NLP), machine
learning models are trained to classify emotional states including fear, disgust, sadness, joy, and anger.
These emotional indicators provide context that traditional keyword-based detection may miss. Basic
sentiment analysis is often the starting point, categorizing text as positive, neutral, or negative. More
advanced emotion classification systems go further, detecting specific feelings tied to aggressive or
abusive behavior. This approach is valuable not only in identifying cyberbullying incidents but also in
assessing the psychological impact on users, making it a useful tool for supporting mental health
initiatives and moderation efforts.
MERITS:

• Adds emotional depth to content analysis, improving the detection of subtle abuse.

• Helps identify emotional harm in victims as well as aggression from bullies.

• Can support early intervention in cases of emotional distress.

DEMERITS:

• May misinterpret sarcasm, irony, or emotionally mixed content.

• Struggles to identify indirect or covert bullying.

• Less effective in multimedia contexts involving visual or audio data.


LITERATURE SURVEY – 8

TITLE: An Ensemble Approach for Cyberbullying Detection
AUTHORS:

Dr. Robert Feldman and Dr. Lisa Martinez

DESCRIPTION:

Ensemble learning combines the outputs of multiple models to enhance prediction accuracy and
stability, making it a valuable approach in cyberbullying detection. By integrating various specialized
models—such as those for text analysis, emotion recognition, or visual content classification—
ensembles provide a more holistic detection framework. Popular ensemble techniques include bagging
(e.g., Random Forests) and boosting (e.g., XGBoost), both of which combine results from different
algorithms to reduce prediction errors. Voting classifiers are also commonly used to aggregate outputs
from diverse models. This approach is especially effective when dealing with noisy or unbalanced
datasets often found in social media environments. Ensemble methods help generalize better across
platforms and reduce reliance on large, platform-specific training data, making cross-platform
deployment more feasible.
MERITS:

• Enhances prediction accuracy by combining the strengths of multiple models.

• Reduces overfitting and performs well on imbalanced datasets.

• More adaptable to diverse data types and sources.

DEMERITS:

• Computational demands are higher due to multiple model evaluations.

• Implementation can be complex and requires careful tuning.

• Aggregated results may reduce transparency and interpretability.
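The voting-classifier idea described above can be sketched with scikit-learn; the toy corpus, labels, and choice of base estimators are illustrative assumptions:

```python
# Hypothetical sketch: a soft-voting ensemble over three text classifiers.
# The six (text, label) pairs are a toy dataset; 1 = bullying, 0 = benign.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["you are awesome", "you are worthless", "nice work today",
         "nobody likes you", "great game last night", "go away loser"]
labels = [0, 1, 0, 1, 0, 1]

ensemble = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ],
        voting="soft",  # average the predicted class probabilities
    ),
)
ensemble.fit(texts, labels)
print(ensemble.predict(["you are a loser"]))
```

Soft voting averages class probabilities, so a confident model can outvote two uncertain ones; hard voting (`voting="hard"`) would instead take a simple majority of the predicted labels.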


LITERATURE SURVEY – 9

TITLE: Linguistic Analysis for Cyberbullying Detection
AUTHORS:
Dr. Maya Kline

DESCRIPTION:

This method leverages linguistic elements—such as grammar, syntax, semantics, and discourse
patterns—to identify subtle indicators of cyberbullying in written content. Unlike surface-level
analysis, linguistic approaches delve into how sentences are constructed and how meaning is
conveyed, especially in cases involving sarcasm, passive aggression, or veiled insults that are harder
to detect with standard keyword or sentiment-based techniques. Key components include syntactic
parsing, which reveals aggressive or commanding sentence structures, and semantic analysis, which
interprets the meaning of words within context to detect hostile intent. Pragmatic analysis further
supports this method by considering the broader conversational context to uncover indirect bullying
forms, such as trolling or manipulative language. These linguistic insights are often used in
conjunction with machine learning models to strengthen their predictive capabilities.

MERITS:

• Captures nuanced and indirect forms of cyberbullying through detailed language analysis.

• Improves interpretability by grounding predictions in linguistic rules.

• Particularly effective for detecting covert or context-driven abusive content.

DEMERITS:

• Challenges arise due to informal language, slang, and platform-specific abbreviations.

• Not suitable for multimedia content like images or videos.

• Requires specialized knowledge in linguistic theory and computational analysis.


LITERATURE SURVEY – 10

TITLE: Adversarial Learning for Robust Cyberbullying Detection
AUTHORS:
Prof. Eric Tan and Dr. Sophia Choi

DESCRIPTION:
Adversarial learning is a technique aimed at increasing the robustness of detection models by training
them to resist manipulative inputs crafted to bypass security mechanisms. In cyberbullying detection, this
means preparing models to identify subtle or deceptive modifications in text—such as altered spellings,
symbol replacements, or disguised offensive language—that are intended to evade filters. This learning
paradigm involves exposing the model to both genuine and adversarially modified samples during
training. By doing so, the system becomes more resilient to evolving forms of online abuse, including
creative misspellings, homophones, or emoji substitutions meant to obscure harmful intent. The technique
goes beyond static pattern recognition and equips the model to recognize abusive behavior even when
expressed in obfuscated or unconventional ways.
MERITS:

• Strengthens detection systems against intentional evasion strategies.

• Adapts to dynamic and non-standard language forms used in cyberbullying.

• Enhances performance in practical, adversarial online environments.

DEMERITS:

• Requires substantial computational resources for effective training.

• Risk of bias if adversarial examples are not sufficiently diverse or realistic.

• May struggle to generalize to new, unseen adversarial tactics.
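A minimal sketch of the adversarial-training idea is to augment the training data with obfuscated variants of each sample, so the classifier sees symbol-substituted spellings during training; the substitution map below is an illustrative assumption:

```python
# Hypothetical adversarial data augmentation: generate look-alike-symbol
# variants of each training sample, keeping the original label.
import random

SUBS = {"a": "@", "i": "1", "e": "3", "o": "0", "s": "$"}

def perturb(text: str, rate: float = 0.5, seed: int = 0) -> str:
    """Replace a fraction of substitutable characters with look-alike symbols."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch.lower() in SUBS and rng.random() < rate:
            out.append(SUBS[ch.lower()])
        else:
            out.append(ch)
    return "".join(out)

def augment(samples, n_variants=3):
    """Pair each original sample with several perturbed copies, same label."""
    augmented = []
    for text, label in samples:
        augmented.append((text, label))
        for k in range(n_variants):
            augmented.append((perturb(text, seed=k), label))
    return augmented

data = [("you are an idiot", 1), ("see you at practice", 0)]
print(augment(data)[:4])
```

The augmented set is then fed to an ordinary training loop; full adversarial learning goes further by searching for the perturbations the current model gets wrong.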


Chapter 3
REQUIREMENT ANALYSIS
3.1 FUNCTIONAL REQUIREMENTS

• Data Collection and Preprocessing:

o The system must collect user-generated content (e.g., text, images, videos) from various platforms such as social media or messaging apps.
o It must preprocess the data, including cleaning, tokenization, and feature extraction for text and images.

• Language Support:

o The system must support multiple languages, including regional slang and colloquialisms, to detect bullying in diverse linguistic contexts.

• Cyberbullying Detection:

o The system must classify content as either bullying or non-bullying using the trained machine learning model.
o It should detect specific bullying behaviors such as harassment, hate speech, or threats in real time.

• User Feedback and Moderation:

o The system should allow users or moderators to provide feedback on flagged content (e.g., false positives or false negatives) for system improvement.
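The cleaning and tokenization step listed above might be sketched as follows, assuming simple regex rules for URLs, mentions, and hashtags:

```python
# Minimal sketch of the text-cleaning requirement: strip URLs, mentions,
# hashtags, and punctuation, then lowercase and split into tokens.
import re

def clean(text: str) -> list[str]:
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)        # drop mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)       # keep letters only
    return text.split()

print(clean("Check this out http://x.co/ab @user #fail You LOSER!!!"))
# ['check', 'this', 'out', 'you', 'loser']
```

A production pipeline would typically add stop-word removal, lemmatization or stemming, and handling for emojis rather than simply discarding all non-letter characters.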


3.2 NON-FUNCTIONAL REQUIREMENTS

• Performance and Latency:

o The system must operate in real-time or near real-time, with a latency of no more than a few seconds for content classification and alerting.

• Scalability:

o The system must handle large volumes of data, such as millions of posts or comments daily, without performance degradation.

• Ethical Compliance:

o The system must comply with legal and ethical guidelines, including GDPR and CCPA, ensuring fair and unbiased treatment across different user demographics.

• Security:

o The system must ensure secure data storage and communication, adhering to industry standards for encryption and data protection.


Chapter 4
DESIGN
4.1 DFD’S & UML Diagrams:









FIG 4.1 DFD Diagram

A DFD shows the entities that interact with a system and defines the boundary between the system and its environment. The context-level diagram presents the main process as a single node, so it explains how the project works at a glance: the user feeds data into the system and receives the output from it.


4.2 Use Case Diagrams:

A use case diagram illustrates the functionalities of the system from a user's perspective. Actors represent different roles interacting with the system, while use cases represent the actions or tasks performed by the system. Relationships between actors and use cases demonstrate how they collaborate to achieve desired outcomes. Use case diagrams provide a high-level overview of system functionality and serve as a blueprint for system design and development.





FIG 4.2 Use Case Diagram


4.3 Sequence Diagram:

Sequence diagrams describe interactions among classes in terms of an exchange of messages over time; they are also called event diagrams. A sequence diagram is a good way to visualize and validate various runtime scenarios, helping to predict how a system will behave and to discover responsibilities a class may need when modeling a new system. It represents object collaboration: it defines the sequence of messages exchanged between objects to achieve a certain outcome, whether each message is one-way, broadcast, or sent to self, and at what point in time each object participates.








FIG 4.3: Sequence Diagram

The diagram traces the following request/response pairs between the user, the system, and the admin: Register Here, User Login, Post Topic, Train ML Models Output, View Users with Offensive Count, and Logout, each followed by a corresponding success message.


4.4 Activity Diagram:

Activity diagrams represent the business and operational workflows of a system. An activity diagram is a dynamic diagram that shows the activities and the events that cause an object to be in a particular state.
Concurrent activities: some activities occur simultaneously, or in parallel; for example, listening to the lecturer while looking at the blackboard. This is represented by a fork (a thick dark bar) that splits the flow into the concurrent activities placed side by side, with a matching join bar marking the end of the parallel section.


FIG 4.4 Activity Diagram


4.5 Class Diagram:

An object is any person, place, thing, concept, event, screen, or report applicable to your system. Objects both know things (they have attributes) and do things (they have methods). A class is a representation of an object and, in many ways, is simply a template from which objects are created. Classes form the main building blocks of an object-oriented application.






Fig 4.5 Class Diagram


Chapter 5
CODING
5.1 PSEUDO CODE
from django.shortcuts import render
from django.http import HttpResponse
from django.contrib import messages
from django.core.files.storage import FileSystemStorage
import pymysql
import os
import pickle
import random
from datetime import date
import joblib  # sklearn.externals.joblib was removed; use the standalone joblib package
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


load_index = 0
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()
ps = PorterStemmer()
accuracy = []
precision = []
recall = []
fscore = []

# Compute accuracy, precision, recall and F1 score for a model's predictions
# and record them for later display
def calculateMetrics(predict, y_test):
    accuracy.append(accuracy_score(y_test, predict) * 100)
    precision.append(precision_score(y_test, predict, average='macro') * 100)
    recall.append(recall_score(y_test, predict, average='macro') * 100)
    fscore.append(f1_score(y_test, predict, average='macro') * 100)

# Load the Naive Bayes model from disk if available, otherwise train and save it.
# X_train, X_test, y_train, y_test are assumed to be prepared earlier via
# TF-IDF vectorization and train_test_split.
if os.path.exists('model/nb.txt'):
    with open('model/nb.txt', 'rb') as file:
        nb = pickle.load(file)
else:
    nb = MultinomialNB()
    nb.fit(X_train, y_train)
    with open('model/nb.txt', 'wb') as file:
        pickle.dump(nb, file)
predict = nb.predict(X_test)
calculateMetrics(predict, y_test)
def TrainML(request):
    if request.method == 'GET':
        output = '<table border=1 align=center>'
        output += '<tr><th><font size=3 color=black>Algorithm Name</font></th>'
        output += '<th><font size=3 color=black>Accuracy</font></th>'
        output += '<th><font size=3 color=black>Precision</font></th>'
        output += '<th><font size=3 color=black>Recall</font></th>'
        output += '<th><font size=3 color=black>F1 Score</font></th></tr>'
        algorithms = ['AdaBoost', 'SGD', 'Multinomial Naive Bayes']
        # One table row per trained algorithm
        for i in range(len(accuracy)):
            output += '<tr><td><font size=3 color=black>' + algorithms[i] + '</font></td>'
            output += '<td><font size=3 color=black>' + str(accuracy[i]) + '</font></td>'
            output += '<td><font size=3 color=black>' + str(precision[i]) + '</font></td>'
            output += '<td><font size=3 color=black>' + str(recall[i]) + '</font></td>'
            output += '<td><font size=3 color=black>' + str(fscore[i]) + '</font></td></tr>'
        output += '</table><br/><br/>'
        context = {'data': output}
        # Grouped bar chart comparing the three algorithms on all four metrics
        df = pd.DataFrame([['AdaBoost', 'Precision', precision[0]], ['AdaBoost', 'Recall', recall[0]],
                           ['AdaBoost', 'F1 Score', fscore[0]], ['AdaBoost', 'Accuracy', accuracy[0]],
                           ['SGD', 'Precision', precision[1]], ['SGD', 'Recall', recall[1]],
                           ['SGD', 'F1 Score', fscore[1]], ['SGD', 'Accuracy', accuracy[1]],
                           ['Multinomial Naive Bayes', 'Precision', precision[2]],
                           ['Multinomial Naive Bayes', 'Recall', recall[2]],
                           ['Multinomial Naive Bayes', 'F1 Score', fscore[2]],
                           ['Multinomial Naive Bayes', 'Accuracy', accuracy[2]]],
                          columns=['Algorithms', 'Performance Output', 'Value'])
        df.pivot(index='Algorithms', columns='Performance Output', values='Value').plot(kind='bar')
        plt.show()
        return render(request, 'AdminScreen.html', context)
def getOffensiveCount(username):
    con = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='root',
                          database='bullying', charset='utf8')
    count = 0
    with con:
        cur = con.cursor()
        # Parameterised query avoids SQL injection
        cur.execute("SELECT offensive_count FROM userstatus WHERE username = %s", (username,))
        rows = cur.fetchall()
        for row in rows:
            count = row[0]
        if count == 0:
            # No record yet for this user: create one with an initial count
            count = 1
            cur.execute("INSERT INTO userstatus (username, offensive_count) VALUES (%s, %s)",
                        (username, str(count)))
            con.commit()
    return count
def ChangePassword(request):
    if request.method == 'GET':
        return render(request, 'ChangePassword.html', {})

def PostTopic(request):
    if request.method == 'GET':
        return render(request, 'PostTopic.html', {})
def BlockUser(request):
    if request.method == 'GET':
        bid = request.GET['id']
        con = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='root',
                              database='bullying', charset='utf8')
        cur = con.cursor()
        # Mark the selected user as blocked
        cur.execute("UPDATE register SET status = 'Blocked' WHERE username = %s", (bid,))
        con.commit()
        context = {'data': '<font size="3" color="red"><center>Selected user ' + bid +
                           ' permanently blocked</center></font>'}
        return render(request, 'AdminScreen.html', context)
def ViewOffensive(request):
    if request.method == 'GET':
        output = '<table border=1 align=center>'
        output += '<tr><th><font size=3 color=black>Username</font></th>'
        output += '<th><font size=3 color=black>Password</font></th>'
        output += '<th><font size=3 color=black>Contact</font></th>'
        output += '<th><font size=3 color=black>Email</font></th>'
        output += '<th><font size=3 color=black>Address</font></th>'
        output += '<th><font size=3 color=black>Status</font></th>'
        output += '<th><font size=3 color=black>Profile Photo</font></th>'
        output += '<th><font size=3 color=black>Offensive Count</font></th>'
        output += '<th><font size=3 color=black>Blocked User</font></th></tr>'
        con = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='root',
                              database='bullying', charset='utf8')
        with con:
            cur = con.cursor()
            cur.execute("SELECT * FROM register")
            rows = cur.fetchall()
            for row in rows:
                username = row[0]
                password = str(row[1])
                contact = row[2]
                email = row[3]
                address = row[4]
                status = 'Active' if row[5] == 'none' else 'Blocked'
                count = getOffensiveCount(username)
                output += '<tr><td><font size=3 color=black>' + username + '</font></td>'
                output += '<td><font size=3 color=black>' + password + '</font></td>'
                output += '<td><font size=3 color=black>' + contact + '</font></td>'
                output += '<td><font size=3 color=black>' + email + '</font></td>'
                output += '<td><font size=3 color=black>' + address + '</font></td>'
                output += '<td><font size=3 color=black>' + status + '</font></td>'
                output += '<td><img src=/static/profiles/' + username + '.png width=200 height=200></td>'
                output += '<td><font size=3 color=black>' + str(count) + '</font></td>'
                if count < 2:
                    output += '<td><font size=3 color=black>No Offensive Post Found</font></td></tr>'
                else:
                    # Offer a link to block users with two or more offensive posts
                    output += '<td><a href="BlockUser?id=' + username + '"><font size=3 color=black>Click Here to Block</font></a></td></tr>'
        output += '</table><br/><br/>'
        context = {'data': output}
        return render(request, 'AdminScreen.html', context)
def PostMyTopic(request):
    if request.method == 'POST':
        global load_index, svm_classifier
        description = request.POST.get('description', False)
        myfile = request.FILES['image']
        user = ''
        with open("session.txt", "r") as file:
            for line in file:
                user = line.strip('\n')
        counts = 0
        con = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='root',
                              database='bullying', charset='utf8')
        with con:
            cur = con.cursor()
            cur.execute("SELECT MAX(msg_id) FROM post")
            rows = cur.fetchall()
            for row in rows:
                counts = row[0]
        # Next message id (1 when the post table is empty)
        counts = int(counts) + 1 if counts is not None else 1
        # Lazily load the trained SVM classifier on first use
        if load_index == 0:
            svm_classifier = joblib.load('model/svmClassifier.pkl')
            load_index = 1
        svm_sentiment = svm_classifier.predict([description])
        sentiment = "Negative" if svm_sentiment[0] == 0 else "Positive"
        msg_type = getMessageType(description, user, sentiment)
        if sentiment == 'Negative' and msg_type == 'Non-Offensive':
            msg_type = "Offensive"
            updateStatus(user)
        fs = FileSystemStorage()
        fs.save('CyberbullyingApp/static/post/' + str(counts) + '.png', myfile)
        output = '<table border=0 align=center width=100%><tr><td><img src=/static/profiles/' + user + '.png width=200 height=200></td>'
        output += '<td><font size=3 color=black>welcome : ' + user + '</font></td></tr></table></br></br>'
        output += getPostData()
        context = {'data': output}
        return render(request, 'UserScreen.html', context)
    else:
        context = {'data': 'Error in post topic'}
        return render(request, 'PostTopic.html', context)
def Signup(request):
    if request.method == 'POST':
        username = request.POST.get('username', False)
        password = request.POST.get('password', False)
        contact = request.POST.get('contact', False)
        email = request.POST.get('email', False)
        address = request.POST.get('address', False)
        myfile = request.FILES['image']
        fs = FileSystemStorage()
        fs.save('CyberbullyingApp/static/profiles/' + username + '.png', myfile)
        con = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='root',
                              database='bullying', charset='utf8')
        cur = con.cursor()
        # Parameterised insert of the new account, initially unblocked
        cur.execute("INSERT INTO register (username, password, contact, email, address, status) "
                    "VALUES (%s, %s, %s, %s, %s, 'none')",
                    (username, password, contact, email, address))
        con.commit()
        print(cur.rowcount, "Record Inserted")
        if cur.rowcount == 1:
            context = {'data': 'Signup Process Completed'}
        else:
            context = {'data': 'Error in signup process'}
        return render(request, 'Register.html', context)
def AdminLoginAction(request):
    if request.method == 'POST':
        username = request.POST.get('username', False)
        password = request.POST.get('password', False)
        if username == 'admin' and password == 'admin':
            context = {'data': 'Welcome ' + username}
            return render(request, 'AdminScreen.html', context)
        else:
            context = {'data': 'Invalid login details'}
            return render(request, 'AdminLogin.html', context)
def UserLogin(request):
    if request.method == 'POST':
        username = request.POST.get('username', False)
        password = request.POST.get('password', False)
        status = 'none'
        status_data = ''
        con = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='root',
                              database='bullying', charset='utf8')
        with con:
            cur = con.cursor()
            cur.execute("SELECT * FROM register WHERE username = %s AND password = %s",
                        (username, password))
            row = cur.fetchone()
            if row is not None:
                status = 'success'
                status_data = row[5]
        if status == 'success' and status_data == 'none':
            # Remember the logged-in user for later requests
            with open("session.txt", "w") as file:
                file.write(username)
            context = {'data': 'Welcome ' + username}
            return render(request, 'UserScreen.html', context)
        context = {'data': 'Invalid login details or account blocked'}
        return render(request, 'Login.html', context)


Chapter 6

IMPLEMENTATION AND RESULTS

6.1 IMPLEMENTATION

Methodology

The methodology for detecting cyberbullying using machine learning (ML) can be broken
down into several stages, starting from data collection and preprocessing to model training,
evaluation, and deployment. The first step in developing a cyberbullying detection system is to
collect relevant datasets that contain both bullying and non-bullying content. Data can be sourced
from various platforms, including social media networks like Twitter, Instagram, Facebook, and
discussion forums such as Reddit.
Depending on the platform, data could be composed of text (posts, comments, messages)
and, in the case of platforms like Instagram, images or videos. The dataset should be balanced to
avoid bias and must be labeled for supervised learning—meaning each instance of data must be
tagged as either bullying or non-bullying.
Once the data is collected, preprocessing is a crucial step to ensure that the data is ready for
machine learning models. Text data may contain noise, irrelevant information (like emojis, hashtags,
or URLs), and inconsistent formatting, which must be cleaned.

After cleaning and preprocessing the data, the next step is to extract meaningful features that
will be used to train the machine learning model. For text-based data, this could include lexical
features such as word frequency, sentiment scores, part-of-speech tags, and n-grams. More advanced
techniques, such as using word embeddings (e.g., Word2Vec, GloVe) or contextualized embeddings
(e.g., BERT), can also be used to represent text in a high-dimensional space, capturing semantic
relationships and nuances.
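For example, TF-IDF features with word unigrams and bigrams can be extracted as below; the three-document corpus is a toy assumption:

```python
# Sketch of the TF-IDF feature-extraction step described above:
# each document becomes a sparse vector over unigram and bigram terms.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["nobody likes you", "great game today", "you are so dumb"]
vec = TfidfVectorizer(ngram_range=(1, 2), lowercase=True)
X = vec.fit_transform(corpus)

print(X.shape)                    # (3 documents, number of distinct terms)
print(sorted(vec.vocabulary_)[:5])  # first few terms in the learned vocabulary
```

The resulting matrix `X` is what the classifiers (e.g., SGD or Multinomial Naive Bayes) are trained on; embedding-based features would replace this step with dense vectors from a pretrained model.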

The system may need to be retrained periodically with new data to capture emerging trends
in bullying language or tactics. Additionally, feedback from users and moderators can help refine
the system and provide insights into false positives or negatives, which can be used to further
improve the model.


6.2 SOFTWARE AND HARDWARE REQUIREMENTS
Software Requirements
• HTML
• CSS
• JavaScript
• Machine Learning Algorithms
• Python
• MySQL
• APIs


Hardware Requirements
• Processor - Intel Core i3 or higher
• RAM - 8 GB
• Hard Disk


Chapter 7
SCREENSHOTS
FIG 7.1 Home Page



FIG 7.2 New User Signup Screen


FIG 7.3 User Login Screen


FIG 7.4 Post Topic Screen


FIG 7.5 Admin Login


FIG 7.6 Admin Page represents the user with offensive and non offensive counts


Chapter 8

RESULT AND VALIDATION

8.1 Performance metrics

The performance metrics used in the provided code for evaluating the cyberbullying detection model
include:
Accuracy: It measures the proportion of correctly classified instances among all instances in the
dataset. In the context of cyberbullying detection, accuracy indicates how often the model correctly identifies
whether a message is offensive or non-offensive.


Precision: Precision measures the proportion of true positive predictions among all instances
predicted as positive (offensive) by the model. It indicates the accuracy of positive predictions and is
calculated as the ratio of true positives to the sum of true positives and false positives.

Recall (Sensitivity): Recall measures the proportion of true positive predictions among all actual
positive instances in the dataset. It indicates the model's ability to correctly identify all positive
instances and is calculated as the ratio of true positives to the sum of true positives and false negatives.



F1 Score: The F1 score is the harmonic mean of precision and recall. It provides a single metric
that balances both precision and recall, making it useful for evaluating the overall performance of the
model. F1 score is calculated as 2 * (precision * recall) / (precision + recall).
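A small worked example of these four metrics, using hypothetical predictions on eight messages (1 = offensive, 0 = non-offensive):

```python
# Worked example of the four metrics defined above on toy labels.
# With TP=3, TN=3, FP=1, FN=1 every metric comes out to 0.75.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)    # 6 correct of 8 -> 0.75
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/4 -> 0.75
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 3/4 -> 0.75
f1 = f1_score(y_true, y_pred)           # harmonic mean of 0.75 and 0.75 -> 0.75
print(acc, prec, rec, f1)
```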



Fig. 8.1 : Performance

8.2 VALIDATION
The performance evaluation of three machine learning algorithms—AdaBoost, Stochastic Gradient
Descent (SGD), and Multinomial Naive Bayes (MNB)—for cyberbullying detection yielded insightful results.
AdaBoost exhibited commendable accuracy at 87.51%, with precision and recall scores of 86.99% and
78.55%, respectively, alongside an F1 score of 81.14%. However, its performance, particularly in recall,
suggests room for enhancement. In contrast, SGD showcased superior performance across all metrics, boasting
an impressive accuracy of 97.97%, along with precision, recall, and F1 scores of 97.12%, 97.60%, and 97.35%,
respectively. This highlights SGD's effectiveness in cyberbullying detection. Meanwhile, MNB demonstrated
competitive performance, achieving an accuracy of 91.93% and precision, recall, and F1 scores of 88.30%,
92.39%, and 89.99%, respectively.



Fig 8.2 Validation


Chapter 9
CONCLUSION
In conclusion, the application of machine learning techniques for cyberbullying
detection offers a promising approach to addressing the growing concern of online harassment.
Through the analysis of large datasets, these models can effectively identify harmful content with
high accuracy, enabling quicker and more reliable intervention.
The integration of natural language processing and advanced classification algorithms,
such as support vector machines, neural networks, and deep learning, has shown substantial success
in distinguishing between abusive and non-abusive language. However, challenges remain: the need for
more diverse and representative datasets, handling the diversity of languages and evolving forms of
bullying, the management of false positives and negatives, and the ethical considerations surrounding
automated monitoring, privacy, and user consent, all of which must be carefully managed to ensure that
detection systems are both effective and responsible.
As machine learning models continue to evolve, further research and development are
necessary to enhance their accuracy, robustness, and fairness. Ultimately, the successful
implementation of these technologies could significantly contribute to creating safer online
environments, fostering positive digital interactions, and reducing the prevalence of cyberbullying.
