Fake News Classification Using SPM and Hybrid Classifiers (Review)

Uploaded by premrajmmuruganandam on Aug 30, 2025 (27 slides)


ENHANCING FAKE NEWS CLASSIFICATION USING SPM AND A HYBRID CLASSIFIER
Presented by:
Abirami M (225171008)
Dharshini B (225171031)
Jainab M (225171049)
Project Coordinators:
Dr. M. Martinaa, Assistant Professor, CSE, SRC
Dr. P. Umamaheswari, Assistant Professor, CSE, SRC
Mr. M. JeyaPandiyan, Assistant Professor, CSE, SRC
4/16/2025 SRC,SASTRA

ABSTRACT
Fake news detection is a crucial problem in today's digital world, where misinformation spreads rapidly through social media and news websites. Traditional methods struggle with linguistic complexity and with constantly evolving fake news tactics. The proposed work improves fake news detection with an XGBoost classifier combined with feature extraction techniques such as TF-IDF and word embeddings. In addition, the PrefixSpan algorithm performs sequential pattern mining (SPM), and the mined word-sequence patterns are used as additional features to improve classification accuracy.

Existing System
- Uses the Apriori, TKS, and CM-SPAM algorithms for sequential pattern mining.
- Uses the extracted patterns as features for classification.
- Classifies fake news with seven machine learning models, namely Bernoulli Naive Bayes (BNB), Gaussian Naive Bayes (GNB), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), k-Nearest Neighbors (KNN), and Logistic Regression (LR), plus one deep learning model, the Multilayer Perceptron (MLP).
- Evaluates performance with accuracy, precision, recall, and F1-score.
- These methods can be computationally expensive and often lack interpretability.

Drawbacks of the Existing System
- High false-positive rate due to reliance on surface-level text analysis.
- Overfitting to the training data, leading to poor generalization.
- Computational inefficiency when handling large datasets.
- Limited robustness in distinguishing real news from manipulated information.

Proposed System
- Introduces a novel content-based framework for fake news classification.
- Uses PrefixSpan to identify word-sequence patterns characteristic of fake news.
- Uses TF-IDF and word embeddings for text representation.
- Classifies news with XGBoost, a gradient-boosted ensemble of decision trees.
- Provides meaningful insight into the linguistic structures of fake news.
- Streamlines the pipeline for faster and more accurate detection.

Advantages of the Proposed System
- Better recognition of linguistic patterns through sequential pattern mining.
- Computational efficiency through optimized text representation.
- Scalability to large datasets.
- Higher accuracy with XGBoost than with the traditional models.
- More interpretable results.

Proposed System: Algorithm Details
- PrefixSpan: extracts frequent word sequences from fake news articles, helping identify common misinformation patterns.
- TF-IDF: measures a word's importance in a document relative to the whole corpus, down-weighting words that are common everywhere while keeping distinctive terms for classification.
- Word embeddings: represent words as dense vectors learned from meaning and context, so that similar words used in different contexts are recognized as related.
- XGBoost: a gradient-boosting ensemble that improves classification accuracy by adding decision trees sequentially, each new tree correcting the errors of the trees built before it.
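To make the XGBoost description concrete, here is a minimal sketch of second-order gradient boosting. This is not the xgboost library. It is a toy, standard-library re-implementation under simplifying assumptions: every tree is a single split (a "stump"), the loss is logistic loss, and each round uses the XGBoost-style leaf weight w = -G / (H + lambda) and split gain 1/2 [G_L^2/(H_L + lambda) + G_R^2/(H_R + lambda) - G^2/(H + lambda)] - gamma, where G and H are the sums of the first and second derivatives of the loss. The names fit, predict, eta, lam and gamma are illustrative choices for this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def best_stump(X, g, h, lam, gamma):
    """Return (gain, feature, threshold, w_left, w_right) for the best single split, or None."""
    G, H = sum(g), sum(h)
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct feature values.
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2
            GL = sum(gi for row, gi in zip(X, g) if row[j] < thr)
            HL = sum(hv for row, hv in zip(X, h) if row[j] < thr)
            GR, HR = G - GL, H - HL
            # XGBoost-style structure-score gain, penalized by gamma.
            gain = 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam) - G ** 2 / (H + lam)) - gamma
            if best is None or gain > best[0]:
                # Optimal leaf weights w* = -G / (H + lambda) for each side.
                best = (gain, j, thr, -GL / (HL + lam), -GR / (HR + lam))
    return best


def fit(X, y, n_rounds=20, eta=0.3, lam=1.0, gamma=0.0):
    """Boost depth-1 trees on logistic loss; returns a list of (feature, thr, w_left, w_right)."""
    margins = [0.0] * len(X)  # raw scores before the sigmoid
    trees = []
    for _ in range(n_rounds):
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, y)]  # first derivative of log loss w.r.t. the margin
        h = [pi * (1 - pi) for pi in p]        # second derivative of log loss
        stump = best_stump(X, g, h, lam, gamma)
        if stump is None or stump[0] <= 0:
            break  # no split reduces the regularized objective
        _, j, thr, wl, wr = stump
        tree = (j, thr, eta * wl, eta * wr)  # shrink leaf weights by the learning rate
        trees.append(tree)
        margins = [m + (tree[2] if row[j] < thr else tree[3]) for m, row in zip(margins, X)]
    return trees


def predict_proba(trees, row):
    margin = sum(wl if row[j] < thr else wr for j, thr, wl, wr in trees)
    return sigmoid(margin)


def predict(trees, row, threshold=0.5):
    return 1 if predict_proba(trees, row) >= threshold else 0


X = [[0.1, 1.0], [0.2, 0.0], [0.8, 1.0], [0.9, 0.0]]
y = [0, 0, 1, 1]
trees = fit(X, y)
print([predict(trees, r) for r in X])
```

In a real project the same ideas are exposed by the xgboost package's `XGBClassifier`, with parameters such as `n_estimators`, `max_depth`, `learning_rate`, `reg_lambda` and `gamma`. The toy version above only illustrates why each new tree targets the errors of the current ensemble.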

Comparison of the Existing and Proposed Systems
Existing System | Proposed System
Relies heavily on frequent-pattern matching with limited linguistic understanding. | Uses meaningful text features to capture both patterns and context.
Lacks adaptability to new or evolving fake news patterns. | Adapts better to new types of fake news due to stronger generalization.
Offers limited interpretability, making model decisions hard to explain. | Provides better interpretability through clearer feature importance.
Requires more computation time and resources due to multiple model evaluations. | Offers a streamlined, efficient approach with lower computational requirements.

SYSTEM SPECIFICATIONS
Hardware requirements:
- Hard disk: 256 GB
- Monitor: 15-inch VGA color
- RAM: 8 GB
- Processor: Intel Core i3
Software requirements:
- Operating system: Windows
- Programming language: Python
- IDE: Jupyter Notebook

Overall Architecture Diagram
Fake and Real News Dataset
→ Data Preprocessing (removing special characters, stopword removal, lowercasing, lemmatization)
→ Pattern Extraction (PrefixSpan algorithm)
→ Feature Extraction (TF-IDF, word embeddings)
→ Classification (XGBoost classifier): Fake / Real
→ Evaluation (accuracy, precision, recall, F1-score)

Data Flow Diagram
Fig. Data Flow Diagram (stages shown: news articles, data preprocessing, feature extraction, normalization, news classification, pattern mining, feature combination, evaluation)
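The data flow includes "Normalization" and "Feature Combination" stages, but the slides do not specify how either works. The following is one plausible sketch, built entirely on assumptions. Each document's word embeddings are averaged into a single vector. The TF-IDF block and the embedding block are each scaled to unit L2 length so that neither dominates. These blocks are then concatenated with the 0/1 PrefixSpan pattern flags. The embedding table is a toy dictionary standing in for pretrained vectors, and all function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of known tokens; tokens missing from the table are skipped."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def combine_features(tfidf_vec, embedding_vec, pattern_flags):
    """Normalize each dense block separately, then concatenate with the 0/1 pattern flags."""
    return l2_normalize(tfidf_vec) + l2_normalize(embedding_vec) + [float(f) for f in pattern_flags]


# Toy 2-dimensional embedding table (hypothetical values, not trained vectors).
emb = {"shocking": [1.0, 0.0], "claim": [0.0, 1.0]}
doc_vec = mean_embedding(["shocking", "claim", "unseen"], emb, 2)
features = combine_features([3.0, 4.0], doc_vec, [1, 0])
print(features)
```

Normalizing per block, rather than over the concatenated vector, is a design choice. It keeps a long, sparse TF-IDF vector from swamping the short embedding block, but other schemes, such as standard scaling, would also fit the diagram.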

Use Case Diagram
Fig. Use Case Diagram

Class Diagram
Fig. Class Diagram

Activity Diagram
Fig. Activity Diagram

Sequence Diagram
Fig. Sequence Diagram

Collaboration Diagram
Fig. Collaboration Diagram

Modules
1. Text Preprocessing
2. Pattern Mining
3. Feature Extraction
4. Classification
5. Performance Evaluation

Data Preprocessing
- Text cleaning: removes special characters, punctuation, and unnecessary symbols.
- Stopword removal: eliminates common words (e.g., "the", "is", "and") that carry little discriminative information.
- Lowercasing: converts all text to lowercase for consistency.
- Lemmatization: reduces words to their base (dictionary) form (e.g., "running" → "run") so that related word forms are treated as the same word.
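The four preprocessing steps can be sketched with the standard library alone. This sketch lowercases first, before stopword removal, so that capitalized stopwords such as "The" are also removed. The stopword list is a small illustrative subset. The LEMMAS dictionary is a hypothetical stand-in for a real lemmatizer such as NLTK's WordNetLemmatizer; the slides do not say which lemmatizer the project uses.

```python
import re

# Small illustrative stopword list (a real system would use a full list).
STOPWORDS = {"the", "is", "and", "a", "an", "of", "to", "in"}

# Hypothetical stand-in for a real lemmatizer: maps inflected forms to base forms.
LEMMAS = {"running": "run", "ran": "run", "claims": "claim", "reported": "report"}


def preprocess(text):
    """Lowercase, strip special characters, drop stopwords, and lemmatize."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation/symbols with spaces
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in tokens]


tokens = preprocess("BREAKING!!! The senator is running, and claims were reported.")
print(tokens)
```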

Pattern Mining
- Applies PrefixSpan to the preprocessed text to extract frequent sequential patterns, that is, word sequences that recur across fake news articles.
- Improves classification accuracy by capturing distinctive word sequences used in misinformation.
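Here is a minimal pure-Python sketch of PrefixSpan under the following assumptions. Each preprocessed article is a list of tokens, and each item is a single word (the general algorithm also supports itemsets). The sketch grows patterns one item at a time and recursively "projects" the database onto the suffixes that follow each occurrence of the current prefix. The support of a pattern is the number of articles that contain it as an ordered, not necessarily contiguous, subsequence. The min_support and max_length values shown are illustrative, not the project's settings.

```python
def prefixspan(sequences, min_support, max_length=None):
    """Return {pattern_tuple: support} for all frequent sequential patterns."""
    results = {}

    def mine(prefix, projected):
        # projected: (sequence_index, start_position) pairs, one suffix per supporting sequence.
        counts = {}
        for seq_idx, start in projected:
            seen = set()
            for item in sequences[seq_idx][start:]:
                if item not in seen:  # count each item at most once per sequence
                    seen.add(item)
                    counts[item] = counts.get(item, 0) + 1
        for item, support in counts.items():
            if support < min_support:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = support
            if max_length is not None and len(new_prefix) >= max_length:
                continue
            # Project onto the suffix after the first occurrence of item in each suffix.
            new_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((seq_idx, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


docs = [
    ["shocking", "claim", "exposed"],
    ["shocking", "truth", "claim"],
    ["official", "report", "claim"],
]
patterns = prefixspan(docs, min_support=2)
print(patterns)
```

On large corpora, a `max_length` cap (for example 2 or 3) keeps the number of mined patterns manageable. Running the miner separately on fake and real articles is one way to find patterns that are distinctive for fake news.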

Feature Extraction
- Converts the extracted patterns and the text into a numerical format for machine learning models.
- TF-IDF (Term Frequency-Inverse Document Frequency): measures a word's importance in a document relative to the whole dataset.
- Word embeddings: represent words as dense vectors that capture semantic meaning.
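Two of the conversions above can be sketched with the standard library. The first is the textbook TF-IDF, tfidf(t, d) = tf(t, d) × log(N / df(t)), where tf is the term's count divided by the document length, N is the number of documents, and df(t) is the number of documents containing t. The second is a 0/1 feature per mined pattern, marking whether the article contains that pattern in order, with gaps allowed. Libraries differ in the exact variant: scikit-learn's `TfidfVectorizer`, for instance, uses raw counts, a smoothed idf, and L2 row normalization by default. The function names here are hypothetical.

```python
import math


def tfidf(docs):
    """docs: list of token lists. Returns (vocab, matrix) with matrix[d][v] = tf * idf."""
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    matrix = []
    for doc in docs:
        row = []
        for t in vocab:
            tf = doc.count(t) / len(doc) if doc else 0.0  # length-normalized term frequency
            row.append(tf * idf[t])
        matrix.append(row)
    return vocab, matrix


def contains_subsequence(tokens, pattern):
    """True if pattern's items occur in tokens in the same order (gaps allowed)."""
    it = iter(tokens)  # shared iterator: each search resumes where the previous match ended
    return all(any(tok == p for tok in it) for p in pattern)


def pattern_features(tokens, patterns):
    """One 0/1 feature per mined pattern."""
    return [1 if contains_subsequence(tokens, p) else 0 for p in patterns]


vocab, matrix = tfidf([["shocking", "claim", "claim"], ["official", "report"]])
flags = pattern_features(["shocking", "truth", "claim"], [("shocking", "claim"), ("claim", "shocking")])
print(vocab, matrix, flags)
```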

Classification
- Uses the XGBoost classifier to label each news article as fake or real based on the extracted features.
- Learns from sequential patterns and text features, combining the outputs of pattern mining, TF-IDF, and word embeddings to improve classification accuracy.
- Is evaluated with accuracy, precision, recall, and F1-score to check the reliability of fake news detection.
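The four evaluation metrics follow directly from counts of true positives (TP), false positives (FP) and false negatives (FN):
- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- F1 = 2 · precision · recall / (precision + recall)
- accuracy = fraction of correct predictions.

This sketch assumes a hypothetical label convention in which fake = 1 is the positive class. Division by zero is guarded by returning 0.0.

```python
def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier (positive = fake by assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


scores = evaluate([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
print(scores)
```

Precision directly reflects the false-positive rate that the existing system is criticized for. Reporting it alongside recall shows whether fewer false alarms come at the cost of missed fake articles.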

SCREENSHOTS

Conclusion
The proposed fake news classification system combines sequential pattern mining (PrefixSpan), TF-IDF, word embeddings, and XGBoost classification to improve the accuracy and reliability of fake news identification. Compared with the existing system, it reduces false positives, increases interpretability, and captures both linguistic patterns and contextual meaning. This streamlined approach is designed to adapt better to evolving misinformation trends while remaining computationally efficient.

Future Enhancements
In the future, the system can be extended to support real-time fake news detection by processing live data from news sources and social media. It can also be extended to handle multilingual content, enabling detection across different languages. Incorporating image and video analysis would allow the system to identify fake news that includes manipulated media. Finally, automated dataset updates would help maintain accuracy by adapting the model to new patterns and evolving misinformation techniques.

THANK YOU!