premrajmmuruganandam
Aug 30, 2025
About This Presentation
Classification of spm
ENHANCING THE FAKE NEWS CLASSIFICATION USING SPM AND HYBRID CLASSIFIER
Presented By
Abirami M - 225171008
Dharshini B - 225171031
Jainab M - 225171049
Project Coordinator
Dr. M. Martinaa, Assistant Professor, CSE, SRC
Dr. P. Umamaheswari, Assistant Professor, CSE, SRC
Mr. M. JeyaPandiyan, Assistant Professor, CSE, SRC
4/16/2025 SRC, SASTRA
ABSTRACT
Fake news detection is a crucial problem in today's digital world, where misinformation spreads rapidly through social media and news websites. Traditional methods struggle with linguistic complexity and evolving fake news tactics. The proposed work enhances fake news detection using an XGBoost classifier together with feature extraction techniques such as TF-IDF and word embeddings. The PrefixSpan algorithm is also used for sequential pattern mining (SPM) to improve classification accuracy.
Existing system
Uses the Apriori, TKS, and CM-SPAM algorithms for sequential pattern mining.
Extracted patterns are used as features for classification.
Classifies fake news using seven machine learning models (BNB, GNB, DT, RF, SVM, KNN, LR) and one deep learning model (MLP).
Performance is evaluated using accuracy, precision, recall, and F1-score.
These methods can be computationally expensive and often lack interpretability.
Drawbacks of Existing system
High false-positive rate due to reliance on surface-level text analysis.
Overfitting on training data, leading to poor generalization.
Computational inefficiency in handling large datasets.
Lack of robustness in distinguishing real news from manipulated information.
Proposed system
The proposed work introduces a content-based framework for fake news classification.
Uses PrefixSpan to identify word-sequence patterns that are characteristic of fake news.
Uses TF-IDF and word embeddings for text representation.
Implements XGBoost, an ensemble learning classifier built from boosted decision trees.
Provides insight into the linguistic structures behind each classification.
Streamlines the process for faster and more accurate detection.
Advantages of Proposed system
Better linguistic pattern recognition with Sequential Pattern Mining.
Computational efficiency with optimized text representation.
Scalability for handling large datasets efficiently.
Higher accuracy using XGBoost compared to traditional models.
More interpretable results.
Proposed System Algorithm Details
PrefixSpan: Extracts frequent word sequences from fake news articles, helping identify common misinformation patterns.
TF-IDF: Measures how important a word is to a document, down-weighting words that are common across the dataset while keeping key terms for classification.
Word Embeddings: Represent words as vectors based on meaning and context, so that similar words are recognized even when they appear in different contexts.
XGBoost: An ensemble learning algorithm that improves classification accuracy by combining multiple decision trees.
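To make the idea of "combining multiple decision trees" concrete, here is a minimal sketch of gradient boosting with one-split trees (decision stumps) in plain Python. It is not XGBoost itself and not the presenters' code: XGBoost adds second-order gradient information, regularization, and deeper trees. The function names (fit_stump, fit_boosted_stumps, predict), the hyperparameters, and the toy feature values are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Pick the single feature/threshold split that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            err = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, threshold, lmean, rmean)
    return None if best is None else best[1:]

def fit_boosted_stumps(X, y, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the logistic-loss residuals y - p."""
    scores = [0.0] * len(X)
    model = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break  # no usable split (all feature values identical)
        j, threshold, lmean, rmean = stump
        lval, rval = learning_rate * lmean, learning_rate * rmean
        model.append((j, threshold, lval, rval))
        scores = [s + (lval if row[j] <= threshold else rval)
                  for row, s in zip(X, scores)]
    return model

def predict(model, row):
    """Sum the stump outputs and threshold the probability at 0.5."""
    score = sum(lval if row[j] <= t else rval for j, t, lval, rval in model)
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical two-feature data: imagine column 0 is the weight of a
# sensational term and column 1 the weight of a neutral one. 1 = fake.
X = [[0.9, 0.0], [0.8, 0.1], [0.7, 0.0], [0.1, 0.8], [0.0, 0.9], [0.2, 0.7]]
y = [1, 1, 1, 0, 0, 0]
model = fit_boosted_stumps(X, y)
train_predictions = [predict(model, row) for row in X]
```

Each stump on its own is a weak rule; the ensemble's prediction comes from summing many small corrections, which is the core mechanism that boosted-tree classifiers share.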
Existing System And Proposed System Comparison
Existing System | Proposed System
Relies heavily on frequent pattern matching with limited linguistic understanding. | Uses meaningful text features to understand both pattern and context.
Lacks adaptability to new or evolving fake news patterns. | Adapts better to new types of fake news due to strong generalization.
Offers limited interpretability, making model decisions hard to explain. | Provides better interpretability with clearer feature importance.
Requires more computation time and resources due to multiple model evaluations. | Offers a streamlined and efficient approach, lowering computational requirements.
SYSTEM SPECIFICATIONS
Hardware Requirements:
Hard Disk: 256 GB
Monitor: 15-inch VGA Color
RAM: 8 GB
Processor: Core i3
Software Requirements:
Operating system: Windows
Programming language: Python
IDE: Jupyter Notebook
Overall Architecture Diagram
Fake and Real News Dataset
Data Preprocessing: Removing Special Characters, Stopword Removal, Lowercasing, Lemmatization
Extracting Pattern (PrefixSpan Algorithm)
Feature Extraction: TF-IDF, Word Embedding
Classification (XGBoost Classifier): Fake / Real
Evaluation (Accuracy, Precision, Recall, F1-score)
Data Flow Diagram
Stages shown: News Articles, Data Preprocessing, Feature Extraction, Normalization, News Classification, Pattern Mining, Feature Combination, Evaluation
Fig. Data Flow Diagram
Use Case Diagram
Fig. Use Case Diagram
Class Diagram
Fig. Class Diagram
Data Preprocessing
Text Cleaning: Removes special characters, punctuation, and unnecessary symbols.
Stopword Removal: Eliminates common words (e.g., "the," "is," "and") that carry little information for classification.
Lowercasing: Converts all text to lowercase to maintain consistency.
Lemmatization: Reduces words to their base form (e.g., "running" → "run") so that related word forms are treated as the same token.
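The four preprocessing steps can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the project's code: the stopword set and the lemma lookup table are tiny hypothetical stand-ins for a full stopword list and a real lemmatizer, and the function name preprocess is made up here.

```python
import re

# Tiny illustrative stopword list (assumption); real lists are much longer.
STOPWORDS = {"the", "is", "and", "a", "an", "of", "to", "in", "are"}

# Toy lemma lookup standing in for a real lemmatizer (hypothetical entries).
LEMMAS = {"running": "run", "claims": "claim", "said": "say"}

def preprocess(text):
    """Lowercase, clean, remove stopwords, and lemmatize one article."""
    text = text.lower()                      # lowercasing
    text = re.sub(r"[^a-z\s]", " ", text)    # drop punctuation, digits, symbols
    tokens = text.split()                    # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in tokens]

tokens = preprocess("BREAKING: The senator is RUNNING and claims victory!!!")
print(tokens)  # ['breaking', 'senator', 'run', 'claim', 'victory']
```

Lowercasing is applied first in this sketch so that the stopword and lemma lookups match regardless of the original capitalization.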
Pattern Mining
Extracts sequential patterns from the preprocessed text to identify structures common in fake news.
Keeps the patterns that occur frequently across fake news articles for use in detection.
Improves classification accuracy by capturing distinctive word sequences used in misinformation.
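As an illustration of how frequent word sequences can be mined, the following is a compact PrefixSpan sketch in plain Python for sequences of single words. It is a simplified version written for this example, not the project's implementation: the function name prefixspan, the min_support and max_len parameters, and the sample token lists are assumptions. A pattern's support is the number of articles that contain its words in order, with gaps allowed.

```python
from collections import defaultdict

def prefixspan(sequences, min_support, max_len=3):
    """Return {pattern_tuple: support} for frequent word subsequences."""
    results = {}

    def mine(prefix, projected):
        # projected holds one (sequence_index, start_position) per sequence
        # that still contains the current prefix.
        if len(prefix) >= max_len:
            return
        occurrences = defaultdict(list)
        for seq_idx, start in projected:
            seen = set()
            seq = sequences[seq_idx]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:  # first occurrence after the prefix
                    seen.add(item)
                    occurrences[item].append((seq_idx, pos + 1))
        for item, new_projected in sorted(occurrences.items()):
            support = len(new_projected)  # one entry per supporting sequence
            if support >= min_support:
                pattern = prefix + (item,)
                results[pattern] = support
                mine(pattern, new_projected)  # grow the pattern recursively

    mine((), [(i, 0) for i in range(len(sequences))])
    return results

# Hypothetical preprocessed articles (token lists).
docs = [
    ["shocking", "truth", "revealed"],
    ["shocking", "secret", "truth", "exposed"],
    ["official", "report", "released"],
]
patterns = prefixspan(docs, min_support=2)
print(patterns)
# {('shocking',): 2, ('shocking', 'truth'): 2, ('truth',): 2}
```

The key step is the projected database: after fixing a prefix, only the remainder of each supporting sequence is scanned, which avoids generating candidate patterns that never occur.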
Feature Extraction
Converts extracted patterns into a numerical format for machine learning models.
TF-IDF (Term Frequency-Inverse Document Frequency): Measures word importance in a document relative to the dataset.
Word Embeddings: Represent words as dense vectors to capture semantic meaning.
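A worked example may help show what "importance relative to the dataset" means for TF-IDF. The sketch below uses the plain textbook form, term frequency times log(N / document frequency); libraries often apply smoothed or normalized variants, so values from a real toolkit can differ. The function name tfidf and the sample documents are assumptions for illustration, and word embeddings are not covered here because they require a trained model.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocab, vectors) using tf * log(N / df), with no smoothing."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1        # avoid dividing by zero on empty docs
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors

# Hypothetical preprocessed articles (token lists).
docs = [
    ["shocking", "truth"],
    ["official", "report"],
    ["shocking", "report"],
]
vocab, vectors = tfidf(docs)
# vocab is ['official', 'report', 'shocking', 'truth'].
# In the first document, "truth" (found in one article) gets a higher
# weight than "shocking" (found in two articles).
```

Terms that appear in every document get a weight of zero under this formula, which is how TF-IDF filters out words that do not help separate fake from real articles.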
Classification
Uses XGBoost Classifier: A machine learning algorithm that efficiently classifies news as fake or real based on the extracted features.
Learns from Sequential Patterns & Features: Combines insights from Pattern Mining, TF-IDF, and Word Embeddings to improve classification accuracy.
Evaluated Using Performance Metrics: Assesses model performance with accuracy, precision, recall, and F1-score to ensure reliable fake news detection.
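The four evaluation metrics follow directly from the confusion-matrix counts. The sketch below computes them in plain Python, treating "fake" (label 1) as the positive class; the function name classification_metrics and the label vectors are made-up illustrations, not results from the project.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 1 = fake, 0 = real.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all four metrics come out to 0.75. Precision reflects how many flagged articles were actually fake, which is the quantity most directly affected by false positives.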
SCREEN SHOTS (three slides; images not reproduced)
Conclusion
The proposed fake news classification system effectively combines Sequential Pattern Mining (PrefixSpan), TF-IDF, word embeddings, and XGBoost classification to enhance the accuracy and reliability of fake news identification. Compared to existing systems, it offers improved performance by reducing false positives, increasing interpretability, and capturing both linguistic patterns and contextual meaning. This streamlined and efficient approach ensures better adaptability to evolving misinformation trends while maintaining computational efficiency.
Future Enhancement
In the future, the system can be enhanced to support real-time fake news detection by processing live data from news sources and social media. It can be extended to handle multilingual content, allowing detection across different languages. Incorporating image and video analysis would enable the system to identify fake news that includes manipulated media. Additionally, implementing automated dataset updates will help maintain accuracy by adapting to new patterns and evolving misinformation techniques.