Hotel Review Classification (NLP Classification) PPT

RuhiSalwadgi, 24 slides, Sep 06, 2024


Hotel Rating Classification
Team Members: Ashwini Salwadgi, Anuja Borse, Shubham Pawar, Akshay Kumar, Rudra Shukla, Devesh Gaonkar, Pranjalee Bokde

CONTENTS
01 Project Architecture
02 Problem statement
03 Introduction to NLP classification
04 Dataset Details
05 Data Preprocessing and EDA
06 Feature engineering
07 Model Selection
08 Deployment

Project Architecture

Problem statement
The Hotel dataset consists of 20,491 reviews and feedback for different hotels. Our goal is to examine how travelers describe their positive and negative experiences of staying at a specific hotel on online platforms. The major objective is to identify which attributes travelers consider when selecting a hotel. With this, managers can understand which elements of their hotel contribute most to a positive review or to improving the hotel's brand image.

Introduction to NLP
Natural Language Processing (NLP) is a field of computer science and artificial intelligence focused on the interaction between computers and human languages. It involves programming computers to process, analyze, and derive meaning from large amounts of natural language data. NLP is applied in many areas, including automatic question answering, text summarization, and machine translation. Research in NLP spans disciplines such as cognitive science, linguistics, and psychology. One significant application of NLP is text classification, where the goal is to assign text to predefined labels based on its content.

NLP classification
Text classification: Text classification is a common NLP task used to solve business problems in many fields. Its goal is to predict the class of unseen text documents, usually with supervised machine learning. Like a classifier trained on a tabular dataset, a text classifier learns from labeled examples. The main distinction is that its input is raw text, which must first be converted into numerical features before a model can be trained on it.

Dataset Details
Hotel_Review.csv: the dataset used in this project.
No. of rows: 20,491
No. of columns: 2
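The slides give the file name and its shape but not its column names. The sketch below therefore assumes a header row, with a text column named `Review` and a label column named `Feedback`. The second name is borrowed from the "Feedback" labels plotted in the EDA slides. It uses only Python's standard `csv` module; pandas' `read_csv` would be the more usual choice.

```python
import csv
from typing import Iterable, List, Tuple

# Assumed column names: the slides give only the file name and its shape (20,491 x 2).
TEXT_COL = "Review"
LABEL_COL = "Feedback"


def load_reviews(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse CSV lines (with a header row) into (review_text, label) pairs.

    `lines` can be an open file object or any iterable of CSV lines.
    Rows whose review text is empty are skipped.
    """
    reader = csv.DictReader(lines)
    missing = {TEXT_COL, LABEL_COL} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"missing expected columns: {sorted(missing)}")
    pairs = []
    for row in reader:
        text = (row[TEXT_COL] or "").strip()
        if text:
            pairs.append((text, (row[LABEL_COL] or "").strip()))
    return pairs
```

For the real file, pass an open handle, for example `open("Hotel_Review.csv", newline="", encoding="utf-8")`. The UTF-8 encoding is also an assumption.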

Data Preprocessing

Text Preprocessing
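The slides name this step but do not list its operations. The sketch below shows a common cleaning pipeline: lowercasing, stripping punctuation and digits, tokenizing, and removing stop words. It assumes a small hypothetical stop-word list.

```python
import re
from typing import List

# Illustrative stop-word list; the project's actual list is not given in the slides.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "was", "were", "it", "to", "of", "in", "at", "we", "i"}


def clean_text(text: str) -> str:
    """Lowercase, then replace every run of characters other than a-z or apostrophe with one space."""
    return re.sub(r"[^a-z']+", " ", text.lower()).strip()


def preprocess(text: str) -> List[str]:
    """Clean, split on whitespace, trim edge apostrophes, and drop stop words."""
    tokens = (tok.strip("'") for tok in clean_text(text).split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]
```

The character class keeps only unaccented ASCII letters and apostrophes, so accented words lose characters. Stemming or lemmatization could then be applied to the resulting tokens, for example with NLTK's `PorterStemmer` or `WordNetLemmatizer`.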

Data Visualization (EDA)
Distribution of Feedback Labels
Bar Plot of Feedback Counts: a visual representation of the number of reviews in each class (positive/negative).
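The bar plot summarizes how many reviews fall into each feedback class. The helper below computes the counts and percentages behind such a plot. It is a sketch that assumes the label strings are simply "positive" and "negative".

```python
from collections import Counter
from typing import Iterable, List, Tuple


def label_distribution(labels: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (label, count, percentage) tuples, most frequent label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n, 100.0 * n / total) for label, n in counts.most_common()]
```

The labels and counts can then be passed as the `x` and `height` arguments of Matplotlib's `plt.bar`. A strongly skewed split is worth noting when choosing evaluation metrics, because accuracy alone can look good on imbalanced data.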

Positive Reviews using Word Cloud
Negative Reviews using Word Cloud
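A word cloud sizes each word by how often it occurs. The per-class frequencies it visualizes can be computed as below. This assumes each review has already been tokenized and labeled, and it illustrates the idea rather than reproducing the team's code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def frequencies_by_label(docs: Iterable[Tuple[List[str], str]]) -> Dict[str, Counter]:
    """Count token frequencies separately for each label.

    Each item of `docs` is (tokens, label), with tokens already preprocessed.
    """
    freqs: Dict[str, Counter] = {}
    for tokens, label in docs:
        freqs.setdefault(label, Counter()).update(tokens)
    return freqs
```

The third-party `wordcloud` package can render such a mapping with `WordCloud().generate_from_frequencies(freqs["positive"])`. The slides do not say which tool produced the figures.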

Top Bigrams
Top Trigrams
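Bigrams and trigrams are runs of two and three consecutive tokens. The sketch below extracts them and finds the most frequent ones across reviews, assuming pre-tokenized input.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-token windows; n=2 gives bigrams, n=3 trigrams."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return list(zip(*(tokens[i:] for i in range(n))))


def top_ngrams(token_lists: Iterable[List[str]], n: int, k: int = 10) -> List[Tuple[Tuple[str, ...], int]]:
    """The k most frequent n-grams across all documents."""
    counts: Counter = Counter()
    for tokens in token_lists:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)
```

Removing stop words before counting changes which phrases surface. If negations are kept, phrases such as ("not", "clean") can appear among the top bigrams.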

Feature engineering
Feature engineering in Natural Language Processing (NLP) involves transforming raw text data into meaningful features that machine learning algorithms can use to make predictions or generate insights. Unlike traditional structured data, text is unstructured. Feature engineering in NLP therefore usually combines a series of preprocessing steps with the creation of specialized features that capture the nuances of language. The right features depend heavily on the specific problem and the type of data. The goal is to create features that best capture the underlying patterns in the text, leading to better model performance.
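The simplest way to turn text into numeric features is a bag-of-words count vector. The sketch below builds a vocabulary from training documents and maps a tokenized review onto it. In practice scikit-learn's `CountVectorizer` does the same job; nothing here is taken from the project's code.

```python
from typing import Dict, List


def build_vocabulary(token_lists: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token a column index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def bag_of_words(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Count vector over the vocabulary; tokens unseen when building it are ignored."""
    vec = [0] * len(vocab)
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec
```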

Sentiment Analysis Features
Custom Features
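The slides list sentiment and custom features without detail. As an illustration, hand-crafted features of this kind might include:

- review length
- exclamation marks
- counts of words from a sentiment lexicon

The two tiny lexicons below are hypothetical placeholders.

```python
import re
from typing import Dict

# Hypothetical mini-lexicons for illustration; not taken from the project.
POSITIVE_WORDS = {"great", "clean", "friendly", "excellent", "comfortable"}
NEGATIVE_WORDS = {"dirty", "rude", "noisy", "broken", "terrible"}


def custom_features(text: str) -> Dict[str, float]:
    """Simple length, punctuation, and lexicon-based sentiment features for one review."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    return {
        "char_count": float(len(text)),
        "word_count": float(len(words)),
        "exclamations": float(text.count("!")),
        "pos_words": float(pos),
        "neg_words": float(neg),
        # Score in [-1, 1]; 0.0 when no lexicon word appears.
        "lexicon_score": (pos - neg) / (pos + neg) if pos + neg else 0.0,
    }
```

Ready-made lexicon scorers such as NLTK's VADER (`SentimentIntensityAnalyzer().polarity_scores(text)`) are a common replacement for hand-made word lists.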

Model Selection
Logistic regression: Logistic regression is a fundamental machine learning algorithm that is widely used in Natural Language Processing (NLP), particularly for binary classification problems. Despite its simplicity, it performs well on many NLP tasks when combined with appropriate features and preprocessing. It models the probability of the positive class as the sigmoid of a weighted sum of the input features. It is trained by maximum likelihood estimation: the weights are chosen to maximize the likelihood of the training labels, which is equivalent to minimizing the log loss. Logistic regression remains a powerful tool in NLP, especially when you need a model that is simple, interpretable, and effective on a wide range of binary classification tasks.
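To make the maximum-likelihood training concrete, here is a from-scratch sketch. The model predicts P(y = 1 | x) = sigmoid(w · x + b), and batch gradient descent on the mean log loss fits w and b. It is for illustration only. The slides do not say which implementation the team used; scikit-learn's `LogisticRegression` is the common choice. The learning rate, epoch count, and `l2` penalty are arbitrary values.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """P(y = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    epochs: int = 500,
    l2: float = 0.0,
) -> Tuple[List[float], float]:
    """Fit weights by batch gradient descent on the mean log loss.

    Minimizing the mean log loss is the same as maximizing the log-likelihood,
    i.e. maximum likelihood estimation. `l2` adds an optional (l2 / 2) * ||w||^2 penalty.
    """
    if not X or len(X) != len(y):
        raise ValueError("X and y must be non-empty and the same length")
    m, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / m + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b
```

Consider two features per review: counts of positive words and counts of negative words. On that data, `train_logistic_regression([[2, 0], [3, 0], [0, 2], [0, 3]], [1, 1, 0, 0])` learns a positive weight for the first feature and a negative weight for the second.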

Applications of logistic regression in NLP
Text classification: Logistic regression can be used for tasks such as sentiment analysis, spam detection, or any other task where text must be classified into two categories.
Feature representation:
  Bag of Words (BoW): converts text into a vector of word counts.
  TF-IDF: weights each word by how often it occurs in a document, scaled down by how many documents in the corpus contain it. Words that are frequent in one review but rare across the corpus therefore receive the most weight.
  Word embeddings: convert words into dense vectors that capture semantic meaning (e.g., Word2Vec, GloVe).
  n-grams: capture sequences of words (e.g., bigrams, trigrams) to account for word order and local context.
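The TF-IDF weighting described above can be computed in a few lines. This sketch uses the smoothed inverse document frequency idf(t) = ln((1 + N) / (1 + df(t))) + 1. Here N is the number of documents and df(t) is the number of documents containing term t. This is the default in scikit-learn's `TfidfVectorizer`. That class also L2-normalizes each vector, which this sketch omits.

```python
import math
from collections import Counter
from typing import Dict, List


def fit_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    n_docs = len(docs)
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    return {term: math.log((1 + n_docs) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Raw term count times idf; terms never seen when fitting are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}
```

A word like "room" that appears in every review gets the minimum idf of 1.0. A word that appears in only one review is weighted more heavily.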

Why we selected logistic regression?

Deployment
Deployment is the process by which an ML model is moved from an offline environment and integrated into an existing production environment, such as a live application. It is a critical step: a model cannot serve its intended purpose or solve the problem it was designed for until it is deployed. Here, we use Streamlit to deploy our application.
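The slides do not show the app itself. A minimal Streamlit front end might look like the sketch below. It assumes the trained model was saved with `pickle` as an object exposing a scikit-learn-style `predict` that accepts raw strings, such as a vectorizer-plus-classifier pipeline. The file name `model.pkl` and the functions `classify_review`, `render_app`, and `load_model` are hypothetical. The Streamlit module is passed in as a parameter so the logic can be exercised without a running server.

```python
import pickle
from typing import Any


def classify_review(model: Any, text: str) -> str:
    """Label one review; `model` is assumed to have a predict() that takes a list of strings."""
    return str(model.predict([text])[0])


def render_app(st: Any, model: Any) -> None:
    """Draw the page using the Streamlit module passed in as `st`."""
    st.title("Hotel Review Classification")
    review = st.text_area("Paste a hotel review")
    if st.button("Classify"):
        if review.strip():
            st.write(f"Predicted sentiment: {classify_review(model, review)}")
        else:
            st.warning("Please enter a review first.")


def load_model(path: str) -> Any:
    """Load a pickled model; only unpickle files you created, since pickle can run arbitrary code."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

An `app.py` launched with `streamlit run app.py` would end with `import streamlit as st` followed by `render_app(st, load_model("model.pkl"))`. Wrapping `load_model` with `st.cache_resource` would avoid reloading the model every time Streamlit reruns the script.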

Challenges in project
Data Collection and Quality
Noise in data: Hotel reviews often contain spelling errors, slang, abbreviations, and grammatical mistakes, which make text preprocessing difficult.
Length variation: Reviews vary significantly in length, from a few words to several paragraphs, which may require different handling during preprocessing.
Text Preprocessing Challenges
Handling stop words: Deciding whether to remove stop words (common words such as "and" and "the") can be tricky, because some of them carry sentiment. For example, removing the negation "not" turns "not good" into "good".
Stemming and lemmatization: Reducing words to their base forms helps generalize features but can also lose context. For example, lemmatizing "better" to "good" discards the comparative.
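The stop-word pitfall above is easy to demonstrate. This sketch uses a hypothetical generic stop-word list that, like many off-the-shelf lists, contains negations. It compares naive removal with two negation-aware alternatives: excluding negations from the list, and merging each negation with the next token into a single feature.

```python
from typing import List, Set

# Hypothetical generic stop-word list; like many off-the-shelf lists, it includes negations.
GENERIC_STOP_WORDS: Set[str] = {"the", "a", "an", "and", "was", "is", "it", "very", "not", "no", "nor"}
NEGATIONS: Set[str] = {"not", "no", "nor", "never"}


def remove_stop_words(tokens: List[str], stop_words: Set[str]) -> List[str]:
    """Drop every token that appears in `stop_words`."""
    return [t for t in tokens if t not in stop_words]


def remove_stop_words_keep_negations(tokens: List[str]) -> List[str]:
    """Same filtering, but negation words are never treated as stop words."""
    return remove_stop_words(tokens, GENERIC_STOP_WORDS - NEGATIONS)


def join_negations(tokens: List[str]) -> List[str]:
    """Merge each negation with the following token, e.g. ["not", "good"] -> ["not_good"]."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATIONS and i + 1 < len(tokens):
            out.append(f"{tokens[i]}_{tokens[i + 1]}")
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

NLTK's `mark_negation` (in `nltk.sentiment.util`) is a related, more thorough variant. It tags every word after a negation, up to the next punctuation mark, with a `_NEG` suffix.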

Challenges in project
Model Selection and Training
Choosing the right model: Simple models such as logistic regression may not capture complex relationships in the data. More advanced models such as neural networks may require extensive tuning and more computational resources.
Overfitting: With limited or noisy data, a model may overfit, especially a complex one, and generalize poorly to new reviews.
Deployment Challenges
Real-time processing: If the model is deployed in a real-time system (e.g., for live review monitoring), efficiency and processing speed become critical.
Scalability: The model needs to scale with an increasing volume of reviews, which requires optimizing computational resources and processing time.
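Overfitting is usually detected by comparing accuracy on the training data with accuracy on reviews held out from training. Below is a dependency-free sketch of a reproducible hold-out split and an accuracy metric. The 20% test fraction and the seed are arbitrary choices. scikit-learn's `train_test_split` is the usual tool; it also supports a `stratify` argument for imbalanced labels.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(items: Sequence[T], test_fraction: float = 0.2, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly, then return (train, test) with about test_fraction of items in test."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

A training accuracy far above the held-out accuracy is the typical sign of overfitting. Stronger regularization or simpler features are common remedies.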

References
Pandas documentation: https://pandas.pydata.org/docs/
Matplotlib documentation: https://matplotlib.org/stable/index.html
Streamlit documentation: https://docs.streamlit.io/
Kaggle: https://www.kaggle.com/

THANK YOU!