Beyond Thumbs Up/Down: Using AI to Analyze Movie Reviews

Uploaded by jadavvineet73 · 18 slides · May 11, 2024

About This Presentation

This project explores sentiment analysis, a technique used to understand emotions expressed in text. We delve into the world of movie reviews, applying sentiment analysis techniques to uncover audience sentiment towards various films. This can provide valuable insights for filmmakers, studios, and m...


Slide Content

SENTIMENT ANALYSIS ON MOVIE REVIEWS
Presented by: Mansi Choudhary

INDEX
01 Data Collection
02 Data Preprocessing
03 Feature Extraction
04 Model Selection
05 Model Training
06 Model Evaluation
07 Error Analysis
08 Interpretability
09 Fine-Tuning and Iteration
10 Conclusion

DATA COLLECTION For this project, I selected the IMDB dataset from Kaggle (a link to the dataset was provided on the slide).

DATA PREPROCESSING

- HTML tags removed
- All uppercase words converted to lowercase
- Non-alphanumeric characters removed
- Extra whitespace removed
- Tokenization performed
- Stemming used instead of lemmatization, because lemmatization takes more runtime
- Duplicate values removed
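The steps above can be sketched in plain Python. The deck does not say which libraries were used; a real pipeline would typically use NLTK's PorterStemmer, so the crude suffix-stripper below is only a stand-in for illustration:

```python
import re

def preprocess(review: str) -> list[str]:
    # Remove HTML tags such as <br />
    text = re.sub(r"<[^>]+>", " ", review)
    # Convert to lowercase
    text = text.lower()
    # Remove non-alphanumeric characters
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse extra whitespace and tokenize on spaces
    tokens = text.split()

    # Crude suffix-stripping stand-in for a real stemmer (e.g. NLTK's PorterStemmer)
    def stem(tok: str) -> str:
        for suffix in ("ing", "ed", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                return tok[: -len(suffix)]
        return tok

    return [stem(t) for t in tokens]

tokens = preprocess("<br />This movie was AMAZING, loved the acting!")
```

Deduplication of reviews would happen at the dataset level (e.g. on the raw review strings) before this per-review cleaning is applied.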

FEATURE EXTRACTION Term Frequency-Inverse Document Frequency (TF-IDF) is a statistical measure used in natural language processing (NLP) and information retrieval to evaluate the importance of a word in a document relative to a collection of documents, typically a corpus. It combines two components: Term Frequency (TF) and Inverse Document Frequency (IDF). TF-IDF was used for feature extraction in this project.
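As a worked illustration of the two components, here is a minimal stdlib TF-IDF. The project itself most likely used scikit-learn's TfidfVectorizer; this sketch uses the textbook tf × log(N/df) weighting rather than scikit-learn's smoothed, normalized variant:

```python
import math
from collections import Counter

def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        # TF = count/len(doc); IDF = log(N / df); weight = TF * IDF
        scores.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return scores

docs = [["great", "movie"], ["terrible", "movie"], ["great", "acting"]]
weights = tfidf(docs)
# "movie" appears in 2 of 3 docs, so in doc 2 it is weighted below "terrible"
```

A term that appears in every document gets IDF = log(1) = 0, which is exactly the "common words carry little information" intuition behind the IDF component.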

MODEL SELECTION Models evaluated: LSTM, Naive Bayes, Random Forest, SVM

MODEL TRAINING The data was split into training and test sets.
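A minimal sketch of a reproducible train/test split. The deck does not state the split ratio, so the 80/20 default here is an assumption, and the helper itself is a hand-rolled stand-in for scikit-learn's train_test_split:

```python
import random

def train_test_split(X, y, test_ratio=0.2, seed=42):
    # Shuffle indices reproducibly, then carve off a held-out test set
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_ratio))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])

X = list(range(10))
y = [i % 2 for i in X]
X_train, X_test, y_train, y_test = train_test_split(X, y)
```

Fixing the shuffle seed keeps the split identical across runs, which makes model comparisons on the same held-out data meaningful.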

MODEL EVALUATION All models achieved approximately the same results.
- Accuracy: proportion of correctly classified instances among all instances.
- Precision: proportion of true positive predictions among all positive predictions.
- Recall: proportion of true positive predictions among all actual positive instances.
- F1-score: harmonic mean of precision and recall, providing a balance between them.
- ROC-AUC: area under the Receiver Operating Characteristic (ROC) curve, measuring the model's ability to discriminate between positive and negative instances.
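The first four metrics can be computed directly from confusion-matrix counts. A minimal sketch for binary labels (ROC-AUC is omitted here because it needs ranked scores rather than hard predictions; in practice all of these come from scikit-learn's metrics module):

```python
def classification_metrics(y_true, y_pred):
    # Confusion-matrix counts for binary labels (1 = positive, 0 = negative)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

m = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

On the toy example: tp=2, fp=1, fn=1, tn=1, so accuracy is 3/5 and precision, recall, and F1 all equal 2/3.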

ERROR ANALYSIS All models produced approximately the same results; I tried to resolve this but did not find a solution. When converting X_train and X_test into arrays, the session kept crashing, so I reduced the number of inputs and outputs during training; this may be why the output quality is not good.

INTERPRETABILITY Visualization techniques such as bar plots and count plots were used, along with a WordCloud.

Most frequent positive words

Most frequent negative words
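The "most frequent words" views above boil down to counting tokens per sentiment class and plotting the top entries. A minimal sketch of the counting step (the tiny positive-review list is invented for illustration; the plots themselves would come from matplotlib or the wordcloud package):

```python
from collections import Counter

def top_words(reviews: list[list[str]], n: int = 10):
    # Flatten the tokenized reviews of one sentiment class and count terms
    counts = Counter(tok for review in reviews for tok in review)
    return counts.most_common(n)

# Hypothetical pre-tokenized positive reviews
positive = [["great", "plot"], ["great", "cast"], ["fun"]]
# top_words(positive, 2) ranks "great" first with a count of 2
```

The same function applied to the negative class yields the data behind the negative word cloud; filtering out stop words first keeps the plots informative.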

FINE-TUNING AND ITERATION Selected the Random Forest model and fine-tuned it.
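The deck does not say which hyperparameters were tuned. As an illustration, exhaustive grid search over a hypothetical random-forest grid can be sketched with the stdlib; in practice this loop is what scikit-learn's GridSearchCV automates, with cross-validation added:

```python
from itertools import product

# Hypothetical random-forest hyperparameter grid (names follow scikit-learn)
grid = {"n_estimators": [100, 200], "max_depth": [None, 10, 20]}

def grid_search(score_fn, grid):
    # Score every combination of hyperparameters and keep the best one
    best_score, best_params = float("-inf"), None
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        s = score_fn(params)
        if s > best_score:
            best_score, best_params = s, params
    return best_params, best_score
```

Here score_fn would train a model with the given parameters and return its validation score; any callable with that shape works.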

CONCLUSION The Random Forest model ran well and gave the expected outputs.

Future work:
- Word2Vec could be used instead of TF-IDF
- More models could be tried, such as CNN and Logistic Regression
- Lemmatization could be used instead of stemming