Twitter Sentiment Analysis

ayushkhandelwal7 1,212 views 11 slides Apr 13, 2016

About This Presentation

Information Retrieval and Extraction Major Project 2016
IIIT Hyderabad


Slide Content

TWITTER SENTIMENT ANALYSIS
By: Ayush Khandelwal, Goutam Nair, Pravallika Rao
Course: Information Retrieval and Extraction, IIIT Hyderabad
Under the guidance of Prof. Vasudeva Varma

Problem Statement
Input: Textual content of a tweet
Output: Label signifying the sentiment of the tweet (Positive, Neutral or Negative)

Motivation
Tweets sometimes express opinions about different topics. These opinions are important.
Consumers can use sentiment analysis to research products or services before making a purchase. E.g. Kindle
Marketers can use this to research public opinion of their company and products, or to analyze customer satisfaction. E.g. Election Polls
Organizations can also use this to gather critical feedback about problems in newly released products. E.g. Brand Management (Nike, Adidas)

Challenges
Noisy text
Lack of context - 140 characters only
Acronyms - lol, brb, gr8
Emoticons - :) , :( , :|
Negation

Approach

Approach
Tweet Downloader
Download the tweets using the Twitter API (https://github.com/aritter/twitter_download). 9684 training and 8987 testing tweets are downloaded.
Parser
The parser removes all unavailable tweets from the downloaded data. After removing these, we have 7612 tweets for training and 7868 tweets for testing.
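The parser step can be sketched as a simple filter. This is only an illustration: the slides do not show the file layout, so the tab-separated rows and the "Not Available" placeholder below are assumptions, and drop_unavailable is a hypothetical helper name.

```python
def drop_unavailable(lines, placeholder="Not Available"):
    """Keep only rows whose last tab-separated field is real tweet text.

    Assumption: each row ends with the tweet text, and tweets that could
    not be downloaded carry a placeholder string instead of text.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[-1].strip() != placeholder:
            kept.append(fields)
    return kept


# Made-up rows: id, label, text
rows = [
    "1\tpositive\tLoving the new Kindle!",
    "2\tnegative\tNot Available",
    "3\tneutral\tElection results are out",
]
available = drop_unavailable(rows)
print(len(available), "of", len(rows), "tweets kept")
```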

Approach
Pre-processing
Replace emoticons by their polarity.
Remove URLs and targets (@-mentions).
Expand acronyms, e.g. 'brb' to 'be right back'.
Remove stop words.
Tokenization
Stemming
Case-folding
Remove punctuation marks.
Replace sequences of repeating characters, e.g. 'hellooooo' by 'helloo'.
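A minimal sketch of most of these pre-processing steps, using only the Python standard library. The emoticon, acronym and stop-word tables are tiny hypothetical samples, the order of the steps is an assumption (the slides do not specify one), and stemming is left out because it would need an external stemmer.

```python
import re
import string

# Tiny illustrative lookup tables (assumptions, not the project's lists)
EMOTICONS = {":)": "positive", ":(": "negative", ":|": "neutral"}
ACRONYMS = {"brb": "be right back", "gr8": "great", "lol": "laughing out loud"}
STOP_WORDS = {"a", "an", "the", "is", "to", "i"}


def preprocess(tweet):
    """Return a list of cleaned tokens for one tweet."""
    # Replace emoticons first, before punctuation is stripped
    for emoticon, polarity in EMOTICONS.items():
        tweet = tweet.replace(emoticon, " " + polarity + " ")
    tweet = re.sub(r"https?://\S+|www\.\S+", " ", tweet)  # remove URLs
    tweet = re.sub(r"@\w+", " ", tweet)                   # remove targets
    tweet = tweet.lower()                                 # case-folding
    # Collapse runs of 3+ identical characters to 2: hellooooo -> helloo
    tweet = re.sub(r"(.)\1{2,}", r"\1\1", tweet)
    # Remove punctuation marks
    tweet = tweet.translate(str.maketrans("", "", string.punctuation))
    # Tokenize on whitespace and expand acronyms
    tokens = []
    for token in tweet.split():
        tokens.extend(ACRONYMS.get(token, token).split())
    # Remove stop words
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("@bob hellooooo brb :) http://t.co/x"))
```

Replacing emoticons before stripping punctuation matters here, since otherwise ':)' would be deleted along with the other punctuation marks.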

Approach
Feature Extractor
The pre-processed data file is fed to the feature extractor, which creates the feature vector.
The baseline feature is unigrams. A list of all unique unigrams across the training set is constructed, and it forms the basic vector for each tweet.
Synsets are used for words that are not found in the list of unique unigrams.
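The unigram feature vector described above might look like the following sketch. It assumes binary presence features (the slides do not say whether presence or counts were used) and simply ignores unseen words instead of looking up synsets; build_vocabulary and unigram_vector are hypothetical names.

```python
def build_vocabulary(tokenized_tweets):
    """Map each unique unigram in the training set to a column index."""
    vocab = {}
    for tokens in tokenized_tweets:
        for token in tokens:
            # len(vocab) is the next free index when the token is new
            vocab.setdefault(token, len(vocab))
    return vocab


def unigram_vector(tokens, vocab):
    """Binary presence vector over the vocabulary.

    Words outside the vocabulary are skipped here; the project instead
    falls back to synsets for such words.
    """
    vector = [0] * len(vocab)
    for token in tokens:
        index = vocab.get(token)
        if index is not None:
            vector[index] = 1
    return vector


train = [["good", "movie"], ["bad", "movie"]]
vocab = build_vocabulary(train)
print(unigram_vector(["bad", "plot"], vocab))
```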

Approach
Add Additional Features
Polarity scores of the tweets
Negation
Hashtags
Special characters (?, !, *)
Capitalized words
SVM Classification and Prediction
The features extracted are passed to the classifier.
The model built is used to predict the sentiment of new tweets.
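Some of the additional features can be computed directly from the raw tweet text, as in the sketch below. The exact definitions are assumptions: "capitalized words" is read as all-caps words of at least two letters, negation is a flag based on a small hypothetical word list, and polarity scores are omitted because they need a sentiment lexicon. These values would be appended to the unigram vector before it is passed to the SVM.

```python
# Small illustrative negation list (an assumption, not the project's list)
NEGATION_WORDS = {"not", "no", "never", "cannot"}


def additional_features(tweet):
    """Count-based features computed on the raw (unprocessed) tweet."""
    words = tweet.split()

    has_negation = 0
    for word in words:
        cleaned = word.lower().strip(".,!?")
        if cleaned in NEGATION_WORDS or cleaned.endswith("n't"):
            has_negation = 1
            break

    return {
        "hashtags": sum(1 for w in words if w.startswith("#") and len(w) > 1),
        "question_marks": tweet.count("?"),
        "exclamation_marks": tweet.count("!"),
        "asterisks": tweet.count("*"),
        # All-caps alphabetic words of length >= 2, e.g. "NOT"
        "capitalized_words": sum(
            1 for w in words if len(w) > 1 and w.isalpha() and w.isupper()
        ),
        "has_negation": has_negation,
    }


print(additional_features("I do NOT like this #fail !!"))
```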

Results
Features                       Accuracy   Precision   Recall   F1 score
Unigram                        54.855%    0.5264      0.5061   0.5126
Unigram+Additional features    57.079%    0.5525      0.5308   0.5386
Bigrams                        58.579%    0.5713      0.5173   0.5269
Bigrams+Additional features    60.739%    0.5930      0.5525   0.5637
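For reference, metrics of this kind can be computed as in the sketch below. The slides do not state how precision, recall and F1 were averaged over the three classes; macro-averaging is assumed here. The reported F1 scores are not the harmonic mean of the reported precision and recall, which would be consistent with averaging per-class F1 values. The labels and predictions in the example are made up.

```python
def macro_scores(gold, predicted, labels=("positive", "neutral", "negative")):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    pairs = list(zip(gold, predicted))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        predicted_count = sum(p == label for _, p in pairs)
        gold_count = sum(g == label for g, _ in pairs)
        # Define a metric as 0.0 when its denominator is zero
        precision = tp / predicted_count if predicted_count else 0.0
        recall = tp / gold_count if gold_count else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


gold = ["positive", "neutral", "negative", "positive"]
pred = ["positive", "neutral", "positive", "positive"]
acc, prec, rec, f1 = macro_scores(gold, pred)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```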

Links
GitHub Repository - https://github.com/ayush-khandelwal7/Twitter-Sentiment-Analysis
GitHub Page - http://goutamnair7.github.io/Twitter-Sentiment-Analysis