Natural Language Programming Project Group-2 .pptx

ShivalikSingh3 4 views 26 slides Mar 09, 2025

About This Presentation

NLP Project


Slide Content

Analyzing Google Play Store Reviews for Finance Apps [PayPal, Mint, Paytm]

Group Number - 2
PRAJWAL K BHAN – 26A
RATIK PURI – 29A
SETHUPARVATI – 34A
SHIRISH SEHGAL – 35A
SHIVALIK SINGH – 36A
HARSHIT GROVER – 50A

Steps performed

Data source

We collected online user reviews of the PayPal, Paytm, and Mint apps from the Google Play Store. The extracted data included user IDs, review texts, ratings, and review dates. The reviews were contributed exclusively by users who had signed up, which supports their authenticity and reliability. We targeted the newest reviews to keep the data relevant, retrieving 100,000 reviews per app, so that the sample captures the most current user sentiments and experiences.

To maintain data quality, only reviews containing at least 20 words were included in the study. This reduced the combined set of usable reviews to 59,636. The threshold was set to avoid the sparsity and noise commonly found in shorter reviews, as discussed by Li et al. (2018). The importance of detailed textual content was underscored by the need to gather qualitative data of high contextual value (Noble and Smith, 2015). Similar methodologies have been applied in other studies, such as Kumar et al. (2023), who used reviews with more than 20 words for topic modeling in the context of grocery apps.

Data pre-processing

The review data was extracted using the google-play-scraper tool and converted to a uniform string format. Reviews with fewer than 20 words were filtered out to eliminate noise, and all non-English reviews were excluded. The text was then cleaned: special characters, punctuation, and unnecessary whitespace were removed, all text was converted to lowercase to reduce redundancy and ensure uniformity, and stop words were removed. The preprocessed data was structured and saved as a CSV file for further analysis. The notebook primarily used the pandas library for these tasks, laying the foundation for later analyses such as sentiment analysis and regression.
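The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's notebook: it uses only the Python standard library instead of pandas, the stop-word list is a tiny placeholder, the function names are hypothetical, and the non-English filter is left out.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller list).
STOP_WORDS = {"the", "a", "an", "is", "it", "to", "and", "i", "of", "my", "this", "in", "for"}
MIN_WORDS = 20  # word-count threshold stated in the slides


def clean_review(text):
    """Lowercase, strip non-letter characters, collapse whitespace, drop stop words."""
    text = str(text).lower()
    text = re.sub(r"[^a-z\s]", "", text)  # removes digits, punctuation, special characters
    tokens = text.split()                  # split() also collapses extra whitespace
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def preprocess(reviews):
    """Keep reviews with at least MIN_WORDS words, then clean each one."""
    kept = [r for r in reviews if len(str(r).split()) >= MIN_WORDS]
    return [clean_review(r) for r in kept]
```

Stripping apostrophes turns "can't" into "cant", which is consistent with tokens such as "cant", "dont", and "ive" that appear in the topic keywords.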

Topic discovery using LDA

The optimal number of topics is 10.

Topic identified

Topic No / Topic name / Keywords

Topic 1: UPI Payment Challenges
Keywords: insurance, restaurant, enjoyed, hidout, executive, customer, two, problem, person, paytm, loaded, backup, service, otp, sooner, surprised, began, auto, editor, care

Topic 2: Money Transfer Issues
Keywords: credit, karma, income, mint, card, intuit, helped, company, score, move, get, im, app, PayPal, dont, product, used, even, much, thing, better

Topic 3: Banking Management Problems
Keywords: app, account, time, cant, bank, get, even, use, update, every, dont, log, work, password, try, still, phone, email, number, tried

Topic 4: App Usability and Reliability
Keywords: app, transaction, account, month, budget, like, feature, category, update, cant, work, dont, doesnt, show, bank, version, see, one, card, credit

Topic 5: Digital Transaction Experience
Keywords: service, customer, app, tax, need, synced, dont, issue, potential, would, pay, refreshing, ever, hour, know, im, get, third, rep, thing

Topic 6: Budgeting App Experience
Keywords: way, make, activity, good, get, adjust, would, guy, pay, account, company, tag, app, customer, venmo, family, â, god, friend, debt

Topic 7: Budgeting and Credit Feedback
Keywords: Mint, money, account, payment, use, card, service, get, day, never, dont, pay, bank, customer, even, send, im, app, transfer, back

Topic 8: App Customization
Keywords: upi, customer, id, app, payment, care, please, currency, bank, support, number, option, worst, service, hai, use, cashback, Mint, dont, mode

Topic 9: Transaction Tracking Concerns
Keywords: app, budget, track, account, bill, one, mint, spending, great, see, love, like, would, credit, keep, finance, easy, goal, financial, really

Topic 10: Subscription Management Tools
Keywords: mint, paytm, app, update, year, issue, account, used, ive, using, use, since, many, time, new, great, intuit, work, like, support, still

TOPIC NAME / SAMPLE REVIEW 1 / SAMPLE REVIEW 2

UPI Payment Challenges
Sample review 1: Customer service is terrible. I had to talk to multiple agents, and none of them could solve my problem. I eventually realized they were straight-up lying to me constantly. They claimed things like being able to pull funds early, but PayPal says they can't do that.
Sample review 2: I could never use the app. It kept saying to connect to the internet, even though I was already connected. Deleting and reinstalling didn’t help. The back button just exits the app, with no way to navigate. Frustrating and unusable.

Money Transfer Issues
Sample review 1: Since getting my new Samsung, I can’t log into the app. I’ve uninstalled and reinstalled, but it just loops when I try to log in, even with fingerprint. Not sure what’s wrong, but I need access to my account. Please fix ASAP.
Sample review 2: Can someone help? The app won’t update my number. Every time I try to add a new one, it just disappears and won’t let me save it. Please contact me.

Banking Management Problems
Sample review 1: Customer service is terrible. After talking to multiple agents, I realized they were lying to me about things like pulling funds early, which PayPal says they can't do. Another fraudulent issue. The app is also terrible.
Sample review 2: Couldn't even use the app. It kept asking to connect to the internet, even though I was already connected. Deleting and reinstalling didn’t help. The back button just exits the app, with no way to navigate. Extremely frustrating.

App Usability and Reliability
Sample review 1: PayPal doesn’t care about its employees, forcing them back to the office despite promises of remote work. It’s disappointing to see another company take advantage of its staff. As a customer, I’m disgusted. Time to find a different option - just another example of a greedy CEO working from home while the staff suffers.
Sample review 2: Received a strange message yesterday claiming I need a different approach with my account. They mentioned my wife's name and said she has a lot of paperwork and money involved. I'm trying to figure out how much I owe. The rest of the message was confusing, mentioning 'big smokey,' 'windscream carriage,' and something about a 'cleanup.' It seems like a scam.

Digital Transaction Experience
Sample review 1: Glad I looked at reviews before updating. The new update is awful. Hey developers, ever heard of 'If it ain't broke, don't fix it'? I’ve used PayPal for years without problems, but now I can't use it. Customers deserve better.
Sample review 2: App keeps stopping while logging in. I’ve uninstalled and reinstalled multiple times, emailed the app maker, and spoken to PayPal head office several times - nothing has changed. Even after a year, it still crashes when I try to access the transaction page.

TOPIC NAME / SAMPLE REVIEW 1 / SAMPLE REVIEW 2 (preprocessed review text)

Budgeting App Experience
cant manage subscription app click bill view subscription ive got linked pp menu lead go website manage subscription go website app checked paypal manage subscription round round go paypal ought one stop place bad one star worst bank app ever nothing issue forced stopped paypal forced open german one even tho traveling military family order support genocidial state zionism eather paypal like boycotted

Budgeting and Credit Feedback
awful app downloaded fairly new crypto trying fimd way around tiresome interface suddenly got blocked shocked thanks goodness hadnt deposited planned account said suspcious activity would never reviewed hadnt even spent min didnt even bother found better substitute given paypal never give encountered persistent issue attempting make payment via paypal endeavoring reserve hotel accommodation paypal received notification indicating transaction exceeded allotted time frame could completed

App Customization
music artist thisbapo make real easy get paid release really good app trick problem sigh phone number whatsapp textnow wont work work great please make possible transfer back account without creditdebit card easier send money directly accountvirtual account latest policy make impossible take money easily

Transaction Tracking Concerns
major bug app transaction hidden view working go home click balance finance scroll recent activity scroll click show paypal balance transaction show transaction two direct deposts hidden screen also missing search tracking feature working going home scroll recent activity click see walletactivity default activity show moreeverything let click past transaction contact make new one least make show search obvious also business app suck android cant add new contact create invoice

Subscription Management Tools
best cash app market hand convenient online purchase accepted far place cash app venmo well chime mention crypto shout devs pay pal great linked lot aites iasue credit card bank lonk im always screwed need find app give peoole like identification stolen chance purchase online

Topic occurrence frequency analysis.

Inferences derived

Identification of Key Themes: By analyzing the frequency of topics across the reviews, we could identify the most common concerns or features that users discuss. For example, if a significant number of reviews mention "payment issues" or "customer support," these themes would indicate key areas of user interest or frustration.

Sentiment Analysis Alignment: High-frequency topics can be aligned with sentiment analysis results to understand how users feel about specific aspects of the apps. For instance, frequent mentions of "security" in a positive context could indicate trust, while negative mentions might highlight security concerns.

User Satisfaction Drivers: Understanding the most frequent topics can help identify what drives user satisfaction or dissatisfaction. Topics like "ease of use," "transaction speed," or "rewards" might correlate with higher ratings, suggesting they are key satisfaction drivers.

Trends Over Time: If the data included temporal information, analyzing topic frequency over time could reveal trends, such as increasing concerns about a specific feature after a recent app update.

Focus Areas for Improvement: Topics that frequently appear in negative reviews could highlight areas needing improvement. For example, if "customer service" or "app crashes" are common topics in low-rated reviews, these areas may require attention from the development team.
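The slides do not say how topic occurrence was counted. One common approach, assumed here purely for illustration, is to assign each review to its highest-probability topic and count reviews per topic. The probability vectors below are made up, and the function names are hypothetical.

```python
from collections import Counter


def dominant_topic(probabilities):
    """Return the 1-based index of the topic with the highest probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best + 1


def topic_frequencies(doc_topic_probs):
    """Count how many reviews have each topic as their dominant topic."""
    return Counter(dominant_topic(p) for p in doc_topic_probs)


# Hypothetical per-review topic distributions (3 topics for brevity; the slides use 10).
doc_topic_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
freq = topic_frequencies(doc_topic_probs)
for topic, count in sorted(freq.items()):
    print(f"Topic {topic}: {count} reviews ({count / len(doc_topic_probs):.0%})")
```

Grouping the same counts by review date or by star rating would give the trend and low-rating views mentioned in the inferences.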

Part 1: Regression data analysis with 2000 reviews

Part 1: Inferences derived with 2000 reviews

Significant Predictors: All topics have a significant positive association with the dependent variable ("score"). This suggests that higher probabilities for any of these topics are associated with higher scores.

Low R-squared and Adjusted R-squared: The model has a low R-squared value (0.013), indicating that the independent variables (topic probabilities) explain only a small portion of the variance in the dependent variable. The model does not have strong explanatory power for predicting "score."

Model's Overall Significance: The model is statistically significant (Prob (F-statistic) = 0.00234), but the low R-squared suggests that there may be other variables not included in the model that could better explain the variation in the scores.

Potential Issues: The low Durbin-Watson statistic and the normality tests suggest potential issues with autocorrelation and non-normality in the residuals, which may affect the reliability of the model's estimates. These should be further investigated and addressed if necessary.
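For readers unfamiliar with the Durbin-Watson statistic mentioned above, the sketch below computes it from a list of residuals. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The residuals are invented for illustration and are not the project's.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that stay positive for a while, then negative:
# neighbouring values are similar, so the statistic falls well below 2.
drifting = [0.5, 0.6, 0.4, 0.5, -0.4, -0.5, -0.6, -0.5]
print(durbin_watson(drifting))

# Hypothetical residuals that flip sign every step push the statistic above 2.
alternating = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(alternating))
```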

Part 1: Dominance analysis on Regression model with 2000 review data

Part 1: Correspondence analysis with 2000 reviews

Inferences obtained from Correspondence analysis (Part 1 whole corpus)

Topics 4, 9, and 3 are close to each other in the upper right quadrant, suggesting that these topics might have similar user perceptions or probability distributions.

Topic 7 is isolated to the far right, indicating it may have a unique profile that distinguishes it from the other topics.

Topic 2 is positioned on the far left, indicating a distinct profile or perception compared to others, potentially being less favored or representing a unique dimension in user ratings.

The spread of topics across the two dimensions indicates a diversity in user perceptions or interactions with these topics. No single cluster contains all topics, suggesting a wide range of responses or engagements.

Topics that are closer to the origin (center of the plot) might have average or neutral user ratings, while those farther away (like Topic 2 and Topic 7) might represent extreme perceptions, either highly positive or negative.

Part 2: Regression data analysis with 50,000 reviews

Part 2: Inferences derived with entire corpus

1. R-squared (0.297): The model explains approximately 29.7% of the variance in the dependent variable (`score`). This indicates a moderate level of fit, but there is still a significant portion of the variance not explained by the model.

2. Adjusted R-squared (0.297): The adjusted R-squared is the same as the R-squared, which typically happens when the sample size is large and the number of predictors is small relative to the number of observations. It suggests that the model's explanatory power is consistent when adjusting for the number of predictors.

3. F-statistic (1008) and Prob (F-statistic) (0.00): The F-statistic is quite large, and the p-value associated with it is effectively zero, indicating that the model is statistically significant overall. This means that at least one of the predictors in the model is significantly associated with the dependent variable.

4. Log-Likelihood (-64957): This is a measure of the model's fit, with more negative values indicating a poorer fit. However, it is more useful when comparing across different models rather than interpreting on its own.

5. AIC (1.299e+05) and BIC (1.300e+05): These are information criteria used for model selection, with lower values indicating a better fit. These criteria penalize models for complexity, with BIC penalizing more heavily than AIC.
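As a rough check on how these summary figures relate to one another, the sketch below applies the standard formulas for adjusted R-squared, AIC, and BIC to the reported R-squared and log-likelihood. The slides do not state the exact sample size or parameter count used in the fit, so `n`, `p`, and `k` below are assumptions (n taken from the "50,000 reviews" slide heading, ten topic predictors, plus an intercept).

```python
import math


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def aic(log_likelihood, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion; the penalty grows with log(n)."""
    return k * math.log(n) - 2 * log_likelihood


n = 50_000  # assumption: sample size from the slide heading
p = 10      # assumption: ten topic-probability predictors
k = p + 1   # assumption: predictors plus an intercept

print(round(adjusted_r2(0.297, n, p), 3))  # 0.297
print(f"{aic(-64957, k):.3e}")             # 1.299e+05
print(f"{bic(-64957, k, n):.3e}")          # 1.300e+05
```

Under these assumptions the adjustment to R-squared is too small to show at three decimals, and the BIC penalty (k times log n) exceeds the AIC penalty (2k), which matches points 2 and 5 above.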

Part 2: Regression data analysis with entire corpus

Part 2: Inferences derived with entire corpus

Impact of Review Content on Ratings: By performing regression analysis, you could determine how various aspects of review content (e.g., mentioned topics, sentiment) impact user ratings. For instance, positive mentions of specific features might correlate with higher ratings, while negative comments could be associated with lower ratings.

Key Predictors of User Ratings: The analysis would help identify which topics or keywords in the reviews are significant predictors of overall user satisfaction. For example, frequent mentions of "user interface" might strongly correlate with higher ratings if users appreciate the app's design.

Quantitative Measure of Review Influence: Regression analysis could provide a quantitative measure of how strongly different review elements (such as sentiment or specific themes) affect the ratings. This insight helps in understanding which aspects of the app have the most influence on user satisfaction.

Identifying Areas for Improvement: By examining the regression coefficients, you can identify which negative aspects are most strongly associated with lower ratings. This information is valuable for pinpointing areas where the app needs improvement.

Model Validation: The performance of the regression model (e.g., R-squared value, residuals) would indicate how well the review content explains the variation in user ratings. A higher R-squared value would suggest that the model is effective in capturing the factors influencing ratings.

Part 2: Dominance analysis on Regression model with full corpus data

Inferences obtained from Dominance analysis (Part 2 whole corpus)

The output provided is from a dominance analysis on a regression model, and it gives insights into the relative importance of various topics in predicting the dependent variable in the regression.

Inferences:

Topic 2 Probability is the most dominant predictor in the model, with the highest percentage relative importance (28.13%). This suggests that Topic 2 is a crucial factor in predicting the outcome.

Topic 4 Probability is also highly important, contributing 24.17% to the model's predictive power.

Topic 9 Probability and Topic 3 Probability are moderately important, with contributions of 13.04% and 11.80% respectively.

Topics 5, 7, and 1 Probabilities have moderate to low importance, ranging from around 4.41% to 4.49%.

Topics 8, 10, and 6 Probabilities have the least influence, with Topic 6 being the least important predictor, contributing only 1.66%.

The dominance analysis indicates that Topic 2 (PayPal: Money Transfer Issues) and Topic 4 (App Usability and Reliability) are the most significant predictors in the regression model, followed by Topics 9 and 3. These insights can guide further analysis or model refinement by focusing on the most important topics.
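To make the "percentage relative importance" figures concrete, here is a minimal sketch of general dominance: each predictor's score is its incremental R-squared averaged over all subsets of the other predictors, first within each subset size and then across sizes. The R-squared table is invented for three hypothetical predictors. It is not the project's output, and the slides do not show which tool produced their numbers.

```python
from itertools import combinations


def general_dominance(predictors, r2_of):
    """Average incremental R^2 of each predictor over all subsets of the others.

    r2_of maps a frozenset of predictor names to the R^2 of the model
    fitted on exactly those predictors (the empty set maps to 0).
    """
    scores = {}
    for target in predictors:
        others = [p for p in predictors if p != target]
        per_size = []
        for size in range(len(others) + 1):
            gains = [
                r2_of(frozenset(subset) | {target}) - r2_of(frozenset(subset))
                for subset in combinations(others, size)
            ]
            per_size.append(sum(gains) / len(gains))  # average within one subset size
        scores[target] = sum(per_size) / len(per_size)  # average across sizes
    return scores


def relative_importance(scores):
    """Express each dominance score as a percentage of the total."""
    total = sum(scores.values())
    return {name: 100 * value / total for name, value in scores.items()}


# Hypothetical R^2 values for every subset of three made-up predictors.
r2_table = {
    frozenset(): 0.0,
    frozenset({"topic_a"}): 0.20,
    frozenset({"topic_b"}): 0.10,
    frozenset({"topic_c"}): 0.05,
    frozenset({"topic_a", "topic_b"}): 0.26,
    frozenset({"topic_a", "topic_c"}): 0.23,
    frozenset({"topic_b", "topic_c"}): 0.13,
    frozenset({"topic_a", "topic_b", "topic_c"}): 0.28,
}
scores = general_dominance(["topic_a", "topic_b", "topic_c"], r2_table.__getitem__)
shares = relative_importance(scores)
```

The dominance scores always sum to the full model's R-squared, which is why they can be reported as percentage shares. With ten predictors the same procedure requires fitting 2^10 = 1024 subset models.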

Part 2: Correspondence analysis with 50000 reviews

Inferences obtained from Correspondence analysis (Part 2 whole corpus)

Impact of Review Content on Ratings: By performing regression analysis, you could determine how various aspects of review content (e.g., mentioned topics, sentiment) impact user ratings. For instance, positive mentions of specific features might correlate with higher ratings, while negative comments could be associated with lower ratings.

Key Predictors of User Ratings: The analysis would help identify which topics or keywords in the reviews are significant predictors of overall user satisfaction. For example, frequent mentions of "user interface" might strongly correlate with higher ratings if users appreciate the app's design.

Quantitative Measure of Review Influence: Regression analysis could provide a quantitative measure of how strongly different review elements (such as sentiment or specific themes) affect the ratings. This insight helps in understanding which aspects of the app have the most influence on user satisfaction.

Identifying Areas for Improvement: By examining the regression coefficients, you can identify which negative aspects are most strongly associated with lower ratings. This information is valuable for pinpointing areas where the app needs improvement.

Model Validation: The performance of the regression model (e.g., R-squared value, residuals) would indicate how well the review content explains the variation in user ratings. A higher R-squared value would suggest that the model is effective in capturing the factors influencing ratings.

Google Drive link to all code, data, preprocessed data, and plots
https://drive.google.com/drive/folders/1LK04elIfbS6WBtcoWZcfAyhznCXdaJqY?usp=drive_link

Managerial insights from this text mining project

1. Customer Feedback Analysis: The text mining project focused on extracting and analyzing user reviews for finance apps. The identification of common issues (e.g., payment challenges, usability problems, transaction concerns) can help managers prioritize improvements in these areas. Insights into specific topics like "UPI Payment Challenges" or "Mint App Customization" suggest that customers are vocal about certain functionalities, which may require focused attention.

2. Product Improvement: The extracted topics can guide app development teams to enhance features that users find problematic or lacking. For instance, if "Banking Management Problems" is a common theme, it indicates that this area needs more robust functionality or user support. Managers can use these insights to make data-driven decisions on where to allocate resources for feature development, bug fixing, or user interface improvements.

3. Market Positioning: By understanding the sentiment and common issues discussed in reviews, companies can better position their apps in the market. If competitors are struggling with certain features, addressing these in your app could become a unique selling proposition. The analysis can also reveal gaps in the market where new features or entirely new apps could be developed to meet unmet needs.

4. Customer Support and Communication: The identification of specific issues in customer reviews can help customer support teams address these concerns more effectively. For example, if "Money Transfer Issues" with PayPal are frequently mentioned, the support team can prepare targeted responses and solutions. Additionally, these insights can be used to proactively communicate with users, offering tutorials or guides on features they find challenging.

5. Strategic Planning: The insights can feed into broader strategic planning. If the data reveals that certain app functionalities are consistently problematic, it may indicate a need to rethink the app's strategy or even consider partnerships with other tech solutions. Similarly, if positive aspects are identified, like strong feedback on "Digital Transaction Experience," this can be leveraged in marketing campaigns to attract more users.