Relation Between Images and Text Posted on Social Media
matissrikters
Sep 23, 2024
Relation Between Images and Text Posted on Social Media
Matīss Rikters
04.09.2024
Outline
- Project overview
- Dataset collection, processing, annotation
- Sentiment analysis, named entity recognition, question answering
- Dataset analysis, aspects from cognitive science
- Some recent papers
  - Annotations for Exploring Food Tweets From Multiple Aspects
  - What Food Do We Tweet about on a Rainy Day?
  - Tweeting on an Empty Stomach: Unpacking Food Price Hikes and Inflation via Twitter
Project overview
- Started as my bachelor's thesis in 2011
- Ran for years with few disruptions
- https://twitediens.lv/ (go check it out)
- Has its own Twitter account: https://twitter.com/Twitediens
- Every day it tweeted:
  - the 5 most mentioned foods of the last 24 hours
  - the 5 most active users of the last 24 hours
  - a random recommendation for lunch
- Twitter users occasionally interacted with it
- Has faced difficulties since the recent Twitter leadership change; on pause at the moment...
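The daily "5 most mentioned foods of the last 24 hours" tweet can be illustrated with a small counting sketch. This is not the project's actual code: the `(timestamp, food)` record shape and the function name `top_foods` are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_foods(mentions, now, n=5, window=timedelta(hours=24)):
    """Return the n most-mentioned foods inside the trailing time window.

    mentions: iterable of (timestamp, food) pairs (hypothetical record shape).
    """
    cutoff = now - window
    counts = Counter(food for ts, food in mentions if cutoff < ts <= now)
    return [food for food, _ in counts.most_common(n)]


now = datetime(2024, 9, 4, 12, 0)
mentions = [
    (now - timedelta(hours=1), "coffee"),
    (now - timedelta(hours=2), "coffee"),
    (now - timedelta(hours=3), "tea"),
    (now - timedelta(hours=30), "soup"),  # older than 24 hours, so ignored
]
print(top_foods(mentions, now))
```

The same pattern would cover the "5 most active users" list by counting user names instead of food names.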
Dataset overview
https://github.com/Usprogis/Latvian-Twitter-Eater-Corpus
- Domain-specific: about food and eating, written in Latvian
- 3.1M tweets, and counting…
- ~5,500 + 744 tweets with manually annotated sentiment (positive, neutral, negative) for training and testing
- 744 tweets with manually annotated named entity classes: person names, locations, organizations, food and drinks, and miscellaneous named entities
- ~43,000 automatically aggregated question-answer tweet pairs
- ~155,000 tweets with images
- ~200,000 with location info
- Also on Hugging Face:
  - https://huggingface.co/datasets/matiss/Latvian-Twitter-Eater-Corpus-Images
  - https://huggingface.co/datasets/matiss/Latvian-Twitter-Eater-Corpus-Translation
Yearly Data Distribution
Data processing
Experiments
- Sentiment analysis: about 5,500 tweets annotated for training and 744 as a test dataset
- Named entity recognition: the same 744 tweets annotated with place, person, food, time, and misc. entities
- Question answering: about 19,000 tweets that express questions, along with any replies to them, make up about 43,000 question-answer tweet pairs
- Multimodal experiments: about 155,000 tweets have images; experiments still in progress...
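As a rough illustration of how question-answer pairs might be aggregated from tweets and their replies, here is a minimal sketch. The slides do not describe the actual procedure; the dictionary keys (`id`, `text`, `reply_to`) and the "contains a question mark" heuristic are assumptions for the example only.

```python
def build_qa_pairs(tweets):
    """Pair each question tweet with every direct reply to it.

    tweets: list of dicts with hypothetical keys 'id', 'text', 'reply_to'.
    A tweet counts as a question if its text contains '?' (assumed heuristic).
    """
    by_id = {t["id"]: t for t in tweets}
    pairs = []
    for t in tweets:
        parent = by_id.get(t.get("reply_to"))
        if parent is not None and "?" in parent["text"]:
            pairs.append((parent["text"], t["text"]))
    return pairs


tweets = [
    {"id": 1, "text": "Where can I get good pancakes?", "reply_to": None},
    {"id": 2, "text": "Try the cafe on the corner.", "reply_to": 1},
    {"id": 3, "text": "Make them at home!", "reply_to": 1},
    {"id": 4, "text": "Soup for lunch.", "reply_to": None},
    {"id": 5, "text": "Sounds tasty.", "reply_to": 4},
]
print(build_qa_pairs(tweets))
```

One question with several replies yields several pairs, which is consistent with the number of pairs exceeding the number of question tweets.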
How to determine sentiment?
- It was difficult to agree on the sentiment of some tweets
- Consider these:
  - "Batars tak arī viņus ēda paļube tgd mums no 9 izlabos uz 3 :D"
    "Batars was also eating them and now our grades will be marked from 9 to 3 :D"
  - "Ja vēlies pazaudēt pāris kilogramus, izrauj savus zobus! Tad arī turpmāk būs grūti apēst parāk daudz"
    "If you want to lose weight, just pull out your teeth! Then it is going to be difficult to eat too much"
Sentiment over time
Relation to temperature and other senses
- Cold soup is very popular in Latvia in summer
Day of week & time of day
- Pancakes on weekend afternoons, evenings

| | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
|---|---|---|---|---|---|---|---|
| Morning | 1,107 | 1,128 | 1,122 | 1,049 | 1,221 | 1,617 | 1,887 |
| Afternoon | 2,122 | 2,071 | 2,015 | 2,030 | 2,236 | 2,704 | 3,410 |
| Evening | 2,133 | 2,171 | 2,096 | 2,044 | 1,810 | 1,856 | 2,515 |
| Night | 615 | 603 | 609 | 601 | 588 | 583 | 668 |

- Salads on weekday mornings, afternoons
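A day-of-week by time-of-day breakdown like this one can be produced by bucketing tweet timestamps. The sketch below shows one way to do it; the hour boundaries for Morning, Afternoon, Evening, and Night are assumptions, since the slides do not state which cut-offs were used.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def time_of_day(hour):
    """Map an hour (0-23) to a period; the boundaries are assumed."""
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Afternoon"
    if 18 <= hour < 24:
        return "Evening"
    return "Night"


def bucket_counts(timestamps):
    """Count timestamps per (weekday name, period) cell."""
    counts = Counter()
    for ts in timestamps:
        counts[(DAYS[ts.weekday()], time_of_day(ts.hour))] += 1
    return counts


timestamps = [
    datetime(2024, 9, 1, 14, 0),   # a Sunday afternoon
    datetime(2024, 9, 1, 15, 30),  # the same Sunday afternoon
    datetime(2024, 9, 2, 8, 0),    # a Monday morning
    datetime(2024, 9, 2, 2, 0),    # a Monday night
]
for cell, count in sorted(bucket_counts(timestamps).items()):
    print(cell, count)
```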
Relation to weather
- 'Weather people' is a term used by Bakhshi (2014) to explain our dependence on the weather for food choice and satisfaction
- The weather:
  - significantly alters consumers' mood and, consequently, behaviour
  - affects both the frequency and the content of feedback provided by food consumers
Weather Data Availability
Weather Relation Results

| Product | Rainy | Windy | Warm | Cold |
|---|---|---|---|---|
| Tea | 8.78% | 6.64% | 7.70% | 10.08% |
| Coffee | 6.59% | 5.94% | 6.77% | 6.73% |
| Chocolate | 4.83% | 3.50% | 4.56% | 5.14% |
| Ice cream | 3.05% | 1.75% | 4.04% | 2.39% |
| Meat | 4.20% | 9.44% | 4.38% | 3.95% |
| Potatoes | 3.16% | 2.80% | 3.42% | 3.17% |
| Salad | 2.19% | 3.15% | 2.14% | 1.81% |
| Cake | 2.77% | 4.20% | 2.85% | 2.93% |
| Soup | 2.44% | 2.10% | 2.63% | 2.57% |
| Pancakes | 2.16% | 0.70% | 2.07% | 2.20% |
| Sauce | 2.01% | 0.70% | 2.07% | 1.65% |
| Apples | 1.35% | 1.75% | 1.86% | 1.24% |
| Dumplings | 2.25% | 1.05% | 2.28% | 2.12% |
| Chicken | 1.75% | 2.10% | 1.85% | 1.72% |

| Weather | Negative | Neutral | Positive |
|---|---|---|---|
| Cold | 12.59% | 37.25% | 50.17% |
| Warm | 13.20% | 38.68% | 48.12% |
| Windy | 23.15% | 48.40% | 28.45% |
| Snowy | 11.88% | 36.06% | 52.06% |
| Rainy | 13.63% | 38.64% | 47.73% |
| High pressure | 23.10% | 48.26% | 28.63% |
| Low pressure | 12.63% | 38.72% | 48.65% |

- From the ~167,000 tweets with location data:
  - ~68,000 from Riga
  - ~9,000 from areas around Riga
- For more location-related tweets, we selected all remaining tweets which mention Riga or any of its surrounding areas (Mārupe, Ķekava, etc.) in any valid inflected form, adding ~54,000 tweets
- Total for the analysis: 131,595 tweets
- For sentiment analysis:
  - We use the 5,420 annotated tweets to fine-tune multilingual BERT for this task, along with ∼20,000 sentiment-annotated Latvian tweets from other sources
  - Reaching an accuracy of 74.06% on the 744-tweet test set from LTEC
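Percentages of this kind can be computed by grouping food mentions by weather condition and normalising within each condition. The slides do not state the exact normalisation, so the sketch below assumes each percentage is a product's share of all food mentions observed under that condition; the `(weather, product)` record shape and the function name are also hypothetical.

```python
from collections import Counter, defaultdict


def product_share_by_weather(observations):
    """Return {weather: {product: percent of mentions under that weather}}.

    observations: iterable of (weather, product) pairs (hypothetical shape).
    Assumes shares are normalised per weather condition.
    """
    totals = Counter()
    per_weather = defaultdict(Counter)
    for weather, product in observations:
        totals[weather] += 1
        per_weather[weather][product] += 1
    return {
        weather: {p: 100.0 * c / totals[weather] for p, c in products.items()}
        for weather, products in per_weather.items()
    }


observations = [
    ("Rainy", "tea"), ("Rainy", "tea"), ("Rainy", "coffee"),
    ("Cold", "tea"),
]
shares = product_share_by_weather(observations)
for weather, products in shares.items():
    for product, pct in products.items():
        print(f"{weather:6s} {product:7s} {pct:.2f}%")
```

The same grouping applies to the sentiment breakdown by replacing the product with the predicted sentiment label.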
Rising food costs
- Recent food inflation rates in the Baltic countries have been the highest in the euro area, ranging from 12% to 19% year-on-year
- Global food prices increased by 31% from December 2019 to December 2022, while in Latvia this increase was 39.8%
- Latvia's GDP per capita in 2022 was around three quarters of the EU average, meaning that food price increases have a significant impact on overall household spending patterns
Price-related tweets in LTEC

| Word | Count | Word | Count |
|---|---|---|---|
| to buy | 45,702 | spending | 1,144 |
| cheap | 21,044 | economics | 1,000 |
| to cost | 10,751 | income | 989 |
| price | 10,460 | expenses | 337 |
| money | 8,340 | finance | 255 |
| salary | 6,558 | inflation | 255 |
| expensive | 5,521 | currency | 71 |
| afford | 1,683 | economy | 64 |
| value | 1,637 | markup | 52 |
| costs | 1,425 | deflation | 5 |
To buy
Text-image Relation
Annotating Images
- The tweet text is represented in the image AND the image adds to the meaning of the text
- The tweet text is NOT represented in the image, BUT the image adds to the meaning of the text
- The tweet text is represented in the image, BUT the image does NOT add to the meaning of the text
- NEITHER is the tweet text represented in the image, NOR does the image add to the meaning of the text
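The four categories above are the combinations of two yes/no judgements: whether the text is represented in the image, and whether the image adds to the meaning of the text. A minimal sketch of that mapping follows; the enum and member names are invented for illustration and are not taken from the annotation tooling.

```python
from enum import Enum


class Relation(Enum):
    """Value is (text_represented_in_image, image_adds_to_meaning)."""
    REPRESENTED_AND_ADDS = (True, True)
    NOT_REPRESENTED_BUT_ADDS = (False, True)
    REPRESENTED_NOT_ADDS = (True, False)
    NEITHER = (False, False)


def classify(text_represented, image_adds):
    """Map the two binary judgements to one of the four categories."""
    return Relation((bool(text_represented), bool(image_adds)))


print(classify(True, True).name)
print(classify(False, False).name)
```

Storing the two judgements separately, rather than a single four-way label, keeps it possible to evaluate each question on its own.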
Tweets with Images
Tweets with Images
- The share of food-related tweets posted with images has varied over the years between 5% and 20% of total monthly tweets.
- After annotating 800 image-tweet pairs, we found that in the majority of cases the text describes what is represented in the image, and in ∼1/2 of the cases the image also adds to the meaning of the text.
- In only ∼9% of the cases the image adds to the meaning without its contents being described in the text, and in a mere 6% the image neither adds to the meaning nor is described by the text.
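Rare Examples
- "@elnormous Esat komisks ar saviem roltoniem. Iegūglējiet jebkuru citu Eiropas valsti un roltoni būs Jūsu mīļākais ēdiens."
  "@elnormous You are comical with your Roltons. Google any other European country and Roltons will be your favourite food."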
Experiments with LVLMs
- We experimented with LLaVA models, prompting them with either the original tweet texts in Latvian or automatic translations into English.
- Prediction accuracy on the combined output of both prompts was 20.69% on the original texts and improved to 27.83% on the English translations.
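The slide does not spell out how the "combined output of both prompts" is scored. One plausible reading, assumed here, is that the model answers two yes/no questions per image-tweet pair and a prediction counts as correct only when both answers match the annotation. The sketch below computes accuracy under that assumption; the function name and data shapes are hypothetical.

```python
def combined_accuracy(gold, predicted):
    """Percent of pairs where BOTH yes/no answers match the annotation.

    gold, predicted: equal-length lists of (text_represented, image_adds)
    boolean tuples. The both-must-match rule is an assumption.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one annotated pair is required")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * correct / len(gold)


gold = [(True, True), (True, False), (False, True), (False, False)]
predicted = [(True, True), (True, True), (False, True), (True, False)]
print(f"{combined_accuracy(gold, predicted):.2f}%")
```

Under this scoring, random guessing on two independent binary questions would land near 25%, which gives some context for scores in the 20-28% range.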
Future / Ongoing Work
- See if there is any link between unrelated image-text pairs and misinformation or disinformation
- Find ways to improve the 20.69% and 27.83% accuracy scores
  - Fine-tune LLMs on Latvian and/or social media texts
  - Annotate more text-image pairs for training/tuning data
  - Fine-tune LVLMs on the text-image relation task
All on GitHub
- Website: https://github.com/M4t1ss/TwitEdiens
- Main corpus: https://github.com/Usprogis/Latvian-Twitter-Eater-Corpus
- NER corpus: https://github.com/RinaldsViksna/Latvian-food-NER-corpus
- Sentiment analysis: https://github.com/M4t1ss/sentiment-analysis-toolkit
- Processing scripts: https://github.com/M4t1ss/Latvian-Twitter-Eater-Corpus-Processing