The language of social media

dianamaynard 402 views 61 slides Nov 30, 2018

About This Presentation

Talk given at the CIPR annual conference


Slide Content

The language of social media. Dr. Diana Maynard, University of Sheffield. PROFESSIONAL STANDARDS – cipr.co.uk

Twitter Fun Facts: 500 million tweets sent per day. 24% of all male internet users use Twitter (vs 21% of women). 37% of Twitter users are 18-29. 25% of Twitter users are 30-49.

Which country has the most Twitter users?

Twitter users per country: US: 67 million; Brazil: 27.7 million; Japan: 25.9 million; Mexico: 23.5 million; ....; UK: 13 million

Which country has the highest penetration of Twitter users?

1/3 of all internet users there are on Twitter

Who do we follow on Twitter?

Top 10 most followed Twitter users

2017                2015                2013
Katy Perry          Katy Perry          Katy Perry
Justin Bieber       Justin Bieber       Justin Bieber
Barack Obama        Taylor Swift        Lady Gaga
Taylor Swift        Barack Obama        Barack Obama
Rihanna             YouTube             Taylor Swift
Ellen DeGeneres     Lady Gaga           YouTube
Lady Gaga           Rihanna             Britney Spears
YouTube             Ellen DeGeneres     Rihanna
Justin Timberlake   Twitter             Instagram
Twitter             Justin Timberlake   Justin Timberlake

Social media: a valuable source of information (not just stupid stuff about pop stars): business insights; sharing and receiving news; campaigns; sharing information during disasters; all kinds of collective intelligence; an alternative to traditional polls; and much more.

Why is social media interesting to study? It is a fast-growing, highly dynamic and high-volume source of data: big data! It reflects the language used in today's society. It reflects the current views of society. It is a challenging research area for Text Analysis due to its specialised use of language.

Gartner 3V definition of Big Data: Volume and Velocity (high volume and velocity of messages: 500 million tweets per day); Variety (stock markets, earthquakes, social arrangements); plus Veracity.

Big Data is not new! Staff sorting 4M used tickets from #London Underground to analyse line use in 1939

Linguistic challenges of social media
Language. Problem: typically exhibits a very different language style. Solution: train specific language processing components.
Relevance. Problem: topics and comments can rapidly diverge. Solution: train a classifier.
Lack of context. Problem: hard to disambiguate entities. Solution: data aggregation, metadata, entity linking.

People don’t write “properly”
Grundman:politics makes #climatechange scientific issue,people don’t like knowitall rational voice tellin em wat 2do
Want to solve the problem of #ClimateChange? Just #vote for a #politician! Poof! Problem gone! #sarcasm #TVP #99%
Human Caused #ClimateChange is a Monumental Scam! http://www.youtube.com/watch?v=LiX792kNQeE … F**k yes!! Lying to us like MOFO's Tax The Air We Breath! F**k Them!
The last people I will listen2 about guns r those that know nothing about them&politicians who live in states w/strictest gun laws #cali #ny

Ecuador, 7.8 earthquake, April 2017, ~700 people died. Droughts, affecting 60 million in 34 countries. Maxwell, California, Feb 2017. Portugal, forest fires, 64 confirmed deaths, Jun 2017. Manchester, May 2017, 22 dead. Haiti, Hurricane Matthew, Oct 2016, ~500 people died, farming devastated.

How is social media relevant to disasters?

Uses of social media during disasters: broadcasting info about the disaster; requesting info from local people and eyewitnesses; requesting and offering help and support; disaster mapping; mobilising the crowd to support initiatives.

Some (big) numbers: In the US, 1.1 million tweets were sent in the first day of Hurricane Sandy, and over 20 million in total. Over 800K photos with the #Sandy hashtag were posted on Instagram. 2.3M tweets were sent with the words “Haiti” or “Red Cross” in 2010. More than 23 million tweets were posted about the haze in Singapore. In Nepal, more than half a million posts were shared about the devastating earthquake in 2015.

How can social media help? Harnessing the crowd: using citizen reporters and digital responders for mapping crises. Ushahidi: deployed over 50k times; free and open source; working with us on the COMRADES project.

Tools to help disaster victims get aid quickly: find mentions of locations in the text, match them to a knowledge base, and plot them on a map.
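The three steps named on this slide (find location mentions, match them to a knowledge base, produce map coordinates) can be sketched roughly as follows. This is a minimal illustration only: the `GAZETTEER` dictionary, its approximate coordinates and the `find_locations` helper are assumptions made for the example, not the tools used in the project.

```python
import re

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
# A real system would query a full knowledge base instead.
GAZETTEER = {
    "kathmandu": (27.7172, 85.3240),
    "port-au-prince": (18.5944, -72.3074),
    "manchester": (53.4808, -2.2426),
}

def find_locations(text):
    """Return (name, lat, lon) for each gazetteer entry mentioned in the text."""
    lowered = text.lower()
    hits = []
    for name, (lat, lon) in GAZETTEER.items():
        # Require non-word characters on both sides so that a place name
        # is not matched inside a longer word.
        if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", lowered):
            hits.append((name, lat, lon))
    return hits

# The returned coordinates could then be handed to any mapping tool.
points = find_locations("Need water and tents near Kathmandu, roads blocked")
```

Simple string matching like this cannot tell Manchester in the UK from Manchester in New Hampshire, which is one reason the slide mentions matching against a knowledge base rather than a plain word list.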

How important and urgent is the message? What actions need to be taken?

Understanding climate change: sex will save the planet!

Behaviour Analysis. Based on the assumption that users in different behavioural stages communicate differently (different emotions, directives, etc.).
Desirability: negative sentiment (expressing personal frustration: anger/sadness). Example: Pajarito @lindopajarito, 2h: "Our building needs 40% of all energy consumed in Switzerland!"
Buzz: positive sentiment (happiness/joy), I/we + present tense. Example: DJPajarito @DJPajaritoGenial, 12h: "I'm so proud when I remember to save energy and I know however small it's helping."
Invitation: positive sentiment (happy) + use of vocatives. Example: HotelPajarito @HotelPajarito, 18h: "Join us today today to switch of a light for EH!"

What matters most to people around the world? Exploring opinions on Twitter of people around the world about societal issues – priorities used to re-rank topics for well-being index http://www.oecdbetterlifeindex.org/

Social media and politics: How do people talk about elections and political events? How do the MPs talk about different topics? How does the public respond to them?

Real-time Opinion Monitoring vs replies

Climate change, ISIS and Trump

Parties, topics and location

Twits, twats and twaddle: analysis of hate speech towards politicians

Online abuse: puts people off debating online; puts people off becoming politicians; seems to be getting worse; might be particularly bad for particular groups (females, ethnic minorities, LGBT etc.). "I am seriously considering whether or not to stand next time." "My staff try not to let me go out alone." "Misogynist comments, sexual abuse … My children saw this." "death threats"

Swear words are a sign of abuse, right?

Well, maybe not always

What about if we specifically mention someone with a nasty word? That has to be bad, right? You *$!%*&”!

Well, not always ….

Hashtags can be misleading. These are all perfectly innocent: #powergenitalia #lesbocages #molestationnursery #teacherstalking #therapist

And what about foreign words? #slagroom

But we still need to analyse hashtags
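One reason hashtags are hard to analyse is that the same run of letters can often be split into words in more than one way. The sketch below enumerates every split of a hashtag against a vocabulary. The tiny `VOCAB` set and the `segmentations` function are assumptions made for this illustration; a real system would use a large lexicon and rank the candidate splits.

```python
# Toy vocabulary (an assumption for this sketch, not a real lexicon).
VOCAB = {"power", "gen", "italia", "genitalia", "the", "rapist", "therapist"}

def segmentations(tag, vocab=VOCAB):
    """Return every way to split a hashtag body into vocabulary words."""
    if not tag:
        return [[]]  # one valid split of the empty string: no words
    results = []
    for i in range(1, len(tag) + 1):
        head = tag[:i]
        if head in vocab:
            # Keep this prefix and recursively split the remainder.
            for rest in segmentations(tag[i:], vocab):
                results.append([head] + rest)
    return results

for tag in ("powergenitalia", "therapist"):
    print(tag, segmentations(tag))
```

Each example hashtag yields two readings, an innocent one and an alarming one, so a segmenter needs context or frequency information to pick the intended reading.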

Aims of the Analysis: Who is being abused? Who is abusing them? What is the abuse about? Is it really getting worse?

Plan: Collect tweets to and from politicians in the run-up to the 2015 and 2017 UK elections. Annotate all the interesting information (who, what, when, where) with the social media toolkit. Run an abuse classifier. Analyse the results.

Tweet Collection: Tweets are tracked in real time using the streaming API.

Tokenisation: Individual tokens are extracted.
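Tokenising tweets differs from tokenising ordinary text because URLs, @mentions and hashtags should each survive as a single token. The following is a minimal regex-based sketch of that idea; the pattern and the `tokenise` function are assumptions for illustration, not the tokeniser used in the project.

```python
import re

# Order matters: URLs, mentions and hashtags are tried before plain words.
TOKEN_RE = re.compile(
    r"https?://\S+"        # URLs
    r"|@\w+"               # user mentions
    r"|#\w+"               # hashtags
    r"|\w+(?:['’]\w+)*"    # words, allowing internal apostrophes
    r"|[^\w\s]"            # any other single non-space symbol
)

def tokenise(tweet):
    """Split a tweet into tokens, keeping URLs, mentions and hashtags whole."""
    return TOKEN_RE.findall(tweet)

tokens = tokenise("Just #vote for a #politician! Poof! don't @ask http://t.co/x")
# tokens == ['Just', '#vote', 'for', 'a', '#politician', '!', 'Poof', '!',
#            "don't", '@ask', 'http://t.co/x']
```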

Normalisation: Spelling and abbreviations are normalised.
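A simple way to picture normalisation is a lookup table from non-standard forms to standard ones. The sketch below assumes a hand-made `NORMALISATION` dictionary whose entries are taken from the example tweets earlier in the talk; real normalisers are far larger and use context to decide when a form such as "r" really means "are".

```python
# Hypothetical lookup table built from forms seen in the example tweets.
NORMALISATION = {
    "wat": "what",
    "r": "are",
    "em": "them",
    "tellin": "telling",
    "2do": "to do",
}

def normalise(tokens):
    """Replace known non-standard tokens; leave everything else unchanged."""
    out = []
    for tok in tokens:
        replacement = NORMALISATION.get(tok.lower(), tok)
        # One source token may expand into several words ("2do" -> "to", "do").
        out.extend(replacement.split())
    return out

normalised = normalise("voice tellin em wat 2do".split())
# normalised == ['voice', 'telling', 'them', 'what', 'to', 'do']
```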

POS tagging: Parts of speech are identified.

Named Entities: We discover mentions of entities such as people, locations, organisations and products.

Politician Recognition: Find mentions of MPs and link them to information from YourNextMP + DBpedia.

Topic Detection: Tweets are matched against a detailed topic ontology.
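Matching against an ontology means that a hit on a specific topic also counts as a hit on its broader parent topic. The sketch below illustrates this with a two-level hierarchy. The `PARENT` and `KEYWORDS` tables and the `detect_topics` function are invented for this example and are much simpler than the detailed ontology the slide refers to.

```python
# Hypothetical two-level topic ontology: specific topic -> broader topic.
PARENT = {
    "nhs": "health",
    "climate change": "environment",
    "immigration": "borders",
}

# Hypothetical keyword lists for each specific topic.
KEYWORDS = {
    "nhs": {"nhs", "hospital", "nurses"},
    "climate change": {"climatechange", "emissions"},
    "immigration": {"immigration", "visa"},
}

def detect_topics(tokens):
    """Return matched specific topics together with their parent topics."""
    # Lowercase and drop a leading '#' so hashtags match plain keywords.
    words = {t.lower().lstrip("#") for t in tokens}
    topics = set()
    for topic, keys in KEYWORDS.items():
        if words & keys:
            topics.add(topic)
            topics.add(PARENT[topic])
    return sorted(topics)

found = detect_topics(["More", "nurses", "for", "the", "#NHS"])
# found == ['health', 'nhs']
```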

Geolocation: Tweets are linked to NUTS regions based on place tags and user home locations.
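The slide names two sources of location evidence: the tweet's place tag and the user's home location. A rough sketch of combining them is shown below. The small `PLACE_TO_NUTS1` table and the `nuts_region` function are assumptions for illustration, and so is the choice to prefer the place tag over the profile location; the talk does not say how the two were prioritised.

```python
# Small illustrative mapping from place names to UK NUTS level-1 codes.
PLACE_TO_NUTS1 = {
    "sheffield": "UKE",   # Yorkshire and the Humber
    "manchester": "UKD",  # North West (England)
    "london": "UKI",      # London
}

def nuts_region(place_tag=None, home_location=None):
    """Return a NUTS code, trying the place tag first, then the profile location.

    The ordering is an assumption made for this sketch.
    """
    for candidate in (place_tag, home_location):
        if candidate:
            # Profile locations are free text, e.g. "Sheffield, UK":
            # keep only the part before the first comma.
            key = candidate.split(",")[0].strip().lower()
            if key in PLACE_TO_NUTS1:
                return PLACE_TO_NUTS1[key]
    return None  # no usable location evidence

region = nuts_region(place_tag=None, home_location="Sheffield, UK")
```

Returning `None` for unrecognised locations matters in practice, because many profile locations are jokes or left blank.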

User Classification: We classify users into e.g. journalist, charity, member of the public.

Semantic Search: Tweet text and annotations are indexed in the semantic search engine Mímir for search and visualisation.

Finding abusive terms: 404 abusive terms collected, but only annotated when used in specific situations. Racist and bigoted language: n*gger, witch, homo, God botherer. Uncivil language: shut up, f**k you, idiot. Threats: kill, die, rape. Obscene nouns: c*nt, tw*t.
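The key point on this slide is that a term from the list is annotated only in certain contexts, not wherever it appears. The talk does not spell out the context rules, so the sketch below assumes one plausible rule: a term counts only when the tweet addresses someone, through an @mention or the word "you". The `ABUSE_TERMS` table (two of the milder terms) and the `annotate_abuse` function are illustrative only.

```python
import re

# Two of the milder terms from the slide; the full list had 404 entries.
ABUSE_TERMS = {"idiot": "uncivil", "shut up": "uncivil"}

def annotate_abuse(tweet):
    """Return (term, category) pairs, but only for tweets that address someone."""
    lowered = tweet.lower()
    # Assumed context rule: the tweet contains an @mention or the word "you".
    addressed = re.search(r"@\w+|\byou\b", lowered) is not None
    if not addressed:
        return []
    return [
        (term, category)
        for term, category in ABUSE_TERMS.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]

directed = annotate_abuse("@someMP you idiot")
undirected = annotate_abuse("I felt like an idiot today")
```

Here `directed` contains an annotation and `undirected` is empty, which mirrors the earlier slides: the same word is abuse in one tweet and harmless in another.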

Analysing the data

Did the abuse get worse? There was more abuse in 2017 than in 2015. (Chart: 2017 vs 2015.)

Who got the abuse in 2015? Men got more abuse than women. Conservatives got more abuse than Labour.

Who got the abuse in 2015? A small number of prominent MPs.

What about in 2017? The same thing happened (but to different people).

Check out the interactive version! http://demos.gate.ac.uk/politics/buzzfeed/sunburst.html

Take-away message: Social media contains an awful lot of interesting information. The way people talk on social media is critical, and messages framed in the right way can lead to real behavioural change. If we can understand this properly, it can give us incredibly valuable insights. It’s worth spending the time to do this properly. More about all this on our blog: https://gate4ugc.blogspot.com/

Acknowledgements: Work partially funded by the European Union under the Information and Communication Technologies (ICT) theme of the 7th Framework and H2020 Programmes for R&D. SoBigData (654024) http://www.sobigdata.eu COMRADES (687847) http://www.comrades-project.eu