“What Else Are They Talking About?”: A Large-Scale Longitudinal Analysis of Misinformation Super-Spreader Communities on Facebook
About This Presentation
Paper by Daniel Angus, Stephen Harrington, Axel Bruns, Phoebe Matich, Nadia Jude, Edward Hurcombe, and Ashwin Nagappa, presented at the ICA 2024 conference, Gold Coast, 22 June 2024.
Slide Content
“What Else Are They Talking About?”: A Large-Scale Longitudinal Analysis of Misinformation Super-Spreader Communities on Facebook
Prof Daniel Angus (1), A/Prof Stephen Harrington (1), Prof Axel Bruns (1), Phoebe Matich (1), Nadia Jude (1), Dr Edward Hurcombe (2), Dr Ashwin Nagappa (1)
(1) Queensland University of Technology (QUT)
(2) Royal Melbourne Institute of Technology (RMIT)
Evaluating the Challenge of ‘Fake News’ and Other Malinformation
ARC Discovery Project (2020-2024)
Prof. Axel Bruns, Prof. Daniel Angus, A/Prof. Stephen Harrington, Dr Edward Hurcombe, Ms Jane Tan, Prof. Scott Wright (Bournemouth), Prof. Jennifer Stromer-Galley (Syracuse), Prof. Karin Wahl-Jørgensen (Cardiff)
This project conducts a systematic, large-scale, mixed-methods analysis of empirical evidence on the dissemination of, engagement with, and impact of ‘fake news’ and other malinformation in public debate, in Australia and beyond.
The Stack
Finding Relevant Data (FakeNIX)
Iteratively updated masterlist of domains listed in existing studies of ‘fake news’
  2,314 domains to date (from Shao et al., 2016; Starbird et al., 2017; Allcott et al., 2018; Grinberg et al., 2019; Guess et al., 2018; 2019; etc.)
Data from CrowdTangle: any Facebook posts from public pages / groups / verified profiles that contained links to any of these domains
  1 Jan. 2016 to 31 Mar. 2021: 42.6 million posts from 918,760 pages/groups
Limitations:
  ‘Fake news’ domain lists largely US- / Anglocentric
  CrowdTangle’s coverage is not complete
[Figure: network map of FakeNIX domain posts, 1 Jan. 2016 to 31 Mar. 2021]
Cluster labels: US progressives; US conservatives; France / Germany; Italy; Brazil; India; alternative health; conspiracies; UK; alternative finance
Nodes: public pages, groups, verified profiles / domains in posts
Size: weighted in-degree
Colour: weighted in-degree
Angus, D., Bruns, A., Hurcombe, E., & Harrington, S. (2021). ‘Fake news’ on Facebook: a large-scale longitudinal study of problematic link-sharing practices from 2016 to 2020. In Selected Papers in Internet Research 2021: Research from the Annual Conference of the Association of Internet Researchers. AoIR - Association of Internet Researchers. https://doi.org/10.5210/spir.v2021i0.12089
Thematic Mapping of Pages/Groups
Too many pages/groups in our collection to capture all content (918,760 in total)
…instead…
What characterises the top 500 pages and top 500 groups in our collection?
Dimensions of interest:
  the subscriber count;
  the count of distinct FakeNIX domains it has shared at least one link to;
  the number of links to any FakeNIX domain; and
  the total engagement from users towards this content.
Rank Product
Technique from genetics (Breitling et al., 2004), useful in combining multiple quantities into a single rank.
Uses a simple geometric mean of the ranks of each quantity.
Rank product = (subscriber rank × domains rank × links rank × engagement rank)^(1/4)
Breitling, R., Armengaud, P., Amtmann, A., & Herzyk, P. (2004). Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Letters, 573(1-3), 83–92. https://doi.org/10.1016/j.febslet.2004.07.055
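A minimal sketch of the rank product as described on this slide: rank each page/group on each of the four dimensions, then take the geometric mean of the four ranks. The page names and numbers are hypothetical, and two details are assumptions not stated in the slides: rank 1 goes to the largest value, and ties share the same rank.

```python
from math import prod


def rank_descending(values):
    """1-based ranks, rank 1 for the largest value (assumption); ties share a rank.

    Quadratic in the number of values, which is fine for a small illustration.
    """
    return [1 + sum(other > v for other in values) for v in values]


def rank_product(ranks):
    """Geometric mean of the ranks: (r1 * r2 * ... * rk) ** (1 / k)."""
    return prod(ranks) ** (1.0 / len(ranks))


# Hypothetical pages: (subscribers, distinct FakeNIX domains, links, engagement)
pages = {
    "page_a": (500_000, 40, 1_200, 90_000),
    "page_b": (120_000, 75, 3_000, 150_000),
    "page_c": (900_000, 10, 300, 20_000),
}

names = list(pages)
# Transpose so that each column holds one dimension across all pages.
columns = list(zip(*(pages[name] for name in names)))
ranks_per_metric = [rank_descending(column) for column in columns]

scores = {
    name: rank_product([ranks[i] for ranks in ranks_per_metric])
    for i, name in enumerate(names)
}
# A lower rank product means the page is more prominent overall.
ordering = sorted(scores, key=scores.get)
```

Under these assumptions, a page that tops three of four dimensions can outrank a page with the most subscribers but little FakeNIX activity, which is the reason for combining ranks rather than sorting on a single quantity.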
Analysing the ‘Prominent 1,000’ … Actually 954
Collect all posts from these pages/groups over a six-year period: 2016 – 2021*
Text from posts (not in images) combined into a single text field
Text collated into yearly quarters per page/group: 2016Q1_1, 2016Q2_1, … , 2021Q4_954
Latent Dirichlet Allocation (n = 40) to generate topic model for this entire corpus
  Keyword analysis (top terms, bigrams, trigrams)
  Qualitative exploration of emergent themes
  Thematic prominence over time
  Dynamic networks of thematic similarity between pages/groups
*481 pages, 473 groups = 954 total, ~70 million posts
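The collation step can be sketched as follows, assuming each post is available as a (page/group id, date, text) tuple. The identifier pattern mirrors the 2016Q1_1 … 2021Q4_954 labels on the slide; the function names, the sample posts, and the whitespace tokenisation in the n-gram helper are hypothetical simplifications, and the LDA step itself is not shown.

```python
from collections import Counter, defaultdict
from datetime import date


def quarter_key(post_date, source_id):
    """Build a document id such as '2016Q1_1' from a post date and a page/group id."""
    quarter = (post_date.month - 1) // 3 + 1
    return f"{post_date.year}Q{quarter}_{source_id}"


def collate_by_quarter(posts):
    """Join all post texts of one page/group within one quarter into a single document."""
    grouped = defaultdict(list)
    for source_id, post_date, text in posts:
        if text:  # skip posts without any text
            grouped[quarter_key(post_date, source_id)].append(text)
    return {key: " ".join(texts) for key, texts in grouped.items()}


def top_ngrams(text, n, k):
    """Return the k most frequent n-grams, using naive whitespace tokenisation."""
    tokens = text.lower().split()
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams).most_common(k)


# Hypothetical posts: (page/group id, date, text)
posts = [
    (1, date(2016, 1, 15), "breaking news tonight"),
    (1, date(2016, 3, 30), "breaking news again"),
    (1, date(2016, 4, 1), "weather alert"),
    (954, date(2021, 12, 31), "scam alert"),
]

documents = collate_by_quarter(posts)
top_bigram = top_ngrams(documents["2016Q1_1"], n=2, k=1)
```

Each value in `documents` would then serve as one input document for the topic model, so a page or group contributes at most 24 documents across the six years.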
What (else) do they say?
Overall Topics
  Politics
  World News
  Religion
  Health & Wellness
  Law & Order
  Popular culture and entertainment
  Supernatural and conspiratorial
  And more…
Types of Links
Disparate sharing of link types (fake news links and non-fake news links) by different pages/groups. Early evidence suggests this may relate to the topical focus of the pages.
[Figure panels: Entertainment; Law & Order]
Pop-culture (TV gossip, spoilers); Sports (NFL); Real-estate; UK Royals Gossip; Scam alerts; Weather and emergency alerts
What else do they share?
Similarity Map
Combined similarity between spaces:
  Non-FakeNIX domain sharing practices
  Facebook on-sharing practices
  YouTube video sharing practices
Using practice mapping method (Bruns et al., forthcoming)
Major clusters:
  Size = number of non-FakeNIX posts
  Red = Trump / MAGA etc.
  Blue = Berniecrats et al.
  (plus other languages, other countries, evangelists, conspiracy theorists, …)
Network filtered for edges with sum of cosine similarities ≥ 0.5
[Figure annotations: Fox News and Breitbart dominate; Far more mainstream media sharing; Some mainstream media sharing; Biased sources, but less disinformation]
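The edge filter mentioned for the similarity map can be illustrated with a small sketch. It is not the practice mapping method itself (Bruns et al., forthcoming). It only assumes that each page/group has one count vector per sharing space (non-FakeNIX domains, Facebook on-shares, YouTube videos), that cosine similarity is computed per space, and that the per-space values are summed before the ≥ 0.5 threshold is applied. All names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as sharing nothing in common
    return dot / (norm_a * norm_b)


def similarity_edges(profiles, threshold=0.5):
    """Keep node pairs whose summed per-space cosine similarity meets the threshold."""
    edges = {}
    for u, v in combinations(sorted(profiles), 2):
        total = sum(cosine(pu, pv) for pu, pv in zip(profiles[u], profiles[v]))
        if total >= threshold:
            edges[(u, v)] = total
    return edges


# Hypothetical profiles: [domain counts, on-shared page counts, YouTube video counts]
profiles = {
    "group_a": [{"site-a.example": 3, "site-b.example": 1}, {"page_x": 2}, {"video_1": 1}],
    "group_b": [{"site-a.example": 1}, {"page_x": 1}, {}],
    "group_c": [{"site-c.example": 4}, {"page_y": 5}, {"video_2": 2}],
}

edges = similarity_edges(profiles)
```

With three spaces the summed similarity ranges from 0 to 3, so a threshold of 0.5 keeps pairs that overlap strongly in one space or moderately in several.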
Takeaways
Observations
Dominance of US political topics:
  Mostly a result of US focus in FakeNIX masterlist
  Lack of similar lists of problematic sources (partly because such judgments are difficult)
Divergent patterns between MAGA and Berniecrats:
  Berniecrats: mostly mainstream news sharing, with occasional problematic content
  Trump / MAGA: mostly problematic content with occasional mainstream news
  (occasional content could be hate-sharing in both cases)
Further Outlook
Computational challenges of large datasets:
  Longitudinal work is still ongoing, and difficult to present in conventional paper/article formats
  Large-scale topic modelling is computationally intensive and dominated by major topics (‘Trump’)
  Facebook data are ill-suited to conventional network mapping; the practice mapping approach helps
Future repeatability:
  Can we still do this with the Meta Content Library?
    ‘Virtual digital enclave’ very difficult to use with standard analytics tools (Tableau, Gephi, …)
  Can we use the Facebook URL Shares Dataset to extend to non-public sharing on Facebook?
    Difficult with limited demographic and temporal resolution of sharing and engagement data
  Can we keep the masterlist of problematic domains fresh?
    Some new data sources, still very US-centric – e.g. Iffy Index of Unreliable Sources
Acknowledgments
This research is supported by the ARC Discovery project Evaluating the Challenge of ‘Fake News’ and Other Malinformation and the ARC Laureate Fellowship project Dynamics of Partisanship and Polarisation in Online Public Debate. Facebook data are provided courtesy of CrowdTangle.