AI as Research Assistant: Upscaling Content Analysis to Identify Patterns of Polarisation in the News

Snurb · 39 slides · 1 May 2024

About This Presentation

Invited talk presented at ZeMKI, Bremen, 30 Apr. 2024.


Slide Content

AI as Research Assistant: Upscaling Content Analysis to Identify Patterns of Polarisation in the News
Axel Bruns, Australian Laureate Fellow, Digital Media Research Centre, Queensland University of Technology, Brisbane, Australia
With important contributions from: Laura Vodden, Katharina Esau, Sebastian Svegaard, Tariq Choucair, Samantha Vilkins, Kate O’Connor Farfan, Laura Lefevre, Vishnu PS, Carly Lubicz-Zaorski
[email protected] | @snurb_dot_info | @[email protected] | @snurb.bsky.social

Polarisation (Image: Midjourney)

Our Project
- Australian Laureate Fellowship (2022-27): Determining the Drivers and Dynamics of Partisanship and Polarisation in Online Public Debate
- Digital Media Research Centre, Queensland University of Technology, Brisbane, Australia
- 4 postdocs, 4 + 4* PhD students, 1 data scientist
- Cross-national comparisons (intended: AU, US, UK, DE, DK, CH; probably + BR, PE, CA)
- Longitudinal analysis over the course of the project
* Starting in 2024 – interested? Get in touch! ([email protected])

Forms of Polarisation (Image: Midjourney)

Chart sources:
https://www.pewresearch.org/politics/2022/08/09/as-partisan-hostility-grows-signs-of-frustration-with-the-two-party-system/pp_2022-08-09_partisan-hostility_01-08/
https://www.pewresearch.org/politics/2023/09/19/the-republican-and-democratic-parties/pp_2023-09-19_views-of-politics_04-02/

Forms of Polarisation
Polarisation at what levels?
- Micro: between individuals
- Meso: between groups
- Macro: across society
- Mass: involving everyone
- Elite: amongst formal political actors (however defined)
See Esau et al. (2023): https://eprints.qut.edu.au/238775/ (and a chapter forthcoming in the Routledge Handbook of Political Campaigning)

Forms of Polarisation
Polarisation on what attributes?
- Issue-based: disagreements over specific policy settings
- Ideological: fundamental differences based on political belief systems
- Affective: political beliefs turned into deeply felt in-group/out-group identity
- Perceived: view of society, based on personal views and media reporting
- Interpretive: reading of issues, events, and media coverage based on personal views
- Interactional: manifested in choices to interact with or ignore other individuals/groups
(and more…)

A Problem? (When?) (Image: Midjourney)

Agonism? Polarisation? Dysfunction?
How bad is it, exactly?
- All politics is polarised (just not to the point of dysfunction)
- Much (most?) politics is multipolar, not just left/right
- When does mild antagonism turn into destructive polarisation?
We suggest five symptoms (Esau et al., 2023):
- breakdown of communication;
- discrediting and dismissing of information;
- erasure of complexities;
- exacerbated attention and space for extreme voices;
- exclusion through emotions.
(Image: Midjourney)

Weren’t you going to talk about AI? And news?

News Outlet Polarisation?

News Audience Polarisation
Park, Sora, Caroline Fisher, Kieran McGuinness, Jee Young Lee, and Kerry McCallum. 2021. Digital News Report: Australia 2021. Canberra: News and Media Research Centre. https://doi.org/10.25916/KYGY-S066.

Can we assess news content polarisation? (And by extension, news audience polarisation?)

This study examines non-editorial news coverage in leading US newspapers as a source of ideological differences on climate change. A quantitative content analysis compared how the threat of climate change and efficacy for actions to address it were represented in climate change coverage across The New York Times, The Wall Street Journal, The Washington Post, and USA Today between 2006 and 2011. Results show that The Wall Street Journal was least likely to discuss the impacts of and threat posed by climate change and most likely to include negative efficacy information and use conflict and negative economic framing when discussing actions to address climate change. The inclusion of positive efficacy information was similar across newspapers. Also, across all newspapers, climate impacts and actions to address climate change were more likely to be discussed separately than together in the same article. Implications for public engagement and ideological polarization are discussed. (http://journals.sagepub.com/doi/10.1177/0963662515595348)


What Questions Can We Ask?
Polarisation in news coverage:
- Who gets to speak in the coverage (viewpoint diversity)?
- How are they presented (detail and frequency; attribution; direct/indirect speech)?
- What language and key terms are used in the journalistic text (framing)?
- How does this match the language of which speakers/actors/stakeholders?
- How consistent is this across articles from the same news outlet, over time?
- How uniform or diverse is this across different news outlets?
- Are such coverage differences issue-specific, or persistent across issues?
- Do such differences map onto perceived outlet polarisation or audience polarisation?

😫😫😫 But first…

News Data Are Hard to Find
The trouble with news databases:
- Factiva, ProQuest, LexisNexis, … are geared for library use and qualitative research
- Some surprising gaps in news outlet coverage (and very text-centric)
- Licensing arrangements prohibit large-scale content exports (>100 articles)
- Exacerbated by growing publisher concerns about use of news data in AI training
- Example: Factiva – ~US$30,000 per annum for API access, double for longer-term storage

Non-paywalled content vs. paywalled content



⚠️ Work in Progress… Where we’ve got to…

Viewpoint Diversity and Stance Detection
Working prototypes:
- Who is speaking? (e.g. Jane Smith; a spokesperson)
- How are they introduced? (e.g. Defence Minister; eyewitness)
- Who are they said to represent? (e.g. the government; themselves)
- Are they afforded direct or indirect speech? (e.g. direct quote; paraphrase)
- What are they saying? (e.g. “It wasn’t me. I didn’t do it.”)
Further steps:
- Potentially more successful after Named Entity Recognition (to prime the LLM process)
- More to be done on stance detection (what stance does their statement represent?)
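As a rough illustration of the direct/indirect speech distinction mentioned above, a simple pattern-based pass can pre-classify sentences before any LLM step. This is a minimal sketch, not the project's actual pipeline; the reporting-verb list and examples are invented for illustration.

```python
import re

def classify_speech(sentence: str) -> str:
    """Crude heuristic: a quoted clause counts as direct speech;
    a reporting verb without quotes counts as indirect speech;
    everything else falls into an 'other' (uncertainty) category."""
    reporting_verbs = r"\b(said|says|told|claimed|argued|stated|added)\b"
    # Match straight or curly double quotes around a clause
    if re.search(r"[\"\u201c][^\"\u201d]+[\"\u201d]", sentence):
        return "direct"
    if re.search(reporting_verbs, sentence, re.IGNORECASE):
        return "indirect"
    return "other"

examples = [
    'Defence Minister Jane Smith said: "It wasn\'t me. I didn\'t do it."',
    "A spokesperson claimed the figures were out of date.",
    "The bill passed its second reading.",
]
labels = [classify_speech(s) for s in examples]  # direct, indirect, other
```

In a full pipeline, a pass like this (or an NER step) could prime the LLM with candidate speakers and speech spans rather than replace it.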

…what we’re exploring…

(With thanks to Tariq Choucair.)

LLM Prompting, Finetuning, Evaluation
Prompt development:
- Highly LLM- (and version-) specific – e.g. ChatGPT 3.5 vs. 4.0
- Room for uncertainty (an ‘other’ category) and asking for code choice explanations may help
Finetuning:
- Repeated tuning against manually coded ‘gold standard’ datasets can improve performance
- But may risk overtuning to specific contexts, creating worse results for more diverse data
Evaluation:
- Potential to combine and average repeated LLM runs – using the same or different LLMs
- Important to remember that human coders don’t always agree either
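Combining repeated LLM runs can be as simple as a majority vote with a fallback to the uncertainty category, evaluated against a gold standard with a basic agreement score. A minimal sketch, assuming categorical codes; the labels here are hypothetical.

```python
from collections import Counter

def majority_label(runs):
    """Combine repeated LLM codings of one item by majority vote;
    fall back to 'other' (the uncertainty category) on a tie."""
    counts = Counter(runs).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "other"
    return counts[0][0]

def percent_agreement(coder_a, coder_b):
    """Simple pairwise agreement between two coders or runs.
    (Human coders don't always agree either, so expect < 1.0.)"""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

# Three hypothetical runs of the same prompt over one article
runs = ["pro", "pro", "anti"]
label = majority_label(runs)  # "pro"
```

In practice one would use a chance-corrected metric (e.g. Krippendorff's alpha) rather than raw percent agreement, but the aggregation logic is the same.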

…and where we’re going

Towards Advanced AI-Supported Analysis
First steps towards frame analysis:
- Framing remains very topic-specific – may be difficult to develop a general approach
- Potential to take an initial topic modelling step (to identify key themes), …
- … and then ask the LLM to determine how they are framed in articles
- Can we further enrich/preprocess the input data through Natural Language Processing?
Exploring non-textual data:
- (Online) news content is increasingly image-, audio-, and video-based
- Some early steps towards AI transcription of AV content (but need to identify speakers)
- Promising tools for image clustering (still images/keyframes from videos)
- Much more to do in combining these data points in a meaningful way
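The topic-then-frame sequence could be prototyped as below. This is an illustrative sketch only: the theme and frame lexicons are invented, and in the actual approach the topic step would be a topic model and the framing step an LLM prompt rather than keyword matching.

```python
# Hypothetical two-step sketch: (1) assign each article to a theme,
# (2) tag how that theme is framed. Lexicons are illustrative only.
THEME_TERMS = {
    "climate": {"climate", "emissions", "warming"},
    "economy": {"inflation", "jobs", "wages"},
}
FRAME_TERMS = {
    "conflict": {"battle", "clash", "fight"},
    "economic_cost": {"cost", "burden", "expensive"},
}

def tag_article(text: str):
    """Pick the theme with the most matching terms, then collect
    every frame whose lexicon overlaps the article's vocabulary."""
    tokens = set(text.lower().split())
    theme = max(THEME_TERMS, key=lambda t: len(tokens & THEME_TERMS[t]))
    frames = [f for f, terms in FRAME_TERMS.items() if tokens & terms]
    return theme, frames

theme, frames = tag_article(
    "The climate bill sparked a fight over the cost of cutting emissions"
)
```

The point of the two-step design is that frames only become comparable once articles are grouped by theme; the LLM is then asked a narrower, theme-specific framing question.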

🧑‍🏭 Very early prototype… And then what?

Beyond Qualitative Interpretation: Practice Mapping*
Current thinking: quantifying specific aspects of individual participant activities, then identifying and interpreting similar patterns at a group level.
* With particular thanks to Kateryna Kasianenko.
(Image: Midjourney)

Twitter @mention network during the Voice to Parliament campaign (red: exclusively using #VoteNo; green: exclusively using #VoteYes).
Twitter interaction pattern similarity network, based on cosine similarity between normalised interaction vectors per account; colours based on modularity detection.
Labelled groups: pro-Voice campaigners | Labor supporters | anti-Voice campaigners | Liberal/National supporters.
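The cosine-similarity step behind such a similarity network can be sketched in a few lines. A minimal illustration, assuming per-account interaction-count vectors; the account names and counts are hypothetical, and real practice mapping would add normalisation, thresholding, and community detection on top.

```python
import math

def cosine(u, v):
    """Cosine similarity between two interaction-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical per-account vectors: counts of @mentions directed
# at the same fixed list of target accounts
vectors = {
    "user1": [10, 0, 2],
    "user2": [8, 1, 2],
    "user3": [0, 9, 0],
}

# Pairwise similarity edges; thresholding these would yield the
# similarity network on which modularity detection runs
accounts = list(vectors)
edges = [
    (a, b, cosine(vectors[a], vectors[b]))
    for i, a in enumerate(accounts)
    for b in accounts[i + 1:]
]
```

Accounts that interact with similar targets in similar proportions end up strongly connected even if they never interact with each other, which is what lets the map surface shared practices rather than direct ties.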

Potential Patterns to Operationalise in Practice Mapping
- Account-to-account interactions (relative to the interactive affordances available on any given social media platform)
- Account’s post content (topics, sentiment, hashtags, named entities, etc.)
- Account’s use of sources (URLs, domains, AI-coded source content features, etc.)
- Account’s profile information (name, description, etc.)
- Manually and computationally coded information about the account and its posts
- …
(Image: Midjourney)

Assessing Destructive Polarisation
Key questions:
- Does practice mapping show distinct practices?
- What divergent patterns drive such distinctions?
- Do these patterns map onto one of the symptoms of destructive polarisation? (Or: do they represent a new pattern that might be seen as destructive – a new symptom?)
- How severe are these differences (i.e. how deeply and destructively polarised is the situation)?
- How are these patterns evolving over time?
(Image: Midjourney)


Thank you! (Image: Midjourney)

Acknowledgments
This research is supported by the Australian Research Council through the Australian Laureate Fellowship project Determining the Dynamics of Partisanship and Polarisation in Online Public Debate.