NEWORDER Project - Science in the online knowledge order

stefandietze, 15 slides, May 21, 2024

About This Presentation

Project overview of the NEWORDER project (https://www.gesis.org/en/research/external-funding-projects/details/project/213/understanding-the-erosion-of-the-traditional-knowledge-order-in-scientific-online-discourse-and-its-impact-in-times-of-crisis)


Slide Content

NEWORDER - Science in the Online Knowledge Order
Stefan Dietze, 13.10.2023

Motivation: science online discourse vs offline society & policies
Diagram labels: Discourse Interactions; Algorithms/AI; Society, Media, Politics & Policies (Offline & Online); Science discourse online (NEWORDER focus: news & social media)

Motivation: scientific online discourse is on the rise
Example: Twitter / X
▪Percentage of tweets containing links to scientific articles (journals, publishers, science blogs etc.)
▪Uses list of > 17 K science web domains (URLs)
▪Data source: 1% sample of Twitter (https://data.gesis.org/tweetskb/), > 14 bn tweets archived since 2013
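The percentage on this slide can be sketched as a domain lookup: a tweet counts as science-related if at least one of its links points to a listed science domain. The following is a minimal sketch only, not the project's actual pipeline. The three-entry `SCIENCE_DOMAINS` set, the tweet layout (a dict with a `urls` list) and both function names are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the list of > 17 K science web domains.
SCIENCE_DOMAINS = {"nature.com", "sciencemag.org", "arxiv.org"}

def is_science_url(url, domains=SCIENCE_DOMAINS):
    """True if the URL's host is a listed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Check every suffix of the host, e.g. "www.nature.com" -> "nature.com".
    return any(".".join(parts[i:]) in domains for i in range(len(parts)))

def science_share(tweets, domains=SCIENCE_DOMAINS):
    """Percentage of tweets with at least one link to a science domain."""
    if not tweets:
        return 0.0
    hits = sum(any(is_science_url(u, domains) for u in t["urls"]) for t in tweets)
    return 100.0 * hits / len(tweets)

# Invented toy sample (not TweetsKB data).
sample = [
    {"urls": ["https://www.nature.com/articles/x"]},
    {"urls": ["https://example.com/news"]},
    {"urls": []},
    {"urls": ["https://arxiv.org/abs/0000.00000", "https://example.com"]},
]
print(science_share(sample))  # 50.0
```

Matching on host suffixes rather than exact strings keeps subdomains such as `www.` in scope while avoiding false hits on hosts that merely end in the same characters.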

NEWORDER project
Interdisciplinary approach, team and objectives
▪Perception of roles, sources, and authority; impact on trustworthiness assessment (Cognitive Psychology)
▪Dissolution of phases, hierarchies and contexts in the scientific process (Social Sciences, Media & Communication Studies)
▪Computational methods for collecting, detecting and classifying scientific online discourse (Computer Science/AI & Computational Linguistics)
Cress, Utz (IWM & Uni Tübingen)
Marcinkowski, Koss (HHU)
Dietze, Boland, Jabeen (GESIS), Kallmeyer (HHU)

How can “scientific discourse” be defined?
Example: Twitter / X
Figure labels (example tweets): Science claim; Science reference; Science relevance; No science
Hafid, S., Schellhammer, S., Bringay, S., Todorov, K., Dietze, S., SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online Discourse, CIKM 2022

Training AI to detect science discourse: SciTweets dataset & classifier
▪Manual annotation of ground truth dataset for testing models (heuristics-based sampling, annotation framework, > 1K annotated tweets)
▪Training AI models to detect science discourse in large-scale discourse data (e.g. Web archives)
▪Reasonable classification performance using fine-tuned language model (SciBERT) applied to TweetsKB data
Hafid, S., Schellhammer, S., Bringay, S., Todorov, K., Dietze, S., SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online Discourse, CIKM 2022
https://github.com/AI-4-Sci/SciTweets

What is science discourse and how does it evolve?
Increasing number and proportion of non-peer-reviewed scientific works
Figure panels: Absolute amount of tweets sharing preprints; Proportion of preprints among shared science URLs
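The two quantities plotted on this slide, the absolute number of shared preprint links and their proportion among all shared science URLs, could be computed per year roughly as follows. This is a hedged sketch: the `PREPRINT_HOSTS` set names a few well-known preprint servers as an assumed stand-in for whatever list the project uses, and the `(year, url)` input layout and the function name are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Assumption: a few well-known preprint servers stand in for a full list.
PREPRINT_HOSTS = {"arxiv.org", "biorxiv.org", "medrxiv.org"}

def preprint_stats(shared_urls):
    """shared_urls: iterable of (year, science_url) pairs.

    Returns {year: (preprint_count, preprint_share_in_percent)}.
    """
    totals = defaultdict(int)
    preprints = defaultdict(int)
    for year, url in shared_urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        totals[year] += 1
        if host in PREPRINT_HOSTS:
            preprints[year] += 1
    return {y: (preprints[y], 100.0 * preprints[y] / totals[y])
            for y in sorted(totals)}

# Invented toy data, for illustration only.
shared = [
    (2019, "https://www.nature.com/articles/a"),
    (2019, "https://arxiv.org/abs/1"),
    (2020, "https://www.biorxiv.org/content/1"),
    (2020, "https://www.medrxiv.org/content/2"),
    (2020, "https://www.nature.com/articles/b"),
    (2020, "https://arxiv.org/abs/2"),
]
print(preprint_stats(shared))  # {2019: (1, 50.0), 2020: (3, 75.0)}
```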

How is public attention distributed?
Power law distribution
•10% of studies receive > 75% of all Twitter mentions
•Long tail of studies with few mentions
•Data source: 1.67 M tweets mentioning at least one of the primary science studies in the “Altmetrics” corpus
Figure axes: Top x (%) of mentioned science studies; Share of Twitter mentions (%)
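A concentration figure such as "10% of studies receive > 75% of all mentions" can be read off a sorted list of per-study mention counts. The sketch below shows one way to compute the share held by the top x% of studies. The function name and the toy counts are assumptions for illustration and are not taken from the Altmetrics corpus.

```python
def top_share(mention_counts, top_percent):
    """Percentage of all mentions received by the top `top_percent` % of studies."""
    counts = sorted(mention_counts, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Number of studies in the top group (at least one study).
    k = max(1, round(len(counts) * top_percent / 100))
    return 100.0 * sum(counts[:k]) / total

# Invented mention counts for ten studies: one dominant study, long tail.
counts = [80, 5, 4, 3, 2, 2, 1, 1, 1, 1]
print(top_share(counts, 10))  # 80.0
print(top_share(counts, 50))  # 94.0
```

Evaluating `top_share` over a range of `top_percent` values yields the kind of cumulative curve the slide's axes describe.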

Challenge: online science discourse is not well-informed
Links to actual scientific studies/context missing in news & social media

▪NLP models can predict the missing primary science reference (e.g. DOI or journal paper link) for a given informal reference (e.g. “Heinsberg Studie”) or secondary reference (news article)
▪Supervised & unsupervised approaches using deep learning (DL) language models

Science discourse is “different”
Examples from http://snopes.com
Figure labels: Non-science claim; Science claim

Computational (AI) challenge
NLP methods (e.g. for fact-checking) perform worse on science discourse
▪Take-away: AI-based methods geared towards scientific discourse required
Figure: Performance of state-of-the-art AI/deep learning using standard benchmark datasets. Panels: Claim Check-Worthiness Detection; Fake News Detection; Claim Verification

Wrap-Up & Outlook: Interdisciplinary Work Plan
WP1 Data collection & study preparation
WP2 Dissolution of phases & contexts
WP3 Perception of roles, credibility & trust
WP4 NLP for classifying sources & roles
WP5 Longitudinal online discourse analysis
Media & Communication Studies (spreading pattern & societal impact)
Cognitive & Social Psychology (effects on individuals)
Computer & Information Science (understanding online discourse)

http://gesis.org/en/kts