Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads

PreethaChatterjee1 20 views 6 slides Jun 09, 2024

About This Presentation

In the dynamic landscape of open source software (OSS) development, understanding and addressing incivility within issue discussions is crucial for fostering healthy and productive collaborations. This paper presents a curated dataset of 404 locked GitHub issue discussion threads and 5961 individual...


Slide Content

21st International Conference on Mining Software Repositories
Incivility in Open Source Projects:
A Comprehensive Annotated Dataset of Locked
GitHub Issue Threads
Ramtin Ehsani, Mia Mohammad Imran, Robert Zita, Kostadin Damevski, Preetha Chatterjee
Drexel University
Virginia Commonwealth University
Elmhurst University
Preprint: https://arxiv.org/abs/2402.04183

Motivation and Research Objective
•Fostering healthy collaboration is one of the main challenges in OSS
•Understanding and addressing incivility within OSS discussions is crucial
•There is no comprehensive approach to addressing uncivil interactions
•Large annotated SE datasets are scarce
Research Objective: Curate an annotated dataset of locked GitHub issue discussion threads with heated discussions

Dataset Annotation
•404 locked issue threads and 5,961 individual comments
•Threads were locked as "too heated" or showed clear characteristics of heated discussions
•A total of 19 annotators
•To further improve annotation quality, we used GPT-4
•Disagreements between GPT-4 and the annotators were checked manually
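
The manual check of disagreements can be pictured with a minimal sketch. The slides do not say how labels were stored or compared, so the record layout, the field names (`comment_id`, `human_label`, `model_label`), and the sample labels below are all hypothetical.

```python
from typing import NamedTuple


class AnnotatedComment(NamedTuple):
    # Hypothetical record layout; the slides do not specify one.
    comment_id: str
    human_label: str   # label assigned by a human annotator
    model_label: str   # label suggested by GPT-4


def find_disagreements(comments):
    """Return the comments where the human and model labels differ."""
    return [c for c in comments if c.human_label != c.model_label]


# Illustrative data only, not taken from the dataset.
sample = [
    AnnotatedComment("c1", "Impatience", "Impatience"),
    AnnotatedComment("c2", "Mocking", "Irony"),
    AnnotatedComment("c3", "None", "None"),
]

# Only these comments would go to a manual review pass.
to_review = find_disagreements(sample)
print([c.comment_id for c in to_review])
```

Exact string comparison is the simplest possible rule; a real pipeline might need to handle comments that carry several labels.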

Annotated Features
•Tone Bearing Discussion Features (TBDFs), uncivil features*
⚬Bitter frustration, Impatience, Mocking, Irony, Vulgarity, etc.
•Triggers*
⚬Failed use of code, Technical disagreements, Communication breakdown, etc.
•Targets*
⚬People, Code/Tool, Company/organization, Undirected
•Consequences*
⚬Discontinued further discussion, Escalating further, etc.
* C. Miller, S. Cohen, D. Klug, B. Vasilescu, and C. Kästner, "'Did You Miss My Comment or What?' Understanding Toxicity in Open Source Discussions," 2022
* I. Ferreira, J. Cheng, and B. Adams, "The 'Shut the f**k up' Phenomenon: Characterizing Incivility in Open Source Code Review Discussions," 2021
* J. Sarker, A. K. Turzo, M. Dong, and A. Bosu, "Automated Identification of Toxic Code Reviews Using ToxiCR," 2023
* Our open coding process

Dataset Description
•1,365 comments annotated with an uncivil feature
⚬Bitter frustration, Impatience, and Mocking are the most frequent (~68%)
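
For a sense of scale, the counts on these slides can be combined in a short calculation. This is a sketch, not a figure from the slides: it assumes the ~68% refers to the 1,365 uncivil comments, and because that percentage is already rounded, the derived count is only approximate.

```python
# Counts as stated on the slides.
TOTAL_COMMENTS = 5961
UNCIVIL_COMMENTS = 1365

# Share of all comments that carry an uncivil feature.
uncivil_share = UNCIVIL_COMMENTS / TOTAL_COMMENTS
print(f"Uncivil share of all comments: {uncivil_share:.1%}")  # 22.9%

# Rough number of uncivil comments covered by the three most frequent
# features, assuming the rounded ~68% applies to the 1,365 uncivil comments.
TOP3_SHARE = 0.68
approx_top3 = round(TOP3_SHARE * UNCIVIL_COMMENTS)
print(f"Approx. comments in the top three features: {approx_top3}")  # 928
```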

Summary
● A curated dataset of 404 locked GitHub issue threads from 213 OSS projects
[Scan the QR code to access]
● Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs
● Failed use of tool/code or error messages is the most common trigger
● People are the most common target of incivility
● Discontinued further discussion is the most common consequence

Research Opportunities
● Resource for conducting comprehensive analyses of incivility
● Automatic incivility detection tools
○ An SE-specific incivility detection tool
● More than just flags for incivility
○ Offer insights into the specific types
● Exploration of how incivility might impact projects' health
● Investigation of how incivility affects targets from underrepresented communities
Preprint: https://arxiv.org/abs/2307.15631
Preprint: https://arxiv.org/abs/2402.04183