Studying Humans in Software Engineering [Keynote talk at BPM 2024]

aserebrenik 129 views 54 slides Sep 05, 2024

About This Presentation

Software development and business processes might appear very different but in both cases, humans are either actively involved or affected by whatever is developed or designed. So, it is not surprising that differences between the people developing software are reflected in how software is developed...


Slide Content

Studying Humans in
Software Engineering
Alexander Serebrenik
@aserebrenik
[email protected]
First of all, I would like to thank the chairs for inviting me.
When they invited me, I got kind of scared: after all, I am a software engineering researcher and not a BPM researcher! However, in both software development and
business processes, humans are either actively involved in or affected by whatever is developed or designed. Both software development and business process
management and enactment can be seen as forms of computer-supported collaborative work.

bigstockphoto.com
It is not surprising that differences between the people developing software are reflected in how software is developed and what the resulting software looks like. This
observation has been made on several occasions.

Fabio Palomba, Damian Andrew Tamburri, Francesca Arcelli Fontana, Rocco Oliveto, Andy Zaidman, Alexander Serebrenik: Beyond
Technical Aspects: How Do Community Smells Influence the Intensity of Code Smells? IEEE Trans. Software
Eng. 47(1): 108-129 (2021)
Lambiase, Catolino, Tamburri, Serebrenik, Palomba, Ferrucci. Good Fences Make Good Neighbours? On the Impact of Cultural and
Geographical Dispersion on Community Smells, ICSE-SEIS, 2022
[Figure: relations, labelled + or -, between team diversity attributes (Gender Diversity Index, Geographic Diversity, National Culture Diversity: Individualism, National Culture Diversity: Power Distance), community smells (Black Cloud, Radio Silence, Organizational Silo, Lone Wolf) and code smells (Long Method, Misplaced Class, Blob, Spaghetti Code, Feature Envy)]
Here we see on the left several variables related to the composition of a software development team in terms of several diversity attributes; in the middle we have examples
of suboptimal communication within the team; on the right, suboptimal organisation of the source code, so-called code smells.
Let us first take a look at the lower part of the scheme. Lone wolves are unsanctioned or defiant contributors who carry out their work regardless of their peers,
their decisions and communication. If I am a lone wolf, then I do not care about how other people look at my code, or I cannot put myself in their shoes, so I will not
restructure my code for them! Low diversity in national culture (individualism) means that developers come from countries that value individualism in a similar
way; since the sample is US-dominated, this means that most developers are influenced by the highly competitive US culture, which is more likely to produce lone wolves.
We also see that more gender-diverse teams are less likely to develop communication problems such as Black Cloud. Black Cloud means that there is so much
communication that the participants cannot distinguish useful communication from useless communication, and women are known to take mediating roles.
graph smells {
rankdir=LR
BC -- LM [label="+"]
OS -- MC [label="+"]
OS -- LM [label="+"]
OS -- B [label="+"]
RS -- B [label="+"]
GDI -- RS [label="+"]

GDI -- BC [label="-", penwidth=3, fontname="times-bold"]
GD -- RS [label="-"]
GD -- BC [label="-", penwidth=3, fontname="times-bold"]
GD -- OS [label="-"]
IDVD -- LW [label="-"]
IDVD -- RS [label="-"]
PDID -- RS [label="+"]
LW -- SC [label="+"]
LW -- FE [label="+"]
BC [style=filled, fillcolor=gray, label="Black Cloud"]
OS [style=filled, fillcolor=gray, label="Organizational Silo"]
RS [style=filled, fillcolor=gray, label="Radio Silence"]
LW [style=filled, fillcolor=gray, label="Lone Wolf"]
MC [style=filled, label="Misplaced Class"]
LM [style=filled, label="Long Method"]
B [style=filled, label="Blob"]
SC [style=filled, label="Spaghetti Code"]
FE [style=filled, label="Feature Envy"]
GDI [label="Gender Diversity Index"]
GD [label="Geographic Diversity"]
IDVD [label="National Culture Diversity: Individualism"]
PDID [label="National Culture Diversity: Power Distance"]
{rank=same;B;SC;FE;MC;LM}
{rank=same;LW;RS;OS;BC}
{rank=same;GD;IDVD;GDI;PDID}
edge[style=invis];
OS -- BC -- RS -- LW;
}

Photo by Yan Krukov from Pexels: https://www.pexels.com/photo/photo-of-woman-leaning-on-wooden-table-while-looking-upset-4458411/
Canada Learning Code - September
Understanding software development calls for understanding the humans involved, both their individual experiences and the ways they work as part of a team. Of course,
these two cannot be completely separated, since the ways we feel and work as individuals influence and are influenced by the ways we collaborate and communicate.
What makes this even more complicated is that nowadays teams tend to involve both human and artificial developers, namely bots.

To illustrate these two sides of our research, in this talk we start by discussing a study aimed at understanding the individual experiences of women older than 40 who
develop software (why 40? just wait and see), and then proceed to a study of how developers respond to bots on GitHub.

Qiu, Nolte, Brown, Serebrenik, Vasilescu (2019). Going Farther Together: The Impact of Social Capital on Sustained Participation in
Open Source. ICSE, IEEE, pp. 688-699
We have been studying the experiences of women in software development since 2013… For example, in our previous work, we observed that men engage for longer
on Stack Overflow and are similarly more likely to stay around for longer on GitHub. After one year, about 70% of men are still involved in the project, while for women this
percentage is closer to 60%.

Based on social capital theory, we hypothesised that developers involved in projects with different programming languages are more likely to diversify their skills
and be engaged for longer. We see that this is indeed the case: the plot showing the difference in survival between teams with high and low levels of language diversity is always
positive. Moreover, after one year women benefit more from engagement in high-language-diversity teams…

https://freerangestock.com/photos/142291/three-female-coworkers-talking-and-smiling-while-standing-in-the-workplace.html
However, as we explored this domain, we started realising that it is not enough to look at women in software engineering as a homogeneous group: obviously,
different women might have different experiences, and we need to understand their experiences.

Cech. The intersectional privilege of white able-bodied heterosexual men in STEM. Science Advances 8(24), 2022
GAP
Using survey data of U.S. STEM professionals (N = 25,324), this study by Erin Cech examines whether white able-bodied heterosexual men (WAHM) are uniquely
privileged in STEM. The results show that WAHM experience better treatment and rewards in STEM compared with members of all 31 other intersectional gender, race,
sexual identity, and disability status categories. This figure shows that WAHM experience more social inclusion; similar figures show higher professional respect, career
opportunities, salaries and persistence intentions (compared to STEM professionals in 31 other intersectional groups). https://www.science.org/doi/10.1126/sciadv.abo1558

 Tokyo. Japan. 1997.  Peter Marlow Foundation Photographer Member of Magnum Photos
Gender and age are the most studied diversity axes in social software engineering research.

What age- and gender-specific experiences have veteran
software developers of marginalized genders had in their
careers?
Sterre van Breukelen, Ann Barcomb, Sebastian Baltes, Alexander Serebrenik: "STILL AROUND": Experiences and Survival Strategies
of Veteran Women Software Developers. ICSE 2023: 1148-1160
•Hence, our research questions were:…
•In our research questions and sampling approach, we included all marginalized genders, but our participants identified as
women, plus one non-binary person who identified as a woman for most of their career.
•Hence our results focus on women, while our research questions are broader.
RQ2. What strategies have veteran software developers of marginalized genders adopted that they perceive as contributing to their
survival in software engineering?

Overview of Sterre’s findings: strategies, experiences and perception. Of course, I do not expect you to see all the details.
The findings of the first study show that gender- and age-related experiences often cannot be easily separated. The red arrows show “age and gender” - related experiences and strategies as opposed to
those specific to gender or to age (white arrows).
older women, who are sometimes unsure of whether the negative experiences were because of their gender or their age.
There were not many Positive experiences related to age and gender, although Being a Role Model and More Opportunities Due to Gender and Age were found. One participant described how companies
specifically looking to develop products aimed at her demographic led to opportunities: “A company approached me and said they were in the business, they wanted to make an app that would help predict
who would have a stroke. . . They were like ‘our ideal candidate would be a Woman of Color [who has] also survived a stroke.’ ”
Negative experiences were far more common, such as Seen as Non/less Technical, which has also been widely observed in the literature.
We found that Gender Related Strategies contained the most strategies, with eight separate categories and 308 code segments. The categories were: Against Gender Bias Strategies, Career Related
Strategies, Changing Work Environment, Changing Your Appearance, Communication Methods, Ignoring Situations, Traditionally Feminine, and Traditionally Masculine. Of these, Against Gender Bias
Strategies was the largest category, with 70 code segments and eight subcodes, such as Backing Other Women Up.

A company approached me. . . They
were like ‘our ideal candidate would be
a Woman of Color [who has] also
survived a stroke’. —Elliot
As I approached menopause, there was
another shift of just this contempt,
because you’re not even a sexually
available female. And there’s ‘No, I don’t
even have an interest in having sex
with you and so why would I ever listen
to you? You’re going to try and tell me
I’m wrong and you’re unattractive.’ So it
got worse.. —Emery
+
-
Here we see the part of the sunburst related to experiences.
older women, who are sometimes unsure of whether the negative experiences were because of their gender or their age.
Please take time to read the quotes.

The second study I would like to discuss with you started from the realisation that human software developers are no longer the only contributors nowadays: whether
AI-powered or not, development teams involve automatic developers, ranging from relatively simple ones welcoming new contributors to open source projects to more
complex ones contributing code and performing code reviews.

What bot characteristics shape human perceptions of bot
behavior?
Amir Ghorbani, Nathan Cassee, Derek Robinson, Adam Alami, Neil A. Ernst, Alexander Serebrenik, Andrzej Wasowski: Autonomy Is
An Acquired Taste: Exploring Developer Preferences for GitHub Bots. ICSE 2023: 1405-1417

https://perchance.org/ai-text-to-image-generator
We have observed that less experienced GitHub users tend to prefer reactive, less autonomous bots that receive explicit commands, not dissimilar to the robot on the
left that is given a pencil to draw. More experienced GitHub users prefer more autonomous bots, as in the picture on the right showing a person and a robot drawing
together. Autonomy is hence an acquired taste.
Therefore, it should be possible to configure bot behavior more effectively.

So far I have focused on the results of our studies. Next, let us open the black box a bit and see how these studies have been conducted, and what design decisions have
been embedded in these studies.

What age- and gender-specific experiences have veteran
software developers of marginalized genders had in their
careers?
Recall our RQ. We are interested in experiences.

https://www.nonprofitmarketingguide.com/blog/wp-content/uploads/2017/05/bigstockphoto_Interview_Time_4408972.jpg
INTERVIEW
Since we are interested in obtaining profound insights into the individuals’ experiences, we decided to conduct interviews.

What age- and gender-specific experiences have veteran
software developers of marginalized genders had in their
careers?
Answering this RQ calls for understanding of two difficult constructs, namely “being old” and “being of minoritized gender”.

Sebastian Baltes, George Park, Alexander Serebrenik. Is 40 the new 60? How popular media portrays the employability of older
software developers, IEEE Software, 37(6):26-31, 2020
The problem here is that not much research has been conducted on the perception of age among software developers, so we used
our own previous work.

•Previous research has shown that 40 is the threshold when developers are starting to be seen as old, so we have reused the
same threshold in the current study.
•We further operationalized ‘older’ as having at least 18 years of experience in the software industry, based on a typical career
including school and university.

However, this study was conducted on US public discourse, which means that the subsequent study should be US-based as well.

Alexander Serebrenik. How to Ask about Gender Identity of Software Engineers and “Guess” it from the Archival Data, Equity, Diversity,
and Inclusion in Software Engineering: Best Practices and Insights (Damian et al. eds.) 2024.
When it comes to gender, there are roughly three ways: ask people about their gender, select a dataset that explicitly records gender-related information, or use an
algorithmic tool to guess it based on such information as names or profile images.

Ask people: best answers possible; limited number of responses
Dataset that records gender: good but possibly outdated answers; few datasets record gender explicitly
Guessing tool: as many data points as needed; imprecise, unreliable, ethics?
Advantages and disadvantages of each one of the solutions

In our work we have focused on the second way but we will see that we have made use of the two other techniques as well.

Created by popcornarts
from the Noun Project
So we need a dataset with US-based data about software developers that explicitly records gender and age.

So we have found a tweet by Tracy Chou. She is one of Time's 12 Women of the Year (2022), a software engineer and advocate
for diversity in technology-related fields. Her tweet alludes to the negative experiences and rarity of older women who are still
active in the field of software development.

This tweet started an entire thread, and we have found several similar threads. We see that individuals responding to the message
by Tracy Chou voluntarily divulge their number of years of experience in tech and subsequently their age.

So what about their gender? As I mentioned before, one of the options would be to use a guessing tool that would try to infer gender based on the first name. While this
is oftentimes the only option, this solution is suboptimal for many reasons including that we can only infer how others perceive the person, not how they perceive
themselves. Moreover, this would not work for the second example.

This is why we have focused on individuals who have either indicated pronouns typically associated with women and non-binary people (she/they), used gender-specific
terms (mother, wife) or at least do not include pronouns and terms that are typically associated with men (he, husband). Of course, this is a rather coarse-grained way of
identifying people and this still needs to be verified.
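
The coarse-grained filter described here can be sketched in a few lines of Python. This is only an illustration and not the study's actual procedure: the term lists, the function names `tokens` and `is_candidate`, and the idea of matching whole words in free-text profiles are all assumptions made for the example. As in the talk, a positive result only nominates a profile for the prescreening survey; it does not establish anyone's gender.

```python
import re

# Hypothetical term lists, loosely based on the examples given in the talk.
# They are assumptions for illustration, not the lists used in the study.
WOMEN_OR_NONBINARY_TERMS = {"she", "her", "they", "them", "mother", "wife"}
MEN_TERMS = {"he", "him", "his", "husband", "father"}


def tokens(text: str) -> set[str]:
    """Lower-case the text and return the set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_candidate(profile_text: str) -> bool:
    """Decide whether a profile is kept for the prescreening survey.

    A profile is kept if it uses a term typically associated with women or
    non-binary people, or if it at least uses no term typically associated
    with men. The result still has to be verified by asking the person.
    """
    words = tokens(profile_text)
    if words & WOMEN_OR_NONBINARY_TERMS:
        return True
    return not (words & MEN_TERMS)


if __name__ == "__main__":
    for bio in ["Engineer, she/her, 25 years in tech",
                "Building software since 1995",
                "Dad, husband, he/him"]:
        print(bio, "->", is_candidate(bio))
```

Matching whole tokens rather than substrings avoids false hits such as "he" inside "the"; any real use would need a far more careful treatment of language and context.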

This is why, to ensure the correctness of our dataset, we used a prescreening survey to verify that the respondents actually belong to our target demographic.

Now we have two options: either to proceed with the interviews or go back to our data collection phase and consider additional threads.

We decided to continue with the group we had and interviewed the participants. We stopped after 14 interviews since we had reached theoretical saturation.
The notion of theoretical saturation comes from grounded theory.

Greta Hoffman from Pexels
•To evaluate the stability of our findings, we also included three participants who, while not strictly belonging to the target demographics, might share
experiences and strategies with other interviewees.
•One participant was slightly younger, one had left the industry, and one identified as non-binary.
•The answers of those participants were in line with the other feedback we got from our interviews.
•In total, we conducted 14 interviews until reaching saturation.

Martin P. Robillard, Deeksha M. Arya, Neil A. Ernst, Jin L. C. Guo, Maxime Lamothe, Mathieu Nassif, Nicole Novielli, Alexander
Serebrenik, Igor Steinmacher, Klaas-Jan Stol: Communicating Study Design Trade-offs in Software Engineering. ACM Trans. Softw.
Eng. Methodol. 33(5): 112:1-112:10 (2024)
Trade-offs
Essentially what we have is a series of decisions and trade-offs: what research method do we use (interviews or surveys), how do we define “being old”, how do we
guess gender?

1. decision point
2. alternatives
3. considerations
4. rationale
5. implications
1. A trade-off is identified by its decision point, which can act as its identifier.
2. The alternatives and their relative importance. A review of alternatives can include properties, such as whether or not the set of alternatives is closed or
whether or not they are mutually exclusive.
3. The selection of one alternative over competing options is the outcome of a system of considerations that relates the costs and benefits of each alternative, as well as
constraints limiting the design space.
4. The rationale for selecting an alternative can then be expressed in terms of these considerations.
5. Additional discussion of the implications of the decision supports an in-depth exploration of the consequences of the choice made, in contrast to the inevitably more
general cost-benefit calculus involved in the previous point (i.e., considerations).
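
The five elements can also be written down as a small record, which makes it harder to forget one of them when reporting a study. The sketch below is one possible encoding in Python; the class name `TradeOff`, its field names and the `report` method are hypothetical and do not come from the cited paper. The example values paraphrase the participant recruitment decision of the GitHub bot study discussed in this talk, and the rationale string is a placeholder because these notes record the choice rather than the full argument.

```python
from dataclasses import dataclass, field


@dataclass
class TradeOff:
    """One study design trade-off, described by the five elements above."""

    decision_point: str        # 1. identifies the trade-off
    alternatives: list[str]    # 2. options that were considered
    considerations: list[str]  # 3. costs, benefits and constraints
    rationale: str             # 4. why the chosen alternative was selected
    implications: list[str] = field(default_factory=list)  # 5. consequences

    def report(self) -> str:
        """Render the trade-off as a short plain-text report."""
        lines = [f"Decision point: {self.decision_point}", "Alternatives:"]
        lines += [f"  - {a}" for a in self.alternatives]
        lines.append("Considerations: " + ", ".join(self.considerations))
        lines.append(f"Rationale: {self.rationale}")
        lines.append("Implications:")
        lines += [f"  - {i}" for i in self.implications]
        return "\n".join(lines)


recruitment = TradeOff(
    decision_point="How to recruit participants who represent developers who use pull requests",
    alternatives=[
        "Prolific",
        "students via university channels",
        "developers approached directly via GitHub profiles",
        "another crowd-worker platform such as Mechanical Turk",
        "a combination of these",
    ],
    considerations=["quality of the responses", "likelihood of the responses", "cost"],
    # Placeholder wording (assumption): only the choice itself is recorded in the notes.
    rationale="Prolific plus students from our classes, weighed on the considerations above",
    implications=[
        "many initial volunteers failed the Prolific screening; only 12% completed the study",
        "rigorous screening and attention questions gave more confidence in validity",
    ],
)

print(recruitment.report())
```

Keeping the rationale separate from the considerations mirrors the distinction made in points 3 and 4: the considerations define the criteria, while the rationale states how the chosen alternative fares on them.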

So let us revisit one of the decisions we have taken when studying how software developers perceive bots on GitHub.

We have used vignettes!

1. decision point: how to recruit participants who represent the
population of software developers who use pull requests.
We required a sample of participants that represents the population of software developers who use pull requests.

2. alternatives
The alternatives were (1) to recruit developers from Prolific, an online platform that offers a pool of study participants and tools for managing payment and other study
operations; (2) to recruit students via university channels; (3) to approach developers directly using GitHub profiles; (4) to choose another crowd-worker platform such as
Mechanical Turk (MTurk); or (5) a combination of these. Our decision was to recruit developers from Prolific and also from students in our classes.

3. considerations
Created by Lusi Astianah
from Noun Project
%
Quality of the responses
Likelihood of the responses
Cost

4. rationale
[Figure: + ratings for each consideration]
Quality of the responses
Likelihood of the responses
Cost

5. implications
One implication of our choice was that we lost a large number of initial volunteers who failed the Prolific screening. Only 12% were able to complete the study. However,
rigorous screening (and attention questions) gave us more confidence in the validity of the results.

So, let us try to move to the BPM world and see how these trade-offs and decisions are discussed in the literature. And I need to apologise upfront: I am not a BPM
researcher, so I might be misinterpreting some elements in your research. I will not discuss individual papers. However, I have performed a brief check with a senior
member of the community that
So in 2023, the Foundations track published 7 papers, the Engineering track 11 papers, and the Management track 9 papers.

What trade-offs do you resolve
when choosing your data?
Recall how we selected the data in the study of veteran women in software development. The question I was wondering about was
what trade-offs you resolve when choosing your data.

Live on the 17th of April, 2015. Construma Exibition. Car racing simulator in
action. SBR Racing. Sources: versenyszimulator.hu ©© Derzsi Elekes Andor,
Budapest, 2015
Glen Duncombe. In Car Footage from a Van Diemen RF01 driven by Micheal
Fitzgerald Cork Racing. 12 December 2010. WIkiCommons.
Specifically, papers in the Foundations and Engineering tracks tend to make a distinction between “simulated” and “real-world” logs. The word “simulated” seems to be
used to mean “artificially produced” without further indication of what particular trade-offs have been inherent in the simulation process: what has been simulated and
why. In the same way, the “real world” is diverse, and techniques that work for a racing car do not necessarily work for trucks, family cars or bicycles!

BPI Challenge
One of the surprising (for me) things was the dominance of the logs of the BPI Challenge. At least 6 of the 7 papers in the Foundations track and 6 of the 11 in the Engineering track use one
or more of the BPI Challenge logs.

https://i.ytimg.com/vi/cLxnoK6hKEk/maxresdefault.jpg
And while the BPI Challenge, or a mining challenge in general, is a great idea, overuse of this data comes with the risk of techniques being geared towards a limited number of
specific instances of data, in this case, logs.
This means that techniques that work well in the lab environment on these particular log datasets might break when applied in the real world to different kinds of logs.

https://cdn.labx.com/v2/images/articles/52740c6d-8b7e-44ed-a42c-14eba1a72052.webp
An alternative solution would require a careful reflection that the techniques might work only in the context of the specific logs, in the same way that lab glassware can only be used in
the constrained environment of the fume hood. This would also require analysis of the context of the original data and its interplay with the techniques designed.

Trade-offs
I would therefore like to call on the BPM community to carefully reflect on the decisions and the trade-offs implicit in the studies. One cannot just take an existing log,
ignore its context and try to analyse it.

Alexander Serebrenik @aserebrenik [email protected]
So let us summarise. We started by discussing two recent studies of humans in software engineering: the first focused on the experiences and survival
strategies of veteran women, the second on developers’ perceptions of bots on GitHub.

Then we revisited these studies from the methodological perspective, opened the black box and observed that each study involved multiple trade-offs, careful
consideration of alternatives and the implications of the choices made.

Finally, we have tried to look at current BPM research through the lens of these trade-offs, and we have observed that oftentimes the same data is used outside of its
specific context; I would like to call on the BPM community to carefully reflect on the decisions and the trade-offs implicit in your studies.