Googling for Software Development: What Developers Search For and What They Find (MSR 2021)

andrehoraa 44 views 42 slides Jul 25, 2024

About This Presentation

Developers often search for software resources on the web. In practice, instead of going directly to websites (e.g., Stack Overflow), they rely on search engines (e.g., Google). Despite this being a common activity, we are not yet aware of what developers search from the perspective of popular softw...


Slide Content

Googling for Software Development:

What Developers Search For and
What They Find
MSR 2021
Andre Hora

Developers often search for software resources on the web
They may spend ~20% of their time on the web

Code examples
Novel technologies
Bug-fixes
Documentation
etc.

Stack Overflow: 50M users/month
W3Schools: 2.5B pageviews/year

Over 85% of their traffic comes from web search engines [alexa.com]

What do developers search for and what do they find?

Search Queries: Understand real-world search queries and developers’ needs
Search Results: Detect where search engines find software resources and explore the results

Study Design

(1) Selecting the Websites
1. stackoverflow.com
2. w3schools.com
3. geeksforgeeks.org
4. tutorialspoint.com
5. programcreek.com

(2) Collecting the Search Queries
1.3M distinct queries

(3) What Developers Search For
• RQ1: Query content
• RQ2: Query size & keywords
• RQ3: Query structure
• RQ4: Query similarity

(4) What Developers Find (Search API)
• RQ5: Result resources
• RQ6: Result variation

Results

RQ1
Query Content

RQ1 What is the content of the search queries?
Developers’ queries typically provide references to technologies, such as programming languages (30%), software technologies (24.5%), and web frameworks (5%).

RQ2
Query Size & Keywords

RQ2 What is the size of the search queries? Where are the keywords located?
[Figure: query size and keyword position; plotted values: 49.2, 65.2, 48.7, 3]
Developers’ queries are short: 3 words, on the median. Keywords are more likely to be the first than the last word in the query.
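
The two RQ2 measurements can be sketched in a few lines of Python. This is an illustration only, not the study's script: the `KEYWORDS` set, the helper names, and the sample queries are assumptions, and the slides do not say how keywords were identified.

```python
from statistics import median
from typing import Optional

# Hypothetical keyword list for illustration only; the slides do not say
# how keywords were identified in the study.
KEYWORDS = {"python", "java", "javascript", "css", "html"}


def query_size(query: str) -> int:
    """Number of whitespace-separated words in a query."""
    return len(query.split())


def keyword_position(query: str) -> Optional[str]:
    """Return 'first', 'last', 'middle', or None if no keyword is present.

    A first-word match takes precedence, so a keyword in a one-word
    query is reported as 'first'.
    """
    words = query.lower().split()
    if not words:
        return None
    if words[0] in KEYWORDS:
        return "first"
    if words[-1] in KEYWORDS:
        return "last"
    if any(w in KEYWORDS for w in words[1:-1]):
        return "middle"
    return None


# Made-up sample queries, not taken from the study's dataset.
queries = [
    "python email parser",
    "email parser python",
    "css center div",
    "how to reverse a list in java",
]
sizes = [query_size(q) for q in queries]
positions = [keyword_position(q) for q in queries]
print(median(sizes), positions)
```

Splitting on whitespace keeps the sketch simple; a real analysis would probably need to normalize punctuation and multi-word technology names first.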

RQ3
Query structure
As in general web search, developers also tend to exclude function words in their queries, which are mostly composed of content words (80.3%).

RQ4
Query similarity
Most developers’ queries are similar to one another: while 40% have no similar counterpart, 60% have at least one similar peer, and 8% have 10 or more similar ones.
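
A rough Python sketch of how the RQ3 and RQ4 measurements could be computed. The slides do not state the function-word list or the similarity measure used in the study, so both are assumptions here: a tiny hand-picked `FUNCTION_WORDS` set and Jaccard similarity over word sets with an arbitrary 0.5 threshold.

```python
from typing import List

# Illustrative function-word list (articles, prepositions, conjunctions);
# an assumption, not the list used in the study.
FUNCTION_WORDS = {
    "a", "an", "the", "to", "in", "of", "for", "on", "with", "and", "or", "is",
}


def content_word_share(queries: List[str]) -> float:
    """Fraction of all query words that are not function words."""
    words = [w for q in queries for w in q.lower().split()]
    if not words:
        return 0.0
    content = [w for w in words if w not in FUNCTION_WORDS]
    return len(content) / len(words)


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two queries."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def similar_peer_counts(queries: List[str], threshold: float = 0.5) -> List[int]:
    """For each query, count the other queries at or above the threshold."""
    return [
        sum(
            1
            for j, other in enumerate(queries)
            if j != i and jaccard(query, other) >= threshold
        )
        for i, query in enumerate(queries)
    ]


# Made-up sample queries, not taken from the study's dataset.
sample = [
    "how to parse email in python",
    "python email parser",
    "email parser python",
]
share = content_word_share(sample)
peers = similar_peer_counts(sample)
print(round(share, 3), peers)
```

The pairwise comparison is quadratic in the number of queries, so it only suits small samples; a dataset of 1.3M queries would need an indexed or hashed approach.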

RQ5
Result resources
Search API

RQ5 Where does Google find software resources?
Google finds software resources mostly on Stack Overflow (11%), YouTube (6%), and W3Schools (5%). However, the results may vary according to the type of query (keyword or general).
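
Given a list of result links, the share per website can be tallied as in the sketch below. The URLs are made-up placeholders and the helper names are assumptions; how the study grouped links by website is not shown on the slides.

```python
from collections import Counter
from typing import Dict, List
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Host name of a URL, lower-cased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domain_shares(urls: List[str]) -> Dict[str, float]:
    """Fraction of result links that point to each domain."""
    counts = Counter(domain_of(u) for u in urls)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()} if total else {}


# Placeholder links for illustration; not real search results.
results = [
    "https://stackoverflow.com/questions/1",
    "https://www.youtube.com/watch?v=abc",
    "https://stackoverflow.com/questions/2",
    "https://www.w3schools.com/python/",
]
shares = domain_shares(results)
print(shares)
```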

RQ6
Result variation
Search API

RQ6: How do Google results vary for similar queries?

Word swap
Query 1: python email parser
Query 2: email parser python

Word removal
Query 1: python email parser
Query 2: email parser

Synonymous word
Query 1: python email parser
Query 2: py email parser
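
The three query changes above can be reproduced with simple string helpers. This is a sketch under assumptions: the function names are hypothetical, and the exact transformation rules used in the study (for example, which word is swapped) are not given on the slides.

```python
def swap_first_word_to_end(query: str) -> str:
    """Word swap: move the first word of the query to the end."""
    words = query.split()
    return " ".join(words[1:] + words[:1])


def remove_word(query: str, word: str) -> str:
    """Word removal: drop every occurrence of `word` from the query."""
    return " ".join(w for w in query.split() if w != word)


def replace_word(query: str, word: str, synonym: str) -> str:
    """Synonymous word: replace `word` with `synonym`."""
    return " ".join(synonym if w == word else w for w in query.split())


base = "python email parser"
swapped = swap_first_word_to_end(base)
removed = remove_word(base, "python")
synonymous = replace_word(base, "python", "py")
print(swapped, removed, synonymous, sep=" | ")
```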

The links and order of the top 10 Google search results are very
likely to change due to similar queries, whereas the top 1 is much
less affected. However, overall, the intersection of links due to
similar queries is high, at least 70% in most cases.
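
One way to quantify this variation is to compare the ranked result lists of two similar queries. The sketch below is an assumption about how such a comparison might look, not the study's metric: `compare_results` and its field names are hypothetical, and the intersection is divided by the size of the larger link set.

```python
from typing import Dict, List, Union


def compare_results(
    first: List[str], second: List[str], k: int = 10
) -> Dict[str, Union[bool, float]]:
    """Compare the top-k links returned for two similar queries."""
    top_a, top_b = first[:k], second[:k]
    links_a, links_b = set(top_a), set(top_b)
    # Denominator choice is an assumption; the slides do not define it.
    denominator = max(len(links_a), len(links_b), 1)
    return {
        "same_top1": bool(top_a and top_b) and top_a[0] == top_b[0],
        "same_links": links_a == links_b,
        "same_order": top_a == top_b,
        "intersection": len(links_a & links_b) / denominator,
    }


# Placeholder link identifiers standing in for result URLs.
results_query_1 = ["u1", "u2", "u3", "u4"]
results_query_2 = ["u1", "u3", "u2", "u5"]
report = compare_results(results_query_1, results_query_2, k=4)
print(report)
```

Reporting top 1, link set, and order separately mirrors the distinction drawn above: the set of links can overlap heavily even when the ordering changes.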

Takeaways
1. Developers’ queries are likely to include key contexts (e.g., technologies)
2. Developers’ queries share characteristics with general ones: both are short and tend to omit function words
3. Google finds software resources mostly on Stack Overflow (11%), with an over-concentration in the top 1 results (28%)
4. YouTube is a prominent source for Google (mostly top 3 results of general queries)
5. Performing minor changes to queries does not broadly affect the top 1 search result or the overall top 10 (however, there are exceptions!)
