Googling for Software Development: What Developers Search For and What They Find (MSR 2021)
andrehoraa
42 slides
Jul 25, 2024
About This Presentation
Developers often search for software resources on the web. In practice, instead of going directly to websites (e.g., Stack Overflow), they rely on search engines (e.g., Google). Despite this being a common activity, we are not yet aware of what developers search for from the perspective of popular software development websites and what search results are returned. With this knowledge, we can understand real-world queries, developers’ needs, and the impact of queries on the search results. In this paper, we provide an empirical study to understand what developers search for on the web and what they find. We assess 1.3M queries to popular programming websites and perform thousands of queries on Google to explore the search results. We find that (i) developers’ queries typically start with keywords (e.g., Python, Android), are short (3 words), tend to omit functional words, and are similar to each other; (ii) minor changes to queries do not largely affect the Google search results; however, some cosmetic changes may have a non-negligible impact; and (iii) search results are dominated by Stack Overflow, but YouTube is also a relevant source nowadays. We conclude by presenting detailed implications for researchers and developers.
Slide Content
Googling for Software Development: What Developers Search For and What They Find
MSR 2021
Andre Hora
Developers often search for software resources on the web
They may spend ~20% of their time on the web
•Code examples
•Novel technologies
•Bug fixes
•Documentation
•etc.
Stack Overflow: 50M users/month
W3Schools: 2.5B pageviews/year
Over 85% of their traffic comes from web search engines [alexa.com]
What do developers search for, and what do they find?
Search Queries: understand real-world search queries and developers’ needs
Search Results: detect where search engines find software resources and explore the results
Study Design
Step 1: Selecting the Websites
1. stackoverflow.com
2. w3schools.com
3. geeksforgeeks.org
4. tutorialspoint.com
5. programcreek.com
Step 2: Collecting the Search Queries
1.3M distinct queries
Step 3: What Developers Search For
•RQ1: Query content
•RQ2: Query size & keywords
•RQ3: Query structure
•RQ4: Query similarity
Step 4: What Developers Find (via Search API)
•RQ5: Result resources
•RQ6: Result variation
Results
RQ1: Query Content
What is the content of the search queries?
Developers’ queries typically provide references to
technologies, such as programming languages (30%),
software technologies (24.5%), and web frameworks (5%)
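To make the RQ1 percentages concrete, here is a minimal sketch of how such shares could be computed by matching query words against a keyword dictionary. The category names come from the finding above, but the keyword lists, the sample queries, and the helpers `categorize` and `category_shares` are hypothetical, for illustration only; the study's actual classification procedure may differ.

```python
from collections import Counter

# Hypothetical keyword dictionary for illustration only; the study's
# actual categories and keyword lists are not reproduced here.
CATEGORIES: dict[str, set[str]] = {
    "programming language": {"python", "java", "javascript", "php"},
    "software technology": {"android", "git", "mysql", "docker"},
    "web framework": {"django", "react", "angular", "flask"},
}


def categorize(query: str) -> set[str]:
    """Return every category with at least one keyword among the query's words."""
    words = set(query.lower().split())
    return {cat for cat, keywords in CATEGORIES.items() if words & keywords}


def category_shares(queries: list[str]) -> dict[str, float]:
    """Percentage of queries that reference each category."""
    counts = Counter(cat for q in queries for cat in categorize(q))
    return {cat: 100 * counts[cat] / len(queries) for cat in CATEGORIES}


if __name__ == "__main__":
    sample = ["python email parser", "android button click",
              "django login view", "how to sort a list"]
    print(category_shares(sample))
```

A query can mention several technologies, so the per-category shares are not required to sum to 100%.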
RQ2: Query Size & Keywords
What is the size of the search queries? Where are the keywords located?
[Figure panels: Size; Keyword position]
Developers’ queries are short: 3 words, in the median. Keywords are more likely to be the first word than the last word in the query.
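The two RQ2 measures (median query size and keyword position) can be sketched as follows, assuming queries are split on whitespace. The keyword set, the sample queries, and the rule that a keyword at both ends counts as "first" are assumptions made for this illustration, not the study's definitions.

```python
from collections import Counter
from statistics import median
from typing import Optional

# Hypothetical keyword set; the study's own keyword list is not given here.
KEYWORDS = frozenset({"python", "java", "javascript", "android", "django"})


def query_size(query: str) -> int:
    """Number of whitespace-separated words in a query."""
    return len(query.split())


def keyword_position(query: str, keywords: frozenset = KEYWORDS) -> Optional[str]:
    """Classify where a keyword occurs: 'first', 'last', 'middle', or None.

    Returns None when the query has no keyword or fewer than two words
    (a one-word query cannot distinguish first from last). If keywords
    appear at both ends, 'first' wins (an assumption of this sketch).
    """
    words = query.lower().split()
    hits = [i for i, w in enumerate(words) if w in keywords]
    if not hits or len(words) < 2:
        return None
    if hits[0] == 0:
        return "first"
    if hits[-1] == len(words) - 1:
        return "last"
    return "middle"


if __name__ == "__main__":
    sample = ["python email parser", "email parser python",
              "android intent example", "how to use git"]
    print("median size:", median(query_size(q) for q in sample))
    print(Counter(keyword_position(q) for q in sample))
```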
RQ3: Query Structure
RQ4: Query Similarity
As in general web search, developers tend to omit function words from their queries, which are mostly composed of content words (80.3%).
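A content-word share like the 80.3% above is a simple ratio over all query words. The sketch below computes it with a small, hypothetical stop list of English function words; real analyses typically rely on a larger curated list, so the exact set here is an assumption.

```python
# Hypothetical, deliberately small list of English function words;
# a real analysis would use a larger curated list.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "to", "in", "on", "of", "for", "with", "and", "or",
    "how", "what", "is", "are", "do", "does", "from", "by", "at", "into",
})


def content_word_ratio(queries: list[str]) -> float:
    """Percentage of all query words that are content (non-function) words."""
    words = [w for q in queries for w in q.lower().split()]
    if not words:
        return 0.0
    content = [w for w in words if w not in FUNCTION_WORDS]
    return 100 * len(content) / len(words)


if __name__ == "__main__":
    print(content_word_ratio(["python email parser", "how to sort a list"]))
```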
Most developers’ queries are similar to each other: while 40% have no similar counterpart, 60% have at least one similar peer, and 8% have 10 or more.
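The slides do not say how query similarity was measured. As one possible reading, the sketch below counts "similar peers" using Jaccard similarity over word sets with a threshold of 0.6; both the metric and the threshold are assumptions, not the study's method.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two queries."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def similar_peer_counts(queries: list[str], threshold: float = 0.6) -> list[int]:
    """For each query, count the other queries at or above the (assumed) threshold."""
    counts = [0] * len(queries)
    for i, j in combinations(range(len(queries)), 2):
        if jaccard(queries[i], queries[j]) >= threshold:
            counts[i] += 1
            counts[j] += 1
    return counts


if __name__ == "__main__":
    sample = ["python email parser", "email parser python",
              "email parser", "java read file"]
    peers = similar_peer_counts(sample)
    print(peers, sum(p > 0 for p in peers) / len(peers))
```

The pairwise loop is quadratic, which is fine for a sketch; a corpus of 1.3M queries would need an index (e.g., grouping by shared words) to keep the comparison tractable.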
RQ5: Result Resources (via Search API)
Where does Google find software resources?
Google finds software resources mostly on Stack Overflow (11%), YouTube (6%), and W3Schools (5%). However, the results may vary with the query type (keyword or general).
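Shares such as "Stack Overflow (11%)" amount to counting result links per domain. The sketch below assumes each query's results have already been retrieved as a ranked list of URLs (the retrieval step through a search API is not shown); the URLs in the example are made up. The `top_k` parameter lets the same count be taken over the top 1 or the full top 10.

```python
from collections import Counter
from urllib.parse import urlparse


def domain(url: str) -> str:
    """Host name of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_shares(results: list[list[str]], top_k: int = 10) -> dict[str, float]:
    """Percentage of result links per domain over the top-k links of each query.

    `results` holds one ranked list of result URLs per query.
    """
    links = [domain(url) for ranked in results for url in ranked[:top_k]]
    if not links:
        return {}
    counts = Counter(links)
    return {d: 100 * c / len(links) for d, c in counts.most_common()}


if __name__ == "__main__":
    ranked_results = [
        ["https://stackoverflow.com/questions/1",
         "https://www.youtube.com/watch?v=x",
         "https://www.w3schools.com/python/"],
        ["https://stackoverflow.com/questions/2"],
    ]
    print(domain_shares(ranked_results))
    print(domain_shares(ranked_results, top_k=1))
```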
RQ6: Result Variation (via Search API)
How do Google results vary for similar queries?
Word swap
Query 1: python email parser
Query 2: email parser python
Word removal
Query 1: python email parser
Query 2: email parser
Synonymous word
Query 1: python email parser
Query 2: py email parser
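The three query variations can be generated mechanically. This is a minimal sketch that reproduces the slide's examples; the functions and the synonym table are hypothetical, and the study's exact transformation rules are not stated in the slides.

```python
# Hypothetical synonym table (abbreviations developers commonly type);
# the study's own synonym list is not given in the slides.
SYNONYMS = {"python": "py", "javascript": "js"}


def word_swap(query: str) -> str:
    """Move the first word to the end of the query."""
    words = query.split()
    return " ".join(words[1:] + words[:1])


def word_removal(query: str, index: int = 0) -> str:
    """Drop the word at `index` (the first word by default)."""
    words = query.split()
    return " ".join(words[:index] + words[index + 1:])


def synonymous_word(query: str) -> str:
    """Replace each word that has an entry in the synonym table."""
    return " ".join(SYNONYMS.get(w, w) for w in query.split())


if __name__ == "__main__":
    q = "python email parser"
    for variant in (word_swap, word_removal, synonymous_word):
        print(f"{variant.__name__}: {q!r} -> {variant(q)!r}")
```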
The links and order of the top 10 Google search results are very likely to change between similar queries, whereas the top 1 result is much less affected. Overall, however, the intersection of links between similar queries is high: at least 70% in most cases.
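The RQ6 finding compares two ranked result lists at three levels: top 1, full top-10 order, and link intersection. A minimal sketch of those comparisons follows. It assumes the intersection is measured as the share of the first query's top-k links that also appear in the second query's top-k, which is one plausible definition rather than the paper's confirmed one.

```python
from typing import NamedTuple


class Comparison(NamedTuple):
    same_top1: bool          # both queries return the same first link
    same_order: bool         # identical top-k links in identical order
    intersection_pct: float  # share of query 1's top-k links also in query 2's top-k


def compare_results(a: list[str], b: list[str], k: int = 10) -> Comparison:
    """Compare the ranked results of two similar queries (assumed metrics)."""
    top_a, top_b = a[:k], b[:k]
    same_top1 = bool(top_a) and bool(top_b) and top_a[0] == top_b[0]
    links_a = set(top_a)
    overlap = len(links_a & set(top_b))
    pct = 100 * overlap / len(links_a) if links_a else 0.0
    return Comparison(same_top1, top_a == top_b, pct)


if __name__ == "__main__":
    original = ["u1", "u2", "u3", "u4"]  # placeholder result links
    variant = ["u1", "u3", "u2", "u5"]
    print(compare_results(original, variant))
```

In this toy example the top 1 is unchanged while the order differs and three of four links overlap, the same pattern the finding describes.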
Takeaways
1. Developers’ queries are likely to include key contexts (e.g., technologies)
2. Developers’ queries share characteristics with general ones: both are short and tend to omit function words
3. Google finds software resources mostly on Stack Overflow (11%), with an over-concentration in the top 1 results (28%)
4. YouTube is a prominent source for Google (mostly in the top 3 results of general queries)
5. Performing minor changes to queries does not broadly affect the top 1 search results or the overall top 10 (however, there are exceptions!)