Supporting software documentation with source code summarization

THEEMPERORRAFAT 28 views 42 slides Sep 04, 2024


Source Code Summarization
Ra'Fat Al-Msie'deen*, Anas H. Blasi
*Department of Software Engineering, Faculty of IT,
Mutah University, Mutah 61710, Karak, Jordan
E-mail address: [email protected]
https://rafat66.github.io/Al-Msie-Deen/

➢ To cite this version:
R. Al-Msie'deen and A. Blasi, "Supporting software documentation with
source code summarization," International Journal of Advanced and
Applied Sciences, vol. 6, no. 1, pp. 59–67, 2019.
DOI: 10.21833/ijaas.2019.01.008

Ra'Fat Al-Msie'deen
Suncode

Supporting software
documentation with
source code
summarization

Abstract
Source code summarization is the
process of generating summaries that
describe software code. Most source code
summaries are still produced manually,
written by software developers.
Recently, automated approaches have
become more useful and have been found
to be effective in some cases. The main
weaknesses of these approaches are that
they never exploit code dependencies and
that they summarize either the software
classes or the methods, but not both.
This paper proposes a source code
summarization approach (Suncode) that
produces a short description for each class
and method in the software system.

Abstract ... cont.
To validate the approach, it has been
applied to several case studies. Moreover,
the generated summaries are compared
to summaries written by human experts
and to summaries produced by a
state-of-the-art solution. The results
show that Suncode summaries provide
better information about code
dependencies than those of other
studies. In addition, Suncode summaries
can improve and support the current
software documentation. The results also
show that manually written summaries
were more precise and shorter.

Suncode
Input: Software source code
1. Extracting the software source code
2. Identifying rapid summary messages for each class and method
3. Generating a readable English description for each class and method
Output: Text summaries for every class and method in the software
"The name of this class/method is ---." (English readable sentences)

Automatic Source Code
Summarization: Suncode
Keywords:
Software engineering, Software
documentation, Source code
summarization, Software comprehension,
Summary.

Suncode
•Suncode was developed to document
legacy software systems. It produces
a summary for each class and method
in the software. The summary of a
class or method contains textual and
structural information based on
its code.

Suncode …
•Suncode uses static code analysis to
parse the software code.
•Suncode takes the software code as input
and produces a summary for each
class and method in the software.

Suncode …
Al-Msie'deen's approach relies on generating rapid summary messages for
each class and method in the software.
Each rapid summary message is based on a predefined template.
The template is a natural language sentence describing a particular kind of
rapid summary message.
Each template is filled in with keywords selected from the extracted code.
The Suncode approach constructs summaries by combining the rapid summary
messages.
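The template mechanism described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the `TEMPLATES` table, the `fill` and `combine` helpers, and the placeholder names are assumptions, and the sentence wording is borrowed from the sample summaries shown later in this deck.

```python
from string import Template

# Hypothetical template table: one natural language sentence per kind of
# rapid summary message, with placeholders for keywords taken from the code.
TEMPLATES = {
    "name": Template("The class name is $name."),
    "access_level": Template("The access level of the class is $access."),
}


def fill(kind, **keywords):
    """Fill one rapid summary message template with extracted keywords."""
    return TEMPLATES[kind].substitute(**keywords)


def combine(messages):
    """Build a summary by joining the filled rapid summary messages."""
    return " ".join(messages)


summary = combine([
    fill("name", name="MyOval"),
    fill("access_level", access="public"),
])
print(summary)
```

Keeping the templates in a table separates the fixed sentence wording from the keywords, so a new kind of message only needs a new entry.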

Approach
overview

Source code
summarization process
1.Extracting the software source code
2.Identifying rapid summary messages
for each class and method
3.Generating a readable English
description for each class and method

Figure 2: XML as a format of expression of object-oriented source code.
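Since the figure itself is not reproduced in this transcript, the following sketch only illustrates the idea of reading class facts from an XML representation of the code. The element and attribute names (`package`, `class`, `attribute`, `method`, `access`, `superclass`) are hypothetical and are not taken from the Suncode XML format; the class data mirrors the MyOval example used later in the deck.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real Suncode schema may differ.
XML_SOURCE = """
<package name="Drawing.Shapes.coreElements">
  <class name="MyOval" access="public" superclass="MyShape">
    <attribute name="example"/>
    <method name="MyOval"/>
    <method name="draw"/>
  </class>
</package>
"""


def extract_class_facts(xml_text):
    """Return one dictionary of facts per class found in the XML text."""
    root = ET.fromstring(xml_text.strip())
    facts = []
    # iter() also yields the root element itself when its tag matches.
    for package in root.iter("package"):
        for cls in package.findall("class"):
            facts.append({
                "package": package.get("name"),
                "name": cls.get("name"),
                "access": cls.get("access"),
                "superclass": cls.get("superclass"),
                "attributes": [a.get("name") for a in cls.findall("attribute")],
                "methods": [m.get("name") for m in cls.findall("method")],
            })
    return facts


class_facts = extract_class_facts(XML_SOURCE)
print(class_facts[0]["name"], class_facts[0]["methods"])
```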

Generate a class summary

Identifying rapid summary messages
for each class and method
•Based on the XML file extracted in the
previous step, Suncode produces different
types of messages that represent information
about a class's or method's context.
•These messages are called rapid summary
messages.
•Suncode creates six different types of rapid
summary messages that represent
information about a class's context.
•These rapid summary messages are briefly
described in Table 1.

Table 1: Rapid
summary
messages that
Suncode creates
for class's context.

Fig. 3: Rapid summary messages that Suncode creates for
class's context

Table 3: An example of rapid summary messages
produced by Suncode for class's context (e.g.,
myoval class).

Generate a method summary

Table 2: Rapid summary messages that Suncode creates
for method's context.

Fig. 4: Rapid summary messages that Suncode creates for
method's context

Table 4: An example of rapid summary
messages generated by Suncode for
method's context (e.g., main method).

Figure 5: An example of a summary
generated by Suncode approach (e.g.,
draw method).

Experimentation
•To validate Suncode, the experiments were
conducted on two Java open-source software
systems, NanoXML and ArgoUML:
1. NanoXML: http://nanoxml.sourceforge.net/orig/index.html
2. ArgoUML: http://argouml-downloads.tigris.org/argouml-0.28.1/

Table 5: The extracted summary from the
getResult method in NanoXML software.
NanoXML Javadocs: http://nanoxml.sourceforge.net/orig/NanoXML-2-JavaDoc/index.html

Table 6: The mined summary from the read method in NanoXML
software.
McBurney, P. W. and McMillan, C. (2016a). Automatic source code summarization of
context for java methods. IEEE Trans. Software Eng., 42(2):103–119.

Table 7: The extracted summary from the
ArgoStatusEvent class of ArgoUML software.
ArgoUML Javadocs: http://argouml-stats.tigris.org/nonav/javadocs/javadocs-0.28/

Results - Samples

Generate a class summary:
Declared package message
Name summary message
Inheritance message
Access level message
Attribute message
Method message

Generate a method summary:
Name summary message
Access level message
Return data type message
Declared class message
Parameters number message
Parameters message
Local variable message
Attribute access message
Method invocation message

The class name is MyOval. The access level of the class is public. The declared package is
Drawing.Shapes.coreElements. This class inherits from the MyShape class. This class contains the following
attribute: example. This class contains the following methods: MyOval and draw.
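The way the six class-level messages add up to the MyOval paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `class_summary`, `join_names`, and `noun_form` helpers and the dictionary keys are hypothetical, and the rules for singular/plural wording and for joining more than two names are guesses based on the sample summaries in this deck.

```python
def join_names(names):
    """Join names as 'a', 'a and b', or 'a, b and c' (assumed convention)."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def noun_form(noun, names):
    """Use the singular noun for one name and the plural otherwise."""
    return noun if len(names) == 1 else noun + "s"


def class_summary(facts):
    """Aggregate the class-level rapid summary messages into one paragraph."""
    messages = [
        f"The class name is {facts['name']}.",
        f"The access level of the class is {facts['access']}.",
        f"The declared package is {facts['package']}.",
    ]
    if facts.get("superclass"):
        messages.append(
            f"This class inherits from the {facts['superclass']} class.")
    for noun in ("attribute", "method"):
        names = facts.get(noun + "s", [])
        if names:
            messages.append(
                f"This class contains the following "
                f"{noun_form(noun, names)}: {join_names(names)}.")
    return " ".join(messages)


my_oval = {
    "name": "MyOval",
    "access": "public",
    "package": "Drawing.Shapes.coreElements",
    "superclass": "MyShape",
    "attributes": ["example"],
    "methods": ["MyOval", "draw"],
}
print(class_summary(my_oval))
```

Messages whose facts are missing (for example, a class without attributes) are skipped here; whether Suncode omits them or emits a fixed sentence is not stated in the slides.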

The method name is draw. The access level of the method is public. The method return data type is void. The method
is declared in the MyLine class. This method contains 1 parameter. The method signature consists of the following
parameter: [Graphics-g]. This method contains the following local variable: [painterPaintJPanel-PaintJPanel]. This
method accesses the following attribute: [g, which is declared in the MyShape class]. This method invokes the
following methods: [setColor, which is declared in the Graphics class], [getColor, which is declared in the MyShape
class], [drawLine, which is declared in the Graphics class], [getX1, which is declared in the MyShape class], [getY1,
which is declared in the MyShape class], [getX2, which is declared in the MyShape class] and [getY2, which is
declared in the MyShape class].
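The method invocation message is where the code dependencies appear in the summary: each invoked method is paired with the class that declares it. A minimal sketch of how such a message could be assembled is shown below. The `invocation_message` function is hypothetical, and the singular wording and the empty-result case are assumptions; only the bracketed item format and the plural sentence follow the draw example above.

```python
def invocation_message(calls):
    """Build the method invocation message.

    `calls` is a list of (method_name, declaring_class) pairs.
    """
    items = [
        f"[{name}, which is declared in the {owner} class]"
        for name, owner in calls
    ]
    if not items:
        return ""  # assumed: no message when nothing is invoked
    if len(items) == 1:
        # Singular wording is an assumption; the deck shows only the plural.
        return f"This method invokes the following method: {items[0]}."
    listed = ", ".join(items[:-1]) + " and " + items[-1]
    return f"This method invokes the following methods: {listed}."


calls = [
    ("setColor", "Graphics"),
    ("getColor", "MyShape"),
    ("drawLine", "Graphics"),
]
print(invocation_message(calls))
```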

Conclusion
•This paper has presented a new approach for
automatically generating summaries of software classes
and methods. The Suncode approach differs from other
approaches in that it summarizes both the software
classes and the methods. Moreover, Suncode exploits
textual and structural information to summarize
software classes and methods.
•The only input of the approach is the software code, and
the output is a set of English paragraphs describing
software classes and methods. Suncode uses rapid
summary messages to identify the most important
information in the class and method context. Then,
Suncode aggregates the messages to create an English
paragraph for this context. The authors have
implemented Suncode and evaluated its generated
results on three case studies.

Results
Results showed that all summaries
were identified. The authors have
compared the summaries produced
by Suncode to summaries written
by human specialists (e.g., Javadocs)
and to summaries produced by a
state-of-the-art approach. The authors
have found that Suncode provided better
contextual information than the manually
written and the state-of-the-art
summaries. Furthermore, the current
software documentation can be improved
by combining Suncode summaries with
the manually written summaries or the
state-of-the-art summaries. In contrast,
manually written summaries were more
precise and shorter.

Future directions
For future work, the authors plan
to extend Suncode to split the
names of software identifiers
(e.g., package, class, attribute
and method) into words by using
the camel-case splitting algorithm.
They also plan to use the class
and method comments in the
summarization process.
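As an illustration of the planned identifier splitting, a common regular-expression heuristic for camel-case names is sketched below. It is not necessarily the algorithm the authors have in mind: the `split_identifier` function, the pattern, and the choice to also split on `.`, `_`, and `$` are assumptions. The sample identifiers are taken from this deck.

```python
import re

# One word is: an acronym followed by a capitalized word (e.g. "XML" in
# "XMLParser"), a capitalized or lowercase word, a trailing acronym,
# or a run of digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(identifier):
    """Split a package, class, attribute, or method name into words."""
    words = []
    # Package names use dots; underscores and dollar signs are also
    # treated as separators here (an assumption).
    for part in re.split(r"[._$]+", identifier):
        words.extend(_WORD.findall(part))
    return words


print(split_identifier("getResult"))
print(split_identifier("ArgoStatusEvent"))
print(split_identifier("Drawing.Shapes.coreElements"))
```

Splitting identifiers this way would let the summaries use ordinary words such as "get" and "Result" instead of the raw identifier.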

References

•Dave, N., Davis, D., Potts, K., and Asuncion, H. U. (2014). Uncovering file
relationships using association mining and topic modeling. In the Sixth
International Conference on Information, Process, and Knowledge
Management, pages 105–111. IARIA.
•Forward, A. and Lethbridge, T. (2002). The relevance of software
documentation, tools and technologies: a survey. In Proceedings of the 2002
ACM Symposium on Document Engineering, McLean, Virginia, USA, November
8-9, 2002, pages 26–33. ACM.
•Haiduc, S., Aponte, J., and Marcus, A. (2010a). Supporting program
comprehension with source code summarization. In Proceedings of the 32nd
ACM/IEEE International Conference on Software Engineering - Volume 2, ICSE
2010, Cape Town, South Africa, 1-8 May 2010, pages 223–226. ACM.
•Haiduc, S., Aponte, J., Moreno, L., and Marcus, A. (2010b). On the use of
automated text summarization techniques for summarizing source code. In
17th Working Conference on Reverse Engineering, WCRE 2010, 13-16 October
2010, Beverly, MA, USA, pages 35–44. IEEE Computer Society.
•Kanellopoulos, Y., Dimopulos, T., Tjortjis, C., and Makris, C. (2006). Mining
source code elements for comprehending object-oriented systems and
evaluating their maintainability. SIGKDD Explorations, 8(1):33–40.

•Ko, A. J., Myers, B. A., Coblenz, M. J., and Aung, H. H. (2006). An exploratory study of how
developers seek, relate, and collect relevant information during software maintenance tasks.
IEEE Trans. Software Eng., 32(12):971–987.
•LaToza, T. D., Venolia, G., and DeLine, R. (2006). Maintaining mental models: a study of
developer work habits. In 28th International Conference on Software Engineering (ICSE 2006),
Shanghai, China, May 20-28, 2006, pages 492–501. ACM.
•McBurney, P. W. and McMillan, C. (2014). Automatic documentation generation via source code
summarization of method context. In 22nd International Conference on Program
Comprehension, ICPC 2014, Hyderabad, India, June 2-3, 2014, pages 279–290. ACM.
•McBurney, P. W. and McMillan, C. (2016a). Automatic source code summarization of context for
java methods. IEEE Trans. Software Eng., 42(2):103–119.
•McBurney, P. W. and McMillan, C. (2016b). An empirical study of the textual similarity between
source code and source code summaries. Empirical Software Engineering, 21(1):17–42.
•Moreno, L., Aponte, J., Sridhara, G., Marcus, A., Pollock, L. L., and Vijay-Shanker, K. (2013a).
Automatic generation of natural language summaries for java classes. In IEEE 21st
International Conference on Program Comprehension, ICPC 2013, San Francisco, CA, USA, 20-
21 May, 2013, pages 23–32. IEEE Computer Society.

•Moreno, L., Marcus, A., Pollock, L. L., and Vijay-Shanker, K. (2013b). Jsummarizer: An automatic
generator of natural language summaries for java classes. In IEEE 21st International
Conference on Program Comprehension, ICPC 2013, San Francisco, CA, USA, 20-21 May, 2013,
pages 230–232. IEEE Computer Society.
•Radev, D. R., Hovy, E. H., and McKeown, K. R. (2002). Introduction to the special issue on
summarization. Computational Linguistics, 28(4):399–408.
•Roehm, T., Tiarks, R., Koschke, R., and Maalej, W. (2014). How do professional developers
comprehend software? In Software Engineering 2014, Fachtagung des GI-Fachbereichs
Software Technik, 25. February - 28. February 2014, Kiel, Deutschland, page 47. GI.
•Shi, L., Zhong, H., Xie, T., and Li, M. (2011). An empirical study on evolution of API
documentation. In Fundamental Approaches to Software Engineering - 14th International
Conference, FASE 2011, Held as Part of the Joint European Conferences on Theory and Practice
of Software, ETAPS 2011, Saarbrücken, Germany, March 26-April 3, 2011. Proceedings, pages
416–431. Springer.
•Sridhara, G., Hill, E., Muppaneni, D., Pollock, L. L., and Vijay-Shanker, K. (2010). Towards
automatically generating summary comments for java methods. In ASE 2010, 25th IEEE/ACM
International Conference on Automated Software Engineering, Antwerp, Belgium, September
20-24, 2010, pages 43–52. ACM.
•Yau, S. S. and Collofello, J. S. (1980). Some stability measures for software maintenance. IEEE
Trans. Software Eng., 6(6):545–552.

Supporting Software Documentation with Source Code
Summarization
Suncode
