A Study of Variable-Role-based Feature Enrichment in Neural Models of Code


About This Presentation

Understanding variable roles in code has been found to be helpful by students
in learning programming -- could variable roles help deep neural models in
performing coding tasks? We do an exploratory study.

- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering.


Slide Content

A Study of Variable-Role-based
Feature Enrichment in Neural Models of Code
Aftab Hussain -- Md Rafiqul Islam Rabin -- Bowen Xu -- David Lo -- Mohammad Amin Alipour
University of Houston
Singapore Management University
InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering,
co-located with the 45th International Conference on Software Engineering, ICSE 2023
Melbourne, Australia

Deep neural models are widely used in software development
(e.g., in Visual Studio, Visual Studio Code, Neovim).

We need to investigate approaches that can improve the performance of
deep neural models under more sustainable settings.

Feature Enrichment

Add extra information to the training data,
e.g., indicate data dependence between variables.

So what information exactly
can we add?

Related Works
Feature Enrichment

Allamanis et al. [26] showed that adding features capturing global context can
increase the performance of a model.
Each feature is a binary function based on the context tokens.

Interesting features in code could help the performance of CI models.

Related Works
Feature Enrichment

Rabin et al. [27] found that code complexity features can improve
classification performance.

Recent studies have shown that state-of-the-art models heavily rely on
variables [13, 28], specific tokens [29], and even structures [30].

Related Works
Changing Code Representation (Code Modelling)

NLP models to capture textual patterns in code [20, 21].

Augment code with data-flow graph information (GraphCodeBERT) [24].

Dynamic embeddings improved RNN performance for code completion and
bug fixing [25].

We try an idea from the CS education domain, avoiding the need
to change the code representation.

Variable Roles

- Sajaniemi et al. classified variables in programs into role categories
based on how they are used.

- They applied the role concepts in tasks for teaching programming.

- Roles helped students understand programming concepts and apply
programming steps in writing programs.

Roles Examples

Stepper: a numeric for-loop variable that iterates through a range of
values via arithmetic-operation-based updates.

for (int i = 0; i < 5; i++) {
    ...
}
Roles Examples

ArrayList<String> list = new
ArrayList<String>();
list.add("A");
list.add("B");
...
// Iterator to traverse the list
Iterator iter = list.iterator();
while iter.hasNext() {...}
for (String elem: Elements){
...
}





Walker Type I
An enhanced
for-loop variable
Walker Type II
A container
iterator object
Roles Examples

Can adding variable roles in training
neural models of code improve their
performance and robustness, and thus reduce
the effort needed for training?

We present an unsupervised feature enrichment approach
based on variable roles.

We evaluate its impact on
the performance and robustness of Code2Seq.

Adding Roles Information

Before:
for (int i = 0; i < 5; i++) {
    ...
}

After:
for (int stepper_i = 0; stepper_i < 5; stepper_i++) {
    ...
}
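The renaming step above can be sketched as a simple rewrite pass. This is an illustrative sketch, not the authors' actual tool: it assumes the roles have already been detected, and it renames via a word-boundary regex (the class name `RoleAugmenter` and its `augment` method are hypothetical).

```java
import java.util.Map;

public class RoleAugmenter {
    // Prefix each detected variable with its role name, e.g. i -> stepper_i.
    // Hypothetical sketch: a word-boundary regex rename, which assumes the
    // variable's name does not collide with other identifiers in the snippet.
    static String augment(String code, Map<String, String> roles) {
        for (Map.Entry<String, String> e : roles.entrySet()) {
            code = code.replaceAll("\\b" + e.getKey() + "\\b",
                                   e.getValue() + "_" + e.getKey());
        }
        return code;
    }

    public static void main(String[] args) {
        String src = "for (int i=0; i<5; i++) { s += i; }";
        // Prints: for (int stepper_i=0; stepper_i<5; stepper_i++) { s += stepper_i; }
        System.out.println(augment(src, Map.of("i", "stepper")));
    }
}
```

A real implementation would rename at the AST level rather than with regexes, to avoid accidental renames inside strings or comments.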

Our Feature Enrichment Approach

Input Program → Role Detector → List of Stepper and Walker Variables → Role Augmenter → Output Program

Evaluating our Approach


The Datasets
Java-Large

We augment the Java-Large dataset for the method name prediction task
(input: method body; output: method name).

Evaluating our Approach

Code2Seq-R: the best-performing Code2Seq model after 50 epochs of training
with the augmented dataset.

Code2Seq-O: the pretrained released version of Code2Seq, obtained after
52 epochs of training with the original dataset.


Generating Test Sets for Performance Evaluation

Original (JL Test) --(+ Roles)--> Original, Roles-added (JLR Test)

Filter out un-augmented methods:
Original, Filtered (JL Test (F)) --(+ Roles)--> Original, Roles-added, Filtered (JLR Test (F))

This gives four test sets:
1. JL Test: all inputs, without roles info.
2. JLR Test: all inputs, with roles info.
3. JL Test (F): inputs with steppers and walkers only, without roles info.
4. JLR Test (F): inputs with steppers and walkers only, with roles info.

We assess the overall performance of the models using these test sets.

Generating Test Sets for Robustness Evaluation

We similarly get four transformed (noise-induced) test sets by transforming
variables (noise induction):

Original (JL Test) --(Transform Variables)--> Transformed (JLT Test) --(+ Roles)--> Transformed, Roles-added (JLTR Test)

Filter out un-augmented methods:
Transformed, Filtered (JLT Test (F)) --(+ Roles)--> Transformed, Roles-added, Filtered (JLTR Test (F))

We assess the overall robustness of the models using these test sets.

Results

Results – Overall Performance

Results – Performance
Role augmentation has no distinguishable impact on Code2Seq's performance
in making method name predictions.

Results – Robustness
Role augmentation has no distinguishable impact on Code2Seq's robustness
in making method name predictions.

Discussion

- Code2Seq may already be capable of capturing the surrounding structural
context of a certain variable.

- Need to compare variables that are more reliant on code structure with
those that can be used more flexibly, e.g., fixed-value variables.

- Only two variable roles were considered. Although common, they covered
only 8% of the samples in the whole dataset (1,117,159 of ~14M methods).

Conclusion

- We investigated the impact of explicitly adding variable role information
to code datasets on the performance of Code2Seq.

- To the best of our knowledge, this is the first work to evaluate the
impact of Sajaniemi et al.'s notion of variable roles, a concept found to
help students learn programming, on neural models of code.

- This work motivates the need for a systematic framework on how to provide
models meaningful information, enabling them to learn faster and perform
better.

Thank you