Mutation testing for DSLs - The case of task-oriented chatbots

PabloGmezAbajo 102 views 54 slides Oct 19, 2024

About This Presentation

Mutation testing is a well-known technique for assessing the quality of software test suites. It involves introducing artificial faults into the source code, which generates a set of source variations called mutants. Next, we can apply the test suites to such mutants to account for how many of these...


Slide Content

https://langdevcon.org
Seville 17-19 October, 2024
Mutation testing for DSLs
The case of task-oriented chatbots
Pablo Gómez-Abajo
Modelling & Software Engineering Research Group
Universidad Autónoma de Madrid

Introduction
▪DSLs are increasingly used to solve problems in specific domains
▪Like any other programming language, DSLs need to be tested
▪Usually by creating and using test suites
▪Mutation testing (MuT) is a common technique for improving the quality of such test suites

What is mutation testing?
▪A software testing approach to assess the quality of test suites
▪Injects syntactic changes into a program by using a set of mutation operators
▪The mutations introduced emulate common programming faults
▪Useful to improve the quality of both the test suites and the set of mutation operators
[Figure: the original program is turned into mutants; running the test suite splits them into killed mutants and alive mutants, which gives the mutation score; alive mutants call for additional test cases]
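The mutation score in the figure can be sketched as follows. This is a minimal illustration, assuming the usual definition (killed mutants divided by non-equivalent mutants); the `MutantResult` type and the sample data are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class MutantResult:
    name: str
    killed: bool              # True if at least one test case fails on the mutant
    equivalent: bool = False  # True if the mutant behaves exactly like the original


def mutation_score(results):
    """Killed mutants divided by non-equivalent mutants, in [0, 1]."""
    relevant = [r for r in results if not r.equivalent]
    if not relevant:
        return 0.0
    return sum(r.killed for r in relevant) / len(relevant)


# Hypothetical outcome of running a test suite against four mutants.
results = [
    MutantResult("m1", killed=True),
    MutantResult("m2", killed=False),
    MutantResult("m3", killed=True),
    MutantResult("m4", killed=False, equivalent=True),
]
score = mutation_score(results)
alive = [r.name for r in results if not r.killed and not r.equivalent]
print(f"mutation score: {score:.2f}, alive mutants: {alive}")
```

The alive mutants are the ones that point at missing test cases, which is why the sketch reports them next to the score.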

Mutation testing for automata
[Figure sequence: a seed model (an automaton over the symbols 0 and 1) and a mutant model derived from it are run against the same test suite]
▪With the test suite {01, 00}, the mutant gives the same results as the seed model: the mutant is alive
▪After adding the test case 10, the results differ: the mutant is killed
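The alive/killed distinction for automata can be reproduced with a small sketch. The automaton below is hypothetical (the one drawn on the slides is not recoverable from the text): the seed accepts binary words ending in 0, and the mutant is obtained by deleting one transition.

```python
def accepts(dfa, word):
    """Run a deterministic automaton; a missing transition rejects the word."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"].get((state, symbol))
        if state is None:
            return False
    return state in dfa["accepting"]


# Hypothetical seed model: accepts words over {0, 1} that end in 0.
seed = {
    "start": "q0",
    "accepting": {"q1"},
    "delta": {
        ("q0", "0"): "q1", ("q0", "1"): "q0",
        ("q1", "0"): "q1", ("q1", "1"): "q0",
    },
}

# Mutant model: the transition (q0, 1) is deleted.
mutant = dict(seed, delta={k: v for k, v in seed["delta"].items() if k != ("q0", "1")})


def is_killed(seed, mutant, test_suite):
    """A mutant is killed when some test case gives a different result on it."""
    return any(accepts(seed, w) != accepts(mutant, w) for w in test_suite)


print(is_killed(seed, mutant, ["01", "00"]))        # alive: no test tells them apart
print(is_killed(seed, mutant, ["01", "00", "10"]))  # killed by the added test case
```

Here the seed model acts as the oracle, so a test case kills the mutant as soon as the two models disagree on it.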

Motivation
▪However, the existing MuT tools are
▪Specific for a language
▪Encoded by hand
▪They incur high maintenance costs
▪To alleviate such inconveniences, we propose Wodel-Test
▪A model-based solution to engineer language-specific MuT tools

Wodel-Test
▪A model-based solution to engineer mutation testing tools
▪MuT tools for automata, logic circuits, Java, ATL, chatbots, etc.
[Figure: Wodel-Test architecture. The MuT tool creator writes a MuT tool specification made of language support (meta-model, M2T and T2M transformations), mutation support (mutation operators in WODEL), execution support (program compilation, test execution) and equivalence criteria. Wodel-Test generates the MuT tool from this specification. The tester gives the program under test and the test cases as input to the MuT tool and obtains a MuT report]

MuT tool for chatbots
▪We have used Wodel-Test to engineer a MuT tool for task-oriented chatbots
▪The solution uses the intent-based chatbot meta-model created by S. Pérez-Soler et al. [1]
[1] S. Pérez-Soler, E. Guerra, and J. de Lara. Model-driven chatbot development. In ER, volume 12400 of LNCS, pages 207–222. Springer, 2020

What is a task-oriented chatbot?
▪A task-oriented chatbot is a software application that is used through natural language and designed to solve a specific task
▪e.g., booking a ticket, ordering a pizza, setting a medical appointment
▪Via text or speech recognition
▪In recent years, the use of chatbots has increased
▪Since 2022, we also have open-domain chatbots (ChatGPT, etc.), which engage in conversations on any topic and which we do not cover in this work

How do chatbots work?
[Figure: the user sends a natural language (NL) phrase to the chatbot; the chatbot matches it against its intents (intent 1 … intent n), extracts the parameters, builds the response, querying an external service if needed, and sends the chatbot response back to the user]

1. The user sends a natural language message to the chatbot
Utterances (user says):
▪Hi there!
▪I need to fly from Madrid to Seville on Thursday at 8 AM
▪Good bye!

2. The chatbot tries to match the message with an intention
Intent: matches the user interaction with an intention

User says                                                   Intent
Hi there!                                                   Greet
I need to fly from Madrid to Seville on Thursday at 8 AM    Book a flight

How? By providing training phrases: a set of examples that users can use to express an intention. They must be provided with the intent and are required for matching inputs with intents.

Training phrase                                               Intent
Hi there!                                                     Greet
Hello                                                         Greet
Hi                                                            Greet
Hey                                                           Greet
Airplane ticket from Madrid to Barcelona tomorrow at 10 AM    Book a flight
Flight from Madrid to Bilbao on 19/10/2024 at 11:30           Book a flight
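How training phrases drive intent matching can be illustrated with a toy matcher. This sketch swaps in plain word overlap (Jaccard similarity) for the trained language models that real chatbot platforms use, so it only shows the principle; the function names, the threshold and the "Fallback" label are assumptions.

```python
import re

# Training phrases taken from the slides.
TRAINING = {
    "Greet": ["Hi there!", "Hello", "Hi", "Hey"],
    "Book a flight": [
        "Airplane ticket from Madrid to Barcelona tomorrow at 10 AM",
        "Flight from Madrid to Bilbao on 19/10/2024 at 11:30",
    ],
}


def tokens(text):
    """Lowercased set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two phrases, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def match_intent(utterance, training=TRAINING, threshold=0.1):
    """Return the intent whose closest training phrase is most similar."""
    best_intent, best_score = None, 0.0
    for intent, phrases in training.items():
        score = max(jaccard(utterance, p) for p in phrases)
        if score > best_score:
            best_intent, best_score = intent, score
    if best_score < threshold:
        return "Fallback", best_score
    return best_intent, best_score


for said in ["Hi there!",
             "I need to fly from Madrid to Seville on Thursday at 8 AM",
             "Good bye!"]:
    print(said, "->", match_intent(said))
```

Since the match depends entirely on the training phrases, deleting or moving one of them can change which intent is selected.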

3. The chatbot extracts information from the message or asks for missing information

User says                                                   Intent
I need to fly from Madrid to Seville on Thursday at 8 AM    Book a flight

At this point, the chatbot extracts key information from the input: parameters
from: Madrid    to: Seville    when: Thu. at 8 AM
Each parameter is typed by an entity: City (from, to) and Time (when)
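Parameter extraction with entities can be sketched as below. The City entity is modelled as a list of literals and the Time entity as a regular expression; both definitions, and the names `extract_params` and `missing_params`, are assumptions made for this illustration rather than the way any particular chatbot platform works.

```python
import re

# Hypothetical entity definitions.
CITY = r"(?:Madrid|Seville|Barcelona|Bilbao)"    # entity made of literals
TIME = r"\w+ at \d{1,2}(?::\d{2})?(?: ?[AP]M)?"  # entity made of a regular expression

REQUIRED = ("from", "to", "when")


def extract_params(utterance):
    """Extract the parameters of the 'Book a flight' intent from an utterance."""
    patterns = {
        "from": rf"\bfrom ({CITY})\b",
        "to": rf"\bto ({CITY})\b",
        "when": rf"\bon ({TIME})",
    }
    params = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, utterance)
        if match:
            params[name] = match.group(1)
    return params


def missing_params(params):
    """Required parameters the chatbot still has to ask the user for."""
    return [name for name in REQUIRED if name not in params]


full = extract_params("I need to fly from Madrid to Seville on Thursday at 8 AM")
partial = extract_params("I need to fly to Seville")
print(full, missing_params(full))
print(partial, missing_params(partial))
```

The second call shows the "asks for missing information" branch: the chatbot would prompt for every name returned by `missing_params`.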

4. Build the response and send it back to the user
●Responses to the user:
○text, images
●External service queries:
○External REST API
○Database, etc.

User says: I need to fly from Madrid to Seville on Thursday at 8 AM
Action: The price of the ticket is 120$. Provide a card number and billing name

Both responses to the user and external service queries are actions
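An action that combines an external service query with a response to the user could look like this sketch. `price_service` is a hypothetical stand-in for a REST API or database call, and the fixed price only mirrors the example on the slide.

```python
def price_service(origin, destination, when):
    """Hypothetical stand-in for an external REST API or database query."""
    return 120


def build_response(params):
    """Action for 'Book a flight': query the service, then fill the response text."""
    price = price_service(params["from"], params["to"], params["when"])
    return f"The price of the ticket is {price}$. Provide a card number and billing name"


reply = build_response({"from": "Madrid", "to": "Seville", "when": "Thursday at 8 AM"})
print(reply)
```

The response text depends on the extracted parameters, so an action that loses a parameter or swaps its outputs changes what the user sees.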

Testing chatbots
Test case input    Test case output
Hi there!          Hi! How can I help you?
[Figure: the user says "Hi there!" and the chatbot replies "Hi! How can I help you?"]

Testing chatbots
We use Botium and Rasa-test to define the test suites that test the chatbots

Single test interaction (convo file, conversation step):
#me
Hi there!
#bot
What day do you want to come in?

Combination of multiple tests (convo file, conversation step):
#me
GREET_UTTERANCES_USER
#bot
GREET_RESPONSES_USER

Multiple user utterances (utterances):
GREET_UTTERANCES_USER
Hi there!
Hi
Hello
Hey

Possible responses (responses):
GREET_RESPONSES_USER
Hi! How can I help you?
Hello, what do you need?
Greetings! This is the flight ticket assistant Antony, how can i help you?
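The way a convo file combines with utterance and response lists can be sketched with a simplified reader. This is not Botium's implementation: it only assumes that "#me" and "#bot" open a conversation step and that the first line of a phrase list is the name used to reference it.

```python
CONVO = """#me
GREET_UTTERANCES_USER

#bot
GREET_RESPONSES_USER
"""

UTTERANCES = """GREET_UTTERANCES_USER
Hi there!
Hi
Hello
Hey
"""

RESPONSES = """GREET_RESPONSES_USER
Hi! How can I help you?
Hello, what do you need?
"""


def parse_convo(text):
    """Return the conversation steps as (sender, message) pairs."""
    steps, sender, lines = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line in ("#me", "#bot"):
            if sender and lines:
                steps.append((sender, " ".join(lines)))
            sender, lines = line[1:], []
        elif line and sender:
            lines.append(line)
    if sender and lines:
        steps.append((sender, " ".join(lines)))
    return steps


def parse_phrases(text):
    """First non-empty line is the reference name, the rest are the phrases."""
    name, *phrases = [line.strip() for line in text.splitlines() if line.strip()]
    return name, phrases


def expand(steps, library):
    """One test per user phrase; a bot step lists every accepted response."""
    tests = [[]]
    for sender, message in steps:
        options = library.get(message, [message])
        if sender == "me":
            tests = [t + [("me", o)] for t in tests for o in options]
        else:
            tests = [t + [("bot", options)] for t in tests]
    return tests


library = dict([parse_phrases(UTTERANCES), parse_phrases(RESPONSES)])
tests = expand(parse_convo(CONVO), library)
print(len(tests), tests[0])
```

One convo file with references therefore stands for four concrete tests here, one per user utterance, each accepting any of the listed responses.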

Testing chatbots
[Figure: test cases range from a single exchange ("Hi there!" / "Hi! How can I help you?") to complex conversations, such as "I need to fly from …" / "The price of the ticket …" and "I lost my baggage" / "Please, provide the flight ticket id"]

Mutation testing for chatbots
[Figure sequence: a chatbot with two intents, Order a coffee and Order a wine, is mutated and then exercised with a user phrase and a test suite]

Intent: Order a coffee
User says                               Semantic similarity
What kinds of coffee are available?     0.522
What kinds of coffee can I order?       0.538
What can I drink here?                  0.475
Tell me what drinks there are           0.474
Action: You can take an espresso or an americano

Intent: Order a wine
User says
What kinds of wine are available?
What kinds of wine can I order?
What can I drink here?
Tell me what drinks there are
Action: You can take a Spanish wine or a French wine

▪In the original chatbot, the user phrase "Tell me what kinds of coffee I can drink here" matches the intent Order a coffee
▪Mutation applied to Order a coffee: keeps the two most different phrases, so only "What can I drink here?" and "Tell me what drinks there are" remain
▪The same user phrase is then sent to the mutant, and the test suite checks which intent is matched
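The mutation shown above (keeping the two most different phrases) can be sketched as a function over scored training phrases. The pairing of the four similarity values with the four coffee phrases follows their order on the slide and is an assumption, as is reading "most different" as lowest similarity and "most representative" as highest.

```python
# Similarity values from the slide, paired with the phrases by position (assumption).
COFFEE = {
    "What kinds of coffee are available?": 0.522,
    "What kinds of coffee can I order?": 0.538,
    "What can I drink here?": 0.475,
    "Tell me what drinks there are": 0.474,
}


def k2p_min(scored):
    """Keep the 2 most different phrases, i.e. the two lowest similarity scores."""
    keep = set(sorted(scored, key=scored.get)[:2])
    return [phrase for phrase in scored if phrase in keep]


def dp_max(scored):
    """Delete the most representative phrase, i.e. the highest similarity score."""
    top = max(scored, key=scored.get)
    return [phrase for phrase in scored if phrase != top]


print(k2p_min(COFFEE))
print(dp_max(COFFEE))
```

After `k2p_min`, neither remaining phrase mentions coffee and both also appear under Order a wine, which is what makes this mutant a useful probe for the test suite.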

Mutation operators for chatbots

Operators for training phrases
DPmax   Deletes the most representative phrase of an intent
DPmin   Deletes the most different phrase of an intent
DPWP    Deletes training phrases with required parameter
DPWL    Deletes training phrases with literal
K2Pmax  Keeps the 2 most representative phrases
K2Pmin  Keeps the 2 most different phrases
MPmax   Moves the most representative phrase to the most similar intent
MPmin   Moves the most different phrase to the most different intent

Operators for intents
DIP     Deletes intent parameter
DPP     Deletes parameter prompt
SPO     Sets required parameter to optional
DFI     Deletes fallback intent

Operators for entities
CRE     Changes regular expression
DLE     Deletes literal from entity

Operators for actions
DA      Deletes actions
DPR     Deletes a parameter used in a response
SO      Swaps outputs

Operators for conversation flows
DCS     Deletes conversation step
DCB     Deletes conversation bifurcation

Emulation of common errors made by chatbot developers
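Two of the intent operators, SPO and DIP, can be sketched over a toy intent model. The `Intent` and `Parameter` classes are hypothetical and far simpler than the chatbot meta-model used by the tool; the sketch also assumes that each operator yields one mutant per place where it applies.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Parameter:
    name: str
    entity: str
    required: bool = True
    prompt: Optional[str] = None


@dataclass(frozen=True)
class Intent:
    name: str
    parameters: tuple = ()


def spo(intent):
    """SPO: one mutant per required parameter, with that parameter made optional."""
    mutants = []
    for i, param in enumerate(intent.parameters):
        if param.required:
            changed = replace(param, required=False)
            params = intent.parameters[:i] + (changed,) + intent.parameters[i + 1:]
            mutants.append(replace(intent, parameters=params))
    return mutants


def dip(intent):
    """DIP: one mutant per parameter, with that parameter deleted."""
    return [
        replace(intent, parameters=intent.parameters[:i] + intent.parameters[i + 1:])
        for i in range(len(intent.parameters))
    ]


book = Intent("Book a flight", (
    Parameter("from", "City", prompt="Where do you fly from?"),
    Parameter("to", "City", prompt="Where do you fly to?"),
    Parameter("when", "Time", required=False),
))
print(len(spo(book)), len(dip(book)))
```

The seed intent is immutable, so every mutant is a separate copy and the original model stays available as the reference for the test runs.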

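Mutation testing for chatbots
[Figure: the WODEL-TEST process for chatbots]
1. parse: the Dialogflow chatbot is parsed into a chatbot model that conforms to the CONGA meta-model
2. annotate: the chatbot model is annotated using Tensorflow; the annotated chatbot model conforms to the annotation meta-model
3. mutate: the mutation operators (WODEL) are applied to obtain chatbot model mutants
4. generate: a chatbot implementation is generated from each mutant
5. test: the test suites are run against the chatbot implementations, which produces the mutation analysis report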

RQ1: How applicable are the defined mutation operators?
RQ2: How effective are the defined mutation operators?
[Figure: bar chart of the mutation score by mutation operator, split into alive and killed mutants. Bar values in chart order: 39%, 48%, 67%, 60%, 77%, 73%, 78%, 80%, 67%, 0%, 0%, 40%, 50%, 76%, 14%, 89%, 87%, 96%]
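A per-operator mutation score like the one plotted in this chart can be obtained by grouping mutant outcomes by the operator that produced them. The records below are made-up placeholders, not the results of the study.

```python
from collections import defaultdict


def score_by_operator(records):
    """Map each operator to killed / generated mutants for that operator."""
    totals = defaultdict(lambda: [0, 0])  # operator -> [killed, generated]
    for operator, killed in records:
        totals[operator][0] += int(killed)
        totals[operator][1] += 1
    return {op: killed / generated for op, (killed, generated) in totals.items()}


# Hypothetical (operator, killed?) outcomes, one entry per mutant.
records = [
    ("DPmax", True), ("DPmax", False),
    ("SPO", True), ("SPO", True), ("SPO", False), ("SPO", True),
    ("DFI", False),
]
scores = score_by_operator(records)
print(scores)
```

An operator that never appears in the records generated no mutants, which is a separate question from how many of its mutants were killed.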

RQ3: How effective is the MuT process?
[Figure: mutation score by test suite kind, split into alive and killed mutants]
Botium automatic    45%
Botium by hand      94%
Rasa test           20%

RQ4: How efficient is the MuT process?
[Figure: bar chart of times per chatbot, as percentages on an axis from 0% to 35%]
Values, in chart order: 0.1%, 0.2%, 0.3%, 1.0%, 1.2%, 1.4%, 1.6%, 1.6%, 1.7%, 2.6%, 4.9%, 8.4%, 12.8%, 27.5%, 34.7%
Chatbots, in chart order: Covid19_tracer, bikeShop, e2e-bot, Spaceonova, personal-bot, yassinelamarti, Rasa-demo, 256644, h4h-chatbot, diagrams2ai, dusbot, legal-alien-chatbot, Email-WhatsApp-Integration, lankbanfinance, Data-mining
▪The mutation testing process of 67% of the chatbots was completed in less than 90 minutes

Conclusions
▪Wodel-Test eases the engineering of MuT tools for DSLs
▪Wodel-Test is a better option when we need to
▪Access the source code of the mutants
▪Reason about which mutants reduce the mutation score and why
▪Test new mutation operators

Future work
▪Automate the detection of semantically equivalent mutants
▪e.g., in the case of chatbots, using confidence decrease heuristics
▪Automate the synthesis of tests able to kill the alive mutants
▪Optimize the MuT process → parallelize the generation of mutants
▪Chatbots: adapt our approach to LLM-based agents
