Presentation of our IEEEAccess paper, also presented at the 18CCC
Abstract: Pattern matching algorithms have been studied on numerous occasions, mainly focusing on performance because of the large amount of data used in a matching process. However, a strong focus on performance can entail particula...
Presentation of our IEEEAccess paper, also presented at the 18CCC
Abstract: Pattern matching algorithms have been studied on numerous occasions, mainly focusing on performance because of the large amount of data used in a matching process. However, a strong focus on performance can entail particular issues like the lack of flexibility to match patterns. As a consequence, programming developers need to tweak matching algorithms in contortive ways or create new specialized ones altogether if their specific needs are not supported. Inspired by the self-replication behavior of cells in biology, we explore and evaluate the design and implementation of an algorithm to flexibly match patterns, named Matcher Cells. Through the composition of simple rules applied to cells, developers can adjust the matching semantics of this algorithm to different needs. We describe this algorithm using a pure functional language as a recipe for any Turing-complete programming language and then offer an object-oriented architecture for languages like Java. To show the flexibility of our proposal, we use a concrete implementation in TypeScript to describe two applications, from different domains, that use pattern matching in a stream of tokens. Additionally, we carry out performance and developer experience empirical evaluations with undergraduate students using Matcher Cells. Finally, we discuss the pros and cons of using a biological-based algorithm, exploiting the compositions of rules, to match patterns.
Size: 5.03 MB
Language: en
Added: Sep 04, 2024
Slides: 43 pages
Slide Content
EXPLORING A SELF-REPLICATION
ALGORITHM TO FLEXIBLY MATCH PATTERNSPAUL LEGER, HIROAKI FUKUDA, NICOLAS CARDOZO, DANIEL SAN MARTIN
1 2 3 1 UNIVERSIDAD CATÓLICA DEL NORTE, COQUIMBO - CHILE
SHIBURA INSTITUTE OF TECHNOLOGY, TOKYO, JAPÓN
UNIVERSIDA DE LOS ANDES, BOGOTÁ - COLOMBIA
1
2
3 3 [email protected]/ @ncardoz
2
The pattern matching problem
Reference string:ABC ABCDAB ABCDABCDABDE
Pattern: ABCDABD
Find the occurrences of the pattern in the reference string
2
The pattern matching problem
Reference string:
Pattern:
Find the occurrences of the pattern in the reference string
ABCDABD
ABC ABCDAB ABCDABCDABDE
2
The pattern matching problem
Reference string:
Pattern:
Find the occurrences of the pattern in the reference string
ABCDABD
ABC ABCDAB ABCDABCDABDE
2
The pattern matching problem
Reference string:
Pattern:
Find the occurrences of the pattern in the reference string
ABCDABD
ABC ABCDAB ABCDABCDABDE
2
The pattern matching problem
Reference string:
Pattern:
Find the occurrences of the pattern in the reference string
ABCDABD
ABC ABCDAB ABCDABCDABDE
2
The pattern matching problem
Reference string:
Pattern:
Find the occurrences of the pattern in the reference string
ABCDABD
ABC ABCDAB ABCDABCDABDE
2
The pattern matching problem
Reference string:
Pattern:
Find the occurrences of the pattern in the reference string
ABC ABCDAB ABCDABCDABDE
ABCDABD
2
The pattern matching problem
Reference string:
Pattern:
Find the occurrences of the pattern in the reference string
ABC ABCDAB ABCDABCDABDE
ABCDABD
Optimal time complexity O(n)
2
The pattern matching problem
Reference string:
Pattern:
Find the occurrences of the pattern in the reference string
ABC ABCDAB ABCDABCDABDE
ABCDABD
Optimal time complexity O(n)
Must have complete information about the pattern and the reference string
3
Flexibility in matching algorithms
request
3
Flexibility in matching algorithms
request
malicious request
3
Flexibility in matching algorithms
request
malicious request
4
Flexibility in matching algorithms
request
malicious request
Request filter
4
Flexibility in matching algorithms
request
malicious request
Request filter
4
Flexibility in matching algorithms
request
malicious request
Request filter
4
Flexibility in matching algorithms
Request filter
4
Flexibility in matching algorithms
Request filter
5
1.The stream sequence and the
pattern may be unknown
beforehand
2.New patterns are required to
satisfy new requirements
6
Matcher Cells
biologically inspired algorithm using self-replication to flexibly match patterns
Replicase pattern
original component
Replicated component
(with possible mutations)
7
The Matcher Cells patterns
a
a
a
$
Pattern sequence definition
Received token
Parent-child link
$Matched cell
Solution (environment)
7
The Matcher Cells patterns
a
a
a
$
a
b
ab→
ab→
Pattern sequence definition
Received token
Parent-child link
$Matched cell
Solution (environment)
8
Flexible semantics
Match rules can be applied to each cell using the current token
a
a
a
$
8
Flexible semantics
Match rules can be applied to each cell using the current token
a
a
a
$
per-cell rules
8
Flexible semantics
Match rules can be applied to each cell using the current token
After all per-cell rules are applied, it is possible to keep/kill cells in a solution to
enable further matching in the same solution
a
a
a
$
per-cell rules
8
Flexible semantics
Match rules can be applied to each cell using the current token
After all per-cell rules are applied, it is possible to keep/kill cells in a solution to
enable further matching in the same solution
a
a
a
$
per-cell rules
per-solution rules
8
Flexible semantics
Match rules can be applied to each cell using the current token
After all per-cell rules are applied, it is possible to keep/kill cells in a solution to
enable further matching in the same solution
a
a
$
per-cell rules
per-solution rules
8
Flexible semantics
Match rules can be applied to each cell using the current token
After all per-cell rules are applied, it is possible to keep/kill cells in a solution to
enable further matching in the same solution
a
a
$
per-cell rules
per-solution rules
Rules can be composed by means of a composable rule that takes a rule as input
and returns another rule. This behavior is used to generate more complex
matching behavior that adapts to the needs of programers
10
Matching semantics
Single matches come from the combination of a match token and a kill creator rule
10
Matching semantics
Single matches come from the combination of a match token and a kill creator rule
a b
ab→
a b b
$
10
Matching semantics
Single matches come from the combination of a match token and a kill creator rule
a b
ab→
a b b
$
(define single−match
(kill−creator
(match−token identity )))
(single−match (Cell a->b) a)
11
Matching semantics
multiple matches allow us to generate multiple match tokens for a sequence
11
Matching semantics
multiple matches allow us to generate multiple match tokens for a sequence
a
b
ab→ ab→
a
b
ab→
b
b
b
ab→
b
$$
11
Matching semantics
multiple matches allow us to generate multiple match tokens for a sequence
a
b
ab→ ab→
a
b
ab→
b
b
b
ab→
b
$$
(define multiple−match
(match−token identity))
(multiple−match (Cell a->b) a)
12
Matching semantics
One match at a time allow us to match a pattern and to create the seed again to match a new pattern
12
Matching semantics
One match at a time allow us to match a pattern and to create the seed again to match a new pattern
a b
ab→
a b b
$
ab→
12
Matching semantics
One match at a time allow us to match a pattern and to create the seed again to match a new pattern
a b
ab→
a b b
$
ab→
(define one-match-at-a-time
(add-seed
(kill−creator
(match−token identity ))))
(one-match-at-a-time (Cell a->b) a)
13
Matching semantics
Always start a match has the seed of a match to create a new seed
13
Matching semantics
Always start a match has the seed of a match to create a new seed
a
b
ab→
a
b
b
$
ab→
ab→
b
ab→
$
13
Matching semantics
Always start a match has the seed of a match to create a new seed
a
b
ab→
a
b
b
$
ab→
ab→
b
ab→
$
(define always-start-match
(always-add
(kill−creator
(apply-reaction identity ))))
(always-start-match (Cell a->b) a)
14
Matching semantics
Match per time frame enables matches exclusively within a time frame, killing all cells once the
given time elapses
a b
ab→
14
Matching semantics
Match per time frame enables matches exclusively within a time frame, killing all cells once the
given time elapses
a b
ab→
time of first
token match
14
Matching semantics
Match per time frame enables matches exclusively within a time frame, killing all cells once the
given time elapses
a b
ab→
bb
time of first
token match
14
Matching semantics
Match per time frame enables matches exclusively within a time frame, killing all cells once the
given time elapses
a b
ab→
b
(define match-per-time-frame
(trace-life-time
(kill−creator
(match-token identity ))))
(match-per-time-frame (Cell a->b) a)
b
time of first
token match
15
Matcher Cells at work
1.
Patterns can use
Regular Expressions
2.
Customized
Semantics
100,000 tweets
1 tweet every 60s
choice of common
patterns using regex
matching type
semantics, multiple
matching
15
Matcher Cells at work
1.
Patterns can use
Regular Expressions
2.
Customized
Semantics
100,000 tweets
1 tweet every 60s
choice of common
patterns using regex
a
+b→ a
+b→
a
*b→
a
matching type
semantics, multiple
matching
16
Matcher Cells at work
16
Matcher Cells at work
Identify sequences of events in the application to adapt its difficulty
- Good performance: answering N exercises correctly in a row
- Bad performance: answering M exercises wrong in a row
- No performance: skipping K exercises in a row
17
Matcher Cells proved to be useful and applicable in
data processing scenarios where different
semantics are required
18
Application areas
Stream processing
Reactive programming
Execution trace analysis
19
Conclusion
1.Matcher Cell offers a flexible pattern
matching algorithm through the definition
of language abstractions and functional
composition
19
Conclusion
1.Matcher Cell offers a flexible pattern
matching algorithm through the definition
of language abstractions and functional
composition
2.Practical solution with implementations
across programming paradigms (!paper)
19
Conclusion
1.Matcher Cell offers a flexible pattern
matching algorithm through the definition
of language abstractions and functional
composition
2.Practical solution with implementations
across programming paradigms (!paper)
3.Usable system with comparable
performance up 500000 tokens (!paper)