Automated Intrusion Response - CDIS Spring Conference 2024

Kim Hammar, 65 slides, May 23, 2024

About This Presentation

Presentation at CDIS Spring Conference 2024.

The ubiquity and evolving nature of cyber attacks are of growing concern to industry and society. In response, the automation of security processes and functions is the focus of many current research efforts. In this talk we will present a framework for ...


Slide Content

0/40
Automated Intrusion Response
CDIS Spring Conference
Kim Hammar
[email protected]
Division of Network and Systems Engineering
KTH Royal Institute of Technology
May 22, 2024

1/40
Use Case: Intrusion Response
- A defender owns an infrastructure
  - Consists of connected components
  - Components run network services
  - The defender protects the infrastructure by monitoring and active defense
  - Has partial observability
- An attacker seeks to intrude on the infrastructure
  - Has a partial view of the infrastructure
  - Wants to compromise specific components
  - Attacks by reconnaissance, exploitation, and pivoting
[Figure: network topology with gateway, IPS alerts, clients, attacker, defender, and numbered components 1-31]

2/40
Automated Intrusion Response
Levels of security automation:
- 1980s: No automation. Manual detection. Manual prevention. Lack of tools.
- 1990s: Operator assistance. Audit logs. Manual detection. Manual prevention.
- 2000s-Now: Partial automation. Manual configuration. Intrusion detection systems. Intrusion prevention systems.
- Research: High automation. The system automatically updates itself.

Can we find effective security strategies through decision-theoretic methods?

3/40
Our Framework for Automated Intrusion Response
[Figure: the framework loop. Target infrastructure → (selective replication) → digital twin → (model creation & system identification) → simulation system → (reinforcement learning & optimization) → learned strategies → (strategy mapping & implementation) → back to the digital twin and the target infrastructure. The loop supports strategy evaluation, model estimation, automation, and self-learning systems.]

4/40
Creating a Digital Twin of the Target Infrastructure
[Figure: the configuration space (e.g., subnets 172.18.4.0/24, 172.18.19.0/24, 172.18.61.0/24) maps to digital twins]
- Given an infrastructure configuration, our framework automates the creation of a digital twin.
- The configuration space defines the set of infrastructures that we can emulate.

5/40
Example Infrastructure Configuration
- 64 nodes
- 24 OVS switches
- 3 gateways
- 6 honeypots
- 8 application servers
- 4 administration servers
- 15 compute servers
- 11 vulnerabilities
  - cve-2010-0426
  - cve-2015-3306
  - etc.
- Management
  - 1 SDN controller
  - 1 Kafka server
  - 1 elastic server
[Figure: topology with R&D zone, app servers, honeynet, DMZ, admin zone, quarantine zone, gateway IDPS, alerts, defender, attacker, and clients across 64 numbered nodes]

6/40
Emulating Physical Components
[Figure: stack of containers on top of our framework, the Docker engine, the operating system, and the physical server]
- We emulate physical components with Docker containers
- Focus on [...]
- Our framework provides the [...]

7/40
Emulating Network Connectivity
[Figure: management nodes, each running an emulated IT infrastructure, connected by VXLAN tunnels over an IP network]
- We emulate network connectivity on the same host using network namespaces
- Connectivity across physical hosts is achieved using VXLAN tunnels with Docker swarm

8/40
Emulating Network Conditions
- Traffic shaping using NetEm
- Allows configuring:
  - delay
  - capacity
  - packet loss
  - jitter
  - queueing delays
  - etc.
[Figure: path of a packet from the application processes through the OS TCP/IP stack, the queueing discipline (NetEm config: latency, jitter, etc.), and the FIFO device-driver queue to the NIC]
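As a concrete illustration of the traffic-shaping step, the sketch below attaches a NetEm queueing discipline with tc from Python. It is a minimal sketch, not the framework's actual API: the interface name eth0 and all parameter values are illustrative, and it assumes a Linux host with iproute2 and sufficient privileges.

```python
import subprocess

def apply_netem(interface: str, delay_ms: int, jitter_ms: int,
                loss_pct: float, rate_mbit: int) -> None:
    """Attach a NetEm queueing discipline that shapes egress traffic."""
    cmd = [
        "tc", "qdisc", "replace", "dev", interface, "root", "netem",
        "delay", f"{delay_ms}ms", f"{jitter_ms}ms",  # mean delay and jitter
        "loss", f"{loss_pct}%",                      # random packet loss
        "rate", f"{rate_mbit}mbit",                  # capacity limit
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Illustrative values; real experiments would set these per emulated link.
    apply_netem("eth0", delay_ms=50, jitter_ms=10, loss_pct=0.5, rate_mbit=100)
```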

9/40
Emulating Clients
[Figure: client population with arrival rate, service times, and departures; workflows w_1, ..., w_|W| modeled as Markov processes]
- Homogeneous client population
- Client arrivals follow a Poisson process, Po(λ)
- Client service times are exponentially distributed, Exp(μ)
- Service invocations $(S_t)_{t=1,2,\ldots}$ follow Markov chains
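To make the client model concrete, here is a minimal simulation sketch of the arrival process described above: Poisson arrivals with rate lam and exponentially distributed service times with rate mu. The rates and the horizon are illustrative assumptions, not parameters of the presented system.

```python
import random

def simulate_clients(lam: float, mu: float, horizon: float, seed: int = 0):
    """Sample (arrival, departure) times for a homogeneous client population."""
    rng = random.Random(seed)
    t, sessions = 0.0, []
    while True:
        t += rng.expovariate(lam)            # Poisson process: Exp(lam) gaps
        if t > horizon:
            break
        departure = t + rng.expovariate(mu)  # Exp(mu) service time
        sessions.append((t, departure))
    return sessions

sessions = simulate_clients(lam=2.0, mu=0.5, horizon=100.0)
print(f"{len(sessions)} client sessions; first: {sessions[0]}")
```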

10/40
Emulating The Attacker and The Defender
- API for automated defender and attacker actions
- Attacker actions:
  - exploits
  - reconnaissance
  - pivoting
  - etc.
- Defender actions:
  - shut down
  - redirect
  - isolate
  - recover
  - migrate
  - etc.
[Figure: the digital twin of the IT infrastructure (virtual network, virtual devices, emulated services, emulated actors) exchanges configuration, change events, and system traces with a Markov decision process; optimized security policies are verified on the twin]

11/40
Software framework
[Figure: architecture with metastore, Python libraries, management API (REST API, CLI, gRPC), and digital twins]
More details about the software framework:
- Source code: https://github.com/Limmen/csle
- Documentation: http://limmen.dev/csle/
- Demo: https://www.youtube.com/watch?v=iE2KPmtIs2A
- Installation: https://www.youtube.com/watch?v=l_g3sRJwwhc

12/40
System Identification
[Figure: the framework loop again, with the system identification step highlighted: target infrastructure → selective replication → digital twin → model creation & system identification → simulation system → reinforcement learning & generalization → strategy mapping & implementation]

13/40
System Model
[Figure: state machine with states Healthy, Compromised, and Crashed; model-complexity axis from a static attacker with a small set of responses, to a dynamic attacker with a small set of responses, to a dynamic attacker with a large set of responses]
- Intrusion response can be modeled in many ways:
  - as a parametric optimization problem
  - as an optimal stopping problem
  - as a dynamic program
  - as a game
  - etc.

14/40
Related Work on Learning Automated Intrusion Response
[Figure: related work positioned by external validity, model complexity, and goal]
- Georgia et al. 2000 (Next generation intrusion detection: reinforcement learning)
- Xu et al. 2005 (An RL approach to host-based intrusion detection)
- Servin et al. 2008 (Multi-agent RL for intrusion detection)
- Malialis et al. 2013 (Decentralized RL response to DDoS attacks)
- Zhu et al. 2019 (Adaptive honeypot engagement)
- Apruzzese et al. 2020 (Deep RL to evade botnets)
- Xiao et al. 2021 (RL approach to APT)
- etc. 2022-2023
- Our work 2020-2023

15/40
Intrusion Response through Optimal Stopping
- Suppose the defender only has one response action, e.g., block the gateway
- Formulate intrusion response as optimal stopping
[Figure: episode timeline from t=1 to t=T; the intrusion starts at some event time; stopping times before it are early, stopping times after it affect the intrusion]

16/40
Intrusion Response from the Defender's Perspective
[Figure: the defender's view over time: belief b_t, number of IDS alerts, and number of logins for time-steps 20-200]
When to take a defensive action?

17/40
The Defender's Optimal Stopping Problem (1/3)
- The infrastructure is a discrete-time dynamical system $(s_t)_{t=1}^{T}$
- The defender observes a noisy observation process $(o_t)_{t=1}^{T}$
- Two options at each time $t$: (C)ontinue and (S)top
- Find the optimal stopping time $\tau^*$:

$\tau^* \in \arg\max_{\tau} \mathbb{E}\left[\sum_{t=1}^{\tau-1} \gamma^{t-1} R^{C}_{s_t s_{t+1}} + \gamma^{\tau-1} R^{S}_{s_\tau s_\tau}\right]$

where $R^{S}$ and $R^{C}$ denote the stop and continue rewards, and the stopping time is $\tau = \inf\{t : t > 0, a_t = S\}$
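To illustrate the objective above, the sketch below estimates $\mathbb{E}[J]$ by Monte Carlo for fixed stopping times on a toy two-state system (H = healthy, C = compromised). The dynamics, rewards, and the fixed-time stopping rule are illustrative assumptions; in the talk the stopping time is chosen adaptively from observations.

```python
import random

def episode_return(rng, stop_at, gamma=0.99, p=0.1, T=100):
    """Discounted return of one episode under a fixed stopping time."""
    s, ret = "H", 0.0
    for t in range(1, T + 1):
        if t == stop_at:  # a_t = S: collect the stop reward and end
            ret += gamma ** (t - 1) * (10.0 if s == "C" else -5.0)  # R^S
            break
        ret += gamma ** (t - 1) * (0.5 if s == "H" else -2.0)  # R^C (simplified
        if s == "H" and rng.random() < p:                      # to depend on s_t)
            s = "C"  # intrusion starts
    return ret

rng = random.Random(0)
for stop in (5, 20, 50):
    est = sum(episode_return(rng, stop) for _ in range(5000)) / 5000
    print(f"stop_at={stop:>2}  E[J] ~ {est:.2f}")
```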

18/40
The Defender's Optimal Stopping Problem (2/3)
- Objective: stop the attack as soon as possible
- Let the state space be $\mathcal{S} = \{H, C, \emptyset\}$
[Figure: state machine with states Healthy (H), Compromised (C), and Stopped (∅)]

19/40
The Defender's Optimal Stopping Problem (3/3)
- Let the observation process $(o_t)_{t=1}^{T}$ represent IDS alerts
[Figure: estimated observation distributions under intrusion and under normal operation for cve-2010-0426, cve-2015-3306, cve-2015-5602, cve-2016-10033, cwe-89, cve-2017-7494, cve-2014-6271, and ftp/ssh/telnet brute-force attacks]
- Estimate the observation distribution based on $M$ samples from the twin
- E.g., compute $\hat{Z}$ as an estimate of $Z$
- $\hat{Z} \rightarrow Z$ a.s. as $M \rightarrow \infty$ (Glivenko-Cantelli theorem)
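The estimation step is simple in code. Below is a minimal sketch with illustrative sample data: compute the empirical distribution $\hat{Z}$ from $M$ samples collected on the twin; by the Glivenko-Cantelli theorem it converges to $Z$ as $M$ grows.

```python
from collections import Counter

def empirical_distribution(samples):
    """Relative frequency of each observed value (the empirical pmf Z_hat)."""
    counts = Counter(samples)
    m = len(samples)
    return {obs: c / m for obs, c in sorted(counts.items())}

# Illustrative alert counts; in practice these come from the digital twin.
alert_samples = [0, 2, 1, 0, 3, 2, 2, 1, 0, 2]
z_hat = empirical_distribution(alert_samples)
print(z_hat)  # e.g. {0: 0.3, 1: 0.2, 2: 0.4, 3: 0.1}
```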

20/40
Optimal Stopping Strategy
- The defender can compute the belief $b_t \triangleq \mathbb{P}[S_t = C \mid b_1, o_1, o_2, \ldots, o_t]$
- Stopping strategy: $\pi(b) : [0,1] \rightarrow \{S, C\}$
[Figure: the defender maps its belief to a stop/continue decision]
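The belief $b_t$ can be computed recursively with a Bayes filter. Below is a minimal sketch for the two-state case (H, C), assuming an illustrative per-step intrusion-start probability p and illustrative observation distributions Z_H and Z_C; in practice the distributions estimated from the twin would be used.

```python
def update_belief(b: float, o: int, p: float, Z_H: dict, Z_C: dict) -> float:
    """One step of the Bayes filter: prediction followed by correction."""
    pred_c = b + (1.0 - b) * p                    # predict P[S_{t+1} = C]
    num = Z_C.get(o, 0.0) * pred_c                # weight by likelihood of o
    den = num + Z_H.get(o, 0.0) * (1.0 - pred_c)
    return num / den if den > 0 else pred_c

Z_H = {0: 0.7, 1: 0.2, 2: 0.1}  # alert counts under normal operation
Z_C = {0: 0.1, 1: 0.3, 2: 0.6}  # alert counts during intrusion
b = 0.0
for o in [0, 0, 2, 2, 1, 2]:    # illustrative observation sequence
    b = update_belief(b, o, p=0.05, Z_H=Z_H, Z_C=Z_C)
    print(f"o={o}  b={b:.3f}")
```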

20/40
Optimal Threshold Strategy
Theorem
There exists an optimal defender strategy of the form:

$\pi^*(b) = S \iff b \geq \alpha^*$, where $\alpha^* \in [0,1]$,

i.e., the stopping set is $\mathscr{S} = [\alpha^*, 1]$.
[Figure: belief space $\mathcal{B} = [0,1]$ with stopping set $\mathscr{S}_1 = [\alpha^*, 1]$]

21/40
Optimal Multiple Stopping
- Suppose the defender can take $L \geq 1$ response actions
- Find the optimal stopping times $\tau^*_L, \tau^*_{L-1}, \ldots, \tau^*_1$:

$(\tau^*_l)_{l=1,\ldots,L} \in \arg\max_{\tau_1,\ldots,\tau_L} \mathbb{E}\Big[\sum_{t=1}^{\tau_L-1} \gamma^{t-1} R^{C}_{s_t s_{t+1}} + \gamma^{\tau_L-1} R^{S}_{s_{\tau_L} s_{\tau_L}} + \sum_{t=\tau_L+1}^{\tau_{L-1}-1} \gamma^{t-1} R^{C}_{s_t s_{t+1}} + \gamma^{\tau_{L-1}-1} R^{S}_{s_{\tau_{L-1}} s_{\tau_{L-1}}} + \ldots + \sum_{t=\tau_2+1}^{\tau_1-1} \gamma^{t-1} R^{C}_{s_t s_{t+1}} + \gamma^{\tau_1-1} R^{S}_{s_{\tau_1} s_{\tau_1}}\Big]$

where $\tau_l$ denotes the stopping time with $l$ stops remaining, and $\tau^*_L \leq \tau^*_{L-1} \leq \tau^*_{L-2} \leq \ldots$

22/40
Optimal Multi-Threshold Strategy
Theorem
- The stopping sets are nested: $\mathscr{S}_{l-1} \subseteq \mathscr{S}_l$ for $l = 2, \ldots, L$.
- If $(o_t)_{t \geq 1}$ is totally positive of order 2 (TP2), there exists an optimal defender strategy of the form:

$\pi^*_l(b) = S \iff b \geq \alpha^*_l, \quad l = 1, \ldots, L$

where $\alpha^*_l \in [0,1]$ is decreasing in $l$.
[Figure: belief space $\mathcal{B} = [0,1]$ with nested stopping sets $\mathscr{S}_1 \subseteq \mathscr{S}_2 \subseteq \ldots \subseteq \mathscr{S}_L$ and thresholds $\alpha^*_1 \geq \alpha^*_2 \geq \ldots \geq \alpha^*_L$]

23/40
Optimal Stopping Game
- Suppose the attacker decides when to start and when to abort its intrusion.
[Figure: game episode from t=1 to t=T with the attacker's and defender's stopping times; the intrusion runs between the attacker's start and the defender's stop]
- Find the optimal stopping times:

$\underset{\tau_{D,1},\ldots,\tau_{D,L}}{\text{maximize}} \; \underset{\tau_{A,1},\tau_{A,2}}{\text{minimize}} \; \mathbb{E}[J]$

where $J$ is the defender's objective.

24/40
Best-Response Multi-Threshold Strategies (1/2)
Theorem
- The defender's best response is a multi-threshold strategy:
  $\tilde{\pi}_{D,l}(b) = S \iff b \geq \tilde{\alpha}_l, \quad l = 1, \ldots, L$
- The attacker's best response is a multi-threshold strategy:
  $\tilde{\pi}_{A,l}(b) = C \iff \tilde{\pi}_{D,l}(S \mid b) \geq \tilde{\beta}_{H,l}, \quad l = 1, \ldots, L, \; s = H$
  $\tilde{\pi}_{A,l}(b) = S \iff \tilde{\pi}_{D,l}(S \mid b) \geq \tilde{\beta}_{C,l}, \quad l = 1, \ldots, L, \; s = C$

25/40
Best-Response Multi-Threshold Strategies (2/2)
[Figure: the defender's belief space with nested stopping sets and thresholds $\tilde{\alpha}_1 \geq \ldots \geq \tilde{\alpha}_L$; the attacker's thresholds $\tilde{\beta}_{H,1}, \ldots, \tilde{\beta}_{H,L}$ and $\tilde{\beta}_{C,1}, \ldots, \tilde{\beta}_{C,L}$ on the defender's stopping probability]

26/40
Efficient Computation of Best Responses

Algorithm 1: Threshold Optimization
1 Input: objective function $J$, number of thresholds $L$, parametric optimizer $PO$
2 Output: an approximate best-response strategy $\hat{\pi}_{\hat{\theta}}$
3 Algorithm:
4   $\Theta \leftarrow [0,1]^L$
5   For each $\theta \in \Theta$, define $\pi_{\theta}(b_t)$ as
6   $\pi_{\theta}(b_t) \triangleq S$ if $b_t \geq \theta_i$, $C$ otherwise
7   $\hat{J}(\theta) \leftarrow \mathbb{E}_{\pi_\theta}[J]$
8   $\hat{\theta} \leftarrow PO(\Theta, \hat{J})$
9   return $\hat{\pi}_{\hat{\theta}}$

- Examples of $PO$: CEM, BO, CMA-ES, DE, SPSA, etc.
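As an illustration of Algorithm 1, the sketch below instantiates the parametric optimizer PO with the cross-entropy method (CEM). The objective J_hat is a stand-in: in the real pipeline it would be a Monte Carlo estimate of $\mathbb{E}_{\pi_\theta}[J]$ obtained by running the threshold strategy on the simulator; here a toy quadratic objective with a known optimum is used instead.

```python
import random
import statistics

def cem_optimize(J_hat, L: int, iters: int = 50, pop: int = 100,
                 elite_frac: float = 0.2, seed: int = 0):
    """Cross-entropy search over threshold vectors theta in [0,1]^L."""
    rng = random.Random(seed)
    mu, sigma = [0.5] * L, [0.3] * L       # initial sampling distribution
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        thetas = [[min(1.0, max(0.0, rng.gauss(mu[i], sigma[i])))
                   for i in range(L)] for _ in range(pop)]
        elites = sorted(thetas, key=J_hat, reverse=True)[:n_elite]
        for i in range(L):                 # refit to the elite samples
            vals = [th[i] for th in elites]
            mu[i] = statistics.mean(vals)
            sigma[i] = statistics.stdev(vals) if len(vals) > 1 else 0.05
    return mu                              # approximate best thresholds

# Illustrative objective with a known optimum at theta = (0.8, 0.6):
toy_J = lambda th: -(th[0] - 0.8) ** 2 - (th[1] - 0.6) ** 2
print(cem_optimize(toy_J, L=2))
```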

27/40
Threshold-Fictitious Play to Approximate an Equilibrium
[Diagram: fictitious play iterations. $\tilde{\pi}_A \in B_A(\pi_D)$ and $\tilde{\pi}_D \in B_D(\pi_A)$, then $\tilde{\pi}'_A \in B_A(\pi'_D)$ and $\tilde{\pi}'_D \in B_D(\pi'_A)$, ..., converging to $\pi^*_A \in B_A(\pi^*_D)$ and $\pi^*_D \in B_D(\pi^*_A)$]
Fictitious play: iterative averaging of best responses.
- Learn best-response strategies iteratively
- Average the best responses to approximate the equilibrium
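The averaging idea is easiest to see on a small zero-sum matrix game. The sketch below runs classical fictitious play on an illustrative 2x2 payoff matrix; in the presented framework, the same loop runs with the threshold best-response computation of the previous slide in place of the exact argmax.

```python
A = [[3, 1], [0, 2]]  # illustrative payoffs; row maximizes, column minimizes

def best_row(col_avg):  # maximizer's best response to the column's mix
    return max(range(2), key=lambda i: sum(A[i][j] * col_avg[j] for j in range(2)))

def best_col(row_avg):  # minimizer's best response to the row's mix
    return min(range(2), key=lambda j: sum(A[i][j] * row_avg[i] for i in range(2)))

row_counts, col_counts = [1, 0], [1, 0]   # arbitrary initial play
for _ in range(10000):
    row_avg = [c / sum(row_counts) for c in row_counts]
    col_avg = [c / sum(col_counts) for c in col_counts]
    row_counts[best_row(col_avg)] += 1     # best-respond to the average
    col_counts[best_col(row_avg)] += 1

# Empirical frequencies approximate the equilibrium mix (0.5, 0.5) for the row.
print([c / sum(row_counts) for c in row_counts])
```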

28/40
Comparison against State-of-the-art Algorithms
[Figure: reward per episode vs. training time (min) against Novice, Experienced, and Expert attackers, comparing PPO, Threshold-SPSA, Shiryaev's algorithm (α = 0.75), and the HSVI upper bound]

29/40
Learning Curves in Simulation and Digital Twin
[Figure: learning curves vs. training time (min) against novice, experienced, and expert attackers: reward per episode, episode length (steps), P[intrusion interrupted], P[early stopping], and duration of intrusion; curves show $\pi_{\theta,l}$ in simulation and emulation against the $(\sum x + \sum y) \geq 1$ baseline, the Snort IPS, and the upper bound. Additional panels show exploitability, defender reward per episode, and intrusion length vs. the number of training iterations for $(\pi_{1,l}, \pi_{2,l})$ in emulation and simulation against the Snort IPS, the $o_t \geq 1$ baseline, and the upper bound]

Stopping is about timing; now we consider general response actions.

30/40
General Intrusion Response Game
- Suppose the defender and the attacker can take $L$ actions per node
- $\mathcal{G} = \langle \{gw\} \cup \mathcal{V}, \mathcal{E} \rangle$: graph representing the virtual infrastructure
- $\mathcal{V}$: set of nodes
- $\mathcal{E}$: set of edges
- $\mathcal{Z}$: set of zones
[Figure: topology with R&D zone, app servers, honeynet, DMZ, admin zone, quarantine zone, gateway IDPS, alerts, defender, attacker, and clients]

31/40
State Space
- Each node $i \in \mathcal{V}$ has a state $v_{t,i} = (v^{(Z)}_{t,i}, v^{(I)}_{t,i}, v^{(R)}_{t,i})$, where the zone component $v^{(Z)}_{t,i}$ is controlled by the defender (D) and the remaining components by the attacker (A)
- System state: $s_t = (v_{t,i})_{i \in \mathcal{V}}$, a realization of $S_t$
- Markovian time-homogeneous dynamics:

$s_{t+1} \sim f(\cdot \mid S_t, A_t)$

where $A_t = (A^{(A)}_t, A^{(D)}_t)$ are the actions.
[Figure: transition graph over states $s_1, s_2, s_3, s_4, s_5$]

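To make the dynamics concrete, the sketch below samples $s_{t+1} \sim f(\cdot \mid s_t, a_t)$ under the illustrative assumption that the transition factorizes over nodes given the players' local actions; the states, actions, and probabilities are invented for the example and are not the model of the presented game.

```python
import random

H, C = "healthy", "compromised"

def sample_node_transition(rng, v, a_attacker, a_defender):
    """Sample one node's next state given the players' local actions."""
    if a_defender == "recover":
        return H                                   # recovery resets the node
    if v == H and a_attacker == "exploit":
        return C if rng.random() < 0.3 else H      # exploit succeeds w.p. 0.3
    return v                                       # otherwise state persists

rng = random.Random(1)
state = {i: H for i in range(4)}
actions = {0: ("exploit", "wait"), 1: ("wait", "wait"),
           2: ("exploit", "recover"), 3: ("wait", "wait")}
state = {i: sample_node_transition(rng, state[i], *actions[i]) for i in state}
print(state)
```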

32/40
Observations
- IDPSs inspect network traffic and generate alert vectors: $o_t \triangleq (o_{t,1}, \ldots, o_{t,|\mathcal{V}|}) \in \mathbb{N}_0^{|\mathcal{V}|}$, where $o_{t,i}$ is the number of alerts related to node $i \in \mathcal{V}$ at time-step $t$.
- $o_t = (o_{t,1}, \ldots, o_{t,|\mathcal{V}|})$ is a realization of the random vector $O_t$ with joint distribution $Z$.
[Figure: topology where IDPSs inspect traffic and report alerts to the defender]

33/40
[Figure: distributions of the number of alerts weighted by priority, $Z_{O_i}(O_i \mid S^{(D)}_i, A^{(A)}_i)$, for each node $i \in \mathcal{V}$ ($Z_{O_1}, \ldots, Z_{O_{64}}$), under no intrusion and under intrusion]

34/40
The (General) Intrusion Response Problem
$\underset{\pi_D \in \Pi_D}{\text{maximize}} \; \underset{\pi_A \in \Pi_A}{\text{minimize}} \; \mathbb{E}_{(\pi_D, \pi_A)}[J]$

where $\mathbb{E}_{(\pi_D, \pi_A)}$ denotes the expectation of the random vectors $(S_t, O_t, A_t)_{t \in \{1,\ldots,T\}}$ when following the strategy profile $(\pi_D, \pi_A)$.

35/40
The Curse of Dimensionality
- While the game (1) has a value (Thm 1), solving it is computationally intractable: the state, action, and observation spaces of the game grow exponentially with $|\mathcal{V}|$.
[Figure: growth of $|\mathcal{S}|$, $|\mathcal{O}|$, and $|\mathcal{A}_i|$ (up to about $2 \cdot 10^5$) as a function of the number of nodes $|\mathcal{V}| = 1, \ldots, 5$]
We tackle the scalability challenge with decomposition, as sketched below.
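The exponential growth is easy to reproduce: if each node contributes k_s local states, k_o observation values, and k_a local actions per player, the joint spaces scale as $k^{|\mathcal{V}|}$. The per-node cardinalities below are illustrative, not the game's actual ones.

```python
k_s, k_o, k_a = 6, 10, 4  # illustrative per-node cardinalities
for n_nodes in range(1, 6):
    print(f"|V|={n_nodes}: |S|={k_s ** n_nodes:>8} "
          f"|O|={k_o ** n_nodes:>8} |A_i|={k_a ** n_nodes:>6}")
```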

36/40
Intuitively...
[Figure: the infrastructure topology]
The optimal action at a node high up in the topology does not directly depend on the state or action of a node far down in the topology. But they are not completely independent either. How can we exploit this structure?

37/40
Scalable Learning through Decomposition
[Figure: speedup $S_n$ vs. number of parallel processes $n$ for $|\mathcal{V}| = 10$, measured against the linear speedup]
Speedup of the best-response computation for the decomposed game; $T_n$ denotes the completion time with $n$ processes; the speedup is calculated as $S_n = T_1 / T_n$; the error bars indicate standard deviations over 3 measurements.
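The speedup measurement itself is straightforward to reproduce. Below is a minimal sketch that farms out one stand-in subproblem per node to n worker processes and reports $S_n = T_1 / T_n$; the workload is a dummy loop, not the actual best-response computation.

```python
import time
from multiprocessing import Pool

def solve_subproblem(node_id: int) -> int:
    """Stand-in for one node's best-response computation."""
    total = 0
    for k in range(2_000_000):
        total += (k * node_id) % 7
    return total

def completion_time(n_procs: int, n_nodes: int = 10) -> float:
    start = time.perf_counter()
    with Pool(n_procs) as pool:
        pool.map(solve_subproblem, range(n_nodes))
    return time.perf_counter() - start

if __name__ == "__main__":
    t1 = completion_time(1)
    for n in (2, 4, 8):
        print(f"n={n}: S_n = {t1 / completion_time(n):.2f}")
```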

38/40
Learning Equilibrium Strategies
[Figure: approximate exploitability and defender utility per episode vs. running time (h); curves for dfsp in simulation and on the digital twin, against the upper bound, the $o_{i,t} > 0$ baseline, and random defense]
Learning curves obtained during the training of dfsp to find optimal (equilibrium) strategies in the intrusion response game; red and blue curves relate to dfsp; black, orange, and green curves relate to baselines.

39/40
Comparison with NFSP
[Figure: approximate exploitability vs. running time (h) for dfsp and nfsp]
Learning curves obtained during the training of dfsp and nfsp to find optimal (equilibrium) strategies in the intrusion response game; the red curve relates to dfsp and the purple curve relates to nfsp; all curves show simulation results.

40/40
Conclusions
- We develop a framework to automatically learn security strategies.
- We apply the framework to an intrusion response use case.
- We derive properties of optimal security strategies.
- We evaluate strategies on a digital twin.
- Questions! (demonstration)
[Figure: the framework loop: target system → selective replication → emulation → model creation & system identification → simulation & learning → strategy mapping & implementation]