Intrusion Tolerance for Networked Systems through Two-Level Feedback Control

KimHammar 9 views 47 slides Jun 27, 2024

Slide 1 of 47

About This Presentation

We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. On the local level node controllers perform intrusion recovery, and on the global level a system controller manages the replication factor. The local and global control problems can be formula...

Size: 6.76 MB

Language: en

Added: Jun 27, 2024

Slides: 47 pages

Slide Content

1/35
Intrusion Tolerance for Networked Systems
through Two-Level Feedback Control
IEEE DSN 2024, Brisbane, Australia
International Conference on Dependable Systems and Networks
Kim Hammar and Rolf Stadler
[email protected]@kth.se
KTH Royal Institute of Technology
June 27, 2024

2/35
Use Case: Intrusion Tolerance
::: ClientsapigatewaysCompute nodesStorage nodes
Service
replica 1
Service
replica 2
Service
replica 3
Service
replica 4 Client interface & load balancer
IAreplicated systemoers a service to a client population.
IThe system should provide.

2/35
Use Case: Intrusion Tolerance
::: AttackerClientsapigatewaysCompute nodesStorage nodes
Service
replica 1
Service
replica 2
Service
replica 3
Service
replica 4 Client interface & load balancer
IAn
IThe system shouldtolerate intrusions.

3/35
Intrusion Tolerance (Simplied)
Intrusion eventTime of full recoveryTimeRecovery timeSurvivabilityLoss
Normal
performanceSystem performanceTolerance
Cumulative
performance loss
(want to minimize)

4/35
Increasing Demand for Intrusion-Tolerant Systems
IAs ourreliance on online services grows, there is an
increasing demand for intrusion-tolerant systems.
IExample applications:
Flight control
computer
Sensors and
actuators
Power grids
e.g.,scadasystems
1
.
Safety-critical IT systems
e.g., banking systems,
e-commerce applications
2
,
healthcare systems, etc.
Real-time control systems
e.g., ight control computer
3
.
1
Amy Babay et al. 2018 48th
Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). 2018, pp. 255266.doi:
10.1109/DSN.2018.00036.
2
Jukka Soikkeli et al. IEEE
Transactions on Dependable and Secure Computing20.2 (2023), pp. 11541168.doi:
10.1109/TDSC.2022.3151462.
3
J.H. Wensley et al.
Proceedings of the IEEE66.10 (1978), pp. 12401255.doi:10.1109/PROC.1978.11114.

5/35
Theoretical Foundations of Intrusion Tolerance
L. LamportW. WeibullR.E. BarlowW.ShewhartL. AdlemanA. ShamirR. RivestD. DolevN. LynchE. DijkstraJ. GrayB. LiskovReliabilityTheoryDistributedSystemsCryptography
Intrusion-
tolerant
systems

6/35
Our Contribution
ReliabilityTheoryDistributedSystemsCryptographyControlanddecisiontheory
Intrusion-
tolerant
systemsThis paper

7/35
Building Blocks of An Intrusion-Tolerant System
HC;CrashedHealthyCompromisedCrashCrashRecoveryCompromise204060801000:20:40:60:8125 replicas50 replicas100 replicastReliability : : :Replicated systemClient interfaceRequestResponse Consensus protocol12345
1. Intrusion-tolerant consensus protocol
A quorum needs to reach agreement
to toleratefcompromised replicas.
2. Replication strategy
Cost-reliability trade-o.
3. Recovery strategy
Compromises will occur ast! 1.

8/35
Prior Work on Intrusion-Tolerant Systems
- Fixed number of replicas
- No recoveries
Published 1995

8/35
Prior Work on Intrusion-Tolerant Systems
- Fixed number of replicas
- No recoveries
Published 1998

8/35
Prior Work on Intrusion-Tolerant Systems
- Fixed number of replicas
-Periodicrecoveries
Published 2002

8/35
Prior Work on Intrusion-Tolerant Systems
- Fixed number of replicas
- Periodic recoveries
Published 2004

8/35
Prior Work on Intrusion-Tolerant Systems
-Adaptive replicationbased on heuristics
- Periodic recoveries
Published 2006

8/35
Prior Work on Intrusion-Tolerant Systems
- Fixed number of replicas
- Periodic recoveries
Published 2006

8/35
Prior Work on Intrusion-Tolerant Systems
- Fixed number of replicas
-Supports both periodic and reactive recoveries
- Does not provide reactive recovery strategies
Published 2007

8/35
Prior Work on Intrusion-Tolerant Systems
- Fixed number of replicas
- Periodic recoveries
Published 2011

8/35
Prior Work on Intrusion-Tolerant Systems
- Fixed number of replicas
- Periodic recoveries
Published 2018

8/35
Prior Work on Intrusion-Tolerant Systems
- Fixed number of replicas
- Periodic recoveries
Published 2023

8/35
Prior Work on Intrusion-Tolerant Systems
- Fixed number of replicas
- Periodic recoveries
Published 2023Can we do better by leveraging decision theory and optimal control?

9/35
ThetoleranceArchitecture
Two-levelrecoveryand replicationcontrol with feedback.
toleranceNode 1Privileged domain Application domain
Service
replicaNode controller
ids
alerts
reco-
veryVirtualization layerHardwareNode 2Privileged domain Application domain
Service
replicaNode controller
ids
alerts
reco-
veryVirtualization layerHardwareNodeNtPrivileged domain Application domain
Service
replicaNode controller
ids
alerts
reco-
veryVirtualization layerHardware: : :Consensus protocolSystem controllerState estimateEvict or addState estimateEvict or addState estimateEvict or add : : :Service requestsResponsesClients AttackerIntrusion attempts

10/35
Denition 1 (Correct service)
The system providescorrect serviceif the healthy replicas satisfy
the following properties:
Each request is eventually executed: (Liveness)
Each executed request was sent by a client. (Validity)
Each replica executes the same request sequence. (Safety)

11/35
Proposition 1 (Correctness oftolerance)
A system that implements thetolerancearchitectureprovides
correct serviceif
Network links are authenticated.
At most f nodes are compromised or crashed simultaneously.
Nt2f+1.
The system is partially synchronous.

12/35
Intrusion Tolerance as a Two-Level Control Problem
: : :1(b1)2(b2)3(b3)4(b4)Nt
(bNt
)
Belief
transmissionsNode controllers
Replicated
systemSystem controller(b1; : : : ;bNt
)b1b2b3b4bNt
: : :AttackerClients
IThe.
IThe.

12/35
Assumption 1
The probability that the system controller fails is negligible.
Assumption 2
Compromise and crash events are statistically independent across
nodes.
: : :1(b1)2(b2)3(b3)4(b4)Nt
(bNt
)
Belief
transmissionsNode controllers
Replicated
systemSystem controller(b1; : : : ;bNt
)b1b2b3b4bNt
: : :AttackerClients

13/35
The Local Control Problem
IPartially observed Markov decision processi.
IController actions: (R)ecover and (W)ait. ai;t2 fR;Wg.INode states: SN=f(H)ealthy;(C)ompromised;;g.si;t2 SN.
IState transition function: f(si;tjsi;t;ai;t).IpC;i: crash probability,pA;i: intrusion probability.IObservationoi;tzi(jsi;t): e.g.,idsalerts at timet.
HC;CrashedHealthyCompromisedpC;ipC;ia
(C)
i
=Ra
(A)
i
=A

14/35
Node Controller Strategy
IThe controller computes thebelief
bi;t(s),P[Si;t=Cjht]:
ht,(bi;1;ai;1;oi;2;ai;2;oi;3; : : : ;ai;t1;oi;t):
IController strategy:
: [0;1]! fW;Rg:
ControllerBelief

15/35
Node Controller Objective
ICost:Ji,T
(R)
i
+F
(R)
i
.
IT
(R)
i
is the averagetime-to-recovery.
IF
(R)
i
is therecovery frequency.
I >1 is a scaling factor.
IBounded-time-to-recovery constraint: The time between two
recoveries can be at mostR.
1020304050607080901000:51p=0:1p=0:05p=0:025p=0:01p=0:005tFailure (crash or compromise) probability.
pis the failure probability per time-step.

16/35
Threshold Structure of the Optimal Control Strategy
0:10:20:30:40:50:60:70:80:910:40:60:81alpha vectorsE[Jijbi;1]bi;1
?
wait regionrecovery region
The controller's optimal cost function.
Theorem 2
There exists an optimal control strategy that satises

?
i;t(bi;t) =R()bi;t
?
i;t 8t;
where
?
i;t
2[0;1]is a threshold.

17/35
Ecient Computation of Optimal Recovery Strategies
Algorithm 1:Threshold Optimization
1Input:Objective functionJi, parametric optimizerpo.
2Output:An approximate optimal control strategy^i;.
3Algorithm
4 [0;1].
5 For each2, denei;(bi;t)as
6 i;(bi;t),
(
R ifbi;t
W otherwise.
7 J Ei;
[Ji].
8 ^i; po(;J).
9 return^i;.
IExamples of: cem,bo,
cma-es,de,spsa, etc.

18/35
Ecient Computation of Optimal Recovery Strategies
51015202510
0
10
1
10
2
10
3
cemdebospsadynamic programmingRTime (min)
Mean compute time to obtain an optimal recovery strategy for dierent
values of the bounded-time-to-recovery constraintR.

19/35
The Benet of Optimal Recovery Control24681012141618200:20:30:4Optimal strategyPeriodic strategyDKL(no intrusionkintrusion)Ji(4)Benet of optimal recovery
Optimal recovery control
tional cost intrusion detection model is
available.
Key insight

20/35
Intrusion Tolerance as a Two-Level Control Problem
: : :1(b1)2(b2)3(b3)4(b4)Nt
(bNt
)
Belief
transmissionsNode controllers
Replicated
systemSystem controller(b1; : : : ;bNt
)b1b2b3b4bNt
: : :AttackerClients

21/35
The Global Control Problem
IConstrained Markov decision process.
IStates: SS=f0;1; : : : ;smaxg, the number of healthy nodes.IController actions: Add a
(C)
t2 f0;1gnodes.
IDynamicsf:IMarkovstrategy:
:SS! f0;1g:
: : :1(b1)2(b2)3(b3)4(b4)Nt
(bNt
)
Belief
transmissionsNode controllers
Replicated
systemSystem controller(b1; : : : ;bNt
)b1b2b3b4bNt
: : :AttackerClients

22/35
System Controller Objective
ICost:J,limT!1
P
T
t=1
at
T
.
IConstraint: T
(A)
A, whereT
(A)
is the availability.
A Allowed service downtime per year
0:9 36 days
0:95 18 days
0:99 3 days
0:999 8 hours
0:9999 52 minutes
0:99999 5 minutes
1 0 minutes

23/35
System Reliability Analysis
ITheMean-time-to-failure(mttf) is the
of a state wherestf:
E[T
(F)
jS1=s1] =E
(St)t1
h
infft1jStfg jS1=s1
i
:
102030405060708090100100200300pi=0:05pi=0:025pi=0:01N1E[T
(F)
]
Themttfin function of the number of initial nodesN1and failure
probability per nodepi.

24/35
Theorem 3 (Optimal Control Strategy Existence)
Assuming
(A)The Markov chain induced by any control strategy is
unichain.
(B)The.
Then the following holds.
1.There.
2.The optimal strategy has a.
3.An optimal replication control strategy can be computed by
usinglinear programming.

25/35
Ecient Computation of Optimal Replication Control
Strategies
481632641282565121024204810
0
10
2
Maximum number of nodessmaxTime (s)
Mean compute time to obtain an optimal replication control strategy.

26/35
The Benet of Optimal Replication Control2004006008001;0000:51Optimal strategyN1= 10N1= 100tAvailabilityBenet of optimal replication control
Optimal replication control canguarantee a high service
availability in expectation. The
cation is mainly prominent for long-running systems.
Key insight

27/35
Summary of the Control-Theoretic Model
IIntrusion recovery control.
IPartially observed Markov decision process.
IThreshold structure
IEcient computation throughstochastic approximation.
IReplication control.
IConstrained Markov decision process.
IThreshold structure
IEcient computation throughlinear programming.
: : :1(b1)2(b2)3(b3)4(b4)Nt
(bNt
)
Belief
transmissionsNode controllers
Replicated
systemSystem controller(b1; : : : ;bNt
)b1b2b3b4bNt
: : :AttackerClients

28/35
Experiment Setup - Testbed

29/35
ThetoleranceArchitecture
Two-levelrecoveryand replicationcontrol with feedback.
toleranceNode 1Privileged domain Application domain
Service
replicaNode controller
ids
alerts
reco-
veryVirtualization layerHardwareNode 2Privileged domain Application domain
Service
replicaNode controller
ids
alerts
reco-
veryVirtualization layerHardwareNodeNtPrivileged domain Application domain
Service
replicaNode controller
ids
alerts
reco-
veryVirtualization layerHardware: : :Consensus protocolSystem controllerState estimateEvict or addState estimateEvict or addState estimateEvict or add : : :Service requestsResponsesClients AttackerIntrusion attempts
IA replicated
IAreadoperation that returns the service state.
IAwriteoperation that updates the state.

30/35
Intrusion-Tolerant Consensus Protocol (minbft)
ClientReplica 1(leader)Replica 2Replica 3requestpreparecommitreplyReplica 1(leaderv)(leaderv+1)Replica 2Replica 3crash
request
view-changeview-changenew-viewReplica 1Replica 2Replica 3checkpointControllerReplica 1(compromised)Replica 2Replica 3recover
request
statestateNew replicaReplica 1(leaderv)(leaderv+1)Replica 2Replica 3join-requestjoinnew-viewjoin-reply
System
controllerReplica 1(leaderv)(leaderv+1)Replica 2Replica 3evict-requestevictnew-viewexit-replya) Normal operationb) View changec) Checkpointd) State transfere) Joinf) Evict

31/35
Experiment Setup - Emulated Intrusions
Replica ID Intrusion steps
1 tcp synscan,ftpbrute force
2 tcp synscan,sshbrute force
3 tcp synscan,telnetbrute force
4 icmpscan, exploit ofcve-2017-7494
5 icmpscan, exploit ofcve-2014-6271
6 icmpscan, exploit ofcwe-89 ondvwa
7 icmpscan, exploit ofcve-2015-3306
8 icmpscan, exploit ofcve-2016-10033
9 icmpscan,sshbrute force, exploit ofcve-2010-0426
10 icmpscan,sshbrute force, exploit ofcve-2015-5602
Table 1:

32/35
Experiment Setup - Background Trac
Background services Replica ID(s)
ftp,ssh,mongodb,http,teamspeak1
ssh,dns,http 2
ssh,telnet,http 3
ssh,samba,ntp 4
ssh 5;7;8;10
dvwa,irc,ssh 6
teamspeak,http,ssh 9
Table 2:

33/35
Estimated Distributions of Intrusion Alerts0 2000 4000 6000 8000
bz
i
(
o
i
j
a
(A) i
)
cve-2010-0426
0 2000 4000 6000 8000
cve-2015-3306
0 2000 4000 6000 8000
bz
i
(
o
i
j
a
(A) i
)
cve-2015-5602
0 2000 4000 6000 8000
cve-2016-10033
0 2000 4000 6000 8000
bz
i
(
o
i
j
a
(A) i
)
cwe-89
0 2000 4000 6000 8000
cve-2017-7494
0 2000 4000 6000 8000
oi2 O
bz
i
(
o
i
j
a
(A) i
)
cve-2014-6271
0 5000 10000 15000 20000
oi2 O
ftp,ssh,telnetbrute force
attack (a
(A)
i
=A) false alarms (a
(A)
i
=F)
IWe estimate the observation distributionzwith the
empirical distribution
b
Zbased onMsamples.
Ibz!
a.s
zasM! 1(Glivenko-Cantelli theorem).

34/35
Comparison with State-of-the-art Intrusion-Tolerant
Systems
00:5110
1
10
2
00:10:200:5110
1
10
2
00:10:200:511525110
1
10
2
toleranceno-recoveryperiodicperiodic-adaptive1525100:10:215251Maximum time-to-recoveryRMaximum time-to-recoveryRMaximum time-to-recoveryRAverage availabilityT
(A)
Average time-to-recoveryT
(R)
Average recovery frequencyF
(R)
N1=3N1=6N1=9
Comparison between our optimal control strategies and the baselines;
x-axes indicate values ofR; rows relate to the number of initial nodes
N1.

35/35
Conclusion
IWe present acontrol-theoretic model of intrusion
tolerance.
IWe establish.
IWe.
IOur control-theoretic strategies have
guarantees and signicantly better practical performance
the heuristic control strategies used in state-of-the-art
intrusion-tolerant systems.
: : :1(b1)2(b2)3(b3)4(b4)Nt
(bNt
)
Belief
transmissionsNode controllers
Replicated
systemSystem controller(b1; : : : ;bNt
)b1b2b3b4bNt
: : :AttackerClients

Intrusion Tolerance for Networked Systems through Two-Level Feedback Control

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

Intrusion Tolerance for Networked Systems through Two-Level Feedback Control

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

Slide 39

Slide 40

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

Earthquakes_Type of Faults_Science G8.pptx

Quiz #1 Science 10 in the first quarter for jhs

Astronomy history from long ago till doday

Great history of astronomy from long ago till today

EARTHQUAKE-DRILL.powerpoint.............

History of astronomy from old times to the present times