Adaptive Security Policies via Belief Aggregation and Rollout


About This Presentation

Control problems in network security are characterized by large combinatorial state spaces, non-stationarity, and partial observability. We present a control framework for such problems. It comprises two main components: offline policy approximation through belief aggregation and online policy improvement through rollout.


Slide Content

Adaptive Security Policies via Belief Aggregation and Rollout
netcon seminar
Kim Hammar, [email protected]
March 3, 2025
Joint work with Yuchao Li, Tansu Alpcan, Emil Lupu, and Dimitri Bertsekas
Based on work-in-progress papers:
Adaptive Security Policies via Belief Aggregation and Rollout (K.H., Y.L., T.A., E.L.)
Feature-Based Belief Aggregation for POMDPs (Y.L., K.H., D.B.)

Problem
[Figure: a networked system emits measurements z_k; a security policy µ selects security controls u_k; the system state x_k is hidden.]
Finite partially observed Markov decision problem (POMDP):
▶ Hidden states i ∈ X = {1, . . . , n}, transition probabilities p_ij(u).
▶ Observation z ∈ Z is generated with probability p(z | j, u).
▶ Control u ∈ U.
Goal: find a policy µ that minimizes the discounted cost

E\left\{ \sum_{k=0}^{\infty} \alpha^k g(x_k, u_k, x_{k+1}) \right\}.
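To make the objective concrete, here is a minimal Python sketch (a toy model; all dynamics, costs, and names are hypothetical, not from the talk) that estimates the discounted cost of a fixed policy by Monte Carlo simulation. For simplicity the policy acts on the true state; in the POMDP setting it would act on the belief.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-control dynamics: P[u, i, j] = p_ij(u).
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.5, 0.5]]])

def g(i, u, j):
    """Stage cost g(i, u, j): state 1 (compromised) is costly; control 1 adds effort."""
    return float(j == 1) + 0.1 * u

def discounted_cost(mu, x0=0, horizon=200, alpha=0.95):
    """Simulate one trajectory under policy mu; return sum_k alpha^k g(x_k, u_k, x_{k+1})."""
    x, total = x0, 0.0
    for k in range(horizon):
        u = mu(x)
        x_next = rng.choice(2, p=P[u, x])
        total += alpha**k * g(x, u, x_next)
        x = x_next
    return total

# Monte Carlo estimate of the discounted-cost objective for the do-nothing policy.
estimate = np.mean([discounted_cost(lambda x: 0) for _ in range(1000)])
```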

Cost
[Figure: service/security loss over time around an attack event, illustrating response time, recovery time, tolerance, and survivability relative to the operational cost.]

Challenge 1: Curse of Dimensionality
Problem complexity grows exponentially with system size.
[Figure: control space size |U|, state space size |X|, and observation space size |Z| versus the number of system components K.]
The benchmark POMDP (CAGE-2) has over 10^47 states and 10^25 observations.
Scalability challenge.

Challenge 2: Changing Dynamics
Networked systems change continuously:
▶ Components fail, bandwidth fluctuates, load patterns shift, software is updated, etc.
[Figure: system load over time; average load versus non-periodic average load.]
Policy adaptation: we need an efficient way to adapt the security policy µ when changes occur.

Our Approximation Framework for Large-Scale POMDPs
[Figure: framework overview. A particle filter estimates the belief b_k from system metrics z_k; offline, belief aggregation yields a base policy µ and cost approximation J̃; online, (1) the base policy is evaluated through rollouts and (2) the policy is adapted through lookahead optimization, producing the adapted security policy ˜µ and controls u_k.]
Achieves state-of-the-art results on the CAGE-2 benchmark.
▶ A POMDP with over 10^47 states and 10^25 observations.
Has theoretical performance guarantees.
▶ This contrasts with other approximation frameworks, e.g., deep RL and LLMs.

Belief Space Formulation of the POMDP
Belief state: b = (b(1), . . . , b(n)) ∈ B is a probability distribution over the state space.
Treating b as the state, we obtain the dynamics

b_k = F(b_{k-1}, u_{k-1}, z_k) \quad \text{(belief estimator)}

\hat{g}(b, u) = \sum_{i=1}^{n} b(i) \sum_{j=1}^{n} p_{ij}(u)\, g(i, u, j) \quad \text{(cost)}

\hat{p}(z \mid b, u) = \sum_{i=1}^{n} b(i) \sum_{j=1}^{n} p_{ij}(u)\, p(z \mid j, u) \quad \text{(disturbance distribution)}.

Challenges:
▶ Computing an optimal policy is PSPACE-hard.
▶ Enumerating the dimension of B is intractable (|X| ≥ 10^47 in CAGE-2).
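As an illustration, here is a minimal Python sketch of the quantities above, assuming a POMDP small enough to store the transition matrices P[u] (entries p_ij(u)) and observation matrices O[u] (entries p(z | j, u)) explicitly; all names are hypothetical.

```python
import numpy as np

def belief_update(b, u, z, P, O):
    """Belief estimator F(b, u, z): Bayes update after control u and observation z."""
    pred = b @ P[u]                # prediction: sum_i b(i) p_ij(u)
    unnorm = pred * O[u][:, z]     # correction: weight by p(z | j, u)
    return unnorm / unnorm.sum()

def g_hat(b, u, P, g):
    """Expected stage cost: sum_i b(i) sum_j p_ij(u) g(i, u, j)."""
    n = len(b)
    return sum(b[i] * P[u][i, j] * g(i, u, j) for i in range(n) for j in range(n))

def p_hat(z, b, u, P, O):
    """Disturbance distribution: sum_i b(i) sum_j p_ij(u) p(z | j, u)."""
    return float((b @ P[u]) @ O[u][:, z])
```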

Offline POMDP Approximation via Problem Simplification
[Figure: the original problem is simplified via aggregation; the simplified problem is solved with dynamic programming; the simplified solution is mapped back to the original problem via interpolation.]

Two-Level Aggregation for Offline Approximation
We use two-level aggregation to simplify the POMDP into an MDP with a finite state space, which we solve using dynamic programming.
[Figure: aggregate feature belief space Q̃, feature space F, and state space X.]

State Aggregation (1/3)
There are two main options to construct the feature space F:
▶ It can be manually designed based on …
▶ It can be automatically constructed via a …

State Aggregation (2/3)
Each feature state x ∈ F is associated with a subset I_x ⊂ X.
[Figure: state j ∈ X maps to feature state y with aggregation probability ϕ_jy; feature state y maps to state i with disaggregation probability d_yi; original dynamics p_ij(u).]
State aggregation: for each state j ∈ X, we associate an aggregation probability distribution {ϕ_jy | y ∈ F}, where ϕ_jy = 1 for all j ∈ I_y.
Feature disaggregation: for every feature state x ∈ F, we associate a disaggregation probability distribution {d_xi | i ∈ X}, where d_xi = 0 for all i ∉ I_x.
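A minimal sketch of how ϕ and d might be represented under hard aggregation, assuming a toy partition of six states into two feature states (all numbers hypothetical):

```python
import numpy as np

n, n_features = 6, 2
I = {0: [0, 1, 2], 1: [3, 4, 5]}      # footprint I_x of each feature state x

# Aggregation probabilities: phi[j, y] = 1 iff j is in I_y (hard aggregation).
phi = np.zeros((n, n_features))
for y, states in I.items():
    phi[states, y] = 1.0

# Disaggregation probabilities: any distribution with d[x, i] = 0 for i not in I_x;
# here, uniform over the footprint.
d = np.zeros((n_features, n))
for x, states in I.items():
    d[x, states] = 1.0 / len(states)
```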

State Aggregation (3/3)
We obtain the dynamic system:
[Figure: feature states y_0, y_1, y_2 ∈ F and original states i, j, k, l ∈ X; transitions alternate between disaggregation d_{y_0 i}, original dynamics p_ij(u_0) with cost g(i, u_0, j), aggregation ϕ_{j y_1}, disaggregation d_{y_1 k}, dynamics p_kl(u_1) with cost g(k, u_1, l), and aggregation ϕ_{l y_2}.]
⇒ A well-defined aggregate problem with finite state space F.
▶ We obtain the desired reduction: |F| << |X|.
How can we lift this to the belief space?
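Before turning to that question, here is a minimal sketch of the aggregate problem construction just illustrated. It assumes the standard aggregation formulas p̂_xy(u) = Σ_i d_xi Σ_j p_ij(u) ϕ_jy and ĝ(x, u) = Σ_i d_xi Σ_j p_ij(u) g(i, u, j), which the slides depict but do not state explicitly:

```python
import numpy as np

def aggregate_mdp(P, G, phi, d):
    """Aggregate transitions and costs from the original MDP.

    P[u]: (n, n) transitions p_ij(u); G[u]: (n, n) stage costs g(i, u, j);
    phi: (n, |F|) aggregation probabilities; d: (|F|, n) disaggregation probabilities.
    """
    P_agg = {u: d @ P[u] @ phi for u in P}                 # p_hat_xy(u)
    g_agg = {u: d @ (P[u] * G[u]).sum(axis=1) for u in P}  # g_hat(x, u)
    return P_agg, g_agg
```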

Feature Belief Aggregation (1/3)
[Figure: the belief space B maps to the feature belief space Q; the aggregate feature beliefs form a finite subset Q̃ ⊂ Q.]

Feature Belief Aggregation (2/3)
For each feature belief q ∈ Q, we associate a belief aggregation probability distribution {ψ_{q q̃} | q̃ ∈ Q̃}, where ψ_{q̃ q̃} = 1.
[Figure: feature belief space Q; feature belief q maps to aggregate belief q̃ with probability ψ_{q q̃}.]

Feature Belief Aggregation (2/3)
Example (nearest-neighbor aggregation):

\psi_{q \tilde{q}} = 1 \iff \tilde{q} = \arg\min_{\tilde{q}' \in \tilde{Q}} \lVert q - \tilde{q}' \rVert.
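A short sketch of the nearest-neighbor mapping, assuming Q̃ is stored as an array with one aggregate belief per row:

```python
import numpy as np

def nearest_aggregate(q, Q_tilde):
    """Index of the aggregate belief closest to the feature belief q,
    i.e., the unique q_tilde with psi_{q, q_tilde} = 1."""
    return int(np.argmin(np.linalg.norm(Q_tilde - q, axis=1)))
```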

Feature Belief Aggregation (3/3)
Dynamic belief system:

b(i) = \sum_{x \in F} \tilde{q}(x)\, d_{xi} \quad \text{for all } i \in X \qquad (\tilde{q} \to b) \quad (1a)

q(y) = \sum_{j=1}^{n} b(j)\, \phi_{jy} \quad \text{for all } y \in F \qquad (b \to q) \quad (1b)

\tilde{q} \sim \psi_{q \tilde{q}} \qquad (q \to \tilde{q}). \quad (1c)

[Figure: aggregate beliefs Q̃ map to original beliefs B via Eq. (1a); original beliefs map back to aggregate beliefs via Eqs. (1b) and (1c).]

POMDP Approximation
Let V* be the optimal cost function of the aggregate problem.
We obtain a cost approximation of the original POMDP by

\tilde{J}(b) = \sum_{\tilde{q} \in \tilde{Q}} \psi_{\Phi(b) \tilde{q}}\, V^*(\tilde{q}) \quad \text{(interpolation formula)},

where Φ : B → Q is defined as

\Phi(b)(y) = \sum_{j=1}^{n} b(j)\, \phi_{jy} \quad \text{for all } y \in F.

Similarly, a base policy µ for the POMDP can be obtained as

\mu(b) \in \arg\min_{u} E\left\{ \hat{g}(b, u) + \alpha \tilde{J}(b') \right\} \quad \text{for all } b \in B,

where b' = F(b, u, z) is the next belief.
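A minimal sketch of the interpolation formula and the base-policy extraction, assuming the nearest-neighbor ψ from the earlier example; the helpers g_hat and next_beliefs are hypothetical (next_beliefs(b, u) would enumerate (probability, next-belief) pairs over observations z):

```python
import numpy as np

def J_tilde(b, phi, Q_tilde, V_star):
    """Interpolation formula: J~(b) = sum_q psi_{Phi(b), q} V*(q).
    With nearest-neighbor psi this picks out a single aggregate belief."""
    q = b @ phi                                     # Phi(b)(y) = sum_j b(j) phi_jy
    idx = np.argmin(np.linalg.norm(Q_tilde - q, axis=1))
    return V_star[idx]

def base_policy(b, controls, g_hat, next_beliefs, phi, Q_tilde, V_star, alpha=0.95):
    """mu(b): minimize g_hat(b, u) + alpha * E[ J~(b') ] over controls u."""
    def q_value(u):
        return g_hat(b, u) + alpha * sum(
            p * J_tilde(b2, phi, Q_tilde, V_star) for p, b2 in next_beliefs(b, u))
    return min(controls, key=q_value)
```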

POMDP Approximation
[Figure: the belief space B partitioned into cells B(q̃), one for each aggregate belief q̃ ∈ Q̃; ψ_{Φ(b) q̃} assigns each belief b to its cell. Here ϵ denotes the maximum variation of J*(b) within a cell B(q̃).]

Proposition (approximation error bound)
Under hard aggregation, the error of J̃ is bounded as

|\tilde{J}(b) - J^*(b)| \le \frac{\epsilon}{1 - \alpha} \qquad \forall b \in B(\tilde{q}),\ \tilde{q} \in \tilde{Q}.

Experimental Illustration of the Error Bound
[Figure: comparison between the optimal cost-to-go J*(b) of the POMDP and the approximate cost-to-go J̃(b) for |Q̃| = 5 and |Q̃| = 100, plotted against b(1).]
The numerical results are based on an example POMDP with |X| = 2, F = X, Q̃ defined via grid points, and ψ based on the nearest-neighbor mapping.

The Big Picture
[Figure: framework overview, as before: a particle filter estimates the belief b_k online; belief aggregation yields the base policy µ and cost J̃ offline; rollout and lookahead optimization adapt the policy to ˜µ online.]

Particle Filtering
Challenge: exact computation of the belief b has complexity O(|X|^2), which is intractable for realistic systems. (In CAGE-2, |X| ≥ 10^47.)
To manage this complexity, we use a particle filter to estimate b as

\hat{b}_k(x_k) = \frac{1}{M} \sum_{i=1}^{M} \mathbf{1}\{x_k = \hat{x}_k^{(i)}\}, \qquad (2)

where \hat{x}_k^{(1)}, \ldots, \hat{x}_k^{(M)} are particles resampled with weights proportional to p(z_k \mid \hat{x}_k^{(i)}, u_{k-1}).
The complexity of Eq. (2) can be adjusted to the available compute resources by tuning M. The strong law of large numbers implies \lim_{M \to \infty} \hat{b} = b.
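A minimal sketch of one bootstrap particle-filter step and the estimate of Eq. (2). The propagate-weight-resample scheme shown here is one standard choice; the helper names sample_next and obs_lik are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, u, z, sample_next, obs_lik):
    """One bootstrap step: propagate, weight by p(z | x, u), resample.

    particles: array of M hidden states; sample_next(x, u) simulates the dynamics;
    obs_lik(z, x, u) evaluates the observation likelihood."""
    prop = np.array([sample_next(x, u) for x in particles])
    w = np.array([obs_lik(z, x, u) for x in prop])
    w = w / w.sum()
    idx = rng.choice(len(prop), size=len(prop), p=w)
    return prop[idx]

def belief_estimate(particles, x):
    """Eq. (2): fraction of particles equal to state x."""
    return float(np.mean(particles == x))
```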

Rollout and Lookahead Optimization for Online Policy Adaptation
[Figure: 2-step lookahead optimization over controls u_{1,k}, u_{2,k} from belief b_k through beliefs b_{k+1}, b_{k+2}, with stage costs ĝ(b_{k+1}, u_{1,k}); beyond the lookahead horizon, the cost-to-go is estimated by rollout with the base policy µ and a terminal cost approximation J̃(b_{k+2+m}); the minimizing control arg min_{u_{i,k}} is applied.]

Rollout and Lookahead Optimization for Online Policy Adaptation
Lookahead optimization: we transform the base policy into an adapted rollout policy ˜µ via ℓ-step lookahead:

\tilde{\mu}(b_k) \in \arg\min_{u_k, \mu_{k+1}, \ldots, \mu_{k+\ell-1}} E_{z_{k+1}, \ldots, z_{k+\ell}} \left\{ \hat{g}(b_k, u_k) + \sum_{j=k+1}^{k+\ell-1} \alpha^{j-k} \hat{g}(b_j, \mu_j(b_j)) + \alpha^{\ell} \tilde{J}_{\mu}(b_{k+\ell}) \right\}.

Rollout: the cost-to-go in the lookahead minimization is estimated via m-step rollout with the base policy µ and terminal cost approximation J̃:

\tilde{J}_{\mu}(b_k) \approx \frac{1}{L} \sum_{j=1}^{L} \left[ \sum_{l=k}^{k+m-1} \alpha^{l-k} \hat{g}(b_l^j, \mu(b_l^j)) + \alpha^{m} \tilde{J}(b_{k+m}^j) \right].
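A minimal sketch of the m-step rollout estimate and a one-step (ℓ = 1) lookahead built on it. The helpers step (which samples an observation and returns the next belief), g_hat, and J_term are hypothetical stand-ins for the quantities defined above.

```python
import numpy as np

def rollout_value(b, mu, step, g_hat, J_term, m=10, L=50, alpha=0.95):
    """Estimate J~_mu(b): average over L simulated m-step trajectories under the
    base policy mu, plus the discounted terminal cost approximation J_term."""
    total = 0.0
    for _ in range(L):
        bl, cost = b, 0.0
        for l in range(m):
            u = mu(bl)
            cost += alpha**l * g_hat(bl, u)
            bl = step(bl, u)
        total += cost + alpha**m * J_term(bl)
    return total / L

def lookahead_policy(b, controls, mu, step, g_hat, J_term, alpha=0.95, n_samples=20):
    """ell = 1 lookahead: pick u minimizing g_hat(b, u) + alpha * E[rollout value]."""
    def q_value(u):
        nxt = [step(b, u) for _ in range(n_samples)]   # sampled next beliefs
        return g_hat(b, u) + alpha * np.mean(
            [rollout_value(b2, mu, step, g_hat, J_term) for b2 in nxt])
    return min(controls, key=q_value)
```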

Rollout Policy Improvement
Proposition (Bertsekas, 2019)
1. If the rollout policy evaluation is exact, i.e., if J̃_µ = J_µ, then the rollout policy is guaranteed to improve the base policy.
2. The sub-optimality of the rollout policy ˜µ is bounded as

\lVert J_{\tilde{\mu}} - J^* \rVert \le \frac{2\alpha}{1 - \alpha} \lVert \tilde{J}_{\mu} - J^* \rVert.

[Figure: cumulative cost J(x_0) of the base policy µ and the rollout policy ˜µ versus the number of aggregate beliefs |Q̃| (N = m = 10, ℓ = 1); rollout yields an 8.5x cost reduction. Performance of rollout for an example POMDP.]

Framework Summary
[Figure: framework overview, as before: offline belief aggregation (base policy µ, cost J̃); online belief estimation via particle filter (belief b_k); online rollout and lookahead optimization (adapted policy ˜µ, controls u_k).]

Experimental Evaluation Against the CAGE-2 Benchmark
Standard benchmark: CAGE-2.
▶ Problem: find an effective security policy to respond to network intrusions.
▶ A POMDP with over 10^47 states and 10^25 observations.
▶ Leaderboard with more than 35 different methods.
Current state of the art: deep reinforcement learning (variants of PPO).
[Figure: parameterized security policy µ_θ mapping an observation z to a distribution over security controls µ_θ(u_1 | z), . . . , µ_θ(u_5 | z), with value estimate J_θ(z).]

Instantiation of Our Framework for CAGE-2
We define Q̃ based on domain intuition, with |Q̃| = 427,500.
We define ψ based on the nearest-neighbor mapping.
We solve the aggregate problem using value iteration (see the sketch below).
▶ This gives us the base policy µ and cost J̃.
We use ℓ = 2 lookahead steps and M = 50 particles.
[Figure: the CAGE-2 system, with an enterprise zone, an operational zone, and a user zone; an attacker and clients interact with the network, the system emits observations z_k, and the security policy µ selects controls u_k.]
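The value-iteration step might look as follows: a minimal sketch, compatible with the aggregate_mdp() sketch earlier, that returns the optimal cost V* of the finite aggregate problem.

```python
import numpy as np

def value_iteration(P_agg, g_agg, alpha=0.95, tol=1e-8):
    """Value iteration on the aggregate MDP: V <- min_u [ g_agg(u) + alpha P_agg(u) V ]."""
    n = len(next(iter(g_agg.values())))
    V = np.zeros(n)
    while True:
        Q = np.stack([g_agg[u] + alpha * P_agg[u] @ V for u in P_agg])
        V_new = Q.min(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```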

Experimental Results (1/2)
Method        Offline/Online compute (min/s)   State estimation     Cost
µ             8.5/0.01                         particle filter      15.19 ± 0.82
ppo           1000/0.01                        latest observation   280 ± 114
ppo           1000/0.01                        particle filter      119 ± 58
ppg           1000/0.01                        latest observation   338 ± 147
ppg           1000/0.01                        particle filter      299 ± 108
dqn           1000/0.01                        latest observation   479 ± 267
dqn           1000/0.01                        particle filter      462 ± 244
cardiff       300/0.01                         latest observation   13.69 ± 0.53
cardiff       300/0.01                         particle filter      13.31 ± 0.87
pomcp         0/15                             particle filter      30.88 ± 1.41
pomcp         0/30                             particle filter      29.51 ± 2.00
ours (m=0)    8.5/0.95                         particle filter      13.24 ± 0.57
ours (m=10)   8.5/8.29                         particle filter      13.23 ± 0.62
ours (m=20)   8.5/14.80                        particle filter      13.23 ± 0.57

Numbers indicate the mean and the standard deviation from 1000 evaluations.

Experimental Results (2/2)
Method        Offline/Online compute (min/s)   State estimation     Cost
µ             8.5/0.01                         particle filter      61.72 ± 3.96
ppo           1000/0.01                        latest observation   341 ± 133
ppo           1000/0.01                        particle filter      326 ± 116
ppg           1000/0.01                        latest observation   328 ± 178
ppg           1000/0.01                        particle filter      312 ± 163
dqn           1000/0.01                        latest observation   516 ± 291
dqn           1000/0.01                        particle filter      492 ± 204
cardiff       300/0.01                         latest observation   57.45 ± 2.44
cardiff       300/0.01                         particle filter      56.45 ± 2.81
pomcp         0/15                             particle filter      53.08 ± 3.78
pomcp         0/30                             particle filter      53.18 ± 3.42
ours (m=0)    8.5/0.95                         particle filter      51.87 ± 1.42
ours (m=10)   8.5/8.29                         particle filter      38.81 ± 1.68
ours (m=20)   8.5/14.80                        particle filter      37.89 ± 1.54

Numbers indicate the mean and the standard deviation from 1000 evaluations.

Conclusion
We present a scalable framework for computing adaptive security policies, which has formal performance guarantees.
It consists of three components:
▶ Two-level belief aggregation and dynamic programming (offline).
▶ Belief estimation via particle filtering (online).
▶ Policy adaptation via rollout (online).
Theoretical and experimental details will be available in preprints soon.
Source code is available at:
https://github.com/Limmen/rollout_aggregation; and
https://github.com/Limmen/csle
Work in progress!