Automated Security with a Foundation Model

KimHammar 4 views 56 slides Oct 20, 2025

About This Presentation

As the ubiquity and evolving nature of cyberattacks pose a growing concern to society, the automation of security processes and functions has been recognized as an important part of the response to this threat. In fact, since the early 2000s, researchers have studied automated security through model...


Slide Content

1/29
Automated Security
with a Foundation Model
Visit to the City University of Hong Kong
October 20, 2025
Dr. Kim Hammar

2/29
Next Generation of Security Systems
[Figure: diagram with the labels Measurements, Controls, and Learning.]
▶What role will [...] of security systems?

3/29
Different Types of Foundation Models
▶Based on the [...]
▶Trained on [...].
▶Billions of parameters.
▶Examples:
▶Large language models (e.g., DeepSeek).
▶Time series models (e.g., Chronos).
▶Speech and audio models (e.g., Whisper).
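▶Multi-modal models (e.g., Sora).
[Figure: transformer architecture. Inputs pass through an input embedding with positional encoding, then through L stacked blocks of masked multi-head attention and feed-forward layers (each followed by Add & Norm), and finally through a linear layer and a softmax that yields the output probabilities.]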

3/29
Autonomous Security Systems
[Figure: diagram with the labels Measurements, Controls, and Learning.]
▶Systems with [...].
▶Respond to threats and incidents autonomously.
▶Longstanding goal [...]

4/29
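Methodology
[Figure: methodology diagram with three components: Target System, Emulation System, and Simulation System. The links between them are labeled Selective Replication, System Identification, Strategy Mapping (π), and Strategy Implementation (π). The components are annotated with Automatic Control, Strategy Evaluation & Model Estimation, and Mathematical Model & Optimization. A grid of states s_{1,1}, s_{1,2}, ..., s_{1,n}, s_{2,1}, ..., s_{2,n} illustrates the model.]
! Becomes a bottleneck
We use [...]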

5/29
Outline
▶Automated security with a foundation model.
▶Overview of our framework.
▶Theoretical analysis.
▶Controlling the hallucination bound.
▶Regret bound.
▶Case study: Incident Response.
▶Comparison with frontier models.

6/29
Automated Security with a Foundation Model
[Figure: framework diagram with the labels task description, prior, lookahead, consistency, conformal abstention, action, actions, outcomes, feedback, external verification, in-context learning, and posterior.]
▶We use the [...]
▶We evaluate actions through lookahead.
▶We detect likely hallucinations by evaluating consistency.
▶Abstain from actions with low consistency.
▶Refine actions via in-context learning.

7/29
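Generating Candidate Actions
▶Generate [...]
▶Can think of the LLM as a base strategy.
[Figure: large language model pipeline. The prompt "root account lost on node" is split by the tokenizer into the tokens "root", "account", "lost", "on", "node", which are mapped to embeddings and passed through the large language model. The output layer over the vocabulary produces the response tokens "isolate", "target", "node", <eos>.]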
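The idea of treating the LLM as a base strategy that proposes several candidate actions can be sketched as follows. This is only an illustration: `generate_candidates`, `toy_llm`, and `TOY_ACTIONS` are hypothetical names, and the toy sampler stands in for a real model call, which the slides do not specify.

```python
import random
from typing import Callable, List

# Hypothetical action set used only by the toy sampler below.
TOY_ACTIONS = ["isolate target node", "reset root credentials", "block source IP"]


def toy_llm(prompt: str, rng: random.Random) -> str:
    """Stand-in for sampling one response from an LLM (assumption, not a real API)."""
    return rng.choice(TOY_ACTIONS)


def generate_candidates(
    sample_action: Callable[[str, random.Random], str],
    prompt: str,
    n: int,
    seed: int = 0,
) -> List[str]:
    """Sample n actions from the base strategy and keep the distinct ones in order."""
    rng = random.Random(seed)
    candidates: List[str] = []
    for _ in range(n):
        action = sample_action(prompt, rng)
        if action not in candidates:
            candidates.append(action)
    return candidates


if __name__ == "__main__":
    print(generate_candidates(toy_llm, "root account lost on node", n=5))
```

Deduplicating the samples keeps the later evaluation steps from scoring the same action twice; whether the original system does this is not stated.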

8/29
Lookahead Simulation
[Figure: lookahead tree built up step by step. From the initial state $s_0$, the candidate actions $a_0^1$, $a_0^2$, $a_0^3$ branch into simulated successor states ($s_1$, $s_2$) and subsequent actions ($a_1$).]
▶For each candidate action $a_t^i$, we [...] subsequent states and actions.
▶We [...]
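A minimal sketch of lookahead over candidate actions, under stated assumptions: each candidate is scored by the number of simulated steps until recovery, and later actions are chosen by a fixed base policy. The names `lookahead_cost`, `rank_candidates`, `toy_step`, and the toy model (a count of compromised nodes) are hypothetical and are not taken from the slides.

```python
from typing import Callable, Dict, List, Tuple

# Toy model (assumption): the state is the number of compromised nodes.
EFFECT = {"isolate": 2, "patch": 1, "wait": 0}


def toy_step(state: int, action: str) -> int:
    """Simulated transition: each action removes a fixed number of compromised nodes."""
    return max(0, state - EFFECT[action])


def lookahead_cost(
    state: int,
    action: str,
    step: Callable[[int, str], int],
    policy: Callable[[int], str],
    is_recovered: Callable[[int], bool],
    horizon: int = 10,
) -> int:
    """Steps until recovery when starting with `action` and then following `policy`."""
    s = step(state, action)
    cost = 1
    while not is_recovered(s) and cost < horizon:
        s = step(s, policy(s))
        cost += 1
    return cost


def rank_candidates(
    state: int,
    candidates: List[str],
    step: Callable[[int, str], int],
    policy: Callable[[int], str],
    is_recovered: Callable[[int], bool],
) -> Tuple[List[str], Dict[str, int]]:
    """Rank candidates by simulated recovery time (shortest first)."""
    costs = {a: lookahead_cost(state, a, step, policy, is_recovered) for a in candidates}
    return sorted(candidates, key=lambda a: costs[a]), costs


if __name__ == "__main__":
    ranking, costs = rank_candidates(
        3, ["wait", "patch", "isolate"], toy_step, lambda s: "patch", lambda s: s == 0
    )
    print(ranking, costs)
```

In the real system the transition model would presumably be the LLM or a digital twin rather than a fixed table; the structure of the rollout is the point of the sketch.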

9/29
Evaluating the [...]
▶We use [...].
[Figure: a Large Language Model with the label "Self-inconsistent".]

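10/29
Abstaining
▶Let $\lambda(a) \in [0, 1]$ denote the consistency of a given action.
▶We use this function to abstain from actions with low consistency, as expressed by the following decision rule:
$$\rho_\gamma(a_t) = \begin{cases} 1 \text{ (abstain)}, & \text{if } \lambda(a_t) \leq \gamma, \\ 0 \text{ (not abstain)}, & \text{if } \lambda(a_t) > \gamma, \end{cases}$$
where $\gamma \in [0, 1]$ is the consistency threshold.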
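The abstention rule can be written as a small function. The sketch assumes that the rule abstains when the consistency score is at most the threshold and acts only when it is strictly greater; the function names and the example scores are hypothetical.

```python
from typing import Dict, List


def abstain(consistency: float, gamma: float) -> int:
    """Return 1 (abstain) if the consistency is at most gamma, else 0 (not abstain)."""
    if not (0.0 <= consistency <= 1.0 and 0.0 <= gamma <= 1.0):
        raise ValueError("consistency and gamma must lie in [0, 1]")
    return 1 if consistency <= gamma else 0


def filter_actions(scores: Dict[str, float], gamma: float) -> List[str]:
    """Keep only the actions that the rule does not abstain from."""
    return [a for a, lam in scores.items() if abstain(lam, gamma) == 0]


if __name__ == "__main__":
    example_scores = {"isolate node": 0.9, "reboot all": 0.2}  # hypothetical values
    print(filter_actions(example_scores, gamma=0.5))
```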

11/29
In-Context Learning
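If an action does not meet the consistency threshold, we abstain from it, [...] select a new action through in-context learning.
[Figure: a digital twin is created from the target system through selective replication and consists of a virtual network, virtual devices, emulated services, and emulated actors. Given the context and state, the action is evaluated in the digital twin, which returns feedback.]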
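The abstain-and-refine loop on this slide can be sketched as below. It assumes that feedback on a rejected action is appended to the context before the next proposal; `select_action` and the stub callables are hypothetical, and the digital twin is represented only by a `feedback` function.

```python
from typing import Callable, List, Optional, Tuple


def select_action(
    propose: Callable[[List[str]], str],
    consistency: Callable[[str], float],
    feedback: Callable[[str], str],
    gamma: float,
    context: List[str],
    max_iters: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Propose actions until one exceeds the consistency threshold.

    Each rejected action produces feedback that is added to the context,
    so the next proposal is conditioned on it (in-context learning).
    Returns (None, context) if no action passes within max_iters.
    """
    context = list(context)  # do not mutate the caller's list
    for _ in range(max_iters):
        action = propose(context)
        if consistency(action) > gamma:
            return action, context
        context.append(feedback(action))
    return None, context


if __name__ == "__main__":
    plan = ["reboot all", "isolate node"]          # hypothetical proposals
    scores = {"reboot all": 0.2, "isolate node": 0.9}  # hypothetical consistency scores
    action, ctx = select_action(
        propose=lambda c: plan[min(len(c) - 1, len(plan) - 1)],
        consistency=lambda a: scores[a],
        feedback=lambda a: f"feedback: '{a}' was rejected",
        gamma=0.5,
        context=["root account lost on node"],
    )
    print(action, ctx)
```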

12/29
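Summary
[Figure: overview of the framework. Logs & alerts from the networked system enter the context of the large language model, which produces a plan with chain-of-thoughts and the candidate actions $a^1, a^2, \ldots, a^N$. The candidates go through lookahead and consistency evaluation. Conformal abstention compares the consistency against the threshold ($\lambda > \gamma$). External verification provides feedback, and the selected action is applied to the networked system.]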

13/29
Outline
▶Automated security with a foundation model
▶Overview of our framework.
▶Theoretical analysis
▶Controlling the hallucination bound.
▶Regret bound.
▶Case study: Incident Response
▶Comparison with frontier models.

14/29
Conformal Abstention
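Let $\{a_i\}_{i=1}^{n}$ be a calibration dataset [...].
Proposition 1
▶Assume the actions in the calibration dataset $\{a_i\}_{i=1}^{n}$ are i.i.d.
▶Let $\tilde{a}$ [...]
▶Let $\kappa \in (0, 1]$ [...] probability.
Define the threshold
$$\tilde{\gamma} = \inf\left\{ \gamma \;\middle|\; \frac{|\{i \mid \lambda(a_i) \leq \gamma\}|}{n} \geq \frac{\lceil (n+1)(1-\kappa) \rceil}{n} \right\}.$$
Then
$$P(\text{not abstain from } \tilde{a}) \leq \kappa.$$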
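One way to compute a conformal threshold from calibration scores is sketched below. It assumes the standard split-conformal construction: with n calibration consistency scores and a target bound `kappa` (a hypothetical parameter name), the threshold is the k-th smallest score with k = ceil((n + 1)(1 - kappa)). This is an illustration under those assumptions, not the authors' implementation.

```python
import math
from typing import Sequence


def conformal_threshold(scores: Sequence[float], kappa: float) -> float:
    """Smallest gamma such that at least ceil((n+1)(1-kappa)) of the n scores are <= gamma.

    Returns math.inf when no finite gamma satisfies the condition
    (too few calibration samples for the requested kappa), which means
    every action is abstained from.
    """
    if not 0.0 < kappa <= 1.0:
        raise ValueError("kappa must lie in (0, 1]")
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    k = math.ceil((n + 1) * (1.0 - kappa))
    if k > n:
        return math.inf
    if k <= 0:
        return 0.0  # condition holds trivially; scores are assumed to lie in [0, 1]
    return sorted(scores)[k - 1]


if __name__ == "__main__":
    calibration = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.4, 0.6, 0.8]  # hypothetical scores
    print(conformal_threshold(calibration, kappa=0.5))
```

A smaller `kappa` pushes the threshold up, so more actions are rejected; with very few calibration samples the threshold becomes infinite and nothing is accepted.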

15/29
Regret Bound
Proposition 2 (Informal)
▶Let $K$ denote the [...]
▶Assume that the [...] posterior [...]
▶Assume bandit feedback.
We have
$$R_K \leq C\sqrt{|\mathcal{A}| K \, [\ldots]},$$
where $C$ [...] is the number of ICL iterations.

16/29
Outline
▶Automated security with a foundation model
▶Overview of our framework.
▶Theoretical analysis
▶Controlling the hallucination bound.
▶Regret bound.
▶Case study: Incident Response
▶Comparison with frontier models.

17/29
Use Case: Incident Response
[Figure: feedback loop with the labels security alerts (at time t), response actions (at time t), state (at time t), learning, and response strategy.]
▶Problem: [...] 0, 1, . . . [...] secure and operational state after a cyberattack.

18/29
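Response Objective
[Figure: system performance over time. Performance drops from normal performance at the intrusion event and returns to it at the time of full recovery. The plot is annotated with recovery time, survivability, tolerance, and loss. The cumulative performance loss is the quantity we want to minimize.]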
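To make the objective concrete, the sketch below computes a cumulative performance loss and a recovery time from a sampled performance curve. The definitions are assumptions: loss is the summed shortfall below normal performance, and recovery time is the span from the first to the last sample below normal. The function names and the sample curve are hypothetical.

```python
from typing import Sequence


def cumulative_loss(performance: Sequence[float], normal: float, dt: float = 1.0) -> float:
    """Sum of the shortfall below normal performance, scaled by the sampling interval."""
    return sum(max(0.0, normal - p) for p in performance) * dt


def recovery_time(performance: Sequence[float], normal: float, dt: float = 1.0) -> float:
    """Time from the first degraded sample to the first sample after the last degraded one."""
    degraded = [t for t, p in enumerate(performance) if p < normal]
    if not degraded:
        return 0.0
    return (degraded[-1] + 1 - degraded[0]) * dt


if __name__ == "__main__":
    curve = [1.0, 1.0, 0.4, 0.6, 0.8, 1.0, 1.0]  # hypothetical performance samples
    print(cumulative_loss(curve, normal=1.0), recovery_time(curve, normal=1.0))
```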

19/29
Challenges
The operator has to select response actions based on partial indicators of compromise, such as alerts and logs.
Challenge 1: Partial observability.
Actions have to be tailored to the specific incident.
Challenge 2: Large and unstructured action space.
Delays in initiating the response can lead to costs.
Challenge 3: Time-sensitive.

20/29
Current Practice
▶Incident response is [...].
▶We have a [...]
▶Pressing need for new decision support systems!

20/29
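Experiment Setup
[Figure: experiment pipeline in three stages.
1. Fine-tuning: incidents and ground-truth responses (with chain-of-thought) define a loss $L$; the gradient $\nabla_\theta L$ updates the LLM parameters $\theta$.
2. Information retrieval: logs are used to query a knowledge base and threat intelligence, from which information is retrieved.
3. Planning: the fine-tuned model ($\theta$) uses the system architecture to generate candidate responses; recovery trajectories are used to filter hallucinations and produce the response.]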

21/29
Instruction Fine-Tuning
▶We fine-tune the [...] 68,[...].
▶Minimize the [...]
$$L = -\frac{1}{M} \sum_{i=1}^{M} \sum_{k=1}^{m_i} \ln p_\theta\left(y_k^i \mid x^i, y_1^i, \ldots, y_{k-1}^i\right),$$
where $m_i$ is the length of the vector $y^i$.
[Figure: training loss versus training time (min) for the learning rates 0.00095 and 0.000095.]
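The fine-tuning objective can be illustrated numerically. The sketch assumes the loss is the negative log-likelihood of the target tokens, summed within each example and averaged over the M examples; `instruction_loss` and the probabilities in the example are hypothetical, and in practice the per-token probabilities would come from the model.

```python
import math
from typing import Sequence


def instruction_loss(token_probs: Sequence[Sequence[float]]) -> float:
    """Average over examples of the summed negative log-probability of target tokens.

    token_probs[i][k] is the model's probability of the k-th target token of
    example i, given the instruction and the preceding target tokens.
    """
    m = len(token_probs)
    if m == 0:
        raise ValueError("need at least one example")
    total = sum(math.log(p) for seq in token_probs for p in seq)
    return -total / m


if __name__ == "__main__":
    # Two hypothetical examples with target lengths 2 and 1.
    print(instruction_loss([[0.5, 0.5], [0.25]]))
```

A model that assigns probability 1 to every target token gives a loss of 0, and the loss grows as the probabilities of the ground-truth tokens shrink.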

22/29
Retrieval-Augmented Generation
▶We use regular expressions to extract indicators of compromise (IOCs) [...]
▶e.g., IP addresses, vulnerability identifiers, etc.
▶We use the IOCs to [...] about the incident [...] threat intelligence APIs, e.g., [...].
▶We [...] the context of the LLM.
[Figure: logs are used to query a knowledge base and threat intelligence, from which information is retrieved.]
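Extracting IOCs with regular expressions can look like the sketch below. The patterns cover only IPv4 addresses and CVE identifiers, and the function names are hypothetical; the retrieval step is left out because the slides do not name the APIs that are queried.

```python
import re
from typing import Dict, List

# IPv4 address with each octet in 0..255.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b")
# CVE identifier such as CVE-2015-1427.
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_iocs(log_text: str) -> Dict[str, List[str]]:
    """Return the distinct IPv4 addresses and CVE identifiers found in the logs."""
    return {
        "ipv4": sorted(set(IPV4_RE.findall(log_text))),
        "cve": sorted({m.upper() for m in CVE_RE.findall(log_text)}),
    }


def add_to_context(prompt: str, retrieved: List[str]) -> str:
    """Append retrieved snippets to the prompt (a simple, assumed context format)."""
    return prompt + "\n\nRetrieved information:\n" + "\n".join(f"- {r}" for r in retrieved)


if __name__ == "__main__":
    logs = (
        "sshd: Failed password for root from 10.0.0.7 port 22\n"
        "alert: exploit attempt cve-2015-1427 from 192.168.1.20 and 10.0.0.7"
    )
    print(extract_iocs(logs))
```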

23/29
Experimental Evaluation
▶We evaluate our system on 4 public datasets.
Dataset            System                      Attacks
CTU-Malware-2014   Windows XP SP2 servers      Various malware and ransomware.
CIC-IDS-2017       Windows and Linux servers   Denial-of-service, web attacks, SQL injection, etc.
AIT-IDS-V2-2022    Linux and Windows servers   Multi-stage attack with reconnaissance, cracking, and escalation.
CSLE-IDS-2024      Linux servers               SambaCry, Shellshock, exploit of CVE-2015-1427, etc.
[Figure: distribution of MITRE ATT&CK tactics in the evaluation datasets, covering impact, initial access, command and control, execution, collection, lateral movement, privilege escalation, exfiltration, and reconnaissance.]

24/29
Baselines
▶We compare our system against [...]
▶Compared to the frontier models, [...].
System           Number of parameters   Context window size
our system       14 billion             128,[...]
deepseek-r1      [...]                  [...]
gemini 2.5 pro   ≥ [...]                [...]
openai o3        ≥ [...]                [...]

25/29
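Evaluation Results
[Figure: bar charts comparing the four systems on each dataset. The values below are read from the bar labels, in legend order.]
Recovery time
Dataset            our system   gemini 2.5 pro   openai o3   deepseek-r1
Average            12.02        16.21            17.28       17.09
CTU-Malware-2014   13.09        19.51            18.42       14.39
CIC-IDS-2017       11.95        13.08            12.71       13.34
AIT-IDS-V2-2022    10.82        15.53            19.09       19.19
CSLE-IDS-2024      12.21        16.71            18.9        21.42
% Ineffective actions
Dataset            our system   gemini 2.5 pro   openai o3   deepseek-r1
Average            7.62         11.12            12.26       11.99
CTU-Malware-2014   7.88         14.33            13.47       9.33
CIC-IDS-2017       7.08         8.01             7.62        8.28
AIT-IDS-V2-2022    8.47         10.47            13.99       14.05
CSLE-IDS-2024      7.06         11.66            13.82       16.3
% Failed recoveries
Dataset            our system   gemini 2.5 pro   openai o3   deepseek-r1
Average            2.5          3.3              4.21        4.48
CTU-Malware-2014   3.19         3.29             5.29        5.79
CIC-IDS-2017       4.59         7.12             7.85        7.95
AIT-IDS-V2-2022    1.77         1.93             2.12        2.79
CSLE-IDS-2024      0.44         0.81             1.59        1.39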

26/29
Ablation Study
[Figure: bar charts of recovery time with and without each component. The values below are read from the bar labels.]
Recovery time
Variant               Average   CTU-2014   CIC-2017   AIT-2022   CSLE-2024
with RAG              13.46     14.22      12.78      14.41      12.41
without RAG           14.68     15.21      13.9       15.46      14.16
with fine-tuning      13.46     14.22      12.78      14.41      12.41
without fine-tuning   25.68     24.12      21.33      29.97      27.28
with lookahead        13.46     14.22      12.78      14.41      12.41
without lookahead     20.87     17.31      16.2       25.18      24.81
with ICL              12.02     13.09      11.95      10.82      12.21
without ICL           13.46     14.22      12.78      14.41      12.41

27/29
Scalability
[Figure: compute time (sec) versus number of candidate actions (1 to 4) for the sequential implementation and the parallel implementation.]
▶The [...] it requires making multiple inferences with the LLM.
▶The computation can be parallelized across multiple GPUs.
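Because the candidate actions are evaluated independently, the evaluations can be dispatched in parallel. The sketch below uses a thread pool from the Python standard library as a stand-in for multiple GPU workers; `evaluate_sequential`, `evaluate_parallel`, and the `slow_score` stub are hypothetical, and a real deployment would route each call to a separate device or inference server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def evaluate_sequential(candidates: Sequence[str], evaluate: Callable[[str], float]) -> List[float]:
    """Evaluate the candidates one after another."""
    return [evaluate(a) for a in candidates]


def evaluate_parallel(
    candidates: Sequence[str], evaluate: Callable[[str], float], max_workers: int = 4
) -> List[float]:
    """Evaluate the candidates concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, candidates))


def slow_score(action: str) -> float:
    """Stub for an expensive evaluation, e.g. several LLM inferences (assumption)."""
    time.sleep(0.05)
    return float(len(action))


if __name__ == "__main__":
    actions = ["isolate target node", "block source IP", "reset credentials", "wait"]
    start = time.perf_counter()
    evaluate_sequential(actions, slow_score)
    mid = time.perf_counter()
    evaluate_parallel(actions, slow_score)
    end = time.perf_counter()
    print(f"sequential: {mid - start:.2f}s, parallel: {end - mid:.2f}s")
```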

28/29
Conclusion
▶Foundation models will play a key role in cybersecurity.
▶Effective at tackling the scalability challenge.
▶Remarkable knowledge management capabilities.
▶We present a [...]
▶Allows controlling the hallucination probability.
▶Significantly outperforms frontier LLMs.
[Figure: framework diagram with the labels task description, prior, lookahead, consistency, conformal abstention, action, actions, outcomes, feedback, external verification, in-context learning, and posterior.]

29/29
References
▶Paper
▶https://arxiv.org/abs/2508.05188
▶(A new paper will be released soon.)
▶Code
▶https://github.com/Limmen/csle
▶Demonstration
▶https://www.youtube.com/watch?v=XXo4Y6LCWk4
▶Data & Weights
▶https://huggingface.co/datasets/kimhammar/CSLE-IncidentResponse-V1
▶https://huggingface.co/kimhammar/LLMIncidentResponse