Automated Security with a Foundation Model

1/29
Visit to the City University of Hong Kong
October 20, 2025
Dr. Kim Hammar

2/29
Next Generation of Security Systems
[Figure: feedback loop labeled Measurements, Controls, and Learning.]
▶What role will foundation models play in the next generation of security systems?

3/29
Different Types of Foundation Models
▶Based on the transformer architecture.
▶Trained on.
▶Billions of parameters.
▶Examples:
▶Large language models (e.g., DeepSeek).
▶Time series models (e.g., Chronos).
▶Speech and audio models (e.g., Whisper).
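▶Multi-modal models (e.g., Sora).
[Figure: transformer architecture. Inputs pass through an input embedding with positional encoding, then through L stacked blocks of masked multi-head attention, add & norm, feed forward, and add & norm, followed by linear and softmax layers that produce the output probabilities.]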

3/29
Autonomous Security Systems
[Figure: feedback loop labeled Measurements, Controls, and Learning.]
▶Systems with.
▶Responds to threats and incidents autonomously.
▶Longstanding goal

4/29
Methodology
[Figure: methodology diagram. Components: Target System, Emulation System, and Simulation System (Mathematical Model & Optimization). Processes: Selective Replication, System Identification, Strategy Evaluation & Model Estimation, Strategy Mapping (π), Strategy Implementation, and Automatic Control. One step is annotated "Becomes a bottleneck".]
We use

5/29
Outline
▶Automated security with a foundation model
  ▶Overview of our framework.
▶Theoretical analysis
  ▶Controlling the hallucination bound.
  ▶Regret bound.
▶Case study: Incident Response
  ▶Comparison with frontier models.

6/29
Automated Security with a Foundation Model
[Figure: framework overview with the labels task description, prior, posterior, actions, lookahead, outcomes, consistency, conformal abstention, action, external verification, feedback, and in-context learning.]
▶We use the
▶We evaluate actions through lookahead.
▶We detect likely hallucinations by evaluating consistency.
▶Abstain from actions with low consistency.
▶Refine actions via in-context learning.

7/29
Generating Candidate Actions
▶Generate
▶Can think of the LLM as a base strategy.
[Figure: the prompt "root account lost on node" is split by the tokenizer into the tokens "root", "account", "lost", "on", "node" and mapped to embeddings. The large language model's output layer over the vocabulary then produces the response tokens "isolate", "target", "node", <eos>.]

8/29
Lookahead Simulation
[Figure: lookahead tree over the candidate actions $a_0^1$, $a_0^2$, $a_0^3$.]
▶For each candidate action $a_t^i$, we simulate subsequent states and actions.
▶We
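A minimal sketch of such a lookahead rollout is given here, purely as an illustration. The helpers `predict_state` and `predict_action` are hypothetical stand-ins for the model that predicts what happens next, and the integer state (number of compromised nodes) and the action names are toy assumptions, not part of the talk.

```python
from typing import Callable, List, Tuple

State = int   # toy assumption: number of compromised nodes
Action = str  # toy assumption: free-text action name


def lookahead(
    state: State,
    candidate: Action,
    predict_state: Callable[[State, Action], State],
    predict_action: Callable[[State], Action],
    horizon: int,
) -> List[Tuple[Action, State]]:
    """Roll out a candidate action for `horizon` steps.

    Returns the simulated (action, resulting state) pairs.
    """
    trajectory = []
    action = candidate
    for _ in range(horizon):
        state = predict_state(state, action)  # simulated next state
        trajectory.append((action, state))
        action = predict_action(state)        # simulated follow-up action
    return trajectory


# Hypothetical predictors standing in for the model's predictions.
def toy_predict_state(state: State, action: Action) -> State:
    return max(state - 1, 0) if action == "isolate" else state + 1


def toy_predict_action(state: State) -> Action:
    return "isolate"


rollout = lookahead(2, "wait", toy_predict_state, toy_predict_action, horizon=3)
print(rollout)
```

Running one rollout per candidate action yields a set of simulated trajectories that can then be compared.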

9/29
Evaluating the Consistency
▶We use.
[Figure: a large language model whose outputs are marked as self-inconsistent.]

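10/29
Abstaining
▶Let $\lambda(a) \in [0,1]$ denote the consistency of a given action $a$.
▶We use this function to abstain from actions with low consistency, as expressed by the following decision rule:
$$\rho_\gamma(a_t) = \begin{cases} 1 \text{ (abstain)}, & \text{if } \lambda(a_t) \le \gamma, \\ 0 \text{ (not abstain)}, & \text{if } \lambda(a_t) > \gamma, \end{cases}$$
where $\gamma \in [0,1]$ is the consistency threshold.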
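The decision rule can be sketched in a few lines of Python. This assumes that an action is kept only when its consistency score strictly exceeds the threshold (the "λ > γ" test shown in the summary figure) and that both quantities lie in [0, 1]. The function name `abstain` is illustrative.

```python
def abstain(consistency: float, gamma: float) -> int:
    """Return 1 (abstain) if the consistency is at most gamma, else 0."""
    if not 0.0 <= consistency <= 1.0:
        raise ValueError("consistency must lie in [0, 1]")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return 1 if consistency <= gamma else 0


print(abstain(0.3, gamma=0.5))  # low consistency: abstain
print(abstain(0.8, gamma=0.5))  # high consistency: keep the action
```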

11/29
In-Context Learning
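If an action does not meet the consistency threshold, we abstain from it and
select a new action through in-context learning.
[Figure: the target system is mirrored by selective replication in a digital twin (virtual network, virtual devices, emulated services, emulated actors). Given the context and state, an action is evaluated in the digital twin, which returns feedback.]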

12/29
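Summary
[Figure: summary of the framework. Logs & alerts from the networked system form the context of the large language model, which produces candidate actions $a^1, a^2, \ldots, a^N$ via chain-of-thoughts. The candidates go through lookahead and consistency evaluation, and conformal abstention compares the consistency against the threshold ($\lambda > \gamma$) before an action is applied to the networked system. External verification provides feedback.]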
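The loop in the summary can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `generate`, `consistency`, and `verify` are hypothetical callables standing in for the LLM, the consistency evaluation, and the external verification, and picking the most consistent candidate is an assumption made for the example.

```python
from typing import Callable, List, Optional, Tuple


def plan_action(
    context: List[str],
    generate: Callable[[List[str]], List[str]],
    consistency: Callable[[str], float],
    verify: Callable[[str], str],
    gamma: float,
    max_iterations: int,
) -> Tuple[Optional[str], List[str]]:
    """Generate candidates, abstain on low consistency, and retry with feedback."""
    for _ in range(max_iterations):
        candidates = generate(context)
        best = max(candidates, key=consistency)
        if consistency(best) > gamma:
            return best, context            # consistent enough: act
        context = context + [verify(best)]  # abstain and add feedback to the context
    return None, context                    # abstained in every iteration


# Toy stand-ins (hypothetical values, for illustration only).
SCORES = {"reboot all": 0.2, "isolate node": 0.9}


def toy_generate(context: List[str]) -> List[str]:
    has_feedback = any(item.startswith("feedback") for item in context)
    return ["reboot all", "isolate node"] if has_feedback else ["reboot all"]


def toy_verify(action: str) -> str:
    return f"feedback: '{action}' was rejected"


action, final_context = plan_action(
    ["alert: root account lost on node"],
    toy_generate, SCORES.get, toy_verify, gamma=0.5, max_iterations=3,
)
print(action, len(final_context))
```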

14/29
Conformal Abstention
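Let $\{a_i\}_{i=1}^n$ be a calibration dataset of hallucinated actions.
Proposition 1
▶Assume the actions in the calibration dataset $\{a_i\}_{i=1}^n$ are i.i.d.
▶Let $\tilde{a}$ be a hallucinated action drawn from the same distribution.
▶Let $\kappa \in [0,1]$ be the desired upper bound on the hallucination probability.
Define the threshold
$$\tilde{\gamma} = \inf\left\{\gamma \;\middle|\; \frac{|\{i \mid \lambda(a_i) \le \gamma\}|}{n} \ge \frac{\lceil (n+1)(1-\kappa) \rceil}{n}\right\}.$$
We have
$$\mathbb{P}(\text{not abstain from } \tilde{a}) \le \kappa.$$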
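As an illustration, a threshold of this kind can be computed as an empirical quantile of the calibration scores. The sketch below assumes the standard split-conformal reading, in which the threshold is the ⌈(n+1)(1−κ)⌉-th smallest consistency score in the calibration set. The name `kappa` for the target probability and the function name are assumptions made for the example.

```python
import math
from typing import Sequence


def conformal_threshold(calibration_scores: Sequence[float], kappa: float) -> float:
    """Smallest gamma such that at least ceil((n+1)(1-kappa)) scores are <= gamma."""
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [0, 1]")
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - kappa))
    if k <= 0:
        return float("-inf")  # every gamma satisfies the condition
    if k > n:
        return float("inf")   # no finite gamma satisfies the condition
    return sorted(calibration_scores)[k - 1]


# Hypothetical consistency scores of n = 7 calibration actions.
scores = [0.4, 0.05, 0.9, 0.2, 0.6, 0.1, 0.3]
gamma = conformal_threshold(scores, kappa=0.25)
print(gamma)
```

With few calibration samples and a small `kappa`, the rank exceeds n and the threshold becomes infinite, which means abstaining from every action.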

15/29
Regret Bound
Proposition 2 (Informal)
▶Let $K$ denote the
▶Assume that the
posterior
▶Assume bandit feedback.
We have
RK≤
q
|A|K
where C
is the number of ICL iterations.

17/29
Use Case: Incident Response
[Figure: feedback loop in which the security alerts and state at time t are fed to a learned response strategy that selects the response actions at time t.]
▶Problem: 0,1, . . .
secure and operational state after a cyberattack.

18/29
Response Objective
[Figure: system performance over time, showing normal performance, the tolerance level, the intrusion event, the time of full recovery, the recovery time, survivability, and the loss. The cumulative performance loss is the quantity we want to minimize.]

19/29
Challenges
The operator has to select response actions based on partial indicators of compromise, such as alerts and logs.
Challenge 1: Partial observability.
Actions have to be tailored to the specific incident.
Challenge 2: Large and unstructured action space.
Delays in initiating the response can lead to costs.
Challenge 3: Time-sensitive.

20/29
Current Practice
▶Incident response is.
▶We have a
▶Pressing need for new decision support systems!

20/29
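Experiment Setup
[Figure: three-stage pipeline. 1. Fine-tuning: incidents and ground-truth responses give the loss $L$, whose gradient $\nabla_\theta L$ updates the LLM parameters $\theta$ to the fine-tuned $\theta'$. 2. Information retrieval: logs are used to query a knowledge base and retrieve threat intelligence. 3. Planning: using the system architecture, the fine-tuned model generates candidate responses with recovery trajectories, filters hallucinations, and outputs a response.]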

21/29
Instruction Fine-Tuning
▶We fine-tune the
68,.
▶Minimize the cross-entropy loss
$$L = -\frac{1}{M} \sum_{i=1}^{M} \sum_{k=1}^{m_i} \ln p_\theta\left(y_k^i \mid x^i, y_1^i, \ldots, y_{k-1}^i\right),$$
where $m_i$ is the length of the vector $y^i$.
[Figure: training loss versus training time (min) for learning rates 0.00095 and 0.000095.]
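To make the objective concrete, the toy sketch below evaluates a loss of this form from per-token probabilities. It assumes the loss is the negative log-likelihood of each target token given the instruction and the preceding target tokens, averaged over the M training examples. The probabilities are made-up numbers, not model outputs.

```python
import math
from typing import List


def instruction_tuning_loss(token_probs: List[List[float]]) -> float:
    """Negative log-likelihood, averaged over the M examples.

    token_probs[i][k] is the probability the model assigns to the k-th
    target token of example i, given the instruction and the previous
    target tokens.
    """
    num_examples = len(token_probs)
    total_log_prob = sum(math.log(p) for example in token_probs for p in example)
    return -total_log_prob / num_examples


# Two toy examples with target lengths 2 and 1 (hypothetical values).
loss = instruction_tuning_loss([[0.5, 0.5], [1.0]])
print(round(loss, 4))
```

In practice this quantity is computed from the model's logits by a deep learning framework; the sketch only spells out the double sum.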

22/29
Retrieval-Augmented Generation
▶We use regular expressions to extract indicators of compromise (IOCs),
▶e.g., IP addresses, vulnerability identifiers, etc.
▶We use the IOCs to retrieve information about the incident from threat intelligence APIs.
▶We add the retrieved information to the context of the LLM.
[Figure: logs are used to query a knowledge base and retrieve threat intelligence.]
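A minimal sketch of regex-based extraction for two kinds of indicators is shown below. The patterns, the function name `extract_iocs`, and the log line are illustrative assumptions. The IPv4 pattern is deliberately simple and does not check that each octet is at most 255.

```python
import re
from typing import Dict, List

# Simplified patterns (assumptions for illustration).
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b")


def extract_iocs(log_text: str) -> Dict[str, List[str]]:
    """Return the unique IP addresses and CVE identifiers found in the text."""
    return {
        "ip_addresses": sorted(set(IPV4_PATTERN.findall(log_text))),
        "cve_ids": sorted(set(CVE_PATTERN.findall(log_text))),
    }


log = (
    "Failed login from 10.0.0.5; exploit attempt CVE-2015-1427 "
    "from 192.168.1.20 and 10.0.0.5"
)
print(extract_iocs(log))
```

The extracted values could then be used as query keys against a threat intelligence source, and the returned descriptions appended to the prompt.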

23/29
Experimental Evaluation
▶We evaluate our system on 4 public datasets.
Dataset | System | Attacks
CTU-Malware-2014 | Windows XP SP2 servers | Various malware and ransomware.
CIC-IDS-2017 | Windows and Linux servers | Denial-of-service, web attacks, SQL injection, etc.
AIT-IDS-V2-2022 | Linux and Windows servers | Multi-stage attack with reconnaissance, cracking, and escalation.
CSLE-IDS-2024 | Linux servers | SambaCry, Shellshock, exploit of CVE-2015-1427, etc.
[Figure: distribution of MITRE ATT&CK tactics in the evaluation datasets. Tactics shown: impact, initial access, command and control, execution, collection, lateral movement, privilege escalation, exfiltration, and reconnaissance.]

24/29
Baselines
▶We compare our system against frontier LLMs.
▶Compared to the frontier models,.
System | Number of parameters | Context window size
our system | 14 billion | 128,
deepseek-r1 | ,
gemini 2.5 pro | ≥
openai o3 | ≥ ,

25/29
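Evaluation Results
[Figure: bar charts of recovery time, % ineffective actions, and % failed recoveries for each system, on average and per dataset. The plotted values are listed below.]
Dataset | Metric | our system | gemini 2.5 pro | openai o3 | deepseek-r1
Average | Recovery time | 12.02 | 16.21 | 17.28 | 17.09
Average | % Ineffective actions | 7.62 | 11.12 | 12.26 | 11.99
Average | % Failed recoveries | 2.5 | 3.3 | 4.21 | 4.48
CTU-Malware-2014 | Recovery time | 13.09 | 19.51 | 18.42 | 14.39
CTU-Malware-2014 | % Ineffective actions | 7.88 | 14.33 | 13.47 | 9.33
CTU-Malware-2014 | % Failed recoveries | 3.19 | 3.29 | 5.29 | 5.79
CIC-IDS-2017 | Recovery time | 11.95 | 13.08 | 12.71 | 13.34
CIC-IDS-2017 | % Ineffective actions | 7.08 | 8.01 | 7.62 | 8.28
CIC-IDS-2017 | % Failed recoveries | 4.59 | 7.12 | 7.85 | 7.95
AIT-IDS-V2-2022 | Recovery time | 10.82 | 15.53 | 19.09 | 19.19
AIT-IDS-V2-2022 | % Ineffective actions | 8.47 | 10.47 | 13.99 | 14.05
AIT-IDS-V2-2022 | % Failed recoveries | 1.77 | 1.93 | 2.12 | 2.79
CSLE-IDS-2024 | Recovery time | 12.21 | 16.71 | 18.9 | 21.42
CSLE-IDS-2024 | % Ineffective actions | 7.06 | 11.66 | 13.82 | 16.3
CSLE-IDS-2024 | % Failed recoveries | 0.44 | 0.81 | 1.59 | 1.39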

26/29
Ablation Study
[Figure: bar charts of recovery time with and without each component, on average and per dataset. The plotted values are listed below.]
Recovery time | Average | CTU-2014 | CIC-2017 | AIT-2022 | CSLE-2024
with RAG | 13.46 | 14.22 | 12.78 | 14.41 | 12.41
without RAG | 14.68 | 15.21 | 13.9 | 15.46 | 14.16
with fine-tuning | 13.46 | 14.22 | 12.78 | 14.41 | 12.41
without fine-tuning | 25.68 | 24.12 | 21.33 | 29.97 | 27.28
with lookahead | 13.46 | 14.22 | 12.78 | 14.41 | 12.41
without lookahead | 20.87 | 17.31 | 16.2 | 25.18 | 24.81
with ICL | 12.02 | 13.09 | 11.95 | 10.82 | 12.21
without ICL | 13.46 | 14.22 | 12.78 | 14.41 | 12.41

27/29
Scalability
[Figure: compute time (sec) versus number of candidate actions for the sequential implementation and the parallel implementation.]
▶The
it requires making multiple inferences with the LLM.
▶The computation can be parallelized across multiple GPUs.
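The sketch below illustrates the idea of evaluating candidate actions in parallel rather than one after another. It uses a thread pool from the Python standard library as a stand-in for multiple GPUs, and `evaluate_candidate` is a hypothetical placeholder for one LLM inference. None of this is taken from the authors' code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def evaluate_candidate(action: str) -> int:
    """Placeholder for one LLM inference (hypothetical): sleeps, then scores."""
    time.sleep(0.01)
    return len(action)


def evaluate_sequential(actions: List[str]) -> List[int]:
    return [evaluate_candidate(a) for a in actions]


def evaluate_parallel(actions: List[str], workers: int = 4) -> List[int]:
    # pool.map returns the results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate_candidate, actions))


candidates = ["isolate node", "reset password", "block ip", "restore backup"]
print(evaluate_parallel(candidates) == evaluate_sequential(candidates))
```

Because the candidate evaluations are independent of each other, the parallel version returns the same results while the wall-clock time grows more slowly with the number of candidates.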

28/29
Conclusion
▶Foundation models will play a key role in cybersecurity.
▶Effective at tackling the scalability challenge.
▶Remarkable knowledge management capabilities.
▶We present a
▶Allows controlling the hallucination probability.
▶Significantly outperforms frontier LLMs.

29/29
References
▶Paper
▶https://arxiv.org/abs/2508.05188
▶(A new paper will be released soon.)
▶Code
▶https://github.com/Limmen/csle
▶Demonstration
▶https://www.youtube.com/watch?v=XXo4Y6LCWk4
▶Data & Weights
▶https://huggingface.co/datasets/kimhammar/CSLE-IncidentResponse-V1
▶https://huggingface.co/kimhammar/LLMIncidentResponse
