Log-based anomaly detection using BiLSTM-Autoencoder


About This Presentation

Log-based anomaly detection aims to identify abnormal system behaviors, which are frequent early indicators of system failures, by analyzing log data generated during system execution. Automating this process through machine learning is essential, as the complexity of modern systems makes manual log inspection impractical.


Slide Content

Log-based anomaly detection using BiLSTM-Autoencoder
Mohammed Bekkouche¹, Melissa Meski¹, Yousra Khodja¹, Sidi Mohammed Benslimane¹, Enrico Tronci²
¹ LabRI-SBA Laboratory, École Supérieure en Informatique, Sidi Bel Abbes, Algeria
² Computer Science Department, Sapienza University of Rome, Italy
ICNAS 2025, October 29-30, 2025

Structure
1. Introduction
2. Approach for Log-Based Anomaly Detection
3. Evaluation
4. Related Work
5. Conclusion

Introduction
Context and Problem Statement
Modern computing systems rely on large-scale distributed architectures composed of thousands of machines.
These systems generate massive log data, essential for monitoring, diagnosis, and maintenance.
Logs capture events and internal states, offering insights into system behavior.
Manual inspection is infeasible due to log volume and complexity.
Machine learning enables automated anomaly detection, improving accuracy and scalability.

Introduction
Motivations and Contributions
Reconstruction-based methods (autoencoders) are effective in an unsupervised setting:
Normal sequences are reconstructed with low error.
Anomalous sequences yield high reconstruction error.
Notable works: DeepLog and LogAnomaly.
Both use LSTM-based autoencoders to model sequential dependencies in logs and are suited for real-time systems where future context is unavailable.
We evaluate on the HDFS (Hadoop Distributed File System) dataset, widely used for evaluating log-based anomaly detection approaches.
Our proposal: unlike DeepLog and LogAnomaly, which rely on unidirectional LSTMs, our BiLSTM autoencoder leverages both past and future context, enabling richer sequence modeling and more accurate anomaly detection.
Results on the HDFS dataset confirm this advantage.

Approach for Log-Based Anomaly Detection
Log Parsing: identifying constant patterns and extracting variable parameters.
Input: raw log messages. Output: log templates (log events) and extracted variable parameters.
Feature Extraction:
- Identifier-based partitioning.
- Converting log sequences into feature vectors (using TF-IDF or Word2Vec).
Input: partitioned log sequences. Output: feature vectors.
Anomaly Detection: identifying abnormal log sequences.
Input: feature vectors and a machine learning model. Output: a label (anomalous or not) for each vector.
Figure: The three-stage log-based anomaly detection framework.

Approach for Log-Based Anomaly Detection
Log-based Anomaly Detection Framework
Log Parsing
Transforms raw, unstructured log messages into structured log templates (sketched below).
Separates variable components from constants.
Example (HDFS dataset):
Raw log: ...789 of size 67108864 from /10.251.42.84
Parsed template: <*> of size <*> from <*>
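
To make the parsing step concrete, here is a minimal Python sketch of template extraction. The presentation does not name a specific parser, so simple regular expressions stand in for dedicated tools such as Drain; the example log line and the choice of masked fields are illustrative assumptions.

    import re

    # Patterns for the variable parts of an HDFS-style log line; masking block ids,
    # IP addresses, and remaining numbers is an assumption for this sketch.
    VARIABLE_PATTERNS = [
        (re.compile(r"blk_-?\d+"), "<*>"),                    # block identifiers
        (re.compile(r"/?\d+\.\d+\.\d+\.\d+(:\d+)?"), "<*>"),  # IP addresses, optional port
        (re.compile(r"\b\d+\b"), "<*>"),                      # remaining numeric parameters
    ]

    def parse_log_line(line: str) -> str:
        """Return the constant template with variable parameters replaced by <*>."""
        template = line
        for pattern, placeholder in VARIABLE_PATTERNS:
            template = pattern.sub(placeholder, template)
        return template

    # Hypothetical usage:
    # parse_log_line("Received block blk_123 of size 67108864 from /10.251.42.84")
    # -> "Received block <*> of size <*> from <*>"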

Approach for Log-Based Anomaly Detection
Log-based Anomaly Detection Framework
Feature Extraction
Encodes raw logs into numerical vectors for ML models.
Logs are grouped into sequences by identifier before vectorization.
Techniques:
TF-IDF: sparse vectors reflecting event importance (see the sketch below).
Word2Vec: dense vectors capturing semantic similarity.
Anomaly Detection
Performed on the feature vectors.
Models predict whether a sequence is normal or abnormal.
Techniques range from classical ML to deep learning:
Classical: SVM, Random Forest, Decision Trees.
Clustering: K-means, DBSCAN.
Deep learning: Autoencoders, LSTMs, Transformers.
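
As a concrete illustration of the TF-IDF option, a minimal scikit-learn sketch follows; the event-ID sequences are hypothetical placeholders, and real sequences would come from grouping parsed logs by identifier.

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Hypothetical event-ID sequences, one string per log session/block.
    sequences = [
        "E5 E22 E11 E9 E11 E9 E26 E26",
        "E5 E22 E11 E9 E26",
    ]

    # Sparse TF-IDF vectors: one row per sequence, one column per event type.
    vectorizer = TfidfVectorizer(token_pattern=r"\S+")
    X = vectorizer.fit_transform(sequences)
    print(X.shape)  # (number of sequences, number of distinct event types)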

Approach for Log-Based Anomaly Detection
Autoencoder Models for Anomaly Detection
We utilize an LSTM autoencoder as the baseline reconstruction model.
We propose an enhanced version using a bidirectional LSTM (BiLSTM).
The following describes both models:
1. LSTM Autoencoder
2. BiLSTM Autoencoder

Approach for Log-Based Anomaly Detection
LSTM Autoencoder
An autoencoder variant designed for sequential data.
This approach uses an LSTM encoder-decoder autoencoder for anomaly detection (sketched below).
Learns a compressed latent representation of normal log sequences.
High reconstruction error signals an anomalous sequence.
Applied in log anomaly detection by:
DeepLog
LogAnomaly
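
Below is a minimal PyTorch sketch of such an LSTM encoder-decoder; the class name, layer sizes, and the repeat-latent decoding scheme are illustrative assumptions rather than the exact architecture used in the experiments.

    import torch
    import torch.nn as nn

    class LSTMAutoencoder(nn.Module):
        """Encode a sequence of feature vectors and reconstruct it from the latent state."""
        def __init__(self, input_dim: int, latent_dim: int = 64):
            super().__init__()
            self.encoder = nn.LSTM(input_dim, latent_dim, batch_first=True)
            self.decoder = nn.LSTM(latent_dim, latent_dim, batch_first=True)
            self.output = nn.Linear(latent_dim, input_dim)

        def forward(self, x):                       # x: (batch, seq_len, input_dim)
            _, (h, _) = self.encoder(x)             # final hidden state summarizes the sequence
            latent = h[-1]                          # compressed latent representation
            repeated = latent.unsqueeze(1).repeat(1, x.size(1), 1)
            decoded, _ = self.decoder(repeated)
            return self.output(decoded)             # reconstruction of x

    # The per-sequence reconstruction error (e.g., mean squared error) is the anomaly signal:
    # errors = torch.mean((model(x) - x) ** 2, dim=(1, 2))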

Approach for Log-Based Anomaly Detection
BiLSTM Autoencoder
Encoder: BiLSTM captures past & future dependencies.
Latent vector (bottleneck): compressed representation of the sequence.
Decoder: LSTM or BiLSTM reconstructs the original input.
Training on normal logs yields accurate reconstruction.
Anomalous logs yield high reconstruction error.
Figure: BiLSTM autoencoder for log-based anomaly detection. Input sequence (TF-IDF / Word2Vec) -> BiLSTM encoder (forward & backward passes) -> latent vector -> LSTM or BiLSTM decoder (backward LSTM optional) -> reconstructed sequence; the reconstruction error E is compared to a threshold to yield an anomaly score or label.
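
Under the same assumptions as the LSTM sketch above, the variant below makes the encoder bidirectional so that the latent vector summarizes both the forward and backward passes; the decoder is kept unidirectional here, one of the two options the slide mentions.

    import torch
    import torch.nn as nn

    class BiLSTMAutoencoder(nn.Module):
        """BiLSTM encoder + LSTM decoder; dimensions are illustrative assumptions."""
        def __init__(self, input_dim: int, latent_dim: int = 64):
            super().__init__()
            self.encoder = nn.LSTM(input_dim, latent_dim, batch_first=True,
                                   bidirectional=True)
            self.decoder = nn.LSTM(2 * latent_dim, latent_dim, batch_first=True)
            self.output = nn.Linear(latent_dim, input_dim)

        def forward(self, x):                          # x: (batch, seq_len, input_dim)
            _, (h, _) = self.encoder(x)                # h: (2, batch, latent_dim)
            latent = torch.cat([h[0], h[1]], dim=-1)   # concatenate forward & backward states
            repeated = latent.unsqueeze(1).repeat(1, x.size(1), 1)
            decoded, _ = self.decoder(repeated)
            return self.output(decoded)                # reconstruction of x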

Approach for Log-Based Anomaly Detection
Threshold Optimization (F1-Sensitive Tuning)
The decision is based on the reconstruction error E.
Instead of a fixed threshold:
Use a small labeled validation set.
Optimize the threshold to maximize the F1-score (sketched below).
Balances:
Precision (avoid false positives).
Recall (avoid missing anomalies).
Improves robustness and practicality with limited labels.
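
A minimal sketch of the F1-sensitive tuning is given below; sweeping every observed error value as a candidate threshold is an illustrative choice, not necessarily the authors' exact procedure.

    import numpy as np
    from sklearn.metrics import f1_score

    def tune_threshold(errors: np.ndarray, labels: np.ndarray) -> float:
        """Pick the reconstruction-error threshold that maximizes F1 on a small
        labeled validation set (labels: 1 = anomalous, 0 = normal)."""
        best_threshold, best_f1 = float(errors.min()), 0.0
        for t in np.unique(errors):
            preds = (errors > t).astype(int)          # flag sequences whose error exceeds t
            score = f1_score(labels, preds, zero_division=0)
            if score > best_f1:
                best_threshold, best_f1 = float(t), score
        return best_threshold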

Evaluation
Experiment Setting
Dataset: HDFS logs.
Logs grouped into sequences using the block id (grouping sketched below).
This produced labeled log sequences, a portion of which are anomalous.
Two dataset versions:
Reduced HDFS: a subset of the logs.
Complete HDFS: the full log dataset.
Split strategies: Sequential Split (SS) and Uniform Split (US).
All experiments used the hardware and software setup described on the next slide.
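
For illustration, the identifier-based grouping can be sketched as below, assuming each raw HDFS line contains a blk_<id> token and has already been mapped to an event ID; the helper name and input format are hypothetical.

    import re
    from collections import defaultdict

    BLOCK_ID = re.compile(r"blk_-?\d+")

    def group_by_block(parsed_lines):
        """parsed_lines: iterable of (raw_line, event_id) pairs.
        Returns a mapping from block id to its ordered event sequence."""
        sessions = defaultdict(list)
        for raw_line, event_id in parsed_lines:
            match = BLOCK_ID.search(raw_line)
            if match:                                  # skip lines without a block identifier
                sessions[match.group()].append(event_id)
        return sessions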

Evaluation
Baselines, Metrics, and Setup
Baselines:
Unsupervised: PCA, Isolation Forest, Invariants Mining.
Supervised: Decision Tree, SVM, DT+SVM.
Deep Learning: DeepLog, LogAnomaly.
Evaluation Metrics:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-Score = Harmonic mean of Precision and Recall
Experimental Setup:
Intel Core i5-1135G7, 8 GB RAM, Ubuntu 22.04.
Implemented in Python with scikit-learn and PyTorch.
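
The metrics above follow directly from the confusion counts, as in this small sketch (the helper function is illustrative):

    def precision_recall_f1(tp: int, fp: int, fn: int):
        """Precision, Recall, and F1 exactly as defined on the slide."""
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1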

Evaluation
Results on the Reduced HDFS Dataset
Table: Results on the reduced HDFS dataset.

Approach         Precision  Recall  F1-Score
PCA              0.966      0.363   0.528
iForest          0.985      0.427   0.596
IM               0.967      0.561   0.710
DT               0.990      0.427   0.596
SVM              0.987      0.412   0.578
DT+SVM           0.986      0.433   0.602
LSTM+AE          0.714      0.764   0.737
BiLSTM+AE        0.851      0.949   0.898
LSTM+AE+W2V      0.761      0.949   0.844
BiLSTM+AE+W2V    0.920      0.955   0.938

Evaluation
Discussion – Reduced HDFS Results
The LSTM autoencoder achieves a higher F1-score than the classical unsupervised and supervised ML baselines.
Word2Vec features further improve results by capturing semantic similarities.
The BiLSTM autoencoder outperforms the LSTM autoencoder with both TF-IDF and Word2Vec features.
Best configuration: BiLSTM+AE+W2V (F1 = 0.938).
Gains explained by:
Temporal modeling of log sequences (LSTM/BiLSTM).
Bidirectional context (BiLSTM) captures past & future dependencies.
Semantic embeddings (Word2Vec) provide richer feature representations.

Evaluation
Results on the Complete HDFS Dataset
Table: Results on the complete HDFS dataset.

Approach         Precision  Recall  F1-Score
PCA              0.998      0.688   0.814
iForest          0.983      0.703   0.820
IM               0.880      0.950   0.910
DeepLog          0.920      0.910   0.910
LogAnomaly       0.960      0.940   0.950
BiLSTM+AE        0.983      0.948   0.965
BiLSTM+AE+W2V    0.987      1.000   0.993

Evaluation
Discussion – Complete HDFS Results
BiLSTM+AE+Word2Vec achieves the best performance (F1 = 0.993):
Higher accuracy than BiLSTM+AE alone.
Surpasses DeepLog and LogAnomaly.
Accuracy on the complete dataset is higher than on the reduced dataset.
Key advantages:
Bidirectional modeling: captures both past & future dependencies.
Semantic embeddings (Word2Vec): richer feature representation.
More training data: broader exposure to normal patterns improves generalization.

Related Work
Related Work – Paradigms
Machine learning methods for log-based anomaly detection fall into two main categories:
Reconstruction-based methods:
Train models to reproduce normal log sequences.
Low reconstruction error indicates normal behavior.
High reconstruction error indicates an anomaly.
Assumption: models trained only on normal logs fail on anomalous ones.
Binary classification-based methods:
Models output probability scores for anomaly detection.
Single-output: a single score indicating how anomalous a sequence is.
Dual-output: one score per class (normal vs. anomalous).

Related Work
Related Work – Our Contribution
Existing reconstruction-based methods (e.g., DeepLog, LogAnomaly):
Rely on unidirectional modeling of log sequences.
Use sparse frequency-based features (e.g., TF-IDF).
Our approach:
Introduces a BiLSTM autoencoder:
Captures both past and future dependencies.
Enhances contextual understanding of logs.
Employs Word2Vec embeddings:
Capture semantic similarities between log messages.
Provide richer input representations than TF-IDF.

Conclusion
System logs are key for understanding system behavior and detecting issues.
Machine learning advances log-based anomaly detection by automating analysis and improving accuracy.
LSTM autoencoders are effective in real-time settings with only past context available.
The proposed BiLSTM autoencoder captures both forward and backward dependencies, enhancing anomaly detection on static log datasets.
Future directions:
Leverage Transformers for deeper log pattern understanding.
Combine deep learning with clustering for stronger detection.
Develop real-time anomaly detection for faster, more reliable responses.

Conclusion
Thank You
Questions?