Log-based anomaly detection using BiLSTM-Autoencoder
Mohammed Bekkouche
21 slides
Oct 28, 2025
About This Presentation
Log-based anomaly detection aims to identify abnormal system behaviors, frequent early indicators of system failures, by analyzing log data generated during system execution. Automating this process through machine learning is essential, as the complexity of modern systems makes manual log inspection extremely challenging. Among the most promising solutions are autoencoder-based techniques, which learn to reconstruct normal log patterns and detect irregularities as anomalies. However, traditional autoencoders typically process log data in a unidirectional (forward) manner, which is suitable for real-time systems but may not adequately exploit contextual dependencies in fixed log datasets.
In this article, we propose to use a BiLSTM autoencoder model for log-based anomaly detection. This variant extends the standard autoencoder by incorporating Bidirectional Long Short-Term Memory (BiLSTM) layers, enabling the model to capture both forward and backward dependencies in log sequences. Such a design is particularly suitable for non-real-time analysis of fixed datasets, such as HDFS (Hadoop Distributed File System) logs, where real-time constraints are relaxed.
To optimize detection precision and recall, and thus the F1-Score, we introduce a threshold selection strategy based on F1-sensitive tuning, which systematically adjusts the anomaly threshold to maximize the F1-Score, resulting in enhanced detection performance.
Experimental evaluation demonstrates that our model outperforms traditional machine learning methods and achieves higher accuracy than two notable deep learning-based approaches: DeepLog and LogAnomaly. Furthermore, using Word2Vec compared to TF-IDF (Term Frequency–Inverse Document Frequency) for log representation further enhances the model’s performance, yielding improved detection accuracy.
Slide Content
Log-based anomaly detection using BiLSTM-Autoencoder

Mohammed Bekkouche¹, Melissa Meski¹, Yousra Khodja¹, Sidi Mohammed Benslimane¹, Enrico Tronci²

¹ LabRI-SBA Laboratory, École Supérieure en Informatique, Sidi Bel Abbes, Algeria
² Computer Science Department, Sapienza University of Rome, Italy
ICNAS 2025
October 29-30, 2025
Mohammed Bekkouche (ESI-SBA) | Log-based anomaly detection | ICNAS 2025
Structure
1. Introduction
2. Approach for Log-Based Anomaly Detection
3. Evaluation
4. Related Work
5. Conclusion
Introduction
Context and Problem Statement
- Modern computing systems rely on large-scale distributed architectures composed of thousands of machines.
- These systems generate massive log data, essential for monitoring, diagnosis, and maintenance.
- Logs capture events and internal states, offering insights into system behavior.
- Manual inspection is infeasible due to log volume and complexity.
- Machine learning enables automated anomaly detection, improving accuracy and scalability.
Motivations and Contributions
- Reconstruction-based methods (autoencoders) are effective in an unsupervised setting:
  - Normal sequences: low reconstruction error.
  - Anomalous sequences: high reconstruction error.
- Notable works: DeepLog and LogAnomaly. Both use LSTM-based autoencoders to model sequential dependencies in logs and are suited for real-time systems where future context is unavailable.
- The HDFS (Hadoop Distributed File System) dataset is widely used for evaluating log-based anomaly detection approaches.
- Our proposal:
  - Unlike DeepLog and LogAnomaly, which rely on unidirectional LSTMs, our BiLSTM autoencoder leverages both past and future context, enabling richer sequence modeling and more accurate anomaly detection.
  - Results on the HDFS dataset confirm this advantage.
Approach for Log-Based Anomaly Detection
Figure: the three-step detection pipeline.
- Log Parsing: identifies constant patterns and extracts variable parameters. Output: log events (templates) and the extracted variable parameters.
- Feature Extraction: identifier-based partitioning, then conversion of log sequences into feature vectors (using TF-IDF or Word2Vec). Output: feature vectors.
- Anomaly Detection: a machine learning model identifies abnormal log sequences. Output: a label (anomalous or not) for each vector.
Log-based Anomaly Detection Framework
Log Parsing
- Transforms raw log messages into structured log events.
- Separates variable components from constants.
- Example (HDFS dataset):
  Raw log: 789 of size 67108864 from /10.251.42.84
  Parsed template: the constant text is kept and each variable field is replaced by a placeholder.
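As a rough sketch of this step (the regex and the example message below are illustrative, not the parser actually used in the experiments):

```python
import re

def parse_log(message):
    """Split a raw log message into its constant template and its
    variable parameters by substituting a placeholder for each variable."""
    # Illustrative patterns for block ids, IPv4 addresses, and numbers;
    # production log parsers (e.g., Drain) are far more robust.
    var_pattern = re.compile(r"blk_-?\d+|\d+\.\d+\.\d+\.\d+|\d+")
    params = var_pattern.findall(message)
    template = var_pattern.sub("<*>", message)
    return template, params

# Hypothetical HDFS-style message:
template, params = parse_log("Received block blk_123 of size 67108864 from 10.251.42.84")
# template == "Received block <*> of size <*> from <*>"
# params == ["blk_123", "67108864", "10.251.42.84"]
```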
Log-based Anomaly Detection Framework
Feature Extraction
- Encodes raw logs into numerical vectors for ML models.
- Logs are grouped into sequences by identifier (identifier-based partitioning).
- Techniques:
  - TF-IDF: sparse vectors reflecting event importance.
  - Word2Vec: dense vectors capturing semantic similarity.

Anomaly Detection
- Performed on the feature vectors.
- Models predict whether a sequence is normal or abnormal.
- Techniques range from:
  - Classical: SVM, Random Forest, Decision Trees.
  - Clustering: K-means, DBSCAN.
  - Deep learning: Autoencoders, LSTMs, Transformers.
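A minimal hand-rolled TF-IDF over event sequences illustrates the sparse representation (the unsmoothed IDF variant here is an assumption; the experiments would typically use a library implementation, and Word2Vec would come from one such as gensim):

```python
import math
from collections import Counter

def tfidf_vectors(sequences):
    """Turn log-event sequences into sparse TF-IDF dicts.
    Each sequence is a list of event-template ids."""
    n = len(sequences)
    # Document frequency: in how many sequences each event appears.
    df = Counter()
    for seq in sequences:
        df.update(set(seq))
    vectors = []
    for seq in sequences:
        tf = Counter(seq)
        vectors.append({
            # Term frequency times unsmoothed inverse document frequency.
            ev: (cnt / len(seq)) * math.log(n / df[ev])
            for ev, cnt in tf.items()
        })
    return vectors

seqs = [["E1", "E2", "E1"], ["E1", "E3"]]
vecs = tfidf_vectors(seqs)
# "E1" appears in every sequence, so its IDF (and TF-IDF weight) is 0.
```

Note how an event occurring in every sequence gets weight zero, so the vectors emphasize the rarer, more discriminative events.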
Autoencoder Models for Anomaly Detection
- We utilize the LSTM autoencoder and propose an enhanced version using Bidirectional LSTM (BiLSTM).
- The following describes both models:
  1. LSTM Autoencoder
  2. BiLSTM Autoencoder
LSTM Autoencoder
- An autoencoder designed for sequential data.
- This approach uses an LSTM encoder-decoder autoencoder for anomaly detection.
- Learns a compressed latent representation of normal log sequences.
- High reconstruction error signals an anomaly.
- Applied in log anomaly detection: DeepLog, LogAnomaly.
BiLSTM Autoencoder
- Encoder: BiLSTM captures past & future dependencies.
- Latent vector (bottleneck): compressed representation of the sequence.
- Decoder: LSTM or BiLSTM reconstructs the original input.
- Training on normal logs yields accurate reconstruction.
- Anomalous logs yield high reconstruction error.

Figure: BiLSTM autoencoder for log-based anomaly detection. The input sequence (TF-IDF / Word2Vec) feeds a BiLSTM encoder (forward & backward passes) into a latent vector; an LSTM or BiLSTM decoder (backward LSTM optional) produces the reconstructed sequence; the reconstruction error (E) is compared to a threshold to yield an anomaly score or label.
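Since the setup is implemented in PyTorch, the architecture in the figure can be sketched as follows (the hidden sizes, linear bottleneck, and unidirectional LSTM decoder are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class BiLSTMAutoencoder(nn.Module):
    """Sketch of a BiLSTM encoder / LSTM decoder autoencoder."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True,
                               bidirectional=True)
        self.to_latent = nn.Linear(2 * hidden, hidden)  # bottleneck
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                # x: (batch, seq_len, n_features)
        _, (h, _) = self.encoder(x)      # h: (2, batch, hidden), fwd & bwd
        latent = self.to_latent(torch.cat([h[0], h[1]], dim=1))
        # Feed the latent vector at every decoding step.
        rep = latent.unsqueeze(1).repeat(1, x.size(1), 1)
        dec, _ = self.decoder(rep)
        return self.out(dec)             # reconstructed sequence

model = BiLSTMAutoencoder(n_features=10)
x = torch.randn(4, 20, 10)                     # 4 sequences of 20 events
recon = model(x)
error = ((recon - x) ** 2).mean(dim=(1, 2))    # per-sequence reconstruction error
```

Training would minimize the reconstruction error on normal sequences only; at detection time, `error` is the quantity compared against the threshold.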
Threshold Optimization (F1-Sensitive Tuning)
- Decision is based on the reconstruction error of each sequence.
- Instead of a fixed threshold:
  - Use a small labeled validation set.
  - Optimize the threshold to maximize the F1-Score.
- Balances:
  - Precision (avoid false positives).
  - Recall (avoid missing anomalies).
- Improves robustness and practicality with limited labels.
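The tuning loop amounts to scanning candidate thresholds on the labeled validation set and keeping the F1-maximizing one; a minimal sketch (not the exact procedure of the paper):

```python
def best_f1_threshold(errors, labels):
    """Scan candidate thresholds (the observed errors themselves) and keep
    the one maximizing F1 on a labeled validation set. labels: 1 = anomaly."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(errors)):
        pred = [1 if e >= t else 0 for e in errors]
        tp = sum(p and y for p, y in zip(pred, labels))
        fp = sum(p and not y for p, y in zip(pred, labels))
        fn = sum((not p) and y for p, y in zip(pred, labels))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

errors = [0.1, 0.2, 0.15, 0.9, 0.8]   # reconstruction errors (hypothetical)
labels = [0, 0, 0, 1, 1]              # validation labels (hypothetical)
t, f1 = best_f1_threshold(errors, labels)
# threshold 0.8 separates the two anomalies perfectly (F1 = 1.0)
```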
Evaluation
Experiment Setting
- Datasets: HDFS logs, grouped into sequences by identifier (block id).
- Two dataset versions: Reduced HDFS and Complete HDFS.
- Split strategies: Sequential Split (SS) and Uniform Split (US).
All experiments used
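Identifier-based grouping of parsed events into sequences can be sketched as (the event/block-id pairs below are hypothetical):

```python
from collections import defaultdict

def group_by_identifier(events):
    """Group (identifier, event_template) pairs into per-identifier
    sequences — e.g., HDFS events keyed by block id."""
    sequences = defaultdict(list)
    for ident, template in events:
        sequences[ident].append(template)
    return dict(sequences)

events = [("blk_1", "E1"), ("blk_2", "E1"), ("blk_1", "E2")]
grouped = group_by_identifier(events)
# grouped == {"blk_1": ["E1", "E2"], "blk_2": ["E1"]}
```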
Baselines, Metrics, and Setup
Baselines:
- Unsupervised: PCA, Isolation Forest, Invariants Mining.
- Supervised: Decision Tree, SVM, DT+SVM.
- Deep Learning: DeepLog, LogAnomaly.

Evaluation Metrics:
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- F1-Score = harmonic mean of Precision and Recall.

Experimental Setup:
- Intel Core i5-1135G7, 8 GB RAM, Ubuntu 22.04.
- Implemented in Python with scikit-learn and PyTorch.
Results on the Reduced HDFS Dataset
Table: Results on the Reduced HDFS dataset

Approach       | Precision | Recall | F1-Score
PCA            | 0.966     | 0.363  | 0.528
iForest        | 0.985     | 0.427  | 0.596
IM             | 0.967     | 0.561  | 0.710
DT             | 0.990     | 0.427  | 0.596
SVM            | 0.987     | 0.412  | 0.578
DT+SVM         | 0.986     | 0.433  | 0.602
LSTM+AE        | 0.714     | 0.764  | 0.737
BiLSTM+AE      | 0.851     | 0.949  | 0.898
LSTM+AE+W2V    | 0.761     | 0.949  | 0.844
BiLSTM+AE+W2V  | 0.920     | 0.955  | 0.938
Discussion – Reduced HDFS Results
- The LSTM autoencoder outperforms the classical unsupervised and supervised ML baselines in F1-Score.
- Word2Vec features capture semantic similarities between log events.
- The BiLSTM autoencoder improves results with both TF-IDF and Word2Vec.
- Best configuration: BiLSTM+AE with Word2Vec (F1-Score 0.938).
- Gains explained by:
  - Temporal modeling of log sequences (LSTM/BiLSTM).
  - Bidirectional context (BiLSTM) captures past & future dependencies.
  - Semantic embeddings (Word2Vec) provide richer feature representations.
Results on the Complete HDFS Dataset
Table: Results on the Complete HDFS dataset

Approach       | Precision | Recall | F1-Score
PCA            | 0.998     | 0.688  | 0.814
iForest        | 0.983     | 0.703  | 0.820
IM             | 0.880     | 0.950  | 0.910
DeepLog        | 0.920     | 0.910  | 0.910
LogAnomaly     | 0.960     | 0.940  | 0.950
BiLSTM+AE      | 0.983     | 0.948  | 0.965
BiLSTM+AE+W2V  | 0.987     | 1.000  | 0.993
Discussion – Complete HDFS Results
- BiLSTM+AE+Word2Vec achieves the best results:
  - Higher accuracy than BiLSTM+AE alone.
  - Surpasses DeepLog and LogAnomaly.
- Accuracy on the complete dataset is higher than on the reduced dataset.
- Key advantages:
  - Bidirectional modeling: captures past & future dependencies.
  - Semantic embeddings (Word2Vec): richer feature representation.
  - More training data: broader exposure to normal patterns improves generalization.
Related Work
Related Work – Paradigms
Machine learning methods for log-based anomaly detection fall into two main categories:
- Reconstruction-based methods:
  - Train models to reproduce normal log sequences.
  - Low reconstruction error indicates a normal sequence; high reconstruction error indicates an anomaly.
  - Assumption: models trained only on normal logs fail on anomalous ones.
- Binary classification-based methods:
  - Models output probability scores for anomaly detection.
  - Single-output: a single anomaly score.
  - Dual-output: separate scores for the normal and anomalous classes.
Related Work – Our Contribution
Existing reconstruction-based methods (e.g., DeepLog, LogAnomaly):
- Rely on unidirectional modeling of log sequences.
- Use sparse frequency-based features (e.g., TF-IDF).

Our approach:
- Introduces a BiLSTM autoencoder:
  - Captures both past and future dependencies.
  - Enhances contextual understanding of logs.
- Employs Word2Vec embeddings:
  - Captures semantic similarities between log messages.
  - Provides richer input representations than TF-IDF.
Conclusion
- System logs are key for understanding system behavior and detecting issues.
- Machine learning advances log-based anomaly detection by automating analysis and improving accuracy.
- LSTM autoencoders are effective in real-time settings where only past context is available.
- The proposed BiLSTM autoencoder captures both forward and backward dependencies, enhancing anomaly detection on static log datasets.
- Future directions:
  - Leverage Transformers for deeper log pattern understanding.
  - Combine deep learning with clustering for stronger detection.
  - Develop real-time anomaly detection for faster, more reliable responses.
Thank You
Questions?