Title Slide

Traffic Signal Anomaly Detection Using LSTM Models
Presented by: [Your Name/Team]
Date: [Presentation Date]
Affiliation: Transportation Department / Relevant Organization

Introduction

- Traffic signal systems are critical for managing urban traffic flow.
- Signal anomalies, whether caused by system failures or malicious attacks, can lead to traffic jams, accidents, and inefficiency.
- The goal of this project is to create a machine learning model capable of detecting these anomalies.

Agenda

1. Feature Engineering & Data Preprocessing
2. Data Transformation
3. LSTM Model Architecture
4. Model Training & Validation
5. Model Performance Evaluation
6. Exploratory Data Analysis Insights
7. Detailed Feature Correlation
8. Pairplot of Key Features
9. Model Results Summary
10. Discussion & Implications
11. Conclusion & Future Research

Problem Statement

- Malfunctions in traffic signal systems can arise from technical difficulties or hacking attempts.
- These malfunctions can lead to inefficient traffic management, longer delays, and safety hazards.
- Traditional methods for detecting anomalies, such as rule-based systems and manual monitoring, are inefficient.
- We propose a machine learning-based approach using Long Short-Term Memory (LSTM) networks to detect anomalies in real time.

Feature Engineering & Data Preprocessing

Data Loading & Cleaning:
- Traffic signal data was sourced from the MMITSS dataset, which includes features like Phase Rings and Cumulative Current State 1 & 2.
- Attack Label Calculation: Data points where Cumulative Current State 1 exceeds its 99th percentile were labeled as attacks.

Code Snippet:
    # df is the pandas DataFrame holding the loaded MMITSS data
    threshold_c1 = df['CumulativeCurrentState1'].quantile(0.99)
    df['AttackLabel'] = (df['CumulativeCurrentState1'] > threshold_c1).astype(int)

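The labeling rule above can be tried on a small stand-in table. This is a minimal sketch: the 100 toy values are an assumption for illustration and are not MMITSS data; only the column names and the threshold rule come from the slide.

```python
import pandas as pd

# Toy stand-in for the MMITSS data; the values are illustrative only.
df = pd.DataFrame({"CumulativeCurrentState1": [float(v) for v in range(1, 101)]})

# 99th-percentile threshold on the first cumulative state, as on the slide.
threshold_c1 = df["CumulativeCurrentState1"].quantile(0.99)

# Rows strictly above the threshold are flagged as attacks (1), all others as normal (0).
df["AttackLabel"] = (df["CumulativeCurrentState1"] > threshold_c1).astype(int)

# Share of rows labeled as attacks.
attack_share = df["AttackLabel"].mean()
print(f"threshold={threshold_c1:.2f}, attack share={attack_share:.2%}")
```

Because the cut-off is the 99th percentile, roughly 1% of rows end up labeled as attacks, which is the class imbalance noted later in the deck.
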
Data Transformation

One-Hot Encoding:
- Categorical features such as Major and Minor Streets were converted using one-hot encoding.

Normalization:
- MinMaxScaler was used to normalize numerical data so that all features lie between 0 and 1, which generally helps LSTM training.

Code Snippet:
    from sklearn.preprocessing import MinMaxScaler

    # Scale both cumulative-state columns to the [0, 1] range
    scaler = MinMaxScaler()
    df_scaled = scaler.fit_transform(df[['CumulativeCurrentState1', 'CumulativeCurrentState2']])

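Both transformations can be sketched together on a toy table. The column name "MajorStreet" and all values are hypothetical, and the scaling is written out by hand with pandas rather than with MinMaxScaler, so the formula is visible; it is the same mapping MinMaxScaler applies with its default range.

```python
import pandas as pd

# Toy frame; "MajorStreet" is a hypothetical column name used for illustration.
df = pd.DataFrame({
    "MajorStreet": ["Main", "Oak", "Main", "Pine"],
    "CumulativeCurrentState1": [10.0, 20.0, 30.0, 50.0],
    "CumulativeCurrentState2": [5.0, 5.0, 15.0, 25.0],
})

# One-hot encoding: one 0/1 indicator column per distinct street name.
encoded = pd.get_dummies(df, columns=["MajorStreet"], dtype=int)

# Min-max scaling: x' = (x - min) / (max - min), mapping each column onto [0, 1].
num_cols = ["CumulativeCurrentState1", "CumulativeCurrentState2"]
col_min = encoded[num_cols].min()
col_max = encoded[num_cols].max()
encoded[num_cols] = (encoded[num_cols] - col_min) / (col_max - col_min)

print(encoded)
```

When the data is later split into training and validation parts, computing the minimum and maximum on the training part only is a common way to keep validation statistics out of the preprocessing.
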
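LSTM Model Architecture

Sequential LSTM Model:
- The model consists of 3 stacked LSTM layers with 50 units each, every one followed by a dropout layer (rate 0.2) to reduce overfitting.
- A final Dense layer with a sigmoid activation function performs the binary classification.

Code Snippet:
    # Assumes the TensorFlow Keras API
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dropout, Dense

    model = Sequential([
        # Input: (timesteps, 1 feature) per sample
        LSTM(50, return_sequences=True, input_shape=(X_train.shape[1], 1)),
        Dropout(0.2),
        LSTM(50, return_sequences=True),
        Dropout(0.2),
        LSTM(50),
        Dropout(0.2),
        Dense(1, activation='sigmoid')
    ])
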
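The input shape (X_train.shape[1], 1) implies that each sample is a sequence of time steps with one feature. The deck does not say how these sequences were built, so the following is only a sketch of one common approach, sliding windows. The helper name make_windows, the window length of 5, and the choice to label each window by its final time step are all assumptions.

```python
import numpy as np

def make_windows(series, labels, window):
    """Slice a 1-D series into overlapping windows shaped (samples, window, 1).

    Each window is paired with the label of its final time step (an assumption).
    """
    series = np.asarray(series, dtype=np.float32)
    labels = np.asarray(labels)
    n_windows = len(series) - window + 1
    if n_windows <= 0:
        raise ValueError("series is shorter than the window length")
    # Stack consecutive slices so row i holds series[i : i + window].
    X = np.stack([series[i:i + window] for i in range(n_windows)])
    y = labels[window - 1:]
    # Add a trailing feature axis so the array matches (samples, timesteps, 1).
    return X[..., np.newaxis], y

# Toy series: 12 time steps, the last two flagged as attacks.
values = np.arange(12, dtype=np.float32)
flags = (values > 9).astype(int)
X, y = make_windows(values, flags, window=5)
print(X.shape, y.shape)
```

An array built this way can be passed to the first LSTM layer, with X.shape[1] playing the role of the time-step count.
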
Model Training & Validation

Training Process:
- The model was trained on 80% of the dataset and validated on the remaining 20%.
- The Adam optimizer and binary cross-entropy loss function were used.

Training Parameters:
- Training ran for 5 epochs with a batch size of 32.

Graph: Training loss and accuracy across epochs.

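The 80/20 split and the batch arithmetic can be sketched as follows. The deck does not state whether the data was shuffled before splitting; this sketch assumes a chronological split (earliest 80% for training), which is a common choice for time-series data. The helper name chronological_split and the toy arrays are hypothetical.

```python
import math
import numpy as np

def chronological_split(X, y, train_fraction=0.8):
    """Split arrays into a leading training part and a trailing validation part, without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]

# Toy data: 50 samples with 2 values each, all labeled normal.
X = np.arange(100, dtype=np.float32).reshape(50, 2)
y = np.zeros(50, dtype=int)
X_train, X_val, y_train, y_val = chronological_split(X, y)

# With a batch size of 32, each epoch takes ceil(n_train / 32) gradient steps.
batch_size = 32
steps_per_epoch = math.ceil(len(X_train) / batch_size)
print(len(X_train), len(X_val), steps_per_epoch)
```
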
Model Performance Evaluation

Confusion Matrix:
- The confusion matrix shows the counts of true positives, true negatives, false positives, and false negatives.

Key Metrics:
- Precision, recall, and F1-scores close to 1.0 indicate strong model performance.

Graph: Confusion matrix heatmap.

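The relationship between the confusion matrix and the key metrics can be made concrete with a short sketch. The labels and predictions below are made up for illustration and are not the model's results; the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, true negatives, and false negatives."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall, and F1 from the counts, returning 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels (1 = attack) and predictions, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, tn, fn, precision, recall, f1)
```

On a dataset where attacks are rare, these three metrics say more than accuracy does: a model that always predicts "normal" would score high accuracy while its recall on attacks is zero.
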
Exploratory Data Analysis (EDA) Insights

Cumulative Current State 1 Over Time:
- The time series plot shows periods of high and low activity in traffic signals.

Distribution of Attack Labels:
- The dataset is imbalanced, with far fewer attack instances than normal ones.

Graph: Time series plot of Cumulative Current State 1.

Detailed Feature Correlation

Correlation Analysis:
- Key features such as Cumulative Current State 1 and Duration Indication Major Street are highly correlated with anomalies.

Graph: Feature correlation matrix showing the relationships.

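A correlation matrix of this kind can be computed with pandas. The sketch below uses made-up values, and the "Noise" column is a hypothetical uninformative feature added for contrast; it assumes Pearson correlation, which is the pandas default, since the deck does not name the method.

```python
import pandas as pd

# Made-up values for illustration; only the last row is labeled as an attack.
df = pd.DataFrame({
    "CumulativeCurrentState1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0],
    "CumulativeCurrentState2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0, 9.0, 40.0],
    "Noise": [5.0, 3.0, 5.0, 3.0, 5.0, 3.0, 5.0, 3.0, 4.0, 4.0],
    "AttackLabel": [0] * 9 + [1],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Rank the features by their correlation with the attack label.
label_corr = corr["AttackLabel"].drop("AttackLabel").sort_values(ascending=False)
print(label_corr)
```
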
Pairplot of Key Features

Key Features Analysis:
- The pairplot shows clear separation between normal and attack data points for features like Cumulative Current State 1, as expected given that the attack labels are defined by a threshold on that feature.

Graph: Pairplot of key features segmented by attack labels.

Model Results Summary

Model Accuracy & Loss:
- High accuracy was achieved during both training and testing phases.
- Decreasing loss indicates fewer prediction errors.

Graph: Accuracy and loss curves across epochs.

Discussion & Implications

Implications:
- Early detection of anomalies can improve traffic management efficiency.
- Scalability: The model can be integrated into real-time traffic management systems.

Limitations:
- Dataset imbalance may affect model performance.

Graph: Anomaly detection in real-time traffic systems.

Conclusion & Future Research

Key Findings:
- LSTM models are effective for anomaly detection in time-series traffic data.
- High precision and recall were achieved.

Future Research:
- Address dataset imbalance with oversampling or data augmentation.
- Explore additional features such as weather conditions.

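As one illustration of the oversampling idea, the sketch below duplicates randomly chosen minority-class samples until both classes are the same size. It is a minimal sketch of plain random oversampling, not a method used in this project; the helper name random_oversample and the toy arrays are hypothetical.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are the same size."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    # Identify which class is smaller.
    if len(pos_idx) < len(neg_idx):
        minority, majority = pos_idx, neg_idx
    else:
        minority, majority = neg_idx, pos_idx
    if len(minority) == 0:
        raise ValueError("cannot oversample: one class has no samples")
    # Draw extra minority indices with replacement to close the gap.
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    keep = np.concatenate([majority, minority, extra])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Toy data: 8 normal samples and 2 attack samples.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_bal, y_bal = random_oversample(X, y)
print(X_bal.shape, int(y_bal.sum()))
```

Resampling of this kind is usually applied to the training portion only, so that validation metrics still reflect the original class balance.
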