Design of Intrusion Detection System for Cloud

kamalakantas 2 views 22 slides Oct 09, 2025

About This Presentation

Basics of IDS


Slide Content

Intrusion Detection System SEP 2025 Kainuru Balaji

Presentation overview
What is IDS?
Core IDS Architecture
Deployment Models
The Two Main Detection Methods
Machine Learning for Anomaly Detection
Limitations And Future Scope

What is IDS?
An IDS (Intrusion Detection System) is an application that monitors network traffic and searches for known threats and suspicious or malicious activity. It sends alerts to IT and security teams when it detects a security risk or threat.
IDS types are classified in two ways:
1. Based on information source
2. Based on detection method
IDSs are used on network-based, host-based, and cloud-based devices to protect against security risks and threats.

Core IDS Architecture
1. Data Collection (Sensors or Agents): The component responsible for gathering the raw data that will be analyzed.
2. The Analysis Engine: Takes the raw data from the sensors and analyzes it to identify potential security incidents. This is where the "detection" happens, using signature-based detection and anomaly-based detection.
3. The Signature/Knowledge Base: For signature-based detection, this is a database of thousands of attack signatures, which must be regularly updated by the vendor. For anomaly-based detection, this is the "baseline" model of normal behavior established during the learning phase. The baseline includes items such as normal traffic volume, typical protocols, and users' normal login times.
4. The Alerting & Reporting System: Once the Analysis Engine identifies a potential threat, it needs to notify someone. This component is responsible for generating and delivering alerts.
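
The four components above can be sketched as a small pipeline. This is only an illustration: the `Event` type, the two byte-pattern signatures, and all function names are hypothetical and not taken from any real IDS product.

```python
from dataclasses import dataclass


@dataclass
class Event:
    source_ip: str
    payload: bytes


# Knowledge base: hypothetical byte-pattern signatures (illustrative only).
SIGNATURES = {
    "demo-shell-download": b"wget http://",
    "demo-sql-injection": b"' OR 1=1",
}


def collect(raw_records):
    """Data collection: turn raw (ip, payload) tuples into Event objects."""
    return [Event(ip, payload) for ip, payload in raw_records]


def analyze(event):
    """Analysis engine: names of all signatures found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in event.payload]


def alert(event, matches):
    """Alerting: one alert line per matched signature."""
    return [f"ALERT {name} from {event.source_ip}" for name in matches]


def run_pipeline(raw_records):
    alerts = []
    for event in collect(raw_records):
        alerts.extend(alert(event, analyze(event)))
    return alerts


alerts = run_pipeline([
    ("10.0.0.5", b"GET /index.html"),
    ("10.0.0.9", b"id=1' OR 1=1 --"),
])
print(alerts)
```

Only the second record contains a known pattern, so only it produces an alert. A real deployment would replace each function with a far more capable component.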

Deployment Models
Based On Information Source:
1. Network intrusion detection system (NIDS): A NIDS solution is deployed at strategic points within an organization's network to monitor incoming and outgoing traffic. It detects malicious and suspicious traffic coming to and going from all devices connected to the network.
2. Host intrusion detection system (HIDS): A HIDS is installed on individual devices that are connected to the internet and to the organization's internal network, and it monitors the incoming and outgoing traffic of that host. By alerting early, it can help contain attacks against both the host and the organization. For example, if a malicious packet reaches a device (host), the HIDS is triggered and sends an alert for that host, so the malware can be contained before it spreads to the whole organization.

NIDS (Network IDS)
How it works: The NIDS first captures raw data packets from the network. These packets are then decoded and reassembled into complete, understandable data streams, such as emails or web pages. The streams are analyzed using the detection methods; if they contain malware or other harmful content, the NIDS sends an alert to the user.
Pros and Cons of NIDS:
Pros: Broad Network Visibility; Lower Cost of Ownership; Operating System Independent
Cons: High Volume of Alerts; Cannot Analyze Host-Level Activity; Blindness to Encrypted Traffic

HIDS (Host IDS)
How it works: A software agent is installed on the host machine. The agent continuously collects data from a wide variety of sources on that specific host, such as system logs, network logs, and file integrity monitoring. The agent analyzes the collected data to identify threats, using the same detection methods as a NIDS. If the analysis engine detects a threat, it generates a detailed alert. This alert is logged locally and typically sent to a centralized HIDS management server or a SIEM (Security Information and Event Management) system.
Pros and Cons of HIDS:
Pros: Visibility into Encrypted Traffic; Reduced Data Volume; High Specificity and Accuracy
Cons: Complex Deployment and Maintenance; Lack of Network-Wide Visibility; Operating System Dependent

The Two Main Detection Methods
Signature-based intrusion detection system (SIDS): Monitors all packets in the organization's network and compares them against a database of known attack signatures (known threats).
Anomaly-based intrusion detection system (AIDS): Monitors traffic on a network and compares it with a predefined baseline that is considered "normal." It detects anomalous activity and behavior across the network, including bandwidth, devices, ports, and protocols. An AIDS solution uses machine-learning techniques to build a baseline of normal behavior and establish a corresponding security policy. This lets businesses discover new, evolving threats that a SIDS cannot detect.

Method 1: Signature-Based Detection
What is Signature-Based Detection? The most common and foundational method for identifying malware. It works by comparing data (files, network packets, etc.) against a massive database of known threat "signatures."
What is a Signature? A signature is a unique digital fingerprint of a malicious file or activity: a distinct pattern of data. This can be:
A file hash, which is a unique ID calculated from the file's contents.
A specific sequence of bytes within a file's code.
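
Both kinds of signature can be illustrated with a minimal sketch. The sample bytes, the byte sequence, and the names `KNOWN_BAD_HASHES`, `KNOWN_BAD_BYTES`, and `matches_signature` are made up for illustration; real signature databases are far larger and come from the vendor.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """File-hash signature: a fixed-length ID computed from the contents."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical "malware" sample and signature database (illustrative only).
malicious_sample = b"pretend-malware-body"
KNOWN_BAD_HASHES = {sha256_hex(malicious_sample)}
KNOWN_BAD_BYTES = [b"\xde\xad\xbe\xef"]


def matches_signature(data: bytes):
    """Return which kind of signature matched, or None if nothing matched."""
    if sha256_hex(data) in KNOWN_BAD_HASHES:
        return "hash"
    if any(seq in data for seq in KNOWN_BAD_BYTES):
        return "bytes"
    return None


print(matches_signature(malicious_sample))           # exact copy of the sample
print(matches_signature(b"xx\xde\xad\xbe\xefyy"))    # contains the byte sequence
print(matches_signature(malicious_sample + b"!"))    # one extra byte changes the hash
```

The last call returns `None`: changing a single byte changes the hash, which is one reason hash-only signatures are easy to evade.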

Signature-Based: How it Works, Pros & Cons
How it works: Vendors continuously collect malware, extract its unique signatures, and push them as database updates to your security software, which uses them for scanning and comparison.
Pros and Cons:
Pros: High Accuracy & Reliability; Fast and Efficient
Cons: Blind to New Attacks; Dependent on Constant Updates; Easily Evaded by Attackers

Method 2: Anomaly-Based Detection
What is Anomaly-Based Detection? Anomaly-based detection is a security technique that identifies unusual behavior in a system or network by comparing current activity against a normal baseline. If something deviates significantly from the usual pattern, it is flagged as a potential threat. It typically relies on machine learning (ML).
What are ML and the Normal Baseline?
Machine Learning (ML): Teaching a computer to learn patterns from data so it can make decisions or predictions in real-world situations.
Normal Baseline: The usual or expected behavior of a system (such as normal network traffic, packet flow, or common source and destination ports), which serves as a reference point for spotting anything unusual or suspicious.
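
The idea of comparing activity against a baseline can be sketched without any ML at all. The example below swaps in a simple statistical rule (a z-score threshold) in place of a learned model; the traffic numbers and the threshold of 3 standard deviations are assumptions chosen for illustration.

```python
from statistics import mean, stdev

# Hypothetical baseline: packets per second observed during normal operation.
baseline = [120, 130, 125, 118, 135, 128, 122, 131]
mu = mean(baseline)
sigma = stdev(baseline)


def is_anomalous(value, threshold=3.0):
    """Flag a measurement that lies more than `threshold` standard
    deviations away from the baseline mean."""
    return abs(value - mu) / sigma > threshold


print(is_anomalous(127))   # close to the usual rate
print(is_anomalous(900))   # far above anything seen in the baseline
```

A real anomaly-based IDS would track many features at once and learn the baseline with the ML methods described in the following slides.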

Anomaly-Based: How it Works, Pros & Cons
How it works: There are two main ways to train an IDS using machine learning: supervised and unsupervised. In unsupervised learning, the system learns patterns on its own using methods like K-Means Clustering, Isolation Forest, or Autoencoders. In supervised learning, the system is trained with labeled data (normal vs. attack) using techniques such as Decision Trees, Support Vector Machines, Random Forests, K-Nearest Neighbors, and Neural Networks.
Pros and Cons:
Pros: Can detect new attacks, such as zero-day attacks.
Cons: Produces more false positives; needs time to train.

Deep Dive: Machine Learning for Anomaly Detection
Unsupervised learning is like letting the IDS figure things out by itself, spotting unusual patterns without being told what is good or bad.
Supervised learning is like teaching the IDS with examples of what is normal and what is an attack, so it learns to tell them apart.

Unsupervised Learning: Clustering (K-Means)
K-Means Clustering: K-Means is a method that groups similar data points together into clusters, much like sorting items into boxes based on how alike they are.
How does it work?
Step 1: Select K initial centroids from the raw data.
Step 2: Assign each packet (data point) to its nearest centroid, based on the distance from the point to the centroid.
Step 3: Recompute each centroid from the points assigned to it, and repeat Step 2 until the cluster assignments stop changing between iterations.
Step 4: When a new packet enters the network, the IDS measures its distance to the clusters; if the packet is far from every cluster, it is flagged as an anomaly and the user is alerted.
The distance is computed over packet features such as size and destination port.
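
The four steps can be sketched in plain Python. Several things here are assumptions made for illustration: the two features are taken to be already scaled to comparable ranges, the first K points serve as initial centroids (implementations usually pick them randomly), and the anomaly threshold of 1.0 is arbitrary.

```python
import math


def kmeans(points, k, max_iterations=100):
    # Step 1: use the first k points as initial centroids (a simplification).
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iterations):
        # Step 2: assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 3: stop once the assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids


def is_anomaly(packet, centroids, threshold):
    # Step 4: a packet far from every cluster centre is flagged.
    return min(math.dist(packet, c) for c in centroids) > threshold


# Hypothetical scaled features: (packet size, destination port).
normal_traffic = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
THRESHOLD = 1.0
centroids = kmeans(normal_traffic, k=2)

print(is_anomaly((1.1, 1.0), centroids, THRESHOLD))   # close to the first cluster
print(is_anomaly((9.0, 0.5), centroids, THRESHOLD))   # far from both clusters
```

In practice the features would need scaling first, since raw packet sizes and port numbers live on very different ranges and would otherwise dominate the distance unevenly.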

Supervised Machine Learning (Decision Trees)
What is a Decision Tree? A Decision Tree is like a flowchart that helps the system make decisions step by step, asking a simple question at each branch until it reaches an answer.
How does it work? As with K-Means, the first step is to examine the raw data and its key numeric features; because the data is already labeled, learning to separate normal from attack traffic is more direct. The tree is built by choosing, at each node, the split with the lowest Gini impurity. Classifying a packet then only requires answering a True/False question at each node.
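
How Gini impurity picks a split can be shown on a toy example. The six labeled rows below (packet size in bytes and a label) are invented for illustration, and the sketch only finds the first split of a tree rather than building a full one.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_gini(rows, threshold):
    """Weighted Gini impurity after asking: is value <= threshold?"""
    left = [label for value, label in rows if value <= threshold]
    right = [label for value, label in rows if value > threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical labeled data: (packet size in bytes, label).
rows = [
    (60, "normal"), (80, "normal"), (100, "normal"),
    (900, "attack"), (1200, "attack"), (1500, "attack"),
]

# Try each observed value as a threshold and keep the purest split.
best_threshold = min((value for value, _ in rows), key=lambda t: split_gini(rows, t))

print(gini([label for _, label in rows]))   # impurity before splitting: 0.5
print(best_threshold)                       # 100
print(split_gini(rows, best_threshold))     # 0.0, both sides are pure
```

The root question of this toy tree becomes "is the packet size at most 100 bytes?", with True leading to normal and False leading to attack.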

The Challenge: How Do We Know if an IDS is Good?
With so much machine learning used in an IDS, how can we tell whether the models are working correctly? An IDS that shows an accuracy of 99.99% may still not be reliable: it could be classifying almost everything as normal and so letting anomalous packets into the network along with the normal ones. Such high accuracy can be the result of an imbalanced dataset. We use the confusion matrix, precision, and recall to evaluate how effectively the model is performing.
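
The imbalance problem is easy to reproduce with made-up numbers. The sketch below assumes a hypothetical capture of 9,999 normal packets and one attack packet, and a "detector" that never raises an alert.

```python
# Hypothetical, heavily imbalanced traffic sample.
actual = ["normal"] * 9999 + ["attack"]

# A useless detector that labels every packet as normal.
predicted = ["normal"] * len(actual)

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
attacks_caught = sum(a == "attack" and p == "attack" for a, p in zip(actual, predicted))

print(f"accuracy = {accuracy:.2%}")          # accuracy = 99.99%
print(f"attacks caught = {attacks_caught}")  # attacks caught = 0
```

The detector scores 99.99% accuracy while missing the only attack, which is why accuracy alone is not a sufficient measure.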

The Confusion Matrix
Confusion Matrix: A matrix used to evaluate the performance of a classification model (supervised models only). It compares the actual class labels with the predicted class labels and summarizes the result.

                     Actual Anomaly     Actual Normal
Predicted Anomaly    True Positive      False Positive
Predicted Normal     False Negative     True Negative

True Positive (TP): The model correctly detects anomalies.
True Negative (TN): The model correctly identifies normal packets.
False Positive (FP): Normal packets wrongly classified as anomalies.
False Negative (FN): Anomalies wrongly classified as normal (dangerous in an IDS).
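
Counting the four cells takes only a few lines. The function name `confusion_counts` and the six sample labels are hypothetical; "anomaly" is treated as the positive class, as in the definitions above.

```python
def confusion_counts(actual, predicted, positive="anomaly"):
    """Count TP, FP, FN, TN with `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # anomaly correctly detected
        elif p == positive:
            fp += 1   # normal packet flagged as anomaly
        elif a == positive:
            fn += 1   # anomaly missed
        else:
            tn += 1   # normal packet correctly passed
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


actual = ["anomaly", "anomaly", "normal", "normal", "normal", "anomaly"]
predicted = ["anomaly", "normal", "normal", "anomaly", "normal", "anomaly"]

counts = confusion_counts(actual, predicted)
print(counts)   # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```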

Key Metrics (Precision & Recall)
Precision: Tells us how many of the packets the IDS predicted as anomalies are actually anomalies.
Precision = TP / (TP + FP)
Recall: Tells us how many of the actual anomalies were detected by the IDS.
Recall = TP / (TP + FN)
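
A short worked example of both formulas, using invented counts (80 true positives, 20 false positives, 40 false negatives). Returning 0.0 when the denominator is zero is a convention chosen here, not something the slides specify.

```python
def precision(tp, fp):
    """Share of predicted anomalies that really are anomalies."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual anomalies that the IDS detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts taken from a confusion matrix.
tp, fp, fn = 80, 20, 40

p = precision(tp, fp)   # 80 / 100
r = recall(tp, fn)      # 80 / 120

print(f"precision = {p:.2f}")   # precision = 0.80
print(f"recall    = {r:.2f}")   # recall    = 0.67
```

Here 80% of the alerts are genuine, but a third of the real anomalies still slip through, which precision alone would hide.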

Standard Datasets for Research
NSL-KDD: A refined version of the classic KDD '99 dataset. Widely used for baseline comparisons, though it is now considered dated.
CIC-IDS-2017: A modern and highly respected dataset. Contains realistic network traffic with a wide variety of packets.
UNSW-NB15: Another modern dataset, known for its complex and realistic attack scenarios.

Limitations & The Road Ahead
The Zero-Day Attack Problem: Signature-based and supervised ML models can't detect what they haven't seen before. Reliably detecting such attacks remains the holy grail of intrusion detection.
The Alert Fatigue Problem: Anomaly-based systems often generate too many false positives. Security analysts become overwhelmed and start ignoring alerts, defeating the purpose of the IDS.

The Future: Adaptive Security with Reinforcement Learning
What is Reinforcement Learning (RL)? An agent learns through trial and error, receiving "rewards" for good actions and "penalties" for bad ones.
Application in IDS:
An RL agent could learn to update firewall rules automatically.
It could adapt its detection policies in real time based on new traffic patterns.
The result would be an adaptive defense system that evolves over time.

Thank you