Course Objective
The course is intended to:
- Learn the fundamentals of cryptography.
- Learn key management techniques and authentication approaches.
- Explore network and transport layer security.
- Understand application layer security standards.
- Learn real-time security practices.
Course Outcomes
CO 5: Design comprehensive firewall and IDS architectures to protect network assets and mitigate security risks, considering factors such as traffic patterns and regulatory requirements.
Unit 5: Firewalls and Intrusion Detection Systems: Intrusion Detection, Password Management, Firewall Characteristics, Types of Firewalls, Firewall Basing, Firewall Location and Configurations. Blockchains, Cloud Security, and IoT Security.
Intrusion Detection Systems and Firewall Goals
- Expressiveness: What kinds of policies can we write?
- Effectiveness: How well does it detect attacks while avoiding false positives?
- Efficiency: How many resources does it take, and how quickly does it decide?
- Ease of use: How much training is necessary? Can a non-security expert use it?
- Security: Can the system itself be attacked?
- Transparency: How intrusive is it to use?
Firewalls and Their Types
Firewall, definition: a firewall is a network security device that monitors incoming and outgoing network traffic and decides whether to allow or block specific traffic based on a defined set of security rules.
Dimensions:
- Host-based vs. network-based
- Stateless vs. stateful
- Network layer vs. application layer
Firewall Goals
Provide defense in depth by:
- Blocking attacks against hosts and services
- Controlling traffic between zones of trust
Logical Viewpoint
[Diagram: Inside | Firewall | Outside]
For each message m, the firewall can either:
- Allow m, with or without modification
- Block m, by dropping it or sending a rejection notice
- Queue m?
Placement
Host-based firewall:
[Diagram: Host | Firewall | Outside]
- Faithful to the local configuration
- Travels with you
Network-based firewall:
[Diagram: Hosts A, B, C | Firewall | Outside]
- Protects the whole network
- Can make decisions over all traffic (e.g., traffic-based anomaly detection)
Recall: Protocol Stack
- Application (e.g., SSL)
- Transport (e.g., TCP, UDP)
- Network (e.g., IP)
- Link layer (e.g., Ethernet)
- Physical
[Diagram: encapsulation. The application message (data) is wrapped with a TCP header, then an IP header, then an Ethernet header and trailer.]
Stateless Firewall
Filter by packet header fields:
- IP fields (e.g., src, dst)
- Protocol (e.g., TCP, UDP, ...)
- Flags (e.g., SYN, ACK)
Example: only allow incoming DNS packets to nameserver A.A.A.A:
- Allow UDP port 53 to A.A.A.A
- Deny UDP port 53 to all
Ending with deny-all is fail-safe good practice (see the sketch below).
E.g., ipchains in Linux 2.2.
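To make first-match rule evaluation concrete, here is a minimal Python sketch of a stateless filter applying the DNS policy above. The Packet class and its field names are illustrative assumptions, and "A.A.A.A" stands in for the nameserver address exactly as on the slide; real stateless filters (ipchains, iptables) match raw header fields in the kernel.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    # Hypothetical packet abstraction; field names are illustrative.
    src_ip: str
    dst_ip: str
    protocol: str   # "TCP" or "UDP"
    dst_port: int

# Rules are checked in order; the first match wins.
RULES = [
    ("ALLOW", {"protocol": "UDP", "dst_port": 53, "dst_ip": "A.A.A.A"}),
    ("DENY",  {"protocol": "UDP", "dst_port": 53}),
]

def filter_packet(pkt: Packet) -> str:
    for action, conditions in RULES:
        if all(getattr(pkt, field) == value for field, value in conditions.items()):
            return action
    return "DENY"  # default deny: fail-safe if no rule matches

print(filter_packet(Packet("1.2.3.4", "A.A.A.A", "UDP", 53)))  # ALLOW
print(filter_packet(Packet("1.2.3.4", "B.B.B.B", "UDP", 53)))  # DENY
```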
Need to Keep State
Example: TCP handshake. Desired policy: every SYN/ACK must have been preceded by a SYN.
[Diagram: inside client, firewall, outside server]
- Listening
- SYN: SN_C = rand_C, AN_C = 0 (firewall stores SN_C, SN_S)
- SYN/ACK: SN_S = rand_S, AN_S = SN_C + 1
- ACK: SN = SN_C + 1, AN = SN_S + 1; established
Stateful Inspection Firewall
Added state (plus the obligation to manage it):
- Timeouts
- Size of the state table
E.g., iptables in Linux 2.4.
Stateful = More Expressive
Example: TCP handshake, as before.
- On SYN: record SN_C in the table
- On SYN/ACK: verify AN_S against the table (AN_S = SN_C + 1)
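A hedged sketch of this handshake policy: the connection table records SN_C on an outbound SYN and admits an inbound SYN/ACK only if it acknowledges that recorded sequence number. The packet dictionary layout and the flow key are assumptions for illustration; a production firewall would also enforce timeouts and a table size limit, which is exactly what the state holding attack on the next slide abuses.

```python
# Connection table: flow key -> recorded client sequence number (SN_C).
conn_table = {}

def inspect(pkt: dict) -> str:
    key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"])
    if pkt["flags"] == {"SYN"}:                      # outbound SYN: record SN_C
        conn_table[key] = pkt["seq"]
        return "ALLOW"
    if pkt["flags"] == {"SYN", "ACK"}:               # inbound SYN/ACK
        reverse = (pkt["dst"], pkt["src"], pkt["dport"], pkt["sport"])
        sn_c = conn_table.get(reverse)
        if sn_c is not None and pkt["ack"] == sn_c + 1:
            return "ALLOW"                           # AN_S matches recorded SN_C + 1
        return "DENY"                                # no preceding SYN: block
    return "ALLOW"

syn = {"src": "10.0.0.1", "dst": "8.8.8.8", "sport": 5000, "dport": 80,
       "flags": {"SYN"}, "seq": 1000, "ack": 0}
synack = {"src": "8.8.8.8", "dst": "10.0.0.1", "sport": 80, "dport": 5000,
          "flags": {"SYN", "ACK"}, "seq": 7000, "ack": 1001}
print(inspect(syn), inspect(synack))  # ALLOW ALLOW
```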
State Holding Attack
Assume a stateful TCP policy.
1. SYN flood: the attacker sends many SYNs
2. Exhaust the firewall's state-table resources
3. Sneak a packet through
Fragmentation
[IPv4 header diagram: Ver, IHL, TOS, Total Length | ID, DF, MF, Fragment Offset | ...]
- DF: Don't Fragment (0 = may fragment, 1 = don't fragment)
- MF: More Fragments (0 = last fragment, 1 = more fragments)
- Fragment offset: the octet position of the fragment's data within the original payload (carried in the header in 8-octet units)
[Diagram: data split into Frag 1 (MF=1, offset 0), Frag 2 (MF=1, offset n), Frag 3 (MF=0, offset 2n), say n bytes each, each fragment with its own IP header]
Reassembly
[Diagram: Frag 1 (offset 0), Frag 2 (offset n), Frag 3 (offset 2n) are laid back into place at byte n and byte 2n to rebuild the original data.]
Example
A 2,366-byte packet enters an Ethernet network with a default MTU of 1,500 bytes.
Packet 1: 1,500 bytes
- 20 bytes IP header
- 1,480 bytes of payload (24-byte TCP header + 1,456 bytes of data)
- DF = 0 (may fragment), MF = 1 (more fragments)
- Fragment offset = 0
Packet 2: 886 bytes
- 20 bytes IP header
- 866 bytes of data (the TCP header counts as IP payload and travels only in the first fragment)
- DF = 0 (may fragment), MF = 0 (last fragment)
- Fragment offset = 185 (1,480 payload bytes / 8)
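The arithmetic can be checked mechanically. The sketch below is an illustration rather than a kernel implementation: it splits an IP packet for a given MTU, rounding each fragment's payload down to a multiple of 8 bytes as the offset field requires.

```python
def fragment(total_len: int, mtu: int, ip_hdr: int = 20):
    """Split one IP packet into fragments for a link with the given MTU.

    The transport (TCP) header is part of the IP payload, so it travels
    only in the first fragment. Offsets are stored in 8-byte units.
    """
    payload = total_len - ip_hdr                 # 2366 - 20 = 2346 bytes
    max_frag = (mtu - ip_hdr) // 8 * 8           # 1480: largest multiple of 8 that fits
    frags, offset = [], 0
    while offset < payload:
        size = min(max_frag, payload - offset)
        more = offset + size < payload
        frags.append({"len": ip_hdr + size, "offset": offset // 8, "MF": int(more)})
        offset += size
    return frags

for f in fragment(2366, 1500):
    print(f)
# {'len': 1500, 'offset': 0,   'MF': 1}
# {'len': 886,  'offset': 185, 'MF': 0}
```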
Anomaly Detection
[Diagram: the IDS learns a distribution of "normal" events; each new event is classified as attack or safe depending on where it falls in that distribution.]
Example: Working Sets
Alice, days 1 to 300: working set of hosts {reddit, xkcd, slashdot, fark}.
Alice, day 300: a new host (18487) falls outside the working set.
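A minimal sketch of working-set anomaly detection in Python. The sliding-window size and the rule of flagging only once the window is full are illustrative choices, not anything prescribed by the slides.

```python
from collections import deque

class WorkingSetDetector:
    """Flag hosts that fall outside the recently-seen working set."""

    def __init__(self, window: int = 1000):
        self.recent = deque(maxlen=window)   # sliding window of contacted hosts

    def observe(self, host: str) -> bool:
        # Only flag once the window is warmed up; then anything not in the
        # current working set is anomalous.
        anomalous = (len(self.recent) == self.recent.maxlen
                     and host not in set(self.recent))
        self.recent.append(host)
        return anomalous

det = WorkingSetDetector(window=8)
for day in range(300):
    for h in ["reddit", "xkcd", "slashdot", "fark"]:
        det.observe(h)
print(det.observe("18487"))  # True: outside Alice's working set
```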
Anomaly Detection
Pros:
- Does not require pre-determining a policy (can catch "unknown" threats)
Cons:
- Requires that attacks not closely resemble known (normal) traffic
- Learning distributions is hard
Automatically Inferring the Evolution of Malicious Activity on the Internet
David Brumley, Carnegie Mellon University
Shobha Venkataraman, AT&T Research
Oliver Spatscheck, AT&T Research
Subhabrata Sen, AT&T Research
[Diagram: a stream of labeled IPs, <ip1,+> <ip2,+> <ip3,+> <ip4,->, flows from the network (spam haven, Tier 1 ISPs, ...) to the observer.]
Labeled IPs from SpamAssassin, IDS logs, etc.
Evil is constantly on the move.
Goal: characterize regions changing from bad to good (Δ-good) or good to bad (Δ-bad).
Research Questions
Given a sequence of labeled IPs:
- Can we identify the specific regions of the Internet that have changed in malice?
- Are there regions of the Internet that change their malicious activity more frequently than others?
Challenges: Infer the Right Granularity
[Topology diagram: spam haven, Tier 1 and Tier 2 ISPs, DSL, CORP networks]
Previous work: fixed granularity.
Per-IP granularity (e.g., Spamcop) is often not interesting.
Challenges: Infer the Right Granularity
[Same topology diagram]
Previous work: fixed granularity.
BGP granularity (e.g., network-aware clusters [KW'00]).
Challenges: Infer the Right Granularity
[Same topology diagram]
Coarse granularity.
Idea: infer the granularity from the data. A well-managed network gets fine granularity; other regions get medium or coarse granularity.
Challenges: We Need Online Algorithms
[Diagram: SMTP traffic on a high-speed link, observed by a fixed-memory device]
Research Questions
Given a sequence of labeled IPs:
- Can we identify the specific regions of the Internet that have changed in malice? We present: Δ-Change.
- Are there regions of the Internet that change their malicious activity more frequently than others? We present: Δ-Motion.
Background
- IP prefix trees
- The TrackIPTree algorithm
IP Prefixes
[Topology diagram annotated with example prefixes 1.2.3.4/32 and 8.1.0.0/16]
i/d denotes all IP addresses i covered by the first d bits.
Ex: 8.1.0.0/16 covers 8.1.0.0 to 8.1.255.255
Ex: 1.2.3.4/32 is 1 host (all bits fixed)
An IP prefix tree is formed by masking each bit of an IP address: the root 0.0.0.0/0 (the whole net) splits into 0.0.0.0/1 and 128.0.0.0/1, those into the /2 prefixes (0.0.0.0/2, 64.0.0.0/2, 128.0.0.0/2, 192.0.0.0/2), and so on down to /32 leaves (one host each, e.g., 0.0.0.0/32 and 0.0.0.1/32 under 0.0.0.0/31).
A k-IPTree classifier [VBSSS'09] is an IP prefix tree with at most k leaves, each leaf labeled good ("+") or bad ("-").
[Diagram: a 6-IPTree with leaves labeled + and -]
Ex: 64.1.1.1 is bad
Ex: 1.1.1.1 is good
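A toy version of classification with an IPTree: label lookup is a longest-prefix match over the leaves. The leaf set below is chosen only so that the slide's two examples hold; a real k-IPTree has up to k leaves at mixed depths.

```python
import ipaddress

# Toy IPTree: leaf prefix -> label. Labels chosen so the slide's examples
# hold (64.1.1.1 -> bad, 1.1.1.1 -> good); the other leaves are illustrative.
LEAVES = {
    ipaddress.ip_network("0.0.0.0/2"):   "+",
    ipaddress.ip_network("64.0.0.0/2"):  "-",
    ipaddress.ip_network("128.0.0.0/2"): "+",
    ipaddress.ip_network("192.0.0.0/2"): "+",
}

def classify(ip: str) -> str:
    addr = ipaddress.ip_address(ip)
    # Longest-prefix match: the most specific leaf containing the address.
    matches = [net for net in LEAVES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return LEAVES[best]

print(classify("1.1.1.1"))   # + (good)
print(classify("64.1.1.1"))  # - (bad)
```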
Δ-Change Algorithm
- Approach
- What doesn't work
- Intuition
- Our algorithm
Goal: identify online the specific regions of the Internet that have changed in malice.
Epoch 1 yields IP stream s1 and tree T1; epoch 2 yields IP stream s2 and tree T2; and so on.
[Diagram: T1 and T2 with /16, /17, /18 leaves whose labels flip between epochs]
- Δ-bad: a change from good to bad
- Δ-good: a change from bad to good
Goal: identify online the specific regions of the Internet that have changed in malice.
- False positive: misreporting that a change occurred
- False negative: missing a real change
Goal: identify online the specific regions of the Internet that have changed in malice.
Idea: divide time into epochs and diff:
- Use TrackIPTree on labeled IP stream s1 to learn T1
- Use TrackIPTree on labeled IP stream s2 to learn T2
- Diff T1 and T2 to find Δ-good and Δ-bad
This fails: T1 and T2 may have different granularities!
Goal: identify online the specific regions of the Internet that have changed in malice.
Δ-Change algorithm, main idea: use the classification errors between T_{i-1} and T_i to infer Δ-good and Δ-bad.
Δ-Change Algorithm
[Diagram: TrackIPTree learns T_{i-1} from stream S_{i-1} and T_i from stream S_i. Keeping the old tree fixed, annotate it with its classification error on S_{i-1} (giving T_{old,i-1}) and on S_i (giving T_{old,i}); then compare the (weighted) classification errors. Because both annotations are based on the same tree, the differences yield Δ-good and Δ-bad.]
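A much-simplified sketch of the comparison step: annotate the same fixed tree with per-prefix error rates on two consecutive epochs and report prefixes whose error moves by more than a threshold. The actual Δ-Change algorithm, with TrackIPTree's tree-growing and error weighting, is more involved; treat this as intuition, not the paper's method. The threshold and example data are invented for illustration.

```python
import ipaddress

def match(tree, ip):
    """Longest-prefix match of ip against the tree's leaf prefixes."""
    addr = ipaddress.ip_address(ip)
    return max((net for net in tree if addr in net), key=lambda n: n.prefixlen)

def classification_error(tree, stream):
    """tree: {prefix: label}; stream: iterable of (ip, true_label)."""
    err = {p: [0, 0] for p in tree}              # prefix -> [mistakes, seen]
    for ip, label in stream:
        p = match(tree, ip)
        err[p][1] += 1
        err[p][0] += (tree[p] != label)
    return {p: m / t for p, (m, t) in err.items() if t > 0}

def delta_change(tree, stream_prev, stream_cur, threshold=0.3):
    """Report prefixes whose error rate moved by more than the threshold."""
    e_prev = classification_error(tree, stream_prev)
    e_cur = classification_error(tree, stream_cur)
    return [p for p in e_prev.keys() & e_cur.keys()
            if abs(e_cur[p] - e_prev[p]) > threshold]

tree = {ipaddress.ip_network("0.0.0.0/1"): "+",
        ipaddress.ip_network("128.0.0.0/1"): "-"}
prev = [("1.1.1.1", "+"), ("130.0.0.1", "-")]    # tree fits epoch i-1 well
cur = [("1.1.1.1", "+"), ("130.0.0.1", "+")]     # 128/1 turned good in epoch i
print(delta_change(tree, prev, cur))             # [IPv4Network('128.0.0.0/1')]
```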
Evaluation
- What are the performance characteristics?
- Are we better than previous work?
- Do we find cool things?
Performance
In our experiments, we:
- let k = 100,000 (k-IPTree size)
- processed 30 to 35 million IPs (one day's traffic) using a 2.4 GHz processor
Identified Δ-good and Δ-bad in under 22 minutes using under 3 MB of memory.
How do we compare to network-aware clusters? (By prefix.)
2.5x as many changes found on average!
Spam
[Figure: Δ-changes in spam activity over time, annotated with the Grum botnet takedown.]
Botnets
- 22.1 and 28.6 thousand new DNSChanger bots appeared
- 38.6 thousand new Conficker and Sality bots
Caveats and Future Work
"For any distribution on which an ML algorithm works well, there is another on which it works poorly." (The "No Free Lunch" theorem.)
Our algorithm is efficient and works well in practice, but a very powerful adversary could fool it into having many false negatives. A formal characterization is future work.
Let Ω be the set of all possible events. For example:
- Audit records produced on a host
- Network packets seen
Let I ⊆ Ω be the set of intrusion events.
Intrusion rate: Pr[I]
Example: an IDS received 1,000,000 packets, 20 of which corresponded to an intrusion. The intrusion rate Pr[I] is:
Pr[I] = 20/1,000,000 = 0.00002
Let A ⊆ Ω be the set of alerts.
Alert rate: Pr[A]
Defn: an IDS is sound if every alert corresponds to a real intrusion, i.e., A ⊆ I.
Defn: an IDS is complete if every intrusion raises an alert, i.e., I ⊆ A.
Defn: False positive: an alert without an intrusion (A ∩ ¬I)
Defn: False negative: an intrusion without an alert (¬A ∩ I)
Defn: True positive: an alert on an intrusion (A ∩ I)
Defn: True negative: no alert and no intrusion (¬A ∩ ¬I)
Defn: Detection rate: Pr[A|I] = Pr[A ∩ I] / Pr[I]
Think of the detection rate as the set of intrusions raising an alert, normalized by the set of all intrusions.
[Venn diagram: |I ∩ A| = 18 true positives, |A \ I| = 4 false positives, |I \ A| = 2 false negatives]
Detection rate: Pr[A|I] = 18 / (18 + 2) = 90%
Defn: Bayesian detection rate: Pr[I|A] = Pr[I ∩ A] / Pr[A]
Think of the Bayesian detection rate as the set of intrusions raising an alert, normalized by the set of all alerts (vs. the detection rate, which normalizes by intrusions).
This is the crux of IDS usefulness!
[Same Venn diagram: |I ∩ A| = 18, |A \ I| = 4, |I \ A| = 2]
Bayesian detection rate: Pr[I|A] = 18 / (18 + 4) ≈ 82%
About 18% of all alerts are false positives!
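Checking the Venn-diagram numbers in code: with 18 true positives, 4 false positives, and 2 false negatives, the two rates normalize the same overlap by different sets.

```python
# Venn-diagram counts from the slides above.
tp, fp, fn = 18, 4, 2

detection_rate = tp / (tp + fn)   # alerts among intrusions: 18/20
bayesian_rate  = tp / (tp + fp)   # intrusions among alerts: 18/22

print(f"detection rate           = {detection_rate:.0%}")  # 90%
print(f"Bayesian detection rate  = {bayesian_rate:.0%}")   # 82%
print(f"false positives / alerts = {fp / (tp + fp):.0%}")  # ~18%
```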
Challenge We’re often given the detection rate and know the intrusion rate, and want to calculate the Bayesian detection rate 99% accurate medical test 99% accurate IDS 99% accurate test for deception ... 71
Fact (Bayes' theorem): Pr[I|A] = Pr[A|I] * Pr[I] / Pr[A]
Proof: Pr[I|A] = Pr[I ∩ A] / Pr[A], and Pr[I ∩ A] = Pr[A|I] * Pr[I].
Calculating the Bayesian Detection Rate
Fact: Pr[I|A] = Pr[A ∩ I] / Pr[A]
So to calculate the Bayesian detection rate, one way is to compute Pr[A ∩ I] and Pr[A] from quantities we typically know: Pr[A|I], Pr[A|¬I], and Pr[I] (see the next slides).
Example
1,000 people are in the city. 1 is a terrorist, and we have their picture; thus the base rate of terrorists is 1/1000.
Suppose we have a new terrorist facial recognition system that is 99% accurate:
- 99/100 times, when someone is a terrorist, there is an alarm.
- For every 100 good guys, the alarm only goes off once.
An alarm went off. Is the suspect really a terrorist?
Example
Answer: the facial recognition system is 99% accurate. That means there is only a 1% chance the guy is not a terrorist.
Wrong!
Formalization
- 1 in 1,000 is a terrorist, and we have their picture. Thus the base rate of terrorists is P[T] = 0.001.
- 99/100 times, when someone is a terrorist, there is an alarm: P[A|T] = 0.99.
- For every 100 good guys, the alarm only goes off once: P[A|¬T] = 0.01.
We want to know P[T|A].
P[T] = 0.001, P[A|T] = 0.99, P[A|¬T] = 0.01; we want P[T|A].
Intuition: given 999 good guys and a 1% false alarm rate, we expect 999 * 0.01 ≈ 9-10 false alarms for every true alarm, so P[T|A] is only about 1/11 ≈ 9%.
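The same intuition, computed exactly with total probability and Bayes' theorem:

```python
# Numbers from the formalization above.
p_t = 0.001              # P[T]: base rate of terrorists
p_a_given_t = 0.99       # P[A|T]: alarm when it really is a terrorist
p_a_given_not_t = 0.01   # P[A|~T]: false alarm rate on good guys

p_a = p_a_given_t * p_t + p_a_given_not_t * (1 - p_t)   # total probability
p_t_given_a = p_a_given_t * p_t / p_a                   # Bayes' theorem

print(f"P[T|A] = {p_t_given_a:.3f}")  # ~0.090: only about a 9% chance
```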
In Pr[I|A] = Pr[A ∩ I] / Pr[A], both Pr[A ∩ I] and Pr[A] are still unknown.
Recall, to get Pr[A]:
Fact (total probability): Pr[A] = Pr[A|I] * Pr[I] + Pr[A|¬I] * Pr[¬I]
Proof: A = (A ∩ I) ∪ (A ∩ ¬I), and the two parts are disjoint, so Pr[A] = Pr[A ∩ I] + Pr[A ∩ ¬I] = Pr[A|I] * Pr[I] + Pr[A|¬I] * Pr[¬I].
...and to get Pr[A ∩ I]:
Fact: Pr[A ∩ I] = Pr[A|I] * Pr[I]
Proof: by the definition of conditional probability, Pr[A|I] = Pr[A ∩ I] / Pr[I]; multiply both sides by Pr[I].
Both pieces are now in hand: Pr[A] and Pr[A ∩ I] follow from Pr[A|I], Pr[A|¬I], and Pr[I], so Pr[I|A] can be computed.
Visualization: ROC (Receiver Operating Characteristic) Curve
Plot the true positive rate vs. the false positive rate for a binary classifier at various threshold settings.
For IDS
Let I be an intrusion and A an alert from the IDS:
- 1,000,000 messages per day processed
- 2 attacks per day
- 10 messages per attack
[ROC plot (from Axelsson, RAID '99): true positives vs. false positives]
- 70% detection requires a false positive rate < 1/100,000
- 80% detection generates 40% false positives
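To see why the base rate dominates here, the sketch below computes the Bayesian detection rate at a fixed 70% detection rate for several false positive rates, using Pr[I] = (2 attacks * 10 messages) / 1,000,000 = 2e-5. The resulting numbers are illustrative arithmetic, not figures from Axelsson's paper.

```python
# Base rate: 2 attacks/day * 10 messages/attack out of 1,000,000 messages.
p_i = 2e-5
p_a_given_i = 0.70  # fixed detection rate

for fp_rate in [1e-3, 1e-4, 1e-5, 1e-6]:
    p_a = p_a_given_i * p_i + fp_rate * (1 - p_i)       # total probability
    print(f"FP rate {fp_rate:.0e}: P[I|A] = {p_a_given_i * p_i / p_a:.1%}")
# FP 1e-3 ->  1.4%   FP 1e-4 -> 12.3%   FP 1e-5 -> 58.3%   FP 1e-6 -> 93.3%
```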
Why Is Anomaly Detection Hard?
Think in terms of ROC curves and the base rate fallacy.
- Are real attacks rare? If so, they are hard to learn.
- Are real attacks common? If so, detection is probably OK.
Conclusion
Firewalls:
- 3 types: packet filtering, stateful, and application
- Placement and DMZ
IDS:
- Anomaly vs. policy-based detection
- Detection theory
- Base rate fallacy