Threat Hunting with Data Science

AustinTaylor8 5,718 views 46 slides Jun 24, 2017

About This Presentation

After anomalous network traffic has been identified, there can still be an abundance of results for an analyst to process. This presentation is for data scientists and network security professionals who want to increase the signal-to-noise ratio.

Flare is a network analytic framework designed for data scie...


Slide Content

Hunting with Data Science
Increasing the Signal-to-Noise Ratio
www.austintaylor.io
@HuntOperator
June 23, 2017

Who Am I?
Austin Taylor
Security Researcher @ IronNet Cybersecurity
Cyber Warfare Operator @ USAF (MDANG)
www.austintaylor.io
@HuntOperator

Semantics

Threat Hunting
Cyber threat hunting is "the process of proactively and iteratively searching through networks to detect and isolate advanced threats that evade existing security solutions."

Data Science
Data science is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.

Threat Hunting + Data Science = Hunting with Data Science

Data Science Hunting Funnel
[Figure: funnel narrowing network traffic. Stage labels: Normal, Anomalous, Interesting, Bad. Method labels: Produced Naturally, Machine Learning, Domain Knowledge, Potential Bad. Share-of-traffic labels: 100%, 10%, 3-5%, .001%.]

Cyber Kill Chain

Cyber Kill Chain
You are here: Beaconing, DGA

Beaconing

Beaconing
•Post-infection
•Early network-related indication of infection
•Used by malware to “phone home” to a command-and-control (C2) server

Detection Challenges
•Hard-set intervals
•Varying window sizes
•Legitimate services with periodic traffic (Windows Update, virus definition updates)

Beaconing Detection

Beaconing: Detection
https://github.com/austin-taylor/flare
•Free Open Source Software
•Designed for data scientists and security researchers
•Written in Python
•Used for rapid prototyping and development of behavioral analytics
•Intended to make identifying malicious behavior in networks as simple as possible

Beaconing: Detection
https://github.com/austin-taylor/flare
[beacon]
es_host=localhost # IP address of ES Host, which we forwarded to localhost
es_index=logstash-flow-* # ES index
es_port=9200 # Elasticsearch port (we forwarded earlier)
es_timeout=480 # Timeout limit for elasticsearch retrieval
min_occur=50 # Minimum of 50 network occurrences to appear in traffic
min_interval=30 # Minimum interval of 30 seconds per beacon
min_percent=30 # Beacons must represent 30% of network traffic per dyad
window=3 # Accounts for jitter... For example, if 60 second beacons
# occurred at 58 seconds or 62 seconds, a window of 3 would
# factor in that traffic.
threads=8 # Use 8 threads to process (Should be configured)
period=24 # Retrieve all flows for the last 24 hours.
kibana_version=5 # Your Kibana version. Currently works with 4 and 5
verbose=True # Display output while running script
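
The config above uses "key=value # comment" lines. As a minimal sketch of how such a file could be read with Python's standard configparser (this is not Flare's own loader, and the function name load_beacon_settings is hypothetical), inline comments need to be enabled explicitly:

```python
import configparser

# A shortened copy of the [beacon] section shown above.
SAMPLE = """
[beacon]
es_host=localhost # IP address of ES host
es_index=logstash-flow-* # ES index
es_port=9200
min_occur=50 # Minimum network occurrences
min_interval=30
min_percent=30
window=3 # Jitter tolerance in seconds
period=24
verbose=True
"""

def load_beacon_settings(text):
    """Parse the [beacon] section into typed Python values."""
    # inline_comment_prefixes lets "key=value # comment" parse cleanly;
    # interpolation=None keeps any literal '%' characters from being expanded.
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",), interpolation=None
    )
    parser.read_string(text)
    section = parser["beacon"]
    return {
        "es_host": section.get("es_host"),
        "es_index": section.get("es_index"),
        "es_port": section.getint("es_port"),
        "min_occur": section.getint("min_occur"),
        "min_interval": section.getint("min_interval"),
        "min_percent": section.getint("min_percent"),
        "window": section.getint("window"),
        "period": section.getint("period"),
        "verbose": section.getboolean("verbose"),
    }

settings = load_beacon_settings(SAMPLE)
print(settings)
```

Without inline_comment_prefixes, configparser would treat the trailing comment as part of the value, so es_host would not parse as "localhost".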

Beaconing: Data Science
•Identify Beaconing
•Time
•IP address
•Ports
•Protocol

Beaconing: Data Science
Simple: src_ip, dest_ip, dest_port -> hash
More Complex: Discrete Fourier Transform (DFT) / Fast Fourier Transform (FFT)
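
The "simple" approach can be sketched as follows: key each flow on (src_ip, dest_ip, dest_port), then look for a dominant gap between consecutive flows. This is an illustrative sketch, not Flare's implementation; the parameter names are borrowed from the [beacon] config shown earlier, and the flows, addresses, and ports are synthetic.

```python
from collections import Counter, defaultdict

def find_beacons(flows, min_occur=50, min_interval=30, min_percent=30, window=3):
    """flows: iterable of (timestamp_seconds, src_ip, dest_ip, dest_port)."""
    by_dyad = defaultdict(list)
    for ts, src, dst, port in flows:
        by_dyad[(src, dst, port)].append(ts)  # the tuple acts as the hash key

    results = []
    for dyad, stamps in by_dyad.items():
        stamps.sort()
        intervals = [round(b - a) for a, b in zip(stamps, stamps[1:])]
        if not intervals:
            continue
        counts = Counter(intervals)
        mode, _ = counts.most_common(1)[0]
        if mode < min_interval:
            continue
        # Count gaps within +/- window seconds of the most common gap (jitter).
        hits = sum(c for gap, c in counts.items() if abs(gap - mode) <= window)
        percent = 100.0 * hits / len(intervals)
        if hits >= min_occur and percent >= min_percent:
            results.append({"dyad": dyad, "interval": mode,
                            "occurrences": hits, "percent": round(percent, 1)})
    return results

def synthetic_flows():
    """One jittered 60-second beacon plus one irregular talker (made-up data)."""
    flows, ts = [], 0
    for i in range(101):
        flows.append((ts, "10.0.0.5", "198.51.100.7", 443))
        ts += (60, 60, 58, 60, 62)[i % 5]
    ts = 0
    for i in range(61):
        flows.append((ts, "10.0.0.9", "198.51.100.8", 80))
        ts += (5, 130, 47, 300, 12, 75)[i % 6]
    return flows

beacons = find_beacons(synthetic_flows())
print(beacons)
```

Only the first dyad should be reported: its gaps cluster around 60 seconds, while the second dyad has no gap that recurs often enough to reach min_occur.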

Scenario 1
•A piece of malware has infected a computer (192.168.0.53) on your network and is trying to reach back to its Command and Control (C2) server (160.153.76.129) at periodic intervals
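
To illustrate the frequency-domain idea on a scenario like this one, the sketch below bins connection timestamps into a time series and looks for the strongest periodic component. A naive pure-Python DFT stands in for an FFT to avoid dependencies; the timestamps are synthetic (one connection every 60 seconds for an hour), and the bin size and the 50% peak threshold are assumptions.

```python
import cmath

def dft_magnitudes(series):
    """Return [(k, |X_k|)] for frequency bins 1..n/2 of a real-valued series."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant (DC) component
    mags = []
    for k in range(1, n // 2 + 1):
        total = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        mags.append((k, abs(total)))
    return mags

def dominant_period(timestamps, bin_seconds=10, duration=3600):
    """Estimate the main repeat period (seconds) of a list of event times."""
    n = duration // bin_seconds
    series = [0] * n
    for ts in timestamps:
        idx = int(ts // bin_seconds)
        if 0 <= idx < n:
            series[idx] += 1
    mags = dft_magnitudes(series)
    peak = max(m for _, m in mags)
    if peak == 0:
        return None  # flat series, nothing periodic to report
    # A pulse train also has strong harmonics, so take the lowest strong bin.
    k = next(k for k, m in mags if m >= 0.5 * peak)
    return n * bin_seconds / k

# Synthetic stand-in for 192.168.0.53 -> 160.153.76.129 traffic.
beacon_times = list(range(0, 3600, 60))
period = dominant_period(beacon_times)
print(period)
```

With real traffic, jitter and unrelated connections smear the peaks, so the threshold and bin size would need tuning.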

HUNT!

Beaconing: Hunt
flare_beacon -c configs/selks4.ini -html beacons.html
108 events to process

Beaconing: Hunt
flare_beacon -c configs/selks4.ini --group --whois --focus_outbound -html beacons_filtered.html
31 events to process

Beaconing: Hunt
flare_beacon -c configs/selks4.ini --group --whois --focus_outbound -html beacons_filtered.html
What was applied?
--group: Groups the results, making anomalies visually easier to identify.
--whois: Enriches IP addresses with WHOIS information through ASN lookups.
--focus_outbound: Filters multicast, private, and broadcast addresses out of the destination IPs.
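
The kind of filtering described for focus_outbound can be sketched with Python's standard ipaddress module. This is only an illustration of the idea, not Flare's code; the function name is hypothetical.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.ip_address("255.255.255.255")

def is_outbound_candidate(dest_ip):
    """True if the destination is not multicast, private, or broadcast."""
    addr = ipaddress.ip_address(dest_ip)
    if addr.is_multicast or addr.is_private:
        return False
    if addr == LIMITED_BROADCAST:
        return False
    return True

destinations = [
    "192.168.0.53",     # private (RFC 1918)
    "224.0.0.251",      # multicast
    "255.255.255.255",  # limited broadcast
    "160.153.76.129",   # public address from Scenario 1
    "8.8.8.8",          # public
]
outbound = [ip for ip in destinations if is_outbound_candidate(ip)]
print(outbound)
```

Note that is_private in the standard library also covers loopback, link-local, and documentation ranges, which is broader than RFC 1918 alone.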

Beaconing: Hunt
• bytes_toserver: Total bytes sent from the source IP address to the server
• dest_degree: Number of source IP addresses that communicate with the same destination
• occurrences: Number of network occurrences between dyads identified as beaconing
• percent: Percentage of traffic between dyads considered beaconing
• interval: Interval between beacons, in seconds
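
As a rough sketch of how two of these fields could be derived from raw flow records (the record layout and function name are assumptions, not Flare's internals), dest_degree counts distinct sources per destination and bytes_toserver sums bytes per dyad:

```python
from collections import defaultdict

def summarize(flows):
    """flows: iterable of dicts with src_ip, dest_ip, and bytes_toserver keys."""
    sources_per_dest = defaultdict(set)
    bytes_per_dyad = defaultdict(int)
    for flow in flows:
        sources_per_dest[flow["dest_ip"]].add(flow["src_ip"])
        bytes_per_dyad[(flow["src_ip"], flow["dest_ip"])] += flow["bytes_toserver"]
    # dest_degree: how many distinct sources talk to each destination.
    dest_degree = {dest: len(srcs) for dest, srcs in sources_per_dest.items()}
    return dest_degree, dict(bytes_per_dyad)

# Made-up flow records for illustration only.
sample = [
    {"src_ip": "192.168.0.53", "dest_ip": "160.153.76.129", "bytes_toserver": 100},
    {"src_ip": "192.168.0.53", "dest_ip": "160.153.76.129", "bytes_toserver": 50},
    {"src_ip": "192.168.0.60", "dest_ip": "8.8.8.8", "bytes_toserver": 10},
    {"src_ip": "192.168.0.53", "dest_ip": "8.8.8.8", "bytes_toserver": 5},
]
dest_degree, bytes_toserver = summarize(sample)
print(dest_degree)
print(bytes_toserver)
```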

Beaconing: Hunt
Validate Results

Beaconing: Hunt
Drilling in

Domain Generation Algorithms (DGA)

Domain Generation Algorithms (DGA)
Why DGA?
•Deterministic: domains are generated from a value known to both the malware and the attacker
•Generates a large number of domain names
•Easy to burn
•Cheap to register
•Used as a rendezvous point by the attacker
Example: vtlfccmfxlkgifuf.com
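
To make "deterministic" concrete, here is a toy generator. It is not taken from any real malware family; the seed string, hash choice, and letter mapping are all assumptions chosen for illustration. Given the same seed and date, both sides can compute the same candidate domains.

```python
import hashlib
from datetime import date

def toy_dga(seed, day, count=5, length=16, tld=".com"):
    """Derive `count` pseudo-random domains from a shared seed and a date."""
    domains = []
    for i in range(count):
        material = f"{seed}-{day.isoformat()}-{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map each hex digit (0-15) onto the letters a-p.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(label + tld)
    return domains

today = toy_dga("example-seed", date(2017, 6, 23))
again = toy_dga("example-seed", date(2017, 6, 23))
tomorrow = toy_dga("example-seed", date(2017, 6, 24))
for name in today:
    print(name)
```

The attacker only needs to register one of the generated names for a given day, which is why blocking individual domains is a losing game.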

DGA Example

Source: Aditya K. Sood, Sherali Zeadally, "A Taxonomy of Domain-Generation Algorithms", IEEE Security & Privacy, vol. 14, no. , pp. 46-53, July-Aug. 2016, doi:10.1109/MSP.2016.76
We want to detect this

Scenario 2
•A piece of malware has infected a computer on your network and is making requests to DGA-generated domains in an attempt to communicate with a Command and Control server

HUNT!

DNS Records
Record Count: 15408

Import Flare Tools
•DGA Classifier
  •Random Forest Classifier
  •N-Grams
  •Uses labelled data
•Alexa - Top 1M most visited websites
  •Now a paid service
  •Umbrella/Majestic are free alternatives
•Domain TLD Extract - Extracts the top-level domain to be checked against Alexa
  •Degree is also calculated from here
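
The sketch below illustrates two of the ideas above without using Flare: character n-grams as features, and a popularity check against a known-good list. A simple "share of familiar bigrams" score stands in for the Random Forest classifier, the three-entry POPULAR set is a hypothetical stand-in for a top-1M list, and the domain split is naive (it mishandles suffixes such as co.uk).

```python
from collections import Counter

# Hypothetical stand-in for a top-sites list such as Alexa, Umbrella, or Majestic.
POPULAR = {"google.com", "wikipedia.org", "github.com"}

def registered_domain(hostname):
    """Naive split: keep the last two labels of the hostname."""
    parts = hostname.lower().rstrip(".").split(".")
    return ".".join(parts[-2:])

def char_ngrams(label, n=2):
    """Count the character n-grams in a domain label."""
    return Counter(label[i:i + n] for i in range(len(label) - n + 1))

# Bigrams seen in the popular domains' labels.
KNOWN_BIGRAMS = set()
for domain in POPULAR:
    KNOWN_BIGRAMS.update(char_ngrams(domain.split(".")[0]))

def bigram_familiarity(hostname):
    """Fraction of a domain's bigrams that also occur in popular domains."""
    label = registered_domain(hostname).split(".")[0]
    grams = char_ngrams(label)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    return sum(c for g, c in grams.items() if g in KNOWN_BIGRAMS) / total

def filter_unpopular(hostnames):
    """Drop hostnames whose registered domain is on the popular list."""
    return [h for h in hostnames if registered_domain(h) not in POPULAR]

queries = ["www.google.com", "vtlfccmfxlkgifuf.com", "en.wikipedia.org"]
remaining = filter_unpopular(queries)
print(remaining, [round(bigram_familiarity(h), 2) for h in remaining])
```

A real classifier would be trained on labelled data with far more features; this only shows why n-grams separate dictionary-like names from generated ones.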

Filter Results
Still too many results…

Filter Results
Still too many results…
And yet…

Filter Results
Down to 78
And finally…

Filter Results
Apply Alexa Check…
and…
57 Results!

Pass to Analyst
•Identify Process Generating Traffic
•Isolate infected host
•Begin endpoint investigation…

Thank you!
www.austintaylor.io
@HuntOperator
Questions?