M01-L02 - Introduction to Machine Learning for Cybersecurity.pptx
MinhaoCheng2
26 slides
Oct 07, 2025
About This Presentation
Intro for Cybersecurity Analytics Studio Course
Size: 756.02 KB
Language: en
Slide Content
Slide 1: CYBER 362 Cybersecurity Analytics Studio
M01-L02: Introduction to Machine Learning for Cybersecurity
Slide 2: Learning Objectives
- What is Machine Learning
- Types of Machine Learning Systems
- Applications of ML
- Readings: Review Chio, C. and Freeman, D., Chapter 1, pages 9-14, and other resources in Canvas.
Slide 3: Recap from last week
Challenges in Protection
- Evolving threats
- Budgets
- Diversity of security tools
- Scarcity of cybersecurity experts
- Collaboration and information sharing
- Regulatory framework and policies
- Poor governance
- Tools for automation
- Socio-technical issues
Solutions
- Accelerated training of experts
- Smarter tools leveraging AI/ML
- More funding
- Better governance
- Policy framework and updated policies
- More coordination and collaboration
Slide 4: What is machine learning?
Some definitions of ML:
- "Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed." (Arthur Samuel, 1959)
- "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." (Tom Mitchell, 1997)
Slide 5: What is machine learning?
Example: A spam filter is an ML program that learns to flag spam after being given examples of spam emails and examples of non-spam emails (ham).
- Examples of each category can be flagged by users. Together they form the training data set used to train the filter.
- Experience E is this training data.
- Task T is to flag spam among new emails (not seen before).
- Performance measure P is the ratio of correctly classified emails.
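To make E, T, and P concrete, here is a minimal sketch of a spam filter in Python. It uses a naive Bayes classifier, which is one possible choice and not something the slides prescribe. The emails, the function names (train, classify, accuracy), and the tiny test set are all invented for illustration.

```python
import math
from collections import Counter

# Experience E: a tiny, made-up labeled training set (hypothetical examples).
TRAIN = [
    ("win a free prize now", "spam"),
    ("free credit card offer", "spam"),
    ("amazing free offer click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on friday with the team", "ham"),
    ("project report draft attached", "ham"),
]

def train(examples):
    """Count word occurrences per class (the 'learning' step)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab

def classify(text, model):
    """Task T: label an unseen email using naive Bayes with add-one smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(examples, model):
    """Performance P: fraction of emails classified correctly."""
    correct = sum(classify(text, model) == label for text, label in examples)
    return correct / len(examples)

model = train(TRAIN)
TEST = [("free prize offer", "spam"), ("team meeting on monday", "ham")]
print(accuracy(TEST, model))
```

Adding more labeled emails to TRAIN is what "more experience E" means here; P is then re-measured on emails the filter has not seen.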
Slide 6: What is machine learning?
Spam filter system design approaches
Traditional approach
- Complex rules that are difficult to maintain: look at the sender's name, words in the subject line, and certain words in the body of the email ('free', 'amazing', 'credit card', etc.)
- Implemented as if-then-else or case-style code (e.g., case T -> a)
ML approach
- Automatically learns words and phrases that are good predictors of spam by looking for word patterns that are unusually frequent in spam compared with ham
- Shorter, easier to maintain, and probably more accurate
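The contrast between the two approaches can be sketched in a few lines of Python. The hand-written rule list uses words from the slide; the learned list is derived from example emails by comparing smoothed word rates in spam and ham. The sample emails, the min_ratio threshold, and all function names are assumptions made for this sketch.

```python
from collections import Counter

# Traditional approach: hand-written rules that someone must keep updating.
SUSPICIOUS_WORDS = {"free", "amazing", "credit"}

def rule_based_is_spam(text):
    return any(word in SUSPICIOUS_WORDS for word in text.lower().split())

# ML approach: learn which words are unusually frequent in spam vs ham.
def learn_spam_words(spam_emails, ham_emails, min_ratio=2.0):
    spam_counts = Counter(w for e in spam_emails for w in e.lower().split())
    ham_counts = Counter(w for e in ham_emails for w in e.lower().split())
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    learned = set()
    for word in spam_counts:
        # Adding 1 keeps the ratio defined for words never seen in ham.
        spam_rate = (spam_counts[word] + 1) / (spam_total + 1)
        ham_rate = (ham_counts[word] + 1) / (ham_total + 1)
        if spam_rate / ham_rate >= min_ratio:
            learned.add(word)
    return learned

def learned_is_spam(text, learned_words):
    return any(word in learned_words for word in text.lower().split())

# Made-up examples of a new spam campaign the fixed rules know nothing about.
spam = ["free bitcoin bonus today", "claim your bitcoin bonus", "bitcoin bonus for you"]
ham = ["your invoice for today", "see you at the meeting today"]
learned = learn_spam_words(spam, ham)

print(sorted(learned))
print(rule_based_is_spam("bitcoin bonus inside"))
print(learned_is_spam("bitcoin bonus inside", learned))
```

The rule-based filter misses the new campaign until someone edits SUSPICIOUS_WORDS, while the learned word list follows whatever the flagged examples contain.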
Slide 7: What is machine learning?
ML is suitable for
- Problems that are too complex to solve using traditional approaches, or for which no good solution is known
- Dynamic environments: an ML system can adapt to new data
- Getting insights into complex problems and large amounts of data
Example: Speech recognition, which draws on large corpora of words spoken by millions of people in noisy environments and in dozens of languages
Slide 8: The Big picture
- Artificial Intelligence (AI) is the science of making things smart, that is, of having machines perform human tasks (e.g., image recognition, speech recognition (NLP), etc.).
- Machine learning (ML) consists of algorithms and processes that learn from past data and experiences and then predict future outcomes
  - Information mining
  - Pattern discovery (data mining)
  - Drawing inferences from data
Slide 9: The Big picture (figure only)
Slide 10: The Big picture
- ML can reduce the effort and/or time spent on both simple and difficult tasks such as spam filtering, stock price prediction, etc.
- An ML system keeps learning, makes decisions based on data rather than hand-coded rules, and changes its behavior accordingly
- Deep Learning (DL) is a set of techniques for implementing machine learning that recognize patterns of patterns, as in image recognition.
  - Multilayered Artificial Neural Networks (ANN)
  - DL is a strict subset of ML
Slide 11: Types of ML Systems
Supervised Learning: the training data fed to the algorithm includes the desired solutions, called labels or targets.
Typical supervised learning tasks
- Classification: categorize data based on a label, e.g., categorize mail as spam or ham
- Regression: predict a target numeric value such as a car price, stock price, etc.
Some of the most important supervised learning algorithms
- Linear and Logistic Regression
- K-Nearest Neighbors
- Support Vector Machines
- Decision Trees and Random Forests
- Artificial Neural Networks (ANN)
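As a small illustration of supervised classification, the sketch below implements K-Nearest Neighbors from the algorithm list above using only the Python standard library. The two features and every data point are invented; think of them as, say, two scaled measurements per host, with labels supplied by an analyst.

```python
import math
from collections import Counter

# Labeled training data: (features, label). Values are invented for illustration.
TRAINING = [
    ((1.0, 2.0), "benign"),
    ((2.0, 1.0), "benign"),
    ((1.5, 1.5), "benign"),
    ((8.0, 9.0), "malicious"),
    ((9.0, 8.0), "malicious"),
    ((8.5, 9.5), "malicious"),
]

def knn_predict(point, training, k=3):
    """Classify `point` by majority vote among its k nearest labeled neighbors."""
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((2.0, 2.0), TRAINING))
print(knn_predict((9.0, 9.0), TRAINING))
```

The labels in TRAINING are the "desired solutions" the slide refers to; without them the vote in knn_predict would have nothing to count.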
Slide 12: Types of ML Systems
Unsupervised Learning: the training data is unlabeled, and the system tries to learn on its own in order to group similar instances together
Some of the most important unsupervised learning algorithms
- Clustering
  - K-Means
  - DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
  - Mean-shift
  - Etc.
Slide 13: Types of ML Systems
Unsupervised Learning: example
- Figure: an unlabeled training set for unsupervised learning
- Figure: clustering into four clusters
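The clustering idea can be sketched with a bare-bones K-Means (Lloyd's algorithm) in Python. This is a simplified illustration: the points are made up, there are two clusters instead of the four in the figure, and the initial centroids are passed in by hand to keep the run deterministic, whereas practical implementations usually pick them randomly.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Unlabeled data (invented): no spam/ham or benign/malicious tags anywhere.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(clusters)
```

Note that the algorithm only groups the points; deciding what each group means is left to the analyst.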
Slide 14: Types of ML Systems
Semi-supervised Learning: the ML system is fed a small amount of labeled training data and a lot of unlabeled data, and learns from both
- Labeling data is costly and time-consuming, so it is common to have plenty of unlabeled instances but few labeled instances
- Semi-supervised learning can be used to propagate labels from labeled instances to similar unlabeled instances, for example via clustering
- Example (figure)
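Label propagation via clustering can be sketched as follows. The sketch assumes the clustering has already been done (cluster_ids is given, not computed) and simply copies the majority label of each cluster onto its unlabeled members. The function name and the sample data are hypothetical.

```python
from collections import Counter

def propagate_labels(cluster_ids, labels):
    """Give each unlabeled instance the majority label of its cluster.

    cluster_ids[i] is the cluster of instance i; labels[i] is a label or None.
    """
    votes = {}
    for cid, label in zip(cluster_ids, labels):
        if label is not None:
            votes.setdefault(cid, Counter())[label] += 1
    propagated = []
    for cid, label in zip(cluster_ids, labels):
        if label is None and cid in votes:
            label = votes[cid].most_common(1)[0][0]
        propagated.append(label)
    return propagated

# Seven instances in three clusters; only two were labeled by hand.
cluster_ids = [0, 0, 0, 1, 1, 1, 2]
labels = ["spam", None, None, None, "ham", None, None]
print(propagate_labels(cluster_ids, labels))
```

Instances in a cluster with no labeled member (cluster 2 here) stay unlabeled, which is a reasonable cue for where to spend the next round of manual labeling.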
Slide 15: Types of ML Systems
Batch and Online learning
Batch learning: the system is trained offline using all available data and cannot learn after it is put into production
- Takes lots of time and computing resources
- To update, you must train a new version of the system from scratch and then replace the production system with the new version at some point
- The system cannot learn incrementally
- The process of training, evaluating, and launching the system can be automated
- Training can run every 24 hours or weekly
Slide 16: Types of ML Systems
Batch and Online learning
Online learning: train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches
- Fast and lighter on resources, and the system can learn from new data as it arrives
- Excellent for systems that receive data as a continuous flow and need to adapt to changes rapidly, such as stock prices and security data
- Monitor the system closely in case it is fed bad data and its performance deteriorates
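A minimal sketch of online learning, assuming a perceptron as the model (the slides do not name a specific algorithm). The class and its partial_fit method are written from scratch here, and the streamed mini-batches are invented. The point is that the model is updated as each mini-batch arrives and never needs the full data set at once.

```python
class OnlinePerceptron:
    """A linear classifier that updates its weights one mini-batch at a time."""

    def __init__(self, n_features, learning_rate=1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        activation = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if activation > 0 else 0

    def partial_fit(self, batch):
        """Update on a mini-batch of (features, label) pairs, label in {0, 1}."""
        for x, y in batch:
            error = y - self.predict(x)  # -1, 0, or +1
            if error:
                self.weights = [w + self.learning_rate * error * xi
                                for w, xi in zip(self.weights, x)]
                self.bias += self.learning_rate * error

# Data arrives as a stream of mini-batches (values are made up).
stream = [
    [((3.0, 1.0), 1), ((0.0, 2.0), 0)],
    [((4.0, 0.0), 1), ((1.0, 3.0), 0)],
]
model = OnlinePerceptron(n_features=2)
for mini_batch in stream:
    model.partial_fit(mini_batch)  # the model is usable between batches

print(model.predict((5.0, 1.0)), model.predict((0.0, 4.0)))
```

Because every batch changes the weights, a batch of mislabeled or poisoned data would shift the model too, which is why the slide recommends close monitoring.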
Slide 17: Types of ML Systems
Instance-Based Versus Model-Based Learning
- Categorizes ML systems by how they generalize, that is, how they move from training to prediction
Instance-based Learning: uses a similarity measure to compare new instances with instances that were seen during the training phase
- Example for a spam filter: if a new email's similarity to a known spam email is >= 80%, classify it as spam; if the similarity is < 80%, classify it as ham
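The 80% rule can be sketched with a simple similarity measure. The sketch below assumes Jaccard similarity over word sets, which is only one possible measure, and reads the rule as "at least 80% similar to some known spam email means spam". The stored emails and function names are invented.

```python
def jaccard(a, b):
    """Similarity between two texts as overlap of their word sets (0.0 to 1.0)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

def instance_based_classify(email, known_spam, threshold=0.8):
    """Flag as spam if the email is similar enough to any stored spam instance."""
    best = max((jaccard(email, s) for s in known_spam), default=0.0)
    return "spam" if best >= threshold else "ham"

# Instances seen during training (made up); no model parameters are fitted.
KNOWN_SPAM = ["win a free prize now", "cheap meds online today"]

print(instance_based_classify("win a free prize right now", KNOWN_SPAM))
print(instance_based_classify("free lunch for the team", KNOWN_SPAM))
```

There is no training step beyond storing examples: all the work happens at prediction time, when the new email is compared with the stored ones.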
Slide 18: Types of ML Systems
Instance-Based Versus Model-Based Learning
Model-Based Learning: build a model and use the model to make predictions
- Most common approach
- You need to specify a performance measure for your model, either:
  - A utility or fitness function that measures how good the model is; you want the model parameters that maximize the fitness function
  - A cost function or loss function that measures how bad the model is; you want the model parameters that minimize the cost function (see Chio and Freeman, page 35)
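Minimizing a cost function can be illustrated with the simplest model-based learner: a straight line fitted by gradient descent. The sketch assumes mean squared error as the cost function and a line y = w*x + b as the model; the data, learning rate, and step count are chosen for illustration only.

```python
def mse(w, b, xs, ys):
    """Cost function: mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Find w and b that minimize the MSE by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3), mse(w, b, xs, ys))
```

Once w and b are found, predictions use only the model (w*x + b); the training instances themselves are no longer needed, which is the key difference from instance-based learning.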
Slide 19: Adversaries Using ML
- Machine learning will never be a silver bullet for cybersecurity
- Attackers can use machine learning to support their own operations
- Adversaries can also use ML to avoid detection and evade defenses
- Example: Spammers can probe spam filters by performing A/B tests on email content
  - A/B testing (also known as split testing or bucket testing) is a method of comparing two versions of a webpage or app against each other to determine which one performs better.
Slide 20: Adversaries Using ML
- Defenders and attackers use ML in fuzzing campaigns to speed up the process of finding vulnerabilities [4]
- Adversaries can fool you by learning about your personality and interests on social media, and then crafting a perfect phishing message for you
  - Be careful what you post on social media
- Caveat: ML algorithms are not built with security in mind, and they can be attacked
  - It is therefore important to stay aware of such threat models when designing and building machine learning systems for security purposes
Slide 21: Application Areas of ML
- Financial fraud detection and prevention
- Recommendation systems (sales and marketing)
- Virtual personal assistants: Siri, Alexa, etc.
- Forecasting: stock prices, house prices, etc.
- Search engine results
- Social media services
- Image recognition and classification, e.g., products on a production line
- Market segmentation
- Text classification and translation (this is NLP)
- Self-driving cars and automated transportation
- Cybersecurity
Slide 22: ML Applications in Cybersecurity
Intrusion Detection
- Most studied application of AI/ML in security
- An intrusion is any form of attack that may compromise a network or host: probing, phishing, DoS/DDoS, etc.
- An ML classification problem
Anomaly detection (AD)
- Establish a notion of 'normality' that describes most of the data (say, > 90%) based on a set of features; anything else is an outlier (abnormal)
- ML for AD has received significant attention due to the autonomy and robustness it offers in learning and adapting to profiles of normality as they change over time
- Can be used for intrusion detection
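A minimal sketch of anomaly detection on a single feature, assuming a z-score rule: learn the mean and standard deviation of normal behavior, then flag values that fall too far from it. The z-score rule, the threshold of 3 standard deviations, and the traffic numbers are assumptions for illustration; real systems use many features and more robust notions of normality.

```python
import statistics

def fit_baseline(values):
    """Learn a profile of 'normal' from observations assumed to be benign."""
    return statistics.mean(values), statistics.stdev(values)

def is_anomaly(value, baseline, threshold=3.0):
    """Flag a value that lies more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    return abs(value - mean) > threshold * stdev

# Hypothetical feature: connections per minute observed during normal operation.
normal_traffic = [98, 102, 100, 97, 103, 101, 99, 100]
baseline = fit_baseline(normal_traffic)

print(baseline)
print(is_anomaly(104, baseline), is_anomaly(250, baseline))
```

Refitting the baseline on recent data is one simple way to let the profile of normality follow changes over time.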
Slide 23: ML Applications in Cybersecurity
Pattern recognition
- Relies on discovering specific patterns found in data
- AD can be used for detecting patterns in
  - Network traffic flows: flow feature-based AD (SPI)
  - Network traffic payloads in packets: payload-based AD (DPI)
Examples
- Botnet and malware detection: botnet traffic analysis, classifying malware
- Intrusion detection systems: configuration is time-consuming, so AI is used to reduce false alerts
- Spam detection and filtering
- Access control: the set of policies governing the ability of system users to access resources (data, software, and hardware)
- User and entity behavior analytics: detect behaviors that do not represent human actions, and detect exploitation of compromised user accounts through suspicious account activity
  - Uses ML classification and clustering
Slide 24: Benefits of AI/ML in Security
- AI/ML lowers the cost of detecting and responding to breaches
  - By understanding and reusing threat patterns and CTI to identify threats
  - On the order of 10%
- AI/ML helps organizations respond faster to breaches; it has been reported that it
  - Reduces the overall time to detect threats and breaches by up to 12%
  - Reduces the time to remediate a breach or implement patches by 12%
- AI/ML results in higher efficiency for cyber analysts because it improves detection accuracy
  - Analysts spend lots of time ploughing through logs and incident time sheets
Slide 25: Assignment 1
- Pick one application area of ML in the real world, either from the earlier slide on application areas or any other area not included there
- Research a company or organization that is using ML in the area that you have selected.
- Explain how that organization is applying ML in the application area you have selected.
- Based on your research, discuss the level of success that has been achieved in deploying ML in this area
- Identify the main challenges that organizations face in the application of ML in the area you have selected.
- Due at the end of Week 2.
Slide 26: References
[1] https://www.experian.com/blogs/ask-experian/cybercrime-the-1-5-trillion-problem/
[2] https://www.talkingdrugs.org/report-global-illegal-drug-trade-valued-at-around-half-a-trillion-dollars
[3] https://www.talosintelligence.com/reputation_center/email_rep
[4] http://www.vdiscover.org/OS-fuzzing.html
[5] http://whatis.techtarget.com/definition/machine-learning
[6] Chio, C. and Freeman, D. Machine Learning and Security: Protecting Systems with Data and Algorithms, O'Reilly, 1st Edition, 2018. Review chapter 1.