Ramanujan College, University of Delhi, New Delhi-110019, India. Topic: Machine Learning
What is Machine Learning? “Learning is any process by which a system improves performance from experience.” - Herbert Simon. Definition by Tom Mitchell (1998): Machine Learning is the study of algorithms that improve their performance P at some task T with experience E. A well-defined learning task is given by <P, T, E>.
Traditional Programming vs. Machine Learning. Traditional programming: Data + Program go into the computer, which produces the Output. Machine learning: Data + Output go into the computer, which produces the Program. Slide credit: Pedro Domingos
Machine Learning vs. Traditional Programming. Traditional programming: Program/Rules + Data give the Output/Answer. Machine learning: Data + Output/Answers give the Program/Rules/Model. When does ML make sense? "Sort these numbers in decreasing order: 2, 4, 18, 1, 77, 0, 85": does not make sense, do not use machine learning (what are the risks if you do?). "Rock paper scissors": makes sense, use machine learning.
Introduction: distinguish two approaches. Knowledge-based: a computer program whose logic encodes a large number of properties of the world, usually developed by a team of experts over many years. Machine learning: extract information directly from historical data and extrapolate to make predictions.
When Do We Use Machine Learning? ML is used when:
- Human expertise does not exist (navigating on Mars)
- Humans can't explain their expertise (speech recognition)
- Models must be customized (personalized medicine)
- Models are based on huge amounts of data (genomics)
Learning isn't always useful: there is no need to "learn" to calculate payroll. Based on slide by E. Alpaydin
Some more examples of tasks that are best solved by using a learning algorithm:
- Recognizing patterns: facial identities or facial expressions; handwritten or spoken words; medical images
- Generating patterns: generating images or motion sequences
- Recognizing anomalies: unusual credit card transactions; unusual patterns of sensor readings in a nuclear power plant
- Prediction: future stock prices or currency exchange rates
Slide credit: Geoffrey Hinton
Sample Applications: web search, computational biology, finance, e-commerce, space exploration, robotics, information extraction, social networks, debugging software, [your favorite area]. Slide credit: Pedro Domingos
Samuel's Checkers-Player. “Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.” - Arthur Samuel (1959)
Defining the Learning Task: improve on task T, with respect to performance metric P, based on experience E.
- T: Playing checkers. P: Percentage of games won against an arbitrary opponent. E: Playing practice games against itself.
- T: Recognizing hand-written words. P: Percentage of words correctly classified. E: Database of human-labeled images of handwritten words.
- T: Driving on four-lane highways using vision sensors. P: Average distance traveled before a human-judged error. E: A sequence of images and steering commands recorded while observing a human driver.
- T: Categorize email messages as spam or legitimate. P: Percentage of email messages correctly classified. E: Database of emails, some with human-given labels.
Slide credit: Ray Mooney
Types of Learning
Types of Learning:
- Supervised (inductive) learning. Given: training data + desired outputs (labels).
- Unsupervised learning. Given: training data (without desired outputs).
- Semi-supervised learning. Given: training data + a few desired outputs.
- Reinforcement learning. Rewards from a sequence of actions.
Based on slide by Pedro Domingos
Supervised Machine Learning. Given data like this (house sizes and their prices), how can we learn to predict the prices of other houses as a function of the size of their living areas? House price = F(living area). What is an instance space? To make our housing example more interesting, let's consider a slightly richer dataset in which we also know the number of bedrooms in each house: House price = F(living area, # bedrooms). What is the instance space now? A minimal sketch of this kind of prediction follows below.
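A minimal sketch of learning such an F, assuming made-up housing numbers (the living areas, bedroom counts, and prices below are illustrative only) and using scikit-learn's linear regression as one possible hypothesis class:

import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one house: [living area (sq. ft.), number of bedrooms]
X = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]])
y = np.array([400, 330, 369, 232, 540])   # prices in $1000s

# Learn F(living area, # bedrooms) from the examples
model = LinearRegression().fit(X, y)
print(model.predict([[2000, 3]]))          # predicted price for a new house

Here each house is an instance, and the vector (living area, # bedrooms) is its representation in the instance space.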
Supervised Learning. We consider systems that apply a function f to input items x and return an output y = f(x). Diagram: Input (an item x drawn from an input space X) → System y = f(x) → Output (an item y drawn from an output space Y). Example input: "Dan Roth"; example output: + or -. Dan Roth
Supervised Learning. In (supervised) machine learning, we deal with systems whose function f(x) is learned from examples. Diagram: Input (an item x drawn from an input space X) → System y = f(x) → Output (an item y drawn from an output space Y).
Why use learning? We typically use machine learning when the function we want the system to apply is unknown to us, and we cannot "think" it through and write it down ourselves. The function could actually be simple.
Supervised Learning. Diagram: Input (an item x drawn from an instance space X) → Learned Model y = g(x) → Output (an item y drawn from a label space Y). The target function is y = f(x); the space of all functions our algorithm "considers" is called the hypothesis space.
Supervised learning: Training. Give the learner examples in the labeled training data; the learner returns a model g(x). Flow: Labeled Training Data → Learning Algorithm → Learned model g(x). g(x) is the model we'll use in our application. An input example (an element in the instance space) with its label: ("Dan Roth", +). An example of a learned model: If (the … character of the … token is …) AND (the … is …) then Negative; otherwise Positive.
Supervised learning: Testing. Reserve some labeled data for testing (Labeled Test Data).
Supervised learning: Testing. Split the labeled test data into the raw test data (the inputs) and the test labels (the outputs).
Supervised learning: Testing. Apply the learned model to the raw test data to obtain predicted labels, then evaluate by comparing the predicted labels against the test labels. A minimal sketch of this protocol follows below.
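A minimal sketch of this training/testing protocol, assuming scikit-learn and its built-in iris dataset as a stand-in for the labeled data in the slides:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Reserve some labeled data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Training: the learning algorithm returns a model g(x)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Testing: apply the model to the raw test data ...
predicted = model.predict(X_test)

# ... and compare the predicted labels against the test labels
print("accuracy:", accuracy_score(y_test, predicted))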
Data In, Model Out. Data D → Machine Learning → Model f(·). Example of the kind of model we would like to recover: acceleration a = Force F / mass m (Newton's second law).
Data In, Model Out. Data D → ML Design (hypothesis class, loss function, optimizer, hyperparameters, features, …) → Model f(·), with target a = F / m. Example hypothesis class, for varying values of the weights w0, …, w4: a = w0 + w1·F + w2·m + w3·F·m + w4·(F/m). The true model lies in this class: a = 0 + 0·F + 0·m + 0·F·m + 1·(F/m). Learning = finding "good" values for the weights w0, w1, …, w4 (see the sketch below).
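A minimal sketch of this weight-finding step, assuming synthetic (F, m) data generated from the true law a = F/m purely for illustration, and ordinary least squares as the optimizer:

import numpy as np

rng = np.random.default_rng(0)
F = rng.uniform(1, 100, size=200)          # force (N)
m = rng.uniform(1, 20, size=200)           # mass (kg)
a = F / m                                  # "observed" acceleration

# One column per term of the hypothesis class: 1, F, m, F*m, F/m
Phi = np.column_stack([np.ones_like(F), F, m, F * m, F / m])
w, *_ = np.linalg.lstsq(Phi, a, rcond=None)
print(np.round(w, 3))                      # expect roughly [0, 0, 0, 0, 1]

The recovered weights pick out the F/m term, i.e. the learner finds Newton's second law inside the hypothesis class.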
Key Issues in Machine Learning:
- Modeling: How to formulate application problems as machine learning problems? How to represent the data? Learning protocols (where are the data and labels coming from?)
- Representation: What functions should we learn (hypothesis spaces)? How to map raw input to an instance space? Any rigorous way to find these? Any general approach?
- Algorithms: What are good algorithms? How do we define success? Generalization vs. overfitting. The computational problem.
Using supervised learning:
- What is our instance space? Gloss: what kind of features are we using?
- What is our label space? Gloss: what kind of learning task are we dealing with?
- What is our hypothesis space? Gloss: what kind of functions (models) are we learning?
- What learning algorithm do we use? Gloss: how do we learn the model from the labeled data?
- What is our loss function/evaluation metric? Gloss: how do we measure success? What drives learning?
1. The instance space X. Diagram: Input (an item x drawn from an instance space X) → Learned Model → Output (an item y drawn from a label space Y). Designing an appropriate instance space X is crucial for how well we can predict y.
1. The instance space. When we apply machine learning to a task, we first need to define the instance space X. Instances are defined by features:
- Boolean features: Is there a folder named after the sender? Does this email contain the word 'class'? Does this email contain the word 'waiting'? Does this email contain the word 'class' and the word 'waiting'? (Does that last feature add anything if you already have the previous two features?)
- Numerical features: How often does 'learning' occur in this email? How long is the email? How many emails have I seen from this sender over the last day/week/month?
- Bag of tokens: just list all the tokens in the input.
A small sketch of such feature extraction follows below.
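A small sketch of extracting these features for one email; extract_features and its arguments are hypothetical names, not from any particular library:

from collections import Counter

def extract_features(email_text, sender_has_folder, emails_from_sender_last_week):
    tokens = email_text.lower().split()
    features = {
        # Boolean features
        "sender_has_folder": sender_has_folder,
        "contains_class": "class" in tokens,
        "contains_waiting": "waiting" in tokens,
        "contains_class_and_waiting": "class" in tokens and "waiting" in tokens,
        # Numerical features
        "count_learning": tokens.count("learning"),
        "email_length": len(tokens),
        "emails_from_sender_last_week": emails_from_sender_last_week,
    }
    bag = Counter(tokens)   # bag of tokens: just count all the tokens in the input
    return features, bag

features, bag = extract_features("Waiting for the class on machine learning", True, 4)
print(features)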
X as a vector space. X is an N-dimensional vector space (e.g. {0,1}^N or R^N); each dimension = one feature. Each x in X is a feature vector (hence the boldface x). Think of x = [x_1 … x_N] as a point in X. Example features for a patient: age, clump thickness, tumor color, distance from optic nerve, cell type. A tiny numeric sketch follows below.
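A tiny numeric sketch, with made-up patient values (categorical features such as tumor color and cell type are encoded as integer codes purely for illustration):

import numpy as np

# One dimension per feature: [age, clump thickness, tumor color, distance from optic nerve, cell type]
patient_1 = np.array([63.0, 4.0, 2.0, 1.5, 1.0])
patient_2 = np.array([45.0, 1.0, 0.0, 3.2, 2.0])

# Each patient is now a point in R^5; distances between points are meaningful to a learner
print(np.linalg.norm(patient_1 - patient_2))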
Good features are essential. The choice of features is crucial for how well a task can be learned. In many application areas (language, vision, etc.), a lot of work goes into designing suitable features; this requires domain expertise. We can't teach you what specific features to use for your task, but we will touch on some general principles.
2. The label space. Diagram: Input (an item x drawn from an instance space X) → Learned Model → Output (an item y drawn from a label space Y). The label space Y determines what kind of supervised learning task we are dealing with.
Supervised learning tasks I. Output labels are categorical:
- Binary classification: two possible labels.
- Multiclass classification: k possible labels.
Output labels are structured objects (sequences of labels, parse trees, graphs, etc.):
- Structure learning: multiple labels that are related (and thus constrained). Example: given three events, when classifying the temporal relations between them we need to account for the constraints among those relations.
Supervised learning tasks II. Output labels are numerical:
- Regression (linear/polynomial): labels are continuous-valued; learn a linear/polynomial function.
- Ranking: labels are ordinal; learn an ordering over inputs.
3. The model. Diagram: Input (an item x drawn from an instance space X) → Learned Model → Output (an item y drawn from a label space Y). We need to choose what kind of model we want to learn.
Types of Learning:
- Supervised learning. Input: examples of inputs and desired outputs. Output: a model that predicts the output given a new input.
- Unsupervised learning. Input: examples of some data (no "outputs"). Output: a representation of structure in the data.
- Reinforcement learning. Input: a sequence of agent interactions with an environment. Output: a policy that maps the agent's observations to actions.
Unsupervised Learning. In unsupervised learning the training data is unlabeled; the system tries to learn without a teacher. Unsupervised methods must rely solely on the intrinsic structure of the data points to learn a good hypothesis, so they do not need a teacher or domain expert who provides labels for data points. Two large families of unsupervised methods: clustering and feature learning. Two important applications of feature learning are dimensionality reduction and data visualization. Notation: in the clustering problem, we are given a training set {x^(1), x^(2), …, x^(n)} and want to group the data into a few cohesive "clusters", but no labels y^(i) are given.
Unsupervised Learning. Given x_1, x_2, ..., x_n (without labels), the goal is to find groups of similar observations, i.e., to output the hidden structure behind the x's, e.g., by clustering.
Unsupervised Learning, genomics application: group individuals by genetic similarity (a heatmap of genes × individuals). [Source: Daphne Koller]
Unsupervised Learning, more applications: organizing computing clusters, social network analysis, astronomical data analysis, market segmentation. Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison). Slide credit: Andrew Ng
Unsupervised Learning: independent component analysis, i.e., separate a combined signal into its original sources. Image credit: statsoft.com. Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html. A small sketch follows below.
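A minimal sketch of ICA-style blind source separation, assuming scikit-learn's FastICA and two synthetic source signals (not the audio linked above):

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                       # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))              # source 2: square wave
S = np.column_stack([s1, s2])

A = np.array([[1.0, 0.5], [0.5, 2.0]])   # unknown mixing matrix
X = S @ A.T                              # observed mixed signals

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)             # recovered sources (up to scale and order)
print(S_est.shape)                       # (2000, 2)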
Here are some of the most important unsupervised learning algorithms:
- Clustering: k-means, DBSCAN, Hierarchical Cluster Analysis (HCA)
- Visualization and dimensionality reduction: Principal Component Analysis (PCA), Kernel PCA
- Association rule learning: Apriori, Eclat
- Anomaly detection and novelty detection: One-class SVM, Isolation Forest
A short sketch of two of these follows below.
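A short sketch of two of the algorithms listed above (k-means clustering and PCA), using scikit-learn on a synthetic blob dataset:

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# 300 unlabeled points in 5 dimensions, drawn around 3 hidden centers
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# Clustering: group the points into 3 cohesive clusters
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: project to 2D, e.g. for visualization
X_2d = PCA(n_components=2).fit_transform(X)
print(labels[:10], X_2d.shape)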
Reinforcement Learning. In supervised learning, we saw algorithms that tried to make their outputs mimic the labels y given in the training set; in that setting, the labels gave an unambiguous "right answer" for each of the inputs x. In contrast, for many sequential decision-making and control problems, it is very difficult to provide this type of explicit supervision to a learning algorithm. Example: if we have just built a four-legged robot and are trying to program it to walk, then initially we have no idea what the "correct" actions are to make it walk, so we do not know how to provide explicit supervision for a learning algorithm to try to mimic. In the reinforcement learning framework, the learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards). It must then learn by itself the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
Reinforcement Learning examples: the credit assignment problem, game playing, a robot in a maze, balancing a pole on your hand. A toy sketch of the learning loop follows below.
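A toy sketch of the reinforcement learning loop: tabular Q-learning on a hypothetical 5-cell corridor where the agent starts at cell 0 and gets reward +1 only when it reaches cell 4 (the corridor, rewards, and hyperparameters are all made up for illustration):

import random

n_states, n_actions = 5, 2                 # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1      # learning rate, discount, exploration rate

def step(state, action):
    next_state = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for episode in range(300):
    state = 0
    for _ in range(100):                   # cap episode length
        if random.random() < epsilon:      # explore
            action = random.randrange(n_actions)
        else:                              # exploit (break ties randomly)
            best = max(Q[state])
            action = random.choice([a for a in range(n_actions) if Q[state][a] == best])
        next_state, reward = step(state, action)
        # Q-learning update: nudge Q(s, a) toward reward + gamma * max_a' Q(s', a')
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state
        if state == n_states - 1:          # reached the goal, end the episode
            break

print([[round(q, 2) for q in row] for row in Q])   # learned action values; the greedy policy should prefer "right"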
Revision: machine learning is "Programming 2.0". Traditional programming: a human writes the program/rules. Machine learning (ML): the program (model) is learned from data and example outputs.
Task specification in ML: programs → examples. Here is a program to implement Newton's second law of motion:

def compute_force(m, a):
    '''Returns the force (in N) needed to move mass m (in kg) at acceleration a (in m/s^2).'''
    F = m * a
    return F

And here is the same task specified by examples ("here are some examples, try to imitate them"):

Mass m (kg) | Acceleration a (m/s^2) | Force F (N)
2.5         | 4                      | 10
5           | 2                      | 10
20          | 0.5                    | 10
40          | 0.25                   | 10
40          | 2.5                    | 100
20          | 5                      | 100
50          | 2                      | 100
Task specification in ML: programs → examples. Here is a program to recognize an image as a cow or a turtle: def cow_or_turtle(image): ??? (we cannot write its body down explicitly). Instead: here are some examples, images labeled "cows" and images labeled "turtles"; try to imitate them.
Putting a trained ML system to use: after training on examples labeled "cows" and "turtles", the learned model is shown new images and predicts "cow" for a new cow image and "turtle" for a new turtle image.
ML Workflow (framing an ML problem in terms of Mitchell's P, T, E):
- Data curation (sourcing, scraping, collection, labeling)
- Data analysis / visualization
- ML design (hypothesis class, loss function, optimizer, hyperparameters, features)
- Train model
- Validate / evaluate
- Deploy (and generate new data)
- Monitor performance on new data
ML for Social Good. See "Applying AI for social good" (McKinsey) and Elizabeth Bondi et al., "SPOT Poachers in Action: Augmenting Conservation Drones with Automatic Detection in Near Real Time", AAAI 2018.
Framing a Learning Problem
Designing a Learning System:
- Choose the training experience
- Choose exactly what is to be learned, i.e. the target function
- Choose how to represent the target function
- Choose a learning algorithm to infer the target function from the experience
Diagram: Environment/Experience → Learner → Knowledge → Performance Element (training data feeds the learner; testing data evaluates the performance element). Based on slide by Ray Mooney
Training vs. Test Distribution. We generally assume that the training and test examples are independently drawn from the same overall distribution of data; we call this "i.i.d.", which stands for "independent and identically distributed". If examples are not independent, this requires collective classification; if the test distribution is different, this requires transfer learning. Slide credit: Ray Mooney
ML in a Nutshell. There are tens of thousands of machine learning algorithms, with hundreds of new ones every year. Every ML algorithm has three components: representation, optimization, and evaluation. Slide credit: Pedro Domingos
Various Function Representations:
- Numerical functions: linear regression, neural networks, support vector machines
- Symbolic functions: decision trees, rules in propositional logic, rules in first-order predicate logic
- Instance-based functions: nearest-neighbor, case-based
- Probabilistic graphical models: Naïve Bayes, Bayesian networks, Hidden Markov Models (HMMs), Probabilistic Context-Free Grammars (PCFGs), Markov networks
Slide credit: Ray Mooney
Various Search/Optimization Algorithms:
- Gradient descent: perceptron, backpropagation
- Dynamic programming: HMM learning, PCFG learning
- Divide and conquer: decision tree induction, rule learning
- Evolutionary computation: genetic algorithms (GAs), genetic programming (GP), neuro-evolution
Slide credit: Ray Mooney
Evaluation: accuracy, precision and recall, squared error, likelihood, posterior probability, cost/utility, margin, entropy, K-L divergence, etc. Slide credit: Pedro Domingos. A small sketch of computing a few of these follows below.
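A small sketch computing a few of these measures on made-up predictions, using scikit-learn's metrics module:

from sklearn.metrics import accuracy_score, precision_score, recall_score, mean_squared_error

# Classification metrics on hypothetical binary labels
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))

# Squared error on hypothetical regression outputs
y_true_reg = [3.0, 2.5, 4.0]
y_pred_reg = [2.8, 2.9, 3.5]
print("squared error:", mean_squared_error(y_true_reg, y_pred_reg))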