The k-NN Algorithm in Machine Learning, Completely Explained
Slide Content
k-Nearest Neighbor & Instance-based Learning
Instance-based learning Instance-based learning is often termed lazy learning, as there is typically no “transformation” of training instances into more general “statements”. Instance-based learning methods simply store the training examples; generalizing beyond these examples is postponed until a new instance must be classified. Each time a new query instance is encountered, its relationship to the previously stored examples is examined in order to assign a target function value for the new instance. These methods are called “lazy” because they delay processing until a new instance must be classified.
Instance-based learning Learning in these algorithms consists of simply storing the presented training data. When a new query instance is encountered, a set of similar related instances is retrieved from memory and used to classify the new query instance. One key difference between the instance-based approach and other learning methods is that instance-based approaches can construct a different approximation to the target function for each distinct query instance that must be classified. This has significant advantages when the target function is very complex but can still be described by a collection of less complex local approximations.
Instance-based learning One disadvantage of instance-based approaches is that the cost of classifying new instances can be high, because nearly all computation takes place at classification time rather than when the training examples are first encountered.
k-Nearest Neighbor Pros: high accuracy, insensitive to outliers, no assumptions about the data. Cons: computationally expensive, requires a lot of memory. Works with: numeric values, nominal values.
k-Nearest Neighbor Learning This algorithm assumes all instances correspond to points in the n-dimensional space $\mathbb{R}^n$. The nearest neighbors of an instance are defined in terms of the standard Euclidean distance. More precisely, let an arbitrary instance x be described by the feature vector $\langle a_1(x), a_2(x), \ldots, a_n(x) \rangle$, where $a_r(x)$ denotes the value of the r-th attribute of instance x. Then the distance between two instances $x_i$ and $x_j$ is defined to be $d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \big(a_r(x_i) - a_r(x_j)\big)^2}$.
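To make the formula concrete, here is a minimal Python sketch of this distance computation (standard library only; the function name and example vectors are my own illustration, not from the slides):

```python
import math

def euclidean_distance(x_i, x_j):
    """Standard Euclidean distance between two instances, each given as an
    equal-length sequence of numeric attribute values a_1..a_n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))

# Example: two instances described by three numeric attributes.
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0
```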
Similarity and Dissimilarity Similarity: a numerical measure of how alike two data objects are; higher when objects are more alike; often falls in the range [0,1]. Examples: cosine, Jaccard, Tanimoto. Dissimilarity: a numerical measure of how different two data objects are; lower when objects are more alike; the minimum dissimilarity is often 0, and the upper limit varies. Proximity refers to either a similarity or a dissimilarity. Src: “Introduction to Data Mining” by Vipin Kumar et al.
Distance Metric A distance d(p, q) between two points p and q is a dissimilarity measure if it satisfies: 1. Positive definiteness: d(p, q) ≥ 0 for all p and q, and d(p, q) = 0 only if p = q. 2. Symmetry: d(p, q) = d(q, p) for all p and q. 3. Triangle inequality: d(p, r) ≤ d(p, q) + d(q, r) for all points p, q, and r. Examples: Euclidean distance, Minkowski distance, Mahalanobis distance. Src: “Introduction to Data Mining” by Vipin Kumar et al.
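As a hedged illustration of how these example metrics relate, the Minkowski distance generalises both the Manhattan (p = 1) and Euclidean (p = 2) distances; the Mahalanobis distance additionally requires a covariance matrix and is not sketched here. The helper below is illustrative, not from the slides:

```python
def minkowski_distance(x, y, p=2):
    """Minkowski distance of order p between two numeric vectors.
    p=1 gives the Manhattan distance, p=2 gives the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

a, b = [0.0, 0.0], [3.0, 4.0]
print(minkowski_distance(a, b, p=1))  # 7.0 (Manhattan)
print(minkowski_distance(a, b, p=2))  # 5.0 (Euclidean)
```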
General approach to kNN Collect: any method. Prepare: numeric values are needed for a distance calculation; a structured data format is best. Analyze: any method. Train: does not apply to the kNN algorithm. Test: calculate the error rate. Use: the application needs to get some input data and output structured numeric values; it then runs the kNN algorithm on this input data, determines which class the input data should belong to, and takes some action on the calculated class.
k-Nearest Neighbor Learning
Instance-Based Learning Idea: similar examples have similar labels; classify new examples like similar training examples. Algorithm: given some new example x for which we need to predict its class y, find the most similar training examples and classify x “like” these most similar examples. Questions: How to determine similarity? How many similar training examples to consider? How to resolve inconsistencies among the training examples?
Summary: k-NN The simplest machine learning algorithm. No explicit training. A non-parametric method: no parameters to be optimised. The algorithm relies on a distance measure. Think: what could be the possible effect of the number of training samples used in k-NN?
k-NN Approach The simplest, most widely used instance-based learning algorithm is the k-NN algorithm. k-NN assumes that all instances are points in some n-dimensional space and defines neighbors in terms of a distance measure (usually Euclidean distance in $\mathbb{R}^n$). k is the number of neighbors considered.
1-Nearest Neighbor One of the simplest of all machine learning classifiers. Simple idea: label a new point the same as the closest known point (in the slide's illustration, the new point is labelled red).
Distance Metrics Different metrics can change the decision surface. Standard Euclidean distance metric: two-dimensional, $\mathrm{Dist}(a,b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$; multivariate, $\mathrm{Dist}(a,b) = \sqrt{\sum_i (a_i - b_i)^2}$. Example metrics compared in the slide's figure: $\mathrm{Dist}(a,b) = (a_1 - b_1)^2 + (a_2 - b_2)^2$ versus $\mathrm{Dist}(a,b) = (a_1 - b_1)^2 + (3a_2 - 3b_2)^2$. Adapted from “Instance-Based Learning” lecture slides by Andrew Moore, CMU.
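To show concretely how the second example metric (with the weight of 3 on the second attribute) can change which stored point is nearest, here is a small sketch; the query and points are made-up coordinates for illustration only:

```python
def dist(a, b, w2=1.0):
    """Squared distance with an optional weight on the second attribute,
    mirroring the two example metrics above (w2=1 vs. w2=3)."""
    return (a[0] - b[0]) ** 2 + (w2 * a[1] - w2 * b[1]) ** 2

query, p1, p2 = (0.0, 0.0), (2.0, 0.0), (0.0, 1.0)

# Unweighted metric: p2 is nearer to the query than p1.
print(dist(query, p1), dist(query, p2))                    # 4.0 1.0
# Weighting the second attribute by 3: now p1 is nearer.
print(dist(query, p1, w2=3.0), dist(query, p2, w2=3.0))    # 4.0 9.0
```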
k-Nearest Neighbor Generalizes 1-NN to smooth away noise in the labels. A new point is now assigned the most frequent label of its k nearest neighbors. In the slide's illustration: label it red when k = 3; label it blue when k = 7.
Selecting the Number of Neighbors Increase k: makes kNN less sensitive to noise. Decrease k: allows capturing finer structure of the space. Pick k not too large, but not too small (depends on the data).
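One common way to pick k in practice (not prescribed by the slides, just a hedged example) is to compare the validation error rate for several candidate values; a minimal sketch assuming scikit-learn is available, using its bundled Iris data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit a k-NN classifier for each candidate k and report the held-out error rate.
for k in (1, 3, 5, 7, 15):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"k={k:2d}  validation error={1.0 - clf.score(X_val, y_val):.3f}")
```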
Neighbour Number k Hyper-parameter: the neighbour number k. The process of determining the neighbour number k is called hyper-parameter selection or model selection. In binary classification, what can happen when k is set to an even number? We will talk more about hyper-parameter selection in the next lectures. Think: what could be the possible effect of the neighbour number?
Effect of Training Samples A small number of training samples: insufficient information. A large number of training samples: more information, but higher time and memory cost (distance calculation, sorting). Noisy training samples: overfitting.
Curse of Dimensionality Prediction accuracy can quickly degrade when the number of attributes grows. Irrelevant attributes easily “swamp” information from relevant attributes: when there are many irrelevant attributes, the similarity/distance measure becomes less reliable. Remedies: try to remove irrelevant attributes in a pre-processing step; weight attributes differently; increase k (but not too much).
k-Nearest Neighbor Requires 3 things: the feature space (training data); a distance metric to compute the distance between records; the value of k, the number of nearest neighbors to retrieve and take the majority class from. To classify an unknown record: compute the distance to the other training records; identify the k nearest neighbors; use the class labels of the nearest neighbors to determine the class label of the unknown record.
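A minimal from-scratch sketch of those three steps (compute distances, take the k nearest, majority-vote their labels); the helper name and toy data below are illustrative, not from the slides:

```python
import math
from collections import Counter

def knn_classify(query, training_data, k=3):
    """training_data: list of (feature_vector, class_label) pairs.
    Returns the majority class label among the k nearest neighbors."""
    # 1. Compute the distance from the query to every training record.
    scored = [(math.dist(query, features), label) for features, label in training_data]
    # 2. Identify the k nearest neighbors.
    neighbors = sorted(scored, key=lambda pair: pair[0])[:k]
    # 3. Majority vote over the neighbors' class labels.
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Illustrative toy data: two numeric attributes, two classes.
train = [([1.0, 1.0], "blue"), ([1.2, 0.8], "blue"),
         ([5.0, 5.0], "red"), ([4.8, 5.2], "red"), ([5.1, 4.9], "red")]
print(knn_classify([4.9, 5.0], train, k=3))  # red
```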
K-Nearest Neighbor Features All instances correspond to points in an n-dimensional Euclidean space. Classification is delayed until a new instance arrives. Classification is done by comparing the feature vectors of the different points. The target function may be discrete or real-valued.
Summary: Neighbour Number k Hyper-parameter: the neighbour number k. In binary classification, k should be an odd number (to avoid ties). Small k: we may model noise! Large k: the neighbourhood will include too many samples from other classes.
Example Consider the following database defining a concept:
Ex. No. | Size   | Colour | Shape  | Concept Satisfied
1       | medium | blue   | brick  | yes
2       | small  | red    | wedge  | no
3       | small  | red    | sphere | yes
4       | large  | red    | wedge  | no
5       | large  | green  | pillar | yes
6       | large  | red    | pillar | no
7       | large  | green  | pillar | yes
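Because these attributes are categorical, a matching-based (Hamming-style) distance, introduced later in these slides, fits this table; the sketch below classifies a hypothetical query instance (the query values are my own illustration and do not appear in the original slides):

```python
from collections import Counter

# The example database: (Size, Colour, Shape) -> Concept Satisfied
examples = [
    (("medium", "blue",  "brick"),  "yes"),
    (("small",  "red",   "wedge"),  "no"),
    (("small",  "red",   "sphere"), "yes"),
    (("large",  "red",   "wedge"),  "no"),
    (("large",  "green", "pillar"), "yes"),
    (("large",  "red",   "pillar"), "no"),
    (("large",  "green", "pillar"), "yes"),
]

def hamming(x1, x2):
    """Number of attribute positions where the two instances differ."""
    return sum(a != b for a, b in zip(x1, x2))

def knn_vote(query, data, k=3):
    neighbors = sorted(data, key=lambda row: hamming(query, row[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Hypothetical query instance, for illustration only.
print(knn_vote(("large", "red", "sphere"), examples, k=3))  # "no" (nearest: examples 3, 4, 6)
```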
1-Nearest Neighbor
3-Nearest Neighbor
Example We can now use the training set to classify an unknown case (Age = 48 and Loan = $142,000) using Euclidean distance. If K = 1, the nearest neighbor is the last case in the training set, with Default = Y. With K = 3, two of the three closest neighbors have Default = Y and one has Default = N, so the prediction for the unknown case is again Default = Y.
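The original training table is not reproduced in this text, so the records below are invented purely for illustration; the sketch only shows the mechanics of scoring the (Age = 48, Loan = $142,000) query with Euclidean distance and reading off the K = 1 and K = 3 decisions:

```python
import math

# Hypothetical (Age, Loan) records with Default labels, invented for
# illustration; NOT the training table from the original example.
train = [((25, 40000), "N"), ((35, 60000), "N"),
         ((45, 132000), "Y"), ((52, 150000), "Y")]

query = (48, 142000)
ranked = sorted(train, key=lambda row: math.dist(row[0], query))

print(ranked[0][1])                       # K=1: label of the single nearest record
votes = [label for _, label in ranked[:3]]
print(max(set(votes), key=votes.count))   # K=3: majority label among the 3 nearest
```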
In Class Exercise Tissue paper quality classification. The test instance is (3, 7).
In Class Exercise KNN solved example: predict the Sugar level of a diabetic patient given BMI and Age. Test example: BMI = 43.6, Age = 40, Sugar = ?
Hamming Distance Let's learn with the help of an example. Restaurant A sells burgers with optional flavours: Pepper, Ginger, and Chilly. Using the Hamming distance, show how majority voting would classify the test instance with respect to its k nearest neighbours. The query (test instance) is Pepper: False, Ginger: True, Chilly: True.
Hamming Distance
Hamming Distance The training examples contain three attributes: Pepper, Ginger, and Chilly. Each of these attributes takes either True or False as its value. Liked is the target, which also takes either True or False. In the k-nearest neighbors algorithm, we first calculate the distance between the new example and the training examples, and use this distance to find the k nearest neighbors among the training examples. To calculate a numeric distance the attribute values must be real numbers, but in our case the dataset contains categorical values. Hence we use the Hamming distance measure to find the distance between the new example and the training examples.
Hamming Distance Let x1 and x2 be the attribute values of two instances. Then, for the Hamming distance, if the categorical values are the same (matching), that is, x1 is the same as x2, the distance is 0; otherwise it is 1. For example, if the value of x1 is blue and x2 is also blue, then the distance between x1 and x2 is 0. If the value of x1 is blue and x2 is red, then the distance between x1 and x2 is 1.
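A minimal sketch of that rule in code, applied to the Pepper/Ginger/Chilly query from the earlier slide; the single training example shown is hypothetical, since the original table is not reproduced here:

```python
def hamming_distance(x1, x2):
    """Per-attribute rule: 0 if the categorical values match, 1 otherwise,
    summed over all attributes."""
    return sum(0 if a == b else 1 for a, b in zip(x1, x2))

# Query from the earlier slide: Pepper=False, Ginger=True, Chilly=True.
query = (False, True, True)
# A hypothetical training example (Pepper=True, Ginger=True, Chilly=False).
example = (True, True, False)
print(hamming_distance(query, example))  # 2 (Pepper and Chilly differ)
```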
KNN Example Similarity metric: number of matching attributes (k = 2). New examples: Example 1 (great, no, no, normal, no): most similar is training example 2 (1 mismatch, 4 matches), labelled yes; second most similar is training example 1 (2 mismatches, 3 matches), labelled yes; prediction: Yes. Example 2 (mediocre, yes, no, normal, no): most similar is training example 3 (1 mismatch, 4 matches), labelled no; second most similar is training example 1 (2 mismatches, 3 matches), labelled yes; prediction: Yes/No (tie).
Classification Performance Measures Classification accuracy can be unreliable when assessing unbalanced data. For example, if there are 95 samples from class A and only 5 from class B in the data set, a particular classifier might classify all the observations as class A. The confusion matrix and other classification measures provide more information on the classification results. A self-learning practice based on the next slide; you will practise it in lab ex2.
Classification Performance Measures A confusion matrix is a table with two rows and two columns that reports the number of false positives (FP), false negatives (FN), true positives (TP), and true negatives (TN). In multi-class classification, a confusion matrix can be computed for each class (e.g., a confusion matrix for the “cat” class). Other classification measures: Sensitivity (recall) = TP / (TP + FN); Specificity = TN / (TN + FP); Precision = TP / (TP + FP); F1-score = 2 · Precision · Recall / (Precision + Recall). Self-learning material: https://en.wikipedia.org/wiki/Confusion_matrix
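A small sketch computing those measures from the four confusion-matrix counts; the counts in the example call are made-up numbers for illustration:

```python
def classification_measures(tp, fp, fn, tn):
    """Standard measures derived from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, precision, f1

# Illustrative counts, e.g. for a "cat" vs. "not cat" classifier.
print(classification_measures(tp=40, fp=10, fn=5, tn=45))
```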
Summary In this chapter, we have learned about the simplest machine learning algorithm, k-NN: non-parametric, no training. We have also studied the behaviour of the k-NN algorithm, e.g., how and why its performance changes with varying numbers of neighbours and training samples. In the next chapter, we will talk about the simplest parametric model in machine learning: a linear model trained using the least squares approach. You are asked to use the k-NN classifier for face recognition and observe its behaviour in lab ex2.