K-Nearest Neighbor
CSE 4621 | Machine Learning | BScTE | Summer 22-23
Sabbir Ahmed
Assistant Professor, Dept of CSE, IUT
[email protected]
Nearest Neighbor Classifier
Training: memorize all data and labels.
Prediction: predict the label of the most similar training image.
Distance Metric to compare images
L1 distance: d₁(I₁, I₂) = Σₚ |I₁ᵖ − I₂ᵖ| (sum of absolute differences over all pixels p)
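A minimal sketch of this pixel-wise L1 comparison, assuming images are stored as NumPy arrays; the array names and sizes here are illustrative, not from the slides:

```python
import numpy as np

# Two hypothetical 32x32 RGB images.
img1 = np.random.randint(0, 256, size=(32, 32, 3))
img2 = np.random.randint(0, 256, size=(32, 32, 3))

# L1 distance: sum of absolute pixel-wise differences.
l1_distance = np.sum(np.abs(img1.astype(np.int64) - img2.astype(np.int64)))
print(l1_distance)
```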
Nearest Neighbor Classifier
Training: memorize the training data.
Prediction: for each test image, find the nearest training image and return its label.
Q: With N examples, how fast is training? A: O(1)
Q: With N examples, how fast is testing? A: O(N) per test example
Note that this is the opposite of what we usually want: training can be slow, but prediction should be fast.
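A minimal sketch of the nearest-neighbor classifier described above, assuming training images are flattened into rows of a 2-D NumPy array:

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # O(1) "training": just memorize all data and labels.
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        # O(N) per test example: compare against every training example.
        predictions = np.empty(X.shape[0], dtype=self.y_train.dtype)
        for i in range(X.shape[0]):
            # L1 (Manhattan) distance to every training image.
            distances = np.sum(np.abs(self.X_train - X[i]), axis=1)
            nearest = np.argmin(distances)           # index of the closest training image
            predictions[i] = self.y_train[nearest]   # copy its label
        return predictions
```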
What does this look like?
Nearest Neighbor Decision Boundaries
[Figure: nearest-neighbor decision regions in two dimensions, axes x₁ and x₂]
Points are training examples; colors give training labels.
Background colors give the category a test point would be assigned.
The decision boundary is the boundary between two classification regions.
Decision boundaries can be noisy and are affected by outliers.
How to smooth out decision boundaries? Use more neighbors!
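A minimal sketch of how such a decision-region picture can be produced, assuming synthetic 2-D data (the points below are random and only illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(30, 2))   # 30 training points in the unit square
y = rng.integers(0, 3, size=30)       # 3 arbitrary class labels

# Evaluate the 1-NN prediction on a dense grid to get the background colors.
xx, yy = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
grid = np.stack([xx.ravel(), yy.ravel()], axis=1)
nearest = np.argmin(((grid[:, None, :] - X[None, :, :]) ** 2).sum(axis=2), axis=1)
zz = y[nearest].reshape(xx.shape)

plt.contourf(xx, yy, zz, alpha=0.3)                   # background: predicted regions
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")    # points: training examples
plt.xlabel("x1"); plt.ylabel("x2")
plt.show()
```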
K-Nearest Neighbors
Instead of copying the label from the nearest neighbor, take a majority vote from the K closest points.
Using more neighbors helps smooth out rough decision boundaries.
Using more neighbors helps reduce the effect of outliers.
When K > 1 there can be ties between classes. Ties need to be broken somehow (e.g., by preferring the closer neighbor).
[Figure: decision regions for K = 1 vs. K = 3]
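A minimal sketch of K-nearest neighbors with majority voting, assuming the training data X_train (rows = examples) and labels y_train are NumPy arrays; knn_predict is an illustrative helper name, not from the slides:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # L2 (Euclidean) distance from the test point to every training point.
    distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    # Majority vote among their labels. Counter.most_common breaks exact ties
    # by insertion order, so a real implementation should break ties explicitly.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]
```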
K-Nearest Neighbors: Distance Metric
L1 (Manhattan) distance: d₁(I₁, I₂) = Σₚ |I₁ᵖ − I₂ᵖ|
L2 (Euclidean) distance: d₂(I₁, I₂) = √(Σₚ (I₁ᵖ − I₂ᵖ)²)
[Figure: K = 1 decision boundaries under L1 vs. L2 distance]
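A minimal sketch of the two metrics, assuming the inputs are arrays of the same shape (e.g., flattened images); the function names are illustrative:

```python
import numpy as np

def l1_distance(a, b):
    # Manhattan distance: sum of absolute element-wise differences.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b))

def l2_distance(a, b):
    # Euclidean distance: square root of the sum of squared differences.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))
```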
K-Nearest Neighbors: Web Demo
http://vision.stanford.edu/teaching/cs231n-demos/knn/
Interactively move points around and see the decision boundaries change.
Play with L1 vs. L2 metrics.
Play with changing the number of training points and the value of K.
Hyperparameters
What is the best value of K to use? What is the best distance metric to use?
These are examples of hyperparameters: choices about our learning algorithm that we don't learn from the training data; instead we set them at the start of the learning process.
They are very problem-dependent. In general we need to try them all and see what works best for our data / task.
Setting Hyperparameters
Idea #1: Choose hyperparameters that work best on the full dataset.
BAD: K = 1 always works perfectly on training data.
Idea #2: Split data into train and test; choose hyperparameters that work best on test data.
BAD: No idea how the algorithm will perform on new data.
Idea #3: Split data into train, validation, and test; choose hyperparameters on validation and evaluate on test.
Better!
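A minimal sketch of Idea #3: pick K on a validation split, then evaluate once on the held-out test split. knn_predict is the hypothetical helper sketched earlier; accuracy and choose_k are illustrative names:

```python
import numpy as np

def accuracy(X_train, y_train, X_eval, y_eval, k):
    # Fraction of evaluation points whose predicted label matches the true label.
    preds = np.array([knn_predict(X_train, y_train, x, k=k) for x in X_eval])
    return np.mean(preds == y_eval)

def choose_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5, 7, 9)):
    # Evaluate each candidate K on the validation set and keep the best one.
    scores = {k: accuracy(X_train, y_train, X_val, y_val, k) for k in candidates}
    return max(scores, key=scores.get)
```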
Setting Hyperparameters
Idea #4: Cross-Validation: split the data into folds, try each fold as validation, and average the results.
[Figure: the training data split into 5 folds, with each fold used as the validation set in turn; the test split is held out throughout]
Useful for small datasets, but (unfortunately) not used too frequently in deep learning.
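A minimal sketch of 5-fold cross-validation for a single value of K, assuming X and y are NumPy arrays and reusing the hypothetical accuracy helper sketched above:

```python
import numpy as np

def cross_validate_k(X, y, k, num_folds=5):
    fold_X = np.array_split(X, num_folds)
    fold_y = np.array_split(y, num_folds)
    scores = []
    for i in range(num_folds):
        # Fold i is the validation set; the remaining folds form the training set.
        X_val, y_val = fold_X[i], fold_y[i]
        X_train = np.concatenate(fold_X[:i] + fold_X[i + 1:])
        y_train = np.concatenate(fold_y[:i] + fold_y[i + 1:])
        scores.append(accuracy(X_train, y_train, X_val, y_val, k))
    return np.mean(scores)
```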
Setting Hyperparameters
Example of 5-fold cross-validation for the value of K. Each point is a single outcome; the line goes through the mean and the bars indicate the standard deviation.
(It seems that K ≈ 7 works best for this data.)
Mathematical explanation of K-Nearest Neighbour
KNN is a supervised learning algorithm, mostly used for classification, that labels a data point based on how its neighbours are classified. KNN stores all available cases and classifies new cases based on a similarity measure. K in KNN is a parameter that refers to the number of nearest neighbours to include in the majority voting process.
How do we choose K? A common heuristic is K = √n, where n is the total number of data points; if √n comes out even, add or subtract 1 to make it odd, which helps avoid ties (a short sketch follows below).
When to use KNN? Use KNN when the dataset is labelled, relatively noise-free, and small, because KNN is a "lazy learner": all of the work happens at prediction time.
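A minimal sketch of the √n heuristic described above; the function name is illustrative:

```python
import math

def choose_k_heuristic(n):
    k = round(math.sqrt(n))
    if k % 2 == 0:
        k += 1  # make K odd so that two-class majority votes cannot tie
    return k

print(choose_k_heuristic(25))  # 5
print(choose_k_heuristic(36))  # 7 (sqrt is 6, made odd)
```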
Let's understand the KNN algorithm with the help of an example.
[Table: training examples with each person's name, age, gender, and preferred sport (cricket or football)]
Here male is encoded with the numeric value 0 and female with 1. Let's find which class Angelina, whose age is 5, will lie in, using K = 3. We use the Euclidean distance d = √((x₂ − x₁)² + (y₂ − y₁)²) to find the distance between any two points.
So let's find the distance between Ajay and Angelina using the formula d = √((age₂ − age₁)² + (gender₂ − gender₁)²):
d = √((5 − 32)² + (1 − 0)²)
d = √(729 + 1)
d = √730 ≈ 27.02
Similarly, we find all the other distances one by one.
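A minimal sketch of this calculation, treating each person as an (age, gender) point; the values for Ajay and Angelina are taken from the worked example above:

```python
import math

def euclidean(p, q):
    # Euclidean distance between two 2-D points.
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

angelina = (5, 1)   # age 5, female (1)
ajay = (32, 0)      # age 32, male (0)
print(round(euclidean(angelina, ajay), 2))  # 27.02
```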
[Table: distances from Angelina to every other person]
Since the value of K for Angelina is 3, we take the three smallest distances: 9, 10, and 10.5, i.e., the people closest to Angelina are Zaira, Michael, and Smith.
Zaira: distance 9, cricket
Michael: distance 10, cricket
Smith: distance 10.5, football
So, according to the KNN algorithm, Angelina will be in the class of people who like cricket, since two of her three nearest neighbours like cricket. This is how the KNN algorithm works.
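A minimal sketch of this final majority vote among Angelina's three nearest neighbours, using the distances and sports listed above:

```python
from collections import Counter

neighbours = [("Zaira", 9.0, "cricket"),
              ("Michael", 10.0, "cricket"),
              ("Smith", 10.5, "football")]

# Count the sport of each of the 3 nearest neighbours and take the majority.
votes = Counter(sport for _, _, sport in neighbours)
print(votes.most_common(1)[0][0])  # cricket
```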