K-Nearest Neighbor
CSE 4621 | Machine Learning | BScTE | Summer 22-23
Sabbir Ahmed
Assistant Professor, Dept of CSE, IUT
[email protected]
Nearest Neighbor Classifier
Training: memorize all data and labels.
Prediction: predict the label of the most similar training image.
Distance Metric to compare images
L1 distance: d₁(I₁, I₂) = Σₚ |I₁ᵖ − I₂ᵖ| (sum of absolute differences over all pixels p)
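A minimal sketch of this pixel-wise L1 comparison, assuming images are stored as NumPy arrays; the array names and sizes here are illustrative, not from the slides:

```python
import numpy as np

# Two hypothetical 32x32 RGB images.
img1 = np.random.randint(0, 256, size=(32, 32, 3))
img2 = np.random.randint(0, 256, size=(32, 32, 3))

# L1 distance: sum of absolute pixel-wise differences.
l1_distance = np.sum(np.abs(img1.astype(np.int64) - img2.astype(np.int64)))
print(l1_distance)
```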
Nearest Neighbor Classifier
Training: memorize the training data.
Prediction: for each test image, find the nearest training image and return its label.
Q: With N examples, how fast is training? A: O(1)
Q: With N examples, how fast is testing? A: O(N) per test example
Note that this is the opposite of what we usually want: training can be slow, but prediction should be fast.
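A minimal sketch of the nearest-neighbor classifier described above, assuming training images are flattened into rows of a 2-D NumPy array:

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # O(1) "training": just memorize all data and labels.
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        # O(N) per test example: compare against every training example.
        predictions = np.empty(X.shape[0], dtype=self.y_train.dtype)
        for i in range(X.shape[0]):
            # L1 (Manhattan) distance to every training image.
            distances = np.sum(np.abs(self.X_train - X[i]), axis=1)
            nearest = np.argmin(distances)           # index of the closest training image
            predictions[i] = self.y_train[nearest]   # copy its label
        return predictions
```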
What does this look like?
Nearest Neighbor Decision Boundaries
[Figure: nearest-neighbor decision regions in two dimensions, axes x₁ and x₂]
Points are training examples; colors give training labels.
Background colors give the category a test point would be assigned.
The decision boundary is the boundary between two classification regions.
Decision boundaries can be noisy and are affected by outliers.
How to smooth out decision boundaries? Use more neighbors!
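A minimal sketch of how such a decision-region picture can be produced, assuming synthetic 2-D data (the points below are random and only illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(30, 2))   # 30 training points in the unit square
y = rng.integers(0, 3, size=30)       # 3 arbitrary class labels

# Evaluate the 1-NN prediction on a dense grid to get the background colors.
xx, yy = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
grid = np.stack([xx.ravel(), yy.ravel()], axis=1)
nearest = np.argmin(((grid[:, None, :] - X[None, :, :]) ** 2).sum(axis=2), axis=1)
zz = y[nearest].reshape(xx.shape)

plt.contourf(xx, yy, zz, alpha=0.3)                   # background: predicted regions
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")    # points: training examples
plt.xlabel("x1"); plt.ylabel("x2")
plt.show()
```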
K-Nearest Neighbors
Instead of copying the label from the nearest neighbor, take a majority vote from the K closest points.
Using more neighbors helps smooth out rough decision boundaries.
Using more neighbors helps reduce the effect of outliers.
When K > 1 there can be ties between classes. Ties need to be broken somehow (e.g., by preferring the closer neighbor).
[Figure: decision regions for K = 1 vs. K = 3]
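A minimal sketch of K-nearest neighbors with majority voting, assuming the training data X_train (rows = examples) and labels y_train are NumPy arrays; knn_predict is an illustrative helper name, not from the slides:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # L2 (Euclidean) distance from the test point to every training point.
    distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    # Majority vote among their labels. Counter.most_common breaks exact ties
    # by insertion order, so a real implementation should break ties explicitly.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]
```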
K-Nearest Neighbors: Distance Metric
L1 (Manhattan) distance: d₁(I₁, I₂) = Σₚ |I₁ᵖ − I₂ᵖ|
L2 (Euclidean) distance: d₂(I₁, I₂) = √(Σₚ (I₁ᵖ − I₂ᵖ)²)
[Figure: K = 1 decision boundaries under L1 vs. L2 distance]
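A minimal sketch of the two metrics, assuming the inputs are arrays of the same shape (e.g., flattened images); the function names are illustrative:

```python
import numpy as np

def l1_distance(a, b):
    # Manhattan distance: sum of absolute element-wise differences.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b))

def l2_distance(a, b):
    # Euclidean distance: square root of the sum of squared differences.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))
```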
K-Nearest Neighbors: Web Demo
http://vision.stanford.edu/teaching/cs231n-demos/knn/
Interactively move points around and see the decision boundaries change.
Play with L1 vs. L2 metrics.
Play with changing the number of training points and the value of K.
Hyperparameters
What is the best value of K to use? What is the best distance metric to use?
These are examples of hyperparameters: choices about our learning algorithm that we don't learn from the training data; instead we set them at the start of the learning process.
They are very problem-dependent. In general we need to try them all and see what works best for our data / task.
Setting Hyperparameters
Idea #1: Choose hyperparameters that work best on the full dataset.
BAD: K = 1 always works perfectly on training data.
Idea #2: Split data into train and test; choose hyperparameters that work best on test data.
BAD: No idea how the algorithm will perform on new data.
Idea #3: Split data into train, validation, and test; choose hyperparameters on validation and evaluate on test.
Better!
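A minimal sketch of Idea #3: pick K on a validation split, then evaluate once on the held-out test split. knn_predict is the hypothetical helper sketched earlier; accuracy and choose_k are illustrative names:

```python
import numpy as np

def accuracy(X_train, y_train, X_eval, y_eval, k):
    # Fraction of evaluation points whose predicted label matches the true label.
    preds = np.array([knn_predict(X_train, y_train, x, k=k) for x in X_eval])
    return np.mean(preds == y_eval)

def choose_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5, 7, 9)):
    # Evaluate each candidate K on the validation set and keep the best one.
    scores = {k: accuracy(X_train, y_train, X_val, y_val, k) for k in candidates}
    return max(scores, key=scores.get)
```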
Setting Hyperparameters
Idea #4: Cross-Validation: split the data into folds, try each fold as validation, and average the results.
[Figure: the training data split into 5 folds, with each fold used as the validation set in turn; the test split is held out throughout]
Useful for small datasets, but (unfortunately) not used too frequently in deep learning.
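A minimal sketch of 5-fold cross-validation for a single value of K, assuming X and y are NumPy arrays and reusing the hypothetical accuracy helper sketched above:

```python
import numpy as np

def cross_validate_k(X, y, k, num_folds=5):
    fold_X = np.array_split(X, num_folds)
    fold_y = np.array_split(y, num_folds)
    scores = []
    for i in range(num_folds):
        # Fold i is the validation set; the remaining folds form the training set.
        X_val, y_val = fold_X[i], fold_y[i]
        X_train = np.concatenate(fold_X[:i] + fold_X[i + 1:])
        y_train = np.concatenate(fold_y[:i] + fold_y[i + 1:])
        scores.append(accuracy(X_train, y_train, X_val, y_val, k))
    return np.mean(scores)
```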
Setting Hyperparameters
Example of 5-fold cross-validation for the value of K. Each point is a single outcome; the line goes through the mean and the bars indicate the standard deviation.
(It seems that K ≈ 7 works best for this data.)
Mathematical explanation of K-Nearest Neighbour
KNN is a supervised learning algorithm, mostly used for classification, that labels a data point based on how its neighbours are classified. KNN stores all available cases and classifies new cases based on a similarity measure. K in KNN is a parameter that refers to the number of nearest neighbours to include in the majority voting process.
How do we choose K? A common heuristic is K = √n, where n is the total number of data points; if √n comes out even, add or subtract 1 to make it odd, which helps avoid ties (a short sketch follows below).
When to use KNN? Use KNN when the dataset is labelled, relatively noise-free, and small, because KNN is a "lazy learner": all of the work happens at prediction time.
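A minimal sketch of the √n heuristic described above; the function name is illustrative:

```python
import math

def choose_k_heuristic(n):
    k = round(math.sqrt(n))
    if k % 2 == 0:
        k += 1  # make K odd so that two-class majority votes cannot tie
    return k

print(choose_k_heuristic(25))  # 5
print(choose_k_heuristic(36))  # 7 (sqrt is 6, made odd)
```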
Let's understand the KNN algorithm with the help of an example.
[Table: training examples with each person's name, age, gender, and preferred sport (cricket or football)]
Here male is encoded with the numeric value 0 and female with 1. Let's find which class Angelina, whose age is 5, will lie in, using K = 3. We use the Euclidean distance d = √((x₂ − x₁)² + (y₂ − y₁)²) to find the distance between any two points.
So let's find the distance between Ajay and Angelina using the formula d = √((age₂ − age₁)² + (gender₂ − gender₁)²):
d = √((5 − 32)² + (1 − 0)²)
d = √(729 + 1)
d = √730 ≈ 27.02
Similarly, we find all the other distances one by one.
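A minimal sketch of this calculation, treating each person as an (age, gender) point; the values for Ajay and Angelina are taken from the worked example above:

```python
import math

def euclidean(p, q):
    # Euclidean distance between two 2-D points.
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

angelina = (5, 1)   # age 5, female (1)
ajay = (32, 0)      # age 32, male (0)
print(round(euclidean(angelina, ajay), 2))  # 27.02
```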
[Table: distances from Angelina to every other person]
Since the value of K for Angelina is 3, we take the three smallest distances: 9, 10, and 10.5, i.e., the people closest to Angelina are Zaira, Michael, and Smith.
Zaira: distance 9, cricket
Michael: distance 10, cricket
Smith: distance 10.5, football
So, according to the KNN algorithm, Angelina will be in the class of people who like cricket, since two of her three nearest neighbours like cricket. This is how the KNN algorithm works.
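A minimal sketch of this final majority vote among Angelina's three nearest neighbours, using the distances and sports listed above:

```python
from collections import Counter

neighbours = [("Zaira", 9.0, "cricket"),
              ("Michael", 10.0, "cricket"),
              ("Smith", 10.5, "football")]

# Count the sport of each of the 3 nearest neighbours and take the majority.
votes = Counter(sport for _, _, sport in neighbours)
print(votes.most_common(1)[0][0])  # cricket
```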