k-nearest neighbour Machine Learning.pdf

SabbirAhmed346057 · 36 slides · Aug 25, 2024

About This Presentation

ML


Slide Content

K-Nearest Neighbor
CSE 4621 | Machine Learning | BScTE | Summer 22-23
Sabbir Ahmed
Assistant Professor, Dept. of CSE, IUT
[email protected]

Nearest Neighbor Classifier
Memorize all data and labels.
Predict the label of the most similar training image.

Distance Metric to compare images
L1 distance: d1(I1, I2) = Σp |I1^p − I2^p| (the sum of absolute pixel-wise differences)

Nearest Neighbor Classifier
• Memorize training data
• For each test image:
  • Find nearest training image
  • Return label of nearest image

Q: With N examples, how fast is training? A: O(1)
Q: With N examples, how fast is testing? A: O(N)
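A minimal NumPy sketch of this classifier, assuming L1 distance and flattened image rows; the class name and array shapes are illustrative, not the course's exact code:

```python
import numpy as np

class NearestNeighbor:
    """1-nearest-neighbour classifier sketch using L1 distance."""

    def train(self, X, y):
        # Training is O(1): just memorize all data and labels.
        self.Xtr = X              # shape (N, D): each row is one flattened image
        self.ytr = y              # shape (N,): training labels

    def predict(self, X):
        # Testing is O(N) per example: compare against every training image.
        y_pred = np.empty(X.shape[0], dtype=self.ytr.dtype)
        for i in range(X.shape[0]):
            dists = np.sum(np.abs(self.Xtr - X[i]), axis=1)   # L1 distance to all training images
            y_pred[i] = self.ytr[np.argmin(dists)]            # label of the nearest image
        return y_pred
```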

What does this look like?

Nearest Neighbor Decision Boundaries
Nearest neighbors in two dimensions (axes x0 and x1):
• Points are training examples; colors give training labels.
• Background colors give the category a test point would be assigned.
• The decision boundary is the boundary between two classification regions.
• Decision boundaries can be noisy; affected by outliers.
How to smooth out decision boundaries? Use more neighbors!
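A small sketch of how such decision-region pictures can be produced (the toy data below is made up, not the figures from these slides): classify every point of a grid with the 1-nearest-neighbour rule and colour the plane accordingly.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(30, 2))        # training points in the (x0, x1) plane
y = (X[:, 0] + X[:, 1] > 1).astype(int)    # toy training labels

# Grid of test points covering the plane
xx, yy = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

# 1-NN label for every grid point (L2 distance to each training point)
dists = np.linalg.norm(grid[:, None, :] - X[None, :, :], axis=2)
regions = y[dists.argmin(axis=1)].reshape(xx.shape)

plt.contourf(xx, yy, regions, alpha=0.3)   # background colours = classification regions
plt.scatter(X[:, 0], X[:, 1], c=y)         # points = training examples, colours = labels
plt.xlabel("x0"); plt.ylabel("x1")
plt.show()
```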

K-Nearest Neighbors
Instead of copying the label from the nearest neighbor, take a majority vote from the K closest points (the figures compare K = 1 with K = 3).
Using more neighbors helps smooth out rough decision boundaries and reduce the effect of outliers.
When K > 1 there can be ties between classes. Need to break them somehow!
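A minimal sketch of the K-nearest-neighbour vote; the helper name knn_predict, the use of L2 distance, and the tie-breaking rule are illustrative choices, not the only options:

```python
import numpy as np
from collections import Counter

def knn_predict(Xtr, ytr, x, k=3):
    """Predict the label of x by majority vote over its k nearest training points."""
    dists = np.linalg.norm(Xtr - x, axis=1)       # L2 distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the K closest points
    votes = Counter(ytr[nearest])                 # count labels among the K neighbours
    # Ties are possible when K > 1; because most_common() is stable, a tie here
    # falls to the class whose member appears closest. Other schemes work too.
    return votes.most_common(1)[0][0]
```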

K-Nearest Neighbors: Distance Metric
L1 (Manhattan) distance: d1(I1, I2) = Σp |I1^p − I2^p|
L2 (Euclidean) distance: d2(I1, I2) = √(Σp (I1^p − I2^p)²)
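A quick numerical comparison of the two metrics on a made-up pair of vectors:

```python
import numpy as np

a = np.array([1.0, 4.0, 2.0])
b = np.array([3.0, 1.0, 2.0])

l1 = np.sum(np.abs(a - b))          # Manhattan distance: |1-3| + |4-1| + |2-2| = 5.0
l2 = np.sqrt(np.sum((a - b) ** 2))  # Euclidean distance: sqrt(4 + 9 + 0) ≈ 3.61
print(l1, l2)
```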

K-Nearest Neighbors: Web Demo
http://vision.stanford.edu/teaching/cs231n-demos/knn/
• Interactively move points around and see decision boundaries change
• Play with L1 vs L2 metrics
• Play with changing the number of training points and the value of K

Hyperparameters
What is the best value of K to use? What is the best distance metric to use?
These are examples of hyperparameters: choices about our learning algorithm that we don't learn from the training data; instead we set them at the start of the learning process.
Very problem-dependent. In general we need to try them all and see what works best for our data/task.

Setting Hyperparameters
Idea #1: Choose hyperparameters that work best on the data.
BAD: K = 1 always works perfectly on training data.
Idea #2: Split data into train and test; choose hyperparameters that work best on test data.
BAD: No idea how the algorithm will perform on new data.
Idea #3: Split data into train, val, and test; choose hyperparameters on val and evaluate on test (Your Dataset → train | validation | test).
Better!
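A sketch of Idea #3 on toy data; the split ratios, candidate K values, and the knn_predict helper from the earlier sketch are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # toy features
y = (X[:, 0] > 0).astype(int)                  # toy labels

# 60 / 20 / 20 split into train, validation, and test
idx = rng.permutation(len(X))
tr, va, te = idx[:120], idx[120:160], idx[160:]

best_k, best_acc = None, 0.0
for k in [1, 3, 5, 7, 9, 11]:                  # candidate hyperparameter values
    preds = np.array([knn_predict(X[tr], y[tr], x, k) for x in X[va]])
    acc = np.mean(preds == y[va])              # accuracy on the validation set
    if acc > best_acc:
        best_k, best_acc = k, acc

# The test set is touched only once, after best_k has been chosen.
test_preds = np.array([knn_predict(X[tr], y[tr], x, best_k) for x in X[te]])
print(best_k, np.mean(test_preds == y[te]))
```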

Setting Hyperparameters
Idea #4: Cross-Validation: Split the data into folds (e.g. fold 1 ... fold 5, plus a held-out test set), try each fold as validation and average the results.
Useful for small datasets, but (unfortunately) not used too frequently in deep learning.
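A sketch of Idea #4 using scikit-learn (assuming scikit-learn is available; the toy data and candidate K values are illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 2))              # toy (non-test) data
y = (X[:, 0] + X[:, 1] > 0).astype(int)

for k in [1, 3, 5, 7, 9, 11]:
    # 5-fold cross-validation: each fold serves once as the validation set
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```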

Setting Hyperparameters
Example of 5-fold cross-validation for the value of k.
Each point: a single outcome. The line goes through the mean; bars indicate standard deviation.
(Seems that k ≈ 7 works best for this data.)

Mathematical explanation of K-Nearest Neighbour
• KNN is a supervised learning algorithm, mostly used to classify a data point based on how its neighbours are classified.
• KNN stores all available cases and classifies new cases based on a similarity measure.
• K in KNN is a parameter that refers to the number of nearest neighbours to include in the majority voting process.

• How do we choose K?
  • A common choice is K = sqrt(n), where n is the total number of data points (if the value comes out even, make it odd by adding or subtracting 1, which helps avoid ties); see the sketch below.
• When to use KNN?
  • We can use KNN when the dataset is labelled, noise-free, and small, because KNN is a "lazy learner". Let's understand the KNN algorithm with the help of an example.
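The sqrt(n) rule of thumb from this slide, written out in code (n is a hypothetical dataset size):

```python
import math

n = 100                    # hypothetical number of data points
k = int(math.sqrt(n))      # sqrt(100) = 10
if k % 2 == 0:             # an even K invites ties, so nudge it to an odd value
    k += 1                 # K = 11
```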

(Figure: table of the example dataset listing each person's name, age, gender, and preferred sport.)

• Here male is denoted with the numeric value 0 and female with 1.
• Let's find out to which class of people Angelina belongs, where the K factor is 3 and her age is 5.
• We use the distance formula d = √((x2 − x1)² + (y2 − y1)²) to find the distance between any two points.

• So let's find the distance between Ajay and Angelina using the formula:
  d = √((age2 − age1)² + (gender2 − gender1)²)
    = √((5 − 32)² + (1 − 0)²)
    = √(729 + 1) = √730
    ≈ 27.02
• Similarly, we find all the other distances one by one.
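The same Ajay–Angelina arithmetic checked in Python:

```python
import math

# Ajay: age 32, gender 0; Angelina: age 5, gender 1
d = math.sqrt((5 - 32) ** 2 + (1 - 0) ** 2)   # sqrt(729 + 1) = sqrt(730)
print(round(d, 2))                            # 27.02
```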

(Figure: table of the distances from Angelina to every other person in the dataset.)

• The K factor for Angelina is 3, and the three smallest distances are 9, 10, and 10.5, so the people closest to Angelina are Zaira, Michael, and Smith:

  Zaira    9     cricket
  Michael  10    cricket
  Smith    10.5  football

• So according to the KNN algorithm, Angelina will be in the class of people who like cricket (two of her three nearest neighbours like cricket). This is how the KNN algorithm works.

Reference
• https://www.geeksforgeeks.org/mathematical-explanation-of-k-nearest-neighbour/