K-Nearest Neighbor
CSE 4621 | Machine Learning | BScTE | Summer 22-23
Sabbir Ahmed
Assistant Professor, Dept of CSE, IUT [email protected]
Nearest Neighbor Classifier
• Memorize all data and labels
• Predict the label of the most similar training image
Distance Metric to compare images
L1 distance: d1(I1, I2) = Σp |I1p - I2p|, i.e. the sum of absolute pixel-wise differences.
• Memorize training data
• For each test image:
  • Find nearest training image
  • Return label of nearest image
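A minimal NumPy sketch of the classifier described above (the class name and array shapes are illustrative assumptions, not from the slides): training just memorizes the data, and prediction scans all N training images with the L1 distance.

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # "Memorize" all training data and labels:
        # X is an N x D array (one flattened image per row), y holds the N labels.
        # This is O(1) work: we only store references.
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        # For each test image, find the nearest training image under the
        # L1 distance and return its label. This is O(N) per test image.
        num_test = X.shape[0]
        y_pred = np.zeros(num_test, dtype=self.y_train.dtype)
        for i in range(num_test):
            # L1 distance: sum of absolute pixel-wise differences
            distances = np.sum(np.abs(self.X_train - X[i]), axis=1)
            nearest = np.argmin(distances)
            y_pred[i] = self.y_train[nearest]
        return y_pred
```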
Nearest Neighbor Classifier
Q: With N examples, how fast is training?
A: O(1)
Q: With N examples, how fast is testing?
A: O(N)
What does this look like?
Nearest Neighbor Decision Boundaries
[Figure: nearest neighbors in two dimensions, axes x0 and x1]
• Points are training examples; colors give training labels
• Background colors give the category a test point would be assigned
• Decision boundary is the boundary between two classification regions
• Decision boundaries can be noisy; affected by outliers
• How to smooth out decision boundaries? Use more neighbors!
K-Nearest Neighbors
[Figure: decision regions for K = 1 vs. K = 3]
• Instead of copying the label from the nearest neighbor, take a majority vote from the K closest points
• Using more neighbors helps smooth out rough decision boundaries
• Using more neighbors helps reduce the effect of outliers
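A short sketch of the majority-vote prediction for a single test point (the function name, the choice of L2 distance, and the use of NumPy arrays are assumptions for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Distances from the test point to every training point (L2 / Euclidean)
    distances = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    # Indices of the K closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote over their labels
    votes = Counter(np.asarray(y_train)[nearest])
    return votes.most_common(1)[0][0]
```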
K-Nearest Neighbors: Web Demo
http://vision.stanford.edu/teaching/cs231n-demos/knn/
• Interactively move points around and see decision boundaries change
• Play with L1 vs L2 metrics
• Play with changing number of training points, value of K
Hyperparameters
What is the best value of K to use?
What is the best distance metric to use?
These are examples of hyperparameters: choices about our learning algorithm that we don't learn from the training data; instead we set them at the start of the learning process.
Very problem-dependent. In general we need to try them all and see what works best for our data/task.
Setting Hyperparameters
Idea #1: Choose hyperparameters that work best on the data.
BAD: K = 1 always works perfectly on training data.
Idea #2: Split data into train and test; choose hyperparameters that work best on test data.
BAD: No idea how the algorithm will perform on new data.
Idea #3: Split data into train, val, and test; choose hyperparameters on val and evaluate on test. Better!
[Figure: Your Dataset, split as train | test for Idea #2 and train | validation | test for Idea #3]
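A possible sketch of Idea #3, reusing the knn_predict function from the earlier sketch (the split fractions, candidate K values, and the helper name choose_k are illustrative assumptions):

```python
import numpy as np

def choose_k(X, y, candidate_ks=(1, 3, 5, 7, 9)):
    # Shuffle, then split into 60% train, 20% validation, 20% test
    n = len(X)
    idx = np.random.permutation(n)
    train_idx = idx[: int(0.6 * n)]
    val_idx = idx[int(0.6 * n): int(0.8 * n)]
    test_idx = idx[int(0.8 * n):]

    # Choose K by accuracy on the validation split only
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        preds = np.array([knn_predict(X[train_idx], y[train_idx], x, k) for x in X[val_idx]])
        acc = np.mean(preds == y[val_idx])
        if acc > best_acc:
            best_k, best_acc = k, acc

    # Only after K is fixed do we touch the test split
    test_preds = np.array([knn_predict(X[train_idx], y[train_idx], x, best_k) for x in X[test_idx]])
    test_acc = np.mean(test_preds == y[test_idx])
    return best_k, test_acc
```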
Setting Hyperparameters
Idea #4: Cross-Validation: Split data into folds, try each fold as validation and average the results.
[Figure: Your Dataset split into fold 1 | fold 2 | fold 3 | fold 4 | fold 5 | test, with a different fold held out as validation in each run]
Useful for small datasets, but (unfortunately) not used too frequently in deep learning.
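A rough sketch of 5-fold cross-validation for choosing K, again reusing the knn_predict sketch above (the function name and candidate K values are assumptions):

```python
import numpy as np

def cross_validate_k(X, y, candidate_ks=(1, 3, 5, 7, 9, 11), num_folds=5):
    # Split the data (test set already held out) into folds
    folds_X = np.array_split(X, num_folds)
    folds_y = np.array_split(y, num_folds)

    mean_accs = {}
    for k in candidate_ks:
        accs = []
        for i in range(num_folds):
            # Fold i is validation; the remaining folds form the training set
            X_val, y_val = folds_X[i], folds_y[i]
            X_tr = np.concatenate([f for j, f in enumerate(folds_X) if j != i])
            y_tr = np.concatenate([f for j, f in enumerate(folds_y) if j != i])
            preds = np.array([knn_predict(X_tr, y_tr, x, k) for x in X_val])
            accs.append(np.mean(preds == y_val))
        mean_accs[k] = float(np.mean(accs))

    # Pick the K with the highest mean validation accuracy
    best_k = max(mean_accs, key=mean_accs.get)
    return best_k, mean_accs
```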
Setting Hyperparameters
Example of 5-fold cross-validation for the value of k.
Each point: a single outcome. The line goes through the mean; bars indicate the standard deviation.
(Seems that k ≈ 7 works best for this data.)
Mathematical Explanation of K-Nearest Neighbour
• KNN is a supervised learning algorithm, mostly used for classification; a data point is classified on the basis of how its neighbours are classified.
• KNN stores all available cases and classifies new cases based on a similarity measure.
• K in KNN is a parameter that refers to the number of nearest neighbours to include in the majority voting process.
• How do we choose K?
• A common rule of thumb is K ≈ sqrt(n), where n is the total number of data points (if n is even, make the value odd by adding or subtracting 1, which helps avoid ties in the vote); see the sketch below.
• When to use KNN?
• We can use KNN when the dataset is labelled, noise-free, and reasonably small, because KNN is a "lazy learner". Let's understand the KNN algorithm with the help of an example.
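A tiny illustration of the sqrt(n) rule of thumb mentioned above (the helper name is made up for this example):

```python
import math

def rule_of_thumb_k(n):
    # Start from sqrt(n) ...
    k = max(int(math.sqrt(n)), 1)
    # ... and nudge it to an odd value so majority votes are less likely to tie
    if k % 2 == 0:
        k += 1  # subtracting 1 would also work, as noted above
    return k

print(rule_of_thumb_k(100))  # sqrt(100) = 10 is even, so this prints 11
```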
• In the example dataset, male is denoted with the numeric value 0 and female with 1.
• Let's find which class of people Angelina, whose k factor is 3 and whose age is 5, will lie in.
• We use the Euclidean distance d = √((x2 - x1)² + (y2 - y1)²) to find the distance between any two points.
• So let's find out the distance between Ajay and Angelina using the formula
d = √((age2 - age1)² + (gender2 - gender1)²)
d = √((5 - 32)² + (1 - 0)²)
d = √(729 + 1) = √730
d ≈ 27.02
Similarly, we find all the other distances one by one.
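The same arithmetic as a quick check in Python:

```python
import math

# Ajay: age 32, gender 0 (male); Angelina: age 5, gender 1 (female)
age_ajay, gender_ajay = 32, 0
age_angelina, gender_angelina = 5, 1

d = math.sqrt((age_angelina - age_ajay) ** 2 + (gender_angelina - gender_ajay) ** 2)
print(round(d, 2))  # sqrt(729 + 1) = sqrt(730) ≈ 27.02
```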
• The k factor for Angelina is 3, so we look at the three smallest distances: 9, 10 and 10.5. The people closest to Angelina are therefore Zaira, Michael and Smith.
Name      Distance   Class
Zaira     9          cricket
Michael   10         cricket
Smith     10.5       football
• So, according to the KNN algorithm, Angelina will be in the class of people who like cricket (two of her three nearest neighbours like cricket). This is how the KNN algorithm works.