KNN (K-Nearest Neighbor Classifier)


Slide Content

KNN
Prof. N. Nalini
VIT, Vellore

Nearest Neighbor Classifiers
•Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck.

[Diagram: given the training records and a test record, compute the distance from the test record to each training record, then choose k of the "nearest" records.]

Definition of Nearest Neighbor
[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a record x]
•The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.

Nearest Neighbor Classifiers
•Instance-based / lazy learning.
•The k-NN classification rule assigns to a test sample the majority class label of its k nearest training samples.
•In practice, k is usually chosen to be odd so as to avoid ties (when the number of classes is 2).
•The k = 1 rule is generally called the nearest-neighbor classification rule.
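
A minimal sketch of this rule, assuming NumPy and a hypothetical toy dataset of (height, weight) records with size labels:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    # distance from the test record to every training record
    dists = np.linalg.norm(X_train - query, axis=1)
    # indices of the k "nearest" training records
    nearest = np.argsort(dists)[:k]
    # majority class label among those k records
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# hypothetical training records: (height cm, weight kg) -> size label
X_train = np.array([[170.0, 57.0], [160.0, 50.0], [158.0, 48.0], [175.0, 65.0], [168.0, 60.0]])
y_train = np.array(["Large", "Medium", "Medium", "Large", "Large"])
print(knn_predict(X_train, y_train, np.array([165.0, 55.0]), k=3))   # -> "Large"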

Nearest Neighbor: Voronoi Diagram
Properties:
1)All points within a sample's Voronoi cell have that sample as their nearest neighbor.
2)For any sample, the nearest neighboring sample is determined by the closest Voronoi cell edge.
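
A small illustrative sketch with hypothetical 2-D samples: labelling every location on a grid with the index of its nearest sample traces out exactly those Voronoi cells.

import numpy as np

# hypothetical samples; each grid location gets the index of its nearest sample,
# so equal labels mark out the Voronoi cell of that sample
samples = np.array([[1.0, 1.0], [4.0, 1.0], [2.5, 4.0]])
xs, ys = np.meshgrid(np.linspace(0, 5, 6), np.linspace(0, 5, 6))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)

dists = np.linalg.norm(grid[:, None, :] - samples[None, :, :], axis=2)
cell = dists.argmin(axis=1)            # nearest-sample index = Voronoi cell id
print(cell.reshape(6, 6))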

K-Nearest Neighbor: Distance Metric (Discrete-Valued Target)
•For a discrete-valued target, the prediction is the majority class among the k nearest neighbors:
  f̂(x_q) = argmax_{v ∈ V} Σ_{i=1}^{k} δ(v, f(x_i)),  where δ(a, b) = 1 if a = b and 0 otherwise.
•Worked tally for the query (height 157, weight 54) with k = 3, whose nearest neighbors carry the labels M, L, L:
  Large:  δ(L,M) + δ(L,L) + δ(L,L) = 0 + 1 + 1 = 2
  Medium: δ(M,M) + δ(M,L) + δ(M,L) = 1 + 0 + 0 = 1
  V = {Medium, Large}; the maximum vote is Large, so (157, 54) -> Large.

Exercise: Consider the above training dataset. Into which class will Rakshan fall, given k = 5, height 170 cm and weight 57 kg? Apply the k-NN classifier using Euclidean distance.
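
The tally above can be written directly in code; a sketch assuming, as in the slide, that the three nearest neighbors of (157, 54) carry the labels M, L, L:

# neighbour labels taken from the slide's tally for the query (157, 54), k = 3
neighbour_labels = ["M", "L", "L"]
delta = lambda v, label: 1 if v == label else 0
votes = {v: sum(delta(v, lab) for lab in neighbour_labels) for v in ["M", "L"]}
print(votes)                      # {'M': 1, 'L': 2}
print(max(votes, key=votes.get))  # 'L' -> Large, as on the slide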

K-Nearest Neighbor Regression: Predicting Continuous Values
•For a continuous-valued target, replace the majority vote
  f̂(x_q) = argmax_{v ∈ V} Σ_{i=1}^{k} δ(v, f(x_i))
 by the mean of the k nearest neighbors' target values:
  f̂(x_q) = ( Σ_{i=1}^{k} f(x_i) ) / k
•Example: for the query (157, 54) with k = 3, f̂ = (1.2 + 1.8 + 2.1) / 3 = 1.7, so (157, 54) -> 1.7.
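
A sketch of the regression variant, with hypothetical training data chosen so that the three nearest target values are 1.2, 1.8 and 2.1 and the prediction matches the slide:

import numpy as np

def knn_regress(X_train, y_train, query, k=3):
    # mean target value of the k nearest training records
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

# hypothetical data chosen so the three nearest targets are 1.2, 1.8 and 2.1
X_train = np.array([[158.0, 55.0], [156.0, 53.0], [160.0, 56.0], [175.0, 70.0]])
y_train = np.array([1.2, 1.8, 2.1, 3.5])
print(knn_regress(X_train, y_train, np.array([157.0, 54.0]), k=3))   # (1.2 + 1.8 + 2.1) / 3 = 1.7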

How to find the optimal value of K in KNN?
•There are no pre-defined statistical methods for finding the most favorable value of K.
•Initialize a random K value and start computing.
•Choosing a small value of K leads to unstable decision boundaries.
•A substantially larger K value is better for classification, as it smooths the decision boundaries.
•A commonly used heuristic is k = sqrt(N), where N is the total number of samples (or k = sqrt(N)/2).
•Plot the error rate against K over a defined range, then choose the K value with the minimum error rate.
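
One way to carry out that error-vs-K sweep, as a sketch assuming scikit-learn is available (the Iris dataset here is just a stand-in for any training set):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)        # stand-in dataset (N = 150 samples)
ks = range(1, 26)
errors = [1 - cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in ks]

print("sqrt(N) heuristic:", int(np.sqrt(len(X))))            # rule-of-thumb starting point
print("K with minimum error:", ks[int(np.argmin(errors))])   # chosen from the error-vs-K sweep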

Example – Hamming Distance

Record   Hamming distance   sqrt(d)   Nearest
1        1 + 0 + 0 = 1      1         -> R2
2        1 + 1 + 1 = 3      1.732
3        0 + 0 + 0 = 0      0         -> R1
4        0 + 0 + 1 = 1      1         -> R3
5        1 + 1 + 1 = 3      1.732

K = 3 -> majority class of the three nearest records: False
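
A sketch of the Hamming distance computation; the binary rows below are hypothetical, chosen so the distances reproduce the slide's 1, 3, 0, 1, 3:

# count mismatching attributes between the test record and each training record
test = (1, 0, 1)
train = [(1, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1), (0, 1, 0)]   # hypothetical records

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

for i, row in enumerate(train, start=1):
    d = hamming(test, row)
    print(f"record {i}: Hamming distance {d}, sqrt(d) = {d ** 0.5:.3f}")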

[Tables: training dataset and test record]

Bag-of-Words (BoW) Model for Document Classification
•The bag-of-words model (vector space model) is a way of representing text data when modeling text with machine learning algorithms.
–Simple to understand and to implement for problems in language modeling and document classification.
–A text (such as a sentence or a document) is represented as the bag of its words, disregarding grammar and even word order but keeping multiplicity.
–Commonly used in document classification, where the (frequency of) occurrence of each word is used as a feature for training a classifier.
•Uses:
–A vocabulary of known words
–A measure of the presence of known words
•Distance function: Distance(q, d_j) = sqrt( Σ_i (q[i] - d_j[i])^2 )
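
A minimal sketch of that distance function applied to two toy bag-of-words count vectors:

import math

def distance(q, d):
    # Distance(q, d_j) = sqrt( sum_i (q[i] - d_j[i])^2 )
    return math.sqrt(sum((qi - di) ** 2 for qi, di in zip(q, d)))

print(distance([1, 1, 1, 1], [1, 0, 1, 2]))   # sqrt(2) for these toy count vectors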

Example of Bag-of-Words Model
•Data collected:
•It was the best of times,
it was the worst of times,
it was the age of wisdom,
it was the age of foolishness
•Design the Vocabulary
–“it”
–“was”
–“the”
–“best”
–“of”
–“times”
–“worst”
–“age”
–“wisdom”
–“foolishness”

•Create Document Vectors
•Vocabulary: "it", "was", "the", "best", "of", "times", "worst", "age", "wisdom", "foolishness"
•Scores:
"it was the best of times"        -> [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
"it was the worst of times"       -> [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]
"it was the age of wisdom"        -> [1, 1, 1, 0, 1, 0, 0, 1, 1, 0]
"it was the age of foolishness"   -> [1, 1, 1, 0, 1, 0, 0, 1, 0, 1]
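
A short sketch that reproduces these document vectors with the slide's fixed vocabulary order:

# fixed vocabulary order from the slide
vocab = ["it", "was", "the", "best", "of", "times", "worst", "age", "wisdom", "foolishness"]
lines = ["it was the best of times",
         "it was the worst of times",
         "it was the age of wisdom",
         "it was the age of foolishness"]
for line in lines:
    words = set(line.split())
    print([1 if w in words else 0 for w in vocab])
# first line -> [1, 1, 1, 1, 1, 1, 0, 0, 0, 0], matching the slide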

K-NN Example - 3
•Email spam filtering models often use a bag-of-words representation for emails.
•In a bag-of-words representation, the descriptive features that describe a
document each represent how many times a particular word occurs in the
document.
•One descriptive feature is included for each word in a predefined dictionary. The
dictionary is typically defined as the complete set of words that occur in the
training dataset.
•Consider the 5 emails (data set) given below.
• “money, money, money”
• “free money for free gambling fun”
• “gambling for fun”
• “machine learning for fun, fun, fun”
• “free machine learning”

Distance function: Distance(q, d_j) = sqrt( Σ_i (q[i] - d_j[i])^2 )

K-NN Example – 3 (cont'd)
•The table below lists the bag-of-words representation of the five emails along with a target feature, SPAM, indicating whether each email is spam or genuine.
•What target level would a nearest neighbor model (1-NN) using Euclidean distance return for the following email: "machine learning for free"?
•What target level would a k-NN model with k = 3 using Euclidean distance return for the same query: "machine learning for free"?



K-NN Example – 3 (cont'd)
Euclidean distances from the query "machine learning for free" to the five training emails:
"money, money, money"                   sqrt(13)
"free money for free gambling fun"      sqrt(6)
"gambling for fun"                      sqrt(5)
"machine learning for fun, fun, fun"    sqrt(10)
"free machine learning"                 sqrt(1)
K = 3 -> spam (True);  K = 1 -> not spam (False)

Nearest Neighbour: Computational Complexity
•Expensive
–To determine the nearest neighbour of a query point q, the distance to all N training examples must be computed.
  +Pre-sort training examples into fast data structures (k-d trees)
  +Compute only an approximate distance (LSH)
  +Remove redundant data
•Storage requirements
–Must store all training data P
  +Remove redundant data (condensing)
  -Pre-sorting often increases the storage requirements
•High-dimensional data
–"Curse of dimensionality"
  •The required amount of training data increases exponentially with dimension
  •Computational cost also increases dramatically
  •Partitioning techniques degrade to linear search in high dimensions
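
A sketch of the pre-sorting idea, assuming SciPy is available: building a k-d tree once makes later nearest-neighbour queries much cheaper than scanning all N training examples.

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))     # N = 10,000 training points in 3 dimensions
tree = cKDTree(X_train)               # pre-sort once into a k-d tree

query = rng.random(3)
dists, idx = tree.query(query, k=5)   # 5 nearest neighbours without scanning all N points
print(idx, dists)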