Classification Algorithms: KNN and Prototype-based Classifiers.pptx

DrMTayyabChaudhry1 · 29 slides · Sep 13, 2024

About This Presentation

Classification algorithms for Machine Learning


Slide Content

A Popular Classifier: Challenges for Classification Algorithms; K-Nearest Neighbor (KNN). Slides credit: Dr. Zulfiqar Habib. Edited by: Dr. Allah Bux Sargana.

Image Classification: The Problem. Human vs. machine perception: images are represented as arrays of numbers in R^d, e.g., R^3 with integers between [0, 255], where d = 3 represents the 3 color channels (RGB). This is what the machine (computer) sees.
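To make this concrete, a minimal MATLAB sketch of inspecting an image as an array of numbers ('peppers.png' is assumed to be the demo image that ships with standard MATLAB installs; any image file works):

% An RGB image is just a height-by-width-by-3 array of integers in [0, 255].
img = imread('peppers.png');
size(img)           % e.g. 384 x 512 x 3: rows, columns, 3 color channels (RGB)
class(img)          % uint8: each entry is an integer in [0, 255]
img(1:3, 1:3, 1)    % a 3x3 patch of the red channel: what the machine sees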

Image Classification: Challenges. Viewpoint variation; illumination. (Image credit: Michelangelo, 1475-1564.)

Image Classification: Challenges. Scale.

Image Classification: Challenges. Deformation; occlusion.

Image Classification: Challenges. Background clutter; intra-class variation. (Image credit: Kilmeny Niland, 1995.)

An Image Classifier. Unlike, e.g., sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or any other class.

>> f = imread('rabbit.jpg');
>> predict(f)   % ????

An Image Classifier: Data-Driven Approach. Use machine learning to train an image classifier on a part of the annotated data, then evaluate the classifier on a withheld set of test images.

The Image Classification Pipeline. (Input) Dataset collection & labelling; (Learning) learning & training an image classifier; (Evaluation) testing the classifier on withheld images.

The Image Classification Pipeline Input:  Our input consists of a set of  N  images, each labelled with one of  K  different classes. We refer to this data as the  training set . Learning:  Our task is to use the training set to learn what every one of the classes looks like. We refer to this step as  training a classifier , or  learning a model . Evaluation:  Evaluate the quality of the classifier by asking it to predict labels for a new set of images that it has never seen before. We will then compare the true labels of these images to the ones predicted by the classifier. Intuitively, we're hoping that a lot of the predictions match up with the true answers (which we call the  ground truth ).
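A minimal sketch of these three steps on the Fisher iris data used later in these slides (the 30% holdout fraction is an illustrative choice, not from the slides):

% (Input) 150 labelled samples; (Learning) train a classifier on part of
% them; (Evaluation) predict the withheld labels and compare to ground truth.
load fisheriris
cv = cvpartition(species, 'HoldOut', 0.3);    % withhold 30% as a test set
xtr = meas(training(cv), :);  ytr = species(training(cv));
xte = meas(test(cv), :);      yte = species(test(cv));
mdl = ClassificationKNN.fit(xtr, ytr);        % train a 1-NN classifier
pred = predict(mdl, xte);                     % predict never-seen examples
accuracy = mean(strcmp(pred, yte))            % fraction matching ground truth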

The Machine Learning Framework: y = f(x), where y is the output, f the prediction function, and x the image feature. Training: given a training set of labelled examples {(x_1, y_1), …, (x_N, y_N)}, estimate the prediction function f by minimizing the prediction error on the training set. Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x).

Nearest Neighbor Classifier. Assign the label of the nearest training data point to each test data point. (Figure: partitioning of the feature space for two-category 2D and 3D data.)

K Nearest Neighbor (KNN). Distance measure: Euclidean, D(X_n, X_m) = sqrt(Σ_i (X_{n,i} − X_{m,i})²), where X_n and X_m are the n-th and m-th data points. The test sample (green dot) should be classified either to the blue squares or to the red triangles. If k = 3 (solid-line circle) it is assigned to the red triangles, because there are 2 triangles and only 1 square inside the inner circle. If k = 5 (dashed-line circle) it is assigned to the blue squares (3 squares vs. 2 triangles inside the outer circle).
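A from-scratch sketch of this rule (Euclidean distance plus majority vote); the function name and variables are illustrative, saved as knn_predict.m:

function label = knn_predict(Xtrain, ytrain, xtest, k)
% Classify one test point by a majority vote of its k nearest neighbors.
    diffs = Xtrain - xtest;                % implicit expansion (R2016b+)
    d = sqrt(sum(diffs.^2, 2));            % Euclidean distance to each row
    [~, idx] = sort(d);                    % order training points by distance
    nearest = ytrain(idx(1:k));            % labels of the k nearest points
    label = char(mode(categorical(nearest)));   % majority vote
end

For example, knn_predict(meas(:, 3:4), species, [5 1.55], 5) reproduces the 5-NN vote worked through below.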

KNN vs. K-Means Clustering. Q: What is the complexity of the 1-NN classifier w.r.t. a training set of N images and a test set of M images, at training time and at test time? KNN is a supervised classification algorithm that labels new data points according to the k closest labelled data points, while K-means clustering is an unsupervised algorithm that gathers and groups data into k clusters.

Example. Given: Fisher's Iris dataset (data size = 150). Species of Iris: Setosa, Versicolor, Virginica. The data set consists of 50 samples from each of the three species. Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

Example: Sample of the data set

No.   Sepal.length   Sepal.width   Petal.length   Petal.width   Species
1     5.1            3.5           1.4            0.2           setosa
2     4.9            3.0           1.4            0.2           setosa
3     4.7            3.2           1.3            0.2           setosa
4     4.6            3.1           1.5            0.2           setosa
…
76    6.6            3.0           4.4            1.4           versicolor
…
150   5.9            3.0           5.1            1.8           virginica

Example Task: classify a sample of 150 irises into the following 3 species: versicolor, virginica, and setosa. Number of given attributes: 4 (the length and width of the sepal and the length and width of the petal). In this example, only the last 2 attributes (petal length and petal width) are considered. Type of attribute to be predicted: discrete, with 3 classes.

Example: Code…

% Load the sample data, which includes Fisher's iris data: 4 measurements
% on a sample of 150 irises.
>> load fisheriris
>> whos
  Name         Size      Bytes   Class    Attributes
  meas         150x4      4800   double
  species      150x1     19300   cell
>> species
species =
    'setosa'
    'setosa'
    …
    'versicolor'
    'versicolor'
    …
    'virginica'
    'virginica'
    …
>> meas
meas =
    5.1000    3.5000    1.4000    0.2000
    4.9000    3.0000    1.4000    0.2000
    4.7000    3.2000    1.3000    0.2000
    4.6000    3.1000    1.5000    0.2000
    5.0000    3.6000    1.4000    0.2000
    …

Example: Code…

>> x = meas(:, 3:4);   % use data of last 2 columns for fitting
>> y = species;        % response data
>> mdl = ClassificationKNN.fit(x, y)   % 1-NN
mdl =
  ClassificationKNN:
    PredictorNames: {'x1'  'x2'}
      ResponseName: 'Y'
        ClassNames: {1x3 cell}
    ScoreTransform: 'none'
     NObservations: 150
          Distance: 'euclidean'
      NumNeighbors: 1

Example: Code…

% Predict the classification of an average flower
>> flwr = mean(x)   % an average flower
flwr =
    3.7580    1.1993
>> flwrClass = predict(mdl, flwr)
flwrClass =
    'versicolor'
>> gscatter(x(:, 1), x(:, 2), species)
>> set(legend, 'location', 'best')
>> line(flwr(1), flwr(2), 'marker', 'x', 'color', 'k', 'markersize', 10, 'linewidth', 2)
(Figure: scatter of petal length vs. petal width by species, with the average flower marked ×.)

Example: Code

% Predict another flower
>> flwr = [5 1.55];                 % given (petal length, petal width)
>> flwrClass = predict(mdl, flwr)   % prediction by 1-NN
flwrClass =
    'virginica'
>> md5 = ClassificationKNN.fit(x, y, 'NumNeighbors', 5);   % 5-NN
>> flwrClass = predict(md5, flwr)   % prediction by 5-NN
flwrClass =
    'versicolor'
Why different?

Example: Analysis. The 5 nearest neighbors of [5 1.55]:

Species       Length    Width     Distance
virginica     5.0000    1.5000    0.0500
versicolor    4.9000    1.5000    0.1118
versicolor    4.9000    1.5000    0.1118
versicolor    5.1000    1.6000    0.1118
virginica     5.1000    1.5000    0.1118

Value         Count    Percent
virginica     2        40.00%
versicolor    3        60.00%

The single nearest neighbor is virginica, so 1-NN predicts virginica; among the 5 nearest, versicolor wins the vote 3 to 2, so 5-NN predicts versicolor.

K Nearest Neighbor (KNN). Find the k nearest images and have them vote on the label. What is the best distance to use? What is the best value of k? That is, how do we set the hyperparameters? This is very problem-dependent: you must try them out and see what works best, as in the sketch below.
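One common heuristic, not from the slides, is to pick k by cross-validation; a sketch on the same iris features (the candidate range 1:2:15 is an arbitrary choice):

% Estimate the misclassification rate of each candidate k by 10-fold
% cross-validation and keep the best one.
load fisheriris
x = meas(:, 3:4);  y = species;
ks = 1:2:15;                               % candidate values of k
err = zeros(size(ks));
for i = 1:numel(ks)
    mdl = ClassificationKNN.fit(x, y, 'NumNeighbors', ks(i));
    err(i) = kfoldLoss(crossval(mdl, 'KFold', 10));   % CV error for this k
end
[~, best] = min(err);
fprintf('best k = %d\n', ks(best));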

Prototype-based classifier. (Slides 25-29: figures only.)
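Since only figures survive for this part, here is a minimal sketch of the simplest prototype-based classifier, the nearest-centroid rule (one prototype per class, its mean), on the same iris features; variable names are illustrative:

% Nearest-centroid classifier: each class is represented by one prototype
% (its mean); a test point takes the label of the nearest prototype.
load fisheriris
x = meas(:, 3:4);                          % petal length, petal width
[names, ~, yIdx] = unique(species);        % class names and integer labels
protos = zeros(numel(names), size(x, 2));
for k = 1:numel(names)
    protos(k, :) = mean(x(yIdx == k, :), 1);   % class mean = prototype
end
flwr = [5 1.55];                           % the same test flower as above
d = sqrt(sum((protos - flwr).^2, 2));      % distance to each prototype (R2016b+)
[~, nearest] = min(d);
predicted = names{nearest}                 % label of the nearest prototype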