K-means Clustering || Data Mining

By iffatfirozy, 15 slides, Mar 30, 2020

About This Presentation

K-means clustering is one of the most important and simplest clustering algorithms. Here, I share the k-means algorithm and its evaluation.


Slide Content

K-means Clustering: Algorithm, Evaluation Methods, and Graph

Hello! I am Iffat Firozy. I am here because I love to teach.

We are given a dataset of items, each with certain features and a value for each feature (like a vector). The task is to categorize the items into groups. To do this, we use the k-means algorithm, an unsupervised learning algorithm.
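
A minimal sketch of how such items might be represented, assuming plain Python tuples as feature vectors and Euclidean distance as the measure of closeness (the distance used in the worked problem later in the deck). The names `points` and `euclidean` are illustrative, not from the original slides.

```python
import math

# Each item is a feature vector: one numeric value per feature.
# Illustrative data: the six 2-D points from the worked problem.
points = {
    "A": (1, 2), "B": (2, 2), "C": (2, 1),
    "D": (-1, 4), "E": (-2, -1), "F": (-1, -1),
}

def euclidean(p, q):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Distance from item B to item D: sqrt(3^2 + 2^2) = sqrt(13)
print(round(euclidean(points["B"], points["D"]), 2))  # prints 3.61
```

The same function works unchanged for items with more than two features, since it sums over however many coordinates the tuples have.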

The algorithm in pseudocode:
1. Specify the number of clusters K.
2. Initialize the centroids by first shuffling the dataset and then randomly selecting K data points as the centroids, without replacement.
3. Keep iterating until the centroids no longer change, i.e., until the assignment of data points to clusters stops changing:
   a. Compute the squared distance between each data point and every centroid.
   b. Assign each data point to the closest cluster (centroid).
   c. Recompute each cluster's centroid as the average of all data points that belong to that cluster.
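
A minimal sketch of these steps in plain Python, assuming the data points are equal-length numeric tuples. The function name `kmeans`, the `max_iter` cap, the `seed` parameter, and the choice to keep a centroid unchanged if its cluster becomes empty are illustrative assumptions, not part of the original pseudocode. Squared distances are compared directly, because taking the square root does not change which centroid is closest.

```python
import random

def kmeans(data, k, max_iter=100, seed=None):
    """Lloyd-style k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of data[i].
    """
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    # After shuffling, the first k points are a random sample without replacement.
    centroids = [tuple(p) for p in shuffled[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign each point to the centroid with the smallest squared distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])))
            for x in data
        ]
        if new_labels == labels:  # no assignment changed: converged
            break
        labels = new_labels
        # Recompute each centroid as the mean of the points assigned to it.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # assumed policy: keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Usage on the six points from the worked problem; the result depends on the seed.
data = [(1, 2), (2, 2), (2, 1), (-1, 4), (-2, -1), (-1, -1)]
centroids, labels = kmeans(data, k=2, seed=0)
print(centroids, labels)
```

Because the starting centroids are random, different seeds can converge to different clusterings; running several seeds and keeping the best result is a common remedy.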

Flowchart of the k-means clustering algorithm (figure not reproduced).

LET'S SOLVE A PROBLEM

Problem on k-means clustering. Given the points A = (1, 2), B = (2, 2), C = (2, 1), D = (-1, 4), E = (-2, -1), F = (-1, -1):
a) Starting from the initial clusters Cluster1 = {A}, which contains only point A, and Cluster2 = {D}, which contains only point D, run the k-means clustering algorithm and report the final clusters.
b) Draw the points on a 2-D grid and check whether the clusters make sense.
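
Part a can be cross-checked with a short batch k-means run, sketched here in plain Python. It assumes the standard batch form, in which every point is reassigned and both centroids are recomputed in each round, which is not the same bookkeeping as a one-point-at-a-time walk-through. The helper names `sq_dist` and `kmeans_from` are illustrative.

```python
# Points from the problem statement.
points = {
    "A": (1, 2), "B": (2, 2), "C": (2, 1),
    "D": (-1, 4), "E": (-2, -1), "F": (-1, -1),
}

def sq_dist(p, q):
    """Squared Euclidean distance (enough for finding the nearest centroid)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_from(points, centroids, max_iter=100):
    """Batch k-means from given starting centroids; returns (clusters, centroids)."""
    centroids = list(centroids)
    assignment = {}
    for _ in range(max_iter):
        # Assign every point to its nearest current centroid.
        new_assignment = {
            name: min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for name, p in points.items()
        }
        if new_assignment == assignment:  # nothing moved: converged
            break
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [points[n] for n, lab in assignment.items() if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    clusters = [sorted(n for n, lab in assignment.items() if lab == j)
                for j in range(len(centroids))]
    return clusters, centroids

clusters, centroids = kmeans_from(points, [points["A"], points["D"]])
print(clusters)   # [['A', 'B', 'C', 'E', 'F'], ['D']]
print(centroids)  # [(0.4, 0.6), (-1.0, 4.0)]
```

Starting from A and D, the first round already puts A, B, C, E and F in Cluster1, and the second round confirms that no point moves.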

Initially:
Points: A = (1, 2), B = (2, 2), C = (2, 1), D = (-1, 4), E = (-2, -1), F = (-1, -1)
Cluster K1: centroid (1, 2), assignment 1 (point A)
Cluster K2: centroid (-1, 4), assignment 2 (point D)
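
The two formulas used at every step of the walk-through, written out as a worked derivation: the Euclidean distance between a point and a centroid in two dimensions, and the mean of a cluster's members rewritten as an update of the previous mean when one point $x_n$ joins a cluster of $n-1$ points. The notation ($\mu_n$, $x_i$, $c_x$, $c_y$) is introduced here for illustration.

```latex
d\big((x, y), (c_x, c_y)\big) = \sqrt{(x - c_x)^2 + (y - c_y)^2}

\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i
      = \frac{(n-1)\,\mu_{n-1} + x_n}{n}
      = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}
```

For example, when C = (2, 1) joins {A, B}, whose mean is (1.5, 2), the update gives (1.5, 2) + ((2, 1) - (1.5, 2))/3 ≈ (1.67, 1.67). Averaging the old centroid and the new point with equal weight, $(\mu_{n-1} + x_n)/2$, gives the true mean only when n = 2.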

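For point B = (2, 2), Euclidean distance to each centroid:
$d(B, K1) = \sqrt{(2-1)^2 + (2-2)^2} = 1$
$d(B, K2) = \sqrt{(2+1)^2 + (2-4)^2} = \sqrt{13} \approx 3.61$
B is closer to K1. New K1 centroid = mean of A and B = ((1+2)/2, (2+2)/2) = (1.5, 2). K2 stays at (-1, 4).

For point C = (2, 1):
$d(C, K1) = \sqrt{(2-1.5)^2 + (1-2)^2} = \sqrt{1.25} \approx 1.12$
$d(C, K2) = \sqrt{(2+1)^2 + (1-4)^2} = \sqrt{18} \approx 4.24$
C is closer to K1. New K1 centroid = mean of A, B and C = ((1+2+2)/3, (2+2+1)/3) ≈ (1.67, 1.67).

For point E = (-2, -1):
$d(E, K1) = \sqrt{(-2-\tfrac{5}{3})^2 + (-1-\tfrac{5}{3})^2} = \sqrt{185/9} \approx 4.53$
$d(E, K2) = \sqrt{(-2+1)^2 + (-1-4)^2} = \sqrt{26} \approx 5.10$
E is closer to K1. New K1 centroid = mean of A, B, C and E = ((1+2+2-2)/4, (2+2+1-1)/4) = (0.75, 1).

For point F = (-1, -1):
$d(F, K1) = \sqrt{(-1-0.75)^2 + (-1-1)^2} = \sqrt{7.0625} \approx 2.66$
$d(F, K2) = \sqrt{(-1+1)^2 + (-1-4)^2} = 5$
F is closer to K1. New K1 centroid = mean of A, B, C, E and F = ((1+2+2-2-1)/5, (2+2+1-1-1)/5) = (0.4, 0.6).

Final clustering & assignments:
A (1, 2): cluster 1
B (2, 2): cluster 1
C (2, 1): cluster 1
D (-1, 4): cluster 2
E (-2, -1): cluster 1
F (-1, -1): cluster 1
Final centroids: K1 = (0.4, 0.6), K2 = (-1, 4). Repeating the assignment step with these centroids moves no point, so the algorithm has converged with Cluster1 = {A, B, C, E, F} and Cluster2 = {D}.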
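
A minimal sketch of the one-point-at-a-time procedure in plain Python, assuming each centroid is kept as the exact mean of its members through the running-mean update `c + (x - c) / n`, followed by a second pass that checks whether any point would switch clusters. Variable names are illustrative; `math.dist` requires Python 3.8 or later.

```python
import math

points = [("A", (1, 2)), ("B", (2, 2)), ("C", (2, 1)),
          ("D", (-1, 4)), ("E", (-2, -1)), ("F", (-1, -1))]

# Start: cluster 0 = {A}, cluster 1 = {D}; each centroid is its single member.
centroids = [(1.0, 2.0), (-1.0, 4.0)]
counts = [1, 1]
labels = {"A": 0, "D": 1}

for name, p in points:
    if name in labels:  # A and D seed the two clusters
        continue
    d = [math.dist(p, c) for c in centroids]
    j = d.index(min(d))  # index of the nearest centroid
    labels[name] = j
    counts[j] += 1
    # Running mean: new centroid = old centroid + (point - old centroid) / cluster size
    centroids[j] = tuple(c + (x - c) / counts[j] for c, x in zip(centroids[j], p))
    print(f"{name}: d = {d[0]:.2f}, {d[1]:.2f} -> K{j + 1}, centroid = "
          f"({centroids[j][0]:.2f}, {centroids[j][1]:.2f})")

# Second pass: would any point now be closer to the other centroid?
changed = [n for n, p in points
           if min(range(2), key=lambda j: math.dist(p, centroids[j])) != labels[n]]
print("reassignments on second pass:", changed)
```

With these six points the sketch reports K1 centroids of (1.50, 2.00), (1.67, 1.67), (0.75, 1.00) and (0.40, 0.60) after B, C, E and F join, and an empty list of reassignments, which is the same final clustering a batch run reaches.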

2-D graph (figure not reproduced): the points before clustering and after clustering.
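
For part b, a text-only stand-in for the graph can be sketched in plain Python: the points drawn on an integer grid, first by name and then by cluster number. As one way to "check if the clusters make sense", the sketch also computes the within-cluster sum of squared distances (SSE), the quantity k-means tries to reduce. The helper names `draw` and `sse` are illustrative, and the second grouping, {A, B, C, D} and {E, F}, is a hypothetical alternative chosen only for comparison; it is not part of the original slides.

```python
points = {"A": (1, 2), "B": (2, 2), "C": (2, 1),
          "D": (-1, 4), "E": (-2, -1), "F": (-1, -1)}

def draw(points, labels=None):
    """Return an ASCII grid; each point shows its name or its cluster label."""
    xs = [x for x, _ in points.values()]
    ys = [y for _, y in points.values()]
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):  # top row = largest y
        cells = []
        for x in range(min(xs), max(xs) + 1):
            hit = [n for n, p in points.items() if p == (x, y)]
            if hit:
                cells.append(str(labels[hit[0]]) if labels else hit[0])
            else:
                cells.append(".")
        rows.append(f"{y:>3} " + " ".join(cells))
    return "\n".join(rows)

def sse(points, groups):
    """Within-cluster sum of squared distances to each group's mean."""
    total = 0.0
    for group in groups:
        members = [points[n] for n in group]
        mean = [sum(c) / len(members) for c in zip(*members)]
        total += sum((a - m) ** 2 for p in members for a, m in zip(p, mean))
    return total

found = [["A", "B", "C", "E", "F"], ["D"]]
labels = {n: i + 1 for i, g in enumerate(found) for n in g}
print(draw(points))          # before clustering: point names
print(draw(points, labels))  # after clustering: cluster numbers
print(round(sse(points, found), 2))                               # 22.4
print(round(sse(points, [["A", "B", "C", "D"], ["E", "F"]]), 2))  # 11.25
```

The alternative grouping has a lower SSE, so the clustering reached from the A/D start is not the lowest-SSE two-cluster split; k-means only guarantees a local optimum that depends on the initial centroids.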

Thanks! Any questions? You can find me at: [email protected]