Clustering
Clustering: the process of partitioning a set of data objects into subsets (clusters) so that objects in a cluster are similar to one another, yet dissimilar to objects in other clusters.
Considered unsupervised learning: no predefined classes (learning by observation vs. learning by examples). A form of descriptive data mining.
Clustering

OBJECT   X   Y
A        1   1
B        2   1
C        4   3
D        5   4
Types of Clustering:
- Partitioning approach: construct various partitions and then evaluate them by some criterion (e.g., minimize the sum of squared errors).
- Hierarchical approach: create a hierarchical decomposition of the set of data using some criterion.
Partitioning approach
Partitioning methods: partition a dataset D of n objects into a set of k clusters.
A centroid-based partitioning technique uses the centroid of a cluster, Ci, to represent that cluster. The centroid can be defined in various ways, such as by the mean or medoid of the objects (or points).
What is K-Means Clustering?
An algorithm to group your objects, based on their attributes/features, into K groups, where K is a positive integer.
The cluster representative can be:
- mean / centroid (the average of the data points)
- median / medoid (the actual data point closest to the mean)
Distance Functions
- Euclidean distance: d(p, q) = sqrt(sum_i (p_i - q_i)^2)
- Manhattan distance: d(p, q) = sum_i |p_i - q_i|
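The two distance measures can be sketched in Python (a minimal illustration; the function names are my own):

```python
import math

def euclidean(p, q):
    # Euclidean distance: square root of the sum of squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # Manhattan (city-block) distance: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(p, q))
```

For example, euclidean((1, 1), (4, 3)) gives about 3.61 and manhattan((3, 2), (1, 1)) gives 3, matching the worked computations in these notes.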
Ex: Find the centroid and medoid of a cluster containing the three two-dimensional points (1,1), (2,3), and (6,2).
Centroid (mean) = ((1+2+6)/3, (1+3+2)/3) = (3, 2)
To find the medoid, find the point closest to the mean (using Manhattan distance):
- For (1,1): |3-1| + |2-1| = 3
- For (2,3): |3-2| + |2-3| = 2  (closest point)
- For (6,2): |3-6| + |2-2| = 3
Medoid = (2,3)
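The centroid and medoid computation above can be checked with a short Python sketch (variable names are illustrative):

```python
points = [(1, 1), (2, 3), (6, 2)]

# Centroid: coordinate-wise mean of the points -> (3.0, 2.0)
centroid = tuple(sum(coord) / len(points) for coord in zip(*points))

# Medoid: the actual data point with the smallest Manhattan distance
# to the centroid -> (2, 3), whose distance to the mean is |3-2| + |2-3| = 2
medoid = min(points, key=lambda p: sum(abs(a - b) for a, b in zip(p, centroid)))
```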
Partitioning approach
The grouping is done by minimizing the sum of squared distances between the data points and the corresponding cluster centroids. The quality of cluster Ci can be measured by the within-cluster variation, which is the sum of squared errors between all objects in Ci and the centroid ci, defined as
E = sum over p in Ci of dist(p, ci)^2
Main steps for K-means:
1. Choose K initial centroids.
2. Assign each object to the cluster with the nearest centroid.
3. Recompute each centroid as the mean of the objects assigned to its cluster.
4. Repeat steps 2-3 until the centroids no longer change.
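These steps can be sketched as a plain Python implementation (a minimal version that assumes no cluster ever becomes empty; the names are illustrative):

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Assign points to the nearest centroid, recompute centroids
    as cluster means, and repeat until the centroids stop changing."""
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean)
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: new centroid = coordinate-wise mean of each cluster
        new_centroids = [tuple(sum(c) / len(cl) for c in zip(*cl))
                         for cl in clusters]
        if new_centroids == centroids:  # centroids unchanged -> converged
            break
        centroids = new_centroids
    return clusters, centroids
```

Running it on the four objects of the next example, kmeans([(1, 1), (2, 1), (4, 3), (5, 4)], [(1, 1), (2, 1)]), converges to the clusters {A, B} and {C, D} with centroids (1.5, 1.0) and (4.5, 3.5), matching the iterations worked out below.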
Example: Suppose we have 4 objects as training data points, and each object has 2 attributes. Each attribute represents a coordinate of the object.

OBJECT   X   Y
A        1   1
B        2   1
C        4   3
D        5   4
The first step is to determine the number of clusters K. Let K = 2, with initial centroids c1 = (1,1) and c2 = (2,1).
Calculate the distance between each object and the centroids c1 = (1,1) and c2 = (2,1):

OBJECT    d to c1=(1,1)   d to c2=(2,1)   Nearest
A (1,1)   0               1               c1
B (2,1)   1               0               c2
C (4,3)   3.61            2.83            c2
D (5,4)   5               4.24            c2
New centroids: c1 = (1,1) (cluster {A}); c2 = ((2+4+5)/3, (1+3+4)/3) = (3.67, 2.67) (cluster {B, C, D}).
Calculate the distance between each object and the new centroids c1 = (1,1) and c2 = (3.67, 2.67):

OBJECT    d to c1=(1,1)   d to c2=(3.67,2.67)   Nearest
A (1,1)   0               3.14                  c1
B (2,1)   1               2.36                  c1
C (4,3)   3.61            0.47                  c2
D (5,4)   5               1.89                  c2
New centroids: c1 = ((1+2)/2, (1+1)/2) = (1.5, 1); c2 = ((4+5)/2, (3+4)/2) = (4.5, 3.5).
Calculate the distance between each object and the new centroids c1 = (1.5, 1) and c2 = (4.5, 3.5):

OBJECT    d to c1=(1.5,1)   d to c2=(4.5,3.5)   Nearest
A (1,1)   0.5               4.30                c1
B (2,1)   0.5               3.54                c1
C (4,3)   3.20              0.71                c2
D (5,4)   4.61              0.71                c2
New centroids: c1 = (1.5, 1); c2 = (4.5, 3.5). The centroids have not changed, so stop. Final clusters: {A, B} and {C, D}.
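That the final centroids are a fixed point can be verified with a small Python check (using the coordinates from the tables above):

```python
import math

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
c1, c2 = (1.5, 1.0), (4.5, 3.5)  # final centroids from the last iteration

# Assign each object to the nearer of the two centroids
cluster1 = [n for n, p in points.items() if math.dist(p, c1) <= math.dist(p, c2)]
cluster2 = [n for n, p in points.items() if math.dist(p, c1) > math.dist(p, c2)]

def mean(pts):
    # Coordinate-wise mean of a list of points
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))

# Recomputing the means reproduces the same centroids, so k-means stops here
assert mean([points[n] for n in cluster1]) == c1
assert mean([points[n] for n in cluster2]) == c2
```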
EX: The following is a set of one-dimensional points: {6, 12, 18, 24, 30, 42, 48}. For each of the following sets of initial centroids, create two clusters by assigning each point to the nearest centroid, and then calculate the total squared error. Show both the clusters and the total squared error for each set of centroids.
(a) {18, 45}
(b) {15, 40}
Sol (a): First round of k-means.
Cluster assignment: {6, 12, 18, 24, 30} (nearest to 18) and {42, 48} (nearest to 45).
Recompute means: (6+12+18+24+30)/5 = 18 and (42+48)/2 = 45.
The new centroids are the same as the previous centroids, so stop. The final clusters are {6, 12, 18, 24, 30} and {42, 48}.
SSE1 = (6-18)^2 + (12-18)^2 + (18-18)^2 + (24-18)^2 + (30-18)^2 = 144 + 36 + 0 + 36 + 144 = 360
SSE2 = (42-45)^2 + (48-45)^2 = 9 + 9 = 18
Total squared error = 360 + 18 = 378
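The same computation can be scripted for both sets of initial centroids (a one-round sketch; since each set happens to be a fixed point here, one round suffices):

```python
points = [6, 12, 18, 24, 30, 42, 48]

def one_round(centroids):
    # Assign each point to its nearest centroid, then recompute cluster means
    clusters = {c: [] for c in centroids}
    for p in points:
        nearest = min(centroids, key=lambda c: abs(p - c))
        clusters[nearest].append(p)
    groups = list(clusters.values())
    means = [sum(g) / len(g) for g in groups]
    # Total squared error: squared distance of every point to its cluster mean
    sse = sum((p - m) ** 2 for g, m in zip(groups, means) for p in g)
    return groups, means, sse
```

For the initial centroids {18, 45} this reproduces the clusters and total squared error of 378 derived above; part (b) with {15, 40} can be checked the same way.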
True or false:
1. K-means is a hierarchical clustering method.
2. In k-means clustering, the number of clusters produced is not known.
3. A partitional clustering is a division of data objects into overlapping clusters.
4. K-means results in optimal data clustering.
5. A centroid must be an actual data object.