K-means clustering using machine learning

ssemwogerere_rajab, Oct 13, 2025

About This Presentation

K-means clustering


Slide Content

MACHINE LEARNING
Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed.

Types of machine learning
- K-Means Clustering
- Gaussian Mixture Models
- Dirichlet Process

K-Means Clustering
Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset share some common trait.

K-means Clustering: Types of Clustering
- Hierarchical
- Partitional
- Density-based clustering
- Fuzzy logic clustering

K-means Clustering
A clustering algorithm in which the K clusters are based on the closeness of data points to a reference point (the centroid of a cluster). It clusters n objects, based on their attributes, into k partitions, where k < n.

Terminology
Centroid: a reference point of a given cluster. Centroids are used to label new data, i.e. to determine which cluster a new data point belongs to.

K-means Clustering: How it works
The algorithm performs two major steps: data assignment and a centroid update. The data assignment step involves:
- Selection of centroids, via random selection, random generation, or k-means++ (see the initialization sketch below)
- Assigning data points to centroids based on a distance measure: Euclidean distance, Manhattan distance, Hamming distance, or an inner product space
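A minimal sketch (not from the slides) of two of the initialization strategies listed above, assuming 1-D NumPy data; the names random_selection_init and kmeans_pp_init are illustrative:

import numpy as np

def random_selection_init(data, k, rng):
    # Random selection: use k distinct existing data points as centroids.
    return rng.choice(data, size=k, replace=False)

def kmeans_pp_init(data, k, rng):
    # k-means++: pick the first centroid at random, then pick each further
    # centroid with probability proportional to its squared distance from
    # the nearest centroid chosen so far.
    centroids = [rng.choice(data)]
    for _ in range(k - 1):
        d2 = np.min((data[:, None] - np.array(centroids)[None, :]) ** 2, axis=1)
        centroids.append(rng.choice(data, p=d2 / d2.sum()))
    return np.array(centroids)

rng = np.random.default_rng(seed=0)
data = np.arange(1, 11, dtype=float)  # the 1..10 example used below
print(random_selection_init(data, 2, rng))
print(kmeans_pp_init(data, 2, rng))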

Distance measures
Manhattan distance: d(x, y) = Σ_i |x_i - y_i|
Euclidean distance: d(x, y) = √( Σ_i (x_i - y_i)² )
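A direct transcription of the two formulas into NumPy (a sketch, not part of the slides):

import numpy as np

def manhattan(x, y):
    # Sum of absolute coordinate differences.
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)))

def euclidean(x, y):
    # Square root of the sum of squared coordinate differences.
    return np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

print(manhattan([1, 2], [4, 6]))  # 3 + 4 = 7
print(euclidean([1, 2], [4, 6]))  # sqrt(9 + 16) = 5.0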

Illustration
Objective: to create 2 clusters from a set of numbers (K = 2, n = 10).

Data set: 1 2 3 4 5 6 7 8 9 10

Centroid Initialization: the initial centroids are 2 and 8.

Data Assignment: each point n is assigned to its nearest centroid.

 n    |n - 2|   |n - 8|   assigned centroid
 1       1         7             2
 2       0         6             2
 3       1         5             2
 4       2         4             2
 5       3         3             2 (tie, kept with the first centroid)
 6       4         2             8
 7       5         1             8
 8       6         0             8
 9       7         1             8
10       8         2             8
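The assignment table can be reproduced in a few lines of NumPy (a sketch of this worked example; note that in one dimension Manhattan and Euclidean distances coincide):

import numpy as np

data = np.arange(1, 11)        # n = 1..10
centroids = np.array([2, 8])   # initial centroids from the illustration

dists = np.abs(data[:, None] - centroids[None, :])  # |n - 2| and |n - 8|
nearest = dists.argmin(axis=1)  # ties (n = 5) go to the first centroid
for n, (d1, d2), j in zip(data, dists, nearest):
    print(f"n={n:2d}  |n-2|={d1}  |n-8|={d2}  centroid={centroids[j]}")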

K-means Clustering (How it works)
Centroid update step: centroids are recomputed as the mean of all data points assigned to the cluster in step 1. Steps 1 and 2 are run iteratively until:
- the centroids do not change, i.e. the distances stay the same and data points do not change clusters;
- some maximum number of iterations is reached; or
- some other condition is fulfilled (e.g. a minimum distance is achieved).
A sketch of the full loop follows below.
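Putting the two steps together, a minimal 1-D implementation of the loop just described (a sketch, not the slides' code; it assumes no cluster ever becomes empty, which holds for this example):

import numpy as np

def kmeans_1d(data, centroids, max_iter=100):
    for _ in range(max_iter):
        # Step 1: assign each point to its nearest centroid.
        labels = np.abs(data[:, None] - centroids[None, :]).argmin(axis=1)
        # Step 2: recompute each centroid as the mean of its cluster.
        updated = np.array([data[labels == j].mean()
                            for j in range(len(centroids))])
        if np.array_equal(updated, centroids):
            break                # stopping rule: centroids did not change
        centroids = updated
    return centroids, labels

data = np.arange(1, 11, dtype=float)
centroids, labels = kmeans_1d(data, centroids=np.array([2.0, 8.0]))
print(centroids)  # [3. 8.], as in the worked example
print(labels)     # [0 0 0 0 0 1 1 1 1 1]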

Step 2 (Centroid Update)
The new centroids are the means of the clusters found in step 1:
mean(1, 2, 3, 4, 5) = 3 and mean(6, 7, 8, 9, 10) = 8.
Reassigning each point to the nearest of the updated centroids:

 n    |n - 3|   |n - 8|   assigned centroid
 1       2         7             3
 2       1         6             3
 3       0         5             3
 4       1         4             3
 5       2         3             3
 6       3         2             8
 7       4         1             8
 8       5         0             8
 9       6         1             8
10       7         2             8

Step 3 (Centroid Update)
Recomputing the means gives mean(1, 2, 3, 4, 5) = 3 and mean(6, 7, 8, 9, 10) = 8 again, so the distance table and the assignments are identical to those of step 2.

NOTE
The centroids do not change after the 2nd iteration, so we stop updating them. Remember: our goal is to identify the optimum value of K.

K-means Clustering (How to pick the optimum k)
Minimize the within-cluster sum of squares (tighten the clusters) while increasing the between-cluster sum of squares:

WCSS = Σ_{j=1..K} Σ_{x_n ∈ S_j} ||x_n - µ_j||²

where S_j is a specific cluster among the K clusters, x_n is a data point within cluster S_j, and µ_j is the centroid of that cluster.
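Computed directly in NumPy (a sketch; the helper name wcss is illustrative):

import numpy as np

def wcss(data, centroids, labels):
    # Sum over clusters S_j of the squared distances ||x_n - mu_j||^2.
    return sum(((data[labels == j] - mu) ** 2).sum()
               for j, mu in enumerate(centroids))

data = np.arange(1, 11, dtype=float)
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
print(wcss(data, np.array([3.0, 8.0]), labels))  # 10.0 + 10.0 = 20.0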

K-means Clustering (How to pick the optimum k)
There are 2 common methods. The first is the Elbow Method:
- Calculate the sum of squares for each candidate number of clusters.
- Plot the sum of squares against the number of clusters, k.
- Observe the change in the sum of squares to select the optimum k.
In the example plot, the elbow is at k = 3: the sum of squares does not decrease significantly for k > 3.
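A sketch of the elbow plot with scikit-learn and matplotlib (both assumed available; the fitted KMeans object exposes the within-cluster sum of squares as inertia_). The toy 1..10 data is reused here, so the elbow will not necessarily fall at k = 3 as in the slide's example:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.arange(1, 11, dtype=float).reshape(-1, 1)  # 1-D data as a column

ks = range(1, 8)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("within-cluster sum of squares")
plt.title("Elbow method")
plt.show()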

K-means Clustering (How to pick the optimum k)
A limitation of the elbow method is that the elbow might not be well defined. This can be overcome using the silhouette method. The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). It ranges over [-1, +1]: +1 indicates that the point is very close to its own cluster, while -1 indicates that the point is more similar to the neighboring cluster.

K-means Clustering (How to pick the optimum k)
The silhouette value s(i) of a point i is mathematically defined as

s(i) = (b(i) - a(i)) / max(a(i), b(i))

where b(i) is the mean distance of point i to the points in its neighboring cluster, and a(i) is the mean distance of point i to the points in its own cluster.
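scikit-learn implements this measure directly; a sketch that scores several candidate values of k and keeps the best (silhouette_score averages s(i) over all points):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.arange(1, 11, dtype=float).reshape(-1, 1)

best_k, best_s = None, -1.0
for k in range(2, 6):  # the silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    s = silhouette_score(X, labels)
    print(f"k={k}: mean silhouette = {s:.3f}")
    if s > best_s:
        best_k, best_s = k, s
print("optimum k:", best_k)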

K-means Clustering (Advantages)
- It is guaranteed to converge.
- It easily scales to large datasets.
- It has a linear time complexity O(tkn), where t is the number of iterations, k the number of clusters, and n the number of data points.

K-means Clustering (Limitations)
- k is chosen manually.
- The clusters typically depend on the initial centroids.
- Outliers can drastically affect the centroids.
- It can give unrealistic clusters, i.e. converge to a local optimum.
- The organization/order of the data may have an impact on the results.
- It is sensitive to the scale of the features (see the scaling sketch below).
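The scale sensitivity in particular is commonly handled by standardizing the features before clustering; a sketch on hypothetical two-feature data whose second feature would otherwise dominate the distances:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical data: feature 2 spans thousands, feature 1 spans units.
X = np.array([[1.0, 1000.0],
              [2.0, 1100.0],
              [9.0, 1050.0],
              [10.0, 1150.0]])

X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print(labels)  # clusters now reflect both features, not just feature 2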

Applications
- Pattern recognition
- Classification analysis
- Image processing
- Machine vision

References/Resources
- https://blogs.oracle.com/datascience/introduction-to-k-means-clustering
- https://medium.com/analytics-vidhya/how-to-determine-the-optimal-k-for-k-means-708505d204eb
- https://en.wikipedia.org/wiki/Silhouette_(clustering)