Chapter 4, part 2: Unsupervised Learning – Clustering: k-means and Hierarchical Clustering Analysis (HCA)
Lecturer: Dr. Ahmed Khanfir
Term: Fall 2024
Some slides are borrowed from those of Dr. Oumayma Bounouh
Recap
- Unsupervised => unlabeled data
- PCA is an unsupervised technique
- Clustering is the most commonly used of the unsupervised techniques.
ASSOCIATION (Association Rule Mining)
Association is a rule-based machine learning method for discovering the probability of the co-occurrence of items in a collection.
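To make the two basic rule measures concrete, here is a minimal sketch (the toy transactions and item names are illustrative, not from the lecture) computing the support of an itemset and the confidence of a rule A -> B:

# Toy association-rule measures: support of an itemset and confidence of A -> B.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    # Fraction of transactions that contain every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # P(consequent | antecedent) estimated from the transactions.
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))       # 0.5
print(confidence({"bread"}, {"milk"}))  # 2/3 ≈ 0.67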
CLUSTERING
- Clustering is the method of dividing objects into groups (clusters) such that objects in the same cluster are similar to each other and dissimilar to objects belonging to other clusters.
Examples
E1: Clustering books of different, unknown themes
E2: Clustering medical scans
E3: Clustering pictures
- In fact, clustering is one of the most utilized data mining techniques. It has a long history and is used in almost every field, e.g., medicine, psychology, botany, sociology, biology, archeology, marketing, insurance, libraries, etc. In recent years, due to the rapid increase of online documents, text clustering has become more and more important.
Main Aspects
- Clustering algorithm: groups data points into clusters based on similarity. The choice of a clustering algorithm depends on the data structure, size, dimensionality, and desired outcome.
- A distance (similarity, or dissimilarity) function
- Clustering quality, based on how well the
  • inter-cluster distance is maximized
  • intra-cluster distance is minimized
The quality of a clustering result depends on the algorithm, the distance function, and the application.
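One standard score that combines the two quality criteria above is the silhouette coefficient (an assumption for illustration; it is not named on the slide). A minimal sketch, assuming scikit-learn is available:

import numpy as np
from sklearn.metrics import silhouette_score

X = np.array([[1, 1], [1, 4], [4, 1], [4, 4], [8, 1]], dtype=float)
labels = np.array([0, 1, 0, 1, 0])  # an example hard assignment of the points

# Values close to +1 mean tight, well-separated clusters;
# values near 0 or negative mean overlapping or misassigned points.
print(silhouette_score(X, labels, metric="euclidean"))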
Recap
Minkowski distance: d(x, y) = (Σ_i |x_i − y_i|^p)^(1/p)
For p=1: Manhattan distance
For p=2: Euclidean distance
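A minimal sketch of the Minkowski distance (assuming NumPy is available); setting p=1 gives the Manhattan distance and p=2 the Euclidean distance:

import numpy as np

def minkowski(x, y, p):
    """d(x, y) = (sum_i |x_i - y_i|**p) ** (1/p)"""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

a, b = (1, 1), (4, 4)
print(minkowski(a, b, p=1))  # Manhattan: 6.0
print(minkowski(a, b, p=2))  # Euclidean: ~4.243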
Example Algorithms
- Partition-Based Clustering:
  1. K-Means: divides data into k clusters by minimizing variance within clusters.
  2. K-Means++: same as K-Means except that it favors the selection of distant seeds. It is now the default initialization method in many libraries (e.g., scikit-learn).
  3. K-Medoids (PAM): similar to K-Means but uses medoids (actual data points) instead of centroids.
- Hierarchical Clustering:
  1. Agglomerative (Bottom-Up): starts with individual points and merges clusters.
  2. Divisive (Top-Down): starts with one cluster and splits it iteratively.
K-Means Algorithm
1. Choose k and select k initial centroids (e.g., k randomly chosen data points).
2. Assign each data point to its closest centroid.
3. Recompute each centroid as the mean of the points assigned to it.
4. Repeat steps 2–3 until the assignments no longer change (convergence).
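A minimal from-scratch sketch of this loop, assuming NumPy (the function name and defaults are illustrative, not part of the lecture):

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # step 1
    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid (Euclidean).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # step 4: convergence
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1, 1], [1, 4], [4, 1], [4, 4], [8, 1]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels, centroids)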
K-Means Example: Walkthrough
Input:
Dataset: {(1,1), (1,4), (4,1), (4,4), (8,1)}
k = 2 (two clusters)
Step 1: Initialization
Randomly select two points as initial centroids:
Centroid 1: (1,1)
Centroid 2: (4,4)
K-Means Example
Input: Dataset: {(1,1), (1,4), (4,1), (4,4), (8,1)} and k=2 (two clusters)
Step 1: Initialization: Centroid 1: (1,1) and Centroid 2: (4,4)
Step 2: Assign Points to Clusters
Compute the Euclidean distance of each point to the centroids:
Point (1,1): d to C1 = 0, d to C2 = √18 ≈ 4.24 → Cluster 1
Point (1,4): d to C1 = 3, d to C2 = 3 → tie, assigned to Cluster 2
Point (4,1): d to C1 = 3, d to C2 = 3 → tie, assigned to Cluster 1
Point (4,4): d to C1 = √18 ≈ 4.24, d to C2 = 0 → Cluster 2
Point (8,1): d to C1 = 7, d to C2 = 5 → Cluster 2
Cluster assignments:
Cluster 1: (1,1), (4,1)
Cluster 2: (1,4), (4,4), (8,1)
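A short sketch (assuming NumPy) that reproduces the Step 2 distance computations above:

import numpy as np

points = np.array([[1, 1], [1, 4], [4, 1], [4, 4], [8, 1]])
centroids = np.array([[1, 1], [4, 4]])  # Centroid 1 and Centroid 2

# Euclidean distance from every point to every centroid: shape (5, 2).
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
# Note: argmin breaks ties in favour of Centroid 1, so the two equidistant
# points (1,4) and (4,1) would both land in Cluster 1 here.
assignments = distances.argmin(axis=1)  # 0 -> Cluster 1, 1 -> Cluster 2

for p, d, a in zip(points, distances, assignments):
    print(p, d.round(2), f"-> Cluster {a + 1}")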
Key Notes:
- Distance metric: Euclidean distance is most common but can be replaced (e.g., Manhattan, cosine).
- Initialization sensitivity: results depend on the initial centroid selection.
- Convergence criteria: stops when cluster assignments stabilize or changes in centroid positions are minimal.
- Iterative: each iteration is roughly linear in the number of data points and clusters, but the algorithm may get stuck in a local minimum.
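A short usage sketch with scikit-learn's KMeans illustrating these notes: k-means++ initialization, several restarts (n_init) to reduce the risk of a poor local minimum, and a tolerance-based stopping criterion (parameter values are illustrative):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1, 4], [4, 1], [4, 4], [8, 1]], dtype=float)

km = KMeans(n_clusters=2, init="k-means++", n_init=10, tol=1e-4, random_state=0)
labels = km.fit_predict(X)
print(labels)               # cluster index of each point
print(km.cluster_centers_)  # final centroids
print(km.inertia_)          # within-cluster sum of squared distances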
K-Means Exercise
Partition-Based Clustering
Example algorithms: K-Means, K-Medoids (PAM)
When to use:
- When clusters are spherical or convex.
- With moderate-sized datasets.
- When k, the number of clusters, is known beforehand.
Limitations:
- Struggles with non-spherical clusters.
- Sensitive to outliers.
Hierarchical Clustering
Hierarchical Clustering
Example algorithms: Agglomerative (Bottom-Up), Divisive (Top-Down)
Hierarchical Agglomerative Clustering (HAC) is also called Hierarchical Bottom-Up Clustering or Agglomerative Hierarchical Clustering (AHC).
When to use:
- When you need a hierarchy of clusters (dendrogram), as sketched below.
- When the number of clusters is not predefined.
- Small to medium-sized datasets.
Limitations:
- Computationally expensive for large datasets.
- Sensitive to noise.
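A minimal sketch (assuming SciPy and Matplotlib are available) that builds and draws the dendrogram mentioned above for a small toy dataset:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[1, 1], [1, 4], [4, 1], [4, 4], [8, 1]], dtype=float)

Z = linkage(X, method="average")  # agglomerative merge history
dendrogram(Z)                     # plot the hierarchy of merges
plt.xlabel("data point index")
plt.ylabel("merge distance")
plt.show()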
Hard vs. Soft Clustering
Hard clustering:
- Each observation belongs to exactly one cluster.
- More common and easier to do.
Soft clustering:
- An observation can belong to more than one cluster.
- It makes more sense for applications like creating browsable hierarchies.
- e.g., you may want to put a pair of sneakers in two clusters: (i) sports and (ii) shoes.
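The slide describes soft clustering only conceptually; as one possible illustration (a Gaussian mixture model, which is an assumption and not part of the lecture), scikit-learn's predict_proba gives each point a membership degree in every cluster instead of a single hard label:

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.array([[1, 1], [1, 4], [4, 1], [4, 4], [8, 1]], dtype=float)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict(X))        # hard labels (argmax of the soft memberships)
print(gmm.predict_proba(X))  # soft memberships: one probability per cluster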
HAC Algorithm
0. Start with each data point as its own cluster (singleton clusters).
1. Iteratively compute the similarity between clusters and merge the closest (most similar) pair, until all data points are in a single cluster or a desired number of clusters is reached.
The process creates a hierarchy (dendrogram) of clusters that can be cut at different levels to obtain varying numbers of clusters.
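A minimal sketch of this procedure using SciPy (an assumed library choice): linkage() builds the merge hierarchy and fcluster() "cuts" the dendrogram to obtain a chosen number of clusters:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 1], [1, 4], [4, 1], [4, 4], [8, 1]], dtype=float)

Z = linkage(X, method="average", metric="euclidean")  # merge history
labels = fcluster(Z, t=2, criterion="maxclust")       # cut into 2 clusters
print(labels)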
Initial step: each data point forms its own singleton cluster.
Intermediate step: at each iteration, the two closest clusters are merged into one.
Similarity Metrics
As with k-means, the most commonly used metrics are the Euclidean distance and the Manhattan distance.
Distance between clusters? Linkage criteria for merging
- Single linkage (single-link distance): minimum distance between points in two clusters.
- Complete linkage (complete-link distance): maximum distance between points in two clusters.
- Average linkage (group average distance): average distance between points in two clusters.
- Ward's method: minimizes the increase in total within-cluster variance.
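A short sketch (assuming scikit-learn) that runs the four linkage criteria listed above on the same toy dataset; note that 'ward' requires Euclidean distances:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 1], [1, 4], [4, 1], [4, 4], [8, 1]], dtype=float)

for linkage in ("single", "complete", "average", "ward"):
    model = AgglomerativeClustering(n_clusters=2, linkage=linkage)
    print(linkage, model.fit_predict(X))  # cluster label of each point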