Fuzzy Clustering & Fuzzy Classification Method
zahramojtahediin
Dec 30, 2024
About This Presentation
Fuzzy clustering, also known as soft clustering, is a method of clustering where each data point can belong to more than one cluster. This contrasts with traditional "hard" clustering methods, such as k-means, where each data point belongs to exactly one cluster. Fuzzy clustering is particularly useful when cluster boundaries are not well-defined.
Table of contents
01 Fuzzy Clustering
02 Goals of Fuzzy Clustering
03 K-means (Review)
04 C-means
05 Fuzzy Clustering Application
06 Iris Dataset Segmentation
07 KFCM
08 Pros and Cons
01 Fuzzy Clustering
Fuzzy clustering allows data points to belong to multiple clusters with varying degrees of membership.
Fuzzy Clustering
- Definition: Each data point may belong to multiple clusters with varying degrees of membership. This differs from traditional clustering, where each data point belongs to exactly one cluster.
- Membership degrees: Each data point has a set of membership degrees, one per cluster, and these degrees sum to 1.
- Unsupervised learning: Fuzzy clustering is an unsupervised learning method; the model groups data without using predefined labels.
- Benefits: Flexible partitioning of the data; the ability to assign a data point to multiple clusters; effectiveness when boundaries between clusters are not clearly defined; and more nuanced analysis with better handling of uncertainty and overlap in the data.
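The membership-degree constraint above can be shown in a few lines of NumPy; the matrix values here are made up purely for illustration:

```python
import numpy as np

# Hypothetical membership matrix U for 4 data points and 3 clusters:
# U[i, j] is the degree to which point i belongs to cluster j.
U = np.array([
    [0.8, 0.1, 0.1],   # point 0 lies mostly in cluster 0
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],   # point 2 sits near all three clusters
    [0.0, 0.5, 0.5],   # point 3 is split between clusters 1 and 2
])

# The defining constraint: each row of memberships sums to 1.
print(U.sum(axis=1))   # -> [1. 1. 1. 1.]
```

Unlike hard labels, each row carries information about how ambiguous a point's assignment is.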
Crisp & Fuzzy Clustering
Crisp clustering:
- Each data point belongs to exactly one cluster.
- The boundaries between clusters are clear and distinct.
- Membership is binary: a point either belongs to a cluster or it does not.
- Example: the K-Means algorithm.
Fuzzy clustering:
- Each data point can belong to multiple clusters with varying degrees of membership.
- The boundaries between clusters are vague and flexible.
- Membership is relative: points have varying degrees of belonging to different clusters.
- Example: the Fuzzy C-Means algorithm.
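The contrast can be made concrete: a fuzzy result can always be "hardened" into a crisp one by keeping each point's strongest membership (the membership values below are hypothetical):

```python
import numpy as np

# Fuzzy memberships for 3 points over 2 clusters (made-up values).
U = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
    [0.5, 0.5],
])

# Crisp clustering keeps exactly one label per point; taking the
# largest membership per row is the usual way to harden a fuzzy result.
crisp_labels = U.argmax(axis=1)
print(crisp_labels)   # -> [0 1 0]  (ties resolve to the first cluster)
```

Note that the hardened labels discard the information that point 2 was perfectly ambiguous, which is exactly what fuzzy clustering preserves.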
02 Goals of Fuzzy Clustering
The primary goal of fuzzy clustering is to identify patterns in data with increased accuracy and flexibility.
Goals of Fuzzy Clustering
- Pattern recognition: Fuzzy clustering helps identify patterns in data. This is particularly useful when data points are not clearly separable.
- Increased accuracy: Because a data point may belong to multiple clusters, fuzzy clustering can increase the accuracy of analyses.
- Reduced complexity: Fuzzy clustering can help reduce the complexity of predictive models, as it provides more precise information about the relationships between data points.
- Greater flexibility: The method offers greater flexibility in data analysis, since each data point may belong to multiple clusters.
03 K-means
K-means is a crisp clustering algorithm: each data point is assigned to exactly one cluster, with no overlapping or partial memberships.
K-means K-Means Clustering is one of the most commonly used algorithms in unsupervised machine learning for partitioning a dataset into distinct clusters. This algorithm assigns each data point to one of K clusters by minimizing the within-cluster variance.
K-means
1. Choosing the number of clusters: Decide on the number of clusters, K, to partition the data into. This is typically based on prior knowledge or on methods like the elbow method.
2. Initialization of centroids: Randomly select K data points from the dataset as the initial cluster centroids.
3. Assigning data points to clusters: Assign each data point to the nearest centroid, forming K clusters. The distance metric commonly used is the Euclidean distance.
4. Updating centroids: Recalculate each centroid as the mean of all data points assigned to that cluster; this mean becomes the new center of the cluster.
5. Iterating until convergence: Repeat steps 3 and 4 until the centroids no longer change significantly, indicating that the algorithm has converged.
K-means
Objective function: The objective of K-means is to minimize the sum of squared distances between data points and their respective cluster centroids. This can be formulated as:

J = \sum_{i=1}^{K} \sum_{x \in C_i} \| x - \mu_i \|^2

where:
- K is the number of clusters,
- C_i represents the i-th cluster,
- x is a data point in cluster C_i,
- \mu_i is the centroid of cluster C_i.
K-means
Cluster assignment step: Assign each data point to the cluster with the nearest centroid:

c(x_n) = \arg\min_{i} \| x_n - \mu_i \|^2

where c(x_n) is the cluster assignment for data point x_n.

Centroid update step: Update the centroid of each cluster by calculating the mean of all data points assigned to that cluster:

\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x

where |C_i| is the number of data points in cluster C_i.
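The steps and update rules above can be sketched in plain NumPy (a minimal illustration; the two-blob data is made up for the demo, and no care is taken over initialization quality):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means: alternate the assignment and centroid-update
    steps until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    # Initialization: pick k random data points as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid under Euclidean distance.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centroid becomes the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence check.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated synthetic blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
labels, centroids = kmeans(X, k=2)
```

Each pass through the loop performs exactly the two formulas above: `labels` realizes the assignment step, and `new_centroids` the centroid update.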
04 C-means
An algorithm that assigns data points to clusters with varying degrees of membership, enhancing flexibility and precision in clustering.
C-means
Fuzzy C-Means (FCM) is an unsupervised learning algorithm that allows each data point to belong to multiple clusters with varying degrees of membership. This approach is particularly useful for analyzing complex, non-separable data. FCM is a soft clustering approach: each data point is assigned a score describing its degree of belonging to each cluster.
C-means
1. Initialization: Choose the number of clusters (c) and the fuzziness parameter (m > 1). Initialize the membership matrix U = [u_{ij}] with random values, where u_{ij} indicates the membership degree of data point i in cluster j.
2. Compute cluster centers: Calculate the cluster centers using the following formula:

v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} x_i}{\sum_{i=1}^{N} u_{ij}^{m}}

where x_i is data point i and N is the total number of data points.
C-means
3. Update membership matrix: Update the membership degrees using the following formula:

u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\| x_i - v_j \|}{\| x_i - v_k \|} \right)^{\frac{2}{m-1}}}

4. Check for convergence: If the change in the membership matrix is less than a specified threshold ε, the algorithm stops. Otherwise, return to step 2.
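The four FCM steps above can be sketched as follows (a minimal NumPy illustration with synthetic two-blob data; the small epsilon added to the distances is a guard against division by zero, not part of the formulas):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, eps=1e-5, seed=0):
    """Fuzzy C-Means: random memberships, then alternate the center
    and membership updates until the membership matrix stabilizes."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Step 1: random membership matrix with rows summing to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Step 2: cluster centers as membership-weighted means.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 3: update memberships from the distances to each center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                   # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Step 4: stop once the membership matrix barely changes.
        if np.abs(U_new - U).max() < eps:
            U = U_new
            break
        U = U_new
    return U, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
U, centers = fuzzy_c_means(X, c=2)
```

Note that `inv / inv.sum(...)` is an algebraically equivalent rewrite of the membership formula in step 3: dividing each d^{-2/(m-1)} by their sum gives the same ratio form.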
05 Fuzzy Clustering Application
Fuzzy Clustering Application
Image segmentation:
- Medical imaging: helps segment different tissues or detect anomalies in medical images.
- Satellite imaging: used to classify land use and land cover from satellite images.
Pattern recognition:
- Handwriting recognition: distinguishes between different styles of handwriting.
- Voice recognition: classifies different voice patterns for speaker identification.
Bioinformatics:
- Gene expression data: clusters gene expression data to identify patterns and understand gene functions.
- Protein structure analysis: groups similar protein structures for comparative analysis.
06 Iris Dataset Segmentation
Iris Dataset
- Origin: Introduced by the British biologist and statistician Ronald A. Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems".
- Content: 150 samples from three species of iris flowers (Iris setosa, Iris versicolor, and Iris virginica).
- Features: Each sample has four features: sepal length, sepal width, petal length, and petal width (all in centimeters).
- Classes: Three classes, each corresponding to one species of iris flower (50 samples per class).
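Assuming scikit-learn is installed (it ships a copy of Fisher's dataset), the figures above are easy to verify:

```python
# Load Fisher's iris data from scikit-learn's bundled datasets.
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)      # (150, 4): 150 samples, 4 features each
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']
print(iris.feature_names)   # sepal/petal length and width, in cm
```

Because the true species labels are available, iris is a convenient benchmark for checking how well an unsupervised clustering recovers the three classes.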
07 KFCM
KFCM
- FCM is not robust in noisy images.
- FCM lacks the local spatial information of image pixels.
- KFCM addresses this by adding a spatial penalty term.
08 Pros and Cons
Pros and Cons
Pros:
- Gives the best results for overlapping data sets and performs comparatively better than the k-means algorithm.
- Unlike k-means, where a data point must belong exclusively to one cluster center, here each data point is assigned a membership to every cluster center, so a point may belong to more than one cluster.
Cons:
- The number of clusters must be specified a priori.
- A lower value of β gives better results, but at the expense of more iterations.
- Euclidean distance measures can unequally weight underlying factors.
- The performance of the FCM algorithm depends on the selection of the initial cluster centers and/or the initial membership values.