Unsupervised Learning: Clustering, K-Means and Hierarchical

FaridAliMousa1 · 29 slides · Apr 27, 2024

About This Presentation

Unsupervised Learning


Slide Content

Clustering

Clustering is the process of grouping a set of patterns. Representations or descriptions of the resulting clusters are used in decision making; classification is one of the most popular decision-making paradigms.

Clustering Types. Algorithms that group sets of unlabelled patterns perform what is called unsupervised classification. Algorithms that cluster labelled patterns perform supervised clustering. Clustering is carried out so that patterns in the same cluster are similar, whereas patterns in different clusters are dissimilar.

Inter-cluster and intra-cluster distance. The Euclidean distance between any two points belonging to the same cluster (intra-cluster) is smaller than that between any two points belonging to different clusters (inter-cluster).

The distance matrix. In the example, all intra-cluster sub-matrices (the blocks of distances between points of the same cluster) satisfy the condition that every entry is less than 5 units.

In the previous example, the intra-cluster distance is less than 5 units, and the distance between two points belonging to two different clusters (the inter-cluster distance) is greater than 5 units.
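
To make the criterion concrete, here is a minimal sketch in Python with NumPy (assumed available; not part of the slides) that builds the pairwise Euclidean distance matrix for two small invented clusters and checks the 5-unit threshold:

```python
import numpy as np

# Hypothetical 2-D points; the first three form one cluster, the last three another.
points = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                   [7.0, 7.0], [7.5, 8.0], [8.0, 7.5]])
labels = np.array([0, 0, 0, 1, 1, 1])

# Pairwise Euclidean distance matrix.
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

same = labels[:, None] == labels[None, :]          # intra-cluster mask
intra = dist[same & (dist > 0)]                    # exclude the zero diagonal
inter = dist[~same]

print("max intra-cluster distance:", intra.max())  # below 5 for these points
print("min inter-cluster distance:", inter.min())  # above 5 for these points
```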

Applications: data reduction. Example: given a set of patterns, computing the clusters using the previous technique yields a reduced set of cluster representatives.

These reduced data can then be used by the nearest neighbour algorithm, so there is a reduction in both space and time.
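
As an illustration of the idea (a sketch in plain NumPy with synthetic data, not the slides' code): each cluster is replaced by its centroid, and a query is classified by a 1-NN search against the centroids alone.

```python
import numpy as np

# Hypothetical training data already grouped into two clusters.
X = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0],     # cluster 0
              [5.0, 7.0], [3.5, 5.0], [4.5, 5.0]])    # cluster 1
cluster = np.array([0, 0, 0, 1, 1, 1])

# Data reduction: keep one centroid per cluster instead of all six patterns.
centroids = np.array([X[cluster == c].mean(axis=0) for c in np.unique(cluster)])

def nearest_prototype(query):
    """1-NN search against the centroids only: O(#clusters), not O(#patterns)."""
    d = np.linalg.norm(centroids - query, axis=1)
    return int(np.argmin(d))

print(nearest_prototype(np.array([1.2, 1.4])))  # -> 0
print(nearest_prototype(np.array([4.0, 6.0])))  # -> 1
```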

Data re-organisation: clustering can eliminate similar rows.

Removing outliers: clustering can identify outliers, which show up as elements of singleton clusters. This forms the basis for a variety of applications, such as automatic auditing of records in a database and detection of intrusions.
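
A sketch of the singleton-cluster idea, assuming scikit-learn is available; the data and cluster count are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans  # assumes scikit-learn is installed

# Two tight groups plus one isolated point (the intended outlier).
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
              [20.0, 20.0]])                       # outlier

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Outliers show up as members of singleton clusters.
sizes = np.bincount(labels)
outliers = X[np.isin(labels, np.where(sizes == 1)[0])]
print(outliers)   # expected: [[20. 20.]]
```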

K-means algorithm

Example. Cluster the following data into two clusters, using patterns p1 and p4 as the initial cluster centres:

Pattern   Feature 1   Feature 2
p1        1.0         1.0
p2        1.5         2.0
p3        3.0         4.0
p4        5.0         7.0
p5        3.5         5.0
p6        4.5         5.0
p7        3.5         4.5

Step 1. The initial means are:

          Individual   Mean vector (centroid)
Group 1   1            (1.0, 1.0)
Group 2   4            (5.0, 7.0)

The assigned patterns:

        Cluster 1                    Cluster 2
Step    Individuals   Centroid       Individuals   Centroid
1       1             (1.0, 1.0)     4             (5.0, 7.0)
2       1, 2          (1.2, 1.5)     4             (5.0, 7.0)
3       1, 2, 3       (1.8, 2.3)     4             (5.0, 7.0)
4       1, 2, 3       (1.8, 2.3)     4, 5          (4.2, 6.0)
5       1, 2, 3       (1.8, 2.3)     4, 5, 6       (4.3, 5.7)
6       1, 2, 3       (1.8, 2.3)     4, 5, 6, 7    (4.1, 5.4)

The new means:

            Individuals   Mean vector (centroid)
Cluster 1   1, 2, 3       (1.8, 2.3)
Cluster 2   4, 5, 6, 7    (4.1, 5.4)

The distances:

Individual   Distance to centroid of Cluster 1   Distance to centroid of Cluster 2
1            1.5                                 5.4
2            0.4                                 4.3
3            2.1                                 1.8
4            5.7                                 1.8
5            3.2                                 0.7
6            3.8                                 0.6
7            2.8                                 1.1

Step 5, the final means. Individual 3 is nearer to the mean of the opposite cluster (Cluster 2), so a new iteration is required: pattern 3 is reassigned to Cluster 2. The final result is:

            Individuals     Mean vector (centroid)
Cluster 1   1, 2            (1.3, 1.5)
Cluster 2   3, 4, 5, 6, 7   (3.9, 5.1)

Finally… putting it all together.
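
Below is a minimal k-means sketch in plain NumPy (a sketch, not the lecture's own code) that reruns the worked example with p1 and p4 as the initial centres and converges to the same clusters, {1, 2} and {3, 4, 5, 6, 7}:

```python
import numpy as np

# The seven patterns from the worked example (p1..p7).
X = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
              [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])

def kmeans(X, centroids, max_iter=100):
    for _ in range(max_iter):
        # Assignment step: each pattern goes to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centroid becomes the mean of its patterns
        # (empty clusters are not handled in this small sketch).
        new = np.array([X[labels == k].mean(axis=0)
                        for k in range(len(centroids))])
        if np.allclose(new, centroids):   # converged: the means stopped moving
            return labels, new
        centroids = new
    return labels, centroids

labels, centroids = kmeans(X, X[[0, 3]])  # p1 and p4 as the initial centres
print(labels)     # [0 0 1 1 1 1 1] -> clusters {1, 2} and {3, 4, 5, 6, 7}
print(centroids)  # approximately (1.3, 1.5) and (3.9, 5.1)
```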

An Application: Handwritten Digit Recognition. There are ten classes, corresponding to the handwritten digits '0' to '9'. The data set consists of 6670 training patterns and 3333 test patterns. The nearest neighbour (NN), k-nearest neighbour (kNN), and MkNN (modified kNN) algorithms have been used on this data set.
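
For reference, a minimal kNN sketch using scikit-learn's KNeighborsClassifier (assumed available); the tiny synthetic data merely stands in for the digit features, and MkNN has no off-the-shelf equivalent in this sketch:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier  # assumes scikit-learn

# Synthetic stand-in for the digit features (the real set has 6670 patterns).
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
y_train = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)   # n_neighbors=1 gives the plain NN rule
knn.fit(X_train, y_train)
print(knn.predict(np.array([[0.9, 1.0]])))  # expected: [1]
```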

An Application: Handwritten Digit Recognition (continued). The k-means algorithm (KMA) is used on the data, and the centroids of the clusters formed are used as prototypes representing all the patterns in their cluster.

Description of the Digit Data. Each original digit pattern is a binary image of size 32 × 24 pixels. The feature vector: non-overlapping windows of size 2 × 2 are formed over the entire image, and each window is replaced by one feature whose value is the number of one bits in that window. This results in 192 features, each taking a value from 0 to 4. The data patterns vary in terms of orientation of the digit, width, and height.
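
A sketch of the described feature extraction in NumPy: non-overlapping 2 × 2 windows over a 32 × 24 binary image, each replaced by its count of one bits, giving 16 × 12 = 192 features with values in 0..4. The random image is only a stand-in for a real digit:

```python
import numpy as np

def window_counts(img, w=2):
    """Replace each non-overlapping w x w window by its count of one bits."""
    h, v = img.shape[0] // w, img.shape[1] // w
    return img.reshape(h, w, v, w).sum(axis=(1, 3))

rng = np.random.default_rng(0)
digit = rng.integers(0, 2, size=(32, 24))      # stand-in for a binary digit image

features = window_counts(digit).ravel()
print(features.shape, features.min(), features.max())  # (192,) with values in 0..4
```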

A sample collection of training patterns

Mean and standard deviation of each class

More data reduction. The dimensionality of the data set is further reduced by applying the 2 × 2 windowing again, so each pattern ends up with a total of 48 features. The dimensionality reduction improves the speed of the nearest neighbour classifier.
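
One plausible reading of this step, sketched in NumPy: apply the same 2 × 2 window counting once more to the 16 × 12 feature grid, giving 8 × 6 = 48 features (the function is repeated so the snippet is self-contained; second-stage values range up to 16):

```python
import numpy as np

def window_counts(img, w=2):
    """Sum each non-overlapping w x w window into one value."""
    h, v = img.shape[0] // w, img.shape[1] // w
    return img.reshape(h, w, v, w).sum(axis=(1, 3))

rng = np.random.default_rng(0)
digit = rng.integers(0, 2, size=(32, 24))   # stand-in binary digit image

stage1 = window_counts(digit)               # 16 x 12 grid, 192 features
stage2 = window_counts(stage1)              # 8 x 6 grid, 48 features
print(stage1.size, stage2.size)             # 192 48
```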

Pre-processing of Data. Algorithms are applied to deal with scaling, translation, and rotation problems before feature extraction.

Recognition results. The classification accuracy ranges from 65% to 93%, depending on the NN classifier variant, the prototyping, and the clustering methods used.

Questions…