Module 6 - clusteriuufigggggg 5 DMW.pptx

rprahulcoder 5 views 7 slides Oct 29, 2025



Slide Content

Density-Based Methods
Density-based methods find clusters of arbitrary shape based on density. The density of an object o is measured by the number of objects close to o. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds core objects, that is, objects that have dense neighbourhoods. It connects core objects and their neighbourhoods to form dense regions as clusters.
Parameters:
ϵ: specified by the user; sets the radius of the neighbourhood considered around each object (the ϵ-neighbourhood).
MinPts: specified by the user; the density threshold used to determine core objects. An object is a core object if its ϵ-neighbourhood contains at least MinPts objects.
Clustering is done based on core objects and their neighbourhoods.
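The two parameters can be illustrated with a minimal Python sketch. The function names, the sample points, the use of Euclidean distance, and the convention that an object counts as a member of its own ϵ-neighbourhood are assumptions made for illustration; they are not stated in the slides.

```python
import math

def eps_neighbourhood(points, i, eps):
    """Indices of all points within distance eps of points[i].

    Assumption: Euclidean distance, and the point itself is counted.
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def core_objects(points, eps, min_pts):
    """Indices of points whose eps-neighbourhood holds at least min_pts points."""
    return [i for i in range(len(points))
            if len(eps_neighbourhood(points, i, eps)) >= min_pts]

# Hypothetical data: four corners of a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
cores = core_objects(points, eps=1.1, min_pts=3)
print(cores)  # the four square corners are core objects; the far point is not
```

With ϵ = 1.1 each corner sees itself and its two adjacent corners, which meets MinPts = 3, while the isolated point only sees itself.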

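Density Reachability & Density Connectivity
An object q is density-reachable from an object p if a chain of core objects, each within ϵ of the next, leads from p to q. Two objects are density-connected if both are density-reachable from some common object.
Given MinPts = 3, the objects m, p, o and r are core objects. q is density-reachable from p, but p is not density-reachable from q because q is not a core object. Similarly, r and s are density-reachable from o, and o is density-reachable from r. Thus o, r, and s are all density-connected.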
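The asymmetry of density-reachability can be checked with a small breadth-first search. This is only a sketch: the function names and the four sample points are hypothetical (they are not the points m, p, o, r of the slide), and it assumes Euclidean distance with each object counted in its own neighbourhood.

```python
import math
from collections import deque

def neighbours(points, i, eps):
    """Indices within distance eps of points[i], including i itself (assumption)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def density_reachable(points, src, dst, eps, min_pts):
    """True if dst can be reached from src through a chain of core objects."""
    def is_core(i):
        return len(neighbours(points, i, eps)) >= min_pts

    if not is_core(src):
        return False  # a chain must start at a core object
    seen = {src}
    queue = deque([src])
    while queue:
        cur = queue.popleft()  # cur is always a core object
        for n in neighbours(points, cur, eps):
            if n == dst:
                return True
            if n not in seen:
                seen.add(n)
                if is_core(n):
                    queue.append(n)  # only core objects extend the chain
    return False

# Hypothetical data: four points on a line, one unit apart.
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
forward = density_reachable(line, 1, 3, eps=1.0, min_pts=3)
backward = density_reachable(line, 3, 1, eps=1.0, min_pts=3)
print(forward, backward)
```

Here the two middle points are core objects and the end points are not, so the last point is density-reachable from a middle point but not the other way round, mirroring the relationship between p and q described above.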

DBSCAN Algorithm
1. All objects in a given data set D are marked as "unvisited."
2. Randomly select an unvisited object p, mark p as "visited," and check whether the ϵ-neighbourhood of p contains at least MinPts objects.
3. If not, p is marked as a noise point. Otherwise, a new cluster C is created for p, and all the objects in the ϵ-neighbourhood of p are added to a candidate set, N.
4. Iteratively add to C those objects in N that do not belong to any cluster. In this process, for an object p' in N that carries the label "unvisited," DBSCAN marks it as "visited" and checks its ϵ-neighbourhood. If the ϵ-neighbourhood of p' has at least MinPts objects, those objects are added to N. DBSCAN continues adding objects to C until C can no longer be expanded, that is, until N is empty. At this time, cluster C is complete and is output.
5. To find the next cluster, DBSCAN randomly selects an unvisited object from the remaining ones. The clustering process continues until all objects are visited.
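The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a tuned implementation: it visits objects in index order instead of picking them at random (so the output is reproducible), uses Euclidean distance, counts each object in its own ϵ-neighbourhood, and uses a brute-force neighbourhood search. The names `dbscan`, `region_query` and `NOISE` are chosen for this sketch.

```python
import math

NOISE = -1  # label used here for noise points (assumption of this sketch)

def region_query(points, i, eps):
    """Indices of all points in the eps-neighbourhood of points[i]."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)     # None = not yet in any cluster
    visited = [False] * len(points)   # step 1: everything starts unvisited
    cluster_id = 0
    for p in range(len(points)):      # index order instead of random choice
        if visited[p]:
            continue
        visited[p] = True             # step 2
        neighbourhood = region_query(points, p, eps)
        if len(neighbourhood) < min_pts:
            labels[p] = NOISE         # step 3: may later become a border point
            continue
        labels[p] = cluster_id        # step 3: new cluster C for p
        candidates = list(neighbourhood)  # candidate set N
        k = 0
        while k < len(candidates):    # step 4: expand until N is exhausted
            q = candidates[k]
            k += 1
            if not visited[q]:
                visited[q] = True
                q_neighbourhood = region_query(points, q, eps)
                if len(q_neighbourhood) >= min_pts:
                    candidates.extend(q_neighbourhood)
            if labels[q] is None or labels[q] == NOISE:
                labels[q] = cluster_id  # add q to C if it has no cluster yet
        cluster_id += 1               # step 5: move on to the next cluster
    return labels

# Hypothetical data: two dense groups and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (5, 5)]
labels = dbscan(data, eps=1.5, min_pts=3)
print(labels)
```

In this example the first four points form one cluster, the next three form a second cluster, and the isolated point is labelled as noise.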

OPTICS: Ordering Points to Identify the Clustering Structure
OPTICS alleviates the difficulty of selecting parameter values that yield acceptable clusters. With DBSCAN, the parameters ϵ and MinPts are usually set empirically and are difficult to determine, especially for real-world, high-dimensional data sets. Most real-world data sets have skewed distributions, and their clustering structure may not be well defined, so a single set of global density parameters may not be sufficient.

OPTICS
OPTICS overcomes the difficulty of using one set of global parameters in cluster analysis. It does not explicitly cluster the data; instead, it outputs a cluster ordering. This linear list represents the density-based clustering structure of the data set. The cluster ordering can be used to extract basic clustering information (cluster centres, arbitrary-shaped clusters), to derive the intrinsic clustering structure, and to provide a visualization of the clustering.

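To obtain the different clusters, objects are processed in order of increasing ϵ value, so that higher-density clusters (those reachable at a smaller ϵ) are processed first. This requires two values for each object:
Core distance: the smallest radius ϵ' that would make an object p a core object, that is, the smallest ϵ' such that the ϵ'-neighbourhood of p contains at least MinPts objects (MinPts = 5 in the slide's example). It is undefined if p is not a core object with respect to ϵ and MinPts.
Reachability distance: the minimum radius value that makes p density-reachable from a core object q. It is given by
reachability-distance(p, q) = max{core-distance(q), dist(p, q)}
and is undefined if q is not a core object.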
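The two distances can be computed directly from their definitions. The following is a minimal sketch with hypothetical function names and sample points; it assumes Euclidean distance, counts an object as a member of its own neighbourhood, and returns `None` where the definition leaves the value undefined.

```python
import math

def core_distance(points, i, eps, min_pts):
    """Smallest radius at which points[i] has min_pts objects around it.

    Returns None (undefined) if points[i] is not a core object for eps.
    Assumption: the point itself counts towards min_pts.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    if len(dists) < min_pts or dists[min_pts - 1] > eps:
        return None
    return dists[min_pts - 1]

def reachability_distance(points, p, q, eps, min_pts):
    """max{core-distance(q), dist(p, q)}, or None if q is not a core object."""
    cd = core_distance(points, q, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(points[p], points[q]))

# Hypothetical data: four points one unit apart on a line, plus an outlier.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
cd_q = core_distance(pts, 1, eps=2.5, min_pts=3)               # core distance of pts[1]
rd_near = reachability_distance(pts, 0, 1, eps=2.5, min_pts=3)  # p close to q
rd_far = reachability_distance(pts, 3, 1, eps=2.5, min_pts=3)   # p farther from q
print(cd_q, rd_near, rd_far)
```

For an object p closer to q than q's core distance, the reachability distance is the core distance itself; for an object farther away, it is the actual distance between p and q.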