Density based clustering

Density based Clustering

Review: Supervised Learning From example pairs, learn a function that maps from input to output Fruit example:

Unsupervised Learning Given a pile of unlabelled data, what can you learn from it?

Examples how many types of fruits are there group types of customers detect anomalies or “odd behavior”

Clustering Most of these applications are related to the task of Clustering. In Classification, we draw separators to differentiate known classes of data. In Clustering, we make up different types of data and draw separators . Types: Connectivity models Centroid models Density Models Distribution models

Non-globular clusters What if the clusters are weird shapes? How would K-means fare for this data? Is there another way we could find clusters? Source: https://hdbscan.readthedocs.io/en/latest/comparing_clustering_algorithms.html

Heirarchical clustering Much better clusters on this (made-up) data Still not perfect, what should we do with outliers? Does every point need to be assigned to a cluster? Source: https://hdbscan.readthedocs.io/en/latest/comparing_clustering_algorithms.html

Density-based clustering Uses the density of surrounding points to assign clusters Not every point assigned to a cluster Source: https://hdbscan.readthedocs.io/en/latest/comparing_clustering_algorithms.html

DBSACN DBSCAN is a density-based algorithm DBScan stands for Density-Based Spatial Clustering of Applications with Noise Density-based Clustering locates regions of high density that are separated from one another by regions of low density Density = number of points within a specified radius (Eps)

Concepts: Preliminary A point is a core point if it has more than a specified number of points (MinPts) within Eps These are points that are at the interior of a cluster A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point A noise point is any point that is not a core point or a border point

Parameter Estimation

Determining the parameters Eps and MinPts The parameters Eps and MinPts can be determined by a heuristic . Observation For points in a cluster , their k-th nearest neighbors are at roughly the same distance . Noise points have the k-th nearest neighbor at farther distance .  Plot sorted distance of every point to its k-th nearest neighbor.

Determining the parameters Eps and MinPts database Procedure Define a function k-dist from the database to the real numbers, mapping each point to the distance from its k-th nearest neighbor . S ort the points of the database in descending order of their k-dist values . k-dist

Determining the parameters Eps and MinPts Procedure Choose an arbitrary point p set Eps = k-dist(p) set MinPts = k . All points with an equal or smaller k-dist value will be cluster points k-dist p cluster points noise

DBSCAN : Advantages

DBSCAN : Disadvantages DBSCAN is not entirely deterministic: Border points that are reachable from more than one cluster can be part of either cluster, depending on the order the data is processed. The quality of DBSCAN depends on the distance measure used in the function regionQuery. (such as Euclidean distance) Has problems of identifying clusters of varying densities (SSN algorithm) If the data and scale are not well understood, choosing a meaningful distance threshold ε can be difficult.

OPTICS It computes an ordering of all objects in a given database. And It stores the core-distance and a suitable reachability-distance for each object in the database. OPTICS maintains a list called OrderSeeds to generate the output ordering. Objects in OrderSeeds are sorted by the reachability-distance from their respective closest core objects, that is, by the smallest reachability-distance of each object.

Optics Concepts: Core Distance- the minimum epsilon to make a distinct point a core point, given a finite MinPts parameter. Reachability Distance- the reachability-distance of an object p with respect to another object o is the smallest distance from o if o is a core object. It also cannot be smaller than the core distance of o.

OPTICS ALGORITHM EXAMPLE A I J L R P B K M N D C E F G H 44  re a ch seedlist: Example Database (2-dimensional, 16 points) ε = 44, MinPts = 3 April 30,2012 <number>

OPTICS ALGORITHM EXAMPLE se e d l is t : A I J L R P K M N D C E F G H A 44 re a ch   B c o re- distance (B,40) (I, 40) Example Database (2-dimensional, 16 points) ε = 44, MinPts = 3 April 30,2012 <number>

OPTICS ALGORITHM EXAMPLE 44 re a ch  A B A I J L R P B K M N D C E F G H Example Database (2-dimensional, 16 points) ε = 44, MinPts = 3 seedlist: (I, 40) (C, 40) April 30,2012 <number>

OPTICS ALGORITHM EXAMPLE 44 re a ch  A I J L R P B K M N D C E F G H A B I Example Database (2-dimensional, 16 points) ε = 44, MinPts = 3 seedlist: (J, 20) (K, 20) (L, 31) (C, 40) (M, 40) (R, 43) April 30,2012 <number>

OPTICS ALGORITHM EXAMPLE 44 re a ch  A I J L R P B K M N D C E F G H A B I J Example Database (2-dimensional, 16 points) ε = 44, MinPts = 3 seedlist: (L, 19) (K, 20) (R, 21) (M, 30) (P, 31) (C, 40) April 30,2012

OPTICS ALGORITHM EXAMPLE 44 re a ch  A I B J L R P K M N D C E F G H A B I J L … Example Database (2-dimensional, 16 points) ε = 44, MinPts = 3 seedlist: (M, 18) (K, 18) (R, 20) (P, 21) (N, 35) (C, 40) April 30,2012

OPTICS ALGORITHM EXAMPLE A I J L R P B K M N D C E F G H A B I J L M K N R P C D F G E H 44 re a ch  Example Database (2-dimensional, 16 points) ε = 44, MinPts = 3 seedlist: - April 30,2012

GRAPHICAL REPRESENTATION A data set’s cluster ordering can be represented graphically. It helps to visualize and understand the clustering structure in a data set. 32

Density based clustering

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

Density based clustering

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

Pray For The Peace Of Jerusalem and You Will Prosper

Don_t_Waste_Your_Life_God.....powerpoint

VILLASUR_FACTORS_TO_CONSIDER_IN_PLATING_SALAD_10-13.pdf

Fertility awareness methods for women in the society

Chapter 5 Arithmetic Functions Computer Organisation and Architecture

syakira bhasa inggris (1) (1).pptx.......