Clustering in data analytics. This PPT gives a clear overview of clustering in non-Euclidean space in data analytics using R.

goviraj098765 9 views 9 slides Sep 01, 2025


Slide Content

Clustering in Non-Euclidean Space

Introduction to Clustering in Non-Euclidean Space
Definition of Clustering: Grouping similar data points into clusters based on a chosen similarity or distance criterion.
Non-Euclidean Space: Refers to spaces where the traditional Euclidean (straight-line) distance does not apply or is not meaningful.
Types of Non-Euclidean Spaces:
Manifolds (the setting of manifold learning)
Graph-based data
Discrete spaces
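
For graph-based data, one natural non-Euclidean distance is the shortest-path length between nodes. The sketch below is only an illustration in base R: the 5-node edge list is hypothetical, and the Floyd-Warshall loop is one of several ways to compute such distances.

```r
# Hypothetical 5-node undirected path graph, given as an edge list.
edges <- matrix(c(1, 2,
                  2, 3,
                  3, 4,
                  4, 5), ncol = 2, byrow = TRUE)
n <- 5

# Start with Inf (unreachable), 0 on the diagonal, 1 for each edge.
d <- matrix(Inf, n, n)
diag(d) <- 0
d[edges] <- 1
d[edges[, 2:1]] <- 1  # mirror each edge because the graph is undirected

# Floyd-Warshall: allow each node k in turn as an intermediate stop.
for (k in seq_len(n)) {
  d <- pmin(d, outer(d[, k], d[k, ], "+"))
}

# Convert to a dist object so clustering functions can consume it.
graph_dist <- as.dist(d)
print(graph_dist)
```

The resulting dist object can be passed to any clustering function that accepts precomputed dissimilarities.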

Common Non-Euclidean Metrics: Cosine similarity, Jaccard similarity, etc.

Why Non-Euclidean Space?
Limitations of Euclidean Space: Euclidean distance is not always meaningful for complex, high-dimensional, or non-linear data.
Examples:
Text data (similarity between documents, measured with cosine distance).
Graph data (distance between nodes in networks).
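
The cosine and Jaccard measures mentioned above can be computed with base R alone. This is a minimal sketch: the term-count matrix `docs` is made-up illustrative data, and the distances are taken as 1 minus the similarity.

```r
# Hypothetical term-count vectors for three short documents (rows = documents).
docs <- rbind(
  doc1 = c(2, 1, 0, 0),
  doc2 = c(4, 2, 0, 0),  # same direction as doc1, twice the length
  doc3 = c(0, 0, 3, 1)
)

# Cosine similarity: dot products divided by the product of the vector norms.
norms <- sqrt(rowSums(docs^2))
cosine_sim <- (docs %*% t(docs)) / outer(norms, norms)
cosine_dist <- as.dist(1 - cosine_sim)

# Jaccard distance on presence/absence data: base R's "binary" method
# gives 1 - (shared terms / terms present in either document).
presence <- (docs > 0) * 1
jaccard_dist <- dist(presence, method = "binary")

print(cosine_dist)
print(jaccard_dist)
```

Here doc1 and doc2 differ in length but not in direction, so their cosine distance is essentially zero, which is the behavior that makes cosine distance attractive for text.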

Clustering Algorithms in Non-Euclidean Space
K-means in Non-Euclidean Spaces: Standard K-means assumes Euclidean distance, so it requires adapting the distance metric (e.g., cosine distance).
DBSCAN: Density-based clustering that can work with arbitrary distance functions.
Hierarchical Clustering: Can use different distance metrics (e.g., Manhattan, cosine).
Spectral Clustering: Often applied to graph data, using similarity matrices.
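
As one illustration of the algorithms listed above, hierarchical clustering can run on a precomputed cosine distance matrix. The sketch below uses only base R; the data matrix `x` and the choice of average linkage are assumptions made for the example.

```r
# Hypothetical feature vectors: two groups pointing in different directions,
# with very different magnitudes inside each group.
x <- rbind(
  a1 = c(1, 0.1), a2 = c(5, 0.4), a3 = c(10, 1.2),
  b1 = c(0.1, 1), b2 = c(0.5, 6), b3 = c(1, 9)
)

# Cosine distance matrix (1 - cosine similarity).
norms <- sqrt(rowSums(x^2))
cosine_dist <- as.dist(1 - (x %*% t(x)) / outer(norms, norms))

# Average-linkage hierarchical clustering on the precomputed distances.
tree <- hclust(cosine_dist, method = "average")

# Cut the dendrogram into 2 clusters and show the label of each point.
groups <- cutree(tree, k = 2)
print(groups)
```

Because cosine distance ignores vector length, the a-points and b-points should separate by direction even though their magnitudes vary widely.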

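Example
# Install the packages once if they are not already available
install.packages("proxy")
install.packages("cluster")
library(proxy)
library(cluster)
# Tiny dataset: 3 points, 2 features
data <- data.frame(
  Feature1 = c(1, 4, 5),
  Feature2 = c(2, 5, 6)
)
# Pairwise Manhattan distances between the rows
manhattan_dist <- proxy::dist(data, method = "Manhattan")
# K-medoids (PAM) clustering on the distance matrix with k = 2
kmedoids_result <- pam(manhattan_dist, k = 2)
print(kmedoids_result)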

Explanation:
Data: We created a tiny dataset with 3 points and 2 features.
Manhattan Distance: proxy::dist() calculates the Manhattan distance between each pair of data points.
K-Medoids Clustering: The pam() function performs K-medoids clustering with k = 2 clusters.

Output: You will get the cluster assigned to each point and the medoids (representative points) for each cluster.
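
To look at the output described above piece by piece, the fitted object can be inspected directly. This sketch repeats the example with one swap, labeled as an assumption: base R's dist() stands in for proxy::dist() so that only the cluster package is needed.

```r
library(cluster)  # provides pam()

data <- data.frame(
  Feature1 = c(1, 4, 5),
  Feature2 = c(2, 5, 6)
)

# Assumption: base R's dist() replaces proxy::dist() in this sketch.
manhattan_dist <- dist(data, method = "manhattan")
kmedoids_result <- pam(manhattan_dist, k = 2)

# Cluster label for each of the 3 points, in row order.
labels <- kmedoids_result$clustering
print(labels)

# Row indices of the points chosen as medoids, and the points themselves.
medoid_rows <- kmedoids_result$id.med
print(data[medoid_rows, ])
```

Rows 2 and 3 are only 2 units apart in Manhattan distance, while row 1 is at least 6 units from either, so rows 2 and 3 should share a label and row 1 should form its own cluster.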

Conclusion:
Clustering in Non-Euclidean Spaces: Essential for non-linear and complex data. Utilizes alternative distance measures like cosine similarity or Jaccard similarity.
Tools in R: R offers robust tools for clustering in non-Euclidean spaces, including the proxy and cluster packages and the built-in hclust() function.
Applications: Natural language processing (NLP), image recognition, graph analysis.

Thank you!!