Logo Logo
Hilfe
Hilfe
Switch Language to English

Beer, Anna ORCID logoORCID: https://orcid.org/0000-0002-6890-997X; Draganov, Andrew ORCID logoORCID: https://orcid.org/0000-0002-1617-4166; Hohma, Ellen ORCID logoORCID: https://orcid.org/0000-0002-5235-6856; Jahn, Philipp ORCID logoORCID: https://orcid.org/0009-0002-0059-9183; Frey, Christian M.M. ORCID logoORCID: https://orcid.org/0000-0003-2458-6651 und Assent, Ira ORCID logoORCID: https://orcid.org/0000-0002-1091-9948 (2023): Connecting the Dots -- Density-Connectivity Distance unifies DBSCAN, k-Center and Spectral Clustering. 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Long Beach, CA, 06. - 10. August 2023. Singh, Ambuj (Hrsg.): In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, New York: Association for Computing Machinery. S. 80-92 [PDF, 2MB]

Abstract

Despite the popularity of density-based clustering, its procedural definition makes it difficult to analyze compared to clustering methods that minimize a loss function. In this paper, we reformulate DBSCAN through a clean objective function by introducing the density-connectivity distance (dc-dist), which captures the essence of density-based clusters by endowing the minimax distance with the concept of density. This novel ultrametric allows us to show that DBSCAN, k-center, and spectral clustering are equivalent in the space given by the dc-dist, despite these algorithms being perceived as fundamentally different in their respective literatures. We also verify that finding the pairwise dc-dists gives DBSCAN clusterings across all epsilon-values, simplifying the problem of parameterizing density-based clustering. We conclude by thoroughly analyzing density-connectivity and its properties -- a task that has been elusive thus far in the literature due to the lack of formal tools. Our code recreates every experiment below: https://github.com/Andrew-Draganov/dc_dist

Dokument bearbeiten Dokument bearbeiten