Non-Redundant Subspace Clusterings with Nr-Kmeans and Nr-DipMeans

Mautz, Dominik; Ye, Wei; Plant, Claudia and Boehm, Christian (2020): Non-Redundant Subspace Clusterings with Nr-Kmeans and Nr-DipMeans. In: ACM Transactions on Knowledge Discovery From Data, Vol. 14, No. 5, 55

Full text not available on 'Open Access LMU'.

DOI: 10.1145/3385652

Abstract

A huge object collection in high-dimensional space can often be clustered in more than one way, for instance, objects could be clustered by their shape or alternatively by their color. Each grouping represents a different view of the dataset. The new research field of non-redundant clustering addresses this class of problems. In this article, we follow the approach that different, non-redundant k-means-like clusterings may exist in different, arbitrarily oriented subspaces of the high-dimensional space. We assume that these subspaces (and optionally a further noise space without any cluster structure) are orthogonal to each other. This assumption enables a particularly rigorous mathematical treatment of the non-redundant clustering problem and thus a particularly efficient algorithm, which we call Nr-Kmeans (for non-redundant k-means). The superiority of our algorithm is demonstrated both theoretically and in extensive experiments. Further, we propose an extension of Nr-Kmeans that harnesses Hartigan's dip test to identify the number of clusters for each subspace automatically.
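The abstract's central modeling assumption (several independent k-means clusterings living in mutually orthogonal subspaces, plus an optional structureless noise space) can be illustrated with a small NumPy sketch. This is not the authors' Nr-Kmeans algorithm: the method described in the article also has to find the orthogonal subspaces themselves, whereas here the rotation `V`, the block sizes `dims` and the cluster counts `ks` are simply assumed to be known. The helper names (`farthest_first_init`, `kmeans`, `nr_cost`) and the synthetic "shape"/"color" data are hypothetical. The sketch only shows that, once the data are expressed in a suitable orthogonal basis, each block can be clustered on its own and the per-block k-means costs add up to one objective.

```python
import numpy as np


def farthest_first_init(X, k, seed=0):
    """Hypothetical seeding helper: start at a random point, then repeatedly
    add the point farthest from all centers chosen so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations; returns (labels, centers)."""
    centers = farthest_first_init(X, k, seed)
    for _ in range(n_iter):
        # Squared distance of every point to every center, shape (n, k).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def nr_cost(X, V, dims, ks, seed=0):
    """Rotate X by the (assumed known) orthogonal matrix V, split the rotated
    coordinates into consecutive blocks of sizes `dims`, and run an independent
    k-means with ks[j] clusters in block j (ks[j] == 1 acts as a noise space).
    Returns the summed within-cluster squared error and the per-block labels."""
    Y = X @ V  # each row x becomes (V^T x)^T
    total, all_labels, start = 0.0, [], 0
    for m, k in zip(dims, ks):
        Z = Y[:, start:start + m]
        labels, centers = kmeans(Z, k, seed=seed)
        total += float(((Z - centers[labels]) ** 2).sum())
        all_labels.append(labels)
        start += m
    return total, all_labels


# Usage on synthetic data: two independent groupings hidden in orthogonal subspaces.
rng = np.random.default_rng(42)
n = 300
shape = rng.integers(0, 2, n)  # first grouping, 2 clusters
color = rng.integers(0, 3, n)  # second, independent grouping, 3 clusters
A = np.array([[0.0, 0.0], [8.0, 8.0]])[shape] + rng.normal(0, 0.5, (n, 2))
B = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])[color] + rng.normal(0, 0.5, (n, 2))
noise = rng.normal(0, 1.0, (n, 1))            # 1-D space without cluster structure
base = np.hstack([A, B, noise])               # shape (n, 5)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal matrix
X = base @ Q.T                                # subspaces are now arbitrarily oriented

cost, (lab_shape, lab_color, _) = nr_cost(X, Q, dims=[2, 2, 1], ks=[2, 3, 1])
```

Because `X = base @ Q.T` and `Q` is orthogonal, rotating back with `V = Q` recovers the original coordinate blocks, so the two groupings are separated into different subspaces and each is found by its own k-means run. In the actual method the rotation is unknown and must be estimated from the data.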

Document type:	Journal article
Faculty:	Mathematics, Computer Science and Statistics > Computer Science
Subject areas:	000 Computer science, information science, general works > 004 Computer science
ISSN:	1556-4681
Language:	English
Document ID:	89020
Date published on Open Access LMU:	25 Jan 2022 09:28
Last modified:	25 Jan 2022 09:28

Dokument bearbeiten