k-SubMix: Common Subspace Clustering on Mixed-Type Data

Klein, Mauritius; Leiber, Collin ORCID: https://orcid.org/0000-0001-5368-5697 and Böhm, Christian ORCID: https://orcid.org/0000-0002-2237-9969 (2023): k-SubMix: Common Subspace Clustering on Mixed-Type Data. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Turin, Italy, 18-22 September 2023. In: Koutra, Danai; Plant, Claudia; Gomez Rodriguez, Manuel; Baralis, Elena and Bonchi, Francesco (eds.): Machine Learning and Knowledge Discovery in Databases: Research Track, Lecture Notes in Computer Science, vol. 14169. Cham: Springer. pp. 662-677

Full text not available on 'Open Access LMU'.

DOI: 10.1007/978-3-031-43412-9_39

Abstract

Clustering heterogeneous data is an ongoing challenge in the data mining community. The most prevalent clustering methods are designed to process datasets with numerical features only, but often datasets consist of mixed numerical and categorical features. This requires new approaches capable of handling both kinds of data types. Further, the most relevant cluster structures are often hidden in only a few features. Thus, another key challenge is to detect those specific features automatically and abandon features not relevant for clustering. This paper proposes the subspace mixed-type clustering algorithm k-SubMix, which tackles both challenges. Its cost function can handle both numerical and categorical features while simultaneously identifying those with the biggest impact for a high-quality clustering result. Unlike other subspace mixed-type clustering methods, k-SubMix preserves inter-cluster comparability, as it is the first mixed-type approach that defines a common subspace for all clusters. Extensive experiments show that k-SubMix outperforms competitive methods and reduces the data’s complexity by a simultaneous dimensionality reduction.
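
The abstract describes a cost function that handles numerical and categorical features together and reveals which features matter for the clustering. The paper's actual cost function is not given on this page, so the following is only a minimal sketch under a stated assumption: a k-prototypes-style cost (squared distance to the cluster mean for numerical features, mismatches with the cluster mode for categorical features), reported per feature. The function name `mixed_cost`, the weight `gamma`, and the toy data are hypothetical and not taken from k-SubMix.

```python
import numpy as np

def mixed_cost(X_num, X_cat, labels, gamma=1.0):
    """Per-feature within-cluster cost for mixed-type data.

    Assumption: k-prototypes-style cost, NOT the k-SubMix cost function.
    Numerical features: squared distance to the cluster mean.
    Categorical features: number of mismatches with the cluster mode,
    weighted by gamma.
    """
    num_cost = np.zeros(X_num.shape[1])
    cat_cost = np.zeros(X_cat.shape[1])
    for c in np.unique(labels):
        mask = labels == c
        block = X_num[mask]
        # Sum of squared deviations from the cluster mean, per feature.
        num_cost += ((block - block.mean(axis=0)) ** 2).sum(axis=0)
        for j in range(X_cat.shape[1]):
            # Objects that disagree with the most frequent category.
            _, counts = np.unique(X_cat[mask, j], return_counts=True)
            cat_cost[j] += gamma * (mask.sum() - counts.max())
    return num_cost, cat_cost

# Toy data: the first numerical and first categorical feature separate
# the two clusters; the second of each kind does not.
X_num = np.array([[0.0, 5.0], [0.0, 7.0], [10.0, 5.0], [10.0, 7.0]])
X_cat = np.array([["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"]])
labels = np.array([0, 0, 1, 1])

num_cost, cat_cost = mixed_cost(X_num, X_cat, labels)
```

In this sketch, features with a low per-feature cost are the ones that agree with the cluster structure, while features with a high cost are candidates for being treated as irrelevant. How k-SubMix itself selects its common subspace is described in the paper, not here.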

Document type:	Conference contribution (Paper)
Faculty:	Mathematics, Computer Science and Statistics > Computer Science
Subjects:	000 Computer science, information and general works > 004 Computer science
ISBN:	978-3-031-43411-2 ; 978-3-031-43412-9
ISSN:	0302-9743
Place of publication:	Cham
Language:	English
Document ID:	123648
Date deposited on Open Access LMU:	18 Feb 2025 18:55
Last modified:	18 Feb 2025 18:55
