Abstract
Existing clustering algorithms aim to identify clusters from a single dataset. However, many applications generate a series of datasets. For example, scientists must repeat an experiment many times to ensure reproducibility; sensors collect information day after day. In such scenarios, clusters need to be identified separately in a large number of datasets, each of which can contain an unknown number of clusters with varying densities and shapes. Density-based clustering algorithms are commonly used to identify arbitrarily shaped clusters when the number of clusters is unknown. Most density-based clustering algorithms are "DBSCAN-like": clusters are formed by connecting consecutive high-density regions, so points are grouped into one cluster as long as they are densely connected. When the distribution of points changes across datasets, parameter tuning on each dataset is necessary to obtain proper results, which is time-consuming. In this work, we developed a new kNN density-based clustering algorithm that does not follow the DBSCAN paradigm. Instead, we identify clusters by maximizing the intra-cluster similarities, which are estimated using: 1) the probability that two points belong to the same cluster; 2) the probability that a point is a cluster center. The kNN concept and the minimum spanning tree are used to compute both probabilities. Our approach is capable of extracting clusters of arbitrary shape using the single parameter k, and can handle a series of datasets with less parameter-tuning effort. Experiments on both synthetic and real-world datasets show that our approach outperforms other recent kNN clustering algorithms.
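The abstract names the building blocks (a kNN graph and a minimum spanning tree) but not the exact probability estimates or the center-selection step, so the Python sketch below only illustrates those two ingredients. The edge-cutting heuristic, the function name `knn_mst_partition`, and the parameters `k` and `n_cuts` are assumptions made for illustration, not the authors' actual method.

```python
# Hypothetical sketch: kNN graph + minimum spanning tree as clustering
# building blocks. The pruning heuristic below stands in for the paper's
# probability-based formulation, which the abstract does not spell out.
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from sklearn.neighbors import NearestNeighbors


def knn_mst_partition(X, k=10, n_cuts=1):
    """Build a kNN distance graph, take its MST, remove the n_cuts longest
    edges, and return the resulting connected components as cluster labels."""
    n = X.shape[0]
    dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)

    # Sparse kNN distance graph (column 0 is each point itself, so skip it).
    rows = np.repeat(np.arange(n), k)
    graph = coo_matrix((dist[:, 1:].ravel(), (rows, idx[:, 1:].ravel())),
                       shape=(n, n)).tocsr()
    graph = graph.maximum(graph.T)  # symmetrize so the MST sees an undirected graph

    # MST over the kNN graph (a forest if the graph is disconnected);
    # long MST edges tend to bridge separate dense regions.
    mst = minimum_spanning_tree(graph).tocoo()

    # Drop the n_cuts heaviest MST edges; surviving components act as clusters.
    keep = np.argsort(mst.data)[:max(len(mst.data) - n_cuts, 0)]
    pruned = coo_matrix((mst.data[keep], (mst.row[keep], mst.col[keep])),
                        shape=(n, n))
    _, labels = connected_components(pruned, directed=False)
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
    print(knn_mst_partition(X, k=10, n_cuts=1))
```

As in the abstract, the only user-facing parameter of this toy pipeline is the neighborhood size k; the number of cuts is an extra simplification introduced here because the abstract's probabilistic criterion for merging or splitting regions is not reproduced.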
| Item Type | Journal article |
|---|---|
| Faculties | Mathematics, Computer Science and Statistics > Computer Science |
| Subjects | 000 Computer science, information and general works > 004 Data processing computer science |
| ISSN | 2161-4393 |
| Language | English |
| Item ID | 89021 |
| Date Deposited | 25 Jan 2022, 09:28 |
| Last Modified | 25 Jan 2022, 09:28 |