Nearest neighbor imputation for categorical data by weighting of attributes

Faisal, Shahla and Tutz, Gerhard (2022): Nearest neighbor imputation for categorical data by weighting of attributes. In: Information Sciences, Vol. 592: pp. 306-319

Full text not available on 'Open Access LMU'.

DOI: 10.1016/j.ins.2022.01.056

Abstract

Missing values are a common phenomenon in modern medical research of complex diseases. The data often contains nominal or categorical variables, for example, single nucleotide polymorphisms (SNPs) in genetic studies. If the missing values are not handled properly, the downstream statistical analysis of incomplete data may be biased. While various imputation methods are available for metrically scaled variables, methods for categorical data are scarce. An imputation method that has been shown to work well for high dimensional metrically scaled variables is the imputation by nearest neighbor methods. In this paper, we propose a weighted nearest neighbors approach to impute missing values in categorical variables in high dimensional datasets. The proposed method explicitly uses the information on the association among attributes. Using different simulation settings, the performance is compared with available imputation methods. A variety of real data sets, containing heart, DNA, and lymphatic cancer, is also used to support the results obtained by simulations. The results show that the weighting of attributes yields smaller imputation errors than existing approaches like random forest and MICE. (C) 2022 Elsevier Inc. All rights reserved.
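The abstract does not give the distance, the weights, or the voting rule, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes a weighted Hamming distance, attribute weights taken from Cramér's V between each attribute and the attribute being imputed, and a plain majority vote among the `k` nearest donors. The names `cramers_v` and `impute_weighted_knn` are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two equally long categorical sequences (no missing values)."""
    n = len(xs)
    if n == 0:
        return 0.0
    row_counts = Counter(xs)
    col_counts = Counter(ys)
    joint = Counter(zip(xs, ys))
    dof = min(len(row_counts), len(col_counts)) - 1
    if dof <= 0:
        return 0.0  # a constant attribute carries no association
    chi2 = 0.0
    for a, ra in row_counts.items():
        for b, cb in col_counts.items():
            expected = ra * cb / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * dof))


def impute_weighted_knn(rows, k=3):
    """Return a copy of rows (as lists) with None entries filled by weighted k-NN voting.

    Assumption: weights come from Cramér's V with the target attribute,
    and the distance is a weight-normalised Hamming distance.
    """
    n_cols = len(rows[0])
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            # Weight every observed attribute of this row by its association
            # with attribute j, estimated on rows where both are observed.
            weights = {}
            for m in range(n_cols):
                if m == j or row[m] is None:
                    continue
                pairs = [(r[m], r[j]) for r in rows
                         if r[m] is not None and r[j] is not None]
                weights[m] = cramers_v([p[0] for p in pairs],
                                       [p[1] for p in pairs])
            # Weighted Hamming distance to every donor that has attribute j.
            candidates = []
            for d, donor in enumerate(rows):
                if d == i or donor[j] is None:
                    continue
                shared = [m for m in weights if donor[m] is not None]
                total = sum(weights[m] for m in shared)
                if total == 0:
                    continue  # no informative attribute in common
                mismatch = sum(weights[m] for m in shared if donor[m] != row[m])
                candidates.append((mismatch / total, d))
            candidates.sort()
            votes = Counter(rows[d][j] for _, d in candidates[:k])
            if votes:
                result[i][j] = votes.most_common(1)[0][0]
    return result
```

In this sketch an attribute that is strongly associated with the target dominates the distance, so donors that agree on it are ranked as nearer than donors that agree only on weakly related attributes. Entries for which no usable donor exists are left as `None`.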

Document type:	Journal article
Faculty:	Mathematics, Computer Science and Statistics > Statistics
Subjects:	500 Natural sciences and mathematics > 510 Mathematics
ISSN:	0020-0255
Language:	English
Document ID:	110982
Date published on Open Access LMU:	02 Apr 2024 07:22
Last modified:	02 Apr 2024 07:22
