Abstract
Missing values are a major problem in medical research. Because complete case analysis discards useful information, estimation and inference may suffer strongly. Multiple imputation has been shown to be a useful strategy for handling missing data and accounting for the uncertainty of imputation. In high-dimensional data (p >> n), missing values raise even more serious problems, as existing software packages tend to fail. We present multiple imputation methods based on nearest neighbors. The distances are computed using the correlation between the target and the candidate predictors, so that only the relevant predictors contribute to the distance computation. The method successfully imputes missing values in high-dimensional settings as well. Using a variety of simulated data with MCAR and MAR missing patterns, the proposed algorithm is compared to existing methods. Various measures are used to compare the performance of the methods, including the MSE of imputation, the MSE of estimated regression coefficients, their standard errors, confidence intervals, and their coverage probabilities. The simulation results, for both n < p and n > p, show that sequential imputation using weighted nearest neighbors can be successfully applied to a wide range of data settings and outperforms or is close to the best of the existing methods. © 2021 Elsevier Inc. All rights reserved.
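The idea of letting only correlated predictors drive the neighbor search can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the weighting function, the kernel, or the multiple-imputation step, so the sketch assumes absolute pairwise-complete correlation as the weight, an unweighted mean over the k nearest donors, and a single imputation of one column. The function name `impute_column_wnn` and its parameters are hypothetical.

```python
import numpy as np

def impute_column_wnn(X, j, k=5):
    """Fill NaNs in column j of X from the k nearest donor rows.

    Assumption (not from the paper): each predictor's contribution to the
    squared distance is weighted by its absolute correlation with column j.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = X[:, j].copy()

    # Weight of each candidate predictor: |correlation| with the target
    # column, computed on rows where both are observed.
    w = np.zeros(p)
    for c in range(p):
        if c == j:
            continue
        ok = ~np.isnan(X[:, j]) & ~np.isnan(X[:, c])
        if ok.sum() > 2 and X[ok, j].std() > 0 and X[ok, c].std() > 0:
            w[c] = abs(np.corrcoef(X[ok, j], X[ok, c])[0, 1])

    donors = np.where(~np.isnan(X[:, j]))[0]
    for i in np.where(np.isnan(X[:, j]))[0]:
        # Weighted squared differences; NaN wherever either row is missing.
        diff2 = (X[donors] - X[i]) ** 2 * w
        shared = ~np.isnan(diff2)
        shared[:, j] = False  # never use the target column itself
        num = np.where(shared, diff2, 0.0).sum(axis=1)
        den = (shared * w).sum(axis=1)
        # Normalise by the total weight of the predictors both rows share.
        d = np.where(den > 0, num / np.where(den > 0, den, 1.0), np.inf)
        nearest = donors[np.argsort(d)[:k]]
        out[i] = X[nearest, j].mean()
    return out
```

Because predictors with near-zero correlation receive near-zero weight, the distance stays meaningful even when p is much larger than n. A multiple-imputation variant would repeat the procedure with some source of randomness (for example, resampled donors) and pool the resulting estimates; that step is left out here.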
Document type: | Journal article |
---|---|
Faculty: | Mathematics, Computer Science and Statistics > Statistics |
Subjects: | 500 Natural sciences and mathematics > 510 Mathematics |
ISSN: | 0020-0255 |
Language: | English |
Document ID: | 98004 |
Date of publication on Open Access LMU: | 05 Jun 2023, 15:27 |
Last modified: | 05 Jun 2023, 15:27 |