Tutz, Gerhard; Ramzan, Shahla (13 October 2014): Improved Methods for the Imputation of Missing Data by Nearest Neighbor Methods. Department of Statistics: Technical Reports, No. 172




Missing data is an important issue in almost all fields of quantitative research. A nonparametric procedure that has been shown to be useful is the nearest neighbor (NN) imputation method. We suggest a weighted nearest neighbor imputation method based on Lq-distances. The weighted method is shown to have a smaller imputation error than existing NN estimates. In addition, we consider weighted nearest neighbor imputation methods that use selected distances. Carefully selecting the distances that carry information on the missing values yields an imputation tool that clearly outperforms competing nearest neighbor methods. Simulation studies show that the suggested weighted imputation with selection of distances yields the smallest imputation error, in particular when the number of predictors is large. In addition, the selected procedure is applied to real data from different fields.
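
To make the idea of weighted nearest neighbor imputation with Lq-distances concrete, the following is a minimal sketch in Python. It is not the authors' estimator: the function names (`lq_distance`, `weighted_nn_impute`), the Gaussian kernel weights, the `bandwidth` parameter, and the choice to average the distance over the components observed in both rows are all assumptions made for illustration. The selection of distances described in the abstract is not implemented here.

```python
import math


def lq_distance(a, b, q=2.0):
    """L_q distance between two rows, using only components observed in both.

    Missing values are encoded as None. The sum is divided by the number of
    shared components (an assumption of this sketch) so that rows with
    different amounts of overlap stay comparable. Returns None if the rows
    share no observed component.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    total = sum(abs(x - y) ** q for x, y in shared)
    return (total / len(shared)) ** (1.0 / q)


def weighted_nn_impute(data, k=3, q=2.0, bandwidth=1.0):
    """Return a copy of `data` with each missing cell replaced by a
    kernel-weighted average over its k nearest donor rows.

    Distances are always computed on the original data, never on values
    imputed earlier in the loop. Cells without any usable donor stay None.
    """
    imputed = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows that observe column j and overlap with row i.
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue
                d = lq_distance(row, other, q)
                if d is not None:
                    candidates.append((d, other[j]))
            if not candidates:
                continue
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            # Gaussian kernel weights (assumed): closer donors count more.
            weights = [math.exp(-0.5 * (d / bandwidth) ** 2) for d, _ in nearest]
            total = sum(weights)
            if total == 0.0:
                # All weights underflowed; fall back to an unweighted mean.
                weights = [1.0] * len(nearest)
                total = float(len(nearest))
            imputed[i][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / total
    return imputed
```

For example, calling `weighted_nn_impute([[1.0, 2.0, 3.0], [1.0, 2.0, None], [10.0, 20.0, 30.0]], k=2)` fills the missing cell with a value very close to 3.0, because the first row is at distance zero from the incomplete row and therefore dominates the weighted average. With an unweighted NN average the distant third row would pull the estimate toward 30, which is the kind of error the weighting is meant to reduce.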