Abstract
Multiple-imputation (MI) is a method for treating the problem of missing data. There are various competing computational algorithms available in the R environment to address missing data problems of categorical and continuous variables. In the case of a high amount of missing information, large sample sizes and complex dependency structures among categorical variables, the utility of the provided R packages is some-what limited. A computationally expedient, fully Bayesian, joint modeling (JM) approach known as Dirichlet process mixtures of multinomial distributions" (DPMD), automatically models complex dependencies among variables. But this approach is limited to categorical variables only. We propose a simple and easy to implement combining algorithm which imputes continuous variables using various algorithms and uses the JM approach to detect complex dependency structures among categorical variables. We review, describe and evaluate software packages commonly available in R and compare the results with the proposed MI method by using as example an artificial data set. The results suggest that the MI approach which combines the JM approach and various algorithms based on generalized linear models dominates various algorithms when applied solely.
Dokumententyp: | Zeitschriftenartikel |
---|---|
Fakultät: | Mathematik, Informatik und Statistik > Statistik |
Themengebiete: | 500 Naturwissenschaften und Mathematik > 510 Mathematik |
ISSN: | 1018-046X |
Sprache: | Englisch |
Dokumenten ID: | 88854 |
Datum der Veröffentlichung auf Open Access LMU: | 25. Jan. 2022, 09:28 |
Letzte Änderungen: | 25. Jan. 2022, 09:28 |