Random Forest variable importance with missing data

Hapfelmeier, Alexander; Hothorn, Torsten and Ulm, Kurt (15 February 2012): Random Forest variable importance with missing data. Department of Statistics: Technical Reports, No. 121 [PDF, 567kB]

DOI: 10.5282/ubm/epub.12757

Abstract

Random Forests are commonly applied for data prediction and interpretation. The latter purpose is supported by variable importance measures that rate the relevance of predictors. Yet existing measures cannot be computed when the data contain missing values. Possible solutions are imputation methods, complete case analysis and a newly suggested importance measure. However, it is unknown to what extent these approaches provide a reliable estimate of a variable's relevance. An extensive simulation study was performed to investigate this property for a variety of missing-data generating processes. Findings and recommendations: Complete case analysis should not be applied, as it inappropriately penalized variables that were completely observed. The new importance measure is much better able to reflect decreased information exclusively for variables with missing values and should therefore be used to evaluate actual data situations. By contrast, multiple imputation allows for an estimation of the importances one would potentially observe in complete data situations.

Document type:	Paper
Publication form:	Preprint
Keywords:	Random Forests, variable importance measures, missing data, multiple imputation, surrogates, complete case analysis
Faculty:	Mathematics, Computer Science and Statistics > Statistics > Technical Reports
Subjects:	500 Natural sciences and mathematics > 510 Mathematics
URN:	urn:nbn:de:bvb:19-epub-12757-8
Language:	English
Document ID:	12757
Date deposited on Open Access LMU:	15 Feb. 2012 17:10
Last modified:	04 Nov. 2020 12:53
