Variable selection with Random Forests for missing data

Hapfelmeier, Alexander and Ulm, Kurt (15 January 2013): Variable selection with Random Forests for missing data. Department of Statistics: Technical Reports, No. 137 [PDF, 563kB]

DOI: 10.5282/ubm/epub.14344

Abstract

Variable selection has been suggested for Random Forests to improve their efficiency in data prediction and interpretation. However, its basic element, i.e. variable importance measures, cannot be computed in a straightforward way when there is missing data. Therefore, an extensive simulation study has been conducted to explore possible solutions, namely multiple imputation, complete case analysis and a newly suggested importance measure, under several missing data generating processes. The ability to distinguish relevant from non-relevant variables has been investigated for these procedures in combination with two popular variable selection methods. Findings and recommendations: Complete case analysis should not be applied, as it led to inaccurate variable selection and to models with the worst prediction accuracy. Multiple imputation is a good means of selecting variables that would be of relevance in fully observed data. It produced the best prediction accuracy. By contrast, the application of the new importance measure leads to a selection of variables that reflects the actual data situation, i.e. that takes the occurrence of missing values into account. Its error was only negligibly worse than that of imputation.
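As a rough illustration of why complete case analysis loses information while imputation keeps every observation, the following sketch simulates cell-wise missing values and applies both strategies. It is not taken from the report: the function names, the sample sizes, the independent (MCAR-style) missingness mechanism and the simple random hot-deck fill are all assumptions chosen for brevity, and the hot-deck step is only a stand-in for a proper multiple imputation model.

```python
import random


def make_data(n, p, miss_rate, seed=0):
    """Simulate n rows with p numeric predictors. Each cell is set to None
    independently with probability miss_rate (an assumed MCAR mechanism)."""
    rng = random.Random(seed)
    return [
        [None if rng.random() < miss_rate else rng.gauss(0.0, 1.0) for _ in range(p)]
        for _ in range(n)
    ]


def complete_cases(rows):
    """Complete case analysis: keep only rows without any missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def hot_deck_impute(rows, rng):
    """Fill each missing cell with a random draw from the observed values of
    the same column (a deliberately simple stand-in for an imputation model)."""
    p = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(p)]
    return [
        [v if v is not None else rng.choice(observed[j]) for j, v in enumerate(r)]
        for r in rows
    ]


def multiple_imputation(rows, m=5, seed=1):
    """Create m independently imputed copies of the data set."""
    rng = random.Random(seed)
    return [hot_deck_impute(rows, rng) for _ in range(m)]


data = make_data(n=500, p=6, miss_rate=0.1)
cc = complete_cases(data)
imputed_sets = multiple_imputation(data, m=5)

# With independent missingness, the expected share of complete rows is (1 - miss_rate) ** p.
print(f"rows: {len(data)}, complete cases: {len(cc)}, expected share: {0.9 ** 6:.2f}")
print(f"imputed data sets: {len(imputed_sets)}, rows in each: {len(imputed_sets[0])}")
```

In an actual analysis, a Random Forest and its variable importance measure would then be fitted to the complete cases or to each imputed data set, with the results from the imputed sets combined afterwards.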

Document type:	Paper
Publication form:	Submitted Version
Keywords:	random forests, variable selection, missing data, multiple imputation, surrogates, complete case analysis
Faculty:	Mathematics, Computer Science and Statistics > Statistics > Technical Reports
Subjects:	500 Natural sciences and mathematics > 510 Mathematics
URN:	urn:nbn:de:bvb:19-epub-14344-5
Language:	English
Document ID:	14344
Date deposited on Open Access LMU:	15 Jan 2013 18:11
Last modified:	13 Aug 2024 11:44
