Danger: High Power! – Exploring the Statistical Properties of a Test for Random Forest Variable Importance

Strobl, Carolin and Zeileis, Achim (30 January 2008): Danger: High Power! – Exploring the Statistical Properties of a Test for Random Forest Variable Importance. Department of Statistics: Technical Reports, No. 17 [PDF, 323kB]

DOI: 10.5282/ubm/epub.2111

Abstract

Random forests have become a widely used predictive model in many scientific disciplines within the past few years. Additionally, they are increasingly popular for assessing variable importance, e.g., in genetics and bioinformatics. We highlight both advantages and limitations of different variable importance scores and associated testing procedures, especially in the context of correlated predictor variables. For the test of Breiman and Cutler (2008), we investigate the statistical properties and find that the power of the test depends both on the sample size and the number of trees, an arbitrarily chosen tuning parameter, leading to undesired results that nullify any significance judgments. Moreover, the specification of the null hypothesis of this test is discussed in the context of correlated predictor variables.
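
The dependence on the number of trees can be illustrated with a minimal sketch. It assumes the test statistic is a z-score of the form mean per-tree importance divided by its standard error (standard deviation over the square root of the number of trees); the abstract itself does not spell out the formula. The function names, the simulated normal draws, and the parameters `true_mean` and `spread` are hypothetical and chosen only for illustration; no actual random forest is fitted.

```python
import math
import random
import statistics


def z_score(per_tree_importance):
    """Mean per-tree importance divided by its standard error.

    Assumed form of the statistic: mean / (sd / sqrt(ntree)).
    """
    ntree = len(per_tree_importance)
    mean = statistics.fmean(per_tree_importance)
    std_err = statistics.stdev(per_tree_importance) / math.sqrt(ntree)
    return mean / std_err


def simulate_z(ntree, true_mean=0.05, spread=1.0, seed=0):
    """Draw hypothetical per-tree importances and return their z-score.

    The draws stand in for per-tree permutation importances; they are
    simulated from a normal distribution, not taken from a fitted forest.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(true_mean, spread) for _ in range(ntree)]
    return z_score(draws)


if __name__ == "__main__":
    # Same tiny underlying importance, increasing number of trees.
    for ntree in (100, 1000, 10000, 100000):
        print(ntree, round(simulate_z(ntree), 2))
```

Under this assumed form, a fixed, arbitrarily small positive mean importance yields a z-score that grows roughly with the square root of the number of trees, so any significance threshold can eventually be crossed simply by growing a larger forest.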

Document type:	Paper
Keywords:	feature selection, variable importance, permutation tests
Faculty:	Mathematics, Computer Science and Statistics > Statistics > Technical Reports
URN:	urn:nbn:de:bvb:19-epub-2111-8
Language:	English
Document ID:	2111
Date published on Open Access LMU:	1 Feb 2008, 08:25
Last modified:	4 Nov 2020, 12:46
