To tune or not to tune the number of trees in random forest?

Probst, Philipp and Boulesteix, Anne-Laure (16 May 2017): To tune or not to tune the number of trees in random forest?

Full text not available on 'Open Access LMU'.

External full text: https://arxiv.org/abs/1705.05654

Abstract

The number of trees T in the random forest (RF) algorithm for supervised learning has to be set by the user. It is controversial whether T should simply be set to the largest computationally manageable value or whether a smaller T may in some cases be better. While the principle underlying bagging is that "more trees are better", in practice the classification error rate sometimes reaches a minimum and then increases again as the number of trees grows. The goal of this paper is fourfold: (i) to provide theoretical results showing that the expected error rate may be a non-monotonic function of the number of trees and to explain under which circumstances this happens; (ii) to provide theoretical results showing that such non-monotonic patterns cannot be observed for other performance measures such as the Brier score and the logarithmic loss (for classification) and the mean squared error (for regression); (iii) to illustrate the extent of the problem through an application to a large number (n = 306) of datasets from the public database OpenML; (iv) to argue in favor of setting T to a computationally feasible large number, depending on the convergence properties of the desired performance measure.
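
The contrast between the error rate and the Brier score can be illustrated with a small toy calculation. The sketch below is not the authors' analysis: it assumes a binary problem in which every tree votes for the correct class independently with a fixed per-observation probability p, and the mixture of observations (90% with p = 0.7, 10% with p = 0.45) is invented for illustration. Under these assumptions, the majority-vote error of an observation with p < 0.5 grows with T, which can make the average error rate non-monotonic, while the expected Brier score (1 - p)^2 + p(1 - p)/T decreases in T for every p.

```python
from math import exp, lgamma, log

# Hypothetical mixture: (share of observations, probability that one tree votes correctly).
# These values are illustrative assumptions, not taken from the paper.
MIXTURE = [(0.9, 0.70), (0.1, 0.45)]


def majority_vote_error(p, n_trees):
    """Probability that a majority of n_trees independent votes is wrong,
    when each vote is correct with probability p (n_trees must be odd)."""
    if n_trees % 2 == 0:
        raise ValueError("use an odd number of trees to avoid ties")
    total = 0.0
    # The vote is wrong when at most (n_trees - 1) / 2 trees are correct.
    for k in range(n_trees // 2 + 1):
        # Binomial pmf computed in log space to stay stable for large n_trees.
        log_pmf = (
            lgamma(n_trees + 1) - lgamma(k + 1) - lgamma(n_trees - k + 1)
            + k * log(p) + (n_trees - k) * log(1.0 - p)
        )
        total += exp(log_pmf)
    return total


def expected_brier(p, n_trees):
    """Expected squared distance between the vote share for the true class
    and 1, i.e. bias^2 + variance of the averaged votes."""
    return (1.0 - p) ** 2 + p * (1.0 - p) / n_trees


def mean_error(n_trees):
    return sum(w * majority_vote_error(p, n_trees) for w, p in MIXTURE)


def mean_brier(n_trees):
    return sum(w * expected_brier(p, n_trees) for w, p in MIXTURE)


if __name__ == "__main__":
    for t in (1, 11, 51, 101, 1001):
        print(f"T={t:5d}  error={mean_error(t):.4f}  brier={mean_brier(t):.4f}")
```

In this toy setting the mean error rate first falls and then rises again toward the share of observations with p < 0.5, whereas the mean Brier score only decreases; real trees in a forest are correlated, so the numbers carry no quantitative meaning.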

Document type:	Paper
Keywords:	Random forest, number of trees, bagging, out-of-bag, error rate
Faculty:	Mathematics, Computer Science and Statistics > Statistics
Subjects:	500 Natural sciences and mathematics > 510 Mathematics
Language:	German
Document ID:	39384
Date published on Open Access LMU:	29 Jun 2017 07:46
Last modified:	29 Jun 2017 07:46
