Random forest versus logistic regression: A large-scale benchmark experiment

Couronné, Raphael; Probst, Philipp and Boulesteix, Anne-Laure (2018): Random forest versus logistic regression: A large-scale benchmark experiment. In: BMC Bioinformatics 19: pp. 1-14 [PDF, 1MB]

DOI: 10.1186/s12859-018-2264-5

Abstract

BACKGROUND AND GOAL: The Random Forest (RF) algorithm for regression and classification has gained considerable popularity since its introduction in 2001. Meanwhile, it has grown into a standard classification approach competing with logistic regression (LR) in many innovation-friendly scientific fields.

RESULTS: In this context, we present a large-scale benchmarking experiment based on 243 real datasets comparing the prediction performance of the original version of RF with default parameters and LR as binary classification tools. Most importantly, the design of our benchmark experiment is inspired by clinical trial methodology, thus avoiding common pitfalls and major sources of bias.

CONCLUSION: RF performed better than LR according to the considered accuracy measure in approximately 69% of the datasets. The mean difference between RF and LR was 0.029 (95% CI = [0.022, 0.038]) for the accuracy, 0.041 (95% CI = [0.031, 0.053]) for the Area Under the Curve, and -0.027 (95% CI = [-0.034, -0.021]) for the Brier score, all measures thus suggesting a significantly better performance of RF. As a side result of our benchmarking experiment, we observed that the results were noticeably dependent on the inclusion criteria used to select the example datasets, thus emphasizing the importance of clear statements regarding this dataset selection process. We also stress that neutral studies similar to ours, carefully designed and based on a large number of datasets, will be necessary in the future to evaluate further variants, implementations or parameters of random forests that may yield improved accuracy compared to the original version with default values.
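The three performance measures named in the abstract, and a mean difference with a 95% interval across datasets, can be illustrated with a minimal standard-library Python sketch. Everything in it is an assumption made for illustration: the function names, the 0.5 classification threshold, the toy labels and probabilities, and the per-dataset differences are hypothetical and are not taken from the paper. The abstract does not say how the intervals were computed; the percentile bootstrap used here is only one common choice.

```python
import random
from statistics import fmean


def accuracy(y_true, p_hat, threshold=0.5):
    """Share of observations whose thresholded probability matches the label."""
    return fmean(int((p >= threshold) == (y == 1)) for y, p in zip(y_true, p_hat))


def brier_score(y_true, p_hat):
    """Mean squared gap between predicted probability and 0/1 label (lower is better)."""
    return fmean((p - y) ** 2 for y, p in zip(y_true, p_hat))


def auc(y_true, p_hat):
    """Area under the ROC curve as the share of (positive, negative) pairs
    ranked correctly, counting ties as one half."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(diffs, n_boot=2000, level=0.95, seed=0):
    """Mean of per-dataset differences with a percentile bootstrap interval.
    Assumption: datasets are resampled with replacement; the seed fixes the run."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(fmean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lo = means[round((1 - level) / 2 * n_boot)]
    hi = means[round((1 + level) / 2 * n_boot) - 1]
    return fmean(diffs), lo, hi


# Toy data for a single hypothetical dataset (not from the paper).
y = [1, 0, 1, 1, 0, 0]
p_rf = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]  # hypothetical RF probabilities
p_lr = [0.8, 0.3, 0.4, 0.6, 0.6, 0.2]  # hypothetical LR probabilities

for name, measure in [("accuracy", accuracy), ("AUC", auc), ("Brier", brier_score)]:
    # Difference is RF minus LR: positive favors RF for accuracy and AUC,
    # negative favors RF for the Brier score.
    print(f"{name}: RF - LR = {measure(y, p_rf) - measure(y, p_lr):.3f}")

# Hypothetical per-dataset accuracy differences (RF minus LR), one per dataset.
diffs = [0.05, 0.01, -0.02, 0.04, 0.03, 0.00, 0.06, 0.02]
mean_diff, lo, hi = bootstrap_ci(diffs)
print(f"mean difference = {mean_diff:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

The sign convention matters when reading the abstract's figures: a positive difference in accuracy or AUC and a negative difference in Brier score both point toward RF.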

Document type:	Journal article
Faculty:	Medicine > Institute for Medical Information Processing, Biometry and Epidemiology
Subjects:	600 Technology, medicine, applied sciences > 610 Medicine and health
URN:	urn:nbn:de:bvb:19-epub-57405-0
ISSN:	1471-2105
Language:	English
Document ID:	57405
Date deposited on Open Access LMU:	31 Aug 2018 06:10
Last modified:	04 Nov 2020 13:37
