Improved outcome prediction across data sources through robust parameter tuning

Schüller, Nicole; Boulesteix, Anne-Laure; Bischl, Bernd; Unger, Kristian and Hornung, Roman (18 March 2019): Improved outcome prediction across data sources through robust parameter tuning. Department of Statistics: Technical Reports, No. 221 (PDF, 522 kB)

DOI: 10.5282/ubm/epub.61275

Abstract

In many application areas, prediction rules trained on high-dimensional data are subsequently applied to make predictions for observations from other sources, but they do not always perform well in this setting. This is because data sets from different sources can feature (slightly) differing distributions, even if they are, in principle, similar in terms of population and variable definitions. In the context of high-dimensional data and beyond, most prediction methods involve one or several tuning parameters. Their values are commonly chosen by maximizing the cross-validated prediction performance within the training data. This procedure, however, implicitly presumes that the data to which the prediction rule will ultimately be applied follow the same distribution as the training data. If this is not the case, less complex prediction rules that slightly underfit the training data may be preferable. Indeed, a tuning parameter controls not only the degree of adjustment of a prediction rule to the training data but also, more generally, the degree of adjustment to the distribution of the training data. Building on this idea, in this paper we compare various approaches, including new procedures, for choosing tuning parameter values that lead to better-generalizing prediction rules than those obtained with cross-validation. Most of these tuning approaches use an external validation data set. In our extensive comparison study based on a collection of 15 real transcriptomic data sets, tuning on external data and robust tuning with a tuned robustness parameter are the two approaches that lead to better-generalizing prediction rules. All R code written to produce and evaluate our results is available online.
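The contrast between choosing a tuning parameter by cross-validation within the training data and choosing it on an external data set can be illustrated with a minimal sketch. It is not the authors' R implementation. It assumes a one-predictor ridge regression through the origin, where larger values of the penalty `lam` give a less complex rule, as a stand-in for a high-dimensional prediction method. It also assumes squared-error loss and toy data in which the external source follows a shifted relationship. The function names `fit_ridge`, `tune_cv` and `tune_external` are hypothetical.

```python
"""Sketch: choosing a ridge penalty by internal CV vs. on an external data set.

Assumptions (hypothetical, for illustration only): one predictor, ridge fit
through the origin, squared-error loss, toy (x, y) data.
"""
from typing import List, Sequence, Tuple

Data = List[Tuple[float, float]]  # list of (x, y) pairs


def fit_ridge(data: Data, lam: float) -> float:
    """Closed-form 1-D ridge slope: sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)


def mse(beta: float, data: Data) -> float:
    """Mean squared prediction error of slope `beta` on `data`."""
    return sum((y - beta * x) ** 2 for x, y in data) / len(data)


def tune_cv(train: Data, lambdas: Sequence[float], k: int = 5) -> float:
    """Pick the penalty minimising k-fold CV error within the training data only."""
    folds = [train[i::k] for i in range(k)]

    def cv_error(lam: float) -> float:
        total = 0.0
        for i in range(k):
            # Fit on all folds except fold i, evaluate on fold i.
            fit_part = [p for j, f in enumerate(folds) if j != i for p in f]
            total += mse(fit_ridge(fit_part, lam), folds[i])
        return total / k

    return min(lambdas, key=cv_error)


def tune_external(train: Data, external: Data, lambdas: Sequence[float]) -> float:
    """Fit on the full training set; pick the penalty minimising external error."""
    return min(lambdas, key=lambda lam: mse(fit_ridge(train, lam), external))


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    train = [(x, 2.0 * x) for x in xs]     # toy training source: slope 2
    external = [(x, 1.0 * x) for x in xs]  # toy external source: shifted slope 1
    grid = [0.0, 1.0, 10.0, 100.0]
    print(tune_cv(train, grid))                  # 0.0
    print(tune_external(train, external, grid))  # 10.0
```

With these toy data, internal cross-validation selects no penalty because the training source is fitted perfectly. The external set favours a stronger penalty, that is, a rule that slightly underfits the training source. How the final rule is refitted after tuning on external data is an assumption in this sketch: it simply refits on the full training set with the chosen value.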

Document type:	Paper
Keywords:	Prediction; Robust modeling; Tuning parameter value optimization; Batch effects
Faculty:	Mathematics, Computer Science and Statistics > Statistics > Technical Reports
Subject areas:	500 Natural sciences and mathematics > 500 Natural sciences
URN:	urn:nbn:de:bvb:19-epub-61275-1
Language:	English
Document ID:	61275
Published on Open Access LMU:	19 Mar 2019 06:26
Last modified:	04 Nov 2020 13:39
