A Framework for Unbiased Model Selection Based on Boosting

Hofner, Benjamin; Hothorn, Torsten; Kneib, Thomas and Schmid, Matthias (10 December 2009): A Framework for Unbiased Model Selection Based on Boosting. Department of Statistics: Technical Reports, No. 72 [PDF, 914kB]

DOI: 10.5282/ubm/epub.11243

Abstract

Variable selection and model choice are of major concern in many statistical applications, especially in high-dimensional regression models. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection. We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure. We show that variable selection may be biased if the covariates differ in nature. Important examples are models combining continuous and categorical covariates, especially if the number of categories is large. In this case, least squares base-learners offer increased flexibility for the categorical covariate and lead to a preference for it even if the categorical covariate is non-informative. Similar difficulties arise when comparing linear and nonlinear base-learners for a continuous covariate. The additional flexibility in the nonlinear base-learner again yields a preference for the more complex modeling alternative. We investigate these problems from a theoretical perspective and suggest a framework for unbiased model selection based on a general class of penalized least squares base-learners. Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed in naive boosting specifications. The importance of unbiased model selection is demonstrated in simulations and an application to forest health models.
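
The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the report's code; it is a minimal illustration under stated assumptions. The response is pure noise, so neither covariate is informative. A linear base-learner for a continuous covariate (1 degree of freedom) competes with a dummy-coded base-learner for a categorical covariate with `k` categories, and the sketch records which one attains the smaller residual sum of squares in the first boosting step. All function names are hypothetical. The ridge penalty on the dummy coefficients and the definition of effective degrees of freedom as trace(2S - SᵀS) of the hat matrix S are assumptions of this sketch; the abstract does not state which definition the report uses.

```python
import numpy as np


def hat_matrix(X, lam=0.0):
    """Hat matrix of a ridge-penalized least squares base-learner."""
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)


def df_of(S):
    """Effective degrees of freedom, assumed here to be trace(2S - S^T S)."""
    return np.trace(2 * S - S.T @ S)


def lambda_for_df(X, target_df, lo=1e-8, hi=1e8, iters=100):
    """Bisection on the log scale for the penalty with df(lambda) = target_df."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if df_of(hat_matrix(X, mid)) > target_df:
            lo = mid  # still too flexible: penalize more
        else:
            hi = mid
    return np.sqrt(lo * hi)


def rss(S, y):
    """Residual sum of squares after fitting y with hat matrix S."""
    return float(np.sum((y - S @ y) ** 2))


def first_step_selection(n=100, k=10, reps=2000, seed=1):
    """Share of null replications in which the categorical base-learner
    beats the linear one, without and with a df-matching penalty."""
    rng = np.random.default_rng(seed)

    # Continuous covariate, centered: linear base-learner with 1 df.
    x = rng.normal(size=n)
    X_lin = (x - x.mean())[:, None]

    # Balanced categorical covariate, dummy coded with a reference
    # category and centered: k - 1 df when unpenalized.
    cat = rng.permutation(np.arange(n) % k)
    D = (cat[:, None] == np.arange(1, k)).astype(float)
    X_cat = D - D.mean(axis=0)

    S_lin = hat_matrix(X_lin)
    S_ols = hat_matrix(X_cat)
    lam = lambda_for_df(X_cat, target_df=1.0)
    S_pen = hat_matrix(X_cat, lam)

    wins_ols = wins_pen = 0
    for _ in range(reps):
        y = rng.normal(size=n)  # pure noise: no covariate is informative
        y -= y.mean()
        r_lin = rss(S_lin, y)
        wins_ols += int(rss(S_ols, y) < r_lin)
        wins_pen += int(rss(S_pen, y) < r_lin)
    return wins_ols / reps, wins_pen / reps, lam


if __name__ == "__main__":
    f_ols, f_pen, lam = first_step_selection()
    print(f"categorical selected, unpenalized: {f_ols:.3f}")
    print(f"categorical selected, df matched:  {f_pen:.3f} (lambda = {lam:.1f})")
```

An unbiased procedure would select each non-informative covariate about half of the time. In this first-step check, matching the degrees of freedom moves the selection frequency toward that value but need not reach it exactly, which is in line with the abstract's wording that the bias is strongly reduced.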

Document type:	Paper
Keywords:	effective degrees of freedom, penalized least squares base-learner, penalized ordinal predictors, P-splines, ridge penalization, variable selection
Faculty:	Mathematics, Computer Science and Statistics > Statistics > Technical Reports
Subjects:	500 Natural sciences and mathematics > 510 Mathematics
URN:	urn:nbn:de:bvb:19-epub-11243-8
Language:	English
Document ID:	11243
Date deposited on Open Access LMU:	11 Dec 2009 09:17
Last modified:	13 Aug 2024 11:44
