Minimization and estimation of the variance of prediction errors for cross-validation designs

Fuchs, Mathias and Krautenbacher, Norbert (November 2014): Minimization and estimation of the variance of prediction errors for cross-validation designs. Department of Statistics: Technical Reports, No. 173 [PDF, 468kB]

This is the latest version of this document.

Creative Commons: Attribution 4.0 (CC-BY)

Accepted version

DOI: 10.1080/15598608.2016.1158675

Abstract

We consider the mean prediction error of a classification or regression procedure as well as its cross-validation estimates, and investigate the variance of this estimate as a function of an arbitrary cross-validation design. We decompose this variance into a scalar product of coefficients and certain covariance expressions, such that the coefficients depend solely on the resampling design, and the covariances depend solely on the data's probability distribution. We rewrite this scalar product in such a form that the initially large number of summands can gradually be decreased down to three under the validity of a quadratic approximation to the core covariances. We show an analytical example in which this quadratic approximation holds true exactly. Moreover, in this example, we show that the leave-p-out estimator of the error depends on p only by means of a constant and can, therefore, be written in a much simpler form. Furthermore, there is an unbiased estimator of the variance of K-fold cross-validation, in contrast to a claim in the literature. As a consequence, we can show that Balanced Incomplete Block Designs have smaller variance than K-fold cross-validation. In a real data example from the UCI machine learning repository, this property can be confirmed. We finally show how to find Balanced Incomplete Block Designs in practice.
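The balance property that distinguishes a Balanced Incomplete Block Design from K-fold cross-validation can be illustrated with a minimal Python sketch. It is not the authors' procedure: the helper names `pair_counts` and `is_bibd` are hypothetical, and treating each block as a test set of observation indices is an assumption made here for illustration. The check uses the standard combinatorial definition: all blocks have the same size, every observation occurs in the same number of blocks, and every pair of observations occurs together in the same number of blocks.

```python
from collections import Counter
from itertools import combinations


def pair_counts(n, blocks):
    """For each unordered pair of observations, count the blocks containing both."""
    counts = Counter({pair: 0 for pair in combinations(range(n), 2)})
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


def is_bibd(n, blocks):
    """Check the three balance conditions of a BIBD on observations 0..n-1."""
    sizes = {len(block) for block in blocks}
    point_counts = Counter(i for block in blocks for i in block)
    replications = {point_counts[i] for i in range(n)}
    lambdas = set(pair_counts(n, blocks).values())
    return len(sizes) == 1 and len(replications) == 1 and len(lambdas) == 1


# Fano plane: 7 observations, 7 blocks of size 3, every pair shares exactly one block.
fano = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]

# 3-fold cross-validation on 6 observations: the blocks form a partition,
# so pairs in different folds never share a block.
three_fold = [{0, 1}, {2, 3}, {4, 5}]

# Complete leave-2-out design on 5 observations: all subsets of size 2.
leave_2_out = [set(c) for c in combinations(range(5), 2)]

print(is_bibd(7, fano))         # balanced
print(is_bibd(6, three_fold))   # not balanced on pairs
print(is_bibd(5, leave_2_out))  # balanced
```

Under this definition a K-fold partition is balanced on single observations but not on pairs, whereas the complete leave-p-out design is balanced on both at the cost of many more blocks.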

Document type:	Paper
Keywords:	U-statistic; cross-validation; design; model selection.
Faculty:	Mathematics, Computer Science and Statistics > Statistics; Mathematics, Computer Science and Statistics > Statistics > Technical Reports
Subjects:	500 Natural sciences and mathematics > 510 Mathematics
URN:	urn:nbn:de:bvb:19-epub-27656-8
Language:	English
Document ID:	27656
Date deposited on Open Access LMU:	22 Mar 2016 18:38
Last modified:	04 Nov 2020 13:07

All versions of this document

- A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs. (deposited 18 Nov 2014 17:17)
- Minimization and estimation of the variance of prediction errors for cross-validation designs. (deposited 22 Mar 2016 18:38) [currently displayed]