A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs

Fuchs, Mathias and Krautenbacher, Norbert (November 2014): A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs. Department of Statistics: Technical Reports, No. 173 [PDF, 405kB]

A newer version of this document is available.

Creative Commons: Attribution 4.0 (CC-BY)

DOI: 10.5282/ubm/epub.21858

Abstract

The mean prediction error of a classification or regression procedure can be estimated using resampling designs such as the cross-validation design. We decompose the variance of such an estimator associated with an arbitrary resampling procedure into a small linear combination of covariances between elementary estimators, each of which is a regular parameter as described in the theory of $U$-statistics. The enumerative combinatorics of the occurrence frequencies of these covariances govern the linear combination's coefficients and, therefore, the variance's large-scale behavior. We study the variance of incomplete $U$-statistics associated with kernels which are partly but not entirely symmetric. This leads to asymptotic statements for the estimator of the prediction error, under general non-empirical conditions on the resampling design. In particular, we show that the resampling-based estimator of the average prediction error is asymptotically normally distributed under a general and easily verifiable condition. Likewise, we give a sufficient criterion for consistency. We thus develop a new approach to understanding small-variance designs as they have recently appeared in the literature. We exhibit the $U$-statistics which estimate these variances. We present a case from linear regression where the covariances between the elementary estimators can be computed analytically. We illustrate our theory by computing estimators of the studied quantities in an artificial data example.
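To make the object of study concrete, the following is a minimal sketch, not taken from the report, of a resampling-based estimator written as an average of elementary losses over a design. It assumes squared-error loss, ordinary least-squares regression, and a k-fold cross-validation design; the names `elementary_loss`, `kfold_design`, and `resampling_estimate` are hypothetical and chosen only for illustration.

```python
import numpy as np


def elementary_loss(X, y, learn_idx, test_idx):
    """Squared error on one test point after a least-squares fit on learn_idx.

    Assumption: squared-error loss and OLS; the abstract covers general
    classification or regression procedures.
    """
    beta, *_ = np.linalg.lstsq(X[learn_idx], y[learn_idx], rcond=None)
    return float((y[test_idx] - X[test_idx] @ beta) ** 2)


def kfold_design(n, k):
    """Hypothetical helper: (learning indices, test index) pairs for k-fold CV."""
    folds = np.array_split(np.arange(n), k)
    design = []
    for fold in folds:
        learn = np.setdiff1d(np.arange(n), fold)  # all points outside the fold
        design.extend((learn, int(t)) for t in fold)
    return design


def resampling_estimate(X, y, design):
    """Average of the elementary losses over all pairs in the design."""
    return float(np.mean([elementary_loss(X, y, l, t) for l, t in design]))


# Artificial data, generated here purely for illustration.
rng = np.random.default_rng(0)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

design = kfold_design(n, k=5)
estimate = resampling_estimate(X, y, design)
print(len(design), estimate)
```

Each pair in `design` contributes one elementary estimator. Because the design uses only some of all possible (learning set, test point) combinations, the average has the form of an incomplete $U$-statistic, and its variance depends on how often pairs of elementary estimators share observations.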

Document type:	Paper
Keywords:	U-statistic, cross-validation, limit theorem, design, model selection
Faculty:	Mathematics, Computer Science and Statistics > Statistics; Mathematics, Computer Science and Statistics > Statistics > Technical Reports
Subjects:	500 Natural sciences and mathematics > 510 Mathematics
URN:	urn:nbn:de:bvb:19-epub-21858-9
Language:	English
Document ID:	21858
Date published on Open Access LMU:	18 Nov 2014 17:17
Last modified:	04 Nov 2020 13:02

All versions of this document

A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs. (deposited 18 Nov 2014 17:17) [currently displayed]
- Minimization and estimation of the variance of prediction errors for cross-validation designs. (deposited 22 Mar 2016 18:38)
