Fuchs, Mathias; Krautenbacher, Norbert (November 2014): Minimization and estimation of the variance of prediction errors for cross-validation designs. Department of Statistics: Technical Reports, No. 173
This is the latest version of this item.
Creative Commons Attribution  Accepted Version 468kB  
Abstract
We consider the mean prediction error of a classification or regression procedure together with its cross-validation estimates, and investigate the variance of such an estimate as a function of an arbitrary cross-validation design. We decompose this variance into a scalar product of coefficients and certain covariance expressions, where the coefficients depend solely on the resampling design and the covariances depend solely on the probability distribution of the data. We rewrite this scalar product so that the initially large number of summands can be reduced step by step to three, provided that a quadratic approximation to the core covariances is valid. We give an analytical example in which this quadratic approximation holds exactly. In this example, we also show that the leave-p-out estimator of the error depends on p only through a constant and can therefore be written in a much simpler form. Furthermore, there is an unbiased estimator of the variance of K-fold cross-validation, in contrast to a claim in the literature. As a consequence, we can show that Balanced Incomplete Block Designs have smaller variance than K-fold cross-validation. A real-data example from the UCI Machine Learning Repository confirms this property. Finally, we show how to find Balanced Incomplete Block Designs in practice.
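To make the contrast between K-fold cross-validation and a Balanced Incomplete Block Design concrete, the following minimal Python sketch checks the defining balance conditions of a design: equal block sizes, equal replication of every observation, and equal co-occurrence of every pair of observations. It is an illustration only, not the construction used in the report. The function names (`design_counts`, `is_bibd`, `cyclic_blocks`), the choice to treat each block as the index set of one split, and the Fano-plane example are all assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def design_counts(blocks):
    """Count how often each point and each unordered pair occurs across blocks."""
    point_counts = Counter()
    pair_counts = Counter()
    for block in blocks:
        members = sorted(set(block))
        point_counts.update(members)
        pair_counts.update(combinations(members, 2))
    return point_counts, pair_counts


def is_bibd(blocks, n):
    """Check the balance conditions on points 0..n-1 (hypothetical helper).

    True if all blocks have the same size, every point lies in the same
    number of blocks, and every pair lies together in the same number of blocks.
    """
    sizes = {len(set(block)) for block in blocks}
    point_counts, pair_counts = design_counts(blocks)
    replications = {point_counts[i] for i in range(n)}
    lambdas = {pair_counts[pair] for pair in combinations(range(n), 2)}
    return len(sizes) == 1 and len(replications) == 1 and len(lambdas) == 1


def cyclic_blocks(base, n):
    """Develop a base block modulo n (cyclic shifts of the base block)."""
    return [sorted((x + shift) % n for x in base) for shift in range(n)]


# Assumed example: the Fano plane, a (7, 3, 1) design obtained by
# shifting the difference set {0, 1, 3} modulo 7.
fano = cyclic_blocks([0, 1, 3], 7)

# Assumed example: a 3-fold partition of 6 observations. Every point occurs
# once, but pairs within a fold co-occur while pairs across folds never do.
kfold = [[0, 1], [2, 3], [4, 5]]

print(is_bibd(fano, 7))
print(is_bibd(kfold, 6))
```

In this sketch the K-fold partition fails the pairwise condition because observations in different folds never share a block, whereas the cyclic design treats every pair identically. The cyclic development only yields a balanced design when the base block is a difference set, so `is_bibd` should be used to verify any candidate.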
Item Type:  Paper (Technical Report) 

Keywords:  U-statistic; cross-validation; design; model selection
Faculties:  Mathematics, Computer Science and Statistics > Statistics; Mathematics, Computer Science and Statistics > Statistics > Technical Reports
Subjects:  500 Science > 510 Mathematics 
URN:  urn:nbn:de:bvb:19-epub-27656-8
Language:  English 
ID Code:  27656 
Deposited On:  22. Mar 2016 18:38 
Last Modified:  04. Nov 2020 13:07 
Available Versions of this Item

A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs. (deposited 18. Nov 2014 17:17)
Minimization and estimation of the variance of prediction errors for cross-validation designs. (deposited 22. Mar 2016 18:38) [Currently Displayed]