Janitza, Silke; Binder, Harald; Boulesteix, Anne-Laure (27 June 2014): Pitfalls of hypothesis tests and model selection on bootstrap samples: causes and consequences in biometrical applications. Department of Statistics: Technical Reports, No. 163
Warning: There is a more recent version of this item available.


The bootstrap has become a widely used tool, applied in diverse areas where results based on asymptotic theory are scarce. It can be used, for example, to assess the variance of a statistic or a quantile of interest, or for significance testing by resampling from the null hypothesis. Recently, some approaches have been suggested in the biometrical field in which hypothesis testing or model selection is performed on a bootstrap sample as if it were the original sample. There is evidence in the literature, however, that these procedures can lead to more significant results or to overly complex models, respectively, because they ignore that the bootstrap sample is not a direct realization of the true underlying distribution. We explain why this is the case and illustrate that tests on bootstrap samples do not provide valid p-values, using the Z-test and the likelihood ratio test as examples. We also illustrate that information criteria computed on bootstrap samples are not reliable, as suggested by known theory. Furthermore, we revisit four approaches in light of these considerations: estimation of the p-value distribution, model complexity selection, variable inclusion frequencies, and model averaging. Using simulation studies and evidence from the literature, we demonstrate that these approaches can give misleading conclusions, and we discuss possible solutions to this problem.
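The Z-test claim in the abstract can be illustrated with a small simulation. The sketch below is not taken from the report; the sample size, number of replications, significance level, and the function names `z_test_p_value` and `rejection_rates` are assumptions chosen for illustration. It draws data under a true null hypothesis (standard normal, known variance), applies a two-sided Z-test once to the original sample and once to a single bootstrap sample drawn from it, and compares how often each test rejects. A back-of-envelope argument suggests why the two should differ: the bootstrap mean carries the sampling error of the original mean plus the resampling error, so the test statistic on the bootstrap sample has roughly twice the variance assumed under the null.

```python
import math
import random


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided Z-test p-value for H0: mean == mu0, with known sigma."""
    n = len(sample)
    z = math.sqrt(n) * (sum(sample) / n - mu0) / sigma
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejection_rates(n=50, n_sim=4000, alpha=0.05, seed=1):
    """Estimate rejection rates under a true null hypothesis.

    Returns a pair: (rate when testing the original sample,
                     rate when testing a bootstrap sample of it).
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    reject_orig = 0
    reject_boot = 0
    for _ in range(n_sim):
        # Data generated under H0: N(0, 1).
        original = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # One bootstrap sample: n draws with replacement from the original.
        bootstrap = rng.choices(original, k=n)
        reject_orig += z_test_p_value(original) < alpha
        reject_boot += z_test_p_value(bootstrap) < alpha
    return reject_orig / n_sim, reject_boot / n_sim


if __name__ == "__main__":
    orig_rate, boot_rate = rejection_rates()
    print(f"rejection rate, original samples:  {orig_rate:.3f}")
    print(f"rejection rate, bootstrap samples: {boot_rate:.3f}")
```

Under these assumptions, the rejection rate on the original samples should stay close to the nominal level, while the rate on the bootstrap samples should be clearly higher, which is the sense in which p-values computed on a bootstrap sample are not valid.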
