Abstract
We address the problem of maximally selected chi-square statistics in the case of a binary Y variable and a nominal X variable with several categories. The distribution of the maximally selected chi-square statistic has already been derived when the best cutpoint is chosen from a continuous or an ordinal X, but not when the best split is chosen from a nominal X. In this paper, we derive the exact distribution of the maximally selected chi-square statistic in this case using a combinatorial approach. Applications of the derived distribution to variable selection and hypothesis testing are discussed based on simulations. As an illustration, our method is applied to a pregnancy and birth data set.
Document type: | Paper |
---|---|
Keywords: | Categorical variables, association test, contingency table, exact distribution, variable selection, selection bias |
Faculty: | Mathematics, Computer Science and Statistics > Statistics > Collaborative Research Center 386; Collaborative Research Centers > Collaborative Research Center 386 |
Subject areas: | 500 Natural sciences and mathematics > 510 Mathematics |
URN: | urn:nbn:de:bvb:19-epub-1818-8 |
Document ID: | 1818 |
Date of publication on Open Access LMU: | 11 Apr 2007 |
Last modified: | 29 Apr 2016, 08:50 |