Boulesteix, Anne-Laure and Strobl, Carolin
Maximally selected chi-square statistics and umbrella orderings.
Collaborative Research Center 386, Discussion Paper 476
Binary outcomes that depend on an ordinal predictor in a non-monotonic way are common in medical data analysis. Such patterns can be addressed in terms of cutpoints: for example, one looks for two cutpoints that define an interval in the range of the ordinal predictor for which the probability of a positive outcome is particularly high (or low). A chi-square test may then be performed to compare the proportions of positive outcomes in and outside this interval. However, if the two cutpoints are chosen to maximize the chi-square statistic, referring the obtained chi-square statistic to the standard chi-square distribution is an inappropriate approach. It is then necessary to correct the p-value for multiple comparisons by considering the distribution of the maximally selected chi-square statistic instead of the nominal chi-square distribution. Here, we derive the exact distribution of the chi-square statistic obtained by the optimal two cutpoints. We suggest a combinatorial computation method and illustrate our approach by a simulation study and an application to varicella data.