Strobl, Carolin (2005): Variable Selection Bias in Classification Trees Based on Imprecise Probabilities. Sonderforschungsbereich 386, Discussion Paper 419

Abstract

Classification trees based on imprecise probabilities are an advancement of classical classification trees. The Gini Index is the default splitting criterion in classical classification trees, while in classification trees based on imprecise probabilities, an extension of the Shannon entropy has been introduced as the splitting criterion. However, the use of these empirical entropy measures as split selection criteria can lead to a bias in variable selection, such that variables are preferred for features other than their information content. This bias is not eliminated by the imprecise probability approach. The source of variable selection bias for the estimated Shannon entropy is outlined, together with possible corrections. The variable selection performance of the biased and corrected estimators is evaluated in a simulation study. Additional results from research on variable selection bias in classical classification trees are incorporated; they suggest that alternative split selection criteria in classification trees based on imprecise probabilities warrant further investigation.
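
The bias described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it uses the ordinary plug-in Shannon entropy on precise relative frequencies (not the imprecise-probability extension), and the function names, sample sizes, and the choice of the Miller-type correction term (K - 1) / (2n) are assumptions made for illustration. The abstract does not say which corrections the paper actually evaluates. The predictors are generated independently of the class label, so any positive average information gain is an artifact of the estimator.

```python
import math
import random


def plugin_entropy(labels):
    """Plug-in (empirical) Shannon entropy in nats."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def miller_entropy(labels):
    """Plug-in entropy plus a Miller-type term (K - 1) / (2n).

    K is the number of classes observed in the sample. Using this
    particular correction here is an assumption for illustration.
    """
    k = len(set(labels))
    return plugin_entropy(labels) + (k - 1) / (2 * len(labels))


def information_gain(x, y):
    """Entropy of y minus the size-weighted entropy of y within each x category."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    within = sum(len(g) / n * plugin_entropy(g) for g in groups.values())
    return plugin_entropy(y) - within


def mean_gain(k, n, reps, rng):
    """Average gain of a k-category predictor that is independent of y."""
    total = 0.0
    for _ in range(reps):
        y = [rng.randint(0, 1) for _ in range(n)]
        x = [rng.randrange(k) for _ in range(n)]
        total += information_gain(x, y)
    return total / reps


if __name__ == "__main__":
    rng = random.Random(0)
    # Both predictors carry no information about y, yet the one with
    # more categories tends to show a larger average gain.
    for k in (2, 4, 8):
        print(k, round(mean_gain(k, n=40, reps=500, rng=rng), 4))
```

Because the plug-in entropy underestimates the true entropy in small samples, and a predictor with more categories produces smaller child nodes, the uninformative predictor with more categories tends to look more attractive as a split variable in this setup.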