Abstract
The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification using continuous predictors by means of a combinatorial approach. This distribution provides formal support for the variable selection bias in favor of variables with a high number of missing values when the Gini gain is used as the split selection criterion, and we suggest using the resulting p-value as an unbiased split selection criterion in recursive partitioning algorithms. We demonstrate the efficiency of our novel method in simulation and real-data studies from veterinary gynecology in the context of binary classification and continuous predictor variables with different numbers of missing values. Our method is extendable to categorical and ordinal predictor variables and to other split selection criteria such as the cross-entropy criterion.
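For orientation, the following is a minimal sketch of the empirical quantity the abstract refers to: the Gini gain maximized over all cut points of a continuous predictor for a binary response. It does not implement the paper's exact-distribution or p-value derivation; the function and variable names are illustrative only.

```python
import numpy as np

def gini_impurity(y):
    """Gini impurity 2*p*(1-p) of a binary 0/1 response vector y."""
    p = y.mean()
    return 2.0 * p * (1.0 - p)

def max_gini_gain(x, y):
    """Maximally selected Gini gain of continuous predictor x for binary y.

    Every midpoint between consecutive distinct ordered values of x is
    considered as a candidate cut point; the gain of the best cut point
    and the cut point itself are returned.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y_sorted)
    parent = gini_impurity(y_sorted)

    best_gain, best_cut = 0.0, None
    for i in range(1, n):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # identical predictor values cannot be separated
        left, right = y_sorted[:i], y_sorted[i:]
        gain = parent - (i / n) * gini_impurity(left) \
                      - ((n - i) / n) * gini_impurity(right)
        if gain > best_gain:
            best_gain = gain
            best_cut = 0.5 * (x_sorted[i - 1] + x_sorted[i])
    return best_gain, best_cut
```

Because this statistic is a maximum over all admissible cut points, its distribution depends on the number of available (non-missing) observations, so raw gains are not directly comparable across predictors with different numbers of missing values; the exact p-value derived in the paper is proposed as the criterion that corrects for this.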
| Item Type: | Paper |
|---|---|
| Faculties: | Mathematics, Computer Science and Statistics > Statistics > Collaborative Research Center 386 Special Research Fields > Special Research Field 386 |
| Subjects: | 500 Science > 510 Mathematics |
| URN: | urn:nbn:de:bvb:19-epub-1833-1 |
| Language: | English |
| Item ID: | 1833 |
| Date Deposited: | 11. Apr 2007 |
| Last Modified: | 04. Nov 2020, 12:45 |