(Psycho-)Analysis of Benchmark Experiments

Eugster, Manuel J. A.; Leisch, Friedrich and Strobl, Carolin (9 March 2010): (Psycho-)Analysis of Benchmark Experiments. A Formal Framework for Investigating the Relationship between Data Sets and Learning Algorithms. Department of Statistics: Technical Reports, No. 78

DOI: 10.5282/ubm/epub.11425

Abstract

It is common knowledge that certain characteristics of data sets, such as linear separability or sample size, determine the performance of learning algorithms. In this paper we propose a formal framework for investigating this relationship.

The framework combines three methods, each well established in its own scientific discipline. Benchmark experiments are the method of choice in machine and statistical learning for comparing algorithms with respect to a certain performance measure on particular data sets. To capture the interaction between data sets and algorithms, the data sets are characterized using statistical and information-theoretic measures, a common approach in meta-learning for deciding which algorithms are suited to particular data sets. Finally, the performance ranking of algorithms on groups of data sets with similar characteristics is determined by means of recursive partitioning of Bradley-Terry models, which are commonly used in psychology to study the preferences of human subjects. The result is a tree whose splits are on data set characteristics that significantly change the performance of the algorithms. The main advantage of the framework is that it detects these important characteristics automatically.
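The ranking step can be illustrated with a minimal Python sketch. It assumes each benchmark result is a dict of hypothetical performance scores (higher is better), turns every data set into pairwise comparisons between algorithms (counting a tie as half a win for each, an assumption of this sketch), and fits Bradley-Terry worth parameters, where P(i beats j) = w_i / (w_i + w_j), using a standard minorization-maximization (MM) iteration. The function names `win_matrix` and `bradley_terry`, the tie convention and the scores are illustrative and not taken from the report. The sketch fits a single model to all data sets; the framework described above instead fits such models within groups of data sets obtained by recursively splitting on data set characteristics.

```python
from itertools import combinations


def win_matrix(benchmark, algorithms):
    """Count pairwise wins from a benchmark table.

    benchmark: list of dicts, one per data set, mapping algorithm name to a
    performance score where higher is better. A tie counts as half a win
    for each algorithm (an assumption of this sketch).
    """
    wins = {a: {b: 0.0 for b in algorithms} for a in algorithms}
    for scores in benchmark:
        for a, b in combinations(algorithms, 2):
            if scores[a] > scores[b]:
                wins[a][b] += 1.0
            elif scores[a] < scores[b]:
                wins[b][a] += 1.0
            else:
                wins[a][b] += 0.5
                wins[b][a] += 0.5
    return wins


def bradley_terry(wins, algorithms, max_iter=1000, tol=1e-10):
    """Fit Bradley-Terry worths w with P(i beats j) = w_i / (w_i + w_j).

    MM update: w_i <- W_i / sum_{j != i} n_ij / (w_i + w_j), where W_i is
    the total number of wins of i and n_ij the number of comparisons
    between i and j. Worths are rescaled to sum to 1 after each step.
    """
    w = {a: 1.0 / len(algorithms) for a in algorithms}
    for _ in range(max_iter):
        new = {}
        for i in algorithms:
            others = [j for j in algorithms if j != i]
            total_wins = sum(wins[i][j] for j in others)
            denom = sum((wins[i][j] + wins[j][i]) / (w[i] + w[j]) for j in others)
            new[i] = total_wins / denom
        norm = sum(new.values())
        new = {a: v / norm for a, v in new.items()}
        change = max(abs(new[a] - w[a]) for a in algorithms)
        w = new
        if change < tol:
            break
    return w


# Hypothetical accuracies of three algorithms on three data sets
# (illustrative numbers, not results from the report).
algorithms = ["lda", "rf", "svm"]
benchmark = [
    {"lda": 0.80, "rf": 0.90, "svm": 0.85},
    {"lda": 0.70, "rf": 0.75, "svm": 0.78},
    {"lda": 0.95, "rf": 0.92, "svm": 0.93},
]

wins = win_matrix(benchmark, algorithms)
worth = bradley_terry(wins, algorithms)
for name, value in sorted(worth.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

The MM iteration only has a finite solution when every algorithm wins at least one comparison; if an algorithm never wins, its worth collapses to 0, and this sketch does not handle that case.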

The framework is introduced using a simple artificial example. Its real-world usage is demonstrated with an application example consisting of thirteen well-known data sets and six common learning algorithms. All resources to replicate the examples are available online.

Document type:	Paper
Keywords:	Benchmark experiments, data set characterization, recursive partitioning, preference scaling, Bradley-Terry model
Faculty:	Mathematics, Computer Science and Statistics > Statistics > Technical Reports
Subjects:	300 Social sciences > 310 Statistics
URN:	urn:nbn:de:bvb:19-epub-11425-9
Language:	English
Document ID:	11425
Published on Open Access LMU:	09 Mar 2010 13:54
Last modified:	04 Nov 2020 12:52
