ORCID: https://orcid.org/0000-0002-9944-4108
(2015): Qualitative Multi-Armed Bandits: A Quantile-Based Approach. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 1660-1668.
Abstract
We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for using a quantile of that distribution as a representative value. We address the problem of quantile-based online learning both for the case of a finite time horizon (pure exploration) and an infinite time horizon (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by means of initial experimental studies.
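
As a concrete illustration of the quantile-based idea sketched in the abstract, the following is a minimal sketch and not the paper's algorithm: it assumes qualitative rewards encoded as comparable ranks, characterizes each arm by an empirical quantile of its observed rewards, and obtains exploration by optimistically inflating the quantile level rather than by averaging rewards. The class name `QuantileBandit`, the method names, and the confidence term are illustrative assumptions.

```python
# A minimal illustrative sketch (not the algorithm from the paper): a bandit
# that ranks arms by an empirical quantile of ordinal rewards instead of the
# mean. Assumptions made here for illustration only: qualitative rewards are
# encoded as comparable ranks 0 < 1 < ... < K-1, the representative value is
# the tau-quantile, and exploration comes from inflating the quantile level
# with a count-based confidence term; no arithmetic on reward values is ever
# performed, only comparisons.

import math
import random


class QuantileBandit:
    def __init__(self, n_arms: int, tau: float = 0.5):
        self.n_arms = n_arms
        self.tau = tau                                    # target quantile level (0.5 = median)
        self.observations = [[] for _ in range(n_arms)]   # ordinal rewards seen per arm

    def optimistic_quantile(self, arm: int, t: int):
        """Empirical quantile of the arm's rewards at an optimistically inflated level."""
        obs = sorted(self.observations[arm])
        n = len(obs)
        if n == 0:
            return float("inf")                           # unseen arms are maximally optimistic
        level = min(self.tau + math.sqrt(math.log(max(t, 2)) / (2 * n)), 1.0)
        idx = max(min(math.ceil(level * n) - 1, n - 1), 0)
        return obs[idx]

    def select_arm(self, t: int) -> int:
        """Pick the arm whose optimistic quantile estimate is largest."""
        return max(range(self.n_arms), key=lambda a: self.optimistic_quantile(a, t))

    def update(self, arm: int, ordinal_reward: int) -> None:
        self.observations[arm].append(ordinal_reward)


if __name__ == "__main__":
    # Toy usage: three arms with qualitative rewards on a five-point scale
    # (ranks 0..4); the third arm has the best median and should dominate.
    random.seed(0)
    arm_dists = [
        [0.4, 0.3, 0.2, 0.1, 0.0],   # weak arm
        [0.1, 0.2, 0.4, 0.2, 0.1],   # medium arm
        [0.0, 0.1, 0.2, 0.3, 0.4],   # strong arm
    ]
    bandit = QuantileBandit(n_arms=3, tau=0.5)
    for t in range(1, 2001):
        arm = bandit.select_arm(t)
        reward = random.choices(range(5), weights=arm_dists[arm])[0]
        bandit.update(arm, reward)
    print([len(obs) for obs in bandit.observations])      # pull counts per arm
```

In this sketch the optimism lives entirely in the quantile level, so the ordinal reward scale is only ever sorted and compared, which keeps the example consistent with the qualitative-reward setting described above.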
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Faculties: | Mathematics, Computer Science and Statistics > Computer Science > Artificial Intelligence and Machine Learning |
| Subjects: | 000 Computer science, information and general works > 004 Data processing computer science |
| Language: | English |
| Item ID: | 91703 |
| Date Deposited: | 31. Mar 2022 12:12 |
| Last Modified: | 31. Mar 2022 12:12 |