Abstract
We consider a resource-aware variant of the classical multi-armed bandit problem: in each round, the learner selects an arm and sets a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit. Otherwise, the observation is censored, i.e., no reward is obtained. For this problem setting, we introduce a measure of regret that incorporates the actual amount of resources consumed in each learning round, the optimality of realizable rewards, and the risk of exceeding the allocated resource limit. Thus, to minimize regret, the learner needs to set the resource limit and choose an arm in such a way that the chance of realizing a high reward within the predefined resource limit is high, while the resource limit itself is kept as low as possible. We propose a UCB-inspired online learning algorithm, which we analyze theoretically in terms of its regret upper bound. In a simulation study, we show that our learning algorithm outperforms straightforward extensions of standard multi-armed bandit algorithms.
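The round protocol described in the abstract (pick an arm and a resource limit; observe the reward only if the random consumption stays within the limit, otherwise the observation is censored) is easy to simulate. The following Python sketch is purely illustrative: the Bernoulli rewards, exponentially distributed consumption, discretized grid of limits, plain UCB index over arm-limit pairs, and the simple reward-minus-consumption utility are all assumptions made here for demonstration. It is not the algorithm or the regret measure analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: arm k yields a random reward and a random
# resource consumption; both distributions are unknown to the learner.
REWARD_MEANS = [0.5, 0.8]           # assumed Bernoulli reward parameters
CONSUMPTION_MEANS = [0.3, 0.7]      # assumed exponential consumption scales
LIMIT_GRID = [0.25, 0.5, 1.0, 2.0]  # assumed discretization of resource limits


def play(arm, limit):
    """One round: the reward is observed only if the (random) resource
    consumption stays below the chosen limit; otherwise it is censored."""
    consumption = rng.exponential(CONSUMPTION_MEANS[arm])
    reward = float(rng.random() < REWARD_MEANS[arm])
    if consumption <= limit:
        return reward, consumption
    return 0.0, limit  # censored: no reward, the full limit was spent


# A standard UCB index over (arm, limit) pairs -- an illustrative stand-in,
# not the algorithm analyzed in the paper.
n_pairs = len(REWARD_MEANS) * len(LIMIT_GRID)
counts = np.zeros(n_pairs)
sums = np.zeros(n_pairs)

T = 10_000
for t in range(1, T + 1):
    if t <= n_pairs:  # initialization: play each (arm, limit) pair once
        idx = t - 1
    else:
        means = sums / counts
        bonus = np.sqrt(2 * np.log(t) / counts)
        idx = int(np.argmax(means + bonus))
    arm, limit_idx = divmod(idx, len(LIMIT_GRID))
    reward, used = play(arm, LIMIT_GRID[limit_idx])
    counts[idx] += 1
    # Net utility: realized reward minus resources actually consumed.
    # This is one simple way to trade reward against consumption; the
    # paper's regret definition may weigh these quantities differently.
    sums[idx] += reward - used

best = int(np.argmax(sums / counts))
print("empirically best (arm, limit index):", divmod(best, len(LIMIT_GRID)))
```

The sketch optimizes a single scalar utility per pair; the paper's regret additionally accounts for the risk of exceeding the limit, so its index construction differs from this plain UCB baseline.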
Document type: | Journal article |
---|---|
Form of publication: | Publisher's Version |
Faculty: | Mathematics, Computer Science and Statistics > Computer Science > Artificial Intelligence and Machine Learning |
Subject areas: | 000 Computer science, information and general works > 000 Computer science, knowledge, systems |
URN: | urn:nbn:de:bvb:19-epub-94657-3 |
ISSN: | 0885-6125 |
Language: | English |
Document ID: | 94657 |
Date deposited on Open Access LMU: | 16 Feb 2023 14:18 |
Last modified: | 11 Oct 2024 13:52 |
DFG: | Funded by the Deutsche Forschungsgemeinschaft (DFG) - 491502892 |