
Strauss, Niklas (ORCID: https://orcid.org/0000-0002-8083-7323); Winkel, David (ORCID: https://orcid.org/0000-0001-8829-0863); Berrendorf, Max (ORCID: https://orcid.org/0000-0001-9724-4009) and Schubert, Matthias (ORCID: https://orcid.org/0000-0002-6566-6343) (2023): Reinforcement Learning for Multi-Agent Stochastic Resource Collection. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Grenoble, France, 19-23 September 2022. In: Amini, Massih-Reza; Canu, Stéphane; Fischer, Asja; Guns, Tias; Kralj Novak, Petra and Tsoumakas, Grigorios (eds.): Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, vol. 13716. Cham: Springer. pp. 200-215

Full text not available on 'Open Access LMU'.

Abstract

Stochastic Resource Collection (SRC) describes tasks where an agent tries to collect a maximal amount of dynamic resources while navigating through a road network. An instance of SRC is the traveling officer problem (TOP), where a parking officer tries to maximize the number of fined parking violations. In contrast to vehicular routing problems, in SRC tasks, resources might appear and disappear by an unknown stochastic process, and thus, the task is inherently more dynamic. In most applications of SRC, such as TOP, covering realistic scenarios requires more than one agent. However, directly applying multi-agent approaches to SRC yields challenges considering temporal abstractions and inter-agent coordination. In this paper, we propose a novel multi-agent reinforcement learning method for the task of Multi-Agent Stochastic Resource Collection (MASRC). To this end, we formalize MASRC as a Semi-Markov Game which allows the use of temporal abstraction and asynchronous actions by various agents. In addition, we propose a novel architecture trained with independent learning, which integrates the information about collaborating agents and allows us to take advantage of temporal abstractions. Our agents are evaluated on the multiple traveling officer problem, an instance of MASRC where multiple officers try to maximize the number of fined parking violations. Our simulation environment is based on real-world sensor data. Results demonstrate that our proposed agent can beat various state-of-the-art approaches.
