Reinforcement Learning for Multi-Agent Stochastic Resource Collection

Strauss, Niklas ORCID: https://orcid.org/0000-0002-8083-7323; Winkel, David ORCID: https://orcid.org/0000-0001-8829-0863; Berrendorf, Max ORCID: https://orcid.org/0000-0001-9724-4009 and Schubert, Matthias ORCID: https://orcid.org/0000-0002-6566-6343 (2023): Reinforcement Learning for Multi-Agent Stochastic Resource Collection. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Grenoble, France, 19-23 September 2022. In: Amini, Massih-Reza; Canu, Stéphane; Fischer, Asja; Guns, Tias; Kralj Novak, Petra and Tsoumakas, Grigorios (eds.): Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science vol. 13716. Cham: Springer. pp. 200-215

Full text not available on 'Open Access LMU'.

DOI: 10.1007/978-3-031-26412-2_13

Abstract

Stochastic Resource Collection (SRC) describes tasks where an agent tries to collect a maximal amount of dynamic resources while navigating through a road network. An instance of SRC is the traveling officer problem (TOP), where a parking officer tries to maximize the number of fined parking violations. In contrast to vehicular routing problems, in SRC tasks, resources might appear and disappear by an unknown stochastic process, and thus, the task is inherently more dynamic. In most applications of SRC, such as TOP, covering realistic scenarios requires more than one agent. However, directly applying multi-agent approaches to SRC yields challenges considering temporal abstractions and inter-agent coordination. In this paper, we propose a novel multi-agent reinforcement learning method for the task of Multi-Agent Stochastic Resource Collection (MASRC). To this end, we formalize MASRC as a Semi-Markov Game which allows the use of temporal abstraction and asynchronous actions by various agents. In addition, we propose a novel architecture trained with independent learning, which integrates the information about collaborating agents and allows us to take advantage of temporal abstractions. Our agents are evaluated on the multiple traveling officer problem, an instance of MASRC where multiple officers try to maximize the number of fined parking violations. Our simulation environment is based on real-world sensor data. Results demonstrate that our proposed agent can beat various state-of-the-art approaches.

Document type:	Conference contribution (Paper)
Faculty:	Mathematics, Computer Science and Statistics > Computer Science
Subjects:	000 Computer science, information and general works > 004 Computer science
ISBN:	978-3-031-26411-5 ; 978-3-031-26412-2 ; 978-3-031-26413-9
ISSN:	0302-9743
Place:	Cham
Language:	English
Document ID:	124492
Date of publication on Open Access LMU:	09 Mar 2025 09:59
Last modified:	09 Mar 2025 09:59
