
Scholl, Philipp; Dietrich, Felix; Otte, Clemens and Udluft, Steffen (2022): Safe Policy Improvement Approaches on Discrete Markov Decision Processes. 14th International Conference on Agents and Artificial Intelligence, Online, 3-5 February 2022. In: Rocha, Ana Paula; Steels, Luc and Herik, Jaap van den (eds.): Proceedings of the 14th International Conference on Agents and Artificial Intelligence, Vol. II. Setúbal: SciTePress - Science and Technology Publications, Lda., pp. 142-151.

Full text not available on 'Open Access LMU'.

Abstract

Safe Policy Improvement (SPI) aims at provable guarantees that a learned policy is at least approximately as good as a given baseline policy. Building on SPI with Soft Baseline Bootstrapping (Soft-SPIBB) by Nadjahi et al., we identify theoretical issues in their approach, provide a corrected theory, and derive a new algorithm that is provably safe on finite Markov Decision Processes (MDPs). Additionally, we provide a heuristic algorithm that exhibits the best performance among many state-of-the-art SPI algorithms on two different benchmarks. Furthermore, we introduce a taxonomy of SPI algorithms and empirically show an interesting property of two classes of SPI algorithms: while the mean performance of algorithms that incorporate the uncertainty as a penalty on the action-value is higher, actively restricting the set of policies more consistently produces good policies and is, thus, safer.
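For context, the safety guarantee the abstract refers to is commonly stated in the SPI literature in roughly the following form (a standard formulation, not quoted from this paper): with probability at least $1 - \delta$, the performance $\rho$ of the learned policy $\pi$ in the true MDP $M^*$ satisfies

\[
\rho(\pi, M^*) \geq \rho(\pi_b, M^*) - \zeta,
\]

where $\pi_b$ is the baseline policy and $\zeta \geq 0$ is an admissible performance loss.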
