Abstract
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that does not require an engineered reward function but instead learns from human feedback. Due to its increasing popularity, various authors have studied how to learn an accurate reward model from only a few samples, making optimal use of this feedback. Because of the cost and complexity of user studies, however, this research is often conducted with synthetic human feedback. Such feedback can be generated by evaluating behavior based on ground-truth rewards, which are available for some benchmark tasks. While this setting can help evaluate some aspects of RLHF, it differs from practical settings in which synthetic feedback is not available. Working with real human feedback brings additional challenges that cannot be observed with synthetic feedback, including fatigue, inter-rater inconsistencies, delay, misunderstandings, and modality-dependent difficulties. In this paper, we describe and discuss some of these challenges, together with current practices and opportunities for further research.
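As a concrete illustration of the synthetic-feedback setting mentioned above, the following is a minimal sketch of how preference labels can be generated by comparing the ground-truth returns of two trajectory segments, as is common in preference-based RL benchmarks. The function names, the segment representation, and the Bradley-Terry-style label noise are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def segment_return(rewards):
    """Sum of ground-truth rewards over one trajectory segment."""
    return float(np.sum(rewards))

def synthetic_preference(rewards_a, rewards_b, temperature=1.0, rng=None):
    """Generate a synthetic preference label between two segments.

    Returns 1 if segment A is preferred, 0 if segment B is preferred.
    A Bradley-Terry model over ground-truth returns makes the simulated
    'annotator' noisier when the two segments have similar returns.
    """
    rng = rng or np.random.default_rng()
    r_a, r_b = segment_return(rewards_a), segment_return(rewards_b)
    # Probability that A is preferred under the Bradley-Terry model.
    p_a = 1.0 / (1.0 + np.exp(-(r_a - r_b) / temperature))
    return int(rng.random() < p_a)

# Example: two segments with known ground-truth per-step rewards.
rng = np.random.default_rng(0)
seg_a = [0.5, 0.7, 0.9]   # higher-return segment
seg_b = [0.1, 0.2, 0.3]   # lower-return segment
label = synthetic_preference(seg_a, seg_b, temperature=1.0, rng=rng)
print("A preferred" if label == 1 else "B preferred")
```

Such an oracle is perfectly consistent, never tires, and answers instantly, which is precisely why fatigue, inter-rater inconsistencies, and delay do not surface in experiments that rely on it.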
Document type: | Conference paper (Poster) |
---|---|
Keywords: | Reinforcement learning, RLHF, Human feedback |
Faculty: | Mathematics, Informatics and Statistics > Informatics > Artificial Intelligence and Machine Learning |
Subject areas: | 000 Computer science, information and general works > 004 Computer science |
URN: | urn:nbn:de:bvb:19-epub-122934-6 |
ISBN: | 978-3-031-74632-1 |
Language: | English |
Document ID: | 122934 |
Date of publication on Open Access LMU: | 05 Dec 2024 12:24 |
Last modified: | 12 Dec 2024 13:20 |