ORCID: https://orcid.org/0000-0001-5193-8574; Ball, Sarah; Beck, Jakob; Hüllermeier, Eyke
ORCID: https://orcid.org/0000-0002-9944-4108 and Kreuter, Frauke
(1 January 2025):
On the challenges and practices of reinforcement learning from real human feedback.
First ECMLPKDD Workshop on Hybrid Human-Machine Learning and Decision Making, Turin, Italy, 22 September 2023.
Machine Learning and Principles and Practice of Knowledge Discovery in Databases.
Springer Cham. pp. 276-294
Abstract
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that does not require an engineered reward function but instead learns from human feedback. Due to its increasing popularity, various authors have studied how to learn an accurate reward model from only a few samples, making optimal use of this feedback. Because of the cost and complexity of user studies, however, this research is often conducted with synthetic human feedback. Such feedback can be generated by evaluating behavior based on ground-truth rewards, which are available for some benchmark tasks. While this setting can help evaluate some aspects of RLHF, it differs from practical settings in which synthetic feedback is not available. Working with real human feedback brings additional challenges that cannot be observed with synthetic feedback, including fatigue, inter-rater inconsistencies, delay, misunderstandings, and modality-dependent difficulties. In this paper, we describe and discuss some of these challenges together with current practices and opportunities for further research.
| Item Type: | Conference or Workshop Item (Poster) |
|---|---|
| Keywords: | Reinforcement learning, RLHF, Human feedback |
| Faculties: | Mathematics, Computer Science and Statistics > Computer Science > Artificial Intelligence and Machine Learning |
| Subjects: | 000 Computer science, information and general works > 004 Data processing computer science |
| URN: | urn:nbn:de:bvb:19-epub-122934-6 |
| ISBN: | 978-3-031-74627-7 |
| Language: | English |
| Item ID: | 122934 |
| Date Deposited: | 05. Dec 2024 12:24 |
| Last Modified: | 08. Jan 2026 15:23 |
