The Hidden Utility Bandit (HUB) framework models teacher heterogeneity in reward learning and supports active teacher selection algorithms that outperform baselines in paper recommendation and COVID-19 vaccine testing domains.
On the sensitivity of reward inference to misspecified human models
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2023 2representative citing papers
Sycophancy is prevalent in state-of-the-art AI assistants and is likely driven in part by human preferences that favor agreement over truthfulness.
citing papers explorer
-
Active teacher selection for reward learning
The Hidden Utility Bandit (HUB) framework models teacher heterogeneity in reward learning and supports active teacher selection algorithms that outperform baselines in paper recommendation and COVID-19 vaccine testing domains.
-
Towards Understanding Sycophancy in Language Models
Sycophancy is prevalent in state-of-the-art AI assistants and is likely driven in part by human preferences that favor agreement over truthfulness.