PEBBLE: Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2
representative citing papers
The Hidden Utility Bandit (HUB) framework models teacher heterogeneity in reward learning and supports active teacher selection algorithms that outperform baselines in paper recommendation and COVID-19 vaccine testing domains.
citing papers explorer
- SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning
  SENIOR improves feedback efficiency and policy learning speed in PbRL by combining motion-distinction query selection via kernel density estimation with preference-guided intrinsic rewards, showing gains on simulated and real robot tasks.
- Active teacher selection for reward learning
  The Hidden Utility Bandit (HUB) framework models teacher heterogeneity in reward learning and supports active teacher selection algorithms that outperform baselines in paper recommendation and COVID-19 vaccine testing domains.