An active learning algorithm for ranking from pairwise preferences with an almost optimal query complexity
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.RO (1)
Years: 2019 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Learning Reward Functions by Integrating Human Demonstrations and Preferences
DemPref uses demonstrations to form a coarse reward prior and ground active preference queries, achieving higher efficiency than pure preference learning and higher user preference than IRL in experiments.