Learning a prior over intent via meta-inverse reinforcement learning
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED: 2
citing papers explorer
- QuickLAP: Quick Language-Action Preference Learning for Semi-Autonomous Agents
  QuickLAP fuses LLM-extracted language observations with physical feedback in a closed-form Bayesian update to cut reward learning error by over 70% in a driving simulator and improve user preference in a 15-person study.
- On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference
  Learning the demonstrator's planning algorithm via a differentiable planner improves IRL reward inference over incorrect bias assumptions but underperforms exact planners.