Robots That Know What to Ask: Recovering Misaligned Rewards through Targeted Explanations

Andreea Bobu; Helena Merker; Nick Walker

arxiv: 2605.22986 · v1 · pith:IR72YOXHnew · submitted 2026-05-21 · 💻 cs.RO · cs.AI · cs.HC · cs.LG


Helena Merker, Nick Walker, Andreea Bobu

Pith reviewed 2026-05-25 05:34 UTC · model grok-4.3

classification 💻 cs.RO cs.AI cs.HC cs.LG

keywords: reward learning from demonstrations, active querying, natural language explanations, underspecified features, robot alignment, targeted corrections, imperfect demonstrations


The pith

Robots recover misaligned rewards by detecting underspecified features via demonstration variation and soliciting targeted natural language corrections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that robots can identify which task features remain underspecified in human demonstrations by measuring statistical variation across those demonstrations. Low variation signals features the human consistently optimized; high variation flags gaps that leave the reward function ambiguous. The robot then describes its uncertainty in natural language and requests new demonstrations that explicitly address the identified features. Evaluations in a simulated tabletop manipulation task and a real-robot user study show these guided queries recover more accurate rewards than random queries or passive data collection alone.
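To make the query step concrete, here is a minimal sketch of how flagged features could be turned into a single natural-language request. The feature names (`table_dist`, `laptop_dist`, `human_dist`), their descriptions, and the fixed sentence template are all hypothetical; the summary does not say whether the paper writes its explanations from templates or with a language model.

```python
# Hypothetical feature descriptions; not taken from the paper.
FEATURE_DESCRIPTIONS = {
    "table_dist": "how close the cup stays to the table",
    "laptop_dist": "how far the arm stays from the laptop",
    "human_dist": "how far the arm stays from the person",
}


def explain_and_query(flagged: list[str]) -> str:
    """Turn a list of flagged (high-variation) features into one request for a new demonstration."""
    if not flagged:
        return "Your demonstrations look consistent; I have no questions."
    # Fall back to the raw feature name if no description is registered.
    phrases = [FEATURE_DESCRIPTIONS.get(name, name) for name in flagged]
    if len(phrases) == 1:
        listed = phrases[0]
    else:
        listed = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    pronoun = "it" if len(phrases) == 1 else "these"
    return (
        f"Your demonstrations varied a lot in {listed}, so I am unsure whether "
        f"that matters to you. Could you show me a demonstration that makes "
        f"clear what you want for {pronoun}?"
    )


print(explain_and_query(["laptop_dist"]))
```

A template keeps the wording predictable and easy to audit; a language model could phrase requests more naturally at the cost of that control.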

Core claim

Demonstrations implicitly reveal which features are well specified through low variation across examples, while high variation indicates underspecified features that create reward ambiguity. The robot leverages this signal to generate natural language explanations of its uncertainty and actively queries for corrective demonstrations that resolve those gaps, thereby reducing misalignment that would otherwise persist from imperfect initial data.
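The detection step can be sketched as follows. This is a minimal illustration under two assumptions the summary does not state: each feature has a known range used to rescale it to [0, 1], and a single fixed cutoff (the hypothetical `threshold=0.02`) separates consistent from underspecified features.

```python
import numpy as np


def flag_underspecified(phi, lows, highs, names, threshold=0.02):
    """Flag features whose variation across demonstrations is high.

    phi:   array of shape (n_demos, n_features); row k holds the feature values of demo k.
    lows, highs: assumed known range of each feature, used to rescale it to [0, 1]
                 so that variances are comparable across features.
    Returns (flagged_names, per_feature_variance).
    """
    phi = np.asarray(phi, dtype=float)
    lows = np.asarray(lows, dtype=float)
    highs = np.asarray(highs, dtype=float)
    scaled = (phi - lows) / (highs - lows)
    # Sample variance across demonstrations (ddof=1), one value per feature.
    var = scaled.var(axis=0, ddof=1)
    flagged = [name for name, v in zip(names, var) if v > threshold]
    return flagged, var


demos = [[0.10, 0.2],   # demo 1: table feature, laptop feature
         [0.12, 0.9],   # demo 2
         [0.11, 0.5]]   # demo 3
flagged, var = flag_underspecified(demos, lows=[0, 0], highs=[1, 1], names=["table", "laptop"])
print(flagged)  # ['laptop']
```

Here the table feature barely moves across demonstrations (variance about 1e-4), while the laptop feature spans most of its range, so only the laptop feature is flagged.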

What carries the argument

Variation across demonstrations as a statistical signal for underspecified features, paired with natural language explanations to elicit targeted corrective demonstrations.
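Rather than a single hard cutoff, the signal can be read probabilistically. The appendix fragment quoted near the end of the reference list mentions reference distributions $P(\sigma_i^2 \mid o_i = 0, \phi_i)$ and $P(\sigma_i^2 \mid o_i = 1, \phi_i)$ for each feature; reading $o_i$ as a binary "feature was optimized" indicator and combining the two with Bayes' rule gives the sketch below. The indicator interpretation, the kernel density estimate on log-variance, the bandwidth, and the synthetic reference samples are all assumptions, not the paper's construction.

```python
import numpy as np


def kde_logpdf(samples, x, bandwidth):
    """Log-density at x of a Gaussian kernel density estimate built from 1-D samples."""
    samples = np.asarray(samples, dtype=float)
    z = (x - samples) / bandwidth
    density = np.exp(-0.5 * z**2).sum() / (samples.size * bandwidth * np.sqrt(2 * np.pi))
    return np.log(density + 1e-300)  # floor avoids log(0) far from all samples


def prob_optimized(observed_var, ref_var_optimized, ref_var_ignored, prior=0.5, bandwidth=0.3):
    """Posterior P(o_i = 1 | sigma_i^2) comparing two reference variance distributions.

    ref_var_optimized: variances seen when the feature *was* optimized (o_i = 1).
    ref_var_ignored:   variances seen when it was *not* optimized (o_i = 0).
    Densities are compared in log-variance space, since variances span orders of magnitude.
    """
    x = np.log(observed_var)
    log_lik_1 = kde_logpdf(np.log(ref_var_optimized), x, bandwidth)
    log_lik_0 = kde_logpdf(np.log(ref_var_ignored), x, bandwidth)
    log_odds = log_lik_1 - log_lik_0 + np.log(prior) - np.log(1 - prior)
    # Numerically stable logistic function.
    if log_odds >= 0:
        return float(1.0 / (1.0 + np.exp(-log_odds)))
    return float(np.exp(log_odds) / (1.0 + np.exp(log_odds)))


rng = np.random.default_rng(0)
ref_opt = 10 ** rng.uniform(-4, -3, size=200)   # synthetic: optimized features vary little
ref_ign = 10 ** rng.uniform(-2, -1, size=200)   # synthetic: ignored features vary a lot
print(prob_optimized(3e-4, ref_opt, ref_ign))   # close to 1: looks well specified
print(prob_optimized(5e-2, ref_opt, ref_ign))   # close to 0: candidate for a query
```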

If this is right

Reward functions learned this way exhibit reduced ambiguity on features that initially varied widely.
Robot behavior aligns more closely with intended preferences in deployment situations not covered by the original demonstrations.
The method yields measurable gains in both simulated manipulation domains and physical robot interactions with human users.
Targeted queries outperform both random questioning and reliance on the initial imperfect demonstrations alone.
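The gap between targeted and random querying can be illustrated with a deliberately simple toy model: one query about a feature fully resolves it, and the targeted strategy is handed the correct set of underspecified features. Both simplifications are assumptions, and the feature names are hypothetical; the paper's simulations involve learned rewards and noisy demonstrators, so this only shows why focusing queries saves effort when detection is accurate.

```python
import random


def queries_until_resolved(underspecified, all_features, strategy, rng):
    """Count queries until every underspecified feature has been asked about.

    Toy assumptions: one query about a feature resolves it completely, and the
    'targeted' strategy knows which features are underspecified.
    """
    unresolved = set(underspecified)
    n_queries = 0
    while unresolved:
        if strategy == "targeted":
            feature = min(unresolved)            # ask about a still-unresolved flagged feature
        elif strategy == "random":
            feature = rng.choice(all_features)   # ask about a uniformly random feature
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        unresolved.discard(feature)
        n_queries += 1
    return n_queries


features = ["table", "laptop", "human", "coffee", "d1", "d2", "d3", "d4"]
rng = random.Random(0)
targeted = queries_until_resolved(["laptop", "coffee"], features, "targeted", rng)
random_runs = [queries_until_resolved(["laptop", "coffee"], features, "random", rng) for _ in range(1000)]
# Targeted needs exactly 2; random needs 8 * (1/2 + 1) = 12 on average in expectation.
print(targeted, sum(random_runs) / len(random_runs))
```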

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The variation signal could be combined with other uncertainty measures to prioritize queries even more efficiently.
The approach may generalize to non-robotics settings where humans give demonstrations or feedback and language clarification is available.
Fewer total demonstrations might suffice overall because queries focus human effort on high-variation features rather than spreading it evenly.

Load-bearing premise

Statistical variation across demonstrations reliably indicates which features humans left underspecified, and natural language queries will produce demonstrations that correctly resolve those gaps.

What would settle it

A controlled comparison in which explanation-guided queries produce no measurable improvement in reward alignment metrics over random querying or passive collection would falsify the claim.
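One way to run such a comparison is a paired test across seeds or participants. The reference list shows the authors cite the R packages lme4, lmerTest, and emmeans, which suggests mixed-effects models for the user study; the sketch below swaps in a simpler sign-flip permutation test on paired differences, with made-up values, purely to illustrate the logic.

```python
import numpy as np


def paired_permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired differences a - b.

    Under the null hypothesis that the two conditions are exchangeable, each
    paired difference is equally likely to have either sign.
    """
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = abs(d.mean())
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_resamples, d.size))
    null_stats = np.abs((signs * d).mean(axis=1))
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(null_stats >= observed)) / (n_resamples + 1)


# Illustrative per-seed normalized rewards (not results from the paper).
targeted = [0.90, 0.85, 0.92, 0.88, 0.91]
random_q = [0.60, 0.55, 0.62, 0.58, 0.61]
print(paired_permutation_test(targeted, random_q))
```

With only five pairs, the smallest attainable two-sided p-value is 2/32 = 0.0625 even when every pair favors the same condition, which is one reason small-sample evaluations need care.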

Figures

Figures reproduced from arXiv: 2605.22986 by Andreea Bobu, Helena Merker, Nick Walker.

**Figure 2.** Experimental environments. (a) JacoRobot simulated environment, …

**Figure 3.** Reward recovery in JacoRobot when the initial demonstrations underspecify (a) a single feature or (b) two features. ASQ consistently outperforms the random baselines and approaches Oracle performance, recovering the true reward with fewer targeted demonstrations. Lines show mean normalized reward across 5 seeds and shaded regions denote standard error. … weights θ* and per-feature rationality coefficients β…

**Figure 4.** Reward recovery in JacoRobot in the 8-feature setting that augments the feature set of 4 task-relevant features with 4 irrelevant distractor objects, with (a) one task-relevant feature underspecified and (b) two task-relevant features underspecified. LLM-based filtering prunes the distractors before variance-based detection, allowing ASQ to focus queries on task-relevant features. ASQ matches or exceeds th…
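The Figure 4 caption describes pruning distractor features with an LLM before variance-based detection. The sketch below shows only the shape of that pipeline: `is_relevant` stands in for the language-model call, and the keyword-matching `keyword_judge` in the example is a hypothetical placeholder, not how the paper queries its model. Feature names and values are likewise illustrative.

```python
from typing import Callable

import numpy as np


def filter_then_detect(phi, names, task, is_relevant: Callable[[str, str], bool], threshold=0.02):
    """Drop features judged irrelevant to the task, then flag high-variance survivors.

    phi: (n_demos, n_features) feature values, assumed already scaled to [0, 1].
    """
    phi = np.asarray(phi, dtype=float)
    keep = [j for j, name in enumerate(names) if is_relevant(task, name)]
    var = phi[:, keep].var(axis=0, ddof=1)
    return [names[j] for j, v in zip(keep, var) if v > threshold]


def keyword_judge(task: str, feature_name: str) -> bool:
    """Placeholder for an LLM relevance judgment: relevant if the name appears in the task text."""
    return feature_name in task


task = "move the cup close to the table, away from the laptop and the person"
names = ["table", "laptop", "person", "plant"]   # "plant" is a distractor
demos = [[0.10, 0.2, 0.50, 0.1],
         [0.12, 0.9, 0.52, 0.9],
         [0.11, 0.5, 0.51, 0.4]]
print(filter_then_detect(demos, names, task, keyword_judge))  # ['laptop']
```

Without the filter, the distractor "plant" would also be flagged, because it happens to vary across demonstrations; that wasted query is what the pruning step is meant to avoid.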

**Figure 5.** Change in normalized reward by condition. Hashed indicates an …

**Figure 6.** Whether participants reported emphasizing the underspecified feature …

**Figure 7.** GridRobot navigation simulated environment. Along with the JacoRobot environment, we evaluate our approach in the GridRobot environment: a simulated discrete 2D navigation task. In the GridRobot domain, an agent must navigate from a start position to a goal position on a 5x5 grid while avoiding an obstacle (…

**Figure 8.** GridRobot where the initial demonstrations have one underspecified …

**Figure 9.** Per-participant demonstration feature values across the three user-study conditions. Each column shows one task-relevant feature; rows correspond to …

**Figure 10.** End-effector trajectories from all user study demonstrations. Each panel shows the three demonstrations provided by one participant (rows, P1-P12) …
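The Figure 3 caption fragment mentions ground-truth weights θ* and per-feature rationality coefficients β. One plausible reading, assumed here rather than confirmed by the text, is a simulated demonstrator that is Boltzmann-rational with a separate rationality coefficient per feature, $P(\xi) \propto \exp\big(\sum_i \beta_i \theta^*_i \phi_i(\xi)\big)$. The sketch samples such demonstrations from a finite, randomly generated candidate set and shows that a feature with small $\beta_i$ ends up with far higher variance, which is the signal the detector relies on.

```python
import numpy as np


def sample_demos(candidates, theta, beta, n_demos, rng):
    """Sample demos from a Boltzmann-rational demonstrator with per-feature rationality.

    candidates: (n_candidates, n_features) feature values of candidate trajectories.
    P(xi) is proportional to exp(sum_i beta_i * theta_i * phi_i(xi)); a small beta_i
    means the demonstrator barely attends to feature i.
    """
    scores = candidates @ (np.asarray(beta) * np.asarray(theta))
    p = np.exp(scores - scores.max())  # subtract max for numerical stability
    p /= p.sum()
    idx = rng.choice(len(candidates), size=n_demos, p=p)
    return candidates[idx]


rng = np.random.default_rng(0)
candidates = rng.uniform(0.0, 1.0, size=(2000, 2))
theta_star = np.array([-1.0, -1.0])   # both features should be minimized
beta = np.array([50.0, 0.1])          # feature 0 attended to, feature 1 nearly ignored
demos = sample_demos(candidates, theta_star, beta, n_demos=200, rng=rng)
print(demos.var(axis=0))              # feature 1 varies far more than feature 0
```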

Abstract

Learning reward functions from demonstrations assumes that demonstrations provide adequate supervision over all features -- or task-relevant aspects of behavior. In practice, demonstrations are often imperfect: humans may under-emphasize certain features due to cognitive load or physical difficulty, or the training regime may fail to sufficiently cover all relevant situations. In either case, important features may be underspecified, leading to ambiguity in the learned reward function and misaligned behavior at deployment. We propose a framework that detects such underspecified features and actively solicits targeted corrective demonstrations. Our key insight is that demonstrations implicitly reveal which features are well specified: features that are consistently optimized show little variation across demonstrations, while features that are underspecified vary widely. We leverage this statistical signal to infer which features may have been insufficiently demonstrated. The robot then explains which features it is uncertain about in natural language and queries for demonstrations that explicitly address the identified gaps. We evaluate our approach in a simulated tabletop manipulation domain and in a user study with a real Franka robot. Targeted, explanation-guided queries significantly improve reward recovery compared to random querying and passive data collection, reducing ambiguity that would otherwise persist in learning from imperfect demonstrations.
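The abstract's starting point, learning a reward from demonstrations, is commonly done with maximum-entropy IRL (Ziebart et al., reference [45]). The minimal sketch below works over a small finite set of candidate trajectories rather than a continuous trajectory space, which is a simplifying assumption. It also shows the ambiguity problem in miniature: when the demonstrations disagree on a feature, its learned weight stays at zero, which the learner cannot distinguish from "the human does not care".

```python
import numpy as np


def maxent_irl(demo_phi, candidate_phi, lr=0.5, steps=500):
    """Maximum-entropy IRL over a finite candidate set of trajectories.

    Model: P(xi | theta) is proportional to exp(theta . phi(xi)) over the candidates.
    The log-likelihood gradient is (mean demo features) - (expected features under P).
    """
    demo_phi = np.asarray(demo_phi, dtype=float)
    candidate_phi = np.asarray(candidate_phi, dtype=float)
    theta = np.zeros(candidate_phi.shape[1])
    target = demo_phi.mean(axis=0)
    for _ in range(steps):
        scores = candidate_phi @ theta
        p = np.exp(scores - scores.max())
        p /= p.sum()
        expected = p @ candidate_phi
        theta += lr * (target - expected)   # gradient ascent on the demo log-likelihood
    return theta


# Candidates: all combinations of two binary features.
candidates = [[0, 0], [1, 0], [0, 1], [1, 1]]
# Both demos keep feature 0 at 0 but disagree on feature 1.
demos = [[0, 0], [0, 1]]
theta = maxent_irl(demos, candidates)
print(theta)  # strongly negative weight on feature 0, about 0 on feature 1
```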

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's variation signal for spotting underspecified features plus targeted NL queries is a plausible practical step, but the signal may not distinguish underspecification from noise or multiple optima.


The main move here is to treat cross-demonstration variation as a signal for which features the human left underspecified, then have the robot describe its uncertainty in natural language and request demos that close those specific gaps. That combination of a statistical detector with explanation-guided active querying is the piece that does not appear in the cited prior work on reward learning or active querying alone. The authors run it in a simulated tabletop task and a user study with a physical Franka, reporting that the targeted queries recover rewards better than random queries or passive collection alone. The real-robot evaluation is a concrete plus for the claim that this helps in practice.

The soft spot is exactly the one the stress-test flags: variation can arise from sensor noise, multiple equally good ways to complete the task, or human inconsistency unrelated to underspecification. The abstract gives no mechanism or ablation that isolates the intended cause, so the inferred uncertainty set could be off and the subsequent queries could target the wrong gaps. If the full paper has controls or analysis showing the signal is reliable under those alternatives, that would address the concern; otherwise the central step remains an assumption.

This is for researchers working on reward learning from imperfect human input in robotics. It has a working method, empirical results on hardware, and a clear practical motivation, so it should go to peer review rather than be desk-rejected, with the variation assumption as the main point for referees to check.

Referee Report

1 major / 1 minor

Summary. The manuscript claims that reward functions learned from demonstrations often suffer from underspecified features due to imperfect human input, and proposes detecting these via statistical variation (low variation = well-specified, high variation = underspecified). The robot then generates natural-language explanations of its uncertainty and solicits targeted demonstrations to resolve the gaps. Empirical results in a simulated tabletop domain and a real Franka robot user study show that this targeted querying improves reward recovery over random querying and passive collection.

Significance. If the central result holds, the work provides a concrete mechanism for active resolution of ambiguity in inverse reinforcement learning from imperfect demonstrations, a practical issue in human-robot interaction. The real-robot user study and comparison to both random and passive baselines are strengths that ground the claim in usable settings. The natural-language query interface is a positive contribution to interpretability.

major comments (1)

[Abstract and §3] Abstract (key insight paragraph) and §3 (method description of feature detection): the claim that variation across demonstrations reliably indicates underspecification (vs. sensor/execution noise, multiple equally optimal policies, or unrelated human suboptimality) is load-bearing for the query-selection step. No mechanism, control experiment, or statistical test is described to distinguish these cases, so the inferred uncertainty set and subsequent improvement may not be attributable to the proposed signal.

minor comments (1)

[Experiments section] Figure captions and axis labels in the experimental results could more explicitly state the number of trials, statistical test used, and exact definition of 'reward recovery' metric for reproducibility.
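The request for an exact metric definition is well taken, since the summary never defines "normalized reward". A common convention, assumed here and not confirmed as the paper's, rescales the true reward of the learned policy's trajectory between a worst-case and a best-case reference. The Figure 3 caption does state that curves show a mean over 5 seeds with standard-error bands, which can be computed as below; the per-seed values are made up.

```python
import numpy as np


def normalized_reward(r_learned, r_best, r_worst):
    """Assumed convention: 0 = worst reference trajectory, 1 = optimal trajectory under the true reward."""
    return (r_learned - r_worst) / (r_best - r_worst)


def mean_and_standard_error(values):
    """Mean and standard error of the mean (sample std with ddof=1, divided by sqrt(n))."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(v.size)


# Illustrative true rewards for 5 seeds (not results from the paper).
scores = [normalized_reward(r, r_best=10.0, r_worst=0.0) for r in [7.0, 8.0, 9.0, 8.5, 7.5]]
mean, sem = mean_and_standard_error(scores)
print(f"{mean:.3f} ± {sem:.3f}")
```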

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below.


Referee: [Abstract and §3] Abstract (key insight paragraph) and §3 (method description of feature detection): the claim that variation across demonstrations reliably indicates underspecification (vs. sensor/execution noise, multiple equally optimal policies, or unrelated human suboptimality) is load-bearing for the query-selection step. No mechanism, control experiment, or statistical test is described to distinguish these cases, so the inferred uncertainty set and subsequent improvement may not be attributable to the proposed signal.

Authors: We agree that variation across demonstrations is not a unique indicator of underspecification and may also arise from sensor or execution noise, multiple equally optimal policies, or other forms of human suboptimality. The manuscript presents variation as a practical statistical proxy for identifying features that may require additional supervision, without providing an explicit mechanism, control experiment, or statistical test to isolate the underlying cause. The empirical improvements over random querying and passive collection are demonstrated in the tabletop domain and Franka user study, but these results do not rule out alternative explanations for the observed variation. In the revised manuscript we will add explicit discussion of this assumption and its limitations in §3, including a dedicated paragraph on potential confounding factors and their implications for the uncertainty set. revision: yes
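The confound conceded here is easy to reproduce in a toy setting. The sketch below uses hypothetical values and a hypothetical variance cutoff of 0.02 to show two ways a feature the demonstrator did care about can still look highly variable: heavy execution or sensor noise, and two equally good solutions (for example, passing an obstacle on the left or on the right, encoded as a signed offset).

```python
import numpy as np

THRESHOLD = 0.02  # hypothetical variance cutoff for "underspecified"


def var_with_noise(optimal_value, noise_std, n_demos, rng):
    """Feature the demonstrator optimized, observed through Gaussian execution/sensor noise."""
    observed = optimal_value + rng.normal(0.0, noise_std, size=n_demos)
    return observed.var(ddof=1)


def var_with_two_optima(value_a, value_b, n_demos, rng):
    """Feature with two equally optimal values; each demo picks one at random."""
    observed = rng.choice([value_a, value_b], size=n_demos)
    return observed.var(ddof=1)


rng = np.random.default_rng(0)
print(var_with_noise(0.2, 0.01, 50, rng) > THRESHOLD)      # False: optimized and precise
print(var_with_noise(0.2, 0.3, 50, rng) > THRESHOLD)       # True: optimized but noisy
print(var_with_two_optima(0.2, 0.8, 50, rng) > THRESHOLD)  # True: optimized, two optima
```

In both of the last two cases a pure variance test would query a feature that was never underspecified, which is the failure mode the referee asks the authors to rule out.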

Circularity Check

0 steps flagged

No circularity: central claim uses direct statistical signal from input demonstrations

full rationale

The paper's key step infers underspecified features from variation across demonstrations as an empirical observation ('features that are consistently optimized show little variation across demonstrations, while features that are underspecified vary widely'), without any quoted equations, fitted parameters renamed as predictions, or self-citation chains that reduce the result to its own inputs by construction. The framework then uses this signal for natural-language queries and evaluates against random/passive baselines in simulation and user studies, remaining self-contained without load-bearing self-references or ansatz smuggling. This matches the most common honest finding of no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach depends on the domain assumption that variation across demonstrations is a valid proxy for underspecification and that language-based queries will produce corrective data; no free parameters or invented entities are described in the abstract.

axioms (2)

Domain assumption: Demonstrations implicitly reveal which features are well specified through low variation across examples.
Stated as the key insight enabling detection of underspecified features.
Domain assumption: Natural language explanations can effectively elicit targeted corrective demonstrations from users.
Required for the querying component of the framework to function.

pith-pipeline@v0.9.0 · 5745 in / 1144 out tokens · 26662 ms · 2026-05-25T05:34:55.139014+00:00 · methodology


Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

[1–2] Pieter Abbeel and Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning, ICML ’04, page 1, New York, NY, USA. Association for Computing Machinery. ISBN 1581138385. DOI: 10.1145/1015330.1015430
[3] Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. Guidelines for human-AI interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, pages 1–13, New York, NY, USA, 2019... DOI: 10.1145/3290605.3300233
[4] Andrea Bajcsy, Dylan P. Losey, Marcia K. O’Malley, and Anca D. Dragan. Learning robot objectives from physical human interaction. In Conference on Robot Learning, pages 217–226. PMLR, 2017. URL http://proceedings.mlr.press/v78/bajcsy17a.html
[5] Andrea Bajcsy, Dylan P. Losey, Marcia K. O’Malley, and Anca D. Dragan. Learning from physical human corrections, one feature at a time. In Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, HRI 2018, Chicago, IL, USA, March 05-08, 2018, pages 141–149. ACM, 2018. DOI: 10.1145/3171221.3171267
[6] Chris L. Baker, Joshua B. Tenenbaum, and Rebecca R. Saxe. Goal inference as inverse planning. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 29, 2007.
[7] Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1):1–48, 2015. DOI: 10.18637/jss.v067.i01
[8] Mark Beliaev and Ramtin Pedarsani. Inverse reinforcement learning by estimating expertise of demonstrators. In AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA, pages 15532–15540. AAAI Press, 2025. DOI: 10.1609/AAAI.V39I15.33705
[9] Suneel Belkhale, Yuchen Cui, and Dorsa Sadigh. Data quality in imitation learning. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY, USA, 2023. Curran Associates Inc. URL http://papers.nips.cc/paper_files/paper/2023/hash/fe692980c5d9732cf153ce27947653a7-Abstract-Conference.html
[10] A. Bobu, A. Bajcsy, J. F. Fisac, and A. D. Dragan. Learning under misspecified objective spaces. In Conference on Robot Learning (CoRL), 2018. URL http://proceedings.mlr.press/v87/bobu18a.html
[11] A. Bobu, A. Bajcsy, J. F. Fisac, S. Deglurkar, and A. D. Dragan. Quantifying hypothesis space misspecification in learning from human–robot demonstrations and physical corrections. Transactions on Robotics (T-RO), 2020. DOI: 10.1109/TRO.2020.2971415
[12] Andreea Bobu, Marius Wiggert, Claire Tomlin, and Anca D. Dragan. Inducing structure in reward learning by learning features. The International Journal of Robotics Research, 41(5):497–518, 2022. DOI: 10.1177/02783649221078031
[13] Daniel S. Brown, Wonjoon Goo, and Scott Niekum. Better-than-demonstrator imitation learning via automatically-ranked demonstrations. In 3rd Annual Conference on Robot Learning, CoRL 2019, Osaka, Japan, October 30 - November 1, 2019, Proceedings, volume 100 of Proceedings of Machine Learning Research, pages 330–359. PMLR, 2019. URL http://proceedings.mlr.pre...
[14] Daniel S. Brown, Russell Coleman, Ravi Srinivasan, and Scott Niekum. Safe imitation learning via fast Bayesian reward inference from preferences. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 1165–1177. PMLR, 2020. URL http://p...
[15] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Túlio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experiments with GPT-4. CoRR, abs/2303.12712, 2023. DOI: 10.48550/ARXIV.2303.12712
[16] Maya Cakmak and Andrea L. Thomaz. Optimality of human teachers for robot learners. In 2010 IEEE 9th International Conference on Development and Learning, pages 64–69, 2010. DOI: 10.1109/DEVLRN.2010.5578865
[17] Maya Cakmak and Andrea L. Thomaz. Designing robot learners that ask good questions. In Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI ’12, pages 17–24, New York, NY, USA, 2012. Association for Computing Machinery. ISBN 9781450310635. DOI: 10.1145/2157689.2157693
[18] Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 4299–4307, 2017. URL https://proceedings.neuri...
[19] Chelsea Finn, Sergey Levine, and Pieter Abbeel. Guided cost learning: Deep inverse optimal control via policy optimization. In ICML, 2016. URL http://proceedings.mlr.press/v48/finn16.html
[20] Jaime F. Fisac, Andrea Bajcsy, Sylvia L. Herbert, David Fridovich-Keil, Steven Wang, Claire J. Tomlin, and Anca D. Dragan. Probabilistically safe robot planning with confidence-based human predictions. In Robotics: Science and Systems XIV, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA, June 26-30, 2018, 2018. DOI: 10.15607/RSS.2018.XIV.069
[21] Morris Gu, Elizabeth Croft, and Dana Kulić. Demonstration based explainable AI for learning from demonstration methods. IEEE Robotics and Automation Letters, 10(7):6552–6559, 2025. DOI: 10.1109/LRA.2025.3568617
[22] Soheil Habibian, Antonio Alvarez Valdivia, Laura H. Blumenschein, and Dylan P. Losey. A survey of communicating robot learning during human-robot interaction. The International Journal of Robotics Research, 44(4):665–698, 2025. DOI: 10.1177/02783649241281369
[23] Sandra G. Hart and Lowell E. Staveland. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Advances in Psychology, 52:139–183, 1988. DOI: 10.1016/S0166-4115(08)62386-9
[24] Sandy H. Huang, David Held, Pieter Abbeel, and Anca D. Dragan. Enabling robots to communicate their objectives. Auton. Robots, 43(2):309–326, 2019. DOI: 10.1007/S10514-018-9771-0
[25–26] Minyoung Hwang, Alexandra Forsey-Smerek, Nathaniel Dennler, and Andreea Bobu. Masked IRL: LLM-guided reward disambiguation from demonstrations and language. URL https://arxiv.org/abs/2511.14565
[27] E. T. Jaynes. Information theory and statistical mechanics. Phys. Rev., 106:620–630, May 1957. DOI: 10.1103/PhysRev.106.620
[28] Alexandra Kuznetsova, Per B. Brockhoff, and Rune H. B. Christensen. lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13):1–26, 2017. DOI: 10.18637/jss.v082.i13
[29] Minae Kwon, Sandy H. Huang, and Anca D. Dragan. Expressing robot incapability. In Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, HRI ’18, pages 87–95, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450349536. DOI: 10.1145/3171221.3171276
[30] Russell V. Lenth and Julia Piaskowski. emmeans: Estimated Marginal Means, aka Least-Squares Means, 2025. URL https://rvlenth.github.io/emmeans/. R package version 2.0.1
[31] Chang Liu, Jessica B. Hamrick, Jaime F. Fisac, Anca D. Dragan, J. Karl Hedrick, S. Shankar Sastry, and Thomas L. Griffiths. Goal inference improves objective and perceived performance in human-robot collaboration. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, Singapore, May 9-13, 2016, pages 940–948. ACM, 20...
[32–33] Andrew Y. Ng and Stuart Russell. Algorithms for inverse reinforcement learning. In Pat Langley, editor, Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29 - July 2, 2000, pages 663–… Morgan Kaufmann, 2000.
[34] Andi Peng, Aviv Netanyahu, Mark K. Ho, Tianmin Shu, Andreea Bobu, Julie Shah, and Pulkit Agrawal. Diagnosis, feedback, adaptation: A human-in-the-loop framework for test-time policy adaptation. In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 2...
[35] Andi Peng, Andreea Bobu, Belinda Z. Li, Theodore R. Sumers, Ilia Sucholutsky, Nishanth Kumar, Thomas L. Griffiths, and Julie A. Shah. Preference-conditioned language-guided abstraction. In Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, HRI 2024, Boulder, CO, USA, March 11-15, 2024, pages 572–581. ACM, 2024. DOI: 10.1145/3610977.3634930
[36–37] Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie Shah, Jacob Andreas, and Andreea Bobu. Adaptive language-guided abstraction from contrastive explanations. In Conference on Robot Learning, 6-9 November 2024, Munich, Germany, volume 270 of Proceedings of Machine Learning Research, pages 3425–… PMLR, 2024. URL https://proceedings.mlr.press/v270/peng25c.html
[38] Daniel Rakita, Bilge Mutlu, and Michael Gleicher. An autonomous dynamic camera method for effective remote teleoperation. In Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, HRI 2018, Chicago, IL, USA, March 05-08, 2018, pages 325–333. ACM, 2018. DOI: 10.1145/3171221.3171279
[39] Harish Ravichandar, Athanasios S. Polydoros, Sonia Chernova, and Aude Billard. Recent advances in robot learning from demonstration. Annu. Rev. Control Robotics Auton. Syst., 3:297–330, 2020. DOI: 10.1146/ANNUREV-CONTROL-100819-063206
[40] Dorsa Sadigh, Anca D. Dragan, Shankar Sastry, and Sanjit A. Seshia. Active preference-based learning of reward functions. In Robotics: Science and Systems XIII, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA, July 12-16, 2017, 2017. DOI: 10.15607/RSS.2017.XIII.053
[41] Maram Sakr, Juyan Zhang, H. F. Machiel Van der Loos, Dana Kulić, and Elizabeth Croft. Consistency matters: Defining demonstration data quality metrics in robot learning from demonstration. J. Hum.-Robot Interact., 15(2), December 2025. DOI: 10.1145/3773904
[42] Pratyusha Sharma, Balakumar Sundaralingam, Valts Blukis, Chris Paxton, Tucker Hermans, Antonio Torralba, Jacob Andreas, and Dieter Fox. Correcting robot plans with natural language feedback. In Robotics: Science and Systems XVIII, New York City, NY, USA, June 27 - July 1, 2022, 2022. DOI: 10.15607/RSS.2022.XVIII.065
[43] John Sweller. Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2):257–285, 1988. ISSN 0364-0213. DOI: 10.1016/0364-0213(88)90023-7
[44] John Von Neumann and Oskar Morgenstern. Theory of games and economic behavior. Princeton University Press, Princeton, NJ, 1945.
[45] Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, Anind K. Dey, et al. Maximum entropy inverse reinforcement learning. In AAAI, volume 8, pages 1433–1438. Chicago, IL, USA, 2008.

Appendix A. Implementation Details. Reference Variance Distributions. To construct the reference distributions $P(\sigma_i^2 \mid o_i = 0, \phi_i)$ and $P(\sigma_i^2 \mid o_i = 1, \phi_i)$ for each feature $\phi_i \in \phi$, ...


work page 2019

[14] [14]

Brown, Russell Coleman, Ravi Srinivasan, and Scott Niekum

Daniel S. Brown, Russell Coleman, Ravi Srinivasan, and Scott Niekum. Safe imitation learning via fast bayesian reward inference from preferences. InProceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 ofProceedings of Machine Learning Research, pages 1165–1177. PMLR, 2020. URL http://p...

work page 2020

[15] [15]

Sparks of Artificial General Intelligence: Early experiments with GPT-4

S ´ebastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott M. Lundberg, Har- sha Nori, Hamid Palangi, Marco T ´ulio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experiments with GPT-4.CoRR, abs/2303.12712, 2023. DOI: 10.48550/ARXIV .2303.12712

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv 2023

[16] [16]

Maya Cakmak and Andrea L. Thomaz. Optimal- ity of human teachers for robot learners. In2010 IEEE 9th International Conference on Development and Learning, pages 64–69, 2010.DOI: 10.1109/DE- VLRN.2010.5578865

work page doi:10.1109/de- 2010

[17] [17]

Maya Cakmak and Andrea L. Thomaz. Designing robot learners that ask good questions. InProceedings of the Seventh Annual ACM/IEEE International Con- ference on Human-Robot Interaction, HRI ’12, page 17–24, New York, NY , USA, 2012. Association for Computing Machinery. ISBN 9781450310635.DOI: 10.1145/2157689.2157693

work page doi:10.1145/2157689.2157693 2012

[18] [18]

Christiano, Jan Leike, Tom B

Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 4299–4307, 2017. URL https://proceedings.neuri...

work page 2017

[19] [19]

Guided cost learning: Deep inverse optimal control via policy optimization

Chelsea Finn, Sergey Levine, and Pieter Abbeel. Guided cost learning: Deep inverse optimal control via policy optimization. InICML, 2016. URL http://proceedings. mlr.press/v48/finn16.html

work page 2016

[20] [20]

PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes,

Jaime F. Fisac, Andrea Bajcsy, Sylvia L. Herbert, David Fridovich-Keil, Steven Wang, Claire J. Tomlin, and Anca D. Dragan. Probabilistically safe robot planning with confidence-based human predictions. InRobotics: Science and Systems XIV , Carnegie Mellon University, Pittsburgh, Pennsylvania, USA, June 26-30, 2018, 2018. DOI: 10.15607/RSS.2018.XIV .069

work page doi:10.15607/rss.2018.xiv 2018

[21] [21]

Demonstra- tion based explainable ai for learning from demonstration methods.IEEE Robotics and Automation Letters, 10(7): 6552–6559, 2025.DOI: 10.1109/LRA.2025.3568617

Morris Gu, Elizabeth Croft, and Dana Kuli ´c. Demonstra- tion based explainable ai for learning from demonstration methods.IEEE Robotics and Automation Letters, 10(7): 6552–6559, 2025.DOI: 10.1109/LRA.2025.3568617

work page doi:10.1109/lra.2025.3568617 2025

[22] [22]

Blumenschein, and Dylan P

Soheil Habibian, Antonio Alvarez Valdivia, Laura H. Blumenschein, and Dylan P. Losey. A survey of commu- nicating robot learning during human-robot interaction. The International Journal of Robotics Research, 44(4): 665–698, 2025.DOI: 10.1177/02783649241281369

work page doi:10.1177/02783649241281369 2025

[23] [23]

Hart and Lowell E

Sandra G. Hart and Lowell E. Staveland. Development of nasa-tlx (task load index): Results of empirical and theoretical research.Advances in Psychology, 52:139– 183, 1988.DOI: 10.1016/S0166-4115(08)62386-9

work page doi:10.1016/s0166-4115(08)62386-9 1988

[24] [24]

Huang, David Held, Pieter Abbeel, and Anca D

Sandy H. Huang, David Held, Pieter Abbeel, and Anca D. Dragan. Enabling robots to communicate their objectives.Auton. Robots, 43(2):309–326, 2019.DOI: 10.1007/S10514-018-9771-0

work page doi:10.1007/s10514-018-9771-0 2019

[25] [25]

Masked irl: Llm-guided re- ward disambiguation from demonstrations and language,

Minyoung Hwang, Alexandra Forsey-Smerek, Nathaniel Dennler, and Andreea Bobu. Masked irl: Llm-guided re- ward disambiguation from demonstrations and language,

work page

[26] [26]

URL https://arxiv.org/abs/2511.14565

work page arXiv

[27] [27]

E. T. Jaynes. Information theory and statistical me- chanics.Phys. Rev., 106:620–630, May 1957.DOI: 10.1103/PhysRev.106.620

work page doi:10.1103/physrev.106.620 1957

[28] [28]

Brockhoff, and Rune H

Alexandra Kuznetsova, Per B. Brockhoff, and Rune H. B. Christensen. lmertest package: Tests in linear mixed effects models.Journal of Statistical Software, 82(13): 1–26, 2017.DOI: 10.18637/jss.v082.i13

work page doi:10.18637/jss.v082.i13 2017

[29] [29]

Huang, and Anca D

Minae Kwon, Sandy H. Huang, and Anca D. Dragan. Expressing robot incapability. InProceedings of the 2018 ACM/IEEE International Conference on Human- Robot Interaction, HRI ’18, page 87–95, New York, NY , USA, 2018. Association for Computing Machinery. ISBN 9781450349536.DOI: 10.1145/3171221.3171276

work page doi:10.1145/3171221.3171276 2018

[30] [30]

Lenth and Julia Piaskowski.emmeans: Esti- mated Marginal Means, aka Least-Squares Means, 2025

Russell V . Lenth and Julia Piaskowski.emmeans: Esti- mated Marginal Means, aka Least-Squares Means, 2025. URL https://rvlenth.github.io/emmeans/. R package ver- sion 2.0.1

work page 2025

[31] [31]

Hamrick, Jaime F

Chang Liu, Jessica B. Hamrick, Jaime F. Fisac, Anca D. Dragan, J. Karl Hedrick, S. Shankar Sastry, and Thomas L. Griffiths. Goal inference improves objective and perceived performance in human-robot collaboration. InProceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, Singapore, May 9-13, 2016, pages 940–948. ACM, 20...

work page 2016

[32] [32]

Ng and Stuart Russell

Andrew Y . Ng and Stuart Russell. Algorithms for inverse reinforcement learning. In Pat Langley, editor, Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29 - July 2, 2000, pages 663–

work page 2000

[33] [33]

Morgan Kaufmann, 2000

work page 2000

[34] [34]

Ho, Tianmin Shu, Andreea Bobu, Julie Shah, and Pulkit Agrawal

Andi Peng, Aviv Netanyahu, Mark K. Ho, Tianmin Shu, Andreea Bobu, Julie Shah, and Pulkit Agrawal. Diagnosis, feedback, adaptation: A human-in-the-loop framework for test-time policy adaptation. InInterna- tional Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 ofProceedings of Machine Learning Research, pages 2...

work page 2023

[35] [35]

Li, Theodore R

Andi Peng, Andreea Bobu, Belinda Z. Li, Theodore R. Sumers, Ilia Sucholutsky, Nishanth Kumar, Thomas L. Griffiths, and Julie A. Shah. Preference-conditioned language-guided abstraction. InProceedings of the 2024 ACM/IEEE International Conference on Human- Robot Interaction, HRI 2024, Boulder, CO, USA, March 11-15, 2024, pages 572–581. ACM, 2024.DOI: 10.11...

work page doi:10.1145/3610977.3634930 2024

[36] [36]

Li, Ilia Sucholutsky, Nishanth Kumar, Julie Shah, Jacob Andreas, and Andreea Bobu

Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie Shah, Jacob Andreas, and Andreea Bobu. Adaptive language-guided abstraction from contrastive explanations. InConference on Robot Learning, 6- 9 November 2024, Munich, Germany, volume 270 of Proceedings of Machine Learning Research, pages 3425–

work page 2024

[37] [37]

URL https://proceedings.mlr.press/ v270/peng25c.html

PMLR, 2024. URL https://proceedings.mlr.press/ v270/peng25c.html

work page 2024

[38] [38]

Huang, and Anca D

Daniel Rakita, Bilge Mutlu, and Michael Gleicher. An autonomous dynamic camera method for effective remote teleoperation. InProceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, HRI 2018, Chicago, IL, USA, March 05-08, 2018, pages 325–333. ACM, 2018.DOI: 10.1145/3171221.3171279

work page doi:10.1145/3171221.3171279 2018

[39] [39]

Polydoros, Sonia Chernova, and Aude Billard

Harish Ravichandar, Athanasios S. Polydoros, Sonia Chernova, and Aude Billard. Recent advances in robot learning from demonstration.Annu. Rev. Con- trol. Robotics Auton. Syst., 3:297–330, 2020.DOI: 10.1146/ANNUREV-CONTROL-100819-063206

work page doi:10.1146/annurev-control-100819-063206 2020

[40] [40]

Dragan, Shankar Sastry, and Sanjit A

Dorsa Sadigh, Anca D. Dragan, Shankar Sastry, and Sanjit A. Seshia. Active preference-based learning of reward functions. InRobotics: Science and Systems XIII, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA, July 12-16, 2017, 2017.DOI: 10.15607/RSS.2017.XIII.053

work page doi:10.15607/rss.2017.xiii.053 2017

[41] [41]

Maram Sakr, Juyan Zhang, H. F. Machiel Van der Loos, Dana Kuli ´c, and Elizabeth Croft. Consistency matters: Defining demonstration data quality metrics in robot learning from demonstration.J. Hum.-Robot Interact., 15(2), December 2025.DOI: 10.1145/3773904

work page doi:10.1145/3773904 2025

[42] [42]

Correcting robot plans with natural language feedback

Pratyusha Sharma, Balakumar Sundaralingam, Valts Blukis, Chris Paxton, Tucker Hermans, Antonio Torralba, Jacob Andreas, and Dieter Fox. Correcting robot plans with natural language feedback. InRobotics: Science and Systems XVIII, New York City, NY, USA, June 27 - July 1, 2022, 2022.DOI: 10.15607/RSS.2022.XVIII.065

work page doi:10.15607/rss.2022.xviii.065 2022

[43] [43]

Cognitive load during problem solving: Effects on learning.Cognitive Science, 12(2):257– 285, 1988

John Sweller. Cognitive load during problem solving: Effects on learning.Cognitive Science, 12(2):257– 285, 1988. ISSN 0364-0213.DOI: 10.1016/0364- 0213(88)90023-7

work page doi:10.1016/0364- 1988

[44] [44]

Princeton University Press Princeton, NJ, 1945

John V on Neumann and Oskar Morgenstern.Theory of games and economic behavior. Princeton University Press Princeton, NJ, 1945

work page 1945

[45] [45]

bad at this task

Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, Anind K Dey, et al. Maximum entropy inverse reinforce- ment learning. InAAAI, volume 8, pages 1433–1438. Chicago, IL, USA, 2008. APPENDIX A. Implementation Details Reference Variance Distributions.To construct the reference distributionsP(σ 2 i |o i = 0, ϕ i)andP(σ 2 i |o i = 1, ϕ i)for each featureϕ i ∈ϕ,...

work page 2008
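The appendix fragment stops before explaining how the two reference distributions are used. Given the paper's premise that low variance across demonstrations signals a well-specified feature, one natural use is a Bayes-rule comparison of the observed variance $\sigma_i^2$ against the two references. The Python sketch below is hypothetical and is not the authors' implementation. It assumes that $o_i = 1$ means the demonstrator attended to feature $\phi_i$ and $o_i = 0$ means they did not (the fragment does not define $o_i$), that each reference distribution is available as a set of sampled variances, that $\log \sigma_i^2$ is roughly Gaussian under each hypothesis, and that the prior on $o_i$ is uniform. The function names and the feature names in the usage example are illustrative.

```python
import math
from statistics import NormalDist, pvariance


def fit_reference(variance_samples):
    """Fit a reference density over log-variance from sampled variances.

    Hypothetical modelling choice: log(sigma^2) is treated as Gaussian under
    each hypothesis o_i. The Jacobian 1/sigma^2 of the log transform is the
    same for both hypotheses, so it cancels in the posterior ratio below.
    """
    return NormalDist.from_samples([math.log(v) for v in variance_samples])


def posterior_attended(feature_values, ref_o0, ref_o1, prior_o1=0.5):
    """Return P(o_i = 1 | sigma_i^2) for one feature via Bayes' rule.

    feature_values: phi_i evaluated on each demonstration (assumed scalar).
    ref_o0, ref_o1: fitted reference densities for o_i = 0 and o_i = 1.
    """
    s2 = pvariance(feature_values)      # variance of phi_i across demonstrations
    x = math.log(max(s2, 1e-12))        # guard against log(0) for constant features
    joint_o1 = ref_o1.pdf(x) * prior_o1
    joint_o0 = ref_o0.pdf(x) * (1.0 - prior_o1)
    total = joint_o1 + joint_o0
    # If both likelihoods underflow to zero, the data are uninformative: return the prior.
    return joint_o1 / total if total > 0 else prior_o1


def flag_underspecified(demo_features, references, threshold=0.5):
    """Names of features whose posterior of being attended falls below threshold.

    demo_features: {name: [phi_i(demo_1), phi_i(demo_2), ...]}
    references:    {name: (ref_o0, ref_o1)}
    """
    flagged = []
    for name, values in demo_features.items():
        ref_o0, ref_o1 = references[name]
        if posterior_attended(values, ref_o0, ref_o1) < threshold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical reference samples: attended features show low variance.
    ref_o1 = fit_reference([0.01, 0.02, 0.015, 0.03, 0.012])
    ref_o0 = fit_reference([0.5, 0.8, 1.2, 0.6, 0.9])
    demos = {
        "distance_to_table": [1.0, 1.2, 0.9, 1.1],   # consistent across demos
        "distance_to_laptop": [0.2, 1.5, 0.8, 2.1],  # varies widely
    }
    refs = {name: (ref_o0, ref_o1) for name in demos}
    print(flag_underspecified(demos, refs))  # → ['distance_to_laptop']
```

Sharing one reference pair across both features is only for brevity; because the source conditions both distributions on $\phi_i$, each feature would normally get its own pair. The features returned by `flag_underspecified` are the candidates a robot would describe in its explanation and ask the user to demonstrate.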