arxiv: 2511.17855 · v4 · pith:QKJPCUZFnew · submitted 2025-11-22 · 💻 cs.AI · cs.RO

QuickLAP: Quick Language-Action Preference Learning for Semi-Autonomous Agents

Pith reviewed 2026-05-21 17:54 UTC · model grok-4.3

classification 💻 cs.AI cs.RO
keywords reward learning · language-action fusion · Bayesian inference · semi-autonomous agents · human-robot interaction · preference learning · multimodal feedback

The pith

QuickLAP treats language as a probabilistic observation of the user's latent preferences and fuses it with physical corrections in a closed-form Bayesian update for real-time reward learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Robots receive two kinds of feedback: physical corrections, which are grounded but ambiguous in intent, and language, which is high-level but ungrounded. QuickLAP fuses both modalities by using large language models to extract reward feature attention masks and preference shifts from free-form utterances, then integrates these as observations in a Bayesian framework with physical feedback. This produces a real-time update rule that handles ambiguous feedback and, in a semi-autonomous driving simulator, reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines. A user study with fifteen participants shows the resulting behaviors are rated more understandable and collaborative, and are preferred over physical-only or heuristic baselines.

Core claim

The paper establishes that language can be modeled as a probabilistic observation over the user's latent reward preferences, allowing a Bayesian update that combines LLM-parsed attention masks and preference shifts with physical corrections to infer accurate reward functions quickly and robustly, achieving over 70 percent lower learning error than physical-only and heuristic multimodal baselines.

What carries the argument

The closed-form Bayesian update rule that treats language-derived reward feature attention masks and preference shifts as probabilistic observations over latent preferences.
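To make the idea of a closed-form update concrete, here is a minimal sketch. It is not the paper's actual rule, which this page does not reproduce. It assumes independent Gaussian beliefs over each reward weight, a physical correction that observes the weights directly, and a language observation that suggests moving the current weights by a shift. The attention mask gates both observations, so features the utterance does not mention stay at the prior. The function name `fuse` and every parameter name are hypothetical.

```python
import numpy as np

def fuse(prior_mean, prior_var, phys_obs, phys_var, shift, shift_var, mask):
    """Precision-weighted fusion of a Gaussian prior with two Gaussian observations.

    Illustrative assumptions (not from the paper): features are independent,
    and the mask (0 or 1 per feature) switches both observations on or off.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    mask = np.asarray(mask, dtype=float)

    # Precisions (inverse variances); masked-out features get zero precision.
    prior_prec = 1.0 / np.asarray(prior_var, dtype=float)
    phys_prec = mask / np.asarray(phys_var, dtype=float)
    lang_prec = mask / np.asarray(shift_var, dtype=float)

    # Language is read as "the weight should be the current mean plus a shift".
    lang_obs = prior_mean + np.asarray(shift, dtype=float)

    post_prec = prior_prec + phys_prec + lang_prec
    post_mean = (
        prior_prec * prior_mean
        + phys_prec * np.asarray(phys_obs, dtype=float)
        + lang_prec * lang_obs
    ) / post_prec
    return post_mean, 1.0 / post_prec

# Two features; the utterance attends only to the first one.
mean, var = fuse(
    prior_mean=[0.0, 0.0], prior_var=[1.0, 1.0],
    phys_obs=[1.0, 1.0], phys_var=[1.0, 1.0],
    shift=[2.0, 0.0], shift_var=[1.0, 1.0],
    mask=[1, 0],
)
print(mean, var)
```

Because every term is Gaussian, the posterior is available without sampling or iteration, which is the property that makes a real-time update plausible.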

If this is right

  • Semi-autonomous agents can adapt their behavior in real time to ambiguous multimodal feedback without requiring extensive physical demonstrations.
  • The learned reward functions produce trajectories that users rate as more understandable and collaborative.
  • Preference shifts expressed in language can be directly incorporated into ongoing physical correction updates.
  • The framework scales to handling mixed feedback in dynamic environments like driving simulators.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same fusion approach could apply to other domains such as robotic manipulation where language clarifies goals during physical guidance.
  • Reducing reliance on purely physical feedback might lower the cognitive load on human operators in long sessions.
  • If LLM extraction quality improves over time, the method could generalize to less structured language without retraining the Bayesian core.

Load-bearing premise

Large language models can reliably extract accurate reward feature attention masks and preference shifts from free-form user utterances without introducing substantial bias or error.

What would settle it

An experiment in the same driving simulator where LLM extractions from utterances are deliberately noisy or biased, resulting in reward learning error no lower than physical-only baselines.

Figures

Figures reproduced from arXiv: 2511.17855 by Andreea Bobu, David Lee, Jordan Abi Nader, Nathaniel Dennler.

Figure 1. (Top) User Study Setup. Participants controlled the ...
Figure 2. Example scenarios created from our four exper...
Figure 3. We ran our experiments on a single CPU and used up to ...
Figure 3. Comparison of adaptation methods across different environments for 4 interventions per episode. (a) Bars represent ...
Figure 4. User study results. All error bars represent standard error. (a) Average ratings for ...
Figure 5. Graphical model for QuickLAP. The robot opti...
Figure 6. Trade-off between physical correction weight ( ...
Original abstract

Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language expresses high-level goals but lacks physical grounding. We introduce QuickLAP: Quick Language-Action Preference learning, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. Our key insight is to treat language as a probabilistic observation over the user's latent preferences, clarifying which reward features matter and how physical corrections should be interpreted. QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule. This enables fast, real-time, and robust reward learning that handles ambiguous feedback. In a semi-autonomous driving simulator, QuickLAP reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines. A 15-participant user study further validates our approach: participants found QuickLAP significantly more understandable and collaborative, and preferred its learned behavior over baselines. Code is available at https://github.com/MIT-CLEAR-Lab/QuickLAP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces QuickLAP, a Bayesian framework for real-time reward learning in semi-autonomous agents that fuses physical corrections with language feedback. LLMs extract reward feature attention masks and preference shifts from free-form utterances, which are integrated via a closed-form update rule. In a semi-autonomous driving simulator, it reports over 70% reduction in reward learning error versus physical-only and heuristic multimodal baselines. A 15-participant user study finds the approach more understandable, collaborative, and preferable, with code released at a GitHub repository.

Significance. If the LLM extraction step proves reliable, the work offers a practical advance in multimodal preference learning for human-robot interaction, enabling faster and more natural reward inference than unimodal baselines. The closed-form update and public code are strengths that support reproducibility and potential adoption. The significance is limited by the absence of quantified validation for the LLM component, which directly affects whether the reported error reductions generalize.

major comments (2)
  1. [Evaluation and Methods] The central performance claim (over 70% error reduction) depends on treating LLM outputs as reliable probabilistic observations in the Bayesian update. No per-utterance accuracy metrics, human inter-annotator agreement, or sensitivity analysis on mask noise propagation appear in the evaluation; without these, it is unclear whether the reported gains hold under realistic utterance ambiguity (e.g., safety vs. comfort trade-offs).
  2. [Framework and Update Rule] The closed-form update rule integrates LLM-derived attention masks and preference shifts directly as observations. A concrete test of robustness—such as injecting controlled noise into the masks and measuring posterior shift—is missing, making it difficult to bound how LLM variance would affect the posterior mean and the claimed improvement over baselines.
minor comments (2)
  1. [User Study] The user-study section would benefit from explicit reporting of statistical tests (e.g., p-values or effect sizes) and the precise wording of preference questions to allow independent assessment of the qualitative findings.
  2. [Notation and Preliminaries] Notation for the attention mask and preference shift variables should be defined once in the main text with a clear mapping to the LLM prompt template.
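The validation that the major comments ask for can be sketched in a few lines. This is a minimal illustration, assuming each utterance has been reduced to a list of binary per-feature mask labels; the function names and the toy labels are hypothetical and no such data is reported in the paper. `accuracy` compares LLM masks to human labels, and `cohen_kappa` measures chance-corrected agreement between two annotators.

```python
from collections import Counter

def accuracy(pred, gold):
    """Fraction of labels where the prediction matches the reference."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from each annotator's label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Toy labels, made up for illustration only.
llm_mask = [1, 0, 1, 1]
human_mask = [1, 0, 0, 1]
annotator_a = [1, 1, 0, 0]
annotator_b = [1, 0, 0, 0]
print(accuracy(llm_mask, human_mask), cohen_kappa(annotator_a, annotator_b))
```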

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We have reviewed the major comments concerning the evaluation of the LLM component and the robustness of the update rule. We provide detailed responses below and will make revisions to address these points.

Point-by-point responses
  1. Referee: [Evaluation and Methods] The central performance claim (over 70% error reduction) depends on treating LLM outputs as reliable probabilistic observations in the Bayesian update. No per-utterance accuracy metrics, human inter-annotator agreement, or sensitivity analysis on mask noise propagation appear in the evaluation; without these, it is unclear whether the reported gains hold under realistic utterance ambiguity (e.g., safety vs. comfort trade-offs).

    Authors: We agree with the referee that additional validation of the LLM extraction step would strengthen the paper. Although our simulator experiments and user study demonstrate the overall benefits of the multimodal fusion, we did not include direct metrics on LLM accuracy in the original submission. In the revised version, we will add per-utterance accuracy metrics by annotating a set of utterances with human labels for feature attention masks and preference shifts, and report agreement with LLM outputs. We will also include inter-annotator agreement scores and a sensitivity analysis showing how noise in the masks affects the learning error. This will address concerns about utterance ambiguity.

    Revision: yes

  2. Referee: [Framework and Update Rule] The closed-form update rule integrates LLM-derived attention masks and preference shifts directly as observations. A concrete test of robustness—such as injecting controlled noise into the masks and measuring posterior shift—is missing, making it difficult to bound how LLM variance would affect the posterior mean and the claimed improvement over baselines.

    Authors: We acknowledge that a specific robustness test for the closed-form update is valuable. To bound the effect of LLM variance, we will add an experiment in the revised manuscript that injects controlled noise into the LLM-derived masks and shifts. We will vary the noise level and report the resulting changes to the posterior mean and the reward learning error compared to baselines. This will provide quantitative bounds on how LLM inaccuracies propagate through the Bayesian update.

    Revision: yes
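The noise-injection experiment discussed in the responses above could look roughly like the following sketch. It is an assumption-laden illustration, not the authors' planned protocol: mask entries are flipped with a fixed probability, Gaussian noise is added to the shifts, and `posterior_mean` is a deliberately simple stand-in for the real update rule. All names and the `gain` parameter are hypothetical.

```python
import numpy as np

def perturb(mask, shift, flip_prob, shift_sigma, rng):
    """Flip each mask bit with probability flip_prob and jitter the shifts."""
    mask = np.asarray(mask, dtype=float)
    flips = rng.random(mask.shape) < flip_prob
    noisy_mask = np.where(flips, 1.0 - mask, mask)
    noisy_shift = np.asarray(shift, dtype=float) + rng.normal(
        0.0, shift_sigma, size=mask.shape
    )
    return noisy_mask, noisy_shift

def posterior_mean(prior_mean, mask, shift, gain=0.5):
    """Stand-in update (an assumption): move attended features along the shift."""
    return np.asarray(prior_mean, dtype=float) + gain * mask * shift

def sensitivity(prior_mean, mask, shift, levels, trials=200, seed=0):
    """Mean distance between noisy and clean posterior means per noise level."""
    rng = np.random.default_rng(seed)
    clean = posterior_mean(
        prior_mean, np.asarray(mask, dtype=float), np.asarray(shift, dtype=float)
    )
    results = {}
    for flip_prob, sigma in levels:
        errors = []
        for _ in range(trials):
            noisy_mask, noisy_shift = perturb(mask, shift, flip_prob, sigma, rng)
            noisy = posterior_mean(prior_mean, noisy_mask, noisy_shift)
            errors.append(np.linalg.norm(noisy - clean))
        results[(flip_prob, sigma)] = float(np.mean(errors))
    return results

levels = [(0.0, 0.0), (0.1, 0.2), (0.5, 1.0)]
for level, err in sensitivity([0.0, 0.0, 0.0], [1, 0, 1], [1.0, 0.0, -0.5], levels).items():
    print(level, round(err, 3))
```

Swapping the stand-in for the actual update and the distance for reward learning error would give the curve the referee asks for: error as a function of extraction noise, compared against the physical-only baseline.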

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents a Bayesian framework whose central step is a closed-form update rule that treats LLM-extracted attention masks and preference shifts as probabilistic observations to be fused with physical corrections. This update is derived from standard Bayesian inference rather than being defined in terms of the target performance metric. Empirical claims of 70% error reduction are obtained from a separate simulator evaluation against baselines and from a 15-participant user study; neither quantity is obtained by fitting parameters to the same data used to declare success nor by renaming an input as a prediction. No load-bearing self-citation, uniqueness theorem, or ansatz-smuggling step is required for the derivation to hold. The framework is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that language utterances can be treated as probabilistic observations over latent user preferences and that LLMs can extract usable feature attention and shift information from them. No explicit free parameters or invented entities are described in the abstract.

axioms (1)
  • Domain assumption: Language can be treated as a probabilistic observation over the user's latent preferences.
    Stated as the key insight that allows fusion of modalities in the Bayesian framework.

pith-pipeline@v0.9.0 · 5741 in / 1357 out tokens · 58427 ms · 2026-05-21T17:54:31.466667+00:00 · methodology

How relevant is this feature to the intervention? (gate score 0.0 or 1.0)

B.2 Preference Language Model (LM pref)

System Message: You are an expert in autonomous vehicle control analyzing driver interventions. In this task, a human driver has intervened to correct the behavior of a robot car and has provided an explanation of the intervention. Your r...

What absolute change with direction (this will be your 'mu') would support this intervention? Consider the scale of the features, and the current weights

Be careful

How confident are you in your decision? (confidence score 0.0-1.0)

B.3 LLM Configuration Parameters

The table below describes the parameters we used for the LLM experiments using the OpenAI API.

Table 1: LLM API Configuration Settings

Parameter        LM att   LM pref
Model            gpt-4o   gpt-4o
Temperature      0.1      0.3
Response Format  JSON     JSON
Max Tokens       default  default

C ...
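For readers reproducing the setup, the Table 1 settings can be captured as plain configuration objects. This is only a sketch: the class `LLMConfig` and its field names are hypothetical and are not taken from the paper's code or from the OpenAI client library. Only the values come from Table 1, with `None` standing in for "default".

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical container for one language model's settings from Table 1."""
    model: str
    temperature: float
    response_format: str
    max_tokens: Optional[int] = None  # None means "leave at the API default"

# Attention model (LM att) and preference model (LM pref), values from Table 1.
LM_ATT = LLMConfig(model="gpt-4o", temperature=0.1, response_format="JSON")
LM_PREF = LLMConfig(model="gpt-4o", temperature=0.3, response_format="JSON")

for name, cfg in (("LM att", LM_ATT), ("LM pref", LM_PREF)):
    print(name, asdict(cfg))
```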