QuickLAP: Quick Language-Action Preference Learning for Semi-Autonomous Agents

Andreea Bobu; David Lee; Jordan Abi Nader; Nathaniel Dennler

arxiv: 2511.17855 · v4 · pith:QKJPCUZFnew · submitted 2025-11-22 · 💻 cs.AI · cs.RO

QuickLAP: Quick Language-Action Preference Learning for Semi-Autonomous Agents

Jordan Abi Nader , David Lee , Nathaniel Dennler , Andreea Bobu This is my paper

Pith reviewed 2026-05-21 17:54 UTC · model grok-4.3

classification 💻 cs.AI cs.RO

keywords reward learninglanguage-action fusionBayesian inferencesemi-autonomous agentshuman-robot interactionpreference learningmultimodal feedback

0 comments

The pith

QuickLAP treats language as a probabilistic observation of latent preferences to fuse with physical corrections in a closed-form Bayesian update for real-time reward learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Robots receive feedback that is either grounded but ambiguous in intent from physical corrections or high-level but ungrounded from language. QuickLAP fuses both modalities by using large language models to extract reward feature attention masks and preference shifts from free-form utterances, then integrates these as observations in a Bayesian framework with physical feedback. This produces a real-time update rule that handles ambiguity and reduces reward learning error substantially in a semi-autonomous driving simulator. A user study with fifteen participants shows the resulting behaviors are rated more understandable and collaborative, and are preferred over physical-only or heuristic baselines.

Core claim

The paper establishes that language can be modeled as a probabilistic observation over the user's latent reward preferences, allowing a Bayesian update that combines LLM-parsed attention masks and preference shifts with physical corrections to infer accurate reward functions quickly and robustly, achieving over 70 percent lower learning error than single-modality or heuristic baselines.

What carries the argument

The closed-form Bayesian update rule that treats language-derived reward feature attention masks and preference shifts as probabilistic observations over latent preferences.

If this is right

Semi-autonomous agents can adapt their behavior in real time to ambiguous multimodal feedback without requiring extensive physical demonstrations.
The learned reward functions produce trajectories that users rate as more understandable and collaborative.
Preference shifts expressed in language can be directly incorporated into ongoing physical correction updates.
The framework scales to handling mixed feedback in dynamic environments like driving simulators.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fusion approach could apply to other domains such as robotic manipulation where language clarifies goals during physical guidance.
Reducing reliance on purely physical feedback might lower the cognitive load on human operators in long sessions.
If LLM extraction quality improves over time, the method could generalize to less structured language without retraining the Bayesian core.

Load-bearing premise

Large language models can reliably extract accurate reward feature attention masks and preference shifts from free-form user utterances without introducing substantial bias or error.

What would settle it

An experiment in the same driving simulator where LLM extractions from utterances are deliberately noisy or biased, resulting in reward learning error no lower than physical-only baselines.

Figures

Figures reproduced from arXiv: 2511.17855 by Andreea Bobu, David Lee, Jordan Abi Nader, Nathaniel Dennler.

**Figure 2.** Figure 2: Example scenarios created from our four exper [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: We ran our experiments on a single CPU and used up to [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 3.** Figure 3: Comparison of adaptation methods across different environments for 4 interventions per episode. (a) Bars represent [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: User study results. All error bars represent standard error. (a) Average ratings for [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Graphical model for QuickLAP. The robot opti [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗

**Figure 6.** Figure 6: Trade-off between physical correction weight ( [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗

read the original abstract

Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language expresses high-level goals but lacks physical grounding. We introduce QuickLAP: Quick Language-Action Preference learning, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. Our key insight is to treat language as a probabilistic observation over the user's latent preferences, clarifying which reward features matter and how physical corrections should be interpreted. QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule. This enables fast, real-time, and robust reward learning that handles ambiguous feedback. In a semi-autonomous driving simulator, QuickLAP reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines. A 15-participant user study further validates our approach: participants found QuickLAP significantly more understandable and collaborative, and preferred its learned behavior over baselines. Code is available at https://github.com/MIT-CLEAR-Lab/QuickLAP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

QuickLAP fuses LLM-derived language observations with physical corrections via closed-form Bayesian update for real-time reward learning, but the lack of error bounds on the LLM masks is the main open question.

read the letter

The main point is that QuickLAP treats free-form language as a probabilistic observation over latent preferences, pulls feature attention masks and shifts out with an LLM, and folds everything into a closed-form Bayesian update with physical corrections. That specific pipeline looks new relative to the physical-only and heuristic baselines they compare against. The closed-form rule is a practical plus because it supports real-time use without heavy optimization each step. In the driving simulator they report over 70% lower reward error and a 15-person study where users rated it more understandable and collaborative. Code release helps too. The soft spot is exactly what the stress test flags: no per-utterance accuracy numbers, no human agreement checks on the masks, and no sensitivity runs showing how LLM noise moves the posterior. If the model systematically mis-weights safety versus comfort on ambiguous utterances, the claimed gains could shrink. The abstract and available details do not show those checks, so the robustness claim rests on an unquantified assumption. This is for people working on multimodal preference learning and semi-autonomous systems who need fast adaptation. A reader focused on Bayesian HRI methods would find the update rule and simulator results worth looking at. I would bring it to a reading group to talk through the multimodal fusion. I would cite the framework if I were extending similar work. It deserves peer review so referees can press on the LLM validation and statistical details.

Referee Report

2 major / 2 minor

Summary. The paper introduces QuickLAP, a Bayesian framework for real-time reward learning in semi-autonomous agents that fuses physical corrections with language feedback. LLMs extract reward feature attention masks and preference shifts from free-form utterances, which are integrated via a closed-form update rule. In a semi-autonomous driving simulator, it reports over 70% reduction in reward learning error versus physical-only and heuristic multimodal baselines. A 15-participant user study finds the approach more understandable, collaborative, and preferable, with code released at a GitHub repository.

Significance. If the LLM extraction step proves reliable, the work offers a practical advance in multimodal preference learning for human-robot interaction, enabling faster and more natural reward inference than unimodal baselines. The closed-form update and public code are strengths that support reproducibility and potential adoption. The significance is limited by the absence of quantified validation for the LLM component, which directly affects whether the reported error reductions generalize.

major comments (2)

[Evaluation and Methods] The central performance claim (over 70% error reduction) depends on treating LLM outputs as reliable probabilistic observations in the Bayesian update. No per-utterance accuracy metrics, human inter-annotator agreement, or sensitivity analysis on mask noise propagation appear in the evaluation; without these, it is unclear whether the reported gains hold under realistic utterance ambiguity (e.g., safety vs. comfort trade-offs).
[Framework and Update Rule] The closed-form update rule integrates LLM-derived attention masks and preference shifts directly as observations. A concrete test of robustness—such as injecting controlled noise into the masks and measuring posterior shift—is missing, making it difficult to bound how LLM variance would affect the posterior mean and the claimed improvement over baselines.

minor comments (2)

[User Study] The user-study section would benefit from explicit reporting of statistical tests (e.g., p-values or effect sizes) and the precise wording of preference questions to allow independent assessment of the qualitative findings.
[Notation and Preliminaries] Notation for the attention mask and preference shift variables should be defined once in the main text with a clear mapping to the LLM prompt template.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We have reviewed the major comments concerning the evaluation of the LLM component and the robustness of the update rule. We provide detailed responses below and will make revisions to address these points.

read point-by-point responses

Referee: [Evaluation and Methods] The central performance claim (over 70% error reduction) depends on treating LLM outputs as reliable probabilistic observations in the Bayesian update. No per-utterance accuracy metrics, human inter-annotator agreement, or sensitivity analysis on mask noise propagation appear in the evaluation; without these, it is unclear whether the reported gains hold under realistic utterance ambiguity (e.g., safety vs. comfort trade-offs).

Authors: We agree with the referee that additional validation of the LLM extraction step would strengthen the paper. Although our simulator experiments and user study demonstrate the overall benefits of the multimodal fusion, we did not include direct metrics on LLM accuracy in the original submission. In the revised version, we will add per-utterance accuracy metrics by annotating a set of utterances with human labels for feature attention masks and preference shifts, and report agreement with LLM outputs. We will also include inter-annotator agreement scores and a sensitivity analysis showing how noise in the masks affects the learning error. This will address concerns about utterance ambiguity. revision: yes
Referee: [Framework and Update Rule] The closed-form update rule integrates LLM-derived attention masks and preference shifts directly as observations. A concrete test of robustness—such as injecting controlled noise into the masks and measuring posterior shift—is missing, making it difficult to bound how LLM variance would affect the posterior mean and the claimed improvement over baselines.

Authors: We acknowledge that a specific robustness test for the closed-form update is valuable. To bound the effect of LLM variance, we will add an experiment in the revised manuscript that injects controlled noise into the LLM-derived masks and shifts. We will vary the noise level and report the resulting changes to the posterior mean and the reward learning error compared to baselines. This will provide quantitative bounds on how LLM inaccuracies propagate through the Bayesian update. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents a Bayesian framework whose central step is a closed-form update rule that treats LLM-extracted attention masks and preference shifts as probabilistic observations to be fused with physical corrections. This update is derived from standard Bayesian inference rather than being defined in terms of the target performance metric. Empirical claims of 70% error reduction are obtained from a separate simulator evaluation against baselines and from a 15-participant user study; neither quantity is obtained by fitting parameters to the same data used to declare success nor by renaming an input as a prediction. No load-bearing self-citation, uniqueness theorem, or ansatz-smuggling step is required for the derivation to hold. The framework is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that language utterances can be treated as probabilistic observations over latent user preferences and that LLMs can extract usable feature attention and shift information from them. No explicit free parameters or invented entities are described in the abstract.

axioms (1)

domain assumption Language can be treated as a probabilistic observation over the user's latent preferences.
Stated as the key insight that allows fusion of modalities in the Bayesian framework.

pith-pipeline@v0.9.0 · 5741 in / 1357 out tokens · 58427 ms · 2026-05-21T17:54:31.466667+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the closed-form update: ˆθ_{t+1,i} = ˆθ_{t,i} + σ²_{L,i} ΔΦ_i + μ_t,i / (Λ_prior,i σ²_{L,i} + 1)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages

[1]

Pieter Abbeel and Andrew Y. Ng. 2004. Apprenticeship learning via inverse reinforcement learning. InMachine Learning, Proceedings of the Twenty-first International Conference (ICML 2004), Banff, Alberta, Canada, July 4-8, 2004 (ACM International Conference Proceeding Series, Vol. 69), Carla E. Brodley (Ed.). ACM. doi:10.1145/1015330.1015430

work page doi:10.1145/1015330.1015430 2004
[2]

Henny Admoni and Brian Scassellati. 2017. Social eye gaze in human-robot interaction: a review.Journal of Human-Robot Interaction6, 1 (2017), 25–63

work page 2017
[3]

Losey, Marcia K

Andrea Bajcsy, Dylan P. Losey, Marcia K. O’Malley, and Anca D. Dragan. 2018. Learning from Physical Human Corrections, One Feature at a Time. InPro- ceedings of the 2018 ACM/IEEE International Conference on Human-Robot In- teraction(Chicago, IL, USA)(HRI ’18). ACM, New York, NY, USA, 141–149. doi:10.1145/3171221.3171267

work page doi:10.1145/3171221.3171267 2018
[4]

Losey, Marcia K

Andrea Bajcsy, Dylan P. Losey, Marcia K. O’Malley, and Anca D. Dragan. 2017. Learning Robot Objectives from Physical Human Interaction. InProceedings of the 1st Annual Conference on Robot Learning (Proceedings of Machine Learning Research, Vol. 78), Sergey Levine, Vincent Vanhoucke, and Ken Goldberg (Eds.). PMLR, 217–226. http://proceedings.mlr.press/v78/...

work page 2017
[5]

Chris L Baker, Joshua B Tenenbaum, and Rebecca R Saxe. 2007. Goal inference as inverse planning. InProceedings of the Annual Meeting of the Cognitive Science Society, Vol. 29

work page 2007
[6]

Erdem Bıyık, Malayandi Palan, Nicholas C Landolfi, Dylan P Losey, and Dorsa Sadigh. 2019. Asking easy questions: A user-friendly approach to active reward learning.arXiv preprint arXiv:1910.04365(2019)

work page arXiv 2019
[7]

A. Bobu, A. Bajcsy, J. F. Fisac, S. Deglurkar, and A. D. Dragan. 2020. Quantifying Hypothesis Space Misspecification in Learning From Human–Robot Demonstra- tions and Physical Corrections.Transactions on Robotics (T-RO)(2020)

work page 2020
[8]

A. Bobu, A. Bajcsy, J. F. Fisac, and A. D. Dragan. 2018. Learning under Misspecified Objective Spaces. InConference on Robot Learning (CoRL)

work page 2018
[9]

Landon Brown, Jared Hamilton, Zhao Han, Albert Phan, Thao Phung, Eric Hansen, Nhan Tran, and Tom Williams. 2023. Best of Both Worlds? Combining Different Forms of Mixed Reality Deictic Gestures.J. Hum.-Robot Interact.12, 1, Article 9 (Feb. 2023), 23 pages. doi:10.1145/3563387

work page doi:10.1145/3563387 2023
[10]

Arthur Bucker, Luis F. C. Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Sai Vemprala, and Rogerio Bonatti. 2023. LATTE: LAnguage Trajectory TransformEr. InIEEE International Conference on Robotics and Automation, ICRA 2023, London, UK, May 29 - June 2, 2023. IEEE, 7287–7294. doi:10.1109/ICRA48891. 2023.10161068

work page doi:10.1109/icra48891 2023
[11]

Kate Candon, Nicholas C Georgiou, Helen Zhou, Sidney Richardson, Qiping Zhang, Brian Scassellati, and Marynel Vázquez. 2024. REACT: Two datasets for analyzing both human reactions and evaluative feedback to robots over time. InProceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. 885–889

work page 2024
[12]

Gombolay, and Benjamin Rosman

Vanya Cohen, Geraud Nangue Tasse, Nakul Gopalan, Steven James, Matthew C. Gombolay, and Benjamin Rosman. 2021. Learning to Follow Language Instruc- tions with Compositional Policies.CoRRabs/2110.04647 (2021). arXiv:2110.04647 https://arxiv.org/abs/2110.04647

work page arXiv 2021
[13]

Maggie A Collier, Rithika Narayan, and Henny Admoni. 2025. The sense of agency in assistive robotics using shared autonomy. In2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 880–888

work page 2025
[14]

Y. Cui, S. Karamcheti, R. Palleti, N. Shivakumar, P. Liang, and D. Sadigh. 2023. No, to the Right: Online Language Corrections for Robotic Manipulation via Shared Autonomy. InProceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction(Stockholm, Sweden)(HRI ’23). Association for Com- puting Machinery, New York, NY, USA, 93–101. do...

work page doi:10.1145/3568162.3578623 2023
[15]

Yuchen Cui, Qiping Zhang, Brad Knox, Alessandro Allievi, Peter Stone, and Scott Niekum. 2021. The empathic framework for task learning from implicit human feedback. InConference on Robot Learning. PMLR, 604–626

work page 2021
[16]

Nathaniel Dennler, Stefanos Nikolaidis, and Maja Matarić. 2025. Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Prefer- ence Elicitation. In2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI). 778–788. doi:10.1109/HRI61500.2025.10974136

work page doi:10.1109/hri61500.2025.10974136 2025
[17]

Nathaniel Dennler, Zhonghao Shi, Stefanos Nikolaidis, and Maja Matarić. 2024. Improving user experience in preference-based optimization of reward functions for assistive robots.arXiv preprint arXiv:2411.11182(2024)

work page arXiv 2024
[18]

Nathaniel Dennler, Catherine Yunis, Jonathan Realmuto, Terence Sanger, Ste- fanos Nikolaidis, and Maja Matarić. 2021. Personalizing user engagement dy- namics in a non-verbal communication game for cerebral palsy. In2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN). IEEE, 873–879

work page 2021
[19]

A. D. Dragan, K. Muelling, J. Andrew Bagnell, and S. S. Srinivasa. 2015. Movement primitives via optimization. In2015 IEEE International Conference on Robotics and Automation (ICRA). 2339–2346. doi:10.1109/ICRA.2015.7139510

work page doi:10.1109/icra.2015.7139510 2015
[20]

Tesca Fitzgerald, Pallavi Koppol, Patrick Callaghan, Russell Quinlan Jun Hei Wong, Reid Simmons, Oliver Kroemer, and Henny Admoni. 2022. INQUIRE: INteractive querying for user-aware informative REasoning. In6th Annual Con- ference on Robot Learning

work page 2022
[21]

García, David M

Carlos E. García, David M. Prett, and Manfred Morari. 1989. Model predictive control: Theory and practice—A survey.Automatica25, 3 (1989), 335 – 348. doi:10.1016/0005-1098(89)90002-2

work page doi:10.1016/0005-1098(89)90002-2 1989
[22]

Michael Hagenow and Julie A. Shah. 2025. REALM: Real-Time Estimates of Assistance for Learned Models in Human-Robot Interaction.IEEE Robotics and Automation Letters10, 6 (2025), 5473–5480. doi:10.1109/LRA.2025.3560862

work page doi:10.1109/lra.2025.3560862 2025
[23]

Erin Hedlund-Botti, Julianna Schalkwyk, Nina Moorman, Sanne van Waveren, Lakshmi Seelam, Chuxuan Yang, Russell Perkins, Paul Robinette, and Matthew Gombolay. 2025. Learning Interpretable Features from Interventions. InRobotics: Science and Systems (RSS)

work page 2025
[24]

Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. 2022. Lan- guage Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. InInternational Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA (Proceedings of Machine Learning Research, Vol. 162), Kamalika Chaudhuri, Stefanie Jegelka, ...

work page 2022
[25]

Humphrey and Julie A

Curtis M. Humphrey and Julie A. Adams. 2008. Compass visualizations for human-robotic interaction. InProceedings of the 3rd ACM/IEEE International Conference on Human Robot Interaction(Amsterdam, The Netherlands)(HRI ’08). Association for Computing Machinery, New York, NY, USA, 49–56. doi:10.1145/ 1349822.1349830

work page arXiv 2008
[26]

Joshi, Kyle Jeffrey, Rosario Jauregui Ruano, Jasmine Hsu, Keerthana Gopalakrish- nan, Byron David, Andy Zeng, and Chuyuan Kelly Fu

Brian Ichter, Anthony Brohan, Yevgen Chebotar, Chelsea Finn, Karol Hausman, Alexander Herzog, Daniel Ho, Julian Ibarz, Alex Irpan, Eric Jang, Ryan Julian, Dmitry Kalashnikov, Sergey Levine, Yao Lu, Carolina Parada, Kanishka Rao, Pierre Sermanet, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Mengyuan Yan, Noah Brown, Michael Ahn, Omar Co...

work page 2022
[27]

Ashesh Jain, Shikhar Sharma, Thorsten Joachims, and Ashutosh Saxena. 2015. Learning preferences for manipulation tasks from online coactive feedback.Int. J. Robotics Res.34, 10 (2015), 1296–1313. doi:10.1177/0278364915581193

work page doi:10.1177/0278364915581193 2015
[28]

Rajat Kumar Jenamani, Tom Silver, Ben Dodson, Shiqin Tong, Anthony Song, Yuting Yang, Ziang Liu, Benjamin Howe, Aimee Whitneck, and Tapomayukh Bhattacharjee. 2025. FEAST: A Flexible Mealtime-Assistance System Towards In-the-Wild Personalization. InRobotics: Science and Systems (RSS)

work page 2025
[29]

Emily Jensen, Sriram Sankaranarayanan, and Bradley Hayes. 2024. Automated Assessment and Adaptive Multimodal Formative Feedback Improves Psychomo- tor Skills Training Outcomes in Quadrotor Teleoperation. InProceedings of the 12th International Conference on Human-Agent Interaction. 185–194

work page 2024
[30]

Siddharth Karamcheti, Megha Srivastava, Percy Liang, and Dorsa Sadigh. 2021. LILA: Language-Informed Latent Actions. InConference on Robot Learning, 8-11 November 2021, London, UK (Proceedings of Machine Learning Research, Vol. 164), Aleksandra Faust, David Hsu, and Gerhard Neumann (Eds.). PMLR, 1379–1390. https://proceedings.mlr.press/v164/karamcheti22a.html

work page 2021
[31]

Bradley Knox and Peter Stone

W. Bradley Knox and Peter Stone. 2009. Interactively shaping agents via human reinforcement: the TAMER framework. InProceedings of the 5th International Conference on Knowledge Capture (K-CAP 2009), September 1-4, 2009, Redondo Beach, California, USA, Yolanda Gil and Natasha Fridman Noy (Eds.). ACM, 9–16. doi:10.1145/1597735.1597738

work page doi:10.1145/1597735.1597738 2009
[32]

Smith, and Pieter Abbeel

Kimin Lee, Laura M. Smith, and Pieter Abbeel. 2021. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training. InProceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and T...

work page 2021
[33]

http://proceedings.mlr.press/v139/lee21i.html

work page
[34]

Anthony Liang, Jesse Thomason, and Erdem Bıyık. 2024. Visarl: Visual rein- forcement learning guided by human saliency. In2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2907–2912. Abi Nader et al

work page 2024
[35]

Lars Lindemann, Matthew Cleaveland, Gihyun Shim, and George J Pappas. 2023. Safe planning in dynamic environments using conformal prediction.IEEE Robot- ics and Automation Letters(2023)

work page 2023
[36]

Dylan P Losey, Andrea Bajcsy, Marcia K O’Malley, and Anca D Dragan. 2022. Physical interaction as communication: Learning robot objectives online from human corrections.The International Journal of Robotics Research41, 1 (2022), 20–44

work page 2022
[37]

Corey Lynch and Pierre Sermanet. 2021. Language Conditioned Imitation Learning Over Unstructured Data. InRobotics: Science and Systems XVII, Virtual Event, July 12-16, 2021, Dylan A. Shell, Marc Toussaint, and M. Ani Hsieh (Eds.). doi:10.15607/RSS.2021.XVII.047

work page doi:10.15607/rss.2021.xvii.047 2021
[38]

Corey Lynch, Ayzaan Wahid, Jonathan Tompson, Tianli Ding, James Betker, Robert Baruch, Travis Armstrong, and Pete Florence. 2022. Interactive Language: Talking to Robots in Real Time.CoRRabs/2210.06407 (2022). arXiv:2210.06407 doi:10.48550/ARXIV.2210.06407

work page doi:10.48550/arxiv.2210.06407 2022
[39]

Ho, Robert Tyler Loftin, Bei Peng, Guan Wang, David L

James MacGlashan, Mark K. Ho, Robert Tyler Loftin, Bei Peng, Guan Wang, David L. Roberts, Matthew E. Taylor, and Michael L. Littman. 2017. Interactive Learning from Policy-Dependent Human Feedback. InProceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 (Proceedings of Machine Learning Re...

work page 2017
[40]

Christoforos Mavrogiannis, Francesca Baldini, Allan Wang, Dapeng Zhao, Pete Trautman, Aaron Steinfeld, and Jean Oh. 2023. Core challenges of social robot navigation: A survey.ACM Transactions on Human-Robot Interaction12, 3 (2023), 1–39

work page 2023
[41]

Mehta and Dylan P

Shaunak A. Mehta and Dylan P. Losey. 2024. Unified Learning from Demonstra- tions, Corrections, and Preferences during Physical Human-Robot Interaction. ACM Trans. Hum. Robot Interact.13, 3 (2024), 39:1–39:25. doi:10.1145/3623384

work page doi:10.1145/3623384 2024
[42]

Amal Nanavati, Ethan K Gordon, Taylor A Kessler Faulkner, Yuxin Ray Song, Jonathan Ko, Tyler Schrenk, Vy Nguyen, Bernie Hao Zhu, Haya Bolotski, Atharva Kashyap, et al. 2025. Lessons Learned from Designing and Evaluating a Robot- Assisted Feeding System for Out-of-Lab Use. In2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE...

work page 2025
[43]

Heramb Nemlekar, Neel Dhanaraj, Angelos Guan, Satyandra K Gupta, and Ste- fanos Nikolaidis. 2023. Transfer learning of human preferences for proactive robot assistance in assembly tasks. InProceedings of the 2023 ACM/IEEE Interna- tional Conference on Human-Robot Interaction. 575–583

work page 2023
[44]

Ng and Stuart Russell

Andrew Y. Ng and Stuart Russell. 2000. Algorithms for Inverse Reinforcement Learning. InProceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29 - July 2, 2000, Pat Langley (Ed.). Morgan Kaufmann, 663–670

work page 2000
[45]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human f...

work page 2022
[46]

Ravi Pandya, Zhuoyuan Wang, Yorie Nakahira, and Changliu Liu. 2024. Towards Proactive Safe Human-Robot Collaborations via Data-Efficient Conditional Be- havior Prediction. arXiv:2311.11893 [cs.RO] https://arxiv.org/abs/2311.11893

work page arXiv 2024
[47]

Andi Peng, Andreea Bobu, Belinda Z Li, Theodore R Sumers, Ilia Sucholutsky, Nis- hanth Kumar, Thomas L Griffiths, and Julie A Shah. 2024. Preference-Conditioned Language-Guided Abstraction. InProceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. 572–581

work page 2024
[48]

Li, Ilia Sucholutsky, Nishanth Kumar, Julie Shah, Jacob Andreas, and Andreea Bobu

Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie Shah, Jacob Andreas, and Andreea Bobu. 2024. Adaptive Language-Guided Abstraction from Contrastive Explanations. InConference on Robot Learning, 6-9 November 2024, Munich, Germany (Proceedings of Machine Learning Research, Vol. 270), Pulkit Agrawal, Oliver Kroemer, and Wolfram Burgard (Eds....

work page 2024
[49]

Deepak Ramachandran and Eyal Amir. 2007. Bayesian Inverse Reinforcement Learning. InProceedings of the 20th International Joint Conference on Artifical Intelligence(Hyderabad, India)(IJCAI’07). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2586–2591. http://dl.acm.org/citation.cfm?id=1625275. 1625692

work page 2007
[50]

Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu, Dorsa Sadigh, Andy Zeng, and Anirudha Majumdar. 2023. Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners. InConference on Robot Learning, CoRL 2023, 6-9 November 2023, Atlanta, GA, ...

work page 2023
[51]

Dorsa Sadigh, Anca D Dragan, Shankar Sastry, and Sanjit A Seshia. 2017. Active preference-based learning of reward functions. InRobotics: Science and systems

work page 2017
[52]

Sadigh, S

D. Sadigh, S. Sastry, S. Seshia, and Anca D. Dragan. 2016. Planning for Au- tonomous Cars that Leverage Effects on Human Actions. InRobotics: Science and Systems

work page 2016
[53]

Pratyusha Sharma, Balakumar Sundaralingam, Valts Blukis, Chris Paxton, Tucker Hermans, Antonio Torralba, Jacob Andreas, and Dieter Fox. 2022. Correcting Robot Plans with Natural Language Feedback. InRobotics: Science and Systems XVIII, New York City, NY, USA, June 27 - July 1, 2022, Kris Hauser, Dylan A. Shell, and Shoudong Huang (Eds.). doi:10.15607/RSS....

work page doi:10.15607/rss.2022.xviii.065 2022
[54]

Siebinga, A

O. Siebinga, A. Zgonnikov, and D. Abbink. 2022. Interactive Merging Behavior in a Coupled Driving Simulator: Experimental Framework and Case Study. In Human Factors in Transportation. AHFE 2022 International Conference (AHFE Open Access, Vol. 60), Katie Plant and Gesa Praetorius (Eds.). AHFE International, USA. doi:10.54941/ahfe1002485

work page doi:10.54941/ahfe1002485 2022
[55]

Sripathy, A

A. Sripathy, A. Bobu, Z. Li, K. Sreenath, D. S. Brown, and A. D. Dragan. 2022. Teaching Robots to Span the Space of Functional Expressive Motion. InInterna- tional Conference on Intelligent Robots and Systems (IROS)

work page 2022
[56]

Phielipp, Stefan Lee, Chitta Baral, and Heni Ben Amor

Simon Stepputtis, Joseph Campbell, Mariano J. Phielipp, Stefan Lee, Chitta Baral, and Heni Ben Amor. 2020. Language-Conditioned Imitation Learning for Robot Manipulation Tasks. InAdvances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Hugo Larochelle...

work page arXiv 2020
[57]

Maia Stiber, Russell Taylor, and Chien-Ming Huang. 2022. Modeling human response to robot errors for timely error detection. In2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 676–683

work page 2022
[58]

Sumers, Mark K

Theodore R. Sumers, Mark K. Ho, Robert X. D. Hawkins, Karthik Narasimhan, and Thomas L. Griffiths. 2020. Learning Rewards from Linguistic Feedback.CoRR abs/2009.14715 (2020). arXiv:2009.14715 https://arxiv.org/abs/2009.14715

work page arXiv 2020
[59]

Tauhid Tanjim, Jonathan St George, Kevin Ching, and Angelique Taylor. 2025. Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams.arXiv preprint arXiv:2506.08892(2025)

work page arXiv 2025
[60]

Yiran Tao, Jehan Yang, Dan Ding, and Zackory Erickson. 2025. LAMS: LLM- Driven Automatic Mode Switching for Assistive Teleoperation. In2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 242– 251

work page 2025
[61]

Walter, Ashis Gopal Banerjee, Seth J

Stefanie Tellex, Thomas Kollar, Steven Dickerson, Matthew R. Walter, Ashis Gopal Banerjee, Seth J. Teller, and Nicholas Roy. 2011. Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation. InProceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2011, San Francisco, California, USA, August 7-11, 2011...

work page doi:10.1609/aaai.v25i1.7979 2011
[62]

Raphael Vallat. 2018. Pingouin: statistics in Python.Journal of Open Source Software3, 31 (Nov. 2018), 1026. doi:10.21105/joss.01026

work page doi:10.21105/joss.01026 2018
[63]

Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, and Chelsea Finn. 2019. Learning a Prior over Intent via Meta-Inverse Reinforcement Learning.CoRR abs/1805.12573 (2019). arXiv:1805.12573 http://arxiv.org/abs/1805.12573

work page arXiv 2019
[64]

Russell, Anca Dragan, and Erdem Bıyık

Zhaojing Yang, Miru Jun, Jeremy Tien, Stuart J. Russell, Anca Dragan, and Erdem Bıyık. 2024. Trajectory Improvement and Reward Learning from Comparative Language Feedback. arXiv:2410.06401 [cs.RO] https://arxiv.org/abs/2410.06401

work page arXiv 2024
[65]

Michelle Zhao, Reid Simmons, Henny Admoni, and Andrea Bajcsy. 2024. Confor- malized teleoperation: Confidently mapping human inputs to high-dimensional robot actions.arXiv preprint arXiv:2406.07767(2024)

work page arXiv 2024
[66]

Ziebart, Andrew Maas, J

Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind K. Dey. 2008. Maxi- mum Entropy Inverse Reinforcement Learning. InProceedings of the 23rd National Conference on Artificial Intelligence - Volume 3(Chicago, Illinois)(AAAI’08). AAAI Press, 1433–1438. http://dl.acm.org/citation.cfm?id=1620270.1620297

work page arXiv 2008
[67]

Lee, Matthew Tan, Yuke Zhu, and Jeannette Bohg

Matthew Zurek, Andreea Bobu, Daniel S. Brown, and Anca D. Dragan. 2021. Situational Confidence Assistance for Lifelong Shared Autonomy. InIEEE Inter- national Conference on Robotics and Automation, ICRA 2021, Xi’an, China, May 30 - June 5, 2021. IEEE, 2783–2789. doi:10.1109/ICRA48506.2021.9561839 QuickLAP: Quick Language–Action Preference Learning for Aut...

work page doi:10.1109/icra48506.2021.9561839 2021
[68]

In this task , a human driver has intervened to correct the behavior of a robot car and has provided an explanation of the intervention

How relevant is this feature to the intervention ? ( gate score 0.0 or 1.0) B.2 Preference Language Model (LM pref) System Message: You are an expert in autonomous vehicle control analyzing driver interventions . In this task , a human driver has intervened to correct the behavior of a robot car and has provided an explanation of the intervention . Your r...

work page
[69]

What absolute change with direction ( this will be your 'mu') would support this intervention ? Consider the scale of the features , and the current weights

work page
[70]

Be careful

How confident are you in your decision ? ( confidence score 0.0 -1.0) B.3 LLM Configuration Parameters The table below describes the parameters we used for the LLM experiments using the OpenAI API. Table 1: LLM API Configuration Settings Parameter LM att LMpref Model gpt-4o gpt-4o Temperature 0.1 0.3 Response Format JSON JSON Max Tokens default default C ...

work page arXiv 2071

[1] [1]

Pieter Abbeel and Andrew Y. Ng. 2004. Apprenticeship learning via inverse reinforcement learning. InMachine Learning, Proceedings of the Twenty-first International Conference (ICML 2004), Banff, Alberta, Canada, July 4-8, 2004 (ACM International Conference Proceeding Series, Vol. 69), Carla E. Brodley (Ed.). ACM. doi:10.1145/1015330.1015430

work page doi:10.1145/1015330.1015430 2004

[2] [2]

Henny Admoni and Brian Scassellati. 2017. Social eye gaze in human-robot interaction: a review.Journal of Human-Robot Interaction6, 1 (2017), 25–63

work page 2017

[3] [3]

Losey, Marcia K

Andrea Bajcsy, Dylan P. Losey, Marcia K. O’Malley, and Anca D. Dragan. 2018. Learning from Physical Human Corrections, One Feature at a Time. InPro- ceedings of the 2018 ACM/IEEE International Conference on Human-Robot In- teraction(Chicago, IL, USA)(HRI ’18). ACM, New York, NY, USA, 141–149. doi:10.1145/3171221.3171267

work page doi:10.1145/3171221.3171267 2018

[4] [4]

Losey, Marcia K

Andrea Bajcsy, Dylan P. Losey, Marcia K. O’Malley, and Anca D. Dragan. 2017. Learning Robot Objectives from Physical Human Interaction. InProceedings of the 1st Annual Conference on Robot Learning (Proceedings of Machine Learning Research, Vol. 78), Sergey Levine, Vincent Vanhoucke, and Ken Goldberg (Eds.). PMLR, 217–226. http://proceedings.mlr.press/v78/...

work page 2017

[5] [5]

Chris L Baker, Joshua B Tenenbaum, and Rebecca R Saxe. 2007. Goal inference as inverse planning. InProceedings of the Annual Meeting of the Cognitive Science Society, Vol. 29

work page 2007

[6] [6]

Erdem Bıyık, Malayandi Palan, Nicholas C Landolfi, Dylan P Losey, and Dorsa Sadigh. 2019. Asking easy questions: A user-friendly approach to active reward learning.arXiv preprint arXiv:1910.04365(2019)

work page arXiv 2019

[7] [7]

A. Bobu, A. Bajcsy, J. F. Fisac, S. Deglurkar, and A. D. Dragan. 2020. Quantifying Hypothesis Space Misspecification in Learning From Human–Robot Demonstra- tions and Physical Corrections.Transactions on Robotics (T-RO)(2020)

work page 2020

[8] [8]

A. Bobu, A. Bajcsy, J. F. Fisac, and A. D. Dragan. 2018. Learning under Misspecified Objective Spaces. InConference on Robot Learning (CoRL)

work page 2018

[9] [9]

Landon Brown, Jared Hamilton, Zhao Han, Albert Phan, Thao Phung, Eric Hansen, Nhan Tran, and Tom Williams. 2023. Best of Both Worlds? Combining Different Forms of Mixed Reality Deictic Gestures.J. Hum.-Robot Interact.12, 1, Article 9 (Feb. 2023), 23 pages. doi:10.1145/3563387

work page doi:10.1145/3563387 2023

[10] [10]

Arthur Bucker, Luis F. C. Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Sai Vemprala, and Rogerio Bonatti. 2023. LATTE: LAnguage Trajectory TransformEr. InIEEE International Conference on Robotics and Automation, ICRA 2023, London, UK, May 29 - June 2, 2023. IEEE, 7287–7294. doi:10.1109/ICRA48891. 2023.10161068

work page doi:10.1109/icra48891 2023

[11] [11]

Kate Candon, Nicholas C Georgiou, Helen Zhou, Sidney Richardson, Qiping Zhang, Brian Scassellati, and Marynel Vázquez. 2024. REACT: Two datasets for analyzing both human reactions and evaluative feedback to robots over time. InProceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. 885–889

work page 2024

[12] [12]

Gombolay, and Benjamin Rosman

Vanya Cohen, Geraud Nangue Tasse, Nakul Gopalan, Steven James, Matthew C. Gombolay, and Benjamin Rosman. 2021. Learning to Follow Language Instruc- tions with Compositional Policies.CoRRabs/2110.04647 (2021). arXiv:2110.04647 https://arxiv.org/abs/2110.04647

work page arXiv 2021

[13] [13]

Maggie A Collier, Rithika Narayan, and Henny Admoni. 2025. The sense of agency in assistive robotics using shared autonomy. In2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 880–888

work page 2025

[14] [14]

Y. Cui, S. Karamcheti, R. Palleti, N. Shivakumar, P. Liang, and D. Sadigh. 2023. No, to the Right: Online Language Corrections for Robotic Manipulation via Shared Autonomy. InProceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction(Stockholm, Sweden)(HRI ’23). Association for Com- puting Machinery, New York, NY, USA, 93–101. do...

work page doi:10.1145/3568162.3578623 2023

[15] [15]

Yuchen Cui, Qiping Zhang, Brad Knox, Alessandro Allievi, Peter Stone, and Scott Niekum. 2021. The empathic framework for task learning from implicit human feedback. InConference on Robot Learning. PMLR, 604–626

work page 2021

[16] [16]

Nathaniel Dennler, Stefanos Nikolaidis, and Maja Matarić. 2025. Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Prefer- ence Elicitation. In2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI). 778–788. doi:10.1109/HRI61500.2025.10974136

work page doi:10.1109/hri61500.2025.10974136 2025

[17] [17]

Nathaniel Dennler, Zhonghao Shi, Stefanos Nikolaidis, and Maja Matarić. 2024. Improving user experience in preference-based optimization of reward functions for assistive robots.arXiv preprint arXiv:2411.11182(2024)

work page arXiv 2024

[18] [18]

Nathaniel Dennler, Catherine Yunis, Jonathan Realmuto, Terence Sanger, Ste- fanos Nikolaidis, and Maja Matarić. 2021. Personalizing user engagement dy- namics in a non-verbal communication game for cerebral palsy. In2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN). IEEE, 873–879

work page 2021

[19] [19]

A. D. Dragan, K. Muelling, J. Andrew Bagnell, and S. S. Srinivasa. 2015. Movement primitives via optimization. In2015 IEEE International Conference on Robotics and Automation (ICRA). 2339–2346. doi:10.1109/ICRA.2015.7139510

work page doi:10.1109/icra.2015.7139510 2015

[20] [20]

Tesca Fitzgerald, Pallavi Koppol, Patrick Callaghan, Russell Quinlan Jun Hei Wong, Reid Simmons, Oliver Kroemer, and Henny Admoni. 2022. INQUIRE: INteractive querying for user-aware informative REasoning. In6th Annual Con- ference on Robot Learning

work page 2022

[21] [21]

García, David M

Carlos E. García, David M. Prett, and Manfred Morari. 1989. Model predictive control: Theory and practice—A survey.Automatica25, 3 (1989), 335 – 348. doi:10.1016/0005-1098(89)90002-2

work page doi:10.1016/0005-1098(89)90002-2 1989

[22] [22]

Michael Hagenow and Julie A. Shah. 2025. REALM: Real-Time Estimates of Assistance for Learned Models in Human-Robot Interaction.IEEE Robotics and Automation Letters10, 6 (2025), 5473–5480. doi:10.1109/LRA.2025.3560862

work page doi:10.1109/lra.2025.3560862 2025

[23] [23]

Erin Hedlund-Botti, Julianna Schalkwyk, Nina Moorman, Sanne van Waveren, Lakshmi Seelam, Chuxuan Yang, Russell Perkins, Paul Robinette, and Matthew Gombolay. 2025. Learning Interpretable Features from Interventions. InRobotics: Science and Systems (RSS)

work page 2025

[24] [24]

Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. 2022. Lan- guage Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. InInternational Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA (Proceedings of Machine Learning Research, Vol. 162), Kamalika Chaudhuri, Stefanie Jegelka, ...

work page 2022

[25] [25]

Humphrey and Julie A

Curtis M. Humphrey and Julie A. Adams. 2008. Compass visualizations for human-robotic interaction. InProceedings of the 3rd ACM/IEEE International Conference on Human Robot Interaction(Amsterdam, The Netherlands)(HRI ’08). Association for Computing Machinery, New York, NY, USA, 49–56. doi:10.1145/ 1349822.1349830

work page arXiv 2008

[26] [26]

Joshi, Kyle Jeffrey, Rosario Jauregui Ruano, Jasmine Hsu, Keerthana Gopalakrish- nan, Byron David, Andy Zeng, and Chuyuan Kelly Fu

Brian Ichter, Anthony Brohan, Yevgen Chebotar, Chelsea Finn, Karol Hausman, Alexander Herzog, Daniel Ho, Julian Ibarz, Alex Irpan, Eric Jang, Ryan Julian, Dmitry Kalashnikov, Sergey Levine, Yao Lu, Carolina Parada, Kanishka Rao, Pierre Sermanet, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Mengyuan Yan, Noah Brown, Michael Ahn, Omar Co...

work page 2022

[27] [27]

Ashesh Jain, Shikhar Sharma, Thorsten Joachims, and Ashutosh Saxena. 2015. Learning preferences for manipulation tasks from online coactive feedback.Int. J. Robotics Res.34, 10 (2015), 1296–1313. doi:10.1177/0278364915581193

work page doi:10.1177/0278364915581193 2015

[28] [28]

Rajat Kumar Jenamani, Tom Silver, Ben Dodson, Shiqin Tong, Anthony Song, Yuting Yang, Ziang Liu, Benjamin Howe, Aimee Whitneck, and Tapomayukh Bhattacharjee. 2025. FEAST: A Flexible Mealtime-Assistance System Towards In-the-Wild Personalization. InRobotics: Science and Systems (RSS)

work page 2025

[29] [29]

Emily Jensen, Sriram Sankaranarayanan, and Bradley Hayes. 2024. Automated Assessment and Adaptive Multimodal Formative Feedback Improves Psychomo- tor Skills Training Outcomes in Quadrotor Teleoperation. InProceedings of the 12th International Conference on Human-Agent Interaction. 185–194

work page 2024

[30] [30]

Siddharth Karamcheti, Megha Srivastava, Percy Liang, and Dorsa Sadigh. 2021. LILA: Language-Informed Latent Actions. InConference on Robot Learning, 8-11 November 2021, London, UK (Proceedings of Machine Learning Research, Vol. 164), Aleksandra Faust, David Hsu, and Gerhard Neumann (Eds.). PMLR, 1379–1390. https://proceedings.mlr.press/v164/karamcheti22a.html

work page 2021

[31] [31]

Bradley Knox and Peter Stone

W. Bradley Knox and Peter Stone. 2009. Interactively shaping agents via human reinforcement: the TAMER framework. InProceedings of the 5th International Conference on Knowledge Capture (K-CAP 2009), September 1-4, 2009, Redondo Beach, California, USA, Yolanda Gil and Natasha Fridman Noy (Eds.). ACM, 9–16. doi:10.1145/1597735.1597738

work page doi:10.1145/1597735.1597738 2009

[32] [32]

Smith, and Pieter Abbeel

Kimin Lee, Laura M. Smith, and Pieter Abbeel. 2021. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training. InProceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and T...

work page 2021

[33] [33]

http://proceedings.mlr.press/v139/lee21i.html

work page

[34] [34]

Anthony Liang, Jesse Thomason, and Erdem Bıyık. 2024. Visarl: Visual rein- forcement learning guided by human saliency. In2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2907–2912. Abi Nader et al

work page 2024

[35] [35]

Lars Lindemann, Matthew Cleaveland, Gihyun Shim, and George J Pappas. 2023. Safe planning in dynamic environments using conformal prediction.IEEE Robot- ics and Automation Letters(2023)

work page 2023

[36] [36]

Dylan P Losey, Andrea Bajcsy, Marcia K O’Malley, and Anca D Dragan. 2022. Physical interaction as communication: Learning robot objectives online from human corrections.The International Journal of Robotics Research41, 1 (2022), 20–44

work page 2022

[37] [37]

Corey Lynch and Pierre Sermanet. 2021. Language Conditioned Imitation Learning Over Unstructured Data. InRobotics: Science and Systems XVII, Virtual Event, July 12-16, 2021, Dylan A. Shell, Marc Toussaint, and M. Ani Hsieh (Eds.). doi:10.15607/RSS.2021.XVII.047

work page doi:10.15607/rss.2021.xvii.047 2021

[38] [38]

Corey Lynch, Ayzaan Wahid, Jonathan Tompson, Tianli Ding, James Betker, Robert Baruch, Travis Armstrong, and Pete Florence. 2022. Interactive Language: Talking to Robots in Real Time.CoRRabs/2210.06407 (2022). arXiv:2210.06407 doi:10.48550/ARXIV.2210.06407

work page doi:10.48550/arxiv.2210.06407 2022

[39] [39]

Ho, Robert Tyler Loftin, Bei Peng, Guan Wang, David L

James MacGlashan, Mark K. Ho, Robert Tyler Loftin, Bei Peng, Guan Wang, David L. Roberts, Matthew E. Taylor, and Michael L. Littman. 2017. Interactive Learning from Policy-Dependent Human Feedback. InProceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 (Proceedings of Machine Learning Re...

work page 2017

[40] [40]

Christoforos Mavrogiannis, Francesca Baldini, Allan Wang, Dapeng Zhao, Pete Trautman, Aaron Steinfeld, and Jean Oh. 2023. Core challenges of social robot navigation: A survey.ACM Transactions on Human-Robot Interaction12, 3 (2023), 1–39

work page 2023

[41] [41]

Mehta and Dylan P

Shaunak A. Mehta and Dylan P. Losey. 2024. Unified Learning from Demonstra- tions, Corrections, and Preferences during Physical Human-Robot Interaction. ACM Trans. Hum. Robot Interact.13, 3 (2024), 39:1–39:25. doi:10.1145/3623384

work page doi:10.1145/3623384 2024

[42] [42]

Amal Nanavati, Ethan K Gordon, Taylor A Kessler Faulkner, Yuxin Ray Song, Jonathan Ko, Tyler Schrenk, Vy Nguyen, Bernie Hao Zhu, Haya Bolotski, Atharva Kashyap, et al. 2025. Lessons Learned from Designing and Evaluating a Robot- Assisted Feeding System for Out-of-Lab Use. In2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE...

work page 2025

[43] [43]

Heramb Nemlekar, Neel Dhanaraj, Angelos Guan, Satyandra K Gupta, and Ste- fanos Nikolaidis. 2023. Transfer learning of human preferences for proactive robot assistance in assembly tasks. InProceedings of the 2023 ACM/IEEE Interna- tional Conference on Human-Robot Interaction. 575–583

work page 2023

[44] [44]

Ng and Stuart Russell

Andrew Y. Ng and Stuart Russell. 2000. Algorithms for Inverse Reinforcement Learning. InProceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29 - July 2, 2000, Pat Langley (Ed.). Morgan Kaufmann, 663–670

work page 2000

[45] [45]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human f...

work page 2022

[46] [46]

Ravi Pandya, Zhuoyuan Wang, Yorie Nakahira, and Changliu Liu. 2024. Towards Proactive Safe Human-Robot Collaborations via Data-Efficient Conditional Be- havior Prediction. arXiv:2311.11893 [cs.RO] https://arxiv.org/abs/2311.11893

work page arXiv 2024

[47] [47]

Andi Peng, Andreea Bobu, Belinda Z Li, Theodore R Sumers, Ilia Sucholutsky, Nis- hanth Kumar, Thomas L Griffiths, and Julie A Shah. 2024. Preference-Conditioned Language-Guided Abstraction. InProceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. 572–581

work page 2024

[48] [48]

Li, Ilia Sucholutsky, Nishanth Kumar, Julie Shah, Jacob Andreas, and Andreea Bobu

Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie Shah, Jacob Andreas, and Andreea Bobu. 2024. Adaptive Language-Guided Abstraction from Contrastive Explanations. InConference on Robot Learning, 6-9 November 2024, Munich, Germany (Proceedings of Machine Learning Research, Vol. 270), Pulkit Agrawal, Oliver Kroemer, and Wolfram Burgard (Eds....

work page 2024

[49] [49]

Deepak Ramachandran and Eyal Amir. 2007. Bayesian Inverse Reinforcement Learning. InProceedings of the 20th International Joint Conference on Artifical Intelligence(Hyderabad, India)(IJCAI’07). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2586–2591. http://dl.acm.org/citation.cfm?id=1625275. 1625692

work page 2007

[50] [50]

Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu, Dorsa Sadigh, Andy Zeng, and Anirudha Majumdar. 2023. Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners. InConference on Robot Learning, CoRL 2023, 6-9 November 2023, Atlanta, GA, ...

work page 2023

[51] [51]

Dorsa Sadigh, Anca D Dragan, Shankar Sastry, and Sanjit A Seshia. 2017. Active preference-based learning of reward functions. InRobotics: Science and systems

work page 2017

[52] [52]

Sadigh, S

D. Sadigh, S. Sastry, S. Seshia, and Anca D. Dragan. 2016. Planning for Au- tonomous Cars that Leverage Effects on Human Actions. InRobotics: Science and Systems

work page 2016

[53] [53]

Pratyusha Sharma, Balakumar Sundaralingam, Valts Blukis, Chris Paxton, Tucker Hermans, Antonio Torralba, Jacob Andreas, and Dieter Fox. 2022. Correcting Robot Plans with Natural Language Feedback. InRobotics: Science and Systems XVIII, New York City, NY, USA, June 27 - July 1, 2022, Kris Hauser, Dylan A. Shell, and Shoudong Huang (Eds.). doi:10.15607/RSS....

work page doi:10.15607/rss.2022.xviii.065 2022

[54] [54]

Siebinga, A

O. Siebinga, A. Zgonnikov, and D. Abbink. 2022. Interactive Merging Behavior in a Coupled Driving Simulator: Experimental Framework and Case Study. In Human Factors in Transportation. AHFE 2022 International Conference (AHFE Open Access, Vol. 60), Katie Plant and Gesa Praetorius (Eds.). AHFE International, USA. doi:10.54941/ahfe1002485

work page doi:10.54941/ahfe1002485 2022

[55] [55]

Sripathy, A

A. Sripathy, A. Bobu, Z. Li, K. Sreenath, D. S. Brown, and A. D. Dragan. 2022. Teaching Robots to Span the Space of Functional Expressive Motion. InInterna- tional Conference on Intelligent Robots and Systems (IROS)

work page 2022

[56] [56]

Phielipp, Stefan Lee, Chitta Baral, and Heni Ben Amor

Simon Stepputtis, Joseph Campbell, Mariano J. Phielipp, Stefan Lee, Chitta Baral, and Heni Ben Amor. 2020. Language-Conditioned Imitation Learning for Robot Manipulation Tasks. InAdvances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Hugo Larochelle...

work page arXiv 2020

[57] [57]

Maia Stiber, Russell Taylor, and Chien-Ming Huang. 2022. Modeling human response to robot errors for timely error detection. In2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 676–683

work page 2022

[58] [58]

Sumers, Mark K

Theodore R. Sumers, Mark K. Ho, Robert X. D. Hawkins, Karthik Narasimhan, and Thomas L. Griffiths. 2020. Learning Rewards from Linguistic Feedback.CoRR abs/2009.14715 (2020). arXiv:2009.14715 https://arxiv.org/abs/2009.14715

work page arXiv 2020

[59] [59]

Tauhid Tanjim, Jonathan St George, Kevin Ching, and Angelique Taylor. 2025. Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams.arXiv preprint arXiv:2506.08892(2025)

work page arXiv 2025

[60] [60]

Yiran Tao, Jehan Yang, Dan Ding, and Zackory Erickson. 2025. LAMS: LLM- Driven Automatic Mode Switching for Assistive Teleoperation. In2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 242– 251

work page 2025

[61] [61]

Walter, Ashis Gopal Banerjee, Seth J

Stefanie Tellex, Thomas Kollar, Steven Dickerson, Matthew R. Walter, Ashis Gopal Banerjee, Seth J. Teller, and Nicholas Roy. 2011. Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation. InProceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2011, San Francisco, California, USA, August 7-11, 2011...

work page doi:10.1609/aaai.v25i1.7979 2011

[62] [62]

Raphael Vallat. 2018. Pingouin: statistics in Python.Journal of Open Source Software3, 31 (Nov. 2018), 1026. doi:10.21105/joss.01026

work page doi:10.21105/joss.01026 2018

[63] [63]

Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, and Chelsea Finn. 2019. Learning a Prior over Intent via Meta-Inverse Reinforcement Learning.CoRR abs/1805.12573 (2019). arXiv:1805.12573 http://arxiv.org/abs/1805.12573

work page arXiv 2019

[64] [64]

Russell, Anca Dragan, and Erdem Bıyık

Zhaojing Yang, Miru Jun, Jeremy Tien, Stuart J. Russell, Anca Dragan, and Erdem Bıyık. 2024. Trajectory Improvement and Reward Learning from Comparative Language Feedback. arXiv:2410.06401 [cs.RO] https://arxiv.org/abs/2410.06401

work page arXiv 2024

[65] [65]

Michelle Zhao, Reid Simmons, Henny Admoni, and Andrea Bajcsy. 2024. Confor- malized teleoperation: Confidently mapping human inputs to high-dimensional robot actions.arXiv preprint arXiv:2406.07767(2024)

work page arXiv 2024

[66] [66]

Ziebart, Andrew Maas, J

Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind K. Dey. 2008. Maxi- mum Entropy Inverse Reinforcement Learning. InProceedings of the 23rd National Conference on Artificial Intelligence - Volume 3(Chicago, Illinois)(AAAI’08). AAAI Press, 1433–1438. http://dl.acm.org/citation.cfm?id=1620270.1620297

work page arXiv 2008

[67] [67]

Lee, Matthew Tan, Yuke Zhu, and Jeannette Bohg

Matthew Zurek, Andreea Bobu, Daniel S. Brown, and Anca D. Dragan. 2021. Situational Confidence Assistance for Lifelong Shared Autonomy. InIEEE Inter- national Conference on Robotics and Automation, ICRA 2021, Xi’an, China, May 30 - June 5, 2021. IEEE, 2783–2789. doi:10.1109/ICRA48506.2021.9561839 QuickLAP: Quick Language–Action Preference Learning for Aut...

work page doi:10.1109/icra48506.2021.9561839 2021

[68] [68]

In this task , a human driver has intervened to correct the behavior of a robot car and has provided an explanation of the intervention

How relevant is this feature to the intervention ? ( gate score 0.0 or 1.0) B.2 Preference Language Model (LM pref) System Message: You are an expert in autonomous vehicle control analyzing driver interventions . In this task , a human driver has intervened to correct the behavior of a robot car and has provided an explanation of the intervention . Your r...

work page

[69] [69]

What absolute change with direction ( this will be your 'mu') would support this intervention ? Consider the scale of the features , and the current weights

work page

[70] [70]

Be careful

How confident are you in your decision ? ( confidence score 0.0 -1.0) B.3 LLM Configuration Parameters The table below describes the parameters we used for the LLM experiments using the OpenAI API. Table 1: LLM API Configuration Settings Parameter LM att LMpref Model gpt-4o gpt-4o Temperature 0.1 0.3 Response Format JSON JSON Max Tokens default default C ...

work page arXiv 2071