QuickLAP: Quick Language-Action Preference Learning for Semi-Autonomous Agents
Pith reviewed 2026-05-21 17:54 UTC · model grok-4.3
The pith
QuickLAP treats language as a probabilistic observation of the user's latent preferences and fuses it with physical corrections in a closed-form Bayesian update, which makes real-time reward learning possible.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper's central claim is that language can be modeled as a probabilistic observation over the user's latent reward preferences. This permits a closed-form Bayesian update that combines LLM-parsed attention masks and preference shifts with physical corrections to infer reward functions quickly and robustly. In a semi-autonomous driving simulator, the paper reports over 70 percent lower reward learning error than physical-only and heuristic multimodal baselines.
What carries the argument
The closed-form Bayesian update rule that treats language-derived reward feature attention masks and preference shifts as probabilistic observations over latent preferences.
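A minimal sketch of the per-feature update, assuming the flattened formula quoted in the Lean theorems section of this page groups as theta_{t+1,i} = theta_{t,i} + (sigma^2_{L,i} * dPhi_i + mu_{t,i}) / (Lambda_{prior,i} * sigma^2_{L,i} + 1). The function and argument names are hypothetical, and this page does not show how the attention mask enters the authors' rule, so the mask is left out here.

```python
from typing import List, Sequence


def fused_update(
    theta: Sequence[float],         # current reward-weight estimates
    delta_phi: Sequence[float],     # feature change implied by the physical correction
    mu: Sequence[float],            # language-derived preference shift per feature
    sigma2_lang: Sequence[float],   # variance of the language observation per feature
    lambda_prior: Sequence[float],  # prior precision per feature
) -> List[float]:
    """Per-feature update under the ASSUMED grouping of the quoted formula."""
    lengths = {len(theta), len(delta_phi), len(mu), len(sigma2_lang), len(lambda_prior)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have one entry per reward feature")

    updated = []
    for th, dphi, m, s2, lam in zip(theta, delta_phi, mu, sigma2_lang, lambda_prior):
        # Numerator mixes physical evidence (scaled by language variance) and language shift.
        step = (s2 * dphi + m) / (lam * s2 + 1.0)
        updated.append(th + step)
    return updated


if __name__ == "__main__":
    print(fused_update([0.0], [2.0], [1.0], [1.0], [4.0]))
```

Under this reading, a very confident language observation (variance near 0) moves the weight by mu, while a very uncertain one (large variance) falls back to the physical-only step dPhi / Lambda_prior.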
If this is right
- Semi-autonomous agents can adapt their behavior in real time to ambiguous multimodal feedback without requiring extensive physical demonstrations.
- The learned reward functions produce trajectories that users rate as more understandable and collaborative.
- Preference shifts expressed in language can be directly incorporated into ongoing physical correction updates.
- The framework scales to handling mixed feedback in dynamic environments like driving simulators.
Where Pith is reading between the lines
- The same fusion approach could apply to other domains such as robotic manipulation where language clarifies goals during physical guidance.
- Reducing reliance on purely physical feedback might lower the cognitive load on human operators in long sessions.
- If LLM extraction quality improves over time, the method could generalize to less structured language without retraining the Bayesian core.
Load-bearing premise
Large language models can reliably extract accurate reward feature attention masks and preference shifts from free-form user utterances without introducing substantial bias or error.
What would settle it
An experiment in the same driving simulator where LLM extractions from utterances are deliberately noisy or biased, resulting in reward learning error no lower than physical-only baselines.
Original abstract
Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language expresses high-level goals but lacks physical grounding. We introduce QuickLAP: Quick Language-Action Preference learning, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. Our key insight is to treat language as a probabilistic observation over the user's latent preferences, clarifying which reward features matter and how physical corrections should be interpreted. QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule. This enables fast, real-time, and robust reward learning that handles ambiguous feedback. In a semi-autonomous driving simulator, QuickLAP reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines. A 15-participant user study further validates our approach: participants found QuickLAP significantly more understandable and collaborative, and preferred its learned behavior over baselines. Code is available at https://github.com/MIT-CLEAR-Lab/QuickLAP.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces QuickLAP, a Bayesian framework for real-time reward learning in semi-autonomous agents that fuses physical corrections with language feedback. LLMs extract reward feature attention masks and preference shifts from free-form utterances, which are integrated via a closed-form update rule. In a semi-autonomous driving simulator, it reports over 70% reduction in reward learning error versus physical-only and heuristic multimodal baselines. A 15-participant user study finds the approach more understandable, collaborative, and preferable, with code released at a GitHub repository.
Significance. If the LLM extraction step proves reliable, the work offers a practical advance in multimodal preference learning for human-robot interaction, enabling faster and more natural reward inference than unimodal baselines. The closed-form update and public code are strengths that support reproducibility and potential adoption. The significance is limited by the absence of quantified validation for the LLM component, which directly affects whether the reported error reductions generalize.
major comments (2)
- [Evaluation and Methods] The central performance claim (over 70% error reduction) depends on treating LLM outputs as reliable probabilistic observations in the Bayesian update. No per-utterance accuracy metrics, human inter-annotator agreement, or sensitivity analysis on mask noise propagation appear in the evaluation; without these, it is unclear whether the reported gains hold under realistic utterance ambiguity (e.g., safety vs. comfort trade-offs).
- [Framework and Update Rule] The closed-form update rule integrates LLM-derived attention masks and preference shifts directly as observations. A concrete test of robustness—such as injecting controlled noise into the masks and measuring posterior shift—is missing, making it difficult to bound how LLM variance would affect the posterior mean and the claimed improvement over baselines.
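One way to make the requested agreement check concrete is Cohen's kappa between LLM-extracted and human-labeled binary attention masks. This is a sketch under assumptions: the report does not name a specific agreement statistic, and the function name and label encoding (1 = feature attended, 0 = not attended) are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")

    n = len(labels_a)
    # Observed agreement: fraction of positions where both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if both raters labeled independently at their own base rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    llm_mask = [1, 1, 1, 0]
    human_mask = [1, 1, 0, 0]
    print(cohens_kappa(llm_mask, human_mask))
```

The same function can compare two human annotators with each other, which gives a ceiling against which LLM-versus-human agreement can be read.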
minor comments (2)
- [User Study] The user-study section would benefit from explicit reporting of statistical tests (e.g., p-values or effect sizes) and the precise wording of preference questions to allow independent assessment of the qualitative findings.
- [Notation and Preliminaries] Notation for the attention mask and preference shift variables should be defined once in the main text with a clear mapping to the LLM prompt template.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive comments on our manuscript. We have reviewed the major comments concerning the evaluation of the LLM component and the robustness of the update rule. We provide detailed responses below and will make revisions to address these points.
Point-by-point responses
- Referee: [Evaluation and Methods] The central performance claim (over 70% error reduction) depends on treating LLM outputs as reliable probabilistic observations in the Bayesian update. No per-utterance accuracy metrics, human inter-annotator agreement, or sensitivity analysis on mask noise propagation appear in the evaluation; without these, it is unclear whether the reported gains hold under realistic utterance ambiguity (e.g., safety vs. comfort trade-offs).
  Authors: We agree with the referee that additional validation of the LLM extraction step would strengthen the paper. Although our simulator experiments and user study demonstrate the overall benefits of the multimodal fusion, we did not include direct metrics on LLM accuracy in the original submission. In the revised version, we will add per-utterance accuracy metrics by annotating a set of utterances with human labels for feature attention masks and preference shifts, and report agreement with LLM outputs. We will also include inter-annotator agreement scores and a sensitivity analysis showing how noise in the masks affects the learning error. This will address concerns about utterance ambiguity.
  Revision: yes
- Referee: [Framework and Update Rule] The closed-form update rule integrates LLM-derived attention masks and preference shifts directly as observations. A concrete test of robustness, such as injecting controlled noise into the masks and measuring posterior shift, is missing, making it difficult to bound how LLM variance would affect the posterior mean and the claimed improvement over baselines.
  Authors: We acknowledge that a specific robustness test for the closed-form update is valuable. To bound the effect of LLM variance, we will add an experiment in the revised manuscript that injects controlled noise into the LLM-derived masks and shifts. We will vary the noise level and report the resulting changes to the posterior mean and the reward learning error compared to baselines. This will provide quantitative bounds on how LLM inaccuracies propagate through the Bayesian update.
  Revision: yes
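The noise-injection experiment discussed above can be sketched roughly as follows. Everything here is illustrative: the gating of the update by a binary mask, the bit-flip and Gaussian noise models, and all names are assumptions, not the authors' protocol. The update step reuses the assumed grouping (sigma^2 * dPhi + mu) / (Lambda_prior * sigma^2 + 1).

```python
import random
from typing import List, Sequence, Tuple


def gated_update(theta: Sequence[float], delta_phi: Sequence[float], mu: Sequence[float],
                 mask: Sequence[int], sigma2_lang: float = 1.0,
                 lambda_prior: float = 1.0) -> List[float]:
    """Update only features whose (hypothetical) attention gate is 1."""
    out = []
    for th, dphi, m, gate in zip(theta, delta_phi, mu, mask):
        step = (sigma2_lang * dphi + m) / (lambda_prior * sigma2_lang + 1.0)
        out.append(th + gate * step)
    return out


def perturb(mask: Sequence[int], mu: Sequence[float], flip_prob: float,
            shift_std: float, rng: random.Random) -> Tuple[List[int], List[float]]:
    """Flip each gate with probability flip_prob; add Gaussian noise to each shift."""
    noisy_mask = [1 - g if rng.random() < flip_prob else g for g in mask]
    noisy_mu = [m + rng.gauss(0.0, shift_std) for m in mu]
    return noisy_mask, noisy_mu


def mean_posterior_shift(theta: Sequence[float], delta_phi: Sequence[float],
                         mu: Sequence[float], mask: Sequence[int], flip_prob: float,
                         shift_std: float, trials: int = 200, seed: int = 0) -> float:
    """Average absolute change in the updated weights caused by the injected noise."""
    rng = random.Random(seed)
    clean = gated_update(theta, delta_phi, mu, mask)
    total = 0.0
    for _ in range(trials):
        noisy_mask, noisy_mu = perturb(mask, mu, flip_prob, shift_std, rng)
        noisy = gated_update(theta, delta_phi, noisy_mu, noisy_mask)
        total += sum(abs(a - b) for a, b in zip(noisy, clean)) / len(clean)
    return total / trials


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.3):
        shift = mean_posterior_shift([0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1, 0], p, 0.2)
        print(f"flip_prob={p:.1f} mean shift={shift:.3f}")
```

Sweeping flip_prob and shift_std and plotting the resulting shift against the gap to the physical-only baseline would show at what noise level the multimodal advantage disappears.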
Circularity Check
No significant circularity in derivation chain
full rationale
The paper presents a Bayesian framework whose central step is a closed-form update rule that treats LLM-extracted attention masks and preference shifts as probabilistic observations to be fused with physical corrections. This update is derived from standard Bayesian inference rather than being defined in terms of the target performance metric. Empirical claims of 70% error reduction are obtained from a separate simulator evaluation against baselines and from a 15-participant user study; neither quantity is obtained by fitting parameters to the same data used to declare success nor by renaming an input as a prediction. No load-bearing self-citation, uniqueness theorem, or ansatz-smuggling step is required for the derivation to hold. The framework is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Language can be treated as a probabilistic observation over the user's latent preferences.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: the closed-form update: $\hat{\theta}_{t+1,i} = \hat{\theta}_{t,i} + \dfrac{\sigma^2_{L,i}\,\Delta\Phi_i + \mu_{t,i}}{\Lambda_{\mathrm{prior},i}\,\sigma^2_{L,i} + 1}$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
- How relevant is this feature to the intervention? (gate score 0.0 or 1.0)
B.2 Preference Language Model (LM pref)
System Message: You are an expert in autonomous vehicle control analyzing driver interventions. In this task, a human driver has intervened to correct the behavior of a robot car and has provided an explanation of the intervention. Your r...
- What absolute change with direction (this will be your 'mu') would support this intervention? Consider the scale of the features, and the current weights
- How confident are you in your decision? (confidence score 0.0-1.0)
B.3 LLM Configuration Parameters
The table below describes the parameters we used for the LLM experiments using the OpenAI API.
Table 1: LLM API Configuration Settings
Parameter        | LM att  | LM pref
Model            | gpt-4o  | gpt-4o
Temperature      | 0.1     | 0.3
Response Format  | JSON    | JSON
Max Tokens       | default | default
C ...
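The prompt fragments above ask the model for a gate score, a signed change 'mu', and a confidence score, returned as JSON. A minimal sketch of validating such replies before they reach the update rule is given below. The JSON key names ("gate", "mu", "confidence") and function names are assumptions, since the full response schema is truncated on this page.

```python
import json
from typing import Tuple


def parse_gate_response(raw: str) -> float:
    """Parse an attention-model reply; the key name 'gate' is hypothetical."""
    data = json.loads(raw)
    gate = float(data["gate"])
    if gate not in (0.0, 1.0):
        raise ValueError(f"gate score must be 0.0 or 1.0, got {gate}")
    return gate


def parse_pref_response(raw: str) -> Tuple[float, float]:
    """Parse a preference-model reply; key names 'mu' and 'confidence' are hypothetical."""
    data = json.loads(raw)
    mu = float(data["mu"])  # signed change in the feature weight
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must lie in [0.0, 1.0], got {confidence}")
    return mu, confidence


if __name__ == "__main__":
    print(parse_gate_response('{"gate": 1.0}'))
    print(parse_pref_response('{"mu": -0.4, "confidence": 0.8}'))
```

Rejecting out-of-range values at this boundary keeps a malformed reply from silently entering the Bayesian update as if it were a valid observation.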