pith.

arxiv: 2406.19741 · v3 · submitted 2024-06-28 · 💻 cs.RO · cs.AI

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Pith reviewed 2026-05-23 23:44 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords ROS · LLM · embodied AI · robot programming · natural language interface · behavior extraction · imitation learning · feedback loops

The pith

A framework integrates LLMs with ROS so non-experts can program robots through natural language chat with automatic behavior extraction and feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ROS-LLM as a system that links large language models to the Robot Operating System, letting users describe tasks in plain language. The framework pulls behaviors from LLM responses, converts them into executable ROS actions or services, and runs them in one of three modes while incorporating imitation learning and feedback loops. Experiments test the approach on long-horizon tasks, tabletop rearrangements, and remote control. A sympathetic reader would care because the work aims to remove the need for expert coding when directing robots. If the mapping step holds, everyday users gain direct control over physical systems through conversation.

Core claim

The authors claim that connecting an AI agent inside ROS to open-source and commercial LLMs enables automatic extraction of behaviors from LLM output, execution of those behaviors as ROS actions or services, three behavior modes (sequence, behavior tree, and state machine), imitation learning to enlarge the action library, and reflection on human or environment feedback. They further claim that experiments confirm the setup handles diverse robotic scenarios.
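The three behavior modes differ mainly in control flow. The sketch below shows two of them under stated assumptions; it is not ROS-LLM's implementation, and the names are hypothetical.

- Each library action is assumed to be a Python callable that returns True on success. In a ROS deployment it would wrap an action or service client.
- The behavior tree is reduced to SUCCESS/FAILURE. Real behavior-tree libraries also use a RUNNING status.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical action library: name -> callable that returns True on success.
# In a ROS deployment each callable would wrap an action or service client.
ActionFn = Callable[[], bool]


def run_sequence(steps: List[str], library: Dict[str, ActionFn]) -> bool:
    """Sequence mode: run steps in order and stop at the first failure."""
    for name in steps:
        if not library[name]():
            return False
    return True


@dataclass
class Node:
    kind: str                      # "action", "sequence", or "fallback"
    name: str = ""                 # action name, used by leaf nodes only
    children: List["Node"] = field(default_factory=list)


def tick(node: Node, library: Dict[str, ActionFn]) -> bool:
    """Behavior-tree mode, reduced to SUCCESS/FAILURE (no RUNNING status)."""
    if node.kind == "action":
        return bool(library[node.name]())
    if node.kind == "sequence":    # all() short-circuits on the first failing child
        return all(tick(c, library) for c in node.children)
    if node.kind == "fallback":    # any() short-circuits on the first succeeding child
        return any(tick(c, library) for c in node.children)
    raise ValueError(f"unknown node kind: {node.kind!r}")


if __name__ == "__main__":
    calls: List[str] = []

    def stub(name: str, ok: bool) -> ActionFn:
        def fn() -> bool:
            calls.append(name)
            return ok
        return fn

    library = {
        "pick_cup_top": stub("pick_cup_top", False),   # first grasp attempt fails
        "pick_cup_side": stub("pick_cup_side", True),  # alternative grasp succeeds
        "place_cup": stub("place_cup", True),
    }
    tree = Node("sequence", children=[
        Node("fallback", children=[Node("action", "pick_cup_top"),
                                   Node("action", "pick_cup_side")]),
        Node("action", "place_cup"),
    ])
    print(run_sequence(["pick_cup_top", "place_cup"], library))  # False
    calls.clear()
    print(tick(tree, library), calls)
    # True ['pick_cup_top', 'pick_cup_side', 'place_cup']
```

The fallback node lets the tree recover from a failed grasp without returning to the LLM. A flat sequence simply stops at the first failure.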

What carries the argument

Automatic extraction of behaviors from LLM output and their direct mapping to executable ROS actions/services, which turns natural language into runnable robot programs across multiple structured modes.
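The extraction step carries most of the argument's weight. Below is a minimal sketch of extraction plus validation, under two assumptions that are hypothetical and not necessarily what ROS-LLM does:

- The agent prompts the LLM to reply with a JSON object of the form {"steps": [{"action": ..., "args": {...}}]}.
- The action library records which argument names each action requires.

Rejecting unknown actions or missing arguments before anything reaches ROS turns a malformed reply into a recoverable error rather than a bad robot command.

```python
import json
import re
from typing import Dict, List, Set

FENCE = "`" * 3  # built programmatically so the example can sit inside a code block

# Hypothetical reply format: an optional json-tagged fenced block holding
# {"steps": [{"action": <name>, "args": {...}}, ...]}
_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


class ExtractionError(ValueError):
    """Raised when an LLM reply cannot be turned into an executable plan."""


def extract_plan(reply: str, library: Dict[str, Set[str]]) -> List[dict]:
    """Parse a reply and check every step against the action library.

    `library` maps each known action name to the set of argument names it requires.
    """
    match = _BLOCK.search(reply)
    payload = match.group(1) if match else reply  # fall back to the raw reply
    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        raise ExtractionError(f"reply is not valid JSON: {exc}") from exc
    steps = data.get("steps") if isinstance(data, dict) else None
    if not isinstance(steps, list) or not steps:
        raise ExtractionError("expected a non-empty 'steps' list")
    for i, step in enumerate(steps):
        name = step.get("action") if isinstance(step, dict) else None
        if not isinstance(name, str) or name not in library:
            raise ExtractionError(f"step {i}: unknown action {name!r}")
        args = step.get("args", {})
        if not isinstance(args, dict):
            raise ExtractionError(f"step {i}: 'args' must be an object")
        missing = library[name] - set(args)
        if missing:
            raise ExtractionError(f"step {i}: {name} is missing {sorted(missing)}")
    return steps
```

On an `ExtractionError`, an agent could re-prompt the LLM with the error message. Only validated steps would be handed to the ROS execution layer.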

If this is right

  • Non-experts can specify task requirements through a chat interface without writing code.
  • The same framework supports long-horizon tasks, tabletop rearrangements, and remote supervisory control.
  • Imitation learning expands the library of available robot actions over time.
  • LLM reflection improves execution by incorporating feedback from humans and the physical environment.
  • Open-source release of the code allows others to reproduce results and extend the system.
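The reflection bullet describes a loop: propose a plan, execute it, and on failure feed the outcome back to the LLM before the next proposal. A minimal sketch follows. It assumes hypothetical `propose` and `execute` callables standing in for the LLM call and the ROS execution layer, and it simplifies the chat transcript to a list of strings. It is not the framework's actual interface.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (not ROS-LLM's API):
#   propose(history) -> plan          an LLM call given the conversation so far
#   execute(plan) -> (ok, feedback)   runs the plan, returns success plus a message
Plan = List[str]


def run_with_reflection(
    task: str,
    propose: Callable[[List[str]], Plan],
    execute: Callable[[Plan], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[Optional[Plan], List[str]]:
    """Retry planning up to max_attempts times, feeding failures back to the LLM."""
    history = [f"user: {task}"]
    for attempt in range(1, max_attempts + 1):
        plan = propose(history)
        history.append(f"assistant: {plan}")
        ok, feedback = execute(plan)
        if ok:
            return plan, history
        # Human or environment feedback becomes context for the next proposal.
        history.append(f"feedback (attempt {attempt}): {feedback}")
    return None, history  # give up and hand control back to the user
```

Capping the number of attempts keeps a confused model from looping indefinitely. The returned history doubles as a log of how many corrections a task needed.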

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The approach could let non-programmers deploy robots in settings such as homes or small workshops where expert coders are unavailable.
  • Similar LLM-to-action pipelines might apply to other middleware besides ROS if the extraction step generalizes.
  • Long-term use could reveal whether repeated feedback loops reduce the rate of mapping errors over successive tasks.

Load-bearing premise

The automatic extraction of behaviors from LLM output can be mapped reliably to executable ROS actions/services without frequent human intervention or failure across varied prompts and environments.

What would settle it

A test set of new prompts and environments in which the system fails to produce correct ROS mappings from LLM outputs in more than a small fraction of trials.
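Whether an observed failure rate is "more than a small fraction" depends on how many trials were run. One way to make the criterion concrete is sketched below. The 5% tolerance and the decision rule are illustrative assumptions, not part of the paper.

- Compute a 95% Wilson score interval for the observed failure proportion.
- Count the claim as refuted if the whole interval lies above the tolerance.
- Count it as supported if the whole interval lies at or below the tolerance.
- Otherwise, more trials are needed.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need 0 <= failures <= trials and trials > 0")
    p = failures / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def verdict(failures: int, trials: int, tolerance: float = 0.05) -> str:
    """Compare the interval with a hypothetical failure-rate tolerance."""
    lo, hi = wilson_interval(failures, trials)
    if lo > tolerance:
        return "refuted"        # failure rate credibly above the tolerance
    if hi <= tolerance:
        return "supported"      # failure rate credibly at or below the tolerance
    return "inconclusive"       # interval straddles the tolerance: run more trials
```

For example, 0 failures in 100 trials gives an interval of roughly [0, 0.037], which clears a 5% tolerance. A single failure in 10 trials is inconclusive.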

Figures

Figures reproduced from arXiv: 2406.19741 by Anbang Zhai, Antoine Grosnit, Cesar Cadena, Christopher E. Mower, Daniel Palenicek, Davide Tateo, Guangjian Tian, Haitham Bou-Ammar, Hongzhan Yu, Jan Peters, Jianye Hao, Jinlong Wang, Jonas Gonzalez-Billandon, Jun Wang, Kun Shao, Marco Hutter, Matthieu Zimmer, Puze Liu, Xingyue Quan, Xinyu Zhang, Yao Zhao, Yuhui Wan, Yuzheng Zhuang.

Figure 1. Overview of a typical robotics development workflow.
Figure 2. Our proposed ROS-LLM framework overview illustrates the integration of several com…
Figure 3. Real-world laboratory setup used in our experiments.
Figure 4. Detailed steps in the coffee-making process arranged in a modified Z-shaped flow across…
Figure 5. Detailed steps in the coffee-making process are depicted across twelve images: (a) picking…
Figure 6. Sequence for the 6-cube task, depicted over ten stages: (a) starting, (b) picking up a green…
Figure 7. Results depicting policy correction through human feedback, where orange indicates task…
Figure 8. Steps in the pasta-making process, depicted in five stages: (a) grating cheese, (b) pouring…
Figure 9. Remote supervisory control using (a) language interfaces, depicted through continuous…
Figure 10. Experiment setup for the human study.
Figure 11. Weighted NASA TLX results for remote supervisory control.
Figure 7 (accompanying text, truncated): These results are encouraging for the deployment of this framework in scenarios requiring complex sequential task execution, highlighting its potential reliability and effectiveness in practical applications. 5.2 Enhancing policy correction via human feedback: Targeted human feedback has shown the potential to mitigate this degradation by correcting erroneous policy decisions dynamically. We noticed that d…
Figure 12. The setup for the Robot Air Hockey Challenge with two KUKA IIWA robot arms.
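Figure 11 reports weighted NASA TLX scores. In the standard weighted TLX procedure:

- Each participant rates six subscales on a 0 to 100 scale.
- Each participant makes 15 pairwise comparisons between subscales.
- A subscale's weight is the number of comparisons it wins, so the weights sum to 15.
- The workload score is the weighted sum of the ratings divided by 15.

The sketch below computes that score. The subscale keys are shorthand names chosen here, and the paper's own scoring script is not described in this summary.

```python
from itertools import combinations
from typing import Dict

# Shorthand keys for the six NASA TLX subscales (names chosen for this sketch).
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def tlx_weights(choices: Dict[frozenset, str]) -> Dict[str, int]:
    """Count how often each subscale was picked across the 15 pairwise comparisons."""
    pairs = {frozenset(p) for p in combinations(SUBSCALES, 2)}
    if set(choices) != pairs:
        raise ValueError("need exactly one choice for each of the 15 pairs")
    weights = {s: 0 for s in SUBSCALES}
    for pair, winner in choices.items():
        if winner not in pair:
            raise ValueError(f"{winner!r} is not in pair {sorted(pair)}")
        weights[winner] += 1
    return weights


def weighted_tlx(ratings: Dict[str, float], weights: Dict[str, int]) -> float:
    """Weighted workload: sum(rating * weight) / 15, on the 0-100 rating scale."""
    if sum(weights.values()) != 15:
        raise ValueError("weights must come from 15 pairwise comparisons")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15
```

Because the weights sum to 15, the weighted score stays on the same 0 to 100 scale as the raw ratings. Differences between interfaces in Figure 11 can therefore be read in rating units.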
Original abstract

We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.
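The abstract names imitation learning as the way new robot actions enter the library. The simplest stand-in is record-and-replay: store a demonstrated waypoint sequence under a new name and list it for the LLM alongside the existing actions.

This is a hypothetical sketch. `ActionLibrary`, `add_from_demonstration`, and the `send` callback are illustrative names, not ROS-LLM's API. A real imitation-learning method would also generalize beyond the exact demonstration rather than replay it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Waypoint = Sequence[float]


@dataclass
class ActionLibrary:
    """Hypothetical action library: name -> callable, plus a text description."""
    actions: Dict[str, Callable[[], bool]] = field(default_factory=dict)
    descriptions: Dict[str, str] = field(default_factory=dict)

    def register(self, name: str, description: str, fn: Callable[[], bool]) -> None:
        if name in self.actions:
            raise KeyError(f"action {name!r} already exists")
        self.actions[name] = fn
        self.descriptions[name] = description

    def add_from_demonstration(self, name: str, description: str,
                               demo: List[Waypoint],
                               send: Callable[[Waypoint], bool]) -> None:
        """Record-and-replay stand-in for imitation learning."""
        if len(demo) < 2:
            raise ValueError("a demonstration needs at least two waypoints")
        frozen = [tuple(w) for w in demo]  # copy so later edits to demo don't leak in

        def replay() -> bool:
            return all(send(w) for w in frozen)  # stops at the first rejected waypoint

        self.register(name, description, replay)

    def prompt_listing(self) -> str:
        """Text an agent could put in its prompt so the LLM knows the available actions."""
        return "\n".join(f"- {n}: {d}" for n, d in sorted(self.descriptions.items()))
```

Regenerating the prompt listing from the library means a newly demonstrated action becomes available to the LLM on the next request, without hand-editing the prompt.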

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript presents ROS-LLM, a framework integrating large language models with ROS to enable non-experts to program robots via natural language chat. It features an AI agent supporting multiple LLMs, automatic extraction of LLM outputs into one of three behavior representations (sequence, behavior tree, state machine) for mapping to ROS actions/services, imitation learning to extend the action library, and LLM reflection using human/environment feedback. The authors claim extensive experiments demonstrate robustness, scalability, and versatility across long-horizon tasks, tabletop rearrangements, and remote supervisory control, with open-source code provided for reproduction.

Significance. If the automatic extraction and execution pipeline proves reliable with minimal intervention, the framework could lower barriers to embodied AI deployment by combining structured reasoning modes with ROS primitives. The open-source release is a clear strength for reproducibility. However, without quantitative validation, the practical significance is difficult to assess against prior work integrating LLMs with ROS.

major comments (2)
  1. [Abstract] The claim that 'extensive experiments validate the framework, showcasing robustness, scalability, and versatility' is unsupported by any reported metrics (success rates, parsing failure counts, human intervention frequency, or baselines). This directly affects the central assertion that behaviors automatically extracted from LLM output can be mapped reliably to executable ROS actions/services without frequent human intervention.
  2. [Key features / Experimental validation] The description of the extraction and reflection mechanism (key features paragraph) asserts reliable mapping across prompt variations and environments, yet no quantitative evidence (e.g., extraction accuracy, retry counts, or task completion rates) is supplied for the long-horizon or tabletop scenarios. This leaves the 'intuitive for non-experts' requirement unverified.
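The metrics the referee asks for are straightforward to compute once each trial is logged. A minimal sketch follows, assuming a hypothetical per-trial record with success, parse-failure, retry, and intervention counts. The field names and task labels are illustrative, and no data from the paper is implied.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Trial:
    """One hypothetical evaluation episode."""
    task: str
    succeeded: bool
    parse_failures: int = 0      # LLM replies that could not be extracted into a behavior
    retries: int = 0             # re-prompts after an extraction or execution failure
    interventions: int = 0       # times a human had to step in


def summarize(trials: List[Trial]) -> Dict[str, float]:
    """Aggregate metrics over a non-empty list of trials."""
    if not trials:
        raise ValueError("no trials to summarize")
    n = len(trials)
    return {
        "n": n,
        "success_rate": sum(t.succeeded for t in trials) / n,
        "parse_failures": sum(t.parse_failures for t in trials),
        "mean_retries": mean(t.retries for t in trials),
        "interventions_per_trial": sum(t.interventions for t in trials) / n,
    }


def by_task(trials: List[Trial]) -> Dict[str, Dict[str, float]]:
    """Per-task breakdown, e.g. long-horizon versus tabletop scenarios."""
    groups: Dict[str, List[Trial]] = {}
    for t in trials:
        groups.setdefault(t.task, []).append(t)
    return {task: summarize(ts) for task, ts in groups.items()}
```

Reporting these per scenario, next to a baseline, would let readers judge the abstract's robustness claim directly.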

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and agree that revisions are required to ensure claims are supported by the presented material.

Point-by-point responses
  1. Referee: [Abstract] The claim that 'extensive experiments validate the framework, showcasing robustness, scalability, and versatility' is unsupported by any reported metrics (success rates, parsing failure counts, human intervention frequency, or baselines). This directly affects the central assertion that behaviors automatically extracted from LLM output can be mapped reliably to executable ROS actions/services without frequent human intervention.

    Authors: We agree that the abstract claim is not supported by quantitative metrics, as the manuscript presents only qualitative demonstrations of the framework in long-horizon tasks, tabletop rearrangements, and remote control scenarios. We will revise the abstract to describe these as illustrative examples of the framework's capabilities rather than claiming validation of robustness, scalability, or versatility through metrics. The central assertion about reliable mapping will also be qualified to reflect the absence of such data. Revision: yes.

  2. Referee: [Key features / Experimental validation] The description of the extraction and reflection mechanism (key features paragraph) asserts reliable mapping across prompt variations and environments, yet no quantitative evidence (e.g., extraction accuracy, retry counts, or task completion rates) is supplied for the long-horizon or tabletop scenarios. This leaves the 'intuitive for non-experts' requirement unverified.

    Authors: We acknowledge that assertions of reliable mapping and intuitiveness for non-experts in the key features and experimental sections lack supporting quantitative evidence such as accuracy rates or intervention counts. The current text relies on descriptive examples. We will revise these sections to remove or qualify claims of reliability and to clarify that non-expert usability is a design goal illustrated by the chat interface and behavior modes, without empirical verification in the manuscript. A limitations discussion will be added if appropriate. Revision: yes.

Circularity Check

0 steps flagged

No circularity: engineering integration validated externally

Full rationale

The paper describes a software framework integrating ROS with LLMs for robot task execution via natural language. No mathematical derivations, fitted parameters, or equations are present. Validation relies on external robot performance in experiments (long-horizon tasks, rearrangements, remote control), not on self-referential definitions or self-citation chains. The extraction and mapping steps are implementation details whose reliability is claimed to be measured by system behavior, not reduced to inputs by construction. This matches the default non-circular case for systems papers.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework depends on the assumption that LLMs produce parsable structured outputs that map cleanly to ROS primitives and that imitation learning can reliably add new actions.

axioms (2)
  • domain assumption: LLM-generated text can be automatically parsed into valid sequence, behavior tree, or state machine structures that execute correctly on ROS.
    Invoked in the description of automatic extraction of behaviors from LLM output.
  • standard math: ROS actions and services provide a stable interface for execution and feedback collection.
    Background assumption of the entire integration layer.
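The first axiom assumes LLM output parses into valid structures, and part of that can be checked mechanically before execution. Below is a sketch for the state-machine mode. It assumes a hypothetical representation of the parsed machine: a set of states, an initial state, a set of terminal states, and a `(state, outcome) -> next_state` transition map.

The check flags three problems:

- undefined states,
- unreachable states,
- states from which no terminal state can be reached.

Passing these checks says nothing about whether the actions themselves will succeed on the robot.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical parsed form of an LLM-generated state machine:
#   transitions[(state, outcome)] = next_state
Transitions = Dict[Tuple[str, str], str]


def check_state_machine(states: Set[str], initial: str, terminals: Set[str],
                        transitions: Transitions) -> List[str]:
    """Return a list of structural problems (an empty list means the checks pass)."""
    problems: List[str] = []
    if initial not in states:
        problems.append(f"initial state {initial!r} is undefined")
    for t in sorted(terminals - states):
        problems.append(f"terminal state {t!r} is undefined")
    for (src, outcome), dst in transitions.items():
        if src not in states or dst not in states:
            problems.append(f"transition {src} --{outcome}--> {dst} uses an undefined state")
    if problems:
        return problems

    succ: Dict[str, Set[str]] = {s: set() for s in states}
    for (src, _), dst in transitions.items():
        succ[src].add(dst)

    # Forward BFS: which states can the machine actually reach from the start?
    reachable = {initial}
    queue = deque([initial])
    while queue:
        for nxt in succ[queue.popleft()]:
            if nxt not in reachable:
                reachable.add(nxt)
                queue.append(nxt)
    for s in sorted(states - reachable):
        problems.append(f"state {s!r} is unreachable")

    # Backward fixpoint: which states can still reach some terminal state?
    can_finish = set(terminals)
    changed = True
    while changed:
        changed = False
        for s in states - can_finish:  # iterates over a fresh set, safe to mutate can_finish
            if succ[s] & can_finish:
                can_finish.add(s)
                changed = True
    for s in sorted(reachable - can_finish):
        problems.append(f"state {s!r} can never reach a terminal state")
    return problems
```

A non-empty result could be sent back to the LLM as feedback instead of being executed, in the same spirit as the framework's reflection step.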

pith-pipeline@v0.9.0 · 5809 in / 1214 out tokens · 20641 ms · 2026-05-23T23:44:25.062908+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Prompt to Physical Action: Structured Backdoor Attacks on LLM-Mediated Robotic Control Systems

    cs.RO 2026-04 unverdicted novelty 6.0

    Backdoor attacks aligned with JSON command formats in LLM robot controllers achieve 83% attack success rate while preserving over 93% clean accuracy and sub-second latency.

  2. ORICF -- Open Robotics Inference and Control Framework

    cs.RO 2026-05 unverdicted novelty 5.0

    ORICF is a declarative, model-agnostic robotics framework with YAML specs and edge offloading that reduces robot compute utilization by up to 83% and energy by 66% in a ROS2 demo combining ASR, LLM, and CNN.
