ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Anbang Zhai; Antoine Grosnit; Cesar Cadena; Christopher E. Mower; Daniel Palenicek; Davide Tateo; Guangjian Tian; Haitham Bou-Ammar; Hongzhan Yu; Jan Peters

arXiv: 2406.19741 · v3 · submitted 2024-06-28 · cs.RO · cs.AI


Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Daniel Palenicek, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar


Pith reviewed 2026-05-23 23:44 UTC · model grok-4.3

classification: cs.RO, cs.AI

keywords: ROSLLM, embodied AI, robot programming, natural language interface, behavior extraction, imitation learning, feedback loops


The pith

A framework integrates LLMs with ROS so non-experts can program robots through natural language chat with automatic behavior extraction and feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ROS-LLM as a system that links large language models to the Robot Operating System, letting users describe tasks in plain language. The framework pulls behaviors from LLM responses, converts them into executable ROS actions or services, and runs them in one of three modes while incorporating imitation learning and feedback loops. Experiments test the approach on long-horizon tasks, tabletop rearrangements, and remote control. A sympathetic reader would care because the work aims to remove the need for expert coding when directing robots. If the mapping step holds, everyday users gain direct control over physical systems through conversation.

Core claim

The authors claim that connecting an AI agent to open-source and commercial LLMs inside ROS enables automatic extraction of behaviors from LLM output, their execution as ROS actions or services, support for sequence, behavior tree, and state machine modes, imitation learning to enlarge the action library, and reflection on human or environment feedback, with experiments confirming the setup handles diverse robotic scenarios.
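The three behavior modes differ mainly in how control flows between primitives. To illustrate the behavior-tree mode, here is a minimal synchronous tree with Sequence and Fallback nodes. This is a sketch under stated assumptions, not ROS-LLM's implementation: the node classes and action names are hypothetical, ROS action/service calls are replaced by plain Python callables, and the RUNNING status that a real tree needs for long-running ROS actions is omitted for brevity.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Action:
    """Leaf node wrapping one primitive (a stand-in for a ROS action or service call)."""

    def __init__(self, name: str, fn: Callable[[], bool]):
        self.name = name
        self.fn = fn

    def tick(self) -> Status:
        return Status.SUCCESS if self.fn() else Status.FAILURE


class Sequence:
    """Ticks children in order and fails as soon as one child fails."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Fallback:
    """Ticks children in order and succeeds as soon as one child succeeds."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            if child.tick() is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


# Usage: the first (hypothetical) grasp strategy fails, so the Fallback tries the second.
log: List[str] = []


def make_action(name: str, succeeds: bool = True) -> Action:
    def fn() -> bool:
        log.append(name)  # record the call order
        return succeeds
    return Action(name, fn)


tree = Sequence([
    make_action("move_to_cup"),
    Fallback([make_action("top_grasp", succeeds=False), make_action("side_grasp")]),
    make_action("place_cup"),
])
result = tree.tick()  # Status.SUCCESS
# log == ["move_to_cup", "top_grasp", "side_grasp", "place_cup"]
```

A plain sequence mode is the degenerate case of this tree: a single Sequence node whose children are all leaves.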

What carries the argument

Automatic extraction of behaviors from LLM output and their direct mapping to executable ROS actions/services, which turns natural language into runnable robot programs across multiple structured modes.
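The load-bearing step is turning free-form LLM text into something executable. Here is a minimal sketch of that step, under two assumptions. First, the LLM is prompted to reply with a fenced JSON block in a hypothetical schema (`{"mode": ..., "actions": [{"name": ..., "args": ...}]}`). Second, the action library is a dict of Python callables standing in for ROS action/service clients. Neither the schema nor the function names come from the paper.

```python
import json
import re
from typing import Any, Callable, Dict, List

# Built indirectly so this snippet can itself live inside a Markdown code fence.
FENCE = "`" * 3
BLOCK_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)

# Hypothetical library type: in ROS these callables would wrap action/service clients.
ActionFn = Callable[..., str]


def extract_behavior(llm_output: str) -> Dict[str, Any]:
    """Return the first fenced JSON block in an LLM reply, parsed into a dict."""
    match = BLOCK_RE.search(llm_output)
    if match is None:
        raise ValueError("no fenced JSON block found in LLM output")
    return json.loads(match.group(1))


def execute_sequence(behavior: Dict[str, Any], library: Dict[str, ActionFn]) -> List[str]:
    """Run each step in order; refuse any action the library does not provide."""
    if behavior.get("mode") != "sequence":
        raise ValueError(f"unsupported mode: {behavior.get('mode')!r}")
    results = []
    for step in behavior["actions"]:
        name = step["name"]
        if name not in library:
            raise KeyError(f"LLM requested unknown action {name!r}")
        results.append(library[name](**step.get("args", {})))
    return results


library: Dict[str, ActionFn] = {
    "pick": lambda obj: f"picked {obj}",
    "place": lambda obj, target: f"placed {obj} on {target}",
}
reply = (
    "Sure, here is the plan:\n" + FENCE + "json\n"
    '{"mode": "sequence", "actions": ['
    '{"name": "pick", "args": {"obj": "cup"}}, '
    '{"name": "place", "args": {"obj": "cup", "target": "tray"}}]}\n'
    + FENCE
)
results = execute_sequence(extract_behavior(reply), library)
# results == ["picked cup", "placed cup on tray"]
```

Rejecting unknown action names before anything runs is the simplest guard against the failure mode flagged under the load-bearing premise: a fluent reply that names a capability the robot does not have.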

If this is right

Non-experts can specify task requirements through a chat interface without writing code.
The same framework supports long-horizon tasks, tabletop rearrangements, and remote supervisory control.
Imitation learning expands the library of available robot actions over time.
LLM reflection improves execution by incorporating feedback from humans and the physical environment.
Open-source release of the code allows others to reproduce results and extend the system.
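The reflection item can be pictured as a bounded retry loop. The system plans and executes, and on failure it appends the human or environment feedback to the context for the next plan. The sketch below is minimal: the LLM and robot are replaced by hypothetical toy functions, and the framework's actual prompt construction and stopping rule may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Plan = List[str]
Planner = Callable[[str, List[str]], Plan]     # (task, feedback so far) -> plan
Executor = Callable[[Plan], Tuple[bool, str]]  # plan -> (succeeded?, feedback text)


@dataclass
class ReflectionResult:
    success: bool
    attempts: int
    feedback_history: List[str] = field(default_factory=list)
    final_plan: Optional[Plan] = None


def run_with_reflection(task: str, planner: Planner, executor: Executor,
                        max_attempts: int = 3) -> ReflectionResult:
    """Re-plan after each failure, passing all feedback collected so far to the planner."""
    history: List[str] = []
    plan: Optional[Plan] = None
    for attempt in range(1, max_attempts + 1):
        plan = planner(task, history)
        ok, feedback = executor(plan)
        if ok:
            return ReflectionResult(True, attempt, history, plan)
        history.append(feedback)
    return ReflectionResult(False, max_attempts, history, plan)


# Toy stand-ins (hypothetical): the "LLM" adds a step once feedback mentions the gripper.
def toy_planner(task: str, history: List[str]) -> Plan:
    plan = ["move_to_cube", "grasp", "lift"]
    if any("gripper closed" in fb for fb in history):
        plan.insert(1, "open_gripper")
    return plan


def toy_executor(plan: Plan) -> Tuple[bool, str]:
    if "open_gripper" not in plan:
        return False, "grasp failed: gripper closed before approach"
    return True, "ok"


res = run_with_reflection("stack the green cube", toy_planner, toy_executor)
# res.success is True and res.attempts == 2
```

The `max_attempts` bound is a design choice worth reporting alongside results. The number of reflection rounds a task needed is one concrete form of the intervention metric the referee asks for.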

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could let non-programmers deploy robots in settings such as homes or small workshops where expert coders are unavailable.
Similar LLM-to-action pipelines might apply to other middleware besides ROS if the extraction step generalizes.
Long-term use could reveal whether repeated feedback loops reduce the rate of mapping errors over successive tasks.

Load-bearing premise

The automatic extraction of behaviors from LLM output can be mapped reliably to executable ROS actions/services without frequent human intervention or failure across varied prompts and environments.

What would settle it

A test set of new prompts and environments in which the system fails to produce correct ROS mappings from LLM outputs in more than a small fraction of trials.
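"More than a small fraction" can be made operational. Treat each trial as pass/fail and compute a Wilson score interval for the mapping-failure rate. Then call the premise refuted only when the interval's lower bound exceeds a tolerance chosen in advance. The 5% tolerance and the failure counts below are assumptions for illustration; neither the paper nor this review fixes a number.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives roughly 95% coverage."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need trials > 0 and 0 <= failures <= trials")
    p_hat = failures / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def exceeds_tolerance(failures: int, trials: int, tolerance: float) -> bool:
    """True only if even the interval's lower bound is above the tolerated failure rate."""
    low, _ = wilson_interval(failures, trials)
    return low > tolerance


# Hypothetical counts: mapping failures observed over 50 unseen prompt/environment pairs.
print(exceeds_tolerance(12, 50, 0.05))  # True: lower bound is about 0.14
print(exceeds_tolerance(2, 50, 0.05))   # False: the data cannot rule out a rate below 5%
```

The Wilson interval is used here rather than the normal approximation because it behaves sensibly at the small failure counts a reliable system should produce.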

Figures

Figures reproduced from arXiv: 2406.19741 by Anbang Zhai, Antoine Grosnit, Cesar Cadena, Christopher E. Mower, Daniel Palenicek, Davide Tateo, Guangjian Tian, Haitham Bou-Ammar, Hongzhan Yu, Jan Peters, Jianye Hao, Jinlong Wang, Jonas Gonzalez-Billandon, Jun Wang, Kun Shao, Marco Hutter, Matthieu Zimmer, Puze Liu, Xingyue Quan, Xinyu Zhang, Yao Zhao, Yuhui Wan, Yuzheng Zhuang.

**Figure 2.** Our proposed ROS-LLM framework overview illustrates the integration of several com…

**Figure 3.** Real-world laboratory setup used in our experiments.

**Figure 4.** Detailed steps in the coffee-making process arranged in a modified Z-shaped flow across…

**Figure 5.** Detailed steps in the coffee-making process are depicted across twelve images: (a) picking…

**Figure 6.** Sequence for the 6-cube task, depicted over ten stages: (a) starting, (b) picking up a green…

**Figure 7.** Results depicting policy correction through human feedback, where orange indicates task…

**Figure 8.** Steps in the pasta-making process, depicted in five stages: (a) grating cheese, (b) pouring…

**Figure 9.** Remote supervisory control using (a) language interfaces, depicted through continuous…

**Figure 10.** Experiment setup for the human study.

**Figure 11.** Weighted NASA TLX results for remote supervisory control.

**Figure 7 (surrounding text).** These results are encouraging for the deployment of this framework in scenarios requiring complex sequential task execution, highlighting its potential reliability and effectiveness in practical applications. Section 5.2, Enhancing policy correction via human feedback: Targeted human feedback has shown the potential to mitigate this degradation by correcting erroneous policy decisions dynamically. We noticed that d…

**Figure 12.** The setup for the Robot Air Hockey Challenge with two KUKA IIWA robot arms.

Original abstract

We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.
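Of the three behavior modes, the state machine makes retries and error branches explicit. The sketch below assumes a hypothetical spec format in which each state runs one named action and branches on its boolean outcome. State names, action names, and the `max_steps` guard are illustrative, and ROS calls are replaced by plain callables.

```python
from typing import Callable, Dict, List

TERMINAL = {"DONE", "FAILED"}

# Hypothetical spec format: each state names one action plus the next state on success/failure.
Spec = Dict[str, Dict[str, str]]


def run_state_machine(spec: Spec, start: str, library: Dict[str, Callable[[], bool]],
                      max_steps: int = 50) -> List[str]:
    """Execute states until a terminal state is reached; return the visited-state trace."""
    trace = [start]
    state = start
    while state not in TERMINAL:
        if len(trace) > max_steps:
            raise RuntimeError("state machine did not terminate within max_steps")
        node = spec[state]
        ok = library[node["action"]]()
        state = node["on_success"] if ok else node["on_failure"]
        trace.append(state)
    return trace


# Usage: grasping fails once, the failure edge loops back to retry, then placing succeeds.
attempts = {"grasp": 0}


def grasp() -> bool:
    attempts["grasp"] += 1
    return attempts["grasp"] >= 2  # fails on the first call, succeeds on the second


library = {"grasp": grasp, "place": lambda: True}
spec: Spec = {
    "GRASP_MUG": {"action": "grasp", "on_success": "PLACE_MUG", "on_failure": "GRASP_MUG"},
    "PLACE_MUG": {"action": "place", "on_success": "DONE", "on_failure": "FAILED"},
}
trace = run_state_machine(spec, "GRASP_MUG", library)
# trace == ["GRASP_MUG", "GRASP_MUG", "PLACE_MUG", "DONE"]
```

The step bound matters when the structure comes from an LLM. A generated machine can contain a cycle with no exit, and bounding execution turns that into an explicit error instead of a robot stuck in a retry loop.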

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

ROS-LLM packages LLM-robotics ideas into an open ROS framework with three behavior modes plus reflection, but the experiments supply no metrics on extraction reliability or failure rates.


The main thing to know is that this is an engineering integration paper rather than a new method. It wires LLMs to ROS so users can describe tasks in chat, then automatically pulls out one of three behavior formats (sequence, behavior tree, state machine), runs the corresponding ROS actions or services, supports imitation learning to grow the action set, and adds an LLM reflection step that takes human or environment feedback. The code is open-sourced, which is the clearest positive here; anyone working in ROS can download it and see exactly how the pieces connect.
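The letter mentions imitation learning as the way to grow the action set. This page does not describe the paper's learning method. The sketch below therefore substitutes the simplest possible stand-in: record-and-replay with linear interpolation between demonstrated waypoints. Its only purpose is to show how a demonstration can become a named, callable entry in an action library. Function names and the demonstration are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Sequence, Tuple

Waypoint = Tuple[float, List[float]]  # (time in seconds, joint positions)


def make_replay_action(demo: Sequence[Waypoint]) -> Callable[[float], List[float]]:
    """Turn a recorded demonstration into a function returning the interpolated pose at time t."""
    times = [t for t, _ in demo]
    if len(times) < 2 or any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("need at least two waypoints with strictly increasing timestamps")

    def pose_at(t: float) -> List[float]:
        if t <= times[0]:
            return list(demo[0][1])
        if t >= times[-1]:
            return list(demo[-1][1])
        i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
        (t0, q0), (t1, q1) = demo[i - 1], demo[i]
        alpha = (t - t0) / (t1 - t0)
        return [a + alpha * (b - a) for a, b in zip(q0, q1)]

    return pose_at


def add_demonstrated_action(library: Dict[str, Callable[[float], List[float]]],
                            name: str, demo: Sequence[Waypoint]) -> None:
    """Register the demonstrated skill under a new name so a planner can refer to it."""
    if name in library:
        raise KeyError(f"action {name!r} already exists")
    library[name] = make_replay_action(demo)


library: Dict[str, Callable[[float], List[float]]] = {}
add_demonstrated_action(library, "pour", [(0.0, [0.0, 0.0]), (1.0, [0.5, 1.0]), (2.0, [1.0, 1.0])])
mid_pose = library["pour"](0.5)  # [0.25, 0.5]
```

A real imitation-learning method would generalize beyond the single recorded trajectory. The registration step is the part that carries over: once a skill has a name in the library, the LLM can be told about it and use it in later plans.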

Referee Report

2 major / 0 minor

Summary. The manuscript presents ROS-LLM, a framework integrating large language models with ROS to enable non-experts to program robots via natural language chat. It features an AI agent supporting multiple LLMs, automatic extraction of LLM outputs into one of three behavior representations (sequence, behavior tree, state machine) for mapping to ROS actions/services, imitation learning to extend the action library, and LLM reflection using human/environment feedback. The authors claim extensive experiments demonstrate robustness, scalability, and versatility across long-horizon tasks, tabletop rearrangements, and remote supervisory control, with open-source code provided for reproduction.

Significance. If the automatic extraction and execution pipeline proves reliable with minimal intervention, the framework could lower barriers to embodied AI deployment by combining structured reasoning modes with ROS primitives. The open-source release is a clear strength for reproducibility. However, without quantitative validation the practical significance remains difficult to assess against prior ROS-LLM integrations.

major comments (2)

[Abstract] The claim that 'extensive experiments validate the framework, showcasing robustness, scalability, and versatility' is unsupported by any reported metrics (success rates, parsing failure counts, human intervention frequency, or baselines). This directly affects the central assertion that behaviors automatically extracted from LLM output can be mapped reliably to executable ROS actions/services without frequent human intervention.
[Key features / Experimental validation] The description of the extraction and reflection mechanism (key features paragraph) asserts reliable mapping across prompt variations and environments, yet no quantitative evidence (e.g., extraction accuracy, retry counts, or task completion rates) is supplied for the long-horizon or tabletop scenarios. This leaves the 'intuitive for non-experts' requirement unverified.
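Both major comments ask for the same kind of evidence. The sketch below shows the per-scenario summary a revision could report. It assumes each trial is logged with a success flag, a count of LLM replies that failed to parse, and a count of human interventions. The field names are hypothetical, and the sample records are placeholders, not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Trial:
    scenario: str        # e.g. "tabletop" or "long_horizon"
    success: bool        # did the task complete?
    parse_failures: int  # LLM replies that could not be turned into a behavior
    interventions: int   # times a human had to step in


def summarize(trials: Iterable[Trial]) -> Dict[str, Dict[str, float]]:
    """Per-scenario success rate plus mean parse failures and interventions per trial."""
    grouped: Dict[str, List[Trial]] = defaultdict(list)
    for trial in trials:
        grouped[trial.scenario].append(trial)
    summary: Dict[str, Dict[str, float]] = {}
    for scenario, group in grouped.items():
        n = len(group)
        summary[scenario] = {
            "trials": float(n),
            "success_rate": sum(t.success for t in group) / n,
            "parse_failures_per_trial": sum(t.parse_failures for t in group) / n,
            "interventions_per_trial": sum(t.interventions for t in group) / n,
        }
    return summary


# Placeholder records for illustration only; these are not results from the paper.
logs = [
    Trial("tabletop", True, 0, 0),
    Trial("tabletop", False, 2, 1),
    Trial("long_horizon", True, 1, 0),
    Trial("long_horizon", True, 0, 1),
]
summary = summarize(logs)
# summary["tabletop"]["success_rate"] == 0.5
```

Reporting parse failures separately from task failures distinguishes two different problems. One is the extraction step breaking; the other is a well-formed plan failing on the robot.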

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and agree that revisions are required to ensure claims are supported by the presented material.


Referee: [Abstract] The claim that 'extensive experiments validate the framework, showcasing robustness, scalability, and versatility' is unsupported by any reported metrics (success rates, parsing failure counts, human intervention frequency, or baselines). This directly affects the central assertion that behaviors automatically extracted from LLM output can be mapped reliably to executable ROS actions/services without frequent human intervention.

Authors: We agree that the abstract claim is not supported by quantitative metrics, as the manuscript presents only qualitative demonstrations of the framework in long-horizon tasks, tabletop rearrangements, and remote control scenarios. We will revise the abstract to describe these as illustrative examples of the framework's capabilities rather than claiming validation of robustness, scalability, or versatility through metrics. The central assertion about reliable mapping will also be qualified to reflect the absence of such data. revision: yes
Referee: [Key features / Experimental validation] The description of the extraction and reflection mechanism (key features paragraph) asserts reliable mapping across prompt variations and environments, yet no quantitative evidence (e.g., extraction accuracy, retry counts, or task completion rates) is supplied for the long-horizon or tabletop scenarios. This leaves the 'intuitive for non-experts' requirement unverified.

Authors: We acknowledge that assertions of reliable mapping and intuitiveness for non-experts in the key features and experimental sections lack supporting quantitative evidence such as accuracy rates or intervention counts. The current text relies on descriptive examples. We will revise these sections to remove or qualify claims of reliability and to clarify that the non-expert usability is a design goal illustrated by the chat interface and behavior modes, without empirical verification in the manuscript. A limitations discussion will be added if appropriate. revision: yes

Circularity Check

0 steps flagged

No circularity: engineering integration validated externally


The paper describes a software framework integrating ROS with LLMs for robot task execution via natural language. No mathematical derivations, fitted parameters, or equations are present. Validation relies on external robot performance in experiments (long-horizon tasks, rearrangements, remote control), not on self-referential definitions or self-citation chains. The extraction and mapping steps are implementation details whose reliability is claimed to be measured by system behavior, not reduced to inputs by construction. This matches the default non-circular case for systems papers.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework depends on the assumption that LLMs produce parsable structured outputs that map cleanly to ROS primitives and that imitation learning can reliably add new actions.

axioms (2)

domain assumption: LLM-generated text can be automatically parsed into valid sequence, behavior tree, or state machine structures that execute correctly on ROS.
Invoked in the description of automatic extraction of behaviors from LLM output.
standard math: ROS actions and services provide a stable interface for execution and feedback collection.
Background assumption of the entire integration layer.
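The first axiom is where extraction can fail silently, because text that parses as JSON may still not be a valid behavior. The sketch below is a pre-execution check for a behavior tree, assuming a hypothetical node schema in which `type` is one of sequence, fallback, or action. It collects every problem rather than stopping at the first, so all errors can be fed back to the LLM in one reflection round.

```python
from typing import Any, List, Set

COMPOSITES = {"sequence", "fallback"}


def validate_tree(node: Any, actions: Set[str], path: str = "root") -> List[str]:
    """Return every structural problem in a parsed behavior-tree dict (empty list = valid)."""
    if not isinstance(node, dict):
        return [f"{path}: expected an object, got {type(node).__name__}"]
    kind = node.get("type")
    if kind == "action":
        name = node.get("name")
        return [] if name in actions else [f"{path}: unknown action {name!r}"]
    if kind in COMPOSITES:
        children = node.get("children")
        if not isinstance(children, list) or not children:
            return [f"{path}: {kind} needs a non-empty 'children' list"]
        errors: List[str] = []
        for i, child in enumerate(children):
            errors.extend(validate_tree(child, actions, f"{path}.children[{i}]"))
        return errors
    return [f"{path}: unknown node type {kind!r}"]


tree = {"type": "sequence", "children": [
    {"type": "action", "name": "pick"},
    {"type": "fallback", "children": [{"type": "action", "name": "teleport"}]},
]}
errors = validate_tree(tree, {"pick", "place"})
# errors == ["root.children[1].children[0]: unknown action 'teleport'"]
```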



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Prompt to Physical Action: Structured Backdoor Attacks on LLM-Mediated Robotic Control Systems
cs.RO 2026-04 unverdicted novelty 6.0

Backdoor attacks aligned with JSON command formats in LLM robot controllers achieve 83% attack success rate while preserving over 93% clean accuracy and sub-second latency.
ORICF -- Open Robotics Inference and Control Framework
cs.RO 2026-05 unverdicted novelty 5.0

ORICF is a declarative, model-agnostic robotics framework with YAML specs and edge offloading that reduces robot compute utilization by up to 83% and energy by 66% in a ROS2 demo combining ASR, LLM, and CNN.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · cited by 2 Pith papers · 1 internal anchor

[1]

Barto and Sridhar Mahadevan

Andrew G. Barto and Sridhar Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4):341–379, Oct 2003

work page 2003
[2]

Trac-ik: An open-source library for improved solving of generic inverse kinematics

Patrick Beeson and Barrett Ames. Trac-ik: An open-source library for improved solving of generic inverse kinematics. In 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), pages 928–935, 2015

work page 2015
[3]

Rt-h: Action hierarchies using language, 2024

Suneel Belkhale, Tianli Ding, Ted Xiao, Pierre Sermanet, Quon Vuong, Jonathan Tompson, Yevgen Chebotar, Debidatta Dwibedi, and Dorsa Sadigh. Rt-h: Action hierarchies using language, 2024

work page 2024
[4]

Calin Belta, Antonio Bicchi, Magnus Egerstedt, Emilio Frazzoli, Eric Klavins, and George J. Pappas. Symbolic planning and control of robot motion [grand challenges of robotics]. IEEE Robotics & Automation Magazine, 14(1):61–70, 2007

work page 2007
[5]

Bruyninckx

H. Bruyninckx. Open robot control software: the orocos project. In Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164) , volume 3, pages 2523–2528 vol.3, 2001

work page 2001
[6]

Yue Cao and C. S. George Lee. Robot behavior-tree-based task generation with large language models, 2023

work page 2023
[7]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

work page 2021
[8]

ros_control: A generic and simple control framework for ros

Sachin Chitta, Eitan Marder-Eppstein, Wim Meeussen, Vijay Pradeep, Adolfo Rodríguez Tsouroukdissian, Jonathan Bohren, David Coleman, Bence Magyar, Gennaro Raiola, Mathias Lüdtke, and Enrique Fernandez Perdomo. ros_control: A generic and simple control framework for ros. Journal of Open Source Software, 2(20):456, 2017

work page 2017
[9]

Reducing the Barrier to Entry of Complex Robotic Software: a MoveIt! Case Study

David Coleman, Ioan Sucan, Sachin Chitta, and Nikolaus Correll. Reducing the barrier to entry of complex robotic software: a moveit! case study. arXiv preprint arXiv:1404.3785, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[10]

Everett and Alexander H

John G. Everett and Alexander H. Slocum. Automation and robotics opportunities: Construction versus manufacturing. Journal of Construction Engineering and Management , 120(2):443–452, 1994

work page 1994
[11]

tf: The transform library

Tully Foote. tf: The transform library. In 2013 IEEE Conference on Technologies for Practical Robot Applications (TePRA), pages 1–6, 2013

work page 2013
[12]

Mathematical capabilities of chatgpt, 2023

Simon Frieder, Luca Pinchetti, Alexis Chevalier, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Christian Petersen, and Julius Berner. Mathematical capabilities of chatgpt, 2023

work page 2023
[13]

Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y . Wu, Y . K. Li, Fuli Luo, Yingfei Xiong, and Wenfeng Liang. Deepseek-coder: When the large language model meets programming – the rise of code intelligence, 2024

work page 2024
[14]

Interpret: Interactive predicate learning from language feedback for generalizable task planning, 2024

Muzhi Han, Yifeng Zhu, Song-Chun Zhu, Ying Nian Wu, and Yuke Zhu. Interpret: Interactive predicate learning from language feedback for generalizable task planning, 2024

work page 2024
[15]

NASA task load index (TLX)

Sandra G Hart. NASA task load index (TLX). 1986

work page 1986
[16]

Huang, Edwin Olson, and David C

Albert S. Huang, Edwin Olson, and David C. Moore. Lcm: Lightweight communications and marshalling. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems , pages 4057–4062, 2010

work page 2010
[17]

Language models as zero-shot planners: Extracting actionable knowledge for embodied agents, 2022

Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents, 2022

work page 2022
[18]

Grounded decoding: Guiding text generation with grounded models for embodied agents, 2023

Wenlong Huang, Fei Xia, Dhruv Shah, Danny Driess, Andy Zeng, Yao Lu, Pete Florence, Igor Mordatch, Sergey Levine, Karol Hausman, and Brian Ichter. Grounded decoding: Guiding text generation with grounded models for embodied agents, 2023

work page 2023
[19]

Mower, Sebastien Ourselin, Tom Vercauteren, and Christos Bergeles

Martin Huber, Christopher E. Mower, Sebastien Ourselin, Tom Vercauteren, and Christos Bergeles. Lbr-stack: Ros 2 and python integration of kuka fri for med and iiwa robots, 2024

work page 2024
[20]

Multimodal detection and classification of robot manipulation failures

Arda Inceoglu, Eren Erdal Aksoy, and Sanem Sariel. Multimodal detection and classification of robot manipulation failures. IEEE Robotics and Automation Letters , 9(2):1396–1403, 2024

work page 2024
[21]

A survey of behavior trees in robotics and ai

Matteo Iovino, Edvards Scukins, Jonathan Styrud, Petter Ögren, and Christian Smith. A survey of behavior trees in robotics and ai. Robotics and Autonomous Systems , 154:104096, 2022

work page 2022
[22]

Btgenbot: Behavior tree generation for robotic tasks with lightweight llms, 2024

Riccardo Andrea Izzo, Gianluca Bardaro, and Matteo Matteucci. Btgenbot: Behavior tree generation for robotic tasks with lightweight llms, 2024

work page 2024
[23]

Vima: General robot manipulation with multimodal prompts, 2023

Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, and Linxi Fan. Vima: General robot manipulation with multimodal prompts, 2023

work page 2023
[24]

HA- GRID: A human-llm collaborative dataset for generative information-seeking with attribution

Ehsan Kamalloo, Aref Jafari, Xinyu Zhang, Nandan Thakur, and Jimmy Lin. HA- GRID: A human-llm collaborative dataset for generative information-seeking with attribution. arXiv:2307.16883, 2023

work page arXiv 2023
[25]

Understanding large-language model (llm)- powered human-robot interaction

Callie Y Kim, Christine P Lee, and Bilge Mutlu. Understanding large-language model (llm)- powered human-robot interaction. In Proceedings of the 2024 ACM/IEEE International Confer- ence on Human-Robot Interaction , pages 371–380, 2024. 23

work page 2024
[26]

Design and use paradigms for gazebo, an open-source multi-robot simulator

Nathan Koenig and Andrew Howard. Design and use paradigms for gazebo, an open-source multi-robot simulator. In IEEE/RSJ International Conference on Intelligent Robots and Systems , pages 2149–2154, Sendai, Japan, Sep 2004

work page 2004
[27]

Language models as zero-shot trajectory generators, 2023

Teyun Kwon, Norman Di Palo, and Edward Johns. Language models as zero-shot trajectory generators, 2023

work page 2023
[28]

Chain of code: Reasoning with a language model-augmented code emulator, 2023

Chengshu Li, Jacky Liang, Andy Zeng, Xinyun Chen, Karol Hausman, Dorsa Sadigh, Sergey Levine, Li Fei-Fei, Fei Xia, and Brian Ichter. Chain of code: Reasoning with a language model-augmented code emulator, 2023

work page 2023
[29]

Clmasp: Coupling large language models with answer set programming for robotic task planning, 2024

Xinrui Lin, Yangfan Wu, Huanyu Yang, Yu Zhang, Yanyong Zhang, and Jianmin Ji. Clmasp: Coupling large language models with answer set programming for robotic task planning, 2024

work page 2024
[30]

Interactive robot learning from verbal correction, 2023

Huihan Liu, Alice Chen, Yuke Zhu, Adith Swaminathan, Andrey Kolobov, and Ching-An Cheng. Interactive robot learning from verbal correction, 2023

work page 2023
[31]

Llm-brain: Ai-driven fast generation of robot behaviour tree based on large language model, 2023

Artem Lykov and Dzmitry Tsetserukou. Llm-brain: Ai-driven fast generation of robot behaviour tree based on large language model, 2023

work page 2023
[32]

Robot op- erating system 2: Design, architecture, and uses in the wild

Steven Macenski, Tully Foote, Brian Gerkey, Chris Lalancette, and William Woodall. Robot op- erating system 2: Design, architecture, and uses in the wild. Science Robotics, 7(66):eabm6074, 2022

work page 2022
[33]

Natural language as policies: Reasoning for coordinate-level embodied control with llms, 2024

Yusuke Mikami, Andrew Melnik, Jun Miura, and Ville Hautamäki. Natural language as policies: Reasoning for coordinate-level embodied control with llms, 2024

work page 2024
[34]

Ros-pybullet interface: A framework for reliable contact simulation and human-robot interaction

Christopher Mower, Theodoros Stouraitis, Joao Moura, Christian Rauch, Lei Yan, Nazanin Za- mani Behabadi, Michael Gienger, Tom Vercauteren, Christos Bergeles, and Sethu Vijayakumar. Ros-pybullet interface: A framework for reliable contact simulation and human-robot interaction. In Conference on Robot Learning, pages 1411–1423. PMLR, 2023

work page 2023
[35]

Skill-based Shared Control

Christopher E Mower, Joao Moura, and Sethu Vijayakumar. Skill-based Shared Control. In Proceedings of Robotics: Science and Systems , Virtual, July 2021

work page 2021
[36]

Mower, João Moura, Nazanin Zamani Behabadi, Sethu Vijayakumar, Tom Vercauteren, and Christos Bergeles

Christopher E. Mower, João Moura, Nazanin Zamani Behabadi, Sethu Vijayakumar, Tom Vercauteren, and Christos Bergeles. Optas: An optimization-based task specification library for trajectory optimization and model predictive control. In 2023 IEEE International Conference on Robotics and Automation (ICRA) , pages 9118–9124, 2023

work page 2023
[37]

Embodiedgpt: Vision-language pre-training via embodied chain of thought, 2023

Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, and Ping Luo. Embodiedgpt: Vision-language pre-training via embodied chain of thought, 2023

work page 2023
[38]

Apriltag: A robust and flexible visual fiducial system

Edwin Olson. Apriltag: A robust and flexible visual fiducial system. In2011 IEEE International Conference on Robotics and Automation , pages 3400–3407, 2011

work page 2011
[39]

OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Flo- rencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Bern...

work page 2024
[40]

Murillo, and Mac Schwager

Pablo Pueyo, Eduardo Montijano, Ana C. Murillo, and Mac Schwager. Clipswarm: Generating drone shows from text prompts with vision-language models, 2024

work page 2024
[41]

Ros: an open-source robot operating system

Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, Andrew Y Ng, et al. Ros: an open-source robot operating system. In ICRA workshop on open source software, volume 3, page 5. Kobe, Japan, 2009

work page 2009
[42]

Robust speech recognition via large-scale weak supervision

Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. Robust speech recognition via large-scale weak supervision. In International Conference on Machine Learning, pages 28492–28518. PMLR, 2023

work page 2023
[43]

Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, and Chelsea Finn

Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, and Chelsea Finn. Yell at your robot: Improving on-the-fly from language corrections, 2024

work page 2024
[44]

Tenenbaum, Leslie Pack Kaelbling, and Michael Katz

Tom Silver, Soham Dan, Kavitha Srinivas, Joshua B. Tenenbaum, Leslie Pack Kaelbling, and Michael Katz. Generalized planning in pddl domains with pretrained large language models, 2023

work page 2023
[45]

Progprompt: Generating situated robot task plans using large language models, 2022

Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, and Animesh Garg. Progprompt: Generating situated robot task plans using large language models, 2022

work page 2022
[46]

Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, Perry Payne, Martin Senevi- ratne, Paul Gamble, Chris Kelly, Abubakr Babiker, Nathanael Schärli, Aakanksha Chowdhery, Philip Mansfield, Dina Demner-Fushman, Blaise Agüera y Arcas, Dale Webster, Greg S. Corrad...

work page 2023
[47]

R. Smits. KDL: Kinematics and Dynamics Library. http://www.orocos.org/kdl

work page
[48]

Llm-planner: Few-shot grounded planning for embodied agents with large language models

Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M Sadler, Wei-Lun Chao, and Yu Su. Llm-planner: Few-shot grounded planning for embodied agents with large language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 2998–3009, 2023. 25

work page 2023
[49]

To help or not to help: Llm-based attentive support for human-robot group interactions, 2024

Daniel Tanneberg, Felix Ocker, Stephan Hasler, Joerg Deigmoeller, Anna Belardinelli, Chao Wang, Heiko Wersing, Bernhard Sendhoff, and Michael Gienger. To help or not to help: Llm-based attentive support for human-robot group interactions, 2024

work page 2024
[50]

Llama: Open and efficient foundation language models, 2023

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timo- thée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models, 2023

work page 2023
[51]

Trinh, Yuhuai Wu, Quoc V

Trieu H. Trinh, Yuhuai Wu, Quoc V . Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations. Nature, 625(7995):476–482, Jan 2024

work page 2024
[52]

Why can large language models generate correct chain-of-thoughts? 2023

Rasul Tutunov, Antoine Grosnit, Juliusz Ziomek, Jun Wang, and Haitham Bou-Ammar. Why can large language models generate correct chain-of-thoughts? 2023

work page 2023
[53]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017

work page 2017
[54]

Performance and usability evaluation scheme for mobile manipulator teleopera- tion

Yuhui Wan, Jingcheng Sun, Christopher Peers, Joseph Humphreys, Dimitrios Kanoulas, and Chengxu Zhou. Performance and usability evaluation scheme for mobile manipulator teleopera- tion. IEEE Transactions on Human-Machine Systems, 2023

work page 2023
[55]

Llm granularity for on-the-fly robot control, 2024

Peng Wang, Mattia Robbiani, and Zhihao Guo. Llm granularity for on-the-fly robot control, 2024

work page 2024
[56]

Chi, Quoc V

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V . Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. 2022

work page 2022
[57]

Large language models for verifiable sequential decision-making in autonomous systems

Yunhao Yang, Jean-Raphael Gaglione, Cyrus Neary, et al. Large language models for verifiable sequential decision-making in autonomous systems. In 2nd Workshop on Language and Robot Learning: Language as Grounding , 2023

work page 2023
[58]

Takuma Yoneda, Jiading Fang, Peng Li, Huanyu Zhang, Tianchong Jiang, Shengjie Lin, Ben Picker, David Yunis, Hongyuan Mei, and Matthew R. Walter. Statler: State-maintaining language models for embodied reasoning, 2023

work page 2023
[59]

Socratic models: Composing zero-shot multimodal reasoning with language, 2022

Andy Zeng, Maria Attarian, Brian Ichter, Krzysztof Choromanski, Adrian Wong, Stefan Welker, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, and Pete Florence. Socratic models: Composing zero-shot multimodal reasoning with language, 2022

