pith. machine review for the scientific record.

arxiv: 2209.11302 · v1 · submitted 2022-09-22 · 💻 cs.RO · cs.AI · cs.CL · cs.LG

Recognition: 1 theorem link · Lean Theorem

ProgPrompt: Generating Situated Robot Task Plans using Large Language Models

Authors on Pith · no claims yet

Pith reviewed 2026-05-17 04:17 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.CL · cs.LG

keywords robot task planning · large language models · programmatic prompts · situated environments · VirtualHome · tabletop tasks · executable plans

The pith

Structuring LLM prompts as executable programs lets robots generate valid task plans across different environments, robot capabilities, and tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that large language models can produce robot task plans that stay feasible in the robot's actual situation when the prompt is written in a program-like style. Instead of generating free-form text or scoring every possible next step, the model receives a prompt that lists the available actions and objects as code definitions and supplies a few example programs the robot could run. A sympathetic reader would care because defining all the domain rules for each new robot or setting is normally laborious, and this structure lets the model draw on its broad knowledge while staying grounded in what the robot can actually do. The approach reaches strong results on household tasks in simulation and transfers directly to a physical robot arm.

Core claim

ProgPrompt is a prompt structure that supplies the LLM with program-like specifications of the available actions and objects in the current environment together with example executable programs; this format produces plans that remain functional across different situated environments, robot capabilities, and tasks without generating actions impossible in the robot's context.

What carries the argument

The ProgPrompt structure: a prompt containing code-style definitions of actions and objects plus example programs that the robot can actually execute.
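
To make that shape concrete, here is a minimal sketch of how such a prompt can be assembled. The action names, object list, and example task are hypothetical stand-ins rather than the paper's exact prompt; what matters is the structure: import-style action declarations, an object list, a fully worked example program, and an unfinished function header for the LLM to complete.

```python
# Minimal sketch of a ProgPrompt-style prompt (all names hypothetical).
AVAILABLE_ACTIONS = ["grab(obj)", "putin(obj, container)", "walk(target)", "switchon(obj)"]
AVAILABLE_OBJECTS = ["salmon", "microwave", "fridge", "kitchen"]

EXAMPLE_PROGRAM = """\
def microwave_salmon():
    # 1: walk to the kitchen
    walk('kitchen')
    # 2: grab the salmon
    grab('salmon')
    # 3: put the salmon in the microwave
    putin('salmon', 'microwave')
    # 4: start the microwave
    switchon('microwave')
"""

def build_prompt(task_name: str) -> str:
    """Assemble environment spec + example program + new task header."""
    imports = "from actions import " + ", ".join(a.split("(")[0] for a in AVAILABLE_ACTIONS)
    objects = "objects = " + repr(AVAILABLE_OBJECTS)
    header = f"def {task_name}():"  # the LLM's completion of this body is the plan
    return "\n\n".join([imports, objects, EXAMPLE_PROGRAM, header])

print(build_prompt("put_salmon_in_fridge"))
```

Because the completion is itself a program over the declared primitives, moving to a new environment or robot only means swapping the two lists; the template stays fixed.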

If this is right

  • Plans are produced without needing to enumerate every possible next action for scoring.
  • The same prompt template works for robots with different capabilities and in different environments.
  • State-of-the-art success rates are achieved on VirtualHome household tasks.
  • The method transfers to physical robot arms performing tabletop tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same style of prompt could be tried for planning problems outside robotics, such as software task automation.
  • Varying the number or complexity of example programs in the prompt might improve reliability on longer-horizon tasks.
  • If the LLM occasionally still suggests invalid steps, a lightweight execution check could filter them without losing the benefits of the prompt structure; a sketch of one such filter follows this list.
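
One concrete form that check could take is sketched below. This is not part of the paper's method, which deliberately omits a verifier; the step format (one call string per line) and all names here are assumptions.

```python
import re

# Hypothetical post-generation validity filter: keep only plan steps whose
# action and arguments were declared available in the prompt. Assumes each
# step is a call string like "grab('salmon')".
STEP = re.compile(r"^(\w+)\((.*)\)$")

def filter_plan(plan_lines, available_actions, available_objects):
    """Split a generated plan into (valid, rejected) steps."""
    valid, rejected = [], []
    for line in plan_lines:
        m = STEP.match(line.strip())
        if m is None:
            rejected.append(line)  # not a recognizable action call
            continue
        action, raw_args = m.groups()
        args = [a.strip().strip("'\"") for a in raw_args.split(",") if a.strip()]
        if action in available_actions and all(a in available_objects for a in args):
            valid.append(line)
        else:
            rejected.append(line)  # unknown action or undeclared object
    return valid, rejected

plan = ["grab('salmon')", "putin('salmon', 'microwave')", "grab('truffle')"]
valid, rejected = filter_plan(plan, {"grab", "putin", "walk"}, {"salmon", "microwave"})
print(valid)     # ["grab('salmon')", "putin('salmon', 'microwave')"]
print(rejected)  # ["grab('truffle')"]
```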

Load-bearing premise

That giving the LLM program-like lists of actions and objects plus example programs will stop it from outputting actions impossible in the robot's present context.

What would settle it

Run the prompt in a new scene where the task invites use of an object the prompt declares unavailable, and check whether the generated plan ever references that object or an unavailable action. A minimal version of that probe is sketched below.
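
In this sketch, generate_plan is a stub standing in for the real LLM call, not an API from the paper, and it assumes plans come back as one action call per line.

```python
# Sketch of the settling experiment: declare an object unavailable in the
# prompt, sample several plans, and measure how often any step still
# references it. A persistently nonzero rate falsifies the premise.

def generate_plan(prompt: str) -> list[str]:
    """Stub for an LLM completion; a real version would submit the
    ProgPrompt prompt and parse the returned program into call strings."""
    return ["walk('kitchen')", "grab('salmon')", "putin('salmon', 'fridge')"]

def violation_rate(prompt: str, unavailable: str, n_samples: int = 20) -> float:
    hits = sum(
        any(unavailable in step for step in generate_plan(prompt))
        for _ in range(n_samples)
    )
    return hits / n_samples

rate = violation_rate("<ProgPrompt prompt with 'microwave' removed>", "microwave")
print(f"plans referencing the unavailable object: {rate:.0%}")
```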

original abstract

Task planning can require defining myriad domain knowledge about the world in which a robot needs to act. To ameliorate that effort, large language models (LLMs) can be used to score potential next actions during task planning, and even generate action sequences directly, given an instruction in natural language with no additional domain information. However, such methods either require enumerating all possible next steps for scoring, or generate free-form text that may contain actions not possible on a given robot in its current context. We present a programmatic LLM prompt structure that enables plan generation functional across situated environments, robot capabilities, and tasks. Our key insight is to prompt the LLM with program-like specifications of the available actions and objects in an environment, as well as with example programs that can be executed. We make concrete recommendations about prompt structure and generation constraints through ablation experiments, demonstrate state of the art success rates in VirtualHome household tasks, and deploy our method on a physical robot arm for tabletop tasks. Website at progprompt.github.io

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims that a programmatic LLM prompt structure—specifying available actions and objects in program-like form along with executable example programs—enables generation of situated robot task plans that remain functional across environments, robot capabilities, and tasks. It reports ablation experiments on prompt structure and generation constraints, state-of-the-art success rates on VirtualHome household tasks, and successful physical deployment on a robot arm for tabletop tasks.

Significance. If the empirical results hold, the work offers a practical reduction in domain-knowledge engineering for robot planning by leveraging LLMs through structured prompts rather than free-form generation or exhaustive enumeration of next steps.

major comments (1)
  1. [Abstract; Method description of prompt structure] The central claim that the prompt structure reliably produces only contextually feasible plans rests on the LLM's implicit adherence to the supplied action/object specifications and examples. The abstract notes ablation experiments on prompt structure and generation constraints, yet the manuscript provides no description of an explicit runtime validity filter, precondition checker, or post-generation verification step; this leaves open the possibility that stochastic outputs can still include syntactically valid but semantically impossible actions (e.g., referencing absent objects or unmet preconditions) in novel settings or over longer horizons.
minor comments (2)
  1. [Experiments section] In the VirtualHome results, clarify the precise success-rate numbers, number of trials, and exact baseline methods used for the SOTA comparison.
  2. [Ablation experiments] Specify the exact generation constraints (e.g., temperature, sampling method, or output formatting rules) applied during LLM inference.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the practical value of structured prompting for reducing domain-knowledge engineering in robot task planning. We address the major comment below and will incorporate clarifications into the revised manuscript.

point-by-point responses
  1. Referee: [Abstract; Method description of prompt structure] The central claim that the prompt structure reliably produces only contextually feasible plans rests on the LLM's implicit adherence to the supplied action/object specifications and examples. The abstract notes ablation experiments on prompt structure and generation constraints, yet the manuscript provides no description of an explicit runtime validity filter, precondition checker, or post-generation verification step; this leaves open the possibility that stochastic outputs can still include syntactically valid but semantically impossible actions (e.g., referencing absent objects or unmet preconditions) in novel settings or over longer horizons.

    Authors: We thank the referee for this observation. Our method is deliberately designed without an explicit runtime validity filter, precondition checker, or post-generation verification step; the core idea is to leverage the LLM through a structured prompt that supplies the current environment's available actions and objects in program-like form together with executable example programs. This prompt is dynamically instantiated for each situated context, so the LLM is instructed to generate plans using only the listed primitives. Ablation experiments (reported in the manuscript) confirm that removing the action/object specifications or the examples substantially degrades success rates, supporting the value of this structure. While the stochastic nature of LLMs means invalid outputs remain theoretically possible, our VirtualHome results and physical robot deployments demonstrate that such cases are infrequent when the recommended prompt structure and generation constraints are used. We will revise the manuscript to (1) explicitly state the absence of an external verifier, (2) elaborate on how the prompt construction process enforces contextual adherence, and (3) add a short discussion of observed failure modes and behavior on longer horizons. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical validation on external benchmarks

full rationale

The paper presents a prompting method for LLMs to generate situated robot task plans and supports its claims through ablation experiments, success rates on the VirtualHome benchmark, and physical robot deployment. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the argument. The central technique relies on explicit prompt structure (action/object specs plus executable examples) whose effectiveness is tested against independent external environments rather than reducing to its own inputs by construction. This is the most common honest outcome for an empirical methods paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the assumption that LLMs can correctly interpret structured program prompts and generate only feasible actions without additional fine-tuning or verification.

axioms (1)
  • domain assumption: LLMs possess sufficient commonsense and programming knowledge to produce valid executable plans when given environment specifications and examples
    This is the load-bearing premise that allows the method to work across different robots and tasks without explicit domain engineering.

pith-pipeline@v0.9.0 · 5508 in / 1036 out tokens · 32046 ms · 2026-05-17T04:17:03.442673+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tagged unclear

    Relation between the paper passage and the cited Recognition theorem:

    We present a programmatic LLM prompt structure that enables plan generation functional across situated environments, robot capabilities, and tasks. Our key insight is to prompt the LLM with program-like specifications of the available actions and objects in an environment, as well as with example programs that can be executed.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Using large language models for embodied planning introduces systematic safety risks

    cs.AI 2026-04 unverdicted novelty 7.0

    LLM planners for robots often produce dangerous plans even when planning succeeds, with safety awareness staying flat as model scale improves planning ability.

  2. ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs

    cs.RO 2026-02 unverdicted novelty 7.0

    ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.

  3. VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

    cs.RO 2023-07 unverdicted novelty 7.0

    VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.

  4. Voyager: An Open-Ended Embodied Agent with Large Language Models

    cs.AI 2023-05 unverdicted novelty 7.0

    Voyager achieves superior lifelong learning in Minecraft by combining an automatic exploration curriculum, a library of executable skills, and iterative LLM prompting with environment feedback, yielding 3.3x more uniq...

  5. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

    cs.AI 2023-04 accept novelty 7.0

    LLM+P lets LLMs solve planning problems optimally by converting them to PDDL for classical planners and back to natural language.

  6. When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution

    cs.AI 2026-05 conditional novelty 6.0

    LongAct benchmark reveals top VLMs reach only 59% goal completion and 16% full success on long-horizon household tasks, while HoloMind agent improves results via DAG planner, multimodal spatial memory, episodic memory...

  7. From Reaction to Anticipation: Proactive Failure Recovery through Agentic Task Graph for Robotic Manipulation

    cs.RO 2026-05 unverdicted novelty 6.0

    AgentChord models manipulation tasks as directed graphs enriched with anticipatory recovery branches, using specialized agents to enable immediate, low-latency failure responses and improve success on long-horizon bim...

  8. Re²MoGen: Open-Vocabulary Motion Generation via LLM Reasoning and Physics-Aware Refinement

    cs.CV 2026-04 unverdicted novelty 6.0

    Re²MoGen generates open-vocabulary motions via MCTS-enhanced LLM keyframe planning, pose-prior optimization with dynamic temporal matching fine-tuning, and physics-aware RL post-training, claiming SOTA performance.

  9. A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring

    cs.RO 2026-04 unverdicted novelty 6.0

    A physical agentic loop with execution-state monitoring improves robustness of language-guided grasping over open-loop execution by converting noisy telemetry into discrete outcome events that trigger retries or user ...

  10. SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

    cs.CR 2026-02 unverdicted novelty 6.0

    The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.

  11. Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

    cs.AI 2023-05 conditional novelty 6.0

    GITM uses LLMs to generate action plans from text knowledge and memory, enabling agents to complete long-horizon Minecraft tasks at much higher success rates than prior RL methods.

  12. Reasoning with Language Model is Planning with World Model

    cs.CL 2023-05 unverdicted novelty 6.0

    RAP turns LLMs into dual world-model and planning agents via MCTS to generate better reasoning paths, outperforming CoT baselines and achieving 33% relative gains over GPT-4 CoT using LLaMA-33B on plan generation.

  13. PaLM-E: An Embodied Multimodal Language Model

    cs.LG 2023-03 conditional novelty 6.0

    PaLM-E is a single 562B-parameter multimodal model that performs embodied reasoning tasks like robotic manipulation planning and visual question answering by interleaving vision, state, and text inputs with positive t...

  14. Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

    cs.AI 2023-02 conditional novelty 6.0

    DEPS combines LLM-based interactive planning with a trainable goal selector to create a zero-shot multi-task agent that completes 70+ Minecraft tasks and nearly doubles prior performance.

  15. Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

    cs.SE 2026-04 accept novelty 5.0

    LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

  16. A Hierarchical Error-Corrective Graph Framework for Autonomous Agents with LLM-Based Action Generation

    cs.AI 2026-03 unverdicted novelty 5.0

    HECG combines multi-dimensional metrics for strategy choice, ten-type error classification with recoverability details, and causal-context graphs to improve LLM agent reliability in complex tasks.

  17. LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator

    cs.RO 2025-12 unverdicted novelty 4.0

    LEO-RobotAgent is a general-purpose framework that enables LLMs to independently plan, use tools, and collaborate with humans while operating multiple robot types for unpredictable tasks.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · cited by 17 Pith papers · 1 internal anchor

  1. Inner Monologue: Embodied Reasoning through Planning with Language Models
     W. Huang, F. Xia, T. Xiao, H. Chan, J. Liang, P. Florence, A. Zeng, J. Tompson, I. Mordatch, Y. Chebotar, P. Sermanet, N. Brown, T. Jackson, L. Luu, S. Levine, K. Hausman, and B. Ichter. arXiv preprint arXiv:2207.05608, 2022.

  2. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents
     W. Huang, P. Abbeel, D. Pathak, and I. Mordatch. arXiv preprint arXiv:2201.07207, 2022.

  3. Socratic models: Composing zero-shot multimodal reasoning with language
     A. Zeng, M. Attarian, B. Ichter, K. Choromanski, A. Wong, S. Welker, F. Tombari, A. Purohit, M. Ryoo, V. Sindhwani, J. Lee, V. Vanhoucke, and P. Florence. arXiv, 2022.

  4. Do as I can, not as I say: Grounding language in robotic affordances
     M. Ahn, A. Brohan, N. Brown, Y. Chebotar, O. Cortes, B. David, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, D. Ho, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, E. Jang, R. J. Ruano, K. Jeffrey, S. Jesmonth, N. J. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, K.-H. Lee, S. Levine, Y. Lu, L. Luu, C. Parada, P. Pastor, J. Quiambao, K. Rao, J. Rettinghouse, ...

  5. Strips: A new approach to the application of theorem proving to problem solving
     R. E. Fikes and N. J. Nilsson. In Proceedings of the 2nd International Joint Conference on Artificial Intelligence (IJCAI'71), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1971, pp. 608–620.

  6. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning
     C. R. Garrett, T. Lozano-Pérez, and L. P. Kaelbling. Proceedings of the International Conference on Automated Planning and Scheduling, vol. 30, no. 1, pp. 440–448, Jun. 2020.

  7. Task planning in robotics: an empirical comparison of PDDL-based and ASP-based systems
     Y. Jiang, S. Zhang, P. Khandelwal, and P. Stone. 2018.

  8. VirtualHome: Simulating household activities via programs
     X. Puig, K. Ra, M. Boben, J. Li, T. Wang, S. Fidler, and A. Torralba. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 8494–8502.

  9. ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
     M. Shridhar, J. Thomason, D. Gordon, Y. Bisk, W. Han, R. Mottaghi, L. Zettlemoyer, and D. Fox. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

  10. A heuristic search approach to planning with temporally extended preferences
      J. A. Baier, F. Bacchus, and S. A. McIlraith. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI'07), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2007, pp. 1808–1815.

  11. FF: The fast-forward planning system
      J. Hoffmann. AI Magazine, vol. 22, no. 3, p. 57, Sep. 2001.

  12. The fast downward planning system
      M. Helmert. J. Artif. Int. Res., vol. 26, no. 1, pp. 191–246, Jul. 2006.

  13. A tutorial on planning graph based reachability heuristics
      D. Bryce and S. Kambhampati. AI Magazine, vol. 28, no. 1, p. 47, Mar. 2007.

  14. Search on the replay buffer: Bridging planning and reinforcement learning
      B. Eysenbach, R. R. Salakhutdinov, and S. Levine. In Advances in Neural Information Processing Systems, vol. 32, Curran Associates, Inc., 2019.

  15. Neural task programming: Learning to generalize across hierarchical tasks
      D. Xu, S. Nair, Y. Zhu, J. Gao, A. Garg, L. Fei-Fei, and S. Savarese. In 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 3795–3802.

  16. Regression planning networks
      D. Xu, R. Martín-Martín, D.-A. Huang, Y. Zhu, S. Savarese, and L. F. Fei-Fei. In Advances in Neural Information Processing Systems, vol. 32, Curran Associates, Inc., 2019.

  17. Inventing relational state and action abstractions for effective and efficient bilevel planning
      T. Silver, R. Chitnis, N. Kumar, W. McClinton, T. Lozano-Pérez, L. P. Kaelbling, and J. Tenenbaum. In The Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2022.

  18. Value function spaces: Skill-centric state abstractions for long-horizon reasoning
      D. Shah, A. T. Toshev, S. Levine, and B. Ichter. In International Conference on Learning Representations, 2022.

  19. Universal planning networks: Learning generalizable representations for visuomotor control
      A. Srinivas, A. Jabri, P. Abbeel, S. Levine, and C. Finn. In Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research vol. 80, PMLR, Jul. 2018, pp. 4732–4741.

  20. Learning plannable representations with causal InfoGAN
      T. Kurutach, A. Tamar, G. Yang, S. J. Russell, and P. Abbeel. In Advances in Neural Information Processing Systems, vol. 31, Curran Associates, Inc., 2018.

  21. Grounding language to autonomously-acquired skills via goal generation
      A. Akakzia, C. Colas, P.-Y. Oudeyer, M. Chetouani, and O. Sigaud. In International Conference on Learning Representations, 2021.

  22. Hierarchical foresight: Self-supervised learning of long-horizon tasks via visual subgoal generation
      S. Nair and C. Finn. In International Conference on Learning Representations, 2020.

  23. Language as an abstraction for hierarchical deep reinforcement learning
      Y. Jiang, S. S. Gu, K. P. Murphy, and C. Finn. In Advances in Neural Information Processing Systems, vol. 32, Curran Associates, Inc., 2019.

  24. Ella: Exploration through learned language abstraction
      S. Mirchandani, S. Karamcheti, and D. Sadigh. In Advances in Neural Information Processing Systems, vol. 34, Curran Associates, Inc., 2021, pp. 29529–29540.

  25. Skill induction and planning with latent language
      P. Sharma, A. Torralba, and J. Andreas. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, May 2022, pp. 1713–1726.

  26. Hierarchical planning for long-horizon manipulation with geometric and symbolic scene graphs
      Y. Zhu, J. Tremblay, S. Birchfield, and Y. Zhu. 2020.

  27. Language models are few-shot learners
      T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. A...

  28. Evaluating large language models trained on code
      M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Chan, S. Gray, N. Ryder, M. Pavlov, A. Power, L. Kaiser, M. Bavarian, C. Winter, P. Tillet, F. P. Such, D. Cummings, M. Plappert, F. Chantzis, E. Barnes, A. Herbert...

  29. Effective approaches to attention-based neural machine translation
      T. Luong, H. Pham, and C. D. Manning. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, Sept. 2015, pp. 1412–1421.

  30. Challenges in data-to-document generation
      S. Wiseman, S. Shieber, and A. Rush. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, Sept. 2017, pp. 2253–2263.

  31. The curious case of neural text degeneration
      A. Holtzman, J. Buys, L. Du, M. Forbes, and Y. Choi. In International Conference on Learning Representations, 2020.

  32. Visually-grounded planning without vision: Language models infer detailed plans from high-level instructions
      P. Jansen. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online, Nov. 2020, pp. 4412–4417.

  33. Pre-trained language models for interactive decision-making
      S. Li, X. Puig, C. Paxton, Y. Du, C. Wang, L. Fan, T. Chen, D.-A. Huang, E. Akyürek, A. Anandkumar, J. Andreas, I. Mordatch, A. Torralba, and Y. Zhu. 2022.

  34. Mapping language models to grounded conceptual spaces
      R. Patel and E. Pavlick. In International Conference on Learning Representations, 2022.

  35. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing
      P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig. 2021.

  36. Chain of thought prompting elicits reasoning in large language models
      J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou. 2022.

  37. Object rearrangement using learned implicit collision functions
      M. Danielczuk, A. Mousavian, C. Eppner, and D. Fox. IEEE International Conference on Robotics and Automation (ICRA), 2021.

  38. Contact-GraspNet: Efficient 6-DoF grasp generation in cluttered scenes
      M. Sundermeyer, A. Mousavian, R. Triebel, and D. Fox. In 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 13438–13444.

  39. Open-vocabulary object detection via vision and language knowledge distillation
      X. Gu, T.-Y. Lin, W. Kuo, and Y. Cui. In International Conference on Learning Representations, 2022.