Code as Policies: Language Model Programs for Embodied Control
Pith reviewed 2026-05-15 00:34 UTC · model grok-4.3
The pith
Language models write executable robot policies by composing code from a few example commands.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a few-shot prompt of example language commands (formatted as comments) paired with corresponding policy code, LLMs can take in new commands and autonomously re-compose API calls to generate new policy code that exhibits spatial-geometric reasoning, generalizes to new instructions, and prescribes precise values for ambiguous descriptions depending on context.
What carries the argument
Hierarchical code generation through recursive prompting, where the model defines undefined functions on the fly to build complex policies that process perception outputs and parameterize control primitives.
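The recursive pattern can be sketched with a stub in place of the model: parse the generated code, find calls to functions that nothing has defined yet, and prompt again for each one. A minimal sketch, assuming a dictionary `FAKE_LLM` as a hypothetical stand-in for the code-writing model; the helper names echo the paper's prompt style:

```python
import ast
import builtins

def undefined_calls(code, known):
    """Function names called in `code` that are neither defined in it,
    already known, nor Python builtins."""
    tree = ast.parse(code)
    defs = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    calls = {n.func.id for n in ast.walk(tree)
             if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}
    return calls - defs - known - set(dir(builtins))

def hierarchical_codegen(code, llm, known=None):
    """Recursively prompt `llm` (here just a name -> source lookup) to
    define every function the code calls but no one has defined yet,
    prepending the definitions so the assembled program runs."""
    known = known if known is not None else set()
    pieces = []
    for name in sorted(undefined_calls(code, known)):
        known.add(name)
        pieces.append(hierarchical_codegen(llm(name), llm, known))
    pieces.append(code)
    return "\n".join(pieces)

# Hypothetical stand-in for the code-writing LLM: maps a requested
# function name to source, as few-shot prompting would in the real system.
FAKE_LLM = {
    "stack_objects": ("def stack_objects(objs):\n"
                      "    for top, bottom in zip(objs, objs[1:]):\n"
                      "        put_first_on_second(top, bottom)\n"),
    "put_first_on_second": ("def put_first_on_second(a, b):\n"
                            "    actions.append((a, b))\n"),
}

task = "# stack the blocks\nstack_objects(['red', 'green', 'blue'])\n"
program = hierarchical_codegen(task, FAKE_LLM.__getitem__)
actions = []
exec(program, {"actions": actions})
print(actions)  # [('red', 'green'), ('green', 'blue')]
```

The shared `known` set prevents a helper from being requested twice and breaks mutual-recursion cycles, which is the load-bearing detail of the recursive scheme.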
If this is right
- Policies gain spatial-geometric reasoning by chaining classic logic structures and referencing third-party libraries such as NumPy and Shapely to perform arithmetic.
- Generated policies generalize to new instructions without additional training or fine-tuning.
- Vague language like 'faster' is turned into concrete parameter values using behavioral commonsense encoded in the model.
- The same prompting approach improves the state of the art on the HumanEval code benchmark to solving 39.8 percent of its problems.
- The formulation supports both reactive policies such as impedance controllers and waypoint-based policies such as pick-and-place.
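As one concrete instance of the waypoint-based case, here is a sketch of the kind of policy code such prompting produces, in the style of the paper's appendix prompts (`get_pos`, `put_first_on_second`); the object position and the 0.3 offset for "toward the left" are illustrative stand-ins, not values from the paper:

```python
import numpy as np

# Toy perception and control APIs; a real stack would back get_pos with
# an object detector and put_first_on_second with a pick-and-place primitive.
POSITIONS = {"purple bowl": np.array([0.2, 0.5])}
ACTIONS = []

def get_pos(name):
    return POSITIONS[name]

def put_first_on_second(obj, target):
    ACTIONS.append((obj, np.asarray(target, dtype=float)))

# objs = ['purple block', 'purple bowl']
# move the purple bowl toward the left.   <- command given as a comment
target_pos = get_pos("purple bowl") + [-0.3, 0]  # NumPy does the arithmetic
put_first_on_second("purple bowl", target_pos)

print(ACTIONS[0][0], ACTIONS[0][1].round(2).tolist())  # purple bowl [-0.1, 0.5]
```

The point of the sketch is that the spatial reasoning lives in ordinary array arithmetic, not in the model itself: the LLM only has to compose the right calls.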
Where Pith is reading between the lines
- This method could allow rapid adaptation of robot behavior across different hardware by swapping only the low-level API definitions while keeping the high-level prompt structure fixed.
- Safety-critical applications would likely require an added runtime monitor layer because the paper's core claim assumes flawless first-try execution.
- Extending the recursive function definition pattern to multi-robot coordination or long-horizon tasks remains an open direction not tested in the current experiments.
Load-bearing premise
The code produced by the language model will execute correctly and safely on physical robots for novel commands without runtime errors or the need for extra verification.
What would settle it
Running the model on a new instruction such as 'move the mug faster toward the target while avoiding the obstacle' and observing whether the generated code completes the motion safely on the robot, or instead crashes, produces unsafe velocities, or fails to finish.
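One inexpensive way to probe that premise is a runtime monitor wrapped around whatever the model generates. The guard below is a hypothetical sketch, not part of the paper's method, and the 0.5 m/s limit is invented for illustration:

```python
MAX_SPEED = 0.5  # m/s; an illustrative hardware limit, not from the paper

class UnsafeCommand(Exception):
    """Raised when generated policy code requests an out-of-limit motion."""

def checked_velocity(vx, vy):
    """Pass a velocity through to the controller only if its magnitude
    is within limits; otherwise refuse and surface the failure."""
    speed = (vx ** 2 + vy ** 2) ** 0.5
    if speed > MAX_SPEED:
        raise UnsafeCommand(f"requested speed {speed:.2f} m/s exceeds {MAX_SPEED} m/s")
    return vx, vy

print(checked_velocity(0.3, 0.2))  # within limits, passes through
try:
    checked_velocity(0.6, 0.4)  # 'faster' interpreted too aggressively
except UnsafeCommand as e:
    print(e)
```

A monitor of this shape would turn the open question above from "does the code ever misbehave" into a logged, countable event stream.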
Original abstract
Large language models (LLMs) trained on code completion have been shown to be capable of synthesizing simple Python programs from docstrings [1]. We find that these code-writing LLMs can be re-purposed to write robot policy code, given natural language commands. Specifically, policy code can express functions or feedback loops that process perception outputs (e.g., from object detectors [2], [3]) and parameterize control primitive APIs. When provided as input several example language commands (formatted as comments) followed by corresponding policy code (via few-shot prompting), LLMs can take in new commands and autonomously re-compose API calls to generate new policy code respectively. By chaining classic logic structures and referencing third-party libraries (e.g., NumPy, Shapely) to perform arithmetic, LLMs used in this way can write robot policies that (i) exhibit spatial-geometric reasoning, (ii) generalize to new instructions, and (iii) prescribe precise values (e.g., velocities) to ambiguous descriptions ("faster") depending on context (i.e., behavioral commonsense). This paper presents code as policies: a robot-centric formulation of language model generated programs (LMPs) that can represent reactive policies (e.g., impedance controllers), as well as waypoint-based policies (vision-based pick and place, trajectory-based control), demonstrated across multiple real robot platforms. Central to our approach is prompting hierarchical code-gen (recursively defining undefined functions), which can write more complex code and also improves state-of-the-art to solve 39.8% of problems on the HumanEval [1] benchmark. Code and videos are available at https://code-as-policies.github.io
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes 'Code as Policies,' a framework in which LLMs trained on code completion are repurposed via few-shot prompting to synthesize executable Python robot policies from natural-language commands. Examples consist of language instructions formatted as comments paired with corresponding policy code that calls perception APIs, control primitives, and third-party libraries (NumPy, Shapely) for geometric reasoning and arithmetic. Hierarchical prompting (recursively defining undefined functions) is introduced to generate more complex policies. The approach is claimed to produce policies exhibiting spatial reasoning, generalization to novel instructions, and context-dependent parameter assignment, with demonstrations on multiple real robot platforms and an improvement to 39.8% on the HumanEval benchmark.
Significance. If the empirical claims are substantiated with quantitative robot-task metrics, the work would be significant for bridging LLMs and robotics by offering an interpretable, code-based mechanism for policy generation that supports generalization and commonsense without task-specific fine-tuning. The hierarchical code-generation technique also contributes to LLM program synthesis.
major comments (2)
- [Experimental Evaluation] The central claim that few-shot LLM-generated policies execute correctly and generalize on physical robots for novel commands is load-bearing yet supported only by qualitative success cases and videos. No success rates, trial counts, failure-mode analysis, or ablation studies over a held-out set of novel commands are reported in the experimental evaluation, leaving open the possibility that observed behaviors reflect prompt curation rather than reliable autonomous synthesis.
- [Real-Robot Demonstrations] The manuscript asserts that generated policies 'prescribe precise values to ambiguous descriptions' and execute safely on hardware, but provides no runtime verification, error-handling analysis, or discussion of failure modes (e.g., API misuse, unsafe velocities) that would be required to substantiate deployment claims.
minor comments (2)
- [Abstract] The abstract states an improvement 'to 39.8%' on HumanEval without clarifying the prior state-of-the-art baseline or the exact prompting setup used for that number.
- [Approach] Notation for policy code structure (e.g., how perception outputs are typed and passed to control primitives) is introduced informally; a short pseudocode template or explicit API signature table would improve clarity.
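A stub template along these lines would address the minor comment; the signatures below are sketched from API names that appear in the paper's prompts (`parse_obj`, `get_pos`, `put_first_on_second`), with the type annotations being our guess rather than the paper's:

```python
from typing import Tuple, Union

Vec2 = Tuple[float, float]  # 2D tabletop position

def parse_obj(description: str) -> str:
    """Perception: resolve a language description to a detected object name."""
    raise NotImplementedError

def get_pos(obj_name: str) -> Vec2:
    """Perception: current position of a named object."""
    raise NotImplementedError

def put_first_on_second(obj_name: str, target: Union[str, Vec2]) -> None:
    """Control primitive: pick `obj_name`, place on an object or at a position."""
    raise NotImplementedError
```

Making these signatures explicit would also pin down the typing question the referee raises: perception returns names and positions, and control primitives accept either.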
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address each major comment below and have revised the manuscript to include quantitative metrics and expanded analysis of real-robot execution.
Point-by-point responses
Referee: [Experimental Evaluation] The central claim that few-shot LLM-generated policies execute correctly and generalize on physical robots for novel commands is load-bearing yet supported only by qualitative success cases and videos. No success rates, trial counts, failure-mode analysis, or ablation studies over a held-out set of novel commands are reported in the experimental evaluation, leaving open the possibility that observed behaviors reflect prompt curation rather than reliable autonomous synthesis.
Authors: We agree that quantitative evaluation is important for substantiating the central claims. In the revised manuscript we have added a dedicated subsection to the experimental evaluation reporting success rates, trial counts, and failure-mode analysis over a held-out set of novel commands. We also include ablation studies comparing prompting variants to address concerns about prompt curation. revision: yes
Referee: [Real-Robot Demonstrations] The manuscript asserts that generated policies 'prescribe precise values to ambiguous descriptions' and execute safely on hardware, but provides no runtime verification, error-handling analysis, or discussion of failure modes (e.g., API misuse, unsafe velocities) that would be required to substantiate deployment claims.
Authors: We acknowledge that the original manuscript provided limited discussion of these practical aspects. The revision adds an expanded analysis of runtime verification, error-handling mechanisms in the generated policies, and explicit discussion of failure modes including API misuse and unsafe velocities, supported by examples from the robot experiments. revision: yes
Circularity Check
No significant circularity; empirical demonstration of LLM code generation for policies
full rationale
The manuscript presents an empirical technique for repurposing code-trained LLMs via few-shot prompting to synthesize robot policies. No mathematical derivation chain, equations, or fitted parameters exist that reduce outputs to inputs by construction. Claims rest on curated demonstrations, hierarchical prompting, and an external benchmark result (HumanEval), with no load-bearing self-citations or self-definitional steps. The approach applies known prompting methods to a new domain without circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: large language models trained on code completion can synthesize simple Python programs from docstrings.
Forward citations
Cited by 23 Pith papers
- BOOKMARKS: Efficient Active Storyline Memory for Role-playing. BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.
- Octopus Protocol: One-Shot Hardware Discovery and Control for AI Agents via Infrastructure-as-Prompts. Octopus Protocol enables one-shot hardware onboarding for AI agents by running a five-stage LLM-driven pipeline that probes devices, infers capabilities, generates an MCP server, and deploys it for closed-loop control.
- Action Agent: Agentic Video Generation Meets Flow-Constrained Diffusion. Action Agent pairs LLM-driven video generation with a flow-constrained diffusion transformer to produce velocity commands, raising video success to 86% and delivering 64.7% real-world navigation on a Unitree G1 humanoid.
- Atomic-Probe Governance for Skill Updates in Compositional Robot Policies. A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing ...
- Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control. GeCO replaces time-dependent flow matching with time-unconditional optimization, enabling adaptive inference and intrinsic OOD detection for robotic imitation learning.
- Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory. Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models. VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
- Voyager: An Open-Ended Embodied Agent with Large Language Models. Voyager achieves superior lifelong learning in Minecraft by combining an automatic exploration curriculum, a library of executable skills, and iterative LLM prompting with environment feedback, yielding 3.3x more uniq...
- Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models. VLAs-as-Tools pairs a VLM planner with specialized VLA executors via a new interface and Tool-Aligned Post-Training to raise long-horizon robot success rates on LIBERO-Long and RoboTwin benchmarks.
- From Reaction to Anticipation: Proactive Failure Recovery through Agentic Task Graph for Robotic Manipulation. AgentChord models manipulation tasks as directed graphs enriched with anticipatory recovery branches, using specialized agents to enable immediate, low-latency failure responses and improve success on long-horizon bim...
- Atomic-Probe Governance for Skill Updates in Compositional Robot Policies. Empirical study on robosuite tasks reveals a dominant-skill effect in compositions and shows that an atomic probe approximates full revalidation for skill updates at much lower cost.
- Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems. Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.
- Chain Of Interaction Benchmark (COIN): When Reasoning meets Embodied Interaction. COIN provides 50 interactive robotic tasks, a 1000-demonstration dataset collected via AR teleoperation, and metrics showing that CodeAsPolicy, VLA, and H-VLA models fail at causally-dependent interactive reasoning du...
- ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models. ProGAL-VLA uses 3D graphs, symbolic sub-goals, and a Grounding Alignment Contrastive loss to ground actions on verified embeddings, raising robustness from 30.3% to 71.5% and ambiguity AUROC to 0.81 on robotic benchmarks.
- A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring. A physical agentic loop with execution-state monitoring improves robustness of language-guided grasping over open-loop execution by converting noisy telemetry into discrete outcome events that trigger retries or user ...
- RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains. RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
- SoK: Agentic Skills -- Beyond Tool Use in LLM Agents. The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.
- PaLM-E: An Embodied Multimodal Language Model. PaLM-E is a single 562B-parameter multimodal model that performs embodied reasoning tasks like robotic manipulation planning and visual question answering by interleaving vision, state, and text inputs with positive t...
- ORICF -- Open Robotics Inference and Control Framework. ORICF is a declarative, model-agnostic robotics framework with YAML specs and edge offloading that reduces robot compute utilization by up to 83% and energy by 66% in a ROS2 demo combining ASR, LLM, and CNN.
- Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents. ValuePlanner is a hierarchical architecture that uses LLMs to generate value-based subgoals and PDDL planners to produce executable actions, enabling self-directed behavior in embodied agents.
- Environmental Understanding Vision-Language Model for Embodied Agent. EUEA fine-tunes VLMs on object perception, task planning, action understanding and goal recognition, with recovery and GRPO, to raise ALFRED success rates by 11.89% over behavior cloning.
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering. LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
- Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding. Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.
Reference graph
Works this paper leans on
- [1] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman et al., "Evaluating large language models trained on code," arXiv:2107.03374, 2021.
- [2] A. Kamath, M. Singh, Y. LeCun, G. Synnaeve, I. Misra, and N. Carion, "MDETR - modulated detection for end-to-end multi-modal understanding," in ICCV, 2021.
- [3] X. Gu, T.-Y. Lin, W. Kuo, and Y. Cui, "Open-vocabulary object detection via vision and language knowledge distillation," arXiv:2104.13921, 2021.
- [4] S. Tellex, N. Gopalan, H. Kress-Gazit, and C. Matuszek, "Robots that use language," Annual Review of Control, Robotics, and Autonomous Systems, 2020.
- [5] T. Winograd, "Procedures as a representation for data in a computer program for understanding natural language," MIT Project MAC, 1971.
- [6] J. Dzifcak, M. Scheutz, C. Baral, and P. Schermerhorn, "What to do and how to do it: Translating natural language directives into temporal and dynamic logic representation for goal management and action execution," in ICRA, 2009.
- [7] Y. Artzi and L. Zettlemoyer, "Weakly supervised learning of semantic parsers for mapping instructions to actions," TACL, 2013.
- [8] C. Lynch and P. Sermanet, "Language conditioned imitation learning over unstructured data," arXiv:2005.07648, 2020.
- [9] E. Jang, A. Irpan, M. Khansari, D. Kappler, F. Ebert, C. Lynch, S. Levine, and C. Finn, "BC-Z: Zero-shot task generalization with robotic imitation learning," in CoRL, 2022.
- [10] O. Mees, L. Hermann, E. Rosete-Beas, and W. Burgard, "CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks," RA-L, 2022.
- [11] A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann et al., "PaLM: Scaling language modeling with pathways," arXiv:2204.02311, 2022.
- [12] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., "Language models are few-shot learners," NeurIPS, 2020.
- [13] S. Zhang, S. Roller, N. Goyal, M. Artetxe, M. Chen, S. Chen, C. Dewan, M. Diab, X. Li, X. V. Lin et al., "OPT: Open pre-trained transformer language models," arXiv:2205.01068, 2022.
- [14] W. Huang, P. Abbeel, D. Pathak, and I. Mordatch, "Language models as zero-shot planners: Extracting actionable knowledge for embodied agents," arXiv:2201.07207, 2022.
- [15] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa, "Large language models are zero-shot reasoners," arXiv:2205.11916, 2022.
- [16] A. Zeng, A. Wong, S. Welker, K. Choromanski, F. Tombari, A. Purohit, M. Ryoo, V. Sindhwani, J. Lee, V. Vanhoucke et al., "Socratic models: Composing zero-shot multimodal reasoning with language," arXiv:2204.00598, 2022.
- [17] M. Ahn, A. Brohan, N. Brown, Y. Chebotar, O. Cortes, B. David, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog et al., "Do as I can, not as I say: Grounding language in robotic affordances," arXiv:2204.01691, 2022.
- [18] W. Huang, F. Xia, T. Xiao, H. Chan, J. Liang, P. Florence, A. Zeng, J. Tompson, I. Mordatch, Y. Chebotar, P. Sermanet, N. Brown, T. Jackson, L. Luu, S. Levine, K. Hausman, and B. Ichter, "Inner monologue: Embodied reasoning through planning with language models," arXiv:2207.05608, 2022.
- [19] P. Florence, C. Lynch, A. Zeng, O. A. Ramirez, A. Wahid, L. Downs, A. Wong, J. Lee, I. Mordatch, and J. Tompson, "Implicit behavioral cloning," in CoRL, 2022.
- [20] A. Zeng, "Learning visual affordances for robotic manipulation," Ph.D. dissertation, Princeton University, 2019.
- [21] D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V. Vanhoucke et al., "Scalable deep reinforcement learning for vision-based robotic manipulation," in CoRL, 2018.
- [22] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al., "Training language models to follow instructions with human feedback," arXiv:2203.02155, 2022.
- [23] D. Hupkes, V. Dankers, M. Mul, and E. Bruni, "Compositionality decomposed: How do neural networks generalise?" JAIR, 2020.
- [24] C. Breazeal, K. Dautenhahn, and T. Kanda, "Social robotics," Springer Handbook of Robotics, 2016.
- [25] T. Kollar, S. Tellex, D. Roy, and N. Roy, "Toward understanding natural language directions," in HRI, 2010.
- [26] J. Luketina, N. Nardelli, G. Farquhar, J. N. Foerster, J. Andreas, E. Grefenstette, S. Whiteson, and T. Rocktäschel, "A survey of reinforcement learning informed by natural language," in IJCAI, 2019.
- [27] M. MacMahon, B. Stankiewicz, and B. Kuipers, "Walk the talk: Connecting language, knowledge, and action in route instructions," AAAI, 2006.
- [28] J. Thomason, S. Zhang, R. J. Mooney, and P. Stone, "Learning to interpret natural language commands through human-robot dialog," in IJCAI, 2015.
- [29] S. Tellex, T. Kollar, S. Dickerson, M. Walter, A. Banerjee, S. Teller, and N. Roy, "Understanding natural language commands for robotic navigation and mobile manipulation," in AAAI, 2011.
- [30] D. Shah, B. Osinski, B. Ichter, and S. Levine, "LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action," arXiv:2207.04429, 2022.
- [31] C. Matuszek, E. Herbst, L. Zettlemoyer, and D. Fox, "Learning to parse natural language commands to a robot control system," in Experimental Robotics, 2013.
- [32] J. Thomason, A. Padmakumar, J. Sinapov, N. Walker, Y. Jiang, H. Yedidsion, J. Hart, P. Stone, and R. Mooney, "Jointly improving parsing and perception for natural language commands through human-robot dialog," JAIR, 2020.
- [33] S. Nair, E. Mitchell, K. Chen, S. Savarese, C. Finn et al., "Learning language-conditioned robot behavior from offline data and crowd-sourced annotation," in CoRL, 2022.
- [34] J. Andreas, D. Klein, and S. Levine, "Learning with latent language," arXiv:1711.00482, 2017.
- [35] P. Sharma, B. Sundaralingam, V. Blukis, C. Paxton, T. Hermans, A. Torralba, J. Andreas, and D. Fox, "Correcting robot plans with natural language feedback," arXiv:2204.05186, 2022.
- [36] M. Shridhar, L. Manuelli, and D. Fox, "CLIPort: What and where pathways for robotic manipulation," in CoRL, 2021.
- [37] S. Stepputtis, J. Campbell, M. Phielipp, S. Lee, C. Baral, and H. Ben Amor, "Language-conditioned imitation learning for robot manipulation tasks," NeurIPS, 2020.
- [38] Y. Jiang, S. S. Gu, K. P. Murphy, and C. Finn, "Language as an abstraction for hierarchical deep reinforcement learning," NeurIPS, 2019.
- [39] P. Goyal, S. Niekum, and R. J. Mooney, "PixL2R: Guiding reinforcement learning using natural language by mapping pixels to rewards," arXiv:2007.15543, 2020.
- [40] G. Cideron, M. Seurin, F. Strub, and O. Pietquin, "Self-educated language agent with hindsight experience replay for instruction following," DeepMind, 2019.
- [41] D. Misra, J. Langford, and Y. Artzi, "Mapping instructions and visual observations to actions with reinforcement learning," arXiv:1704.08795, 2017.
- [42] A. Akakzia, C. Colas, P.-Y. Oudeyer, M. Chetouani, and O. Sigaud, "Grounding language to autonomously-acquired skills via goal generation," arXiv:2006.07185, 2020.
- [43] I. Drori, S. Zhang, R. Shuttleworth, L. Tang, A. Lu, E. Ke, K. Liu, L. Chen, S. Tran, N. Cheng et al., "A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level," PNAS, 2022.
- [44] A. Lewkowycz, A. Andreassen, D. Dohan, E. Dyer, H. Michalewski, V. Ramasesh, A. Slone, C. Anil, I. Schlag, T. Gutman-Solo et al., "Solving quantitative reasoning problems with language models," arXiv:2206.14858, 2022.
- [45] K. Cobbe, V. Kosaraju, M. Bavarian, J. Hilton, R. Nakano, C. Hesse, and J. Schulman, "Training verifiers to solve math word problems," arXiv:2110.14168, 2021.
- [46] D. Zhou, N. Schärli, L. Hou, J. Wei, N. Scales, X. Wang, D. Schuurmans, O. Bousquet, Q. Le, and E. Chi, "Least-to-most prompting enables complex reasoning in large language models," arXiv:2205.10625, 2022.
- [47] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou, "Chain of thought prompting elicits reasoning in large language models," arXiv:2201.11903, 2022.
- [48] J. Austin, A. Odena, M. Nye, M. Bosma, H. Michalewski, D. Dohan, E. Jiang, C. Cai, M. Terry, Q. Le et al., "Program synthesis with large language models," arXiv:2108.07732, 2021.
- [49] K. Ellis, C. Wong, M. Nye, M. Sable-Meyer, L. Cary, L. Morales, L. Hewitt, A. Solar-Lezama, and J. B. Tenenbaum, "DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning," arXiv:2006.08381, 2020.
- [50] L. Tian, K. Ellis, M. Kryven, and J. Tenenbaum, "Learning abstract structure for drawing by efficient motor program induction," NeurIPS, 2020.
- [51] D. Trivedi, J. Zhang, S.-H. Sun, and J. J. Lim, "Learning to synthesize programs as interpretable and generalizable policies," NeurIPS, 2021.
- [52] O. Mees and W. Burgard, "Composing pick-and-place tasks by grounding language," in ISER, 2020.
- [53] W. Liu, C. Paxton, T. Hermans, and D. Fox, "StructFormer: Learning spatial structure for language-guided semantic rearrangement of novel objects," in ICRA, 2022.
- [54] W. Yuan, C. Paxton, K. Desingh, and D. Fox, "SORNet: Spatial object-centric representations for sequential manipulation," in CoRL, 2022.
- [55] A. Bucker, L. Figueredo, S. Haddadin, A. Kapoor, S. Ma, and R. Bonatti, "Reshaping robot trajectories using natural language commands: A study of multi-modal data alignment using transformers," arXiv:2203.13411, 2022.
- [56] A. Bobu, C. Paxton, W. Yang, B. Sundaralingam, Y.-W. Chao, M. Cakmak, and D. Fox, "Learning perceptual concepts by bootstrapping from human queries," RA-L, 2022.
- [57] J. Wu, L. Ouyang, D. M. Ziegler, N. Stiennon, R. Lowe, J. Leike, and P. Christiano, "Recursively summarizing books with human feedback," arXiv:2109.10862, 2021.
- [58] F. F. Xu, U. Alon, G. Neubig, and V. J. Hellendoorn, "A systematic evaluation of large language models of code," in MAPS, 2022.
- [59] K. Zakka, A. Zeng, P. Florence, J. Tompson, J. Bohg, and D. Dwibedi, "XIRL: Cross-embodiment inverse reinforcement learning," in CoRL, 2022.
- [60] A. Ganapathi, P. Florence, J. Varley, K. Burns, K. Goldberg, and A. Zeng, "Implicit kinematic policies: Unifying joint and Cartesian action spaces in end-to-end robot learning," arXiv:2203.01983, 2022.
discussion (0)