SkillWrapper: Generative Predicate Invention for Task-level Planning

Ahmed Jaafar; Benned Hedegaard; David Paulius; George Konidaris; Haotian Fu; Naman Shah; Shreyas S. Raman; Skye Thompson; Stefanie Tellex; Yichen Wei; Ziyi Yang

arxiv: 2511.18203 · v6 · submitted 2025-11-22 · 💻 cs.RO


Ziyi Yang, Benned Hedegaard, Ahmed Jaafar, Yichen Wei, Skye Thompson, Shreyas S. Raman, Haotian Fu, Stefanie Tellex, George Konidaris, David Paulius, Naman Shah


Pith reviewed 2026-05-17 05:39 UTC · model grok-4.3

classification 💻 cs.RO

keywords: generative predicate invention · skill abstraction · symbolic operators · robot task planning · foundation models · RGB observations · black-box skills · long-horizon tasks


The pith

A formal theory of generative predicate invention produces symbolic operators for provably sound and complete robot task planning from RGB images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a formal theory of generative predicate invention that turns foundation-model outputs into symbolic operators supporting sound and complete planning over black-box skills. This matters because it lets agents reason at a high level while executing low-level actions without needing access to internal skill states or hand-designed abstractions. SkillWrapper puts the theory into practice by directing foundation models to collect interaction data and learn human-interpretable representations solely from RGB observations. If the approach holds, robots can solve previously unseen long-horizon tasks by composing learned operators into plans that remain valid when executed in the real world.

Core claim

The authors present a formal theory of generative predicate invention for skill abstraction, resulting in symbolic operators that can be used for provably sound and complete planning. SkillWrapper implements the theory by using foundation models to actively collect robot data and learn human-interpretable, plannable representations of black-box skills from RGB image observations alone, with empirical validation in simulation and on physical robots for long-horizon tasks.
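A minimal sketch of what "plannable representations from RGB observations alone" amounts to at runtime, assuming the common setup in which an abstract state is the set of ground atoms that a perception query judges true for one image. The callable `holds(image, atom)` is a hypothetical stand-in for a foundation-model yes/no query; the interface and the atom names are illustrative, not taken from the paper.

```python
from typing import Callable, FrozenSet, Iterable


def abstract_state(
    image: object,
    atoms: Iterable[str],
    holds: Callable[[object, str], bool],
) -> FrozenSet[str]:
    """Map one observation to the set of ground atoms judged true in it.

    `holds` is a hypothetical stand-in for a foundation-model query such as
    "Is GripperEmpty() true in this image?"; any callable of this shape works.
    """
    return frozenset(atom for atom in atoms if holds(image, atom))


# Usage with a fake "model" that reads labels from a dict instead of pixels.
def fake_holds(img: dict, atom: str) -> bool:
    return atom in img["true_atoms"]


fake_image = {"true_atoms": {"GripperEmpty()", "OnTable(Bowl)"}}
candidates = ["GripperEmpty()", "Holding(Bowl)", "OnTable(Bowl)"]
state = abstract_state(fake_image, candidates, fake_holds)
```

Swapping `fake_holds` for a real model call changes nothing downstream, which is the point of keeping the abstraction behind a single function.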

What carries the argument

The formal theory of generative predicate invention, which defines the conditions under which generated predicates yield symbolic operators that preserve soundness and completeness for domain-independent planning.
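The soundness and completeness conditions are stated about operators over abstract states. As a reference point, the sketch below shows the textbook STRIPS-style form such operators usually take: preconditions plus add and delete effects over sets of ground atoms. Whether SkillWrapper's operators use exactly this form is an assumption here, and the `Operator` class and the `OnTable` predicate are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

# An abstract state is the set of ground atoms judged true in one observation.
State = FrozenSet[str]


@dataclass(frozen=True)
class Operator:
    """STRIPS-style grounded operator: preconditions plus add/delete effects."""
    name: str
    pre: FrozenSet[str]     # atoms that must hold before the skill runs
    add: FrozenSet[str]     # atoms the skill makes true
    delete: FrozenSet[str]  # atoms the skill makes false

    def applicable(self, state: State) -> bool:
        return self.pre <= state

    def apply(self, state: State) -> State:
        if not self.applicable(state):
            raise ValueError(f"{self.name} is not applicable in {sorted(state)}")
        return (state - self.delete) | self.add


# Hypothetical operator for a Pick skill; OnTable is an illustrative predicate.
pick = Operator(
    name="Pick(Bowl)",
    pre=frozenset({"GripperEmpty()", "OnTable(Bowl)"}),
    add=frozenset({"Holding(Bowl)"}),
    delete=frozenset({"GripperEmpty()", "OnTable(Bowl)"}),
)
s0 = frozenset({"GripperEmpty()", "OnTable(Bowl)"})
s1 = pick.apply(s0)
```

Under this form, "soundness" means every operator the planner applies describes what the real skill does from any state satisfying `pre`; the predicates are what make `pre`, `add`, and `delete` expressible at all.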

If this is right

The resulting symbolic operators integrate directly with standard domain-independent planners for high-level task reasoning.
Representations learned in simulation or from collected data enable solving long-horizon tasks that were not encountered during training.
Planning proceeds using only RGB images even when the underlying skills remain black boxes with no exposed state.
The same learned abstractions support both simulated training and direct real-robot deployment without additional engineering.
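Because the operators are meant to plug into domain-independent planners, any complete search over abstract states can consume them. The sketch below uses plain breadth-first search as a stand-in for an off-the-shelf PDDL planner; `bfs_plan` and the operator contents are hypothetical. It is sound and complete only relative to the operator model it is given, which is why the quality of the invented predicates carries the real-world guarantee.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, FrozenSet, List, Optional, Sequence, Tuple

State = FrozenSet[str]


@dataclass(frozen=True)
class Operator:
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def bfs_plan(init: State, goal: FrozenSet[str], ops: Sequence[Operator]) -> Optional[List[str]]:
    """Breadth-first search over abstract states.

    Returns a shortest sequence of operator names whose final state contains
    every goal atom, or None if no such sequence exists. Reachable states are
    subsets of a finite atom set, so the search always terminates.
    """
    frontier: Deque[Tuple[State, List[str]]] = deque([(init, [])])
    visited = {init}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for op in ops:
            if op.pre <= state:
                nxt = (state - op.delete) | op.add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [op.name]))
    return None


# Hypothetical two-skill domain.
pick = Operator("Pick(Bowl)", frozenset({"GripperEmpty()", "OnTable(Bowl)"}),
                frozenset({"Holding(Bowl)"}), frozenset({"GripperEmpty()", "OnTable(Bowl)"}))
stack = Operator("Stack(Bowl, Plate)", frozenset({"Holding(Bowl)"}),
                 frozenset({"On(Bowl, Plate)", "GripperEmpty()"}), frozenset({"Holding(Bowl)"}))
init = frozenset({"GripperEmpty()", "OnTable(Bowl)"})
plan = bfs_plan(init, frozenset({"On(Bowl, Plate)"}), [pick, stack])
```

A real deployment would emit the operators as PDDL and call an existing planner; BFS is used here only because it is short and its guarantees are easy to state.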

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the formal properties transfer reliably, the method could reduce reliance on manually engineered predicates across many robot domains.
Active data collection guided by the theory might be adapted to handle partial observability or sensor noise in more complex settings.
The predicate invention process could be tested for compatibility with other high-level planners or combined with learned low-level controllers.

Load-bearing premise

The predicates generated by the foundation model must satisfy the formal completeness and soundness conditions required by the theory, and these properties must transfer when the black-box skills run on real robots from image inputs.

What would settle it

A concrete counterexample in which a plan produced by the learned operators cannot reach the goal even though each individual skill executes correctly on the robot would falsify the claim that the operators are sound; a solvable task for which the learned operators yield no plan at all would falsify completeness.
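A counterexample of the kind described above shows up as a step where the abstract state perceived after a skill differs from what the operators predicted. A minimal sketch of that check, assuming the observed abstract states come from some perception step and using exact set equality; `first_divergence` and the operators are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence

State = FrozenSet[str]


@dataclass(frozen=True)
class Operator:
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def first_divergence(
    init: State, plan: Sequence[Operator], observed: Sequence[State]
) -> Optional[int]:
    """Index of the first step where the operator model and observation disagree.

    observed[i] is the abstract state perceived after executing plan[i].
    Returns None when every step behaves as the operators predict.
    """
    if len(observed) != len(plan):
        raise ValueError("expected one observed state per executed step")
    predicted = init
    for i, (op, obs) in enumerate(zip(plan, observed)):
        if not op.pre <= predicted:
            return i  # the plan is not even valid under the model at this step
        predicted = (predicted - op.delete) | op.add
        if predicted != obs:
            return i  # the skill ran, but the world did not match the predicted effects
    return None


pick = Operator("Pick(Bowl)", frozenset({"GripperEmpty()", "OnTable(Bowl)"}),
                frozenset({"Holding(Bowl)"}), frozenset({"GripperEmpty()", "OnTable(Bowl)"}))
stack = Operator("Stack(Bowl, Plate)", frozenset({"Holding(Bowl)"}),
                 frozenset({"On(Bowl, Plate)", "GripperEmpty()"}), frozenset({"Holding(Bowl)"}))
init = frozenset({"GripperEmpty()", "OnTable(Bowl)"})
ok_run = [frozenset({"Holding(Bowl)"}), frozenset({"On(Bowl, Plate)", "GripperEmpty()"})]
# Observed outcome contradicts the Stack operator's effects.
bad_run = [frozenset({"Holding(Bowl)"}), frozenset({"Holding(Bowl)"})]
```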

Figures

Figures reproduced from arXiv: 2511.18203 by Ahmed Jaafar, Benned Hedegaard, David Paulius, George Konidaris, Haotian Fu, Naman Shah, Shreyas S. Raman, Skye Thompson, Stefanie Tellex, Yichen Wei, Ziyi Yang.

**Figure 1.** Figure 1: Overview of SkillWrapper. For an agent equipped with black-box skills, SkillWrapper learns skill representations that are compatible with off-the-shelf planners. These representations are composed of predicates invented by the foundation model. Given a novel planning problem described using the initial state and goal state as RGB images, a foundation model produces the corresponding abstract states by a…

**Figure 2.** Figure 2: Example of Predicate Invention. The initial states of two transitions are both judged to satisfy the preconditions of operators learned from the same skill, but transition 1 succeeds and transition 2 does not. In this case, the first condition (precondition) is triggered, and the foundation model is prompted with both transitions to invent a new predicate. Empirical predicate selection. Althoug…
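The trigger in Figure 2 can be sketched as a search for pairs of transitions of the same skill that the current predicates cannot tell apart: both initial states satisfy the preconditions, yet one execution succeeds and the other fails. This sketch keeps a single precondition set per grounded skill (the paper may learn several operators per skill) and omits the foundation-model prompt that would propose the new predicate; `Transition` and `precondition_conflicts` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Transition:
    skill: str            # grounded skill name, e.g. "Stack(Bowl, Plate)"
    init: FrozenSet[str]  # abstract state of the initial observation
    success: bool         # whether the skill execution succeeded


def precondition_conflicts(
    transitions: Sequence[Transition],
    preconditions: Dict[str, FrozenSet[str]],
) -> List[Tuple[Transition, Transition]]:
    """Pairs (succeeded, failed) of the same skill whose initial abstract
    states both satisfy that skill's current preconditions.

    Each pair shows the existing predicates cannot separate the two initial
    states, so a new predicate is needed to distinguish them.
    """
    conflicts = []
    for a, b in combinations(transitions, 2):
        if a.skill != b.skill or a.success == b.success:
            continue
        pre = preconditions.get(a.skill, frozenset())
        if pre <= a.init and pre <= b.init:
            conflicts.append((a, b) if a.success else (b, a))
    return conflicts


pre = {"Stack(Bowl, Plate)": frozenset({"Holding(Bowl)"})}
t1 = Transition("Stack(Bowl, Plate)", frozenset({"Holding(Bowl)"}), True)
t2 = Transition("Stack(Bowl, Plate)", frozenset({"Holding(Bowl)"}), False)
t3 = Transition("Stack(Bowl, Plate)", frozenset(), False)  # precondition unmet: not a conflict
conflicts = precondition_conflicts([t1, t2, t3], pre)
```

Each returned pair is what would be shown to the foundation model, which then proposes a predicate true in one initial state and false in the other.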

**Figure 3.** Figure 3: Robotouille environment. We first conduct experiments in Robotouille (Gonzalez-Pumariega et al., 2025), which is a simulated grid world kitchen domain with an agent that has five high-level skills: Pick, Place, Cut, Cook, and Stack. In the environment, there are several objects: a patty, lettuce, a top bun, and a bottom bun; there is also a cutting board and a stove for cutting the lettuce and cooking the …

**Figure 4.** Figure 4: Initial and Goal States for Real Robot Experiments.

**Figure 5.** Figure 5: Sequence of Bimanual Robot Skill Execution with Predicate Value Changes
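Predicate value changes like those in Figure 5 are the raw material for operator effects. A minimal sketch, assuming STRIPS-style effects: atoms that become true form the add list, atoms that become false form the delete list, and transitions attributed to one operator must agree. `infer_effects` and the example states are hypothetical; the predicate names echo those in Figures 19 and 20, but their truth values here are illustrative.

```python
from typing import FrozenSet, Iterable, Tuple

State = FrozenSet[str]


def infer_effects(transitions: Iterable[Tuple[State, State]]) -> Tuple[State, State]:
    """STRIPS-style (add, delete) effects shared by all observed transitions.

    Each transition is a (before, after) pair of abstract states. Raises
    ValueError if the transitions disagree, i.e. one operator cannot explain them.
    """
    effects = {(after - before, before - after) for before, after in transitions}
    if len(effects) != 1:
        raise ValueError(f"expected one consistent effect, found {len(effects)}")
    return effects.pop()


# Hypothetical Open(Jar) executions: the right gripper ends up holding the lid.
before1 = frozenset({"InLeftGripper(Jar)", "RightGripperEmpty()"})
after1 = frozenset({"InLeftGripper(Jar)", "LidOff(Jar)"})
before2 = before1 | {"OnTable(Plate)"}  # an unrelated atom that does not change
after2 = after1 | {"OnTable(Plate)"}
add, delete = infer_effects([(before1, after1), (before2, after2)])
```

Atoms that hold before and after (here `OnTable(Plate)`) drop out of both lists, so context that the skill does not touch does not leak into the operator.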

**Figure 6.** Figure 6: Bimanual Kuka Scenario Results over 5 iterations with invented predicate and learned

**Figure 7.** Figure 7: Example task in Robotouille. (a) Initial state (b) Goal state

**Figure 8.** Figure 8: Example task in Franka. (a) Initial state (b) Goal state

**Figure 9.** Figure 9: Example task in Bimanual Kuka.

**Figure 17.** Figure 17: Predicate Invention Case #1 in Franka. Target predicate: GripperEmpty(). Existing predicates: ∅. (a) ✓ Stack(Bowl, Plate) (b) ✗ Stack(Bowl, Plate). GPT-5: ✓ plate top empty(?plate), ✓ plate is clean(?plate), ✓ plate is clean(?plate). Qwen3: ✗ stacked on(?pickupable, ?plate), ✗ on center of(?pickupable, ?plate), ✗ is fully supported(?pickupable, ?plate).

**Figure 18.** Figure 18: Predicate Invention Case #2 in Franka. Target predicate: PlateIsDirty(?plate). Existing predicates: GripperEmpty(), Holding(?pickupable). (a) ✓ Scoop(Knife, Jar) (b) ✗ Scoop(Knife, Jar). GPT-5: ✓ Open(?openable), ✓ Open(?openable), ✓ Open(?openable). Qwen3: ✗ UtensilInOpenable(?utensil, ?openable), ✗ UtensilInOpening(?utensil, ?openable), ✗ UtensilInOpenable(?utensil, ?openable).

**Figure 19.** Figure 19: Predicate Invention Case #1 in Bi-Kuka. Target predicate: LidOff(?openable). Existing predicates: InLeftGripper(?openable), InRightGripper(?utensil). (a) ✓ Open(Jar) (b) ✗ Open(Jar). GPT-5: ✓ RightHandEmpty(), ✓ RightHandEmpty(), ✗ LidAttached(?openable). Qwen3: ✗ FullyEnclosedByLeftGripper(?openable), ✗ FullyEnclosedByLeftGripper(?openable), ✗ FullyEnclosedByLeftGripper(?openable).

**Figure 20.** Figure 20: Predicate Invention Case #2 in Bi-Kuka. Target predicate: RightGripperEmpty(). Existing predicates: InLeftGripper(?openable), LidOff(?openable).

Original abstract

Generalizing from individual skill executions to solving long-horizon tasks remains a core challenge in building autonomous agents. A promising direction is learning high-level, symbolic abstractions of the low-level skills of the agents, enabling reasoning and planning independent of the low-level state space. Among possible high-level representations, object-centric skill abstraction with symbolic predicates has been proven to be efficient because of its compatibility with domain-independent planners. Recent advances in foundation models have made it possible to generate symbolic predicates that operate on raw sensory inputs, a process we call generative predicate invention, to facilitate downstream abstraction learning. However, it remains unclear which formal properties the learned representations must satisfy, and how they can be learned to guarantee these properties. In this paper, we address both questions by presenting a formal theory of generative predicate invention for skill abstraction, resulting in symbolic operators that can be used for provably sound and complete planning. Within this framework, we propose SkillWrapper, a method that leverages foundation models to actively collect robot data and learn human-interpretable, plannable representations of black-box skills, using only RGB image observations. Our extensive empirical evaluation in simulation and on real robots shows that SkillWrapper learns abstract representations that enable solving unseen, long-horizon tasks in the real world with black-box skills.
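The abstract credits object-centric abstraction for its compatibility with domain-independent planners. Concretely, a lifted operator with typed parameters (like the `?pickupable` and `?plate` variables in the figure captions) is grounded by binding each parameter to every object of the matching type. A minimal sketch, assuming flat types and no repeated objects per binding; `LiftedOperator`, `ground`, the `On` predicate, and the object names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

Atom = Tuple[str, ...]  # (predicate, arg1, arg2, ...)


@dataclass(frozen=True)
class LiftedOperator:
    name: str
    params: Tuple[Tuple[str, str], ...]  # (variable, type), e.g. ("?plate", "plate")
    pre: Tuple[Atom, ...]
    add: Tuple[Atom, ...]
    delete: Tuple[Atom, ...]


@dataclass(frozen=True)
class GroundOperator:
    name: str
    pre: FrozenSet[Atom]
    add: FrozenSet[Atom]
    delete: FrozenSet[Atom]


def _bind(atom: Atom, binding: Dict[str, str]) -> Atom:
    # Replace variables with objects; anything not in the binding passes through.
    return (atom[0],) + tuple(binding.get(arg, arg) for arg in atom[1:])


def ground(op: LiftedOperator, objects: Dict[str, str]) -> List[GroundOperator]:
    """Enumerate every type-consistent binding of op's parameters to objects."""
    pools = [[obj for obj, typ in objects.items() if typ == want] for _, want in op.params]
    grounded = []
    for combo in product(*pools):
        if len(set(combo)) < len(combo):
            continue  # skip bindings that reuse one object for two parameters
        binding = {var: obj for (var, _), obj in zip(op.params, combo)}
        grounded.append(GroundOperator(
            name=f"{op.name}({', '.join(combo)})",
            pre=frozenset(_bind(a, binding) for a in op.pre),
            add=frozenset(_bind(a, binding) for a in op.add),
            delete=frozenset(_bind(a, binding) for a in op.delete),
        ))
    return grounded


stack = LiftedOperator(
    name="Stack",
    params=(("?pickupable", "pickupable"), ("?plate", "plate")),
    pre=(("Holding", "?pickupable"),),
    add=(("On", "?pickupable", "?plate"), ("GripperEmpty",)),
    delete=(("Holding", "?pickupable"),),
)
objects = {"Bowl": "pickupable", "Cup": "pickupable", "Plate": "plate"}
ops = ground(stack, objects)
```

One lifted operator learned from a few executions thus covers objects never seen with that skill, which is one reason object-centric abstractions generalize to unseen tasks.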

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a formal theory for generative predicate invention plus SkillWrapper to turn foundation-model outputs into plannable operators, but the soundness guarantees rest on predicates meeting conditions that the learning step does not enforce.


The main takeaway is that this work supplies a formal theory specifying the properties predicates must have for sound and complete planning over skill abstractions, then builds SkillWrapper to generate those predicates from RGB images via foundation models and active robot data collection. That combination is the concrete step forward. Prior predicate-learning methods often stayed heuristic; here the authors try to state the exact conditions needed for planning guarantees and tie the learning procedure to them. The empirical side shows the approach handling unseen long-horizon tasks in simulation and on real robots with black-box skills, which is the practical payoff they emphasize. The active collection loop is a sensible way to gather the right data without hand-engineering predicates. Those pieces are worth crediting.

The soft spot is the same one the referee report flags. The theory is conditional on predicates that correctly classify states, preserve transition semantics, and cover the relevant space. Foundation models are approximate and stochastic, and nothing in the described method adds verification or correction steps to ensure the outputs meet those conditions. Limited trajectories, whether simulated or collected, cannot certify behavior across the full state space or under real-robot shifts. If any predicate violates the assumptions, the soundness and completeness claims no longer hold. The abstract presents results at a high level without error bars, explicit baselines, or exclusion criteria, so it is difficult to gauge how robust the support actually is.

This paper is aimed at robotics researchers working on hybrid symbolic-learning pipelines for long-horizon tasks. Readers already thinking about predicate abstraction or neuro-symbolic planning will find the framework and the real-robot experiments useful to build on or critique. It shows clear engagement with the planning literature and a reproducible direction, so it deserves a serious referee to examine the formal derivations and the experimental controls in detail. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper introduces a formal theory of generative predicate invention for skill abstraction, which produces symbolic operators suitable for provably sound and complete planning. SkillWrapper is proposed as a practical method that employs foundation models to actively gather robot data from RGB observations and learn interpretable, plannable representations of black-box skills. Extensive experiments in simulation and on physical robots demonstrate the approach's ability to solve previously unseen long-horizon tasks.

Significance. Should the generated predicates reliably satisfy the formal conditions and the learned representations transfer effectively to real-world execution, this contribution would be significant. It bridges data-driven foundation models with symbolic AI planning, offering a pathway to guaranteed performance in complex robotic tasks without requiring full state observability or hand-crafted abstractions.

major comments (2)

[§3] The formal theory claims to yield provably sound and complete planning from predicates that meet specific conditions (e.g., accurate state classification and preservation of transition semantics). However, the generative process in SkillWrapper, which relies on foundation models trained on limited trajectories, provides no enforcement or verification mechanism to ensure these conditions are met, particularly regarding completeness over the full state space or under real-robot distribution shifts.
[§5] The empirical evaluation summarizes results at a high level without error bars, detailed baselines, or explicit exclusion criteria for successful task executions. This limits the ability to verify whether the performance gains support the central claim of enabling reliable planning for unseen tasks with black-box skills.

minor comments (2)

[Abstract] The abstract mentions 'extensive empirical evaluation' but provides no quantitative details; consider adding key metrics or success rates to better convey the strength of the results.
[Notation] Some notation for the invented predicates and operators could be clarified earlier in the paper to aid readers unfamiliar with the formal framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, with revisions indicated where appropriate to improve clarity and rigor.


Referee: [§3] The formal theory claims to yield provably sound and complete planning from predicates that meet specific conditions (e.g., accurate state classification and preservation of transition semantics). However, the generative process in SkillWrapper, which relies on foundation models trained on limited trajectories, provides no enforcement or verification mechanism to ensure these conditions are met, particularly regarding completeness over the full state space or under real-robot distribution shifts.

Authors: We appreciate the referee's emphasis on the distinction between the formal theory and its practical realization. Section 3 presents sufficient conditions on predicates that guarantee sound and complete planning when those conditions hold; the theory itself is agnostic to the method of predicate generation. SkillWrapper is a practical, data-driven procedure that uses foundation models to propose predicates from limited RGB trajectories. We do not claim a formal enforcement or verification procedure, as exhaustive verification of completeness over the full (potentially continuous) state space is intractable and would be further complicated by distribution shifts on real robots. Instead, we rely on empirical validation across simulation and physical experiments showing successful planning on unseen long-horizon tasks. In the revised manuscript we will add a new subsection in §3 that explicitly discusses the gap between the theoretical conditions and the learned predicates, including potential failure modes under distribution shift and the role of empirical evidence in supporting the claims. revision: partial
Referee: [§5] The empirical evaluation summarizes results at a high level without error bars, detailed baselines, or explicit exclusion criteria for successful task executions. This limits the ability to verify whether the performance gains support the central claim of enabling reliable planning for unseen tasks with black-box skills.

Authors: We agree that the current empirical presentation would benefit from greater detail and transparency. In the revised version we will augment all tables and figures with error bars (standard deviation across repeated trials), expand the description of baselines and ablations with explicit implementation details, and add a dedicated paragraph specifying the success criteria and any exclusion rules used for task executions. These additions will make the performance gains more verifiable and directly support the central claim. revision: yes
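The promised error bars reduce to simple arithmetic over repeated trials. A minimal sketch, assuming each repeat is a list of per-task outcomes and that "standard deviation" means the sample standard deviation of per-repeat success rates (the n - 1 form that `statistics.stdev` computes); `success_summary` and the example outcomes are hypothetical and are not results from the paper.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def success_summary(repeats: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-repeat success rates.

    repeats[r] lists the task outcomes (True = goal reached) of repeat r.
    The spread is reported as NaN when there is only one repeat, since a
    sample standard deviation needs at least two values.
    """
    rates: List[float] = [sum(run) / len(run) for run in repeats]
    spread = stdev(rates) if len(rates) > 1 else math.nan
    return mean(rates), spread


# Hypothetical outcomes for three repeats of a four-task suite (not paper data).
mean_rate, spread = success_summary([
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, True],
])
```

With only a handful of repeats (Figure 6 reports 5 iterations), the standard deviation itself is noisy, so reporting the per-repeat rates alongside it keeps the comparison honest.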

Circularity Check

0 steps flagged

No significant circularity; formal theory and method are independent


The paper introduces a formal theory of generative predicate invention that yields symbolic operators for provably sound and complete planning, conditional on predicates satisfying stated properties such as accurate state classification and transition preservation. SkillWrapper then uses foundation models and active data collection from RGB observations to produce those predicates. No equations, self-referential definitions, or reductions appear that make the planning guarantees equivalent to fitted parameters or prior self-citations by construction. The derivation relies on external foundation models and robot data, keeping the central claims self-contained rather than circular. This matches the default expectation for papers without load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Only the abstract is available, so the ledger reflects high-level claims rather than explicit equations or sections; the formal theory is presumed to introduce assumptions about predicate properties that are not detailed here.

axioms (1)

Domain assumption: Generated predicates satisfy the formal properties needed for sound and complete planning.
Invoked as the basis for the provable guarantees stated in the abstract.

invented entities (1)

Generative predicates invented by foundation models (no independent evidence)
purpose: To produce human-interpretable symbolic abstractions of black-box skills from RGB observations
New postulated mechanism that converts sensory data into plannable operators; no independent falsifiable handle is described in the abstract.

pith-pipeline@v0.9.0 · 5561 in / 1208 out tokens · 34918 ms · 2026-05-17T05:39:17.859665+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning
cs.AI 2026-05 unverdicted novelty 6.0

BISON learns bilevel policies over symbolic world models to generalize long-horizon robotic planning beyond VLA and end-to-end baselines while remaining efficient even at 10,000-object scale.

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[2] Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, et al. Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. In Proceedings of the 6th Conference on Robot Learning (CoRL), pp. 287-318, 14-18 Dec 2022.
[3] Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montserrat Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, et al. AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents. In First Workshop on Vision-Language Models for Navigation and Manipulation (VLMNM) at ICRA 2024, 2024.
[4] Ankuj Arora, Humbert Fiorino, Damien Pellier, Marc Métivier, and Sylvie Pesty. A Review of Learning Planning Action Models. The Knowledge Engineering Review, 33: e20, 2018.
[5] Ashay Athalye, Nishanth Kumar, Tom Silver, Yichao Liang, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Predicate Invention from Pixels via Pretrained Vision-Language Models. In AAAI 2025 Workshop on Language Models for Planning (LM4Plan), 2025.
[6] Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, Jie Huang, Fei Huang, Binyuan Hui, Shutong Jiang, Zhaohai Li, Mingsheng Li, Mei Li, Kaixin Li, Zicheng Lin, Junyang Lin, Xuejing Liu, Jiawei Liu, Chenglong Liu, Yang Liu, Dayiheng Liu, Shixuan ... Qwen3-VL Technical Report. arXiv, 2025.
[7] Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, et al. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. In Proceedings of the 7th Conference on Robot Learning, pp. 2165-2183, 06-09 Nov 2023.
[8] Boyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Dorsa Sadigh, Leonidas Guibas, and Fei Xia. SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14455-14465, 2024.
[9] William Chen, Oier Mees, Aviral Kumar, and Sergey Levine. Vision-Language Models Provide Promptable Representations for Reinforcement Learning. Transactions on Machine Learning Research (TMLR), 2025. ISSN 2835-8856.
[10] Sijie Cheng, Zhicheng Guo, Jingwen Wu, Kechen Fang, Peng Li, Huaping Liu, and Yang Liu. EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14291-14302, 2024.
[11] Cheng Chi, Zhenjia Xu, Chuer Pan, Eric Cousineau, Benjamin Burchfiel, Siyuan Feng, Russ Tedrake, and Shuran Song. Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots. In Proceedings of Robotics: Science and Systems (RSS) XX, 2024.
[12] Neil T Dantam, Zachary K Kingston, Swarat Chaudhuri, and Lydia E Kavraki. An incremental constraint-based framework for task and motion planning. The International Journal of Robotics Research, 37(10): 1134-1151, 2018.
[13] S. Doncieux, D. Filliat, N. Díaz-Rodríguez, T. Hospedales, R. Duro, A. Coninx, D.M. Roijers, B. Girard, N. Perrin, and O. Sigaud. Open-ended learning: a conceptual framework based on representational redescription. Frontiers in Neurorobotics, 12: 59, 2018.
[14] Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, and Pete Florence. PaLM-E: An Embodied ...
[15] Kuan Fang, Yuke Zhu, Silvio Savarese, and Li Fei-Fei. Adaptive Procedural Task Generation for Hard-Exploration Problems. In Proceedings of the 9th International Conference on Learning Representations (ICLR), 2021.
[16] Kuan Fang, Toki Migimatsu, Ajay Mandlekar, Li Fei-Fei, and Jeannette Bohg. Active Task Randomization: Learning Robust Skills via Unsupervised Generation of Diverse and Feasible Tasks. Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1-8, 2022.
[17] Kuan Fang, Fangchen Liu, Pieter Abbeel, and Sergey Levine. MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting. Proceedings of Robotics: Science and Systems (RSS) XX, 2024.
[18] Caelan Reed Garrett, Rohan Chitnis, Rachel Holladay, Beomjoon Kim, Tom Silver, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Integrated Task and Motion Planning. Annual Review of Control, Robotics, and Autonomous Systems, 4: 265-293, 2021.
[19] Gonzalo Gonzalez-Pumariega, Leong Su Yean, Neha Sunkara, and Sanjiban Choudhury. Robotouille: An Asynchronous Planning Benchmark for LLM Agents. In Proceedings of the 13th International Conference on Learning Representations (ICLR), 2025.
[20] Jiayuan Gu, Devendra Singh Chaplot, Hao Su, and Jitendra Malik. Multi-skill Mobile Manipulation for Object Rearrangement. In Proceedings of the 11th International Conference on Learning Representations (ICLR), 2023.
[21] Huy Ha, Pete Florence, and Shuran Song. Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition. In Proceedings of the 7th Conference on Robot Learning (CoRL), pp. 3766-3777, 2023.
[22] Muzhi Han, Yifeng Zhu, Song-Chun Zhu, Ying Nian Wu, and Yuke Zhu. InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning. In Proceedings of Robotics: Science and Systems (RSS) XX, 2024.
[23] Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, and Chuang Gan. 3D-LLM: Injecting the 3D World into Large Language Models. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, pp. 20482-20494, 2023.
[24] Yingdong Hu, Fanqi Lin, Tong Zhang, Li Yi, and Yang Gao. Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning. arXiv preprint arXiv:2311.17842, 2023.
[25] Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. In Proceedings of the 39th International Conference on Machine Learning (ICML), pp. 9118-9147, 2022.
[26] Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Tomas Jackson, Noah Brown, Linda Luu, Sergey Levine, Karol Hausman, and Brian Ichter. Inner Monologue: Embodied Reasoning through Planning with Language Models. In Proceedings of the 6th Conference on Ro...
[27] Hanxiao Jiang, Binghao Huang, Ruihai Wu, Zhuoran Li, Shubham Garg, Hooshang Nayyeri, Shenlong Wang, and Yunzhu Li. RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation. In Proceedings of the 8th Conference on Robot Learning, pp. 3027-3052, 2025.
[28] Minqi Jiang, Edward Grefenstette, and Tim Rocktäschel. Prioritized Level Replay. In Proceedings of the 38th International Conference on Machine Learning (ICML), pp. 4940-4950. PMLR, 2021.
[29] Justin Johnson, Bharath Hariharan, Laurens Van Der Maaten, Li Fei-Fei, C Lawrence Zitnick, and Ross Girshick. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2901-2910, 2017.
[30] Brendan Juba, Hai S. Le, and Roni Stern. Safe Learning of Lifted Action Models. In Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning (KR), pp. 379-389, Nov 2021.
[31] Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Paul Saldyt, and Anil B Murthy. Position: LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024.
[32] Michael Katz and Junkyu Lee. K* and partial order reduction for top-quality planning. In Proceedings of the 16th Annual Symposium on Combinatorial Search (SoCS 2023). AAAI Press, 2023.
[33] George Konidaris. On the Necessity of Abstraction. Current Opinion in Behavioral Sciences, 29: 1-7, 2019. ISSN 2352-1546.
[34] George Konidaris and Andrew Barto. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining. In Advances in Neural Information Processing Systems (NIPS), volume 22, 2009.
[35] George Konidaris, Leslie Pack Kaelbling, and Tomas Lozano-Pérez. From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning. Journal of Artificial Intelligence Research, 61: 215-289, 2018.
[36]

Planning for Learning Object Properties

Leonardo Lamanna, Luciano Serafini, Mohamadreza Faridghasemnia, Alessandro Saffiotti, Alessandro Saetti, Alfonso Gerevini, and Paolo Traverso. Planning for Learning Object Properties . Proceedings of the AAAI Conference on Artificial Intelligence, 37 0 (10): 0 12005--12013, Jun. 2023

[37] Amber Li and Tom Silver. Embodied Active Learning of Relational State Abstractions for Bilevel Planning. In Proceedings of The 2nd Conference on Lifelong Learning Agents (CoLLAs), pp. 358–375, 2023.

[38] Zhaoyi Li, Kelin Yu, Shuo Cheng, and Danfei Xu. LEAGUE++: Empowering Continual Robot Learning via Guided Skill Acquisition with Large Language Models. In ICLR 2024 Workshop on Large Language Model (LLM) Agents, 2024.

[39] Yichao Liang, Nishanth Kumar, Hao Tang, Adrian Weller, Joshua B. Tenenbaum, Tom Silver, Joao F. Henriques, and Kevin Ellis. VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning. In Proceedings of the 13th International Conference on Learning Representations (ICLR), 2025.

[40] Arjun Majumdar, Anurag Ajay, Xiaohan Zhang, Pranav Putta, Sriram Yenamandra, Mikael Henaff, Sneha Silwal, Paul Mcvay, Oleksandr Maksymets, Sergio Arnaud, Karmesh Yadav, Qiyang Li, Ben Newman, Mohit Sharma, Vincent Berges, Shiqi Zhang, Pulkit Agrawal, Yonatan Bisk, Dhruv Batra, Mrinal Kalakrishnan, Franziska Meier, Chris Paxton, Alexander Sax, and Aravind ... OpenEQA: Embodied Question Answering in the Era of Foundation Models ... 2024.

[41] Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B Tenenbaum, and Jiajun Wu. The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision. In Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019.

[42] D. McDermott, M. Ghallab, A. Howe, C. Knoblock, A. Ram, M. Veloso, D. Weld, and D. Wilkins. PDDL – The Planning Domain Definition Language. Technical report, CVC TR-98-003/DCS TR-1165, Yale Center for Computational Vision and Control, 1998.

[43] Toki Migimatsu and Jeannette Bohg. Grounding Predicates through Actions. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), pp. 3498–3504, 2022.

[44] Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, and Ping Luo. EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, 2024.

[45] Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, and Brian Ichter. PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs ... 2024.

[46] OpenAI. Introducing GPT-5, 2025. URL https://openai.com/index/introducing-gpt-5/. Accessed:

[47] Shreyas Sundara Raman, Vanya Cohen, Ifrah Idrees, Eric Rosen, Ray Mooney, Stefanie Tellex, and David Paulius. CAPE: Corrective Actions from Precondition Errors using Large Language Models. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 14070–14077, 2024.

[48] Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian Reid, and Niko Suenderhauf. SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning. In Proceedings of the 7th Conference on Robot Learning (CoRL), volume 229, pp. 23–72, 06–09 Nov 2023.

[49] Allen Z. Ren, Jaden Clark, Anushri Dixit, Masha Itkina, Anirudha Majumdar, and Dorsa Sadigh. Explore until Confident: Efficient Exploration for Embodied Question Answering. In Proceedings of Robotics: Science and Systems (RSS) XX, 2024.

[50] Pierre Sermanet, Tianli Ding, Jeffrey Zhao, Fei Xia, Debidatta Dwibedi, Keerthana Gopalakrishnan, Christine Chan, Gabriel Dulac-Arnold, Sharath Maddineni, Nikhil J Joshi, et al. RoboVQA: Multimodal Long-Horizon Reasoning for Robotics. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 645–652. IEEE, 2024.

[51] Naman Shah, Deepak Kala Vasudevan, Kislay Kumar, Pranav Kamojjhala, and Siddharth Srivastava. Anytime Integrated Task and Motion Policies for Stochastic Environments. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 9285–9291. IEEE, 2020.

[52] Naman Shah, Jayesh Nagpal, Pulkit Verma, and Siddharth Srivastava. From Reals to Logic and Back: Inventing Symbolic Vocabularies, Actions and Models for Planning from Raw Data. arXiv preprint arXiv:2402.11871, 2024.

[53] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(3): 379–423, 1948.

[54] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation. In Proceedings of the 6th Conference on Robot Learning, volume 205, pp. 785–799, 14–18 Dec 2023.

[55] Tom Silver, Rohan Chitnis, Nishanth Kumar, Willie McClinton, Tomás Lozano-Pérez, Leslie Kaelbling, and Joshua B. Tenenbaum. Predicate Invention for Bilevel Planning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(10): 12120–12129, Jun. 2023.

[56] Theodore Sumers, Kenneth Marino, Arun Ahuja, Rob Fergus, and Ishita Dasgupta. Distilling Internet-Scale Vision-Language Models into Embodied Agents. In Proceedings of the Fortieth International Conference on Machine Learning (ICML), pp. 32797–32818, 2023.

[57] Dídac Surís, Sachit Menon, and Carl Vondrick. ViperGPT: Visual Inference via Python Execution for Reasoning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11888–11898, October 2023.

[58] Richard S. Sutton, Doina Precup, and Satinder Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1): 181–211, 1999.

[59] Andrew Szot, Alexander Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Singh Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimír Vondruš, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, and Dhruv Batra. Habitat 2.0: Training Home Assistants to Rearrange their Habitat ... 2021.

[60] Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, and Subbarao Kambhampati. On the Planning Abilities of Large Language Models - A Critical Investigation. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, pp. 75993–76005, 2023.

[61] Pulkit Verma, Shashank Rao Marpally, and Siddharth Srivastava. Discovering User-Interpretable Capabilities of Black-Box Planning Agents. In Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning (KR), volume 19, pp. 362–372, 2022.

[62] Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An Open-Ended Embodied Agent with Large Language Models. Transactions on Machine Learning Research (TMLR), 2024a. ISSN 2835-8856.

[63] Rui Wang, Joel Lehman, Jeff Clune, and Kenneth O Stanley. Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions. arXiv preprint arXiv:1901.01753, 2019.

[64] Yufei Wang, Zhanyi Sun, Jesse Zhang, Zhou Xian, Erdem Biyik, David Held, and Zackory Erickson. RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback. In Proceedings of the 41st International Conference on Machine Learning, volume 235, pp. 51484–51501, 21–27 Jul 2024b.

[65] Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, and Chuang Gan. RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024c.

[66] Bowen Wen, Wei Yang, Jan Kautz, and Stan Birchfield. FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17868–17879, 2024.

[67] Kai Xi, Stephen Gould, and Sylvie Thiébaux. Neuro-Symbolic Learning of Lifted Action Models from Visual Traces. Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), 34(1): 653–662, May 2024.

[68] Jingkang Yang, Yuhao Dong, Shuai Liu, Bo Li, Ziyue Wang, Haoran Tan, Chencheng Jiang, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, et al. Octopus: Embodied Vision-Language Programmer from Environmental Feedback. In Proceedings of the 2024 European Conference on Computer Vision (ECCV), pp. 20–38, 2024.

[69] Naoki Yokoyama, Alex Clegg, Joanne Truong, Eric Undersander, Tsung-Yen Yang, Sergio Arnaud, Sehoon Ha, Dhruv Batra, and Akshara Rai. ASC: Adaptive Skill Coordination for Robotic Mobile Manipulation. IEEE Robotics and Automation Letters, 9(1): 779–786, 2024.



[29] [29]

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

Justin Johnson, Bharath Hariharan, Laurens Van Der Maaten, Li Fei-Fei, C Lawrence Zitnick, and Ross Girshick. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.\ 2901--2910, 2017

work page 2017

[30] [30]

Le, and Roni Stern

Brendan Juba, Hai S. Le, and Roni Stern. Safe Learning of Lifted Action Models . In Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning (KR) , pp.\ 379--389, 11 2021

work page 2021

[31] [31]

Position: LLM s Can t Plan, But Can Help Planning in LLM -Modulo Frameworks

Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Paul Saldyt, and Anil B Murthy. Position: LLM s Can t Plan, But Can Help Planning in LLM -Modulo Frameworks . In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

work page 2024

[32] [32]

K* and partial order reduction for top-quality planning

Michael Katz and Junkyu Lee. K* and partial order reduction for top-quality planning. In Proceedings of the 16th Annual Symposium on Combinatorial Search (SoCS 2023). AAAI Press, 2023

work page 2023

[33] [33]

On the Necessity of Abstraction

George Konidaris. On the Necessity of Abstraction . Current Opinion in Behavioral Sciences, 29: 0 1--7, 2019. ISSN 2352-1546

work page 2019

[34] [34]

Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining

George Konidaris and Andrew Barto. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining . In Advances in Neural Information Processing Systems (NIPS), volume 22, 2009

work page 2009

[35] [35]

From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning

George Konidaris, Leslie Pack Kaelbling, and Tomas Lozano-Pérez. From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning . Journal of Artificial Intelligence Research, 61: 0 215--289, 2018

work page 2018

[36] [36]

Planning for Learning Object Properties

Leonardo Lamanna, Luciano Serafini, Mohamadreza Faridghasemnia, Alessandro Saffiotti, Alessandro Saetti, Alfonso Gerevini, and Paolo Traverso. Planning for Learning Object Properties . Proceedings of the AAAI Conference on Artificial Intelligence, 37 0 (10): 0 12005--12013, Jun. 2023

work page 2023

[37] [37]

Embodied Active Learning of Relational State Abstractions for Bilevel Planning

Amber Li and Tom Silver. Embodied Active Learning of Relational State Abstractions for Bilevel Planning . In Proceedings of The 2nd Conference on Lifelong Learning Agents (CoLLAs), pp.\ 358--375, 2023

work page 2023

[38] [38]

LEAGUE++: Empowering Continual Robot Learning via Guided Skill Acquisition with Large Language Models

Zhaoyi Li, Kelin Yu, Shuo Cheng, and Danfei Xu. LEAGUE++: Empowering Continual Robot Learning via Guided Skill Acquisition with Large Language Models . In ICLR 2024 Workshop on Large Language Model (LLM) Agents, 2024

work page 2024

[39] [39]

Tenenbaum, Tom Silver, Joao F

Yichao Liang, Nishanth Kumar, Hao Tang, Adrian Weller, Joshua B. Tenenbaum, Tom Silver, Joao F. Henriques, and Kevin Ellis. VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning . In Proceedings of the 13th International Conference on Learning Representations (ICLR), 2025

work page 2025

[40] [40]

OpenEQA: Embodied Question Answering in the Era of Foundation Models

Arjun Majumdar, Anurag Ajay, Xiaohan Zhang, Pranav Putta, Sriram Yenamandra, Mikael Henaff, Sneha Silwal, Paul Mcvay, Oleksandr Maksymets, Sergio Arnaud, Karmesh Yadav, Qiyang Li, Ben Newman, Mohit Sharma, Vincent Berges, Shiqi Zhang, Pulkit Agrawal, Yonatan Bisk, Dhruv Batra, Mrinal Kalakrishnan, Franziska Meier, Chris Paxton, Alexander Sax, and Aravind ...

work page 2024

[41] [41]

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision

Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B Tenenbaum, and Jiajun Wu. The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision . In Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019

work page 2019

[42] [42]

McDermott, M

D. McDermott, M. Ghallab, A. Howe, C. Knoblock, A. Ram, M. Veloso, D. Weld, and D. Wilkins. PDDL -- The Planning Domain Definition Language . Technical report, CVC TR-98-003/DCS TR-1165, Yale Center for Computational Vision and Control, 1998

work page 1998

[43] [43]

Grounding Predicates through Actions

Toki Migimatsu and Jeannette Bohg. Grounding Predicates through Actions . In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), pp.\ 3498--3504, 2022

work page 2022

[44] [44]

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, and Ping Luo. EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought . In Advances in Neural Information Processing Systems (NeurIPS), volume 36, 2024

work page 2024

[45] [45]

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, and Brian Ichter. PIVOT: Iterative Visual Prompting Elicits Ac...

work page 2024

[46] [46]

Introducing GPT-5 , 2025

OpenAI. Introducing GPT-5 , 2025. URL https://openai.com/index/introducing-gpt-5/. Accessed:

work page 2025

[47] [47]

CAPE: Corrective Actions from Precondition Errors using Large Language Models

Shreyas Sundara Raman, Vanya Cohen, Ifrah Idrees, Eric Rosen, Ray Mooney, Stefanie Tellex, and David Paulius. CAPE: Corrective Actions from Precondition Errors using Large Language Models . In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), pp.\ 14070--14077, 2024

work page 2024

[48] [48]

SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning

Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian Reid, and Niko Suenderhauf. SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning . In Proceedings of the 7th Conference on Robot Learning (CoRL), volume 229, pp.\ 23--72, 06--09 Nov 2023

work page 2023

[49] [49]

Ren, Jaden Clark, Anushri Dixit, Masha Itkina, Anirudha Majumdar, and Dorsa Sadigh

Allen Z. Ren, Jaden Clark, Anushri Dixit, Masha Itkina, Anirudha Majumdar, and Dorsa Sadigh. Explore until Confident: Efficient Exploration for Embodied Question Answering . In Proceedings of Robotics: Science and Systems (RSS) XX, 2024

work page 2024

[50] [50]

RoboVQA: Multimodal Long-Horizon Reasoning for Robotics

Pierre Sermanet, Tianli Ding, Jeffrey Zhao, Fei Xia, Debidatta Dwibedi, Keerthana Gopalakrishnan, Christine Chan, Gabriel Dulac-Arnold, Sharath Maddineni, Nikhil J Joshi, et al. RoboVQA: Multimodal Long-Horizon Reasoning for Robotics . In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), pp.\ 645--652. IEEE, 2024

work page 2024

[51] [51]

Anytime Integrated Task and Motion Policies for Stochastic Environments

Naman Shah, Deepak Kala Vasudevan, Kislay Kumar, Pranav Kamojjhala, and Siddharth Srivastava. Anytime Integrated Task and Motion Policies for Stochastic Environments . In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), pp.\ 9285--9291. IEEE, 2020

work page 2020

[52] [52]

From Real World to Logic and Back: Learning Generalizable Relational Concepts For Long Horizon Robot Planning,

Naman Shah, Jayesh Nagpal, Pulkit Verma, and Siddharth Srivastava. From Reals to Logic and Back: Inventing Symbolic Vocabularies, Actions and Models for Planning from Raw Data . arXiv preprint arXiv:2402.11871, 2024

work page arXiv 2024

[53] [53]

C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27 0 (3): 0 379--423, 1948

work page 1948

[54] [54]

Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation . In Proceedings of the 6th Conference on Robot Learning, volume 205, pp.\ 785--799, 14--18 Dec 2023

work page 2023

[55] Tom Silver, Rohan Chitnis, Nishanth Kumar, Willie McClinton, Tomás Lozano-Pérez, Leslie Kaelbling, and Joshua B. Tenenbaum. Predicate Invention for Bilevel Planning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(10):12120–12129, Jun. 2023.

[56] Theodore Sumers, Kenneth Marino, Arun Ahuja, Rob Fergus, and Ishita Dasgupta. Distilling Internet-Scale Vision-Language Models into Embodied Agents. In Proceedings of the Fortieth International Conference on Machine Learning (ICML), pp. 32797–32818, 2023.

[57] Dídac Surís, Sachit Menon, and Carl Vondrick. ViperGPT: Visual Inference via Python Execution for Reasoning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11888–11898, October 2023.

[58] Richard S. Sutton, Doina Precup, and Satinder Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1):181–211, 1999.

[59] Andrew Szot, Alexander Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Singh Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimír Vondruš, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, and Dhruv Batra. Habitat 2.0: Traini...

[60] Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, and Subbarao Kambhampati. On the Planning Abilities of Large Language Models - A Critical Investigation. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, pp. 75993–76005, 2023.

[61] Pulkit Verma, Shashank Rao Marpally, and Siddharth Srivastava. Discovering User-Interpretable Capabilities of Black-Box Planning Agents. In Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning (KR), volume 19, pp. 362–372, 2022.

[62] Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An Open-Ended Embodied Agent with Large Language Models. Transactions on Machine Learning Research (TMLR), 2024a. ISSN 2835-8856.

[63] Rui Wang, Joel Lehman, Jeff Clune, and Kenneth O Stanley. Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions. arXiv preprint arXiv:1901.01753, 2019.

[64] Yufei Wang, Zhanyi Sun, Jesse Zhang, Zhou Xian, Erdem Biyik, David Held, and Zackory Erickson. RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback. In Proceedings of the 41st International Conference on Machine Learning, volume 235, pp. 51484–51501, 21–27 Jul 2024b.

[65] Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, and Chuang Gan. RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024c.

[66] Bowen Wen, Wei Yang, Jan Kautz, and Stan Birchfield. FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17868–17879, 2024.

[67] Kai Xi, Stephen Gould, and Sylvie Thiébaux. Neuro-Symbolic Learning of Lifted Action Models from Visual Traces. Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), 34(1):653–662, May 2024.

[68] Jingkang Yang, Yuhao Dong, Shuai Liu, Bo Li, Ziyue Wang, Haoran Tan, Chencheng Jiang, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, et al. Octopus: Embodied Vision-Language Programmer from Environmental Feedback. In Proceedings of the 2024 European Conference on Computer Vision (ECCV), pp. 20–38, 2024.

[69] Naoki Yokoyama, Alex Clegg, Joanne Truong, Eric Undersander, Tsung-Yen Yang, Sergio Arnaud, Sehoon Ha, Dhruv Batra, and Akshara Rai. ASC: Adaptive Skill Coordination for Robotic Mobile Manipulation. IEEE Robotics and Automation Letters, 9(1):779–786, 2024.
