UniDomain: Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning

Cewu Lu; Haoming Ye; Panpan Cai; Yunxiao Xiao

arXiv: 2507.21545 · v3 · submitted 2025-07-29 · cs.RO

Haoming Ye, Yunxiao Xiao, Cewu Lu, Panpan Cai

Pith reviewed 2026-05-19 02:55 UTC · model grok-4.3

classification 💻 cs.RO

keywords: robot task planning · PDDL domain learning · pretraining from demonstrations · zero-shot generalization · symbolic planning · manipulation videos · compositional generalization

The pith

A single PDDL domain pretrained from robot videos supports zero-shot planning on unseen manipulation tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that atomic planning domains can be extracted from a large collection of real robot videos and then fused on demand into meta-domains that let symbolic planners solve new task combinations without hand-crafted rules or task-specific retraining. A sympathetic reader would care because language-model planners often lose track of long-horizon constraints while purely symbolic planners require expert domain engineering that does not scale. If the method works, a robot could absorb broad action knowledge from demonstrations once and then compose reliable plans for novel goals that share substructures with the original data. The central object is the unified domain that stores operators, predicates, and causal edges so they can be retrieved and combined without introducing planning errors.

Core claim

Processing 12,393 manipulation videos yields a unified PDDL domain containing 3,137 operators, 2,875 predicates, and 16,481 causal edges. For any target task class, the system retrieves the relevant atomic domains and fuses them into a meta-domain that supplies the symbolic structure needed for compositional planning. This pipeline produces plans for complex unseen tasks in a zero-shot setting, with reported gains of up to 58% in task success and 160% in plan optimality over direct LLM and LLM-PDDL baselines.
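
The retrieve-then-fuse step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the `AtomicDomain` class, the keyword-overlap retrieval criterion, the union-based fusion, and all predicate strings are hypothetical and chosen only to make the data flow concrete.

```python
from dataclasses import dataclass, field


@dataclass
class AtomicDomain:
    """Hypothetical container for the domain extracted from one video."""
    name: str
    keywords: set[str]
    # Maps operator name -> {"pre": set of predicates, "eff": set of predicates}.
    operators: dict[str, dict] = field(default_factory=dict)


def retrieve(library, task_keywords, top_k=2):
    """Rank atomic domains by keyword overlap with the task (assumed criterion)."""
    scored = [(len(d.keywords & task_keywords), d) for d in library]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [d for _, d in scored[:top_k]]


def fuse(domains):
    """Union operators across domains; same-named operators merge their sets."""
    meta = {}
    for d in domains:
        for op_name, spec in d.operators.items():
            merged = meta.setdefault(op_name, {"pre": set(), "eff": set()})
            merged["pre"] |= spec["pre"]
            merged["eff"] |= spec["eff"]
    return meta


library = [
    AtomicDomain("drawer_video", {"drawer", "towel"}, {
        "open_drawer": {"pre": {"closed(drawer)"}, "eff": {"open(drawer)"}},
        "pick_from_drawer": {"pre": {"open(drawer)"}, "eff": {"holding(towel)"}},
    }),
    AtomicDomain("pot_video", {"pot", "corn", "bowl"}, {
        "pick_from_pot": {"pre": {"in(corn, pot)"}, "eff": {"holding(corn)"}},
    }),
    AtomicDomain("stack_video", {"block", "stack"}, {
        "stack_block": {"pre": {"holding(block)"}, "eff": {"on(block, block)"}},
    }),
]

selected = retrieve(library, {"corn", "towel", "drawer"})
meta_domain = fuse(selected)
print([d.name for d in selected])
print(sorted(meta_domain))
```

Naive union is the simplest possible fusion rule; it does nothing to reconcile operators that disagree, which is the gap the referee report raises.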

What carries the argument

The unified PDDL domain built by extracting atomic domains from videos and then retrieving and fusing them into task-specific meta-domains.
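
To make "operators, predicates, and causal edges" concrete, the sketch below stores operators as precondition and effect sets and derives edges between them. It assumes a causal edge links operator A to operator B whenever an effect of A appears among B's preconditions; that definition, the `Operator` class, and the predicate strings are assumptions for illustration, and the paper may define edges differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset
    effects: frozenset


def causal_edges(operators):
    """Return (producer, consumer, predicate) triples under the assumed definition."""
    edges = []
    for a in operators:
        for b in operators:
            if a.name == b.name:
                continue
            # An edge exists for each predicate that a produces and b requires.
            for pred in sorted(a.effects & b.preconditions):
                edges.append((a.name, b.name, pred))
    return edges


ops = [
    Operator("open_drawer",
             frozenset({"closed(drawer)"}),
             frozenset({"open(drawer)"})),
    Operator("pick_from_drawer",
             frozenset({"open(drawer)"}),
             frozenset({"holding(towel)"})),
    Operator("place_in_drawer",
             frozenset({"open(drawer)", "holding(towel)"}),
             frozenset({"in(towel, drawer)"})),
]

edges = causal_edges(ops)
for edge in edges:
    print(edge)
```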

If this is right

Robots gain the ability to plan long sequences by composing pre-learned causal edges rather than rediscovering them for each new goal.
Plan optimality improves because the meta-domain inherits real-world ordering and precondition constraints captured from demonstrations.
Zero-shot generalization extends to any task class whose sub-actions appear in the video corpus, without requiring new domain authoring.
Symbolic planning becomes more tightly grounded in perception because predicates and operators are derived directly from observed manipulation sequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the video corpus grows, the same extraction and fusion steps could cover an expanding range of manipulation skills with no additional expert input.
The retrieval-plus-fusion pattern might transfer to other formalisms such as temporal logic or STRIPS variants once similar atomic extraction is defined.
Running the fusion step incrementally as new videos arrive could keep the unified domain current without full re-pretraining.

Load-bearing premise

Atomic domains pulled from the videos accurately encode the implicit constraints and causal relations required for safe, error-free fusion into meta-domains that generalize to unseen tasks.

What would settle it

Deploy the fused meta-domains on a held-out set of tasks and check whether any generated plan produces physical violations or unsafe actions that contradict the constraints observed in the original video set.
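
A symbolic pre-check of this kind can run before any robot execution: step through each generated plan and confirm that every precondition holds in the simulated state. The sketch below assumes STRIPS-style add and delete effects; the operator definitions are hypothetical. It can only catch violations of constraints that are encoded in the domain, so it complements physical trials rather than replacing them.

```python
def validate_plan(initial_state, plan, operators):
    """Simulate a plan under STRIPS-style semantics and report the first violation.

    operators maps a name to {"pre": set, "add": set, "del": set}.
    Returns (True, final_state) on success or (False, message) on failure.
    """
    state = set(initial_state)
    for step, name in enumerate(plan, start=1):
        op = operators[name]
        missing = op["pre"] - state
        if missing:
            return False, f"step {step} ({name}): unmet preconditions {sorted(missing)}"
        # Apply delete effects first, then add effects.
        state = (state - op["del"]) | op["add"]
    return True, state


operators = {
    "open_drawer": {
        "pre": {"closed(drawer)"},
        "add": {"open(drawer)"},
        "del": {"closed(drawer)"},
    },
    "pick_from_drawer": {
        "pre": {"open(drawer)", "in(towel, drawer)"},
        "add": {"holding(towel)"},
        "del": {"in(towel, drawer)"},
    },
}
initial = {"closed(drawer)", "in(towel, drawer)"}

ok, final_state = validate_plan(initial, ["open_drawer", "pick_from_drawer"], operators)
bad, message = validate_plan(initial, ["pick_from_drawer"], operators)
print(ok, sorted(final_state))
print(bad, message)
```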

Figures

Figures reproduced from arXiv: 2507.21545 by Cewu Lu, Haoming Ye, Panpan Cai, Yunxiao Xiao.

**Figure 2.** Overview of UniDomain. See detailed descriptions in Section 3.

**Figure 3.** Comparison results of UniDomain and state-of-the-art methods on unseen evaluation tasks: ...

**Figure 4.** Results for ablation studies on domain generation: (a) ablation on the atomic domain ...

**Figure 5.** Results for ablation study of the UniDo...

**Figure 6.** Task used in the overview (Figure 2): "Move the corn from the pot into the orange bowl, wipe the table with the towel in the drawer and put it back to the closed drawer."

**Figure 7.** Execution sequence of the task shown in Figure 2, planned by UniDomain.

**Figure 8.** Arrange all blocks into two separate stacks on the table. The first stack should have blocks 1, 3, 5, and 7 in order from top to bottom. The second stack should have blocks 2, 4, 6, and 8 in order from top to bottom.

**Figure 10.** There is a block in the green drawer. Please put it on the table, push it and put it in the yellow drawer.

**Figure 12.** Put the jujube in the green bowl. And put the white plate on the rack.

**Figure 15.** There are a spoon, a tissue, an orange block in the green drawer. Stir the bowl and put the spoon in the cup, put the orange block into the yellow drawer, wipe the bowl and scrunch the tissue on the table.

Original abstract

Robotic task planning in real-world environments requires reasoning over implicit constraints from language and vision. While LLMs and VLMs offer strong priors, they struggle with long-horizon structure and symbolic grounding. Existing methods that combine LLMs with symbolic planning often rely on handcrafted or narrow domains, limiting generalization. We propose UniDomain, a framework that pre-trains a PDDL domain from robot manipulation demonstrations and applies it for online robotic task planning. It extracts atomic domains from 12,393 manipulation videos to form a unified domain with 3137 operators, 2875 predicates, and 16481 causal edges. Given a target class of tasks, it retrieves relevant atomics from the unified domain and systematically fuses them into high-quality meta-domains to support compositional generalization in planning. Experiments on diverse real-world tasks show that UniDomain solves complex, unseen tasks in a zero-shot manner, achieving up to 58% higher task success and 160% improvement in plan optimality over state-of-the-art LLM and LLM-PDDL baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

UniDomain scales up PDDL domain learning from thousands of videos for robot planning, but the extraction and fusion steps lack reported validation against ground truth.

The punchline is that this paper extracts atomic PDDL domains from 12k real manipulation videos, builds a unified domain with thousands of operators and predicates, then retrieves and fuses them for zero-shot planning on new tasks. That pipeline is new compared with the handcrafted domains and direct LLM-planning hybrids in the cited work. The scale and the reported gains (up to 58% higher success and 160% better optimality on held-out real tasks) stand out as concrete progress toward less hand-engineering in symbolic robot planning. The experiments on diverse tasks give some evidence that the approach handles compositional generalization better than the baselines it is compared against.

The soft spot is exactly what the stress-test note flags: no quantitative check is shown on whether the extracted predicates, operators, or causal edges match the true constraints in the videos. If mis-extractions happen at scale, they would carry straight into the fused meta-domains and could either explain or undermine the performance numbers. The abstract gives no details on validation procedures or inconsistency checks during fusion, so the central claim rests on an unverified link.

This is the kind of paper that would interest people working on hybrid LLM-symbolic planning for manipulation. A reader who wants data-driven ways to enlarge PDDL domains would get value from the scale and the fusion idea, provided the methods section supplies the missing validation. I would send it to peer review so referees can examine the extraction fidelity and run their own checks on the reported operators and edges.

Referee Report

2 major / 2 minor

Summary. The manuscript presents UniDomain, a framework that extracts atomic PDDL domains from 12,393 real-world robot manipulation videos to construct a unified domain containing 3137 operators, 2875 predicates, and 16481 causal edges. For a given target task class, relevant atomic domains are retrieved and fused into meta-domains that support compositional, zero-shot planning. Experiments on diverse real-world tasks report up to 58% higher task success and 160% better plan optimality than LLM and LLM-PDDL baselines.

Significance. If the extracted atomic domains faithfully encode implicit constraints and causal structure, the approach could meaningfully advance scalable, generalizable symbolic planning for robotics by moving beyond handcrafted or narrow domains. The scale of the demonstration corpus and the reported gains over strong baselines indicate practical potential for reducing reliance on manual domain engineering, provided the fusion step preserves soundness.

major comments (2)

[Abstract / Domain Extraction] Abstract and domain-construction description: the central claim that the unified domain enables reliable zero-shot generalization rests on the fidelity of predicate/operator extraction and causal-edge inference from video demonstrations. No quantitative validation (e.g., precision/recall against ground-truth annotations or execution-trace consistency checks) is reported for the 2875 predicates or 16481 causal edges; systematic mis-extraction would directly undermine the 58% success and 160% optimality gains.
[Meta-Domain Fusion] Fusion and retrieval step: the manuscript states that atomic domains are systematically fused into meta-domains, yet provides no analysis of how inconsistencies (e.g., conflicting preconditions or effect ordering) are detected or resolved. Without such checks, it is unclear whether the reported performance improvements can be attributed to the learned symbolic structure rather than incidental factors in the planning pipeline.
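
The precision/recall check requested in the first comment could look like the sketch below for a single annotated video. It assumes extracted and ground-truth predicates can be compared by exact string match, which is a simplification (real annotations would likely need name normalization or human judgment); the predicate sets are invented for illustration and are not data from the paper.

```python
def precision_recall(extracted, ground_truth):
    """Compare extracted predicates with annotations by exact match (assumed)."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    true_pos = len(extracted & ground_truth)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical predicates for one video; not taken from the paper.
extracted = {"open(drawer)", "holding(towel)", "on(bowl, table)", "in(corn, pot)"}
annotated = {"open(drawer)", "holding(towel)", "in(corn, pot)",
             "closed(drawer)", "in(towel, drawer)"}

precision, recall = precision_recall(extracted, annotated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the four extracted predicates are correct and three of the five annotated predicates are recovered, so precision is 0.75 and recall is 0.60.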

minor comments (2)

[Retrieval Procedure] Clarify the exact criteria used to decide which atomic domains are retrieved for a given task class; the current description leaves the retrieval mechanism somewhat underspecified.
[Unified Domain Statistics] Add a table or figure summarizing the distribution of operator arity, predicate types, and causal-edge density across the unified domain to help readers assess its complexity.
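
The summary requested in the last comment could be produced with a few lines of bookkeeping. The sketch below assumes operators are available as name-to-parameter-list mappings and defines edge density as the number of causal edges divided by the number of possible directed operator pairs; both the input format and that density definition are assumptions, and the toy inputs are not the paper's data.

```python
from collections import Counter


def domain_summary(operators, num_edges):
    """Return the operator-arity histogram and an assumed edge-density measure.

    operators maps an operator name to its list of parameter names.
    """
    arity = Counter(len(params) for params in operators.values())
    n = len(operators)
    # Density relative to all ordered operator pairs (assumed definition).
    density = num_edges / (n * (n - 1)) if n > 1 else 0.0
    return dict(sorted(arity.items())), density


toy_operators = {
    "open_drawer": ["drawer"],
    "pick_from_drawer": ["obj", "drawer"],
    "place_on_table": ["obj"],
    "wipe": ["tool", "surface"],
}

arity_hist, density = domain_summary(toy_operators, num_edges=3)
print(arity_hist, density)
```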

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions where appropriate to strengthen the presentation of our methods.

Referee: [Abstract / Domain Extraction] Abstract and domain-construction description: the central claim that the unified domain enables reliable zero-shot generalization rests on the fidelity of predicate/operator extraction and causal-edge inference from video demonstrations. No quantitative validation (e.g., precision/recall against ground-truth annotations or execution-trace consistency checks) is reported for the 2875 predicates or 16481 causal edges; systematic mis-extraction would directly undermine the 58% success and 160% optimality gains.

Authors: We agree that explicit quantitative validation of the predicate and operator extraction, as well as causal-edge inference, would provide stronger support for the claims. The manuscript currently demonstrates the quality of the extracted domain indirectly through large-scale real-world experiments that yield substantial gains over baselines. We acknowledge that direct precision/recall metrics or systematic execution-trace checks against ground-truth annotations are not reported. In the revised version, we will expand the domain extraction section to include a human evaluation on a representative sample of the data, reporting precision and recall figures, along with additional consistency checks where execution traces can be obtained.

Revision: yes

Referee: [Meta-Domain Fusion] Fusion and retrieval step: the manuscript states that atomic domains are systematically fused into meta-domains, yet provides no analysis of how inconsistencies (e.g., conflicting preconditions or effect ordering) are detected or resolved. Without such checks, it is unclear whether the reported performance improvements can be attributed to the learned symbolic structure rather than incidental factors in the planning pipeline.

Authors: The referee is correct that the current description of the fusion step is high-level and does not detail inconsistency detection or resolution. The manuscript emphasizes systematic retrieval and fusion to produce meta-domains that support compositional zero-shot planning, but does not analyze conflict handling. We will revise the methods section to provide a precise description of the fusion algorithm, including how conflicting preconditions or effect orderings are identified and resolved (e.g., via constraint prioritization and consistency verification), and report empirical observations on conflict frequency and resolution outcomes from our experiments.

Revision: yes
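
For readers wondering what inconsistency detection during fusion might involve, here is one minimal sketch. It is not the authors' procedure, which the text above does not specify; it assumes STRIPS-style add and delete effects and flags two kinds of problem: an operator that both adds and deletes the same fact, and same-named operators from different atomic domains whose effects point in opposite directions. All domain names and predicates are hypothetical.

```python
def find_conflicts(atomic_domains):
    """Flag operators whose definitions disagree within or across atomic domains.

    atomic_domains maps a domain name to
    {op_name: {"pre": set, "add": set, "del": set}}.
    """
    seen = {}
    conflicts = []
    for dom_name, ops in atomic_domains.items():
        for op_name, spec in ops.items():
            # Internal contradiction: the same fact is both added and deleted.
            both = spec["add"] & spec["del"]
            if both:
                conflicts.append((op_name, dom_name,
                                  f"adds and deletes {sorted(both)}"))
            if op_name in seen:
                prev_dom, prev = seen[op_name]
                # Cross-domain contradiction: one version adds what the other deletes.
                clash = (spec["add"] & prev["del"]) | (spec["del"] & prev["add"])
                if clash:
                    conflicts.append((op_name, f"{prev_dom} vs {dom_name}",
                                      f"opposite effects on {sorted(clash)}"))
            else:
                seen[op_name] = (dom_name, spec)
    return conflicts


domains = {
    "video_a": {"close_drawer": {"pre": {"open(drawer)"},
                                 "add": {"closed(drawer)"},
                                 "del": {"open(drawer)"}}},
    "video_b": {"close_drawer": {"pre": {"open(drawer)"},
                                 "add": {"open(drawer)"},
                                 "del": set()}},
}

conflicts = find_conflicts(domains)
for conflict in conflicts:
    print(conflict)
```

Detection is the easy half; deciding which version to keep requires a resolution policy, and reporting how often such clashes occur would address the referee's second comment.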

Circularity Check

0 steps flagged

Data-driven extraction and held-out evaluation keep circularity low

The derivation proceeds by extracting atomic PDDL domains from 12,393 videos, unifying them into a single domain, retrieving relevant atomics for a target task class, and fusing them into meta-domains. Reported success and optimality metrics are measured on held-out real-world tasks rather than being direct algebraic consequences of any fitted parameters or self-referential definitions. No load-bearing self-citation chains, ansatzes smuggled via prior work, or renaming of known results appear in the described pipeline. The central generalization claim therefore rests on empirical performance outside the training corpus.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that video demonstrations contain sufficient information to extract sound PDDL operators and predicates and that the fusion procedure preserves correctness; no explicit free parameters or invented entities are named in the abstract.

axioms (2)

Domain assumption: Video demonstrations of manipulation tasks contain the causal structure and constraints needed to derive valid PDDL operators and predicates.
Invoked when the paper states that atomic domains are extracted from 12,393 videos to form the unified domain.

Domain assumption: Relevant atomic domains can be retrieved and fused without introducing planning inconsistencies or unsafe actions for unseen tasks.
Invoked in the description of retrieving atomics and systematically fusing them into meta-domains for target task classes.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "extracts atomic domains from 12,393 manipulation videos to form a unified domain with 3137 operators, 2875 predicates, and 16481 causal edges... hierarchical fusion... meta-domain"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "closed-loop verification with an LLM to ensure syntactic correctness, solvability, and commonsense alignment"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 5 internal anchors

[1]

GPT-4o System Card

Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. Gpt-4o system card. arXiv preprint arXiv:2410.21276, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[2]

Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2. 5-vl technical report. arXiv preprint arXiv:2502.13923, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

On the limit of language models as planning formalizers

Cassie Huang and Li Zhang. On the limit of language models as planning formalizers. arXiv preprint arXiv:2412.09879, 2024

work page arXiv 2024
[4]

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, and Peter Stone. Llm+ p: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[5]

Dynamic planning with an llm

Gautier Dagan, Frank Keller, and Alex Lascarides Keller. Dynamic planning with an llm. In Proceedings of the Language Gamification Workshop 2024 at NeurIPS, pages 1–14. Neural Information Processing Systems Foundation (NeurIPS), October 2024. Language Gamification Workshop 2024 at NeurIPS ; Conference date: 14-12-2024 Through 14-12-2024

work page 2024
[6]

PDDLEGO: Iterative planning in textual environments

Li Zhang, Peter Jansen, Tianyi Zhang, Peter Clark, Chris Callison-Burch, and Niket Tandon. PDDLEGO: Iterative planning in textual environments. In Danushka Bollegala and Vered Shwartz, editors, Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024), pages 212–221, Mexico City, Mexico, June 2024. Association for Computa...

work page 2024
[7]

Pddl| the planning domain definition language

Constructions Aeronautiques, Adele Howe, Craig Knoblock, ISI Drew McDermott, Ashwin Ram, Manuela Veloso, Daniel Weld, David Wilkins Sri, Anthony Barrett, Dave Christianson, et al. Pddl| the planning domain definition language. Technical Report, Tech. Rep., 1998

work page 1998
[8]

The fast downward planning system

Malte Helmert. The fast downward planning system. Journal of Artificial Intelligence Research, 26:191–246, 2006

work page 2006
[9]

Autogpt+ p: Affordance- based task planning with large language models

Timo Birr, Christoph Pohl, Abdelrahman Younes, and Tamim Asfour. Autogpt+ p: Affordance- based task planning with large language models. arXiv preprint arXiv:2402.10778, 2024

work page arXiv 2024
[10]

Beltran-Hernandez, Masashi Hamaya, Atsushi Hashimoto, Shohei Tanaka, Kento Kawaharazuka, Kazutoshi Tanaka, Yoshitaka Ushiku, and Shinsuke Mori

Keisuke Shirai, Cristian C. Beltran-Hernandez, Masashi Hamaya, Atsushi Hashimoto, Shohei Tanaka, Kento Kawaharazuka, Kazutoshi Tanaka, Yoshitaka Ushiku, and Shinsuke Mori. Vision- language interpreter for robot task planning. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 2051–2058, 2024

work page 2051
[11]

Leveraging pre-trained large language models to construct and utilize world models for model-based task planning

Lin Guan, Karthik Valmeekam, Sarath Sreedharan, and Subbarao Kambhampati. Leveraging pre-trained large language models to construct and utilize world models for model-based task planning. In Thirty-seventh Conference on Neural Information Processing Systems, 2023

work page 2023
[12]

Interpret: Interactive predicate learning from language feedback for generalizable task planning

Muzhi Han, Yifeng Zhu, Song-Chun Zhu, Ying Nian Wu, and Yuke Zhu. Interpret: Interactive predicate learning from language feedback for generalizable task planning. arXiv preprint arXiv:2405.19758, 2024

work page arXiv 2024
[13]

Towards robust LLM-driven planning from minimal text descriptions

Elliot Gestrin, Marco Kuhlmann, and Jendrik Seipp. Towards robust LLM-driven planning from minimal text descriptions. In ICAPS 2024 Workshop on Human-Aware Explainable Planning, 2024

work page 2024
[14]

Xiao, Fuxiang Frank Xia, Jie Fu, Ge Zhang, Ge lin, and Weiyang Liu

Zhouliang Yu, Yuhuan Yuan, Tim Z. Xiao, Fuxiang Frank Xia, Jie Fu, Ge Zhang, Ge lin, and Weiyang Liu. Generating symbolic world models via test-time scaling of large language models. Transactions on Machine Learning Research, 2025

work page 2025
[15]

RDT-1b: a diffusion foundation model for bimanual manipulation

Songming Liu, Lingxuan Wu, Bangguo Li, Hengkai Tan, Huayu Chen, Zhengyi Wang, Ke Xu, Hang Su, and Jun Zhu. RDT-1b: a diffusion foundation model for bimanual manipulation. In The Thirteenth International Conference on Learning Representations, 2025. 10

work page 2025
[16]

Open- VLA: An open-source vision-language-action model

Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan P Foster, Pannag R Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, and Chelsea Finn. Open- VLA: An open-source vision-language-action model. In 8th Annual Conference on Robot Lear...

work page 2024
[17]

π0: A vision-language-action flow model for general robot control, 2024

Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Lucy Xiaoyang Shi, James Tanner, Quan Vuong, Anna Walling, Haohuan Wang, and Ury Zhilinsky. π0: A visi...

work page 2024
[18]

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, et al. Droid: A large-scale in-the-wild robot manipulation dataset. arXiv preprint arXiv:2403.12945, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[19]

A comprehensive survey on pretrained foundation models: A history from bert to chatgpt

Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, et al. A comprehensive survey on pretrained foundation models: A history from bert to chatgpt. International Journal of Machine Learning and Cybernetics, pages 1–65, 2024

work page 2024
[20]

Yu, and Jianfeng Gao

Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, et al. A survey on post-training of large language models.arXiv preprint arXiv:2503.06072, 2025

work page arXiv 2025
[21]

Code as policies: Language model programs for embodied control

Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. Code as policies: Language model programs for embodied control. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 9493–9500, 2023

work page 2023
[22]

React: Synergizing reasoning and acting in language models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023

work page 2023
[23]

Isr-llm: Iterative self-refined large language model for long-horizon sequential task planning

Zhehua Zhou, Jiayang Song, Kunpeng Yao, Zhan Shu, and Lei Ma. Isr-llm: Iterative self-refined large language model for long-horizon sequential task planning. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 2081–2088, 2024

work page 2024
[24]

DeepSeek-V3 Technical Report

Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. Deepseek-v3 technical report. arXiv preprint arXiv:2412.19437, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[25]

Progprompt: Generating situated robot task plans using large language models

Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, and Animesh Garg. Progprompt: Generating situated robot task plans using large language models. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 11523–11530, 2023

work page 2023
[26]

Do as i can, not as i say: Grounding language in robotic affordances

brian ichter, Anthony Brohan, Yevgen Chebotar, Chelsea Finn, Karol Hausman, Alexander Herzog, Daniel Ho, Julian Ibarz, Alex Irpan, Eric Jang, Ryan Julian, Dmitry Kalashnikov, Sergey Levine, Yao Lu, Carolina Parada, Kanishka Rao, Pierre Sermanet, Alexander T Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Mengyuan Yan, Noah Brown, Michael Ahn, Omar ...

work page 2022
[27]

Saycanpay: Heuristic planning with large language models using learnable domain knowledge

Rishi Hazra, Pedro Zuidberg Dos Martires, and Luc De Raedt. Saycanpay: Heuristic planning with large language models using learnable domain knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 20123–20133, 2024

work page 2024
[28]

11 Innermonologue: Embodied reasoning through planning with language models

Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan James Richard Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, and Brian Andrew Ichter. 11 Innermonologue: Embodied reasoning through planning with language models. 2022. CoRL 2022 (to appear)

work page 2022
[29]

Reflex- ion: language agents with verbal reinforcement learning

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflex- ion: language agents with verbal reinforcement learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 8634–8652. Curran Associates, Inc., 2023

work page 2023
[30]

Large language models as commonsense knowledge for large-scale task planning

Zirui Zhao, Wee Sun Lee, and David Hsu. Large language models as commonsense knowledge for large-scale task planning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 31967–31987. Curran Associates, Inc., 2023

work page 2023
[31]

Chain- of-symbol prompting for spatial reasoning in large language models

Hanxu Hu, Hongyuan Lu, Huajian Zhang, Yun-Ze Song, Wai Lam, and Yue Zhang. Chain- of-symbol prompting for spatial reasoning in large language models. In First Conference on Language Modeling, 2024

work page 2024
[32]

Look before you leap: Unveiling the power of GPT-4v in robotic vision-language planning

Yingdong Hu, Fanqi Lin, Tong Zhang, Li Yi, and Yang Gao. Look before you leap: Unveiling the power of GPT-4v in robotic vision-language planning. In First Workshop on Vision-Language Models for Navigation and Manipulation at ICRA 2024, 2024

work page 2024
[33]

Siegel, Jiahai Feng, Noa Korneev, Joshua B

Lio Wong, Jiayuan Mao, Pratyusha Sharma, Zachary S. Siegel, Jiahai Feng, Noa Korneev, Joshua B. Tenenbaum, and Jacob Andreas. Learning adaptive planning representations with natural language guidance. In International Conference on Learning Representations (ICLR), 2024

work page 2024
[34]

Language-augmented symbolic planner for open-world task planning

Guanqi Chen, Lei Yang, Ruixing Jia, Zhe Hu, Yizhou Chen, Wei Zhang, Wenping Wang, and Jia Pan. Language-augmented symbolic planner for open-world task planning. arXiv preprint arXiv:2407.09792, 2024

work page arXiv 2024
[35]

Learning compositional behaviors from demonstration and language

Weiyu Liu, Neil Nie, Ruohan Zhang, Jiayuan Mao, and Jiajun Wu. Learning compositional behaviors from demonstration and language. In 8th Annual Conference on Robot Learning, 2024

work page 2024
[36]

Predicate invention from pixels via pretrained vision-language models

Ashay Athalye, Nishanth Kumar, Tom Silver, Yichao Liang, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Predicate invention from pixels via pretrained vision-language models. arXiv preprint arXiv:2501.00296, 2024

work page arXiv 2024
[37]

Dexmimicgen: Automated data generation for bimanual dexterous manipulation via imitation learning

Zhenyu Jiang, Yuqi Xie, Kevin Lin, Zhenjia Xu, Weikang Wan, Ajay Mandlekar, Linxi Fan, and Yuke Zhu. Dexmimicgen: Automated data generation for bimanual dexterous manipulation via imitation learning. In CoRL Workshop on Learning Robot Fine and Dexterous Manipulation: Perception and Control, 2024

work page 2024
[38]

You only teach once: Learn one- shot bimanual robotic manipulation from video demonstra- tions

Huayi Zhou, Ruixiang Wang, Yunxin Tai, Yueci Deng, Guiliang Liu, and Kui Jia. You only teach once: Learn one-shot bimanual robotic manipulation from video demonstrations. arXiv preprint arXiv:2501.14208, 2025

work page arXiv 2025
[39]

When video coding meets multimodal large language models: A unified paradigm for video coding

Pingping Zhang, Jinlong Li, Kecheng Chen, Meng Wang, Long Xu, Haoliang Li, Nicu Sebe, Sam Kwong, and Shiqi Wang. When video coding meets multimodal large language models: A unified paradigm for video coding. arXiv preprint arXiv:2408.08093, 2024

work page arXiv 2024
[40]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PmLR, 2021

work page 2021
[41] Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11975–11986, 2023.

[42] Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. MPNet: Masked and permuted pre-training for language understanding. Advances in Neural Information Processing Systems, 33:16857–16867, 2020.

[43] Maria Fox and Derek Long. PDDL2.1: An extension to PDDL for expressing temporal planning domains. Journal of Artificial Intelligence Research, 20:61–124, 2003.

[44] Håkan L. S. Younes and Michael L. Littman. PPDDL1.0: An extension to PDDL for expressing planning domains with probabilistic effects. Technical Report CMU-CS-04-162, 2:99, 2004.

[45] Scott Sanner et al. Relational dynamic influence diagram language (RDDL): Language description. Unpublished ms. Australian National University, 32:27, 2010.

A Technical Appendices and Supplementary Material

A.1 The Task Included in Overview

The task used in the overview is shown in 7. The language instruction is “Move the corn from the pot into the ora...

pick_from_rack (bowl)
place_on_table (bowl)
pick_from_pot (corn)
open_drawer (drawer_yellow)
pick_from_drawer (towel)
place_in_drawer (towel)
close_drawer (drawer_yellow)

The execution results are shown in 7.

Figure 6: Task used in the overview figure 2: "Move the corn from the pot into the orange bowl, wipe the table with the towel in the drawer and put it back to the closed drawer."

Figure 7: Execution sequence of the task shown in Figure 2, planned by UniDomain.

A.2 Evaluation Tasks

We pr...
