UniDomain: Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning
Pith reviewed 2026-05-19 02:55 UTC · model grok-4.3
The pith
A single PDDL domain pretrained from robot videos supports zero-shot planning on unseen manipulation tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Processing 12,393 manipulation videos yields a unified PDDL domain containing 3137 operators, 2875 predicates, and 16481 causal edges. For any target task class, the system retrieves the relevant atomic domains and fuses them into a meta-domain that supplies the symbolic structure needed for compositional planning. The paper reports that this pipeline plans complex unseen tasks zero-shot, with up to 58% higher task success and a 160% improvement in plan optimality over LLM and LLM-PDDL baselines.
What carries the argument
The unified PDDL domain built by extracting atomic domains from videos and then retrieving and fusing them into task-specific meta-domains.
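As a rough illustration of the retrieve-then-fuse idea, the sketch below models atomic domains as small Python records, retrieves those whose keywords overlap a task description, and fuses them by taking the union of their operators and predicates. The names `Operator`, `AtomicDomain`, `retrieve`, and `fuse`, the keyword-overlap scoring, and the toy library are all assumptions made for illustration; the paper's actual retrieval and fusion procedures are not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset = frozenset()


@dataclass
class AtomicDomain:
    keywords: set    # hypothetical retrieval key, e.g. words from a video's instruction
    operators: list  # Operator instances extracted from one demonstration


def retrieve(library, task_words, top_k=2):
    """Rank atomic domains by keyword overlap with the task (illustrative scoring only)."""
    scored = [(len(d.keywords & task_words), i) for i, d in enumerate(library)]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [library[i] for _, i in scored[:top_k]]


def fuse(domains):
    """Naive fusion: union of operators by name, plus every predicate they mention."""
    operators = {}
    for d in domains:
        for op in d.operators:
            operators.setdefault(op.name, op)  # first definition wins; no conflict handling
    predicates = set()
    for op in operators.values():
        predicates |= op.preconditions | op.add_effects | op.del_effects
    return operators, predicates


# Toy library with made-up predicates.
library = [
    AtomicDomain({"pick", "bowl", "rack"}, [Operator(
        "pick_from_rack", frozenset({"on_rack", "hand_empty"}),
        frozenset({"holding"}), frozenset({"on_rack", "hand_empty"}))]),
    AtomicDomain({"open", "drawer"}, [Operator(
        "open_drawer", frozenset({"drawer_closed"}),
        frozenset({"drawer_open"}), frozenset({"drawer_closed"}))]),
    AtomicDomain({"pour", "water"}, [Operator(
        "pour", frozenset({"holding"}), frozenset({"poured"}))]),
]

selected = retrieve(library, {"pick", "bowl", "open", "drawer"})
operators, predicates = fuse(selected)
print(sorted(operators))   # ['open_drawer', 'pick_from_rack']
print(len(predicates))     # 5
```

The union in `fuse` silently keeps the first definition of each operator name, which is exactly the kind of shortcut a real fusion step would need to replace with explicit consistency handling.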
If this is right
- Robots gain the ability to plan long sequences by composing pre-learned causal edges rather than rediscovering them for each new goal.
- Plan optimality improves because the meta-domain inherits real-world ordering and precondition constraints captured from demonstrations.
- Zero-shot generalization extends to any task class whose sub-actions appear in the video corpus, without requiring new domain authoring.
- Symbolic planning becomes more tightly grounded in perception because predicates and operators are derived directly from observed manipulation sequences.
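To make "composing pre-learned causal edges" concrete, the following sketch derives edges between operators under one common reading: operator A has a causal edge to operator B when A adds a predicate that B requires. This definition, the function name `causal_edges`, and the toy operators are assumptions for illustration and may differ from how the paper counts its 16481 edges.

```python
def causal_edges(operators):
    """Return (producer, predicate, consumer) triples where the producer
    adds a predicate that the consumer lists as a precondition."""
    edges = []
    for a_name, a in operators.items():
        for b_name, b in operators.items():
            if a_name == b_name:
                continue
            for pred in sorted(a["add"] & b["pre"]):
                edges.append((a_name, pred, b_name))
    return edges


# Toy operators with made-up predicates.
ops = {
    "open_drawer": {"pre": {"drawer_closed"}, "add": {"drawer_open"}},
    "pick_from_drawer": {"pre": {"drawer_open", "hand_empty"}, "add": {"holding_towel"}},
    "place_in_drawer": {"pre": {"drawer_open", "holding_towel"}, "add": {"hand_empty"}},
}

edges = causal_edges(ops)
for edge in edges:
    print(edge)
```

A planner can chain such edges backward from a goal predicate, which is one way a long-horizon plan could be assembled from pieces observed in separate demonstrations.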
Where Pith is reading between the lines
- If the video corpus grows, the same extraction and fusion steps could cover an expanding range of manipulation skills with no additional expert input.
- The retrieval-plus-fusion pattern might transfer to other formalisms such as temporal logic or STRIPS variants once similar atomic extraction is defined.
- Running the fusion step incrementally as new videos arrive could keep the unified domain current without full re-pretraining.
Load-bearing premise
Atomic domains pulled from the videos accurately encode the implicit constraints and causal relations required for safe, error-free fusion into meta-domains that generalize to unseen tasks.
What would settle it
Deploy the fused meta-domains on a held-out set of tasks and check whether any generated plan produces physical violations or unsafe actions that contradict the constraints observed in the original video set.
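A symbolic version of this check can be sketched as a STRIPS-style simulation that replays a plan and reports the first step whose preconditions do not hold. The function name `first_violation`, the dictionary layout, and the toy operators are assumptions for illustration. The check only detects symbolic inconsistencies; it cannot by itself reveal physical or safety violations that the predicates fail to capture.

```python
def first_violation(plan, operators, initial_state):
    """Replay the plan; return (step, operator, missing preconditions) or None."""
    state = set(initial_state)
    for step, name in enumerate(plan):
        op = operators[name]
        missing = op["pre"] - state
        if missing:
            return step, name, sorted(missing)
        state = (state - op["del"]) | op["add"]
    return None


# Toy operators with made-up predicates.
ops = {
    "open_drawer": {"pre": {"drawer_closed"}, "add": {"drawer_open"}, "del": {"drawer_closed"}},
    "pick_from_drawer": {"pre": {"drawer_open", "hand_empty"},
                         "add": {"holding_towel"}, "del": {"hand_empty"}},
    "close_drawer": {"pre": {"drawer_open"}, "add": {"drawer_closed"}, "del": {"drawer_open"}},
}
init = {"drawer_closed", "hand_empty"}

good = first_violation(["open_drawer", "pick_from_drawer", "close_drawer"], ops, init)
bad = first_violation(["pick_from_drawer", "open_drawer"], ops, init)
print(good)  # None
print(bad)   # (0, 'pick_from_drawer', ['drawer_open'])
```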
Original abstract
Robotic task planning in real-world environments requires reasoning over implicit constraints from language and vision. While LLMs and VLMs offer strong priors, they struggle with long-horizon structure and symbolic grounding. Existing methods that combine LLMs with symbolic planning often rely on handcrafted or narrow domains, limiting generalization. We propose UniDomain, a framework that pre-trains a PDDL domain from robot manipulation demonstrations and applies it for online robotic task planning. It extracts atomic domains from 12,393 manipulation videos to form a unified domain with 3137 operators, 2875 predicates, and 16481 causal edges. Given a target class of tasks, it retrieves relevant atomics from the unified domain and systematically fuses them into high-quality meta-domains to support compositional generalization in planning. Experiments on diverse real-world tasks show that UniDomain solves complex, unseen tasks in a zero-shot manner, achieving up to 58% higher task success and 160% improvement in plan optimality over state-of-the-art LLM and LLM-PDDL baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents UniDomain, a framework that extracts atomic PDDL domains from 12,393 real-world robot manipulation videos to construct a unified domain containing 3137 operators, 2875 predicates, and 16481 causal edges. For a given target task class, relevant atomic domains are retrieved and fused into meta-domains that support compositional, zero-shot planning. Experiments on diverse real-world tasks report up to 58% higher task success and 160% better plan optimality than LLM and LLM-PDDL baselines.
Significance. If the extracted atomic domains faithfully encode implicit constraints and causal structure, the approach could meaningfully advance scalable, generalizable symbolic planning for robotics by moving beyond handcrafted or narrow domains. The scale of the demonstration corpus and the reported gains over strong baselines indicate practical potential for reducing reliance on manual domain engineering, provided the fusion step preserves soundness.
major comments (2)
- [Abstract / Domain Extraction] Abstract and domain-construction description: the central claim that the unified domain enables reliable zero-shot generalization rests on the fidelity of predicate/operator extraction and causal-edge inference from video demonstrations. No quantitative validation (e.g., precision/recall against ground-truth annotations or execution-trace consistency checks) is reported for the 2875 predicates or 16481 causal edges; systematic mis-extraction would directly undermine the 58% success and 160% optimality gains.
- [Meta-Domain Fusion] Fusion and retrieval step: the manuscript states that atomic domains are systematically fused into meta-domains, yet provides no analysis of how inconsistencies (e.g., conflicting preconditions or effect ordering) are detected or resolved. Without such checks, it is unclear whether the reported performance improvements can be attributed to the learned symbolic structure rather than incidental factors in the planning pipeline.
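The validation requested in the first major comment could take a form like the sketch below, which scores a set of extracted predicates against hand-annotated ground truth for the same video. The function name `precision_recall` and the example predicate strings are hypothetical toy data, not results from the paper.

```python
def precision_recall(extracted, ground_truth):
    """Set-based precision and recall of extracted items against annotations."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    true_positives = len(extracted & ground_truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Made-up example: two of three extracted predicates match the annotation.
extracted = {"holding(towel)", "drawer_open(d)", "on(table, bowl)"}
annotated = {"holding(towel)", "drawer_open(d)", "in(corn, pot)", "on(bowl, table)"}

precision, recall = precision_recall(extracted, annotated)
print(round(precision, 3), recall)  # 0.667 0.5
```

Exact string matching is a deliberately strict criterion; a real evaluation would likely need to normalize predicate names and argument order before comparing.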
minor comments (2)
- [Retrieval Procedure] Clarify the exact criteria used to decide which atomic domains are retrieved for a given task class; the current description leaves the retrieval mechanism somewhat underspecified.
- [Unified Domain Statistics] Add a table or figure summarizing the distribution of operator arity, predicate types, and causal-edge density across the unified domain to help readers assess its complexity.
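The summary requested in the second minor comment is cheap to compute once operators are available in structured form. The sketch below builds an arity histogram; the function name `arity_histogram` and the operator signatures are hypothetical and chosen only to show the shape of the computation.

```python
from collections import Counter


def arity_histogram(operators):
    """operators: mapping from operator name to its list of parameter names."""
    return Counter(len(params) for params in operators.values())


# Hypothetical signatures, not taken from the paper's domain.
signatures = {
    "pick_from_rack": ["?obj"],
    "open_drawer": ["?drawer"],
    "place_in": ["?obj", "?container"],
}

hist = arity_histogram(signatures)
for arity in sorted(hist):
    print(arity, hist[arity])
```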
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions where appropriate to strengthen the presentation of our methods.
- Referee: [Abstract / Domain Extraction] Abstract and domain-construction description: the central claim that the unified domain enables reliable zero-shot generalization rests on the fidelity of predicate/operator extraction and causal-edge inference from video demonstrations. No quantitative validation (e.g., precision/recall against ground-truth annotations or execution-trace consistency checks) is reported for the 2875 predicates or 16481 causal edges; systematic mis-extraction would directly undermine the 58% success and 160% optimality gains.
  Authors: We agree that explicit quantitative validation of the predicate and operator extraction, as well as causal-edge inference, would provide stronger support for the claims. The manuscript currently demonstrates the quality of the extracted domain indirectly through large-scale real-world experiments that yield substantial gains over baselines. We acknowledge that direct precision/recall metrics or systematic execution-trace checks against ground-truth annotations are not reported. In the revised version, we will expand the domain extraction section to include a human evaluation on a representative sample of the data, reporting precision and recall figures, along with additional consistency checks where execution traces can be obtained.
  Revision: yes
- Referee: [Meta-Domain Fusion] Fusion and retrieval step: the manuscript states that atomic domains are systematically fused into meta-domains, yet provides no analysis of how inconsistencies (e.g., conflicting preconditions or effect ordering) are detected or resolved. Without such checks, it is unclear whether the reported performance improvements can be attributed to the learned symbolic structure rather than incidental factors in the planning pipeline.
  Authors: The referee is correct that the current description of the fusion step is high-level and does not detail inconsistency detection or resolution. The manuscript emphasizes systematic retrieval and fusion to produce meta-domains that support compositional zero-shot planning, but does not analyze conflict handling. We will revise the methods section to provide a precise description of the fusion algorithm, including how conflicting preconditions or effect orderings are identified and resolved (e.g., via constraint prioritization and consistency verification), and report empirical observations on conflict frequency and resolution outcomes from our experiments.
  Revision: yes
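One simple form the promised inconsistency detection could take is sketched below: it flags operators that both add and delete the same predicate, and operators that share a name across atomic domains but disagree on preconditions or effects. The function name `find_conflicts`, the data layout, and the toy domains are assumptions; this covers detection only and is not the authors' fusion algorithm, whose resolution strategy is not specified here.

```python
def find_conflicts(atomic_domains):
    """atomic_domains: list of dicts mapping operator name to
    {"pre": set, "add": set, "del": set}. Returns (operator, reason) pairs."""
    seen = {}
    conflicts = []
    for idx, domain in enumerate(atomic_domains):
        for name, op in domain.items():
            both = op["add"] & op["del"]
            if both:
                conflicts.append((name, f"domain {idx} adds and deletes {sorted(both)}"))
            if name not in seen:
                seen[name] = (idx, op)
                continue
            prev_idx, prev = seen[name]
            for part in ("pre", "add", "del"):
                if prev[part] != op[part]:
                    conflicts.append(
                        (name, f"'{part}' differs between domain {prev_idx} and domain {idx}"))
    return conflicts


# Toy domains: the second demands an extra precondition for the same operator.
domains = [
    {"open_drawer": {"pre": {"drawer_closed"},
                     "add": {"drawer_open"}, "del": {"drawer_closed"}}},
    {"open_drawer": {"pre": {"drawer_closed", "hand_empty"},
                     "add": {"drawer_open"}, "del": {"drawer_closed"}}},
]

conflicts = find_conflicts(domains)
print(conflicts)
```

Reporting how often such conflicts occur, and which side wins when they are resolved, would address the referee's question about whether the gains come from the learned structure.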
Circularity Check
Data-driven extraction and held-out evaluation keep circularity low
The derivation proceeds by extracting atomic PDDL domains from 12,393 videos, unifying them into a single domain, retrieving relevant atomics for a target task class, and fusing them into meta-domains. Reported success and optimality metrics are measured on held-out real-world tasks rather than being direct algebraic consequences of any fitted parameters or self-referential definitions. No load-bearing self-citation chains, ansatzes smuggled via prior work, or renaming of known results appear in the described pipeline. The central generalization claim therefore rests on empirical performance outside the training corpus.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Video demonstrations of manipulation tasks contain the causal structure and constraints needed to derive valid PDDL operators and predicates.
- Domain assumption: Relevant atomic domains can be retrieved and fused without introducing planning inconsistencies or unsafe actions for unseen tasks.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "extracts atomic domains from 12,393 manipulation videos to form a unified domain with 3137 operators, 2875 predicates, and 16481 causal edges... hierarchical fusion... meta-domain"
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "closed-loop verification with an LLM to ensure syntactic correctness, solvability, and commonsense alignment"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
A Technical Appendices and Supplementary Material
A.1 The Task Included in Overview
The task used in the overview is shown in Figure 7. The language instruction is “Move the corn from the pot into the ora...
- pick_from_rack (bowl)
- place_on_table (bowl)
- pick_from_pot (corn)
- open_drawer (drawer_yellow)
- pick_from_drawer (towel)
- place_in_drawer (towel)
- close_drawer (drawer_yellow)
The execution results are shown in Figure 7.
Figure 6: Task used in the overview (Figure 2): "Move the corn from the pot into the orange bowl, wipe the table with the towel in the drawer and put it back to the closed drawer."
Figure 7: Execution sequence of the task shown in Figure 2, planned by UniDomain.
A.2 Evaluation Tasks
We pr...