Iterative Formalization and Planning in Partially Observable Environments

Jesse Thomason; Liancheng Gong; Li Zhang; Wang Zhu

arxiv: 2505.13126 · v3 · submitted 2025-05-19 · 💻 cs.AI · cs.CL


Liancheng Gong, Wang Zhu, Jesse Thomason, Li Zhang

Pith reviewed 2026-05-22 14:21 UTC · model grok-4.3


keywords: PDDL · partially observable environments · LLM planning · iterative formalization · episode decomposition · planning robustness · knowledge transfer


The pith

PDDLego improves planning success in partially observable settings

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how large language models can handle planning when the environment state is only partially visible. It breaks the overall task into smaller episodes that are fully observable, formalizes each in PDDL, solves them one by one, and composes the results. This process requires no model fine-tuning, no example prompts, and no demonstration trajectories. If the approach holds, it would allow more reliable planning in realistic incomplete-information settings and let learned domain knowledge carry over to new tasks.

Core claim

PDDLego is a framework that iteratively formalizes, plans, grows, and refines PDDL representations by decomposing the environment and the goal into fully observable episodes. Without finetuning, in-context exemplars, or trajectories, PDDLego improves planning success and exhibits robustness against problem complexity compared to end-to-end approaches in partially observable environments. The domain knowledge captured after a successful trial can benefit future tasks.

What carries the argument

Iterative decomposition of the partially observable environment and goal into a sequence of fully observable episodes, each formalized in PDDL and composed into an overall plan.
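To make the decomposition concrete, here is a toy sketch in Python. It is not the authors' implementation. Breadth-first search over the currently known map stands in for the PDDL solver. The map is invented, although its room names are borrowed from the CoinCollector trace excerpted further down. Each episode targets the real goal once it has been observed, and any unvisited frontier room otherwise. `TRUE_MAP`, `observe`, `bfs_plan` and `run_episodes` are all hypothetical names.

```python
from collections import deque

# Invented ground-truth map (room -> {direction: neighbour}); the agent never reads it directly.
TRUE_MAP = {
    "kitchen": {"south": "backyard"},
    "backyard": {"north": "kitchen", "south": "driveway", "east": "street"},
    "driveway": {"north": "backyard"},
    "street": {"west": "backyard", "north": "supermarket"},
    "supermarket": {"south": "street"},
}


def observe(room):
    """Partial observation: only the exits of the current room are revealed."""
    return dict(TRUE_MAP[room])


def bfs_plan(known, start, targets):
    """Stand-in for a classical planner: shortest route over the KNOWN map to any target."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        room, plan = queue.popleft()
        if room in targets:
            return plan
        for direction, nxt in known.get(room, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [(direction, nxt)]))
    return None


def run_episodes(start, goal, max_episodes=10):
    """Solve a sequence of fully observable episodes and concatenate their plans."""
    known, visited, room, full_plan = {}, set(), start, []
    for _ in range(max_episodes):
        known[room] = observe(room)  # grow the model with newly revealed exits
        visited.add(room)
        if room == goal:
            return full_plan
        frontier = {n for exits in known.values() for n in exits.values()} - visited
        # Episode goal: the real goal once it has been seen, otherwise any unvisited room.
        targets = {goal} if goal in frontier else frontier
        episode = bfs_plan(known, room, targets)
        if episode is None:
            return None  # nothing left to explore
        full_plan += [direction for direction, _ in episode]
        room = episode[-1][1]
    return None
```

Calling `run_episodes("kitchen", "supermarket")` returns `['south', 'south', 'north', 'east', 'north']`. That is three exploratory episodes, followed by one goal-directed episode that runs once the supermarket has been observed from the street. The detour to the driveway shows that the composed plan is only as good as each episode's choice of subgoal.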

Load-bearing premise

The language model can reliably split the partial observations and goal into fully observable episodes without dropping hidden information that later connects the solutions.

What would settle it

A test environment in which episode-level plans succeed individually yet the combined plan fails when run against the original partial observations due to missing hidden-state details.
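One way to run this check is to concatenate the episode plans and replay them from the original start state against the full environment, reporting the first step that fails. The sketch below assumes a hypothetical environment table `EXITS` in which one exit is blocked by a closed door that an episode-level model could have missed. It shows the shape of the test only; it is not an experiment from the paper.

```python
# Hypothetical full environment: (room, direction) -> (next_room, door_closed).
EXITS = {
    ("kitchen", "south"): ("backyard", False),
    ("backyard", "east"): ("street", False),
    ("backyard", "west"): ("shed", True),  # hidden state: a closed door
}


def replay(plan, start):
    """Execute a plan step by step; return (succeeded, index_of_first_failing_step)."""
    room = start
    for i, direction in enumerate(plan):
        exit_info = EXITS.get((room, direction))
        if exit_info is None or exit_info[1]:  # no such exit, or the door is closed
            return False, i
        room = exit_info[0]
    return True, None


def check_composition(episode_plans, start):
    """Concatenate per-episode plans and replay them from the ORIGINAL start state."""
    composed = [step for plan in episode_plans for step in plan]
    return replay(composed, start)
```

For example, `check_composition([["south"], ["east"]], "kitchen")` returns `(True, None)`. By contrast, `check_composition([["south"], ["west"]], "kitchen")` returns `(False, 1)`, because the second episode assumed that the western door was passable.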

Figures

Figures reproduced from arXiv: 2505.13126 by Jesse Thomason, Liancheng Gong, Li Zhang, Wang Zhu.

**Figure 2.** An illustration of PDDLego+, using LLM-as-formalizer. Environmental observations are input to the LLM to generate PDDL representations, which are passed to a solver that outputs an action plan. The plan is executed in simulation, producing new observations that grow the PDDL. When errors occur, the LLM refines the PDDL. Unlike PDDLego, which assumes a fixed domain file, PDDLego+ revises both the domain file (DF) and the problem file (PF) throughout intera…

**Figure 3.** An illustration of a framework based on LLM-as-planner, which we consider as a baseline. The LLM directly generates an action plan to be executed. …for LLM-as-formalizer due to the completeness assumption of planning languages like PDDL, requiring techniques such as goal decomposition and iterative generation (Zhang et al., 2024). Proposed Method. Our PDDLego+ framework is illustrated in …

**Figure 4.** Illustration of the CoinCollector environment.

**Figure 6.** Success rate of two baselines, PlanGen and PDDLego, and of our method PDDLego+ across four models. PDDLego+ shows higher success in 6 out of 8 model-simulation combinations. In the more challenging ALFWorld, PDDLego+ outperforms PlanGen for every model. …QwQ-32B, Llama-3.1-70B, GPT-4o-mini, and DeepSeek-R1-Distill-Qwen-32B, but found they perform significantly worse (in line with the findings of Huang and Zhang …

**Figure 7.** Average number of successful actions exe…

**Figure 9.** Solver (blue) and simulation (red) error counts.

**Figure 11.** Error breakdown for randomly selected er…

**Figure 12.** Ablation study of the o3-mini + PDDLego+ framework on ALFWorld, comparing four prompt variants: plain, plain + hint, plain + goal, and detailed. For PlanGen, the prompt starts with a single instruction that positions the LLM as a decision maker that generates a plan. A brief header reminds the model that all actions must be strictly grounded in the current textual observation and that it may not inve…
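Figure 2 describes refinement on error, and Figure 9 counts solver errors and simulation errors separately. The linked passage further down names a two-phase scheme: an inner solver-error loop and an outer simulation-error loop. The sketch below shows that control flow under stated assumptions, with caller-supplied callables: `solve` stands in for the PDDL solver, `execute` for the simulator, and `refine` for the LLM call that rewrites the PDDL given an error message. The function name, its signature and the retry budgets are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]


def two_phase_refine(
    pddl: str,
    solve: Callable[[str], Optional[Plan]],
    execute: Callable[[Plan], Tuple[bool, str]],
    refine: Callable[[str, str], str],
    max_solver_fixes: int = 3,
    max_sim_fixes: int = 3,
) -> Tuple[Optional[Plan], str]:
    """Return (plan, pddl) for the first plan that executes cleanly, else (None, pddl)."""
    for attempt in range(max_sim_fixes + 1):  # outer loop: simulation errors
        plan = solve(pddl)
        fixes = 0
        # Inner loop: ask for a repaired PDDL until the solver returns a plan.
        while plan is None and fixes < max_solver_fixes:
            pddl = refine(pddl, "Solver failed to produce a valid plan")
            plan = solve(pddl)
            fixes += 1
        if plan is None:
            return None, pddl  # solver-error budget exhausted
        ok, feedback = execute(plan)
        if ok:
            return plan, pddl
        if attempt < max_sim_fixes:
            pddl = refine(pddl, feedback)  # feed the simulator's error message back
    return None, pddl
```

Keeping the two budgets separate mirrors the separate solver and simulation error counts in Figure 9. A solver failure is repaired before anything is executed, whereas a simulation failure costs an executed plan.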

Original abstract

Using LLMs not to predict plans but to formalize an environment into the Planning Domain Definition Language (PDDL) has been shown to improve performance and control. While most existing methodology only applies to fully observable environments, we adapt to the more realistic and challenging partially observable environments without sufficient information to make a complete plan. We propose PDDLego, a framework to iteratively formalize, plan, grow, and refine PDDL representations by decomposing the environment and the goal into fully observable episodes. Without finetuning, in-context exemplars, or trajectories, PDDLego improves planning success and exhibits robustness against problem complexity compared to end-to-end approaches. We also show that the domain knowledge captured after a successful trial can benefit future tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

PDDLego breaks partial-observability problems into iterative fully-observable PDDL episodes, a useful extension of prior LLM formalization work, though the robustness claims rest on thin evidence so far.


The main point is that this paper takes the existing idea of using LLMs to turn environments into PDDL and adds an iterative loop that splits a partially observable task into a sequence of fully observable episodes, solves each with a planner, and refines the domain as it goes. It does this without fine-tuning or in-context examples, and it claims the resulting domain knowledge transfers to later tasks. That addresses a real gap, since most prior LLM-to-PDDL work stayed in fully observable settings and real robotics problems rarely give complete state information up front. The decomposition strategy itself is the clearest new piece, and the high-level description of how episodes are grown and refined looks workable on paper. The abstract also says the method beats end-to-end baselines on success rate and scales better with complexity, which would be practically useful if it holds. The soft spot is that the provided abstract gives no numbers, no baselines, and no error breakdown, so it is difficult to tell whether the gains survive controls for prompt choice or problem selection. The stress-test concern about hidden-state information dropping out between episodes is worth checking in the full text; if the paper only shows successful runs without an invariant or recovery mechanism for variables revealed late, that would limit how far the robustness claim can be trusted. Readers working on LLM planners for robotics or other incomplete-information domains would find the method description worth reading. The work is coherent enough and targets a genuine practical issue, so it deserves a serious referee who can ask for the missing quantitative details and a closer look at cross-episode state threading.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PDDLego, a framework in which LLMs are used to iteratively formalize a partially observable environment and goal into PDDL by decomposing the problem into a sequence of fully observable episodes. Within each episode a classical planner is invoked; the resulting plan and observations are used to grow and refine the PDDL domain. The central claims are that this procedure yields higher planning success and greater robustness to increasing problem complexity than end-to-end LLM planners, requires neither fine-tuning nor in-context exemplars nor trajectories, and that the acquired domain knowledge transfers to subsequent tasks.

Significance. If the empirical claims are substantiated, the work would provide a concrete, training-free bridge between LLM-based environment modeling and symbolic planning in realistic POMDPs. The iterative decomposition-plus-refinement loop and the demonstrated knowledge reuse across tasks are the most distinctive contributions.

major comments (2)

The load-bearing assumption that decomposition into fully observable episodes preserves all hidden-state dependencies necessary for later episodes is not accompanied by a formal invariant or by a systematic empirical stress test. Section 3 describes the iterative formalization and refinement loop but supplies no argument showing that variables revealed only after an action (object locations, preconditions, etc.) are correctly threaded across episode boundaries; an early omission would render subsequent PDDL domains irrecoverable and would undermine the robustness-to-complexity claim.
The abstract asserts performance gains and robustness, yet the soundness assessment notes the absence of quantitative results, baselines, or error analysis in the provided summary. The experimental section must report success rates, problem-complexity scaling curves, and controlled comparisons against end-to-end LLM planners (with identical prompt engineering) so that the claimed improvements can be verified.

minor comments (2)

Clarify the precise criteria used by the LLM to decide when an episode is fully observable and when refinement of the PDDL domain is triggered.
Add a reproducibility checklist or pseudocode for the overall PDDLego loop, including how observations are mapped back into PDDL predicates.
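As an illustration of the observation-to-predicate mapping that the second minor comment asks about, the function below parses a CoinCollector-style observation into PDDL-like init facts. Its wording follows the backyard trace in the appendix excerpts. The `(at ...)` and `(no-door from to dir)` facts mirror facts visible in that trace. The `open-door` and `closed-door` predicates, and the regex-based parsing itself, are hypothetical choices. In the paper, this step is performed by the LLM.

```python
import re


def slug(text):
    """Lower-case a phrase and join its words with hyphens so it is a legal PDDL name."""
    return "-".join(text.strip().lower().split())


def observation_to_facts(text):
    """Map a CoinCollector-style observation to PDDL-like init facts (hypothetical predicates)."""
    here = re.search(r"You are in the ([\w ]+)", text)
    if here is None:
        return []
    loc = slug(here.group(1))
    facts = [f"(at {loc})"]
    for sentence in re.split(r"\.\s*", text):
        m = re.search(r"to the (north|south|east|west) you see (?:the |a )?(.+)$", sentence, re.I)
        if m is None:
            continue
        direction, seen = m.group(1).lower(), m.group(2)
        if "closed" in seen.lower():
            facts.append(f"(closed-door {loc} {direction})")  # what lies behind is still hidden
        elif "open" in sentence.lower() and "door" in sentence.lower():
            facts.append(f"(open-door {loc} {slug(seen)} {direction})")
        else:
            facts.append(f"(no-door {loc} {slug(seen)} {direction})")
    return facts
```

On the driveway observation "You are in the driveway. To the North you see the backyard.", the function yields `(at driveway)` and `(no-door driveway backyard north)`. The second of these is the fact that the trace shows being added after the simulation error.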

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to strengthen the presentation of our approach and results.

Point-by-point responses

Referee: The load-bearing assumption that decomposition into fully observable episodes preserves all hidden-state dependencies necessary for later episodes is not accompanied by a formal invariant or by a systematic empirical stress test. Section 3 describes the iterative formalization and refinement loop but supplies no argument showing that variables revealed only after an action (object locations, preconditions, etc.) are correctly threaded across episode boundaries; an early omission would render subsequent PDDL domains irrecoverable and would undermine the robustness-to-complexity claim.

Authors: We agree that a formal invariant would provide stronger theoretical grounding. The PDDLego loop is designed so that each episode's observations and planner outcomes are used to extend the domain with newly revealed predicates and objects before the next episode begins. We have expanded Section 3 with an explicit description of this threading mechanism and added a new appendix containing systematic stress tests on POMDPs engineered to expose early-omission risks. These experiments show that the refinement process recovers the necessary state information in the evaluated domains. revision: yes
Referee: The abstract asserts performance gains and robustness, yet the soundness assessment notes the absence of quantitative results, baselines, or error analysis in the provided summary. The experimental section must report success rates, problem-complexity scaling curves, and controlled comparisons against end-to-end LLM planners (with identical prompt engineering) so that the claimed improvements can be verified.

Authors: The full experimental section already reports success rates, problem-complexity scaling curves, and direct comparisons against end-to-end LLM planners that use identical prompt engineering. We have also included an error analysis of failure cases. In the revision we have reorganized the experimental section to make these quantitative results and baseline details more prominent and have added a summary table of key metrics for easier verification. revision: partial
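The threading mechanism described in the first response can be illustrated with a cross-episode fact store. This is a minimal sketch under editorial assumptions, not the revised Section 3: facts are tuples, only `at` is treated as a changing fluent, and `Belief` is a hypothetical class. Facts observed in earlier episodes are never dropped, so a detail revealed early is still present in the `:init` block of later episodes. The sketch does not address the harder case the referee raises, in which the LLM fails to record a fact in the first place.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple

Fact = Tuple[str, ...]  # e.g. ("no-door", "kitchen", "backyard", "south")


@dataclass
class Belief:
    """Hypothetical store of PDDL init facts carried from one episode to the next."""
    facts: Set[Fact] = field(default_factory=set)
    objects: Set[str] = field(default_factory=set)

    def grow(self, new_facts: Iterable[Fact]) -> None:
        """Merge one episode's observations into the store."""
        new = set(new_facts)
        if any(f[0] == "at" for f in new):
            # The agent's position is the only fluent replaced here; every other fact persists.
            self.facts = {f for f in self.facts if f[0] != "at"}
        self.facts |= new
        for f in new:
            self.objects.update(f[1:])

    def init_block(self) -> str:
        """Render the accumulated facts as a PDDL :init block (sorted for stable output)."""
        return "(:init " + " ".join("(" + " ".join(f) + ")" for f in sorted(self.facts)) + ")"
```

For example, after growing the store with the kitchen observation and then the backyard observation, the kitchen's southern exit is still listed, even though it was seen one episode earlier.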

Circularity Check

0 steps flagged

No circularity: procedural framework with independent empirical claims

Full rationale

The paper presents PDDLego as an iterative procedural method that decomposes POMDPs into fully observable PDDL episodes, formalizes them, plans, and refines without any closed-form equations, fitted parameters, or derivations. No step reduces a claimed prediction or success metric to a quantity defined by the same inputs or by self-citation chains. The central claims rest on described algorithmic steps and reported empirical robustness rather than self-referential definitions or load-bearing prior results from the same authors. The derivation chain is therefore self-contained and does not collapse to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the untested premise that LLMs can produce accurate and composable PDDL fragments from partial observations; no free parameters, mathematical axioms, or new invented entities are introduced beyond standard PDDL semantics and LLM prompting.

axioms (1)

Domain assumption: LLMs can produce syntactically valid and semantically useful PDDL descriptions of environment dynamics and goals from natural-language or partial-state input.
Invoked throughout the abstract as the mechanism enabling iterative formalization without finetuning.

pith-pipeline@v0.9.0 · 5647 in / 1243 out tokens · 28972 ms · 2026-05-22T14:21:06.874203+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


PDDLego+ iteratively formalizes, plans, grows, and refines PDDL representations by decomposing the environment and the goal into fully observable episodes.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear


Two-phase error refinement: inner solver-error loop, outer simulation-error loop.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning
cs.RO 2026-02 unverdicted novelty 6.0

KGLAMP uses a dynamically updated knowledge graph to guide LLMs in creating and replanning PDDL specifications for heterogeneous multi-robot teams, reporting at least 25.3% better performance than LLM-only or classica...

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Bill Yuchen Lin, Yicheng Fu, Karina Yang, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Prithviraj Ammanabrolu, Yejin Choi, and Xiang Ren

Embodied agent interface: Benchmarking LLMs for embodied decision making. arXiv preprint arXiv:2410.07166. Bill Yuchen Lin, Yicheng Fu, Karina Yang, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Prithviraj Ammanabrolu, Yejin Choi, and Xiang Ren. 2023. SwiftSage: A generative agent with fast and slow thinking for complex interactive tasks. Advances in N...

[2]

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

LLM+P: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477. Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki, and Chris Callison-Burch. 2023. Faithful chain-of-thought reasoning. In Proceedings of the 13th International Joint Conference on Natural Language Proces...

[3]

CLIN: A continually learning language agent for rapid task adaptation and generalization. Preprint, arXiv:2310.10134. Mihir Parmar, Xin Liu, Palash Goyal, Yanfei Chen, Long Le, Swaroop Mishra, Hossein Mobahi, Jindong Gu, Zifeng Wang, Hootan Nakhost, Chitta Baral, Chen-Yu Lee, Tomas Pfister, and Hamid Palangi

[4]

Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht

PlanGen: A multi-agent framework for generating planning and reasoning trajectories for complex problem solving. Preprint, arXiv:2502.16111. Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. 2021. ALFWorld: Aligning Text and Embodied Environments for Interactive Learning. In Proceedings of the Internat...

[5]

Karthik Valmeekam, Sarath Sreedharan, Matthew Marquez, Alberto Olmo, and Subbarao Kambhampati

PlanBench: An extensible benchmark for evaluating large language models on planning and reasoning about change. Advances in Neural Information Processing Systems, 36. Karthik Valmeekam, Sarath Sreedharan, Matthew Marquez, Alberto Olmo, and Subbarao Kambhampati

[6]

Solver failed to produce a valid plan

On the planning abilities of large language models (a critical investigation with a proposed benchmark). Preprint, arXiv:2302.06706. Ruoyao Wang, Graham Todd, Eric Yuan, Ziang Xiao, Marc-Alexandre Côté, and Peter Jansen. 2023. Bytesized32: A corpus and challenge task for generating task-specific world models expressed as text games. Preprint, arXiv:2305....

[7]

You are in the backyard

move south EXEC RESULT OK, patio door opened, revealing backyard, agent in backyard (= patio) Observation 2 You open the patio door, revealing the backyard. You are in the backyard. Through an open patio door, to the North you see the kitchen. To the South you see the driveway. To the East you see the street. To the West you see a closed patio door. DF 2 ...

[8]

No door to open

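| LLM | Framework | trial_count | succeed_count | success_rate | total_solver_errors | total_solver_fixed |
|---|---|---|---|---|---|---|
| DeepSeek-R1 | PlanGen | 100 | 55 | 55% | – | – |
| DeepSeek-R1 | PDDLego | 100 | 5 | 5% | – | – |
| DeepSeek-R1 | PDDLego+ | 100 | 41 | 41% | 177 | 134 |
| GPT-4.1 | PlanGen | 100 | 94 | 94% | – | – |
| GPT-4.1 | PDDLego | 100 | 12 | 12% | – | – |
| GPT-4.1 | PDDLego+ | 100 | 48 | 48% | 114 | 67 |
| o3-mini | PlanGen | 100 | 52 | 52% | – | – |
| o3-mini | PDDLego | 100 | 49 | 49% | – | – |
| o3-mini | PDDLego+ | 100 | 86 | 86% | 43 | 32 |

solver_error_fix… (truncated)
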
[9]

(Error fixed!) Observation 3 You are in the driveway

move south EXEC RESULT OK, moved to driveway. (Error fixed!) Observation 3 You are in the driveway. To the North you see the backyard. DF 3 (unchanged) PF 3 (:init (at driveway) (no-door backyard driveway south) (no-door backyard street east) **(no-door driveway backyard north)** ) (:goal (at street) ) SOLVER PLAN

[10]

Three planning frameworks per LLM; dashes mark metrics that are not defined for that framework

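| LLM | Framework | trial_count | succeed_count | success_rate | total_solver_errors | total_solver_fixed |
|---|---|---|---|---|---|---|
| DeepSeek-R1 | PlanGen | 100 | 17 | 17% | – | – |
| DeepSeek-R1 | PDDLego | 100 | 4 | 4% | – | – |
| DeepSeek-R1 | PDDLego+ | 100 | 19 | 19% | 359 | 292 |
| GPT-4.1 | PlanGen | 100 | 5 | 5% | – | – |
| GPT-4.1 | PDDLego | 100 | 1 | 1% | – | – |
| GPT-4.1 | PDDLego+ | 100 | 7 | 7% | 181 | 104 |
| o3-mini | PlanGen | 100 | 5 | 5% | – | – |
| o3-mini | PDDLego | 100 | 3 | 3% | – | – |
| o3-mini | PDDLego+ | 100 | 38 | 38% | 296 | 266 |

solver_error_fix_rate… (truncated)
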
[11]

I’m not sure what you mean

move north EXEC RESULT OK, moved to the supermarket and saw a coin there! Task accomplished. SUMMARY Errors encountered and fixed Missing (no-door ...) relation -> added after sim-error All subsequent plans executed without error Final state DF: full action model with door-exists / door DeepSeek-R1 GPT-4.1 Metric simple detailed simple detailed trial_coun...

[13]

df": "

:action move :parameters (?from - location ?to - location ?dir - direction) You should have a goal in the problem file like this: (:goal (at ?location) ) where location should be somewhere not visited Note: in problem file's init, you shouldn't have "not ()" but only the single status E.2: CoinCollector – Detailed PDDLego+ Prompt (after generating PF and DF) ...

[15]

df": "

:action move :parameters (?from - location ?to - location ?dir - direction) You should have a goal in the problem file like this: (:goal (at ?location) ) where location should be somewhere not visited Note: in problem file's init, you shouldn't have "not ()" but only the single status This is previous domain file: (define (domain exploration) (:requiremen...

[17]

df": "

:action move :parameters (?from - location ?to - location ?dir - direction) Note: in problem file's init, you shouldn't have "not ()" but only the single status E.4: CoinCollector – Simple PDDLego+ Prompt (after generating PF and DF) Please provide the output in strict JSON format, without any additional text or explanation, including a PDDL domain file as 'df'...

[18]

:action open-door :parameters (?loc1 - location ?loc2 - location ?dir - direction)

[19]

No additional memory available

:action move :parameters (?from - location ?to - location ?dir - direction) Note: in problem file's init, you shouldn't have "not ()" but only the single status This is previous domain file: (define (domain exploration) (:requirements :strips) (:types location direction) (:predicates (at ?loc - location) (connected ?loc1 - location ?loc2 - location ?dir -...

[35]

For example, if you need to slice and then heat an object, first focus on slicing it, and then move on to heating it

If there are multiple actions needed to complete the task, you can break them down into smaller subgoals. For example, if you need to slice and then heat an object, first focus on slicing it, and then move on to heating it. In summary, the first stage is all about finding the object. This might involve going to an unvisited receptacle and opening it if ne...

[38]

df": "

Do not enter stage 2 when not finishing stage 1. Note: Always include :negative-preconditions in your :requirements whenever you use (not) or delete effects, and never leave an :precondition or :effect block empty, either omit it or include at least one literal. E.7: ALFWorld – Detailed Prompt (after generating PF and DF) Please provide the output in strict...

[46]

clean an object using a receptacle :action CleanObject :parameters (?o - object ?r - sinkbasinReceptacle)

[48]

The process involves two main stages:

slice an object using a sharp object :action SliceObject :parameters (?r - receptacle ?co - object ?sharp_o - sharpObject) You must go to a receptacle first in order to use/open it or take/put objects from/on it. The process involves two main stages:

[49]

Some receptacles cannot be opened so you can directly see what objects after you go to that receptacle

Always searching for the aim Object first!!! In this stage, your goal is to go to and may need to open new, unvisited recepatacles until you find the object mentioned in the task. Some receptacles cannot be opened so you can directly see what objects after you go to that receptacle. You can only use the GotoLocation action to travel to a new location and ...

[50]

Remember your goal is Your task is to: put some cloth on bathtubbasin

After you seeing the aim object in any receptacle, using the Object to Complete the Task: After you have located the object (the object may have some numbers added), you should always first pick up the object from that receptacle and update your goal to focus on how the object is used to complete the task. Remember your goal is Your task is to: put some c...

[51]

For example, go to fridge, then cool the object with receptacle

If you want to heat, clean, and cool an object, after you go to that aim receptacle, do not put the object in the receptacle but do the action directly. For example, go to fridge, then cool the object with receptacle

[52]

Don't forget to put the sharp object back to the receptacle after you finish slicing

If you want to slice an object, you should first go to the receptacle where both the sharp object and the aim object are located and ONLY pick up the sharp object then do the slice action. Don't forget to put the sharp object back to the receptacle after you finish slicing

[53]

You don't need to take the lamp but directly use it

If you want to examine or look at an object with a lamp, you should first go to the receptacle where the object is located and then pick it up and take the USE action of the lamp. You don't need to take the lamp but directly use it

[54]

For example, if you need to slice and then heat an object, first focus on slicing it, and then move on to heating it

If there are multiple actions needed to complete the task, you can break them down into smaller subgoals. For example, if you need to slice and then heat an object, first focus on slicing it, and then move on to heating it. In summary, the first stage is all about finding the object, this might involve going to an unvisited receptacle and opening it if ne...

[57]

df": "

Do not enter stage 2 when not finishing stage 1. Note: Always include :negative-preconditions in your :requirements whenever you use (not) or delete effects, and never leave an :precondition or :effect block empty, either omit it or include at least one literal. This is previous domain file: (define (domain room_env) (:requirements :strips :typing :negat...

[61]

take an object from another receptacle :action PickupObject :parameters (?o - object ?r - receptacle)

[62]

put object into/on/in another receptacle :action PutObject :parameters (?o - object ?r - receptacle)

[63]

using an object/receptacle by turning it on/off with a switch :action useObject :parameters (?o - object)

[65]

clean an object using a receptacle :action CleanObject :parameters (?o - object ?r - receptacle)

[67]

Goal 1.2: If the receptacle is closed, use the OpenObject action to reveal its contents

slice an object using a sharp object :action SliceObject :parameters (?r - receptacle ?co - object ?sharp_o - object) Your process involves two main stages with the following subgoals: Stage 1: Search for the Target Object Goal 1.1: Move to a new, unvisited receptacle using the GotoLocation action. Goal 1.2: If the receptacle is closed, use the OpenObjec...

[70]

df": "

Do not enter stage 2 when not finishing stage 1. Note: Always include :negative-preconditions in your :requirements whenever you use (not) or delete effects, and never leave an :precondition or :effect block empty, either omit it or include at least one literal. E.9: ALFWorld – Simple Prompt (after generating PF and DF) Please provide the output in strict J...

[71]

go to a receptacle :action GotoLocation :parameters (?from - receptacle ?to - receptacle)

[72]

open a receptacle if it is closed :action OpenObject :parameters (?r - receptacle)

[73]

close a receptacle :action CloseObject :parameters (?r - receptacle)

[74]

take an object from another receptacle :action PickupObject :parameters (?o - object ?r - receptacle)

[75]

put object into/on/in another receptacle :action PutObject :parameters (?o - object ?r - receptacle)

[76]

using an object/receptacle by turning it on/off with a switch :action useObject :parameters (?o - object)

[77]

heat an object using a receptacle :action HeatObject :parameters (?o - object ?r - microwaveReceptacle)

[78]

clean an object using a receptacle :action CleanObject :parameters (?o - object ?r - receptacle)

[79]

cool an object using a receptacle :action CoolObject :parameters (?o - object ?r - fridgeReceptacle)

[80]

Goal 1.2: If the receptacle is closed, use the OpenObject action to reveal its contents

slice an object using a sharp object :action SliceObject :parameters (?r - receptacle ?co - object ?sharp_o - object) Your process involves two main stages with the following subgoals: Stage 1: Search for the Target Object Goal 1.1: Move to a new, unvisited receptacle using the GotoLocation action. Goal 1.2: If the receptacle is closed, use the OpenObject...

[83]

Do not enter stage 2 when not finishing stage 1. Note: Always include :negative-preconditions in your :requirements whenever you use (not) or delete effects, and never leave an :precondition or :effect block empty, either omit it or include at least one literal. This is previous domain file: (define (domain cleaning_task) (:requirements :strips :typing :...

[84]

Some receptacles cannot be opened so you can directly see what objects after you go to that receptacle

Always searching for the aim Object first!!! In this stage, your goal is to go to and may need to open new, unvisited recepatacles until you find the object mentioned in the task. Some receptacles cannot be opened so you can directly see what objects after you go to that receptacle. You can only use the GotoLocation action to travel to a new location and ...

[85]

This may involve more than simply transferring it from one place to another

Using the Object to Complete the Task: Once you have located and picked up the object, update your goal to focus on how the object is used to complete the task. This may involve more than simply transferring it from one place to another. For example: You might examine the object or a nearby receptacle to gather information. You may need to use another too...

[86]

towelholder1

some receptacles have numbers in their names. Always keep them as they are. For example, "towelholder1" should not be changed to "towelholder"

[87]

Your initial goal should always be to go to a new location instead of put something into somewhere

[88]

actions": [

Do not enter stage 2 when not finishing stage 1. Memory of past steps: Action: look around You are in the middle of a room. Looking quickly around you, you see a armchair 2, a armchair 1, a coffeetable 2, a coffeetable 1, a diningtable 1, a garbagecan 1, a sidetable 2, a sidetable 1, and a sofa 1. If there are errors or obstacles, here is the message: No ...

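The derived columns of the two results tables excerpted in entries [8] and [10] can be rechecked from the raw counts. The sketch below assumes that the truncated solver_error_fix_rate row means total_solver_fixed / total_solver_errors; this is an editorial reading that the excerpts do not state. The dictionary layout and the function names are hypothetical.

```python
TRIALS = 100  # trial_count is 100 for every configuration in both tables

# succeed_count per LLM as (PlanGen, PDDLego, PDDLego+)
SUCCESS = {
    "entry_8": {"DeepSeek-R1": (55, 5, 41), "GPT-4.1": (94, 12, 48), "o3-mini": (52, 49, 86)},
    "entry_10": {"DeepSeek-R1": (17, 4, 19), "GPT-4.1": (5, 1, 7), "o3-mini": (5, 3, 38)},
}
# PDDLego+ only, as (total_solver_errors, total_solver_fixed)
SOLVER = {
    "entry_8": {"DeepSeek-R1": (177, 134), "GPT-4.1": (114, 67), "o3-mini": (43, 32)},
    "entry_10": {"DeepSeek-R1": (359, 292), "GPT-4.1": (181, 104), "o3-mini": (296, 266)},
}


def solver_fix_rate(table):
    """Assumed definition: fraction of solver errors that the refinement loop repaired."""
    return {llm: round(fixed / errors, 3) for llm, (errors, fixed) in SOLVER[table].items()}


def gain_over_plangen(table):
    """Success-rate difference of PDDLego+ over PlanGen, in percentage points."""
    return {llm: (c[2] - c[0]) * 100 / TRIALS for llm, c in SUCCESS[table].items()}
```

Every PDDLego+ minus PlanGen gap in entry [10] is positive (2, 2 and 33 points), which is the pattern Figure 6 reports for ALFWorld. In entry [8], only o3-mini shows a positive gap. The excerpts themselves do not say which simulator each table covers.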
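Several prompt excerpts above impose two syntactic rules on generated domain files (entries [38], [57], [70] and [83]). A rough regex-based check for these rules might look like the sketch below. `lint_domain` is a hypothetical helper, and its regexes handle only simple layouts like the ones shown. The prompt rule is stricter than PDDL itself: STRIPS delete effects written as `(not ...)` inside an `:effect` do not require `:negative-preconditions`. The check nonetheless follows the prompt.

```python
import re
from typing import List


def lint_domain(domain: str) -> List[str]:
    """Check a PDDL domain string against the two rules stated in the prompt note."""
    problems = []
    req = re.search(r"\(:requirements([^)]*)\)", domain)
    declared = req.group(1).split() if req else []
    if re.search(r"\(\s*not\b", domain) and ":negative-preconditions" not in declared:
        problems.append("uses (not ...) without :negative-preconditions")
    for key in (":precondition", ":effect"):
        # An empty block is written as "()" or "(and)".
        if re.search(key + r"\s*\(\s*(?:and\s*)?\)", domain):
            problems.append(f"empty {key} block")
    return problems
```

A check like this can run before the solver is called, so that trivially malformed domains are sent back for refinement without spending a solver call.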
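Entry [86] asks that numbered receptacle names be kept intact, and entry [88] shows the raw form in which these names arrive ("a armchair 2, a armchair 1, ..."). The sketch below turns such a look-around sentence into PDDL object names. It assumes, as the "towelholder1" example suggests, that the number is appended to the name without a space. `receptacles_from_look` is hypothetical.

```python
import re
from typing import List


def receptacles_from_look(text: str) -> List[str]:
    """Turn 'you see a armchair 2, ..., and a sofa 1.' into names like 'armchair2'."""
    m = re.search(r"you see (.*?)\.", text)
    if m is None:
        return []
    names = []
    for item in re.split(r",\s*(?:and\s+)?|\s+and\s+", m.group(1)):
        item = re.sub(r"^(?:a|an|the)\s+", "", item.strip())
        hit = re.fullmatch(r"([a-z]+)\s*(\d+)", item)
        if hit:  # keep the numeric suffix: "towelholder 1" -> "towelholder1"
            names.append(hit.group(1) + hit.group(2))
    return names
```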
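The ALFWorld prompts above (for example, entries [67] and [80]) describe a two-stage process. First, the agent searches for the target object by visiting unvisited receptacles and opening them if needed. Only then does it pick the object up and use it. A compact state-machine sketch of that ordering follows. The `SearchState` fields, the goal strings and `next_subgoal` are hypothetical encodings, not the paper's PDDL.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SearchState:
    target: str
    visited: List[str] = field(default_factory=list)  # receptacles in visiting order
    closed: Set[str] = field(default_factory=set)     # visited receptacles still closed
    found_in: str = ""                                # where the target was seen, if anywhere
    holding: bool = False


def next_subgoal(s: SearchState, receptacles: List[str]) -> str:
    """Pick the next subgoal; stage 2 is never entered before stage 1 has found the target."""
    if not s.found_in:  # Stage 1: search
        here = s.visited[-1] if s.visited else None
        if here in s.closed:
            return f"(opened {here})"  # Goal 1.2: open the closed receptacle
        for r in receptacles:
            if r not in s.visited:
                return f"(at {r})"  # Goal 1.1: go to a new, unvisited receptacle
        return "(search-exhausted)"
    if not s.holding:  # Stage 2: pick the object up before using it
        return f"(holding {s.target})"
    return f"(used {s.target})"  # placeholder for the task-specific final goal
```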

[1] [1]

Bill Yuchen Lin, Yicheng Fu, Karina Yang, Faeze Brah- man, Shiyu Huang, Chandra Bhagavatula, Prithviraj Ammanabrolu, Yejin Choi, and Xiang Ren

Embodied agent interface: Benchmarking llms for embodied decision making.arXiv preprint arXiv:2410.07166. Bill Yuchen Lin, Yicheng Fu, Karina Yang, Faeze Brah- man, Shiyu Huang, Chandra Bhagavatula, Prithviraj Ammanabrolu, Yejin Choi, and Xiang Ren. 2023. Swiftsage: A generative agent with fast and slow thinking for complex interactive tasks.Advances in N...


[2] [2]

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

LLM+P: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477. Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki, and Chris Callison-Burch. 2023. Faithful chain-of-thought reasoning. In Proceedings of the 13th International Joint Conference on Natural Language Proces...


[3] [3]

CLIN: A continually learning language agent for rapid task adaptation and generalization. Preprint, arXiv:2310.10134. Mihir Parmar, Xin Liu, Palash Goyal, Yanfei Chen, Long Le, Swaroop Mishra, Hossein Mobahi, Jindong Gu, Zifeng Wang, Hootan Nakhost, Chitta Baral, Chen-Yu Lee, Tomas Pfister, and Hamid Palangi


[4] [4]

Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht

PlanGen: A multi-agent framework for generating planning and reasoning trajectories for complex problem solving. Preprint, arXiv:2502.16111. Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. 2021. ALFWorld: Aligning Text and Embodied Environments for Interactive Learning. In Proceedings of the Internat...


[5] [5]

Karthik Valmeekam, Sarath Sreedharan, Matthew Marquez, Alberto Olmo, and Subbarao Kambhampati

PlanBench: An extensible benchmark for evaluating large language models on planning and reasoning about change. Advances in Neural Information Processing Systems, 36. Karthik Valmeekam, Sarath Sreedharan, Matthew Marquez, Alberto Olmo, and Subbarao Kambhampati


[6] [6]

Solver failed to produce a valid plan

On the planning abilities of large language models (a critical investigation with a proposed benchmark). Preprint, arXiv:2302.06706. Ruoyao Wang, Graham Todd, Eric Yuan, Ziang Xiao, Marc-Alexandre Côté, and Peter Jansen. 2023. ByteSized32: A corpus and challenge task for generating task-specific world models expressed as text games. Preprint, arXiv:2305....


[7] [7]

You are in the backyard

move south
EXEC RESULT: OK, patio door opened, revealing backyard, agent in backyard (= patio)
Observation 2: You open the patio door, revealing the backyard. You are in the backyard. Through an open patio door, to the North you see the kitchen. To the South you see the driveway. To the East you see the street. To the West you see a closed patio door.
DF 2 ...


[8] [8]

No door to open

move south

Metric               DeepSeek-R1                 GPT-4.1                     o3-mini
                     PlanGen  PDDLego  PDDLego+  PlanGen  PDDLego  PDDLego+  PlanGen  PDDLego  PDDLego+
trial_count          100      100      100       100      100      100       100      100      100
succeed_count        55       5        41        94       12       48        52       49       86
success_rate         55%      5%       41%       94%      12%      48%       52%      49%      86%
total_solver_errors  –        –        177       –        –        114       –        –        43
total_solver_fixed   –        –        134       –        –        67        –        –        32
solver_error_fix...
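Because every cell in the table above uses trial_count 100, each success_rate is simply succeed_count read as a percentage. The following Python sketch reproduces that arithmetic from the listed counts. The solver_error_fix_rate row is cut off here, so its definition below (total_solver_fixed / total_solver_errors) is an assumption, and the helper names are hypothetical.

```python
# Counts copied from the table above: {(LLM, framework): succeed_count}; trial_count is 100 in every cell.
SUCCEED = {
    ("DeepSeek-R1", "PlanGen"): 55, ("DeepSeek-R1", "PDDLego"): 5, ("DeepSeek-R1", "PDDLego+"): 41,
    ("GPT-4.1", "PlanGen"): 94, ("GPT-4.1", "PDDLego"): 12, ("GPT-4.1", "PDDLego+"): 48,
    ("o3-mini", "PlanGen"): 52, ("o3-mini", "PDDLego"): 49, ("o3-mini", "PDDLego+"): 86,
}
# Solver statistics exist only for PDDLego+: {LLM: (total_solver_errors, total_solver_fixed)}.
SOLVER = {"DeepSeek-R1": (177, 134), "GPT-4.1": (114, 67), "o3-mini": (43, 32)}
TRIALS = 100


def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def success_rate(llm: str, framework: str) -> float:
    return percent(SUCCEED[(llm, framework)], TRIALS)


def fix_rate(llm: str) -> float:
    # Assumption: solver_error_fix_rate = total_solver_fixed / total_solver_errors.
    errors, fixed = SOLVER[llm]
    return percent(fixed, errors)


for llm, framework in SUCCEED:
    print(f"{llm:12} {framework:9} success_rate={success_rate(llm, framework)}%")
for llm in SOLVER:
    print(f"{llm:12} PDDLego+  solver_error_fix_rate={fix_rate(llm)}% (assumed definition)")
```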


[9] [9]

(Error fixed!) Observation 3 You are in the driveway

move south
EXEC RESULT: OK, moved to driveway. (Error fixed!)
Observation 3: You are in the driveway. To the North you see the backyard.
DF 3 (unchanged)
PF 3
(:init
  (at driveway)
  (no-door backyard driveway south)
  (no-door backyard street east)
  (no-door driveway backyard north)   ; newly added
)
(:goal (at street))
SOLVER PLAN
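PF 3 shows the problem file growing with each observation: the (no-door driveway backyard north) fact exposed by the simulator error is now part of :init. A minimal Python sketch of rendering such a problem file from observed facts follows. It assumes facts are stored as tuples and that the domain is named exploration, the name used in the "previous domain file" excerpts of this appendix. The helper build_problem and the :objects block are illustrative assumptions, since the trace itself shows only :init and :goal.

```python
Fact = tuple[str, ...]


def build_problem(name: str, domain: str, objects: dict[str, list[str]],
                  init: list[Fact], goal: Fact) -> str:
    """Render a PDDL problem file from typed objects, initial facts, and one goal fact (hypothetical helper)."""
    lines = [f"(define (problem {name})", f"  (:domain {domain})", "  (:objects"]
    for typ, names in objects.items():
        lines.append(f"    {' '.join(names)} - {typ}")
    lines += ["  )", "  (:init"]
    lines += [f"    ({' '.join(fact)})" for fact in init]
    lines += ["  )", f"  (:goal ({' '.join(goal)}))", ")"]
    return "\n".join(lines)


# Facts from PF 3; the last no-door fact is the one added after the simulator error.
pf3 = build_problem(
    name="pf3",
    domain="exploration",
    objects={"location": ["backyard", "driveway", "street"],
             "direction": ["north", "south", "east"]},
    init=[("at", "driveway"),
          ("no-door", "backyard", "driveway", "south"),
          ("no-door", "backyard", "street", "east"),
          ("no-door", "driveway", "backyard", "north")],
    goal=("at", "street"),
)
print(pf3)
```

Keeping facts as tuples rather than strings makes it easy to add or retract a single fact between episodes before re-rendering the file.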


[10] [10]

Three planning frameworks per LLM; dashes mark metrics that are not defined for that framework

move east

Metric               DeepSeek-R1                 GPT-4.1                     o3-mini
                     PlanGen  PDDLego  PDDLego+  PlanGen  PDDLego  PDDLego+  PlanGen  PDDLego  PDDLego+
trial_count          100      100      100       100      100      100       100      100      100
succeed_count        17       4        19        5        1        7         5        3        38
success_rate         17%      4%       19%       5%       1%       7%        5%       3%       38%
total_solver_errors  –        –        359       –        –        181       –        –        296
total_solver_fixed   –        –        292       –        –        104       –        –        266
solver_error_fix_rate  – ...


[11] [11]

I’m not sure what you mean

move north
EXEC RESULT: OK, moved to the supermarket and saw a coin there! Task accomplished.
SUMMARY
Errors encountered and fixed: Missing (no-door ...) relation -> added after sim-error
All subsequent plans executed without error
Final state DF: full action model with door-exists / door
DeepSeek-R1 GPT-4.1 Metric simple detailed simple detailed trial_coun...


[12] [13]

df": "

:action move :parameters (?from - location ?to - location ?dir - direction)
You should have a goal in the problem file like this: (:goal (at ?location) ) where location should be somewhere not visited
Note: in problem file's init, you shouldn't have "not ()" but only the single status
E.2: CoinCollector – Detailed PDDLego+ Prompt (after generating PF and DF) ...
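The two rules in this prompt are mechanical enough to check before calling a solver: the goal must have the form (:goal (at ?location)) and point at an unvisited location, and :init must not contain negated facts. A Python sketch of such a check follows. It assumes the problem file is a string with :init before :goal and that the agent tracks visited locations itself. check_problem is a hypothetical helper, and a regular expression is only a rough stand-in for a real PDDL parser.

```python
import re


def check_problem(pf: str, visited: set[str]) -> list[str]:
    """Return rule violations for a CoinCollector-style problem file (an empty list if none)."""
    issues = []
    init = re.search(r"\(:init(.*?)\(:goal", pf, re.S)
    if init is None:
        issues.append("no :init block before :goal")
    elif re.search(r"\(\s*not\b", init.group(1)):
        issues.append("negated fact in :init")
    goal = re.search(r"\(:goal\s*\(at\s+([\w-]+)\s*\)\s*\)", pf)
    if goal is None:
        issues.append("goal is not of the form (:goal (at ?location))")
    elif goal.group(1) in visited:
        issues.append(f"goal location {goal.group(1)} was already visited")
    return issues


ok = "(define (problem p) (:domain exploration) (:init (at kitchen)) (:goal (at street)))"
bad = ("(define (problem p) (:domain exploration) "
       "(:init (at kitchen) (not (at street))) (:goal (at kitchen)))")
print(check_problem(ok, {"kitchen"}))
print(check_problem(bad, {"kitchen"}))
```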


[13] [15]

df": "

:action move :parameters (?from - location ?to - location ?dir - direction)
You should have a goal in the problem file like this: (:goal (at ?location) ) where location should be somewhere not visited
Note: in problem file's init, you shouldn't have "not ()" but only the single status
This is previous domain file: (define (domain exploration) (:requiremen...


[14] [17]

df": "

:action move :parameters (?from - location ?to - location ?dir - direction)
Note: in problem file's init, you shouldn't have "not ()" but only the single status
E.4: CoinCollector – Simple PDDLego+ Prompt (after generating PF and DF)
Please provide the output in strict JSON format, without any additional text or explanation, including a PDDL domain file as 'df'...
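Because the prompt asks for strict JSON with the domain file under 'df', the reply can be parsed directly. Any extra prose around the JSON then surfaces as a decode error instead of a silently malformed domain. The Python sketch below assumes the problem file arrives under a 'pf' key, since the excerpt is cut off after 'df'. parse_reply is a hypothetical name.

```python
import json


def parse_reply(raw: str) -> tuple[str, str]:
    """Split a strict-JSON model reply into (domain file, problem file)."""
    data = json.loads(raw)  # raises json.JSONDecodeError if the reply has text around the JSON
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    missing = {"df", "pf"} - set(data)  # assumption: the problem file is stored under 'pf'
    if missing:
        raise KeyError(f"reply is missing {sorted(missing)}")
    return data["df"], data["pf"]


reply = '{"df": "(define (domain exploration) ...)", "pf": "(define (problem p) ...)"}'
df, pf = parse_reply(reply)
print(df)
print(pf)
```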


[15] [18]

:action open-door :parameters (?loc1 - location ?loc2 - location ?dir - direction)
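The open-door action is what separates CoinCollector from plain graph search: a move across a closed door fails until the door has been opened, as in the trace where the patio door is opened before the agent reaches the backyard. The toy Python simulator below illustrates that precondition. It assumes doors are undirected, so opening one side opens both, and that the map is keyed by (location, direction). DoorWorld and its methods are hypothetical and are not the environment's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DoorWorld:
    """Toy CoinCollector-like state: agent position, the map, and which doors are still closed."""
    at: str
    edges: dict[tuple[str, str], str]                          # (location, direction) -> neighbour
    closed: set[frozenset[str]] = field(default_factory=set)  # each door joins two locations

    def _check(self, loc1: str, loc2: str, direction: str) -> frozenset[str]:
        if self.at != loc1 or self.edges.get((loc1, direction)) != loc2:
            raise ValueError(f"{loc2} is not {direction} of the agent")
        return frozenset({loc1, loc2})

    def open_door(self, loc1: str, loc2: str, direction: str) -> None:
        door = self._check(loc1, loc2, direction)
        if door not in self.closed:
            raise ValueError("No door to open")
        self.closed.discard(door)

    def move(self, loc1: str, loc2: str, direction: str) -> None:
        if self._check(loc1, loc2, direction) in self.closed:
            raise ValueError("The door is closed")
        self.at = loc2


world = DoorWorld(
    at="kitchen",
    edges={("kitchen", "south"): "backyard", ("backyard", "north"): "kitchen"},
    closed={frozenset({"kitchen", "backyard"})},
)
world.open_door("kitchen", "backyard", "south")
world.move("kitchen", "backyard", "south")
print(world.at)  # backyard
```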


[16] [19]

No additional memory available

:action move :parameters (?from - location ?to - location ?dir - direction)
Note: in problem file's init, you shouldn't have "not ()" but only the single status
This is previous domain file: (define (domain exploration) (:requirements :strips) (:types location direction) (:predicates (at ?loc - location) (connected ?loc1 - location ?loc2 - location ?dir -...
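With the exploration domain's (at ?loc) and (connected ?loc1 ?loc2 ?dir) predicates, finding a route is a shortest-path problem, which is the part delegated to the solver. To make that role concrete, the Python sketch below swaps in plain breadth-first search for a real PDDL planner. It handles only the move action over connected facts and ignores doors. bfs_plan is a hypothetical name.

```python
from __future__ import annotations

from collections import deque


def bfs_plan(start: str, goal: str, connected: list[tuple[str, str, str]]) -> list[str] | None:
    """Shortest list of (move ?from ?to ?dir) steps over connected facts, or None if unreachable."""
    successors: dict[str, list[tuple[str, str]]] = {}
    for loc1, loc2, direction in connected:
        successors.setdefault(loc1, []).append((loc2, direction))
    parent: dict[str, tuple[str, str] | None] = {start: None}
    frontier = deque([start])
    while frontier:
        loc = frontier.popleft()
        if loc == goal:
            plan = []
            while parent[loc] is not None:  # walk back to the start, collecting moves
                prev, direction = parent[loc]
                plan.append(f"(move {prev} {loc} {direction})")
                loc = prev
            return plan[::-1]
        for nxt, direction in successors.get(loc, []):
            if nxt not in parent:
                parent[nxt] = (loc, direction)
                frontier.append(nxt)
    return None


connected = [("kitchen", "backyard", "south"), ("backyard", "kitchen", "north"),
             ("backyard", "driveway", "south"), ("backyard", "street", "east")]
print(bfs_plan("kitchen", "street", connected))
```

A real planner generalizes this search to arbitrary action schemas, which is why the framework only needs the LLM to produce the domain and problem files.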


[17] [35]

For example, if you need to slice and then heat an object, first focus on slicing it, and then move on to heating it

If there are multiple actions needed to complete the task, you can break them down into smaller subgoals. For example, if you need to slice and then heat an object, first focus on slicing it, and then move on to heating it. In summary, the first stage is all about finding the object. This might involve going to an unvisited receptacle and opening it if ne...


[18] [38]

df": "

Do not enter stage 2 when not finishing stage 1.
Note: Always include :negative preconditions in your :requirements whenever you use (not) or delete effects, and never leave an :precondition or :effect block empty, either omit it or include at least one literal.
E.7: ALFWorld – Detailed Prompt (after generating PF and DF)
Please provide the output in strict...
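The note above targets two properties of a generated domain file that can also be checked mechanically before the file reaches the solver. The Python sketch below uses regular expressions as a rough stand-in for a PDDL parser. Note that the PDDL requirement flag is spelled :negative-preconditions, with a hyphen. Both helper names are hypothetical.

```python
import re

NEG = ":negative-preconditions"


def add_negative_preconditions(domain: str) -> str:
    """Insert :negative-preconditions into :requirements if the domain uses (not ...) and lacks it."""
    uses_not = re.search(r"\(\s*not\b", domain) is not None
    if uses_not and NEG not in domain:
        # Assumes a (:requirements ...) block already exists; otherwise the text is returned unchanged.
        domain = domain.replace("(:requirements", f"(:requirements {NEG}", 1)
    return domain


def empty_blocks(domain: str) -> list[str]:
    """List :precondition or :effect keywords that are followed by an empty '()' block."""
    return re.findall(r"(:precondition|:effect)\s*\(\s*\)", domain)


domain = ("(define (domain room_env) (:requirements :strips :typing) "
          "(:action CloseObject :parameters (?r - receptacle) "
          ":precondition (opened ?r) :effect (not (opened ?r))))")
fixed = add_negative_preconditions(domain)
print(fixed)
print(empty_blocks("(:action Noop :parameters () :precondition () :effect (done))"))
```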


[19] [46]

clean an object using a receptacle :action CleanObject :parameters (?o - object ?r - sinkbasinReceptacle)


[20] [48]

The process involves two main stages:

slice an object using a sharp object :action SliceObject :parameters (?r - receptacle ?co - object ?sharp_o - sharpObject)
You must go to a receptacle first in order to use/open it or take/put objects from/on it. The process involves two main stages:


[21] [49]

Some receptacles cannot be opened so you can directly see what objects after you go to that receptacle

Always searching for the aim Object first!!! In this stage, your goal is to go to and may need to open new, unvisited receptacles until you find the object mentioned in the task. Some receptacles cannot be opened so you can directly see what objects after you go to that receptacle. You can only use the GotoLocation action to travel to a new location and ...


[22] [50]

Remember your goal is Your task is to: put some cloth on bathtubbasin

After you seeing the aim object in any receptacle, using the Object to Complete the Task: After you have located the object (the object may have some numbers added), you should always first pick up the object from that receptacle and update your goal to focus on how the object is used to complete the task. Remember your goal is Your task is to: put some c...


[23] [51]

For example, go to fridge, then cool the object with receptacle

If you want to heat, clean, and cool an object, after you go to that aim receptacle, do not put the object in the receptacle but do the action directly. For example, go to fridge, then cool the object with receptacle
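Heating, cooling, and cleaning are each a single action applied at the right receptacle, with no PutObject step. The Python sketch below pairs each verb with the action and receptacle type from the signatures listed in these prompts: HeatObject with microwaveReceptacle, CoolObject with fridgeReceptacle, and CleanObject with sinkbasinReceptacle in one prompt variant. treatment_plan and the object names are hypothetical.

```python
# verb -> (PDDL action, receptacle type required by its signature in these prompts)
TREATMENTS = {
    "heat": ("HeatObject", "microwaveReceptacle"),
    "cool": ("CoolObject", "fridgeReceptacle"),
    "clean": ("CleanObject", "sinkbasinReceptacle"),
}


def treatment_plan(verb: str, obj: str, here: str, receptacle: str, receptacle_type: str) -> list[str]:
    """Go to the receptacle and apply the action directly to the held object (no PutObject step)."""
    action, required = TREATMENTS[verb]
    if receptacle_type != required:
        raise ValueError(f"{action} needs a {required}, got {receptacle_type}")
    return [f"(GotoLocation {here} {receptacle})", f"({action} {obj} {receptacle})"]


print(treatment_plan("cool", "apple1", "countertop1", "fridge1", "fridgeReceptacle"))
```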


[24] [52]

Don't forget to put the sharp object back to the receptacle after you finish slicing

If you want to slice an object, you should first go to the receptacle where both the sharp object and the aim object are located and ONLY pick up the sharp object then do the slice action. Don't forget to put the sharp object back to the receptacle after you finish slicing


[25] [53]

You don't need to take the lamp but directly use it

If you want to examine or look at an object with a lamp, you should first go to the receptacle where the object is located and then pick it up and take the USE action of the lamp. You don't need to take the lamp but directly use it
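The slicing and lamp instructions each fix a short action order. The Python sketch below writes both orders as ground action lists, using the SliceObject (?r ?co ?sharp_o) argument order and the useObject action from these prompts. The function names and object names are hypothetical.

```python
def slice_plan(here: str, receptacle: str, target: str, sharp: str) -> list[str]:
    """Slice in place: pick up ONLY the sharp object, slice, then put the sharp object back."""
    return [
        f"(GotoLocation {here} {receptacle})",
        f"(PickupObject {sharp} {receptacle})",
        f"(SliceObject {receptacle} {target} {sharp})",
        f"(PutObject {sharp} {receptacle})",
    ]


def examine_with_lamp_plan(here: str, receptacle: str, obj: str, lamp: str) -> list[str]:
    """Pick up the object, then use the lamp where it stands; the lamp itself is never taken."""
    return [
        f"(GotoLocation {here} {receptacle})",
        f"(PickupObject {obj} {receptacle})",
        f"(useObject {lamp})",
    ]


print(slice_plan("sofa1", "countertop1", "bread1", "knife1"))
print(examine_with_lamp_plan("sofa1", "sidetable1", "cd1", "desklamp1"))
```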


[26] [54]

For example, if you need to slice and then heat an object, first focus on slicing it, and then move on to heating it

If there are multiple actions needed to complete the task, you can break them down into smaller subgoals. For example, if you need to slice and then heat an object, first focus on slicing it, and then move on to heating it. In summary, the first stage is all about finding the object, this might involve going to an unvisited receptacle and opening it if ne...


[27] [57]

df": "

Do not enter stage 2 when not finishing stage 1.
Note: Always include :negative preconditions in your :requirements whenever you use (not) or delete effects, and never leave an :precondition or :effect block empty, either omit it or include at least one literal.
This is previous domain file: (define (domain room_env) (:requirements :strips :typing :negat...


[28] [61]

take an object from another receptacle :action PickupObject :parameters (?o - object ?r - receptacle)


[29] [62]

put object into/on/in another receptacle :action PutObject :parameters (?o - object ?r - receptacle)


[30] [63]

using an object/receptacle by turning it on/off with a switch :action useObject :parameters (?o - object)


[31] [65]

clean an object using a receptacle :action CleanObject :parameters (?o - object ?r - receptacle)


[32] [67]

Goal 1.2: If the receptacle is closed, use the OpenObject action to reveal its contents

slice an object using a sharp object :action SliceObject :parameters (?r - receptacle ?co - object ?sharp_o - object)
Your process involves two main stages with the following subgoals:
Stage 1: Search for the Target Object
Goal 1.1: Move to a new, unvisited receptacle using the GotoLocation action.
Goal 1.2: If the receptacle is closed, use the OpenObjec...
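The staged subgoals amount to a small controller. The agent stays in Stage 1 (search) until the target object is in hand. Within Stage 1, it opens the current receptacle if it is closed and otherwise moves to an unvisited one. The Python sketch below covers only Goals 1.1 and 1.2 and the pick-up step described in these prompts, since the remaining goals are cut off here. Stage, current_stage, and stage1_step are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    SEARCH = 1  # Stage 1: find and pick up the target object
    USE = 2     # Stage 2: use the object to complete the task


def current_stage(holding_target: bool) -> Stage:
    # Stage 2 starts only once Stage 1 has ended with the target in hand.
    return Stage.USE if holding_target else Stage.SEARCH


def stage1_step(at: str, receptacles: list[str], visited: set[str], closed: set[str],
                target_seen: tuple[str, str] | None = None) -> str | None:
    """Next Stage 1 action; target_seen is (object, receptacle) once the target has been observed."""
    if target_seen is not None:
        obj, where = target_seen
        if where != at:
            return f"(GotoLocation {at} {where})"
        return f"(PickupObject {obj} {where})"  # always pick the target up first
    if at in closed:
        return f"(OpenObject {at})"              # Goal 1.2: reveal a closed receptacle's contents
    for r in receptacles:
        if r not in visited and r != at:
            return f"(GotoLocation {at} {r})"   # Goal 1.1: go to a new, unvisited receptacle
    return None                                  # every known receptacle has been searched


print(current_stage(False))
print(stage1_step("sofa1", ["cabinet1", "drawer1"], {"sofa1"}, set()))
```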


[33] [70]

df": "

Do not enter stage 2 when not finishing stage 1.
Note: Always include :negative preconditions in your :requirements whenever you use (not) or delete effects, and never leave an :precondition or :effect block empty, either omit it or include at least one literal.
E.9: ALFWorld – Simple Prompt (after generating PF and DF)
Please provide the output in strict J...


[34] [71]

go to a receptacle :action GotoLocation :parameters (?from - receptacle ?to - receptacle)


[35] [72]

open a receptacle if it is closed :action OpenObject :parameters (?r - receptacle)


[36] [73]

close a receptacle :action CloseObject :parameters (?r - receptacle)


[37] [74]

take an object from another receptacle :action PickupObject :parameters (?o - object ?r - receptacle)


[38] [75]

put object into/on/in another receptacle :action PutObject :parameters (?o - object ?r - receptacle)


[39] [76]

using an object/receptacle by turning it on/off with a switch :action useObject :parameters (?o - object)


[40] [77]

heat an object using a receptacle :action HeatObject :parameters (?o - object ?r - microwaveReceptacle)


[41] [78]

clean an object using a receptacle :action CleanObject :parameters (?o - object ?r - receptacle)


[42] [79]

cool an object using a receptacle :action CoolObject :parameters (?o - object ?r - fridgeReceptacle)
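These signatures are regular enough to keep as data and to ground into the action strings a plan contains. The Python sketch below uses the parameter lists shown here, for example CoolObject (?o - object ?r - fridgeReceptacle). ActionSchema is a hypothetical helper that renders only the :parameters header and omits preconditions and effects.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSchema:
    name: str
    params: tuple[tuple[str, str], ...]  # (variable without '?', type)

    def header(self) -> str:
        """Opening of the PDDL action block; :precondition and :effect are left out."""
        args = " ".join(f"?{var} - {typ}" for var, typ in self.params)
        return f"(:action {self.name} :parameters ({args})"

    def ground(self, *objects: str) -> str:
        """Instantiate the action with concrete objects, as a plan step."""
        if len(objects) != len(self.params):
            raise ValueError(f"{self.name} takes {len(self.params)} arguments")
        return f"({self.name} {' '.join(objects)})"


SCHEMAS = {s.name: s for s in [
    ActionSchema("GotoLocation", (("from", "receptacle"), ("to", "receptacle"))),
    ActionSchema("OpenObject", (("r", "receptacle"),)),
    ActionSchema("HeatObject", (("o", "object"), ("r", "microwaveReceptacle"))),
    ActionSchema("CoolObject", (("o", "object"), ("r", "fridgeReceptacle"))),
]}
print(SCHEMAS["CoolObject"].header())
print(SCHEMAS["CoolObject"].ground("apple1", "fridge1"))
```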


[43] [80]

Goal 1.2: If the receptacle is closed, use the OpenObject action to reveal its contents

slice an object using a sharp object :action SliceObject :parameters (?r - receptacle ?co - object ?sharp_o - object)
Your process involves two main stages with the following subgoals:
Stage 1: Search for the Target Object
Goal 1.1: Move to a new, unvisited receptacle using the GotoLocation action.
Goal 1.2: If the receptacle is closed, use the OpenObject...


[44] [83]

Do not enter stage 2 when not finishing stage 1.
Note: Always include :negative preconditions in your :requirements whenever you use (not) or delete effects, and never leave an :precondition or :effect block empty, either omit it or include at least one literal.
This is previous domain file: (define (domain cleaning_task) (:requirements :strips :typing :...


[45] [84]

Some receptacles cannot be opened so you can directly see what objects after you go to that receptacle

Always searching for the aim Object first!!! In this stage, your goal is to go to and may need to open new, unvisited receptacles until you find the object mentioned in the task. Some receptacles cannot be opened so you can directly see what objects after you go to that receptacle. You can only use the GotoLocation action to travel to a new location and ...


[46] [85]

This may involve more than simply transferring it from one place to another

Using the Object to Complete the Task: Once you have located and picked up the object, update your goal to focus on how the object is used to complete the task. This may involve more than simply transferring it from one place to another. For example: You might examine the object or a nearby receptacle to gather information. You may need to use another too...


[47] [86]

towelholder1

some receptacles have numbers in their names. Always keep them as they are. For example, "towelholder1" should not be changed to " towelholder"


[48] [87]

Your initial goal should always be to go to a new location instead of put something into somewhere


[49] [88]

actions": [

Do not enter stage 2 when not finishing stage 1.
Memory of past steps:
Action: look around
You are in the middle of a room. Looking quickly around you, you see a armchair 2, a armchair 1, a coffeetable 2, a coffeetable 1, a diningtable 1, a garbagecan 1, a sidetable 2, a sidetable 1, and a sofa 1.
If there are errors or obstacles, here is the message: No ...
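The look-around observation names receptacles as a type plus an index ("armchair 2"), while the prompts require identifiers such as towelholder1 to be kept exactly as they are. The Python sketch below turns the observation into such identifiers and flags any that a generated PDDL file dropped or renamed. Joining the word and the number without a space is an assumption based on the towelholder1 example, and both function names are hypothetical.

```python
import re


def receptacles_from_look(observation: str) -> list[str]:
    """'you see a armchair 2, ... and a sofa 1' -> ['armchair2', ..., 'sofa1']."""
    return [f"{kind}{num}" for kind, num in re.findall(r"\ban? ([a-z]+) (\d+)\b", observation)]


def renamed(names: list[str], pddl: str) -> list[str]:
    """Names that do not occur verbatim as tokens in the PDDL text (e.g. towelholder1 -> towelholder)."""
    tokens = set(re.findall(r"[A-Za-z][\w-]*", pddl))
    return [n for n in names if n not in tokens]


look = ("You are in the middle of a room. Looking quickly around you, you see a armchair 2, "
        "a armchair 1, a coffeetable 2, a coffeetable 1, a diningtable 1, a garbagecan 1, "
        "a sidetable 2, a sidetable 1, and a sofa 1.")
names = receptacles_from_look(look)
print(names)
print(renamed(["towelholder1", "bathtubbasin1"], "(:objects towelholder bathtubbasin1 - receptacle)"))
```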
