pith. machine review for the scientific record.

arxiv: 2603.08388 · v4 · submitted 2026-03-09 · 💻 cs.AI

Recognition: no theorem link

A Hierarchical Error-Corrective Graph Framework for Autonomous Agents with LLM-Based Action Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 14:44 UTC · model grok-4.3

classification 💻 cs.AI
keywords: autonomous agents · error classification · causal graphs · LLM action generation · strategy selection · failure analysis · hierarchical framework · multi-dimensional metrics

The pith

The HECG framework improves autonomous agents by aligning quantitative metrics with semantic scores, classifying failures into ten error types, and retrieving causal subgraphs for more reliable strategy selection and recovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces the Hierarchical Error-Corrective Graph Framework to strengthen agents that generate actions with large language models during complex multi-step tasks. It targets imprecise strategy choice, vague failure feedback, and limited context use by combining task quality, cost, reward, and semantic scores for selection, by breaking errors into ten specific categories with severity and recoverability details, and by building causal graphs from past states and actions to pull relevant subgraphs. These elements aim to cut negative transfer, supply clear correction guidance, and capture structural task relationships beyond vector similarity. A reader would care because agents often repeat mistakes when they lack structured ways to learn from quantitative and contextual signals together.

Core claim

The HECG framework incorporates MDTS for multi-dimensional alignment between quantitative performance and semantic context, EMC for structured attribution of task failures into ten error types, and CCGR for identifying relevant subgraphs from causal dependencies, enabling more precise strategy selection, root-cause analysis, and improved execution reliability in complex multi-step tasks.

What carries the argument

The Hierarchical Error-Corrective Graph that stores executed actions, states, and transferable strategies as nodes connected by causal dependency edges, operated by MDTS for metric alignment, EMC for ten-type error decomposition, and CCGR for subgraph retrieval.
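The paper leaves the mechanics of graph construction and retrieval unspecified. As a minimal sketch, assuming nodes keyed by executed actions and adjacency lists for causal-dependency edges (all names and records here are hypothetical), CCGR-style retrieval can be read as a depth-limited traversal from nodes that match the current task context:

```python
from collections import deque

# Hypothetical in-memory causal graph: node id -> stored record, plus
# causal-dependency edges as an adjacency list (precondition -> successor).
nodes = {
    "open-cupboard": {"state": "cupboard-open", "strategy": "pull-handle"},
    "pick-mug":      {"state": "mug-in-hand",   "strategy": "grasp-from-side"},
    "pour-coffee":   {"state": "mug-filled",    "strategy": "tilt-slowly"},
    "wipe-table":    {"state": "table-clean",   "strategy": "circular-wipe"},
}
edges = {
    "open-cupboard": ["pick-mug"],
    "pick-mug": ["pour-coffee"],
    "pour-coffee": [],
    "wipe-table": [],
}

def retrieve_subgraph(seed_pred, max_depth=2):
    """Return the causal subgraph reachable from nodes matching the current
    task context, following dependency edges up to max_depth hops."""
    seeds = [n for n in nodes if seed_pred(n, nodes[n])]
    seen, frontier = set(seeds), deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return {n: nodes[n] for n in seen}

# Retrieval keyed on the current subtask mentioning a mug:
sub = retrieve_subgraph(lambda n, rec: "mug" in n or "mug" in rec["state"])
```

The point of the sketch is the contrast with vector similarity: structurally related steps ("pour-coffee") surface because an edge connects them, while an unrelated node ("wipe-table") is excluded even if its text were similar.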

If this is right

  • Strategy selection draws on combined quality, cost, reward, and LLM semantic scores instead of single metrics to lower the chance of negative transfer.
  • Failures receive attribution to one of ten error types with severity, typical actions, and recoverability data for targeted optimization.
  • Context retrieval uses causal edges in graphs to find structurally related past sequences rather than relying only on vector similarity.
  • Overall execution reliability rises in multi-step tasks through better use of historical states, actions, and events.
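The combined scoring itself is not defined in the abstract. A minimal sketch, assuming a linear combination with cost entering negatively and a softmax over candidates (per Figure 2's description of the transition policy); the weights, metric values, and strategy names are placeholders, not the paper's:

```python
import math
import random

def combined_score(q, c, r, llm_score, weights=(1.0, 1.0, 1.0, 1.0)):
    # Cost enters negatively; the paper specifies neither the weights nor
    # any normalization, so these defaults are assumptions.
    wq, wc, wr, wl = weights
    return wq * q - wc * c + wr * r + wl * llm_score

def select_strategy(candidates, temperature=1.0):
    # Softmax policy over candidate strategies, as Figure 2 sketches it.
    scores = [combined_score(**cand["metrics"]) / temperature for cand in candidates]
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    probs = [e / sum(exps) for e in exps]
    return random.choices(candidates, weights=probs, k=1)[0], probs

candidates = [
    {"name": "grasp-from-side", "metrics": {"q": 0.8, "c": 0.2, "r": 0.6, "llm_score": 0.9}},
    {"name": "grasp-from-top",  "metrics": {"q": 0.7, "c": 0.5, "r": 0.5, "llm_score": 0.4}},
]
choice, probs = select_strategy(candidates)
```

Under these placeholder metrics the first strategy scores higher on every dimension but the second is still sampled occasionally, which is the behavior a softmax policy trades for robustness against a single misleading metric.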

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This structure could layer onto existing LLM planning loops to add explicit error tracking without redesigning the core generator.
  • The graph of past executions might support building reusable libraries that transfer successful patterns across related but non-identical tasks.
  • Testing on domains with rapidly changing conditions would reveal whether the fixed ten error categories need expansion or domain tuning.

Load-bearing premise

The ten error categories cover all failure modes in dynamic environments, and causal graphs built from historical states can be constructed and queried efficiently without missing key dependencies or adding prohibitive overhead.
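The abstract names only two of the ten error types, so the matrix below is illustrative rather than the paper's actual taxonomy; the severity levels and recovery actions are assumptions:

```python
from dataclasses import dataclass

@dataclass
class ErrorType:
    """One row of the Error Matrix Classification (EMC)."""
    name: str
    severity: int          # illustrative scale: 1 (minor) .. 3 (critical)
    typical_actions: list  # recovery actions associated with this type
    recoverable: bool

# Only these two of the ten categories are named in the abstract; the
# severity and recovery values below are invented for illustration.
EMC = {
    "strategy-error":       ErrorType("Strategy Error", 2, ["replan", "switch-strategy"], True),
    "script-parsing-error": ErrorType("Script-Parsing-Error", 1, ["regenerate-script"], True),
}

def attribute_failure(error_key):
    # Returns None when a failure falls outside the fixed taxonomy --
    # precisely the coverage question the load-bearing premise raises.
    return EMC.get(error_key)

row = attribute_failure("strategy-error")
```

A `None` return is the interesting case for the premise: it marks a failure the ten-type matrix cannot attribute, which is what the falsification test below would probe.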

What would settle it

Run the agent on a task containing a failure outside the ten defined error types or where graph construction omits a critical causal link, then measure whether success rate, recovery speed, or adaptation quality shows no gain over a baseline agent without HECG.

Figures

Figures reproduced from arXiv: 2603.08388 by Cong Cao, Jingyao Zhang, Kun Tong.

Figure 1. A categorization of autonomous robot methods. view at source ↗
Figure 2. Structure of the HECG transition policy: the agent selects among alternative action strategies by integrating task value, cost, risk, and LLM-based scores, and determines the final action via a softmax policy under partial observability. view at source ↗
Figure 3. LLM goal-compliance evaluation results, comparing a flat LLM planner, an HECG variant without the learned transition policy, and the full model. view at source ↗
Figure 4. Heatmap of goal compliance across tasks and models. view at source ↗
Figure 5. Evaluation metrics of task-plan executions; TSR is computed per execution as the ratio of successfully achieved goals to total expected goals, then averaged over all N executions: TSR = (1/N) Σᵢ (N_success / N_total). view at source ↗
Figure 6. Threshold sensitivity analysis and comprehensive performance score by policy variant and model. view at source ↗
Figure 7. Transition type distribution under different error regimes. view at source ↗
Figure 8. Task-level comprehensive performance score by policy variant and model. view at source ↗
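Figure 5's TSR definition averages the per-execution ratio of achieved to expected goals; a one-line sketch of that computation (the execution tuples are invented examples):

```python
def task_success_rate(executions):
    # Per-execution ratio of achieved to expected goals, averaged over
    # all executions, matching Figure 5's TSR definition.
    return sum(done / total for done, total in executions) / len(executions)

# Hypothetical runs: (goals achieved, goals expected)
tsr = task_success_rate([(3, 4), (2, 2), (1, 4)])  # (0.75 + 1.0 + 0.25) / 3
```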
read the original abstract

We propose a Hierarchical Error-Corrective Graph Framework for Autonomous Agents with LLM-Based Action Generation (HECG), which incorporates three core innovations: (1) Multi-Dimensional Transferable Strategy (MDTS): by integrating task quality metrics (Q), confidence/cost metrics (C), reward metrics (R), and LLM-based semantic reasoning scores (LLM-Score), MDTS achieves multi-dimensional alignment between quantitative performance and semantic context, enabling more precise selection of high-quality candidate strategies and effectively reducing the risk of negative transfer. (2) Error Matrix Classification (EMC): unlike simple confusion matrices or overall performance metrics, EMC provides structured attribution of task failures by categorizing errors into ten types, such as Strategy Errors (Strategy Whe) and Script Parsing Errors (Script-Parsing-Error), and decomposing them according to severity, typical actions, error descriptions, and recoverability. This allows precise analysis of the root causes of task failures, offering clear guidance for subsequent error correction and strategy optimization rather than relying solely on overall success rates or single performance metrics. (3) Causal-Context Graph Retrieval (CCGR): to enhance agent retrieval capabilities in dynamic task environments, we construct graphs from historical states, actions, and event sequences, where nodes store executed actions, next-step actions, execution states, transferable strategies, and other relevant information, and edges represent causal dependencies such as preconditions for transitions between nodes. CCGR identifies subgraphs most relevant to the current task context, effectively capturing structural relationships beyond vector similarity, allowing agents to fully leverage contextual information, accelerate strategy adaptation, and improve execution reliability in complex, multi-step tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes the Hierarchical Error-Corrective Graph Framework (HECG) for autonomous agents that use LLM-based action generation. It introduces three components: Multi-Dimensional Transferable Strategy (MDTS), which integrates task quality (Q), confidence/cost (C), reward (R), and LLM-Score metrics to align quantitative performance with semantic context for improved strategy selection and reduced negative transfer; Error Matrix Classification (EMC), which attributes failures to ten error types (e.g., Strategy Errors, Script-Parsing-Error) decomposed by severity, typical actions, descriptions, and recoverability for root-cause analysis; and Causal-Context Graph Retrieval (CCGR), which builds graphs from historical states, actions, and event sequences with causal-dependency edges to retrieve relevant subgraphs for better adaptation in multi-step tasks.

Significance. If the claimed benefits hold, the framework could advance reliable LLM-driven agents by replacing aggregate success rates with structured, multi-dimensional error attribution and causal retrieval that captures dependencies beyond vector similarity. The explicit ten-category error taxonomy and graph-based context modeling offer interpretable mechanisms for strategy optimization that address common limitations in current agent systems.

major comments (1)
  1. [Abstract] Abstract: the claims that MDTS enables 'more precise selection of high-quality candidate strategies' and 'effectively reducing the risk of negative transfer', that EMC provides 'precise analysis of the root causes of task failures', and that CCGR 'improve[s] execution reliability in complex, multi-step tasks' are advanced without any experimental results, baselines, success rates, ablation studies, overhead measurements, or validation data reported in the manuscript. This absence leaves the central performance assertions unsupported.
minor comments (1)
  1. [Abstract] The title and abstract contain concatenated text without spaces (e.g., 'FrameworkforAutonomousAgentswithLLM-BasedActionGeneration' and 'strate gies').

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript proposing the Hierarchical Error-Corrective Graph Framework (HECG). We agree that the abstract advances performance claims without supporting empirical evidence, which is a valid concern given the current content of the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claims that MDTS enables 'more precise selection of high-quality candidate strategies' and 'effectively reducing the risk of negative transfer', that EMC provides 'precise analysis of the root causes of task failures', and that CCGR 'improve[s] execution reliability in complex, multi-step tasks' are advanced without any experimental results, baselines, success rates, ablation studies, overhead measurements, or validation data reported in the manuscript. This absence leaves the central performance assertions unsupported.

    Authors: We concur with this assessment. The submitted manuscript focuses on describing the design of the three core components (MDTS, EMC, and CCGR) and does not include any experimental results, baselines, success rates, ablation studies, overhead measurements, or validation data. To address this, we will revise the abstract to present the stated benefits as intended outcomes of the framework design rather than established results. We will also add a dedicated experimental section to the revised manuscript that includes comparisons against baselines, quantitative success rates, ablation studies on each component, and overhead measurements to provide empirical support for the claims. revision: yes

Circularity Check

0 steps flagged

No circularity: framework defined descriptively without reduction to inputs

full rationale

The paper proposes HECG via three explicitly defined components (MDTS, EMC, CCGR) whose claimed benefits for strategy selection and reliability are stated as direct consequences of the design choices in the abstract and full text. No equations, fitted parameters, predictions, or self-citations appear that would reduce any result to its own inputs by construction. The load-bearing steps are definitional rather than derivational, leaving the manuscript self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

4 free parameters · 2 axioms · 3 invented entities

The framework rests on domain assumptions about error categorization and graph causality plus several metric definitions that function as free parameters.

free parameters (4)
  • Task quality metrics (Q)
    Used in MDTS for strategy selection; definition and weighting not specified.
  • Confidence/cost metrics (C)
    Integrated into multi-dimensional alignment; scaling and combination rules unspecified.
  • Reward metrics (R)
    Part of MDTS scoring; how rewards are computed or normalized is undefined.
  • LLM-Score
    Semantic reasoning score from LLM; prompting method and aggregation details are free choices.
axioms (2)
  • domain assumption Errors in agent tasks can be exhaustively partitioned into ten distinct types with associated severity, actions, descriptions, and recoverability
    Invoked by EMC to provide structured attribution instead of overall success rates.
  • domain assumption Historical states, actions, and event sequences can be represented as graphs with nodes containing actions and states and edges encoding causal preconditions
    Basis for CCGR subgraph retrieval.
invented entities (3)
  • Multi-Dimensional Transferable Strategy (MDTS) no independent evidence
    purpose: Achieve alignment between quantitative metrics and semantic context for strategy selection
    New named component introduced to reduce negative transfer risk.
  • Error Matrix Classification (EMC) no independent evidence
    purpose: Provide structured root-cause analysis of failures beyond confusion matrices
    New ten-type categorization scheme.
  • Causal-Context Graph Retrieval (CCGR) no independent evidence
    purpose: Capture structural causal relationships for context retrieval beyond vector similarity
    New graph construction and retrieval method for dynamic tasks.

pith-pipeline@v0.9.0 · 5600 in / 1650 out tokens · 44689 ms · 2026-05-15T14:44:34.435289+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 6 internal anchors

  1. [1] Z. Zhao, S. Cheng, Y. Ding, Z. Zhou, S. Zhang, D. Xu, and Y. Zhao: A Survey of Optimization-Based Task and Motion Planning: From Classical to Learning Approaches. IEEE/ASME Transactions on Mechatronics, 30:2799–2825 (2024)
  2. [2] H. Zhao, Y. Guo, Y. Liu, and J. Jin: Multirobot unknown environment exploration and obstacle avoidance based on a Voronoi diagram and reinforcement learning. Expert Systems with Applications, 264:125900 (2025)
  3. [3] Şenbaşlar, B., & Sukhatme, G. S.: DREAM: Decentralized real-time asynchronous probabilistic trajectory planning for collision-free multi-robot navigation in cluttered environments. IEEE Transactions on Robotics (2024)
  4. [4] Garrabé, É., Teixeira, P., Khoramshahi, M., & Doncieux, S.: Enhancing Robustness in Language-Driven Robotics: A Modular Approach to Failure Reduction. arXiv preprint arXiv:2411.05474 (2024)
  5. [5] Ahn, M., Brohan, A., Brown, N., Chebotar, Y., Cortes, O., David, B., ... & Zeng, A.: Do As I Can, Not As I Say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691 (2022)
  6. [6] Joublin, F., Ceravola, A., Smirnov, P., Ocker, F., Deigmoeller, J., Belardinelli, A., Wang, C., Hasler, S., Tanneberg, D., & Gienger, M.: CoPAL: Corrective planning of robot actions with large language models. In: Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 8664–8670. IEEE (2024)
  7. [7] Ly, K. T., Lu, K., & Havoutis, I.: InteLiPlan: An Interactive Lightweight LLM-Based Planner for Domestic Robot Autonomy. IEEE Robotics and Automation Letters (2026)
  8. [8] I. Singh, V. Blukis, A. Mousavian, A. Goyal, D. Xu, J. Tremblay, et al.: ProgPrompt: Generating situated robot task plans using large language models. arXiv preprint arXiv:2209.11302 (2022)
  9. [9] Obi, I., Venkatesh, V. L., Wang, W., Wang, R., Suh, D., Amosa, T. I., ... & Min, B. C.: SafePlan: Leveraging formal logic and chain-of-thought reasoning for enhanced safety in LLM-based robotic task planning. arXiv preprint arXiv:2503.06892 (2025)
  10. [10] Ao, J., Wu, F., Wu, Y., Swiki, A., & Haddadin, S.: LLM-as-BT-Planner: Leveraging LLMs for behavior tree generation in robot task planning. In: Proceedings of the 2025 IEEE International Conference on Robotics and Automation (ICRA), pp. 1233–1239. IEEE (2025)
  11. [11] Borate, S., Pardeshi, V., & Vadali, M.: LLM-Based Generalizable Hierarchical Task Planning and Execution for Heterogeneous Robot Teams with Event-Driven Replanning. arXiv preprint arXiv:2511.22354 (2025)
  12. [12] Robot planning with LLMs. Nature Machine Intelligence, vol. 7, p. 521 (Apr. 2025), doi:10.1038/s42256-025-01036-4
  13. [13] Z. Xue, A. Elksnis, and N. Wang: Integrating large language models for intuitive robot navigation. Frontiers in Robotics and AI, vol. 12, article 1627937 (Sept. 2025), doi:10.3389/frobt.2025.1627937
  14. [14] Y. Kim, D. Kim, J. Choi, J. Park, N. Oh, and D. Park: A survey on integration of large language models with intelligent robots. Intelligent Service Robotics, vol. 17, no. 5, pp. 1091–1107 (2024)
  15. [15] B. Siciliano, O. Khatib, and T. Kröger, Eds.: Springer Handbook of Robotics. Springer, Berlin, Germany (2008)
  16. [16] M. Colledanchise and P. Ögren: Behavior Trees in Robotics and AI: An Introduction. CRC Press, Boca Raton, FL, USA (2018)
  17. [17] Z. Shen, C. Gao, J. Yuan, T. Zhu, X. Fu, and Q. Sun: SDA-PLANNER: State-dependency aware adaptive planner for embodied task planning. arXiv preprint arXiv:2509.26375 (2025)
  18. [18] M. Naeem, A. Melnik, and M. Beetz: Grounding language models with semantic digital twins for robotic planning. arXiv preprint arXiv:2506.16493 (2025)
  19. [19] Kulkarni, T. D., Narasimhan, K., Saeedi, A., Tenenbaum, J.: Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In: Advances in Neural Information Processing Systems 29 (NeurIPS 2016), pp. 3675–3683 (2016)
  20. [20] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W. T., Rocktäschel, T., Riedel, S., Kiela, D.: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In: Advances in Neural Information Processing Systems (NeurIPS 2020) (2020)
  21. [21] Guu, K., Lee, K., Tung, Z., Pasupat, P., Chang, M. W.: Retrieval augmented language model pre-training. In: Proc. 37th International Conference on Machine Learning (ICML), pp. 3929–3938. PMLR (2020)
  22. [22] Reed, S., Zolna, K., Parisotto, E., Colmenarejo, S. G., Novikov, A., et al.: A Generalist Agent. arXiv:2205.06175 (2022)
  23. [23] Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., Bernstein, M. S.: Generative agents: Interactive simulacra of human behavior. In: Proc. 36th Annual ACM Symposium on User Interface Software and Technology (UIST), pp. 1–22. ACM (2023)
  24. [24] Johnson, J., Krishna, R., Stark, M., Li, L. J., Shamma, D., Bernstein, M., Fei-Fei, L.: Image retrieval using scene graphs. In: CVPR 2015, pp. 3668–3678. IEEE (2015)
  25. [25] Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., et al.: Relational inductive biases, deep learning, and graph networks. arXiv:1806.01261 (2018)
  26. [26] Jiang, Y., Wu, Y., Li, H., Zhao, D.: Graph-based reinforcement learning: A survey. IEEE Transactions on Neural Networks and Learning Systems, 33(8), pp. 3519–3539 (2022)
  27. [27] Pritzel, A., Uria, B., Srinivasan, S., Puigdomènech, A., et al.: Neural episodic control. In: Proc. 34th International Conference on Machine Learning (ICML), pp. 2827–2836. PMLR (2017)
  28. [28] Blundell, C., Uria, B., Pritzel, A., Li, Y., et al.: Model-free episodic control. arXiv:1606.04460 (2016)
  29. [29] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K. R., Cao, Y.: ReAct: Synergizing reasoning and acting in language models. In: Proc. 11th International Conference on Learning Representations (ICLR 2023) (2022)
  30. [30] Huang, W., Xia, F., Xiao, T., Chan, H., Liang, J., Florence, P., Ichter, B.: Inner Monologue: Embodied Reasoning through Planning with Language Models. arXiv preprint arXiv:2207.05608 (2022)
  31. [31] McDermott, D., Ghallab, M., Howe, A., Knoblock, C., Ram, A., Veloso, M., Weld, D., Wilkins, D.: PDDL—The Planning Domain Definition Language. Technical Report CVC TR-98-003/DCS TR-1165, Yale Center for Computational Vision and Control (1998)
  32. [32] Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T., Cao, Y., Narasimhan, K.: Tree of Thoughts: Deliberate Problem Solving with Large Language Models. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 36, pp. 11809–11822 (2023)
  33. [33] Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., Yao, S.: Reflexion: Language Agents with Verbal Reinforcement Learning. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 36, pp. 8634–8652 (2023)
  34. [34] Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Anandkumar, A.: Voyager: An Open-Ended Embodied Agent with Large Language Models. CoRR abs/2305.16291 (2023)