pith. machine review for the scientific record.

arxiv: 2308.11432 · v7 · submitted 2023-08-22 · 💻 cs.AI · cs.CL

Recognition: 2 theorem links

· Lean Theorem

A Survey on Large Language Model based Autonomous Agents

Pith reviewed 2026-05-15 03:59 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords LLM-based autonomous agents · unified framework · agent construction · applications · evaluation strategies · challenges · future directions

The pith

A unified framework organizes the construction of most LLM-based autonomous agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey reviews how large language models enable autonomous agents that draw on broad web knowledge rather than training in isolated environments. The authors introduce a single framework that structures the main components used in prior agent designs: profiling, memory, planning, and action. The survey then maps applications across social science, natural science, and engineering, and reviews common evaluation methods. It closes by listing challenges, such as reliability and long-term coherence, along with suggested research directions.

Core claim

The paper establishes a unified framework for LLM-based autonomous agents that integrates the core modules appearing across most existing architectures, then applies this lens to catalog construction approaches, applications in social, natural, and engineering fields, and evaluation strategies, while surfacing open challenges.

What carries the argument

The unified framework for LLM-based autonomous agents, which organizes components such as memory, planning, tool use, and feedback loops to describe prior designs.
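
As a rough illustration of how the components named above could fit together, here is a minimal Python sketch. It is not taken from the paper: the `Agent` class, its method names, the `fake_llm` stub, and the "tool: argument" plan format are all assumptions made for this example, and a real agent would call an actual language model instead of a canned reply.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Hypothetical agent skeleton; none of these names come from the paper."""
    llm: Callable[[str], str]                  # stand-in for a language model call
    tools: Dict[str, Callable[[str], str]]     # tool name -> callable
    memory: List[str] = field(default_factory=list)

    def plan(self, task: str) -> List[str]:
        # Planning: ask the model for newline-separated steps shaped "tool: argument".
        context = "\n".join(self.memory[-5:])  # short window of recent memory
        reply = self.llm(f"Task: {task}\nMemory:\n{context}\nPlan:")
        return [line.strip() for line in reply.splitlines() if ":" in line]

    def act(self, step: str) -> str:
        # Tool use: dispatch the step to the named tool, or report an unknown tool.
        name, _, arg = step.partition(":")
        tool = self.tools.get(name.strip())
        return tool(arg.strip()) if tool else f"unknown tool {name.strip()!r}"

    def run(self, task: str) -> List[str]:
        # Feedback: each observation is written to memory, so a later plan() call sees it.
        observations = []
        for step in self.plan(task):
            obs = self.act(step)
            self.memory.append(f"{step} -> {obs}")
            observations.append(obs)
        return observations


def fake_llm(prompt: str) -> str:
    # Canned plan so the sketch runs without any model or network access.
    return "calc: 2+3\necho: done"


def calc(expr: str) -> str:
    # Toy calculator restricted to "a+b" with integers.
    a, b = expr.split("+")
    return str(int(a) + int(b))


agent = Agent(llm=fake_llm, tools={"calc": calc, "echo": lambda s: s})
result = agent.run("add two numbers")
```

Keeping each component behind its own method is a design choice of the sketch. It makes it easy to see which piece a given agent design is missing or replaces.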

If this is right

  • Most prior LLM-agent work can be categorized under the same construction, application, and evaluation headings.
  • Agents built this way can tackle tasks that require human-like reasoning in social simulation and scientific domains.
  • Evaluation combines automated metrics with human assessment of task success and reasoning steps.
  • Future progress depends on solving reliability, safety, and long-horizon planning issues.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework can serve as a checklist for designing new agents by highlighting which modules are often missing.
  • Maintaining the linked repository could support community-wide tracking as new papers appear rapidly.
  • Hybrid systems that combine the framework with non-LLM planning methods may address some current limitations.

Load-bearing premise

The proposed framework is general enough to encompass the majority of existing LLM-agent architectures without major omissions or forced groupings.

What would settle it

A substantial set of published LLM-agent papers whose designs cannot be mapped onto the framework's main stages without significant distortion would show the framework is not sufficiently general.
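
The test described above could be run as a simple coverage check, sketched here in Python. The module names in `FRAMEWORK_MODULES`, the `unmapped_fraction` helper, and the sample labels are all assumptions made for illustration, not data from the survey. In practice the labels would come from annotating real papers.

```python
from typing import Dict, Set

# Module names are an assumption for this sketch; substitute the framework's own stages.
FRAMEWORK_MODULES: Set[str] = {"memory", "planning", "action"}


def unmapped_fraction(papers: Dict[str, Set[str]]) -> float:
    """Share of papers with at least one component outside the framework."""
    if not papers:
        return 0.0
    misfits = [name for name, parts in papers.items() if not parts <= FRAMEWORK_MODULES]
    return len(misfits) / len(papers)


# Made-up labels, not drawn from the survey.
sample = {
    "paper_a": {"memory", "planning"},
    "paper_b": {"planning", "action"},
    "paper_c": {"action", "neural_controller"},
    "paper_d": {"memory"},
}
fraction = unmapped_fraction(sample)
```

A high value on a carefully annotated sample would count against the generality premise. What threshold counts as "substantial" is a judgment the check itself does not settle.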

Original abstract

Autonomous agents have long been a prominent research focus in both academic and industry communities. Previous research in this field often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes, and thus makes the agents hard to achieve human-like decisions. Recently, through the acquisition of vast amounts of web knowledge, large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This has sparked an upsurge in studies investigating LLM-based autonomous agents. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of the field of LLM-based autonomous agents from a holistic perspective. More specifically, we first discuss the construction of LLM-based autonomous agents, for which we propose a unified framework that encompasses a majority of the previous work. Then, we present a comprehensive overview of the diverse applications of LLM-based autonomous agents in the fields of social science, natural science, and engineering. Finally, we delve into the evaluation strategies commonly used for LLM-based autonomous agents. Based on the previous studies, we also present several challenges and future directions in this field. To keep track of this field and continuously update our survey, we maintain a repository of relevant references at https://github.com/Paitesanshi/LLM-Agent-Survey.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper surveys the emerging field of LLM-based autonomous agents. It proposes a unified framework for agent construction that is claimed to encompass a majority of prior work, provides a comprehensive overview of applications across social science, natural science, and engineering domains, reviews common evaluation strategies, and discusses challenges and future directions while maintaining an online repository for ongoing updates.

Significance. If the unified framework successfully organizes the majority of existing LLM-agent architectures without major omissions, the survey will serve as a valuable reference point for researchers, helping to structure a rapidly expanding literature and identify cross-domain applications and evaluation practices.

minor comments (2)
  1. [§3] The claim that the framework 'encompasses a majority of the previous work' (abstract and §3) would be strengthened by an explicit count or table showing how many surveyed papers map to each component of the framework versus any notable omissions.
  2. [Applications sections] In the applications overview, some domain-specific examples could include more direct citations to the original LLM-agent papers rather than secondary references to improve traceability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review and recommendation to accept the manuscript. The assessment correctly identifies the core contributions of our unified framework for LLM-based agent construction, the cross-domain application overview, and the maintained repository for updates.

Circularity Check

0 steps flagged

No significant circularity in literature survey synthesis

full rationale

This is a pure survey paper whose central contribution is a descriptive unified framework for organizing prior LLM-agent literature. No equations, fitted parameters, predictions, or derivations appear in the provided text. The framework is explicitly positioned as an encompassing lens drawn from external cited works rather than derived internally or justified via self-citation chains. All content rests on external references; the survey structure (construction, applications, evaluation) introduces no self-definitional loops, fitted-input predictions, or load-bearing self-citations that reduce the claims to the paper's own inputs by construction. This matches the expected non-circular outcome for honest literature synthesis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper that synthesizes existing research without introducing new fitted parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5564 in / 905 out tokens · 26934 ms · 2026-05-15T03:59:17.330238+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

    cs.CR 2026-04 unverdicted novelty 8.0

    Malicious LLM API routers actively perform payload injection and secret exfiltration, with 9 of 428 tested routers showing malicious behavior and further poisoning risks from leaked credentials.

  2. Agent-First Tool API: A Semantic Interface Paradigm for Enterprise AI Agent Systems

    cs.AI 2026-05 unverdicted novelty 7.0

    The Agent-First Tool API paradigm raises AI agent task success from 64% to 88% and cuts human interventions by 72.7% through semantic phases, structured contracts, and risk governance in a production enterprise system.

  3. OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces

    cs.AI 2026-05 unverdicted novelty 6.0

    OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.

  4. EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems

    cs.AI 2026-05 unverdicted novelty 6.0

    EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and ...

  5. Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems

    cs.CR 2026-05 unverdicted novelty 6.0

    ASPO combines multi-agent LLM proposals with deterministic enforcement in a MAPE-K loop to select conflict-free, resource-feasible security patterns for IoT, delivering 100% safety invariants and 21-23% tail latency/e...

  6. An AI Agent Execution Environment to Safeguard User Data

    cs.CR 2026-04 unverdicted novelty 6.0

    GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow tracking, without trusting the agent or requiring attack...

  7. Visual Inception: Compromising Long-term Planning in Agentic Recommenders via Multimodal Memory Poisoning

    cs.CR 2026-04 unverdicted novelty 6.0

    Visual Inception poisons images to hijack long-term memory in agentic recommenders and steer planning, while CognitiveGuard reduces success to about 10% via perceptual sanitization and reasoning verification.

  8. In-situ process monitoring for defect detection in wire-arc additive manufacturing: an agentic AI approach

    cs.AI 2026-04 unverdicted novelty 6.0

    A multi-agent AI framework using processing and acoustic agents achieves 91.6% accuracy and 0.821 F1 score for in-situ porosity defect detection in wire-arc additive manufacturing.

  9. SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

    cs.CR 2026-02 unverdicted novelty 6.0

    The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.

  10. OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

    cs.CL 2024-10 unverdicted novelty 6.0

    OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.

  11. Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

    cs.CL 2026-05 unverdicted novelty 5.0

    Grep retrieval generally outperforms vector retrieval in agentic search tasks, with performance varying strongly by agent harness and tool-calling style.

  12. Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis

    cs.AI 2026-04 unverdicted novelty 5.0

    Agent Mentor analyzes semantic trajectories in agent logs to identify undesired behaviors and derives corrective prompt instructions, yielding measurable accuracy gains on benchmark tasks across three agent setups.

  13. Toward Explanatory Equilibrium: Verifiable Reasoning as a Coordination Mechanism under Asymmetric Information

    cs.MA 2026-04 unverdicted novelty 5.0

    Structured reasoning artifacts enable coordination in LLM multi-agent systems by preventing approval and welfare collapse under asymmetric information while keeping bad-approval rates low across audit regimes.

  14. EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments

    cs.MA 2026-05 unverdicted novelty 4.0

    EconAI adds memory weighting and economic sentiment indexing to LLM agents so they adapt short-term actions to long-term goals inside a single macro/micro simulation loop.

  15. SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications

    cs.AI 2026-04 unverdicted novelty 4.0

    SciFi is a safe, lightweight agentic AI framework that automates structured scientific tasks with minimal human intervention via isolated environments and layered self-assessing agents.

  16. Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents

    cs.AI 2026-04 unverdicted novelty 4.0

    Aethon enables near-constant-time instantiation of stateful AI agents via reference-based replication over compositional views, layered memory, and copy-on-write semantics.

  17. OpenKedge: Governing Agentic Mutation with Execution-Bound Safety and Evidence Chains

    cs.AI 2026-04 unverdicted novelty 4.0

    OpenKedge redefines AI agent state mutations as a governed process using intent proposals, policy-evaluated execution contracts, and cryptographic evidence chains to enable safe, auditable agentic behavior.

  18. Understanding the planning of LLM agents: A survey

    cs.AI 2024-02 accept novelty 4.0

    A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task decomposition, plan selection, external modules, reflection, and memory.

  19. The Rise and Potential of Large Language Model Based Agents: A Survey

    cs.AI 2023-09 accept novelty 4.0

    The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

  20. The End of the Foundation Model Era: Open-Weight Models, Sovereign AI, and Inference as Infrastructure

    cs.CY 2026-03 unverdicted novelty 3.0

    Open-weight models have ended the foundation model era by eliminating pre-training as a durable moat and enabling sovereign AI control through direct access to model weights.

  21. A Survey on the Memory Mechanism of Large Language Model based Agents

    cs.AI 2024-04 accept novelty 3.0

    A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.

  22. Large Language Models: A Survey

    cs.CL 2024-02 accept novelty 3.0

    The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.

Reference graph

Works this paper leans on

185 extracted references · 185 canonical work pages · cited by 22 Pith papers · 25 internal anchors

  1. [1]

    Human-level control through deep reinforcement learning

    Mnih V , Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, others . Human-level control through deep reinforcement learning. nature, 2015, 518(7540): 529–533

  2. [2]

    Continuous control with deep reinforcement learning

    Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y , Silver D, Wierstra D. Continuous con- trol with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015

  3. [3]

    Proximal Policy Optimization Algorithms

    Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

  4. [4]

    Soft actor- critic: O ff-policy maximum entropy deep reinforce- ment learning with a stochastic actor

    Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor- critic: O ff-policy maximum entropy deep reinforce- ment learning with a stochastic actor. In: International conference on machine learning. 2018, 1861–1870

  5. [5]

    Language models are few-shot learners

    Brown T, Mann B, Ryder N, Subbiah M, Kaplan J D, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, others . Language models are few-shot learners. Advances in neural information processing systems, 2020, 33: 1877–1901

  6. [6]

    Language models are unsuper- vised multitask learners

    Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I, others . Language models are unsuper- vised multitask learners. OpenAI blog, 2019, 1(8): 9

  7. [7]

    GPT-4 Technical Report

    Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman F L, Almeida D, Altenschmidt J, Altman S, Anadkat S, others . Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023

  8. [8]

    Model card and evaluations for claude models

    Anthropic . Model card and evaluations for claude models. https://www-files. anthropic.com/production/images/ Model-Card-Claude-2.pdf?ref= maginative.com, 2023

  9. [9]

    LLaMA: Open and Efficient Foundation Language Models

    Touvron H, Lavril T, Izacard G, Martinet X, Lachaux M A, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F, others . Llama: Open and e fficient foundation lan- guage models. arXiv preprint arXiv:2302.13971, 2023

  10. [10]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y , Bashlykov N, Batra S, Bhargava P, Bhosale S, others . Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

  11. [11]

    Genera- tive adversarial user model for reinforcement learning based recommendation system

    Chen X, Li S, Li H, Jiang S, Qi Y , Song L. Genera- tive adversarial user model for reinforcement learning based recommendation system. In: International Con- ference on Machine Learning. 2019, 1052–1061

  12. [12]

    Reflexion: Language agents with verbal reinforcement learning

    Shinn N, Cassano F, Gopinath A, Narasimhan K, Yao S. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 2024, 36

  13. [13]

    Hug- ginggpt: Solving ai tasks with chatgpt and its friends in hugging face

    Shen Y , Song K, Tan X, Li D, Lu W, Zhuang Y . Hug- ginggpt: Solving ai tasks with chatgpt and its friends in hugging face. Advances in Neural Information Pro- cessing Systems, 2024, 36

  14. [14]

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

    Qin Y , Liang S, Ye Y , Zhu K, Yan L, Lu Y , Lin Y , Cong X, Tang X, Qian B, others . Toolllm: Facilitating large language models to master 16000 + real-world apis. arXiv preprint arXiv:2307.16789, 2023

  15. [15]

    Toolformer: Language models can teach themselves to use tools

    Schick T, Dwivedi-Yu J, Dessì R, Raileanu R, Lomeli M, Hambro E, Zettlemoyer L, Cancedda N, Scialom T. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 2024, 36

  16. [16]

    Ghost in the minecraft: Generally capable agents for open-world enviroments via large language models with text-based knowledge and memory

    Zhu X, Chen Y , Tian H, Tao C, Su W, Yang C, Huang G, Li B, Lu L, Wang X, others . Ghost in the minecraft: Generally capable agents for open-world enviroments via large language models with text-based knowledge and memory. arXiv preprint arXiv:2305.17144, 2023

  17. [17]

    Minding language models’(lack of) theory of mind: A plug-and-play multi-character belief tracker

    Sclar M, Kumar S, West P, Suhr A, Choi Y , Tsvetkov Y . Minding language models’(lack of) theory of mind: A plug-and-play multi-character belief tracker. arXiv preprint arXiv:2306.00924, 2023

  18. [18]

    ChatDev: Communicative Agents for Software Development

    Qian C, Cong X, Yang C, Chen W, Su Y , Xu J, Liu Z, Sun M. Communicative agents for software develop- ment. arXiv preprint arXiv:2307.07924, 2023

  19. [19]

    al. e C. Agentverse. https://github.com/ OpenBMB/AgentVerse, 2023

  20. [20]

    Generative agents: Interactive simu- lacra of human behavior

    Park J S, O’Brien J, Cai C J, Morris M R, Liang P, Bernstein M S. Generative agents: Interactive simu- lacra of human behavior. In: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 2023, 1–22

  21. [21]

    Recagent: A novel simulation paradigm for recommender systems

    Wang L, Zhang J, Chen X, Lin Y , Song R, Zhao W X, Wen J R. Recagent: A novel simulation paradigm for recommender systems. arXiv preprint arXiv:2306.02552, 2023

  22. [22]

    Building cooperative embodied agents modularly with large language models

    Zhang H, Du W, Shan J, Zhou Q, Du Y , Tenenbaum J B, Shu T, Gan C. Building cooperative embodied agents modularly with large language models. arXiv Lei Wang et al. A Survey on Large Language Model based Autonomous Agents 35 preprint arXiv:2307.02485, 2023

  23. [23]

    MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

    Hong S, Zheng X, Chen J, Cheng Y , Wang J, Zhang C, Wang Z, Yau S K S, Lin Z, Zhou L, others . Metagpt: Meta programming for multi-agent collaborative frame- work. arXiv preprint arXiv:2308.00352, 2023

  24. [24]

    Self-collaboration code generation via chatgpt

    Dong Y , Jiang X, Jin Z, Li G. Self-collaboration code generation via chatgpt. arXiv preprint arXiv:2304.07590, 2023

  25. [25]

    Personality traits in large language models

    Safdari M, Serapio-García G, Crepy C, Fitz S, Romero P, Sun L, Abdulhai M, Faust A, Matari ´c M. Person- ality traits in large language models. arXiv preprint arXiv:2307.00184, 2023

  26. [26]

    Measuring thirty facets of the five factor model with a 120-item public domain inventory: De- velopment of the ipip-neo-120

    Johnson J A. Measuring thirty facets of the five factor model with a 120-item public domain inventory: De- velopment of the ipip-neo-120. Journal of research in personality, 2014, 51: 78–89

  27. [27]

    Big five inventory

    John O P, Donahue E M, Kentle R L. Big five inventory. Journal of Personality and Social Psychology, 1991

  28. [28]

    Toxicity in chatgpt: Analyzing persona-assigned language models

    Deshpande A, Murahari V , Rajpurohit T, Kalyan A, Narasimhan K. Toxicity in chatgpt: Analyzing persona-assigned language models. arXiv preprint arXiv:2304.05335, 2023

  29. [29]

    Out of one, many: Using language models to simulate human samples

    Argyle L P, Busby E C, Fulda N, Gubler J R, Rytting C, Wingate D. Out of one, many: Using language models to simulate human samples. Political Analysis, 2023, 31(3): 337–351

  30. [30]

    Reflective linguistic programming (rlp): A stepping stone in socially-aware agi (socialagi)

    Fischer K A. Reflective linguistic programming (rlp): A stepping stone in socially-aware agi (socialagi). arXiv preprint arXiv:2305.12647, 2023

  31. [31]

    Sayplan: Grounding large language models using 3d scene graphs for scalable robot task planning

    Rana K, Haviland J, Garg S, Abou-Chakra J, Reid I, Suenderhauf N. Sayplan: Grounding large language models using 3d scene graphs for scalable robot task planning. In: 7th Annual Conference on Robot Learn- ing. 2023

  32. [32]

    Calypso: Llms as dungeon master’s assistants

    Zhu A, Martin L, Head A, Callison-Burch C. Calypso: Llms as dungeon master’s assistants. In: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. 2023, 380–390

  33. [33]

    Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, and Yu Qiao

    Wang Z, Cai S, Chen G, Liu A, Ma X, Liang Y . De- scribe, explain, plan and select: Interactive planning with large language models enables open-world multi- task agents. arXiv preprint arXiv:2302.01560, 2023

  34. [34]

    Agentsims: An open-source sandbox for large language model evaluation

    Lin J, Zhao H, Zhang A, Wu Y , Ping H, Chen Q. Agentsims: An open-source sandbox for large language model evaluation. arXiv preprint arXiv:2308.04026, 2023

  35. [35]

    Unleashing infinite-length input capacity for large-scale language models with self-controlled memory system

    Liang X, Wang B, Huang H, Wu S, Wu P, Lu L, Ma Z, Li Z. Unleashing infinite-length input capacity for large-scale language models with self-controlled mem- ory system. arXiv preprint arXiv:2304.13343, 2023

  36. [36]

    Simplyretrieve: A private and lightweight retrieval-centric generative ai tool

    Ng Y , Miyashita D, Hoshi Y , Morioka Y , Torii O, Ko- dama T, Deguchi J. Simplyretrieve: A private and lightweight retrieval-centric generative ai tool. arXiv preprint arXiv:2308.03983, 2023

  37. [37]

    Memory sandbox: Transparent and interactive memory manage- ment for conversational agents

    Huang Z, Gutierrez S, Kamana H, MacNeil S. Memory sandbox: Transparent and interactive memory manage- ment for conversational agents. In: Adjunct Proceed- ings of the 36th Annual ACM Symposium on User Interface Software and Technology. 2023, 1–3

  38. [38]

    Voyager: An Open-Ended Embodied Agent with Large Language Models

    Wang G, Xie Y , Jiang Y , Mandlekar A, Xiao C, Zhu Y , Fan L, Anandkumar A. V oyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023

  39. [39]

    Memorybank: Enhancing large language models with long-term memory

    Zhong W, Guo L, Gao Q, Wang Y . Memorybank: En- hancing large language models with long-term memory. arXiv preprint arXiv:2305.10250, 2023

  40. [40]

    Chatdb: Augmenting llms with databases as their symbolic memory

    Hu C, Fu J, Du C, Luo S, Zhao J, Zhao H. Chatdb: Aug- menting llms with databases as their symbolic memory. arXiv preprint arXiv:2306.03901, 2023

  41. [41]

    Ret-llm: Towards a general read-write memory for large language models.arXiv preprint arXiv:2305.14322, 2023

    Modarressi A, Imani A, Fayyaz M, Schütze H. Ret- llm: Towards a general read-write memory for large language models. arXiv preprint arXiv:2305.14322, 2023

  42. [42]

    Memory augmented large language models are computationally universal

    Schuurmans D. Memory augmented large language models are computationally universal. arXiv preprint arXiv:2301.04589, 2023

  43. [43]

    Expel: Llm agents are experiential learners

    Zhao A, Huang D, Xu Q, Lin M, Liu Y J, Huang G. Expel: Llm agents are experiential learners. arXiv preprint arXiv:2308.10144, 2023

  44. [44]

    Language models as zero-shot planners: Extracting actionable knowledge for embodied agents

    Huang W, Abbeel P, Pathak D, Mordatch I. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In: International Con- ference on Machine Learning. 2022, 9118–9147

  45. [45]

    Chain-of-thought prompting elicits reasoning in large language models

    Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, Le Q V , Zhou D, others . Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 2022, 35: 24824–24837

  46. [46]

    Large language models are zero-shot reasoners

    Kojima T, Gu S S, Reid M, Matsuo Y , Iwasawa Y . Large language models are zero-shot reasoners. Advances in neural information processing systems, 2022, 35: 22199–22213

  47. [47]

    Planning with large language models via corrective re-prompting

    Raman S S, Cohen V , Rosen E, Idrees I, Paulius D, Tellex S. Planning with large language models via corrective re-prompting. In: NeurIPS 2022 Foundation Models for Decision Making Workshop. 2022

  48. [48]

    Re- woo: Decoupling reasoning from observations for ef- ficient augmented language models

    Xu B, Peng Z, Lei B, Mukherjee S, Liu Y , Xu D. Re- woo: Decoupling reasoning from observations for ef- ficient augmented language models. arXiv preprint arXiv:2305.18323, 2023

  49. [49]

    Swiftsage: 36 Front

    Lin B Y , Fu Y , Yang K, Brahman F, Huang S, Bhaga- vatula C, Ammanabrolu P, Choi Y , Ren X. Swiftsage: 36 Front. Comput. Sci., 2025, 0(0): 1–42 A generative agent with fast and slow thinking for com- plex interactive tasks. Advances in Neural Information Processing Systems, 2024, 36

  50. [50]

    Dual-process theories of higher cognition: Advancing the debate

    Evans J S B, Stanovich K E. Dual-process theories of higher cognition: Advancing the debate. Perspectives on psychological science, 2013, 8(3): 223–241

  51. [51]

    Self-Consistency Improves Chain of Thought Reasoning in Language Models

    Wang X, Wei J, Schuurmans D, Le Q, Chi E, Narang S, Chowdhery A, Zhou D. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022

  52. [52]

    Tree of thoughts: Deliberate problem solving with large language models

    Yao S, Yu D, Zhao J, Shafran I, Gri ffiths T, Cao Y , Narasimhan K. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neu- ral Information Processing Systems, 2024, 36

  53. [53]

    Recmind: Large language model powered agent for recommendation

    Wang Y , Jiang Z, Chen Z, Yang F, Zhou Y , Cho E, Fan X, Huang X, Lu Y , Yang Y . Recmind: Large language model powered agent for recommendation. arXiv preprint arXiv:2308.14296, 2023

  54. [54]

    Graph of thoughts: Solving elaborate problems with large language mod- els

    Besta M, Blach N, Kubicek A, Gerstenberger R, Gianinazzi L, Gajda J, Lehmann T, Podstawski M, Niewiadomski H, Nyczyk P, others . Graph of thoughts: Solving elaborate problems with large language mod- els. arXiv preprint arXiv:2308.09687, 2023

  55. [55]

    Algorithm of thoughts: Enhancing exploration of ideas in large language models

    Sel B, Al-Tawaha A, Khattar V , Wang L, Jia R, Jin M. Algorithm of thoughts: Enhancing exploration of ideas in large language models. arXiv preprint arXiv:2308.10379, 2023

  56. [56]

    Generating executable ac- tion plans with environmentally-aware language mod- els

    Gramopadhye M, Szafir D. Generating executable ac- tion plans with environmentally-aware language mod- els. In: 2023 IEEE /RSJ International Conference on Intelligent Robots and Systems (IROS). 2023, 3568– 3575

  57. [57]

    Reasoning with language model is planning with world model.arXiv preprint arXiv:2305.14992,

    Hao S, Gu Y , Ma H, Hong J J, Wang Z, Wang D Z, Hu Z. Reasoning with language model is planning with world model. arXiv preprint arXiv:2305.14992, 2023

  58. [58]

    LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

    Liu B, Jiang Y , Zhang X, Liu Q, Zhang S, Biswas J, Stone P. LLM +P: Empowering large language mod- els with optimal planning proficiency. arXiv preprint arXiv:2304.11477, 2023

  59. [59]

    Dynamic planning with a llm

    Dagan G, Keller F, Lascarides A. Dynamic planning with a llm. arXiv preprint arXiv:2308.06391, 2023

  60. [60]

    React: Synergizing reasoning and acting in language models

    Yao S, Zhao J, Yu D, Du N, Shafran I, Narasimhan K, Cao Y. React: Synergizing reasoning and acting in language models. In: The Twelfth International Conference on Learning Representations. 2023

  61. [61]

    Llm-planner: Few-shot grounded planning for embodied agents with large language models

    Song C H, Wu J, Washington C, Sadler B M, Chao W L, Su Y. Llm-planner: Few-shot grounded planning for embodied agents with large language models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023, 2998–3009

  62. [62]

    Inner Monologue: Embodied Reasoning through Planning with Language Models

    Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, Zeng A, Tompson J, Mordatch I, Chebotar Y, others. Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608, 2022

  63. [63]

    Self-refine: Iterative refinement with self-feedback

    Madaan A, Tandon N, Gupta P, Hallinan S, Gao L, Wiegreffe S, Alon U, Dziri N, Prabhumoye S, Yang Y, others. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 2024, 36

  64. [64]

    Selfcheck: Using llms to zero-shot check their own step-by-step reasoning

    Miao N, Teh Y W, Rainforth T. Selfcheck: Using llms to zero-shot check their own step-by-step reasoning. In: The Twelfth International Conference on Learning Representations. 2023

  65. [65]

    Interact: Exploring the potentials of chatgpt as a cooperative agent

    Chen P L, Chang C S. Interact: Exploring the potentials of chatgpt as a cooperative agent. arXiv preprint arXiv:2308.01552, 2023

  66. [66]

    Chatcot: Tool-augmented chain-of-thought reasoning on chat-based large language models

    Chen Z, Zhou K, Zhang B, Gong Z, Zhao W X, Wen J R. Chatcot: Tool-augmented chain-of-thought reasoning on chat-based large language models. arXiv preprint arXiv:2305.14323, 2023

  67. [67]

    WebGPT: Browser-assisted question-answering with human feedback

    Nakano R, Hilton J, Balaji S, Wu J, Ouyang L, Kim C, Hesse C, Jain S, Kosaraju V, Saunders W, others. Webgpt: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332, 2021

  68. [68]

    Tptu: Task planning and tool usage of large language model-based ai agents

    Ruan J, Chen Y, Zhang B, Xu Z, Bao T, Du G, Shi S, Mao H, Zeng X, Zhao R. TPTU: Task planning and tool usage of large language model-based AI agents. arXiv preprint arXiv:2308.03427, 2023

  69. [69]

    Gorilla: Large Language Model Connected with Massive APIs

    Patil S G, Zhang T, Wang X, Gonzalez J E. Gorilla: Large language model connected with massive apis. arXiv preprint arXiv:2305.15334, 2023

  70. [70]

    Api-bank: A comprehensive benchmark for tool-augmented llms

    Li M, Song F, Yu B, Yu H, Li Z, Huang F, Li Y. Api-bank: A benchmark for tool-augmented llms. arXiv preprint arXiv:2304.08244, 2023

  71. [71]

    Restgpt: Connecting large language models with real-world restful apis

    Song Y, Xiong W, Zhu D, Li C, Wang K, Tian Y, Li S. Restgpt: Connecting large language models with real-world applications via restful apis. arXiv preprint arXiv:2306.06624, 2023

  72. [72]

    Taskmatrix.ai: Completing tasks by connecting foundation models with millions of apis

    Liang Y, Wu C, Song T, Wu W, Xia Y, Liu Y, Ou Y, Lu S, Ji L, Mao S, others. Taskmatrix.ai: Completing tasks by connecting foundation models with millions of apis. Intelligent Computing, 2024, 3: 0063

  73. [73]

    MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning

    Karpas E, Abend O, Belinkov Y, Lenz B, Lieber O, Ratner N, Shoham Y, Bata H, Levine Y, Leyton-Brown K, others. Mrkl systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning. arXiv preprint arXiv:2205.00445, 2022

  74. [74]

    Openagi: When llm meets domain experts

    Ge Y, Hua W, Mei K, Tan J, Xu S, Li Z, Zhang Y, others. Openagi: When llm meets domain experts. Advances in Neural Information Processing Systems, 2024, 36

  75. [75]

    Vipergpt: Visual inference via python execution for reasoning

    Surís D, Menon S, Vondrick C. Vipergpt: Visual inference via python execution for reasoning. arXiv preprint arXiv:2303.08128, 2023

  76. [76]

    Chemcrow: Augmenting large-language models with chemistry tools

    Bran A M, Cox S, White A D, Schwaller P. Chemcrow: Augmenting large-language models with chemistry tools. arXiv preprint arXiv:2304.05376, 2023

  77. [77]

    MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

    Yang Z, Li L, Wang J, Lin K, Azarnasab E, Ahmed F, Liu Z, Liu C, Zeng M, Wang L. Mm-react: Prompting chatgpt for multimodal reasoning and action. arXiv preprint arXiv:2303.11381, 2023

  78. [78]

    S3: Social-network simulation system with large language model-empowered agents

    Gao C, Lan X, Lu Z, Mao J, Piao J, Wang H, Jin D, Li Y. S3: Social-network simulation system with large language model-empowered agents. arXiv preprint arXiv:2307.14984, 2023

  79. [79]

    Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

    Ahn M, Brohan A, Brown N, Chebotar Y, Cortes O, David B, Finn C, Fu C, Gopalakrishnan K, Hausman K, others. Do as i can, not as i say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691, 2022

  80. [80]

    Social simulacra: Creating populated prototypes for social computing systems

    Park J S, Popowski L, Cai C, Morris M R, Liang P, Bernstein M S. Social simulacra: Creating populated prototypes for social computing systems. In: Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. 2022, 1–18

Showing first 80 references.