Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
Pith reviewed 2026-05-10 17:35 UTC · model grok-4.3
The pith
LLM agents advance primarily by externalizing cognitive tasks into memory, skills, protocols, and harness systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that agent infrastructure matters because it transforms hard cognitive burdens into forms the model can solve more reliably. Memory externalizes state across time, skills externalize procedural expertise, protocols externalize interaction structure, and harness engineering unifies them into governed execution. The result is a systems-level view explaining why practical agent progress depends on better external cognitive infrastructure alongside stronger models, covering the trade-off between parametric and externalized capability and emerging directions such as self-evolving harnesses.
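The four layers can be sketched as code. This is a minimal illustration, not the paper's implementation; every class and function name below is hypothetical. Memory holds state across calls, a skill registry holds reusable procedures, a protocol function fixes the message layout, and the harness coordinates all three:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Memory:
    """Externalized state: persists across calls instead of living in weights."""
    facts: List[str] = field(default_factory=list)

    def write(self, fact: str) -> None:
        self.facts.append(fact)

    def read(self) -> str:
        return "\n".join(self.facts)

@dataclass
class SkillRegistry:
    """Externalized procedural expertise: named, reusable routines."""
    skills: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.skills[name] = fn

def protocol_frame(memory: Memory, task: str) -> str:
    """Externalized interaction structure: a fixed message layout."""
    return f"MEMORY:\n{memory.read()}\nTASK:\n{task}"

class Harness:
    """Coordination layer: routes one task through protocol, skill, and memory."""

    def __init__(self, memory: Memory, skills: SkillRegistry) -> None:
        self.memory = memory
        self.skills = skills

    def run(self, skill_name: str, task: str) -> str:
        prompt = protocol_frame(self.memory, task)       # structure the exchange
        result = self.skills.skills[skill_name](prompt)  # invoke the procedure
        self.memory.write(f"{skill_name} -> {result}")   # persist the outcome
        return result
```

In this toy framing, the model itself would sit behind a registered skill; the point of the sketch is only that state, procedure, and interaction format live outside the model call, with the harness as the single place that sequences them.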
What carries the argument
The unifying lens of externalization as cognitive artifacts, with memory, skills, protocols, and harness as the four key mechanisms that offload and restructure cognitive demands.
If this is right
- Agent development will focus more on designing reliable external modules that models can use effectively.
- Evaluation metrics will shift to assess coordination and governance provided by the harness.
- New agent systems may incorporate self-evolving harnesses that adapt infrastructure dynamically.
- Shared infrastructure layers could allow different models to benefit from common externalized components.
- Long-term co-evolution between models and their external environments will become a key research area.
Where Pith is reading between the lines
- This suggests that benchmarks for agents should include controlled experiments varying only the harness to measure its isolated impact.
- Interoperability standards for protocols could accelerate adoption across different LLM platforms.
- The framework implies potential limits to pure scaling of models without corresponding infrastructure advances.
- Open challenges in governance may require new frameworks for auditing externalized components in agents.
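The harness-ablation experiment suggested above can be sketched as a simulation: hold the model fixed and vary only the harness configuration, so any gap in success rate is attributable to the harness. The success probabilities below are invented for illustration; a real study would substitute actual task outcomes.

```python
import random

def simulate_run(model_quality: float, harness_bonus: float,
                 rng: random.Random) -> bool:
    """One simulated task attempt: success chance = model + harness uplift."""
    return rng.random() < min(1.0, model_quality + harness_bonus)

def harness_ablation(configs: dict, model_quality: float = 0.5,
                     trials: int = 2000, seed: int = 0) -> dict:
    """Success rate per harness config, with the model held constant."""
    rates = {}
    for name, bonus in configs.items():
        rng = random.Random(seed)  # same seed everywhere: only the harness varies
        wins = sum(simulate_run(model_quality, bonus, rng) for _ in range(trials))
        rates[name] = wins / trials
    return rates

rates = harness_ablation({
    "no_harness": 0.00,    # bare model
    "memory_only": 0.15,   # assumed uplift from externalized state
    "full_harness": 0.30,  # assumed uplift from memory + skills + protocols
})
```

Reusing the same random seed across configurations is the simulation analogue of the controlled design: identical task draws, differing only in harness.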
Load-bearing premise
That the progression from weights to context to harness, viewed through cognitive artifacts, provides a unifying and accurate explanation for why agent systems improve in practice.
What would settle it
Finding a set of major agent capabilities that improved substantially through model fine-tuning or scaling alone, without corresponding advances in memory, skills, protocols, or harness design.
read the original abstract
Large language model (LLM) agents are increasingly built less by changing model weights than by reorganizing the runtime around them. Capabilities that earlier systems expected the model to recover internally are now externalized into memory stores, reusable skills, interaction protocols, and the surrounding harness that makes these modules reliable in practice. This paper reviews that shift through the lens of externalization. Drawing on the idea of cognitive artifacts, we argue that agent infrastructure matters not merely because it adds auxiliary components, but because it transforms hard cognitive burdens into forms that the model can solve more reliably. Under this view, memory externalizes state across time, skills externalize procedural expertise, protocols externalize interaction structure, and harness engineering serves as the unification layer that coordinates them into governed execution. We trace a historical progression from weights to context to harness, analyze memory, skills, and protocols as three distinct but coupled forms of externalization, and examine how they interact inside a larger agent system. We further discuss the trade-off between parametric and externalized capability, identify emerging directions such as self-evolving harnesses and shared agent infrastructure, and discuss open challenges in evaluation, governance, and the long-term co-evolution of models and external infrastructure. The result is a systems-level framework for explaining why practical agent progress increasingly depends not only on stronger models, but on better external cognitive infrastructure.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reviews the development of LLM agents as a process of externalizing cognitive burdens from model weights into runtime infrastructure. It frames this shift using cognitive artifacts: memory externalizes state across time, skills externalize procedural knowledge, protocols externalize interaction structure, and harness engineering coordinates these into reliable execution. The manuscript traces a historical progression from weights to context to harness, analyzes the three externalization forms and their interactions, discusses parametric vs. externalized capability trade-offs, and identifies open challenges in evaluation, governance, and co-evolution of models with infrastructure.
Significance. If the externalization lens holds, the paper supplies a coherent systems-level framework that organizes disparate agent-engineering practices and explains why infrastructure improvements often yield more reliable gains than scale alone. It synthesizes trends across memory, skills, and protocols literature into a single narrative, which could help researchers and practitioners prioritize harness design and shared infrastructure. The review format itself is a strength, as it avoids new empirical claims while highlighting falsifiable directions such as self-evolving harnesses.
minor comments (3)
- The historical progression section would benefit from an explicit timeline or table summarizing key milestones (e.g., early ReAct-style agents vs. later tool-use harnesses) to make the weights-to-context-to-harness arc easier to follow.
- In the trade-off discussion, clarify whether the parametric/externalized distinction is treated as a strict dichotomy or a continuum; the current phrasing risks implying zero-sum dynamics without addressing hybrid approaches that retain some parametric capability.
- The open challenges subsection on evaluation could reference specific existing benchmarks (e.g., AgentBench or WebArena) and note how the externalization view would change what those benchmarks measure.
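The continuum point raised in the second comment can be made concrete with a small routing sketch: rather than a strict dichotomy, a hybrid system answers from weights when the model is confident and defers to an externalized store otherwise. The store contents, queries, and confidence scores below are illustrative assumptions, not from the paper.

```python
from typing import Optional, Tuple

# Hypothetical externalized memory, e.g. holding post-cutoff facts.
EXTERNAL_STORE = {
    "release_date_toolkit_x": "2026-01-15",
}

def parametric_answer(query: str) -> Tuple[Optional[str], float]:
    """Stand-in for the model's weight-encoded answer plus a confidence score."""
    canned = {"capital_of_france": ("Paris", 0.95)}
    return canned.get(query, (None, 0.2))

def answer(query: str, threshold: float = 0.7) -> Tuple[str, str]:
    """Return (answer, source); source records which side of the continuum won."""
    ans, conf = parametric_answer(query)
    if ans is not None and conf >= threshold:
        return ans, "parametric"
    return EXTERNAL_STORE.get(query, "unknown"), "externalized"
```

The threshold makes the trade-off a tunable dial rather than a zero-sum choice, which is the hybrid reading the comment asks the paper to address.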
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of the manuscript, which correctly identifies the externalization lens as the central organizing principle. The recommendation for minor revision is noted, and we will incorporate any editorial or presentational improvements in the revised version.
Circularity Check
No significant circularity: conceptual review without derivations or fitted predictions
full rationale
The paper is a literature review that organizes existing trends in LLM agent design under an interpretive framework of externalization (memory for state, skills for procedures, protocols for interaction, harness for coordination). It draws on the idea of cognitive artifacts but presents no new equations, quantitative predictions, parameter fits, or first-principles derivations. The central claim—that external infrastructure transforms cognitive burdens into more solvable forms—is interpretive and cites prior work without reducing any result to a self-referential definition or fitted input renamed as prediction. No load-bearing self-citation chains, uniqueness theorems, or ansatzes are invoked in a way that creates circularity. The acknowledged trade-offs further keep the framing non-circular.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Cognitive artifacts from prior literature can be productively applied to frame externalization in LLM agents.
Forward citations
Cited by 10 Pith papers
- Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory
  Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.
- Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning
  SLIM dynamically optimizes active external skills in agentic RL via leave-one-skill-out marginal contribution estimates and three lifecycle operations, outperforming baselines by 7.1% on ALFWorld and SearchQA while sh...
- Position: Academic Conferences are Potentially Facing Denominator Gaming Caused by Fully Automated Scientific Agents
  Malicious actors could use AI agents to submit large numbers of fake papers, inflating the submission count and thereby raising the acceptance odds for a small set of chosen legitimate papers under stable conference a...
- CellScientist: Dual-Space Hierarchical Orchestration for Closed-Loop Refinement of Virtual Cell Models
  CellScientist introduces a dual-space hierarchical orchestration system that enables closed-loop refinement of virtual cell models by routing execution discrepancies back to hypothesis or implementation updates, yield...
- Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
  Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.
- MeloTune: On-Device Arousal Learning and Peer-to-Peer Mood Coupling for Proactive Music Curation
  MeloTune implements learned per-listener Personal Arousal Functions and mesh memory protocols on mobile devices to predict affective trajectories and enable peer-coupled proactive music selection, reporting 96.6% patt...
- Harness Engineering as Categorical Architecture
  Categorical Architecture triple (G, Know, Phi) supplies the formal theory for composing LLM agent harnesses with structurally preserved certificates.
- The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents
  Agent Cybernetics reframes foundation agent design by adapting classical cybernetics laws into three engineering desiderata for reliable, long-running, self-improving agents.
- A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications
  The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.
- Memory as Metabolism: A Design for Companion Knowledge Systems
  This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to updat...