Agents All the Way Down; A Methodology for Building Custom AI Agents from Substrate to Production

Francisco Jos\'e Garc\'ia-Pe\~nalvo; Juanan Pereira; Marc Alier Forment; Mar\'ia Jos\'e Casa\~n Guerrero

arxiv: 2606.11869 · v1 · pith:CAQ5ZBQHnew · submitted 2026-06-10 · 💻 cs.SE · cs.AI

Agents All the Way Down; A Methodology for Building Custom AI Agents from Substrate to Production

Marc Alier Forment , Juanan Pereira , Francisco Jos\'e Garc\'ia-Pe\~nalvo , Mar\'ia Jos\'e Casa\~n Guerrero This is my paper

Pith reviewed 2026-06-27 09:16 UTC · model grok-4.3

classification 💻 cs.SE cs.AI

keywords custom AI agentsagent development methodologyLLM applicationssoftware engineering practiceprototype to productionframework-free developmentagent testingCLI orchestration

0 comments

The pith

The Agents All the Way Down methodology uses two preconditions crossed once and three practices repeated to build custom AI agents from substrate to production without frameworks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a complete practice for engineers to create custom AI agents that live inside their own applications, use their own data and tools, and maintain their own security and audit trail. It begins by establishing a substrate that treats the LLM as a software component with tools, system prompts, and cached messages, plus reusable building blocks such as function calling and CLI orchestration. The core loop then repeats three practices: prototyping with a general-purpose agent, harvesting results into a shipped CLI via the Turtle pattern, and using one agent to test another through behavioral scenarios. This loop is shown working in the construction of the AAC agent for the LAMB platform in roughly ten days. A sympathetic reader would see it as turning scattered online advice into a repeatable, framework-free process that any developer can follow end to end.

Core claim

The paper claims that custom AI agents are built by first crossing two preconditions once—P1 Substrate framing the LLM as tools, system, and messages under prompt-caching, and P2 Building blocks of function calling, MCP, CLI orchestration, liteshell, agent loop, skills, characters, hooks and scaffolding—then repeating three practices: P3 prototype with a general-purpose agent, P4 harvest-fold-and-ship the result as a CLI using the Turtle pattern, and P5 agent-tests-agent in which a general-purpose agent drives behavioral scenarios. The working loop P3-P4-P5 yields a framework-free methodology whose only demonstrated instance is the AAC agent built in about ten days on the LAMB platform, with

What carries the argument

The Agents All the Way Down methodology consisting of preconditions P1 Substrate and P2 Building blocks crossed once plus the repeated practices P3 prototype, P4 Turtle-pattern harvest-and-ship, and P5 agent-tests-agent.

If this is right

Multi-agent systems reduce to ordinary CLI composition of the harvested agents.
Classical unit and integration tests are complemented rather than replaced by agent-driven behavioral scenarios.
A single developer with an AI pair-programmer can move from prototype to production agent in roughly ten days.
The same loop can be restarted whenever the agent must incorporate new data sources or tools.
No external agent framework is required at any stage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The Turtle pattern of harvesting a working prototype into a minimal CLI may generalize to other software artifacts beyond agents.
Agent-tests-agent could surface integration issues that only appear when the agent interacts with live external services.
Teams maintaining multiple custom agents could share the same general-purpose tester agent across projects.
The methodology's emphasis on CLI output suggests it may integrate more easily with existing DevOps pipelines than GUI-centric agent builders.

Load-bearing premise

The patterns observed in one ten-day build of a single agent on the LAMB platform can be applied by other developers to different languages, domains, and projects without further validation.

What would settle it

A second developer following only the stated preconditions and practices to build a different custom agent in a new domain and language, then checking whether the result reaches production without requiring additional unlisted steps or major rework.

Figures

Figures reproduced from arXiv: 2606.11869 by Francisco Jos\'e Garc\'ia-Pe\~nalvo, Juanan Pereira, Marc Alier Forment, Mar\'ia Jos\'e Casa\~n Guerrero.

**Figure 2.** Figure 2: Figure 2 [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

**Figure 3.** Figure 3: Figure 3 [PITH_FULL_IMAGE:figures/full_fig_p020_3.png] view at source ↗

**Figure 4.** Figure 4: Figure 4 [PITH_FULL_IMAGE:figures/full_fig_p021_4.png] view at source ↗

read the original abstract

Custom AI agents areagents that live inside their own application, talk to their own data and tools, enforce their own security boundaries, and carry their own brand and audit trail. What separates them from the general-purpose tier is fit, not capability: each is built for one job, by the engineer who will maintain it. No published practice sets out how to build one end to end. The pieces are everywhere (function-calling APIs, the Model Context Protocol, code agents to pair with), but the practice that chains them lives in podcasts, blogs, and leaked system prompts. This paper writes that practice down as a methodology, Agents All the Way Down: two preconditions crossed once and kept, then three practices repeated for the agent's life. The preconditions are (P1) Substrate, the LLM as a software component, framed as tools, then system, then messages under prompt-caching; and (P2) Building blocks: function calling, MCP, CLI orchestration, the liteshell pattern, the agent loop, skills, characters, hooks, and scaffolding. The practices are (P3) prototype with a general-purpose agent; (P4) harvest, fold, and ship the result as a CLI, the Turtle pattern; and (P5) agent-tests-agent, in which a general-purpose agent drives it through behavioural scenarios, a complement to classical testing, not a replacement. The working loop is P3 to P4 to P5 and back, and one corollary falls out for free: multi-agent orchestration is just CLI composition. The methodology is framework-free by construction. It was distilled from the AAC, a custom agent for the open-source LAMB platform, built in about ten days by one developer with an AI pair-programmer and in production . We present it as a transferable practice, independent of any language or framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Methodology distilled from one 10-day build asserts transferability without replication or comparison.

read the letter

The core of this paper is a named methodology—two preconditions on substrate and building blocks, followed by repeated cycles of prototype, Turtle-pattern harvest into CLI, and agent-tests-agent—that the authors say was extracted from building the AAC agent for the LAMB platform in roughly ten days. It does a service by writing down an explicit sequence that many people currently assemble from scattered blog posts and system prompts, and the reduction of multi-agent work to CLI composition is a clean observation.

What stands out as new is the specific labeling of the Turtle pattern and the agent-tests-agent loop as repeatable practices, plus the claim that the whole thing stays framework-free by design. That structure could save some engineers time when they need a custom agent that owns its own data and boundaries.

The limitation is straightforward: the entire argument for generality comes from a single developer's experience with an AI pair-programmer on one open-source project. There are no additional cases, no side-by-side comparison against ad-hoc methods or existing agent frameworks, and no reported metrics on build time, error rates, or maintenance cost across different languages or domains. The transferability step therefore sits on assertion rather than evidence.

This is aimed at working software engineers who want a documented process for custom agents rather than a research audience looking for validated results. A serious editor could send it to review if the venue accepts engineering-practice papers, with the expectation that referees will ask for at least one more independent application and some form of comparison.

Referee Report

1 major / 1 minor

Summary. The paper claims to introduce 'Agents All the Way Down', a framework-free methodology for building custom AI agents (those embedded in their own applications with domain-specific data, tools, security, and audit trails). It consists of two preconditions crossed once—(P1) Substrate (LLM framed as tools, then system, then messages under prompt-caching) and (P2) Building blocks (function calling, MCP, CLI orchestration, liteshell, agent loop, skills, characters, hooks, scaffolding)—followed by three repeated practices: (P3) prototype with a general-purpose agent, (P4) Turtle-pattern harvest-and-ship into a CLI, and (P5) agent-tests-agent for behavioral scenarios. The working loop is P3-P4-P5, with the corollary that multi-agent orchestration reduces to CLI composition. The methodology is presented as distilled from and demonstrated by the ten-day construction of the AAC agent for the open-source LAMB platform by one developer using an AI pair-programmer, and asserted to be transferable independent of language or framework.

Significance. If the transferability claim holds, the work would provide a documented, end-to-end practice for an area of software engineering where custom-agent construction currently relies on scattered informal sources. The explicit separation of preconditions from repeatable practices, the Turtle pattern, and the reduction of multi-agent systems to CLI composition offer concrete, actionable structure. The framework-free construction by design is a strength, as is the positioning of agent-tests-agent as a complement rather than replacement for classical testing.

major comments (1)

[Abstract] Abstract: the central claim that the two preconditions plus three practices 'provide a complete, framework-free practice' that is 'transferable' and 'independent of any language or framework' rests solely on distillation from the single AAC case built in ten days on the LAMB platform; no replication studies, comparative evaluation against ad-hoc or existing agent frameworks, error analysis, success/failure metrics, or external validation across domains or developers are supplied to anchor the transferability step.

minor comments (1)

[Abstract] Abstract: 'Custom AI agents areagents' contains a missing space and should read 'are agents'.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive assessment of the methodology's potential contributions and for identifying the key point of evidence strength. We address the concern below.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the two preconditions plus three practices 'provide a complete, framework-free practice' that is 'transferable' and 'independent of any language or framework' rests solely on distillation from the single AAC case built in ten days on the LAMB platform; no replication studies, comparative evaluation against ad-hoc or existing agent frameworks, error analysis, success/failure metrics, or external validation across domains or developers are supplied to anchor the transferability step.

Authors: We agree that the transferability assertion rests on distillation from a single, documented case rather than on replication studies or comparative evaluations. The manuscript presents Agents All the Way Down explicitly as a practitioner-derived methodology (analogous to how many software engineering practices such as test-driven development or continuous integration were first documented from experience before broader empirical study). The framework-free construction and the separation of preconditions from repeatable practices are offered as structural features that support transfer in principle; the AAC implementation serves as the concrete existence proof. We do not claim to have performed the empirical work the referee correctly notes is absent. To address the concern we will revise the abstract and introduction to replace the phrasing 'provide a complete, framework-free practice that is transferable' with the more precise 'document a framework-free methodology distilled from production experience and offered as transferable'. We will also add an explicit Limitations section stating that broader validation across developers and domains remains future work. This is a partial revision because the core claim structure of the paper (methodology distilled from one end-to-end case) is retained. revision: partial

Circularity Check

0 steps flagged

No circularity: descriptive methodology with no derivations or self-referential reductions

full rationale

The paper presents a descriptive methodology (P1–P5 practices) distilled from one 10-day case study on the LAMB platform. No equations, fitted parameters, predictions, or mathematical derivations exist that could reduce to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing manner. The transferability claim is an assertion, not a self-referential derivation. This matches the default non-circular outcome for non-quantitative papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper contains no quantitative modeling, data fitting, or formal derivations, so the ledger is empty.

pith-pipeline@v0.9.1-grok · 5936 in / 1183 out tokens · 24311 ms · 2026-06-27T09:16:12.292733+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 11 canonical work pages · 9 internal anchors

[1]

[Alier 2026] Alier, M

The references below are in author-year format suitable for markdown reading; the BibTeX equivalents are listed at the end of this section for the eventual arXiv / TeX submission. [Alier 2026] Alier, M. Learning LLMs, RAG, and Building Agents. YouTube lecture playlist, June

2026
[2]

Accessed 2026-06-10

https://www.youtube.com/playlist?list=PLjRDvpoYVcO2eE3RYzZ9b2mWS1 m6UxHCX. Accessed 2026-06-10. [Anomaly 2026] Anomaly. OpenCode Documentation. https://opencode.ai/docs/. Ac - cessed 2026-05-22. [Anthropic 2024] Anthropic. Building Effective AI Agents. Anthropic Research, 19 December

2026
[3]

Claude Code

https://www.anthropic.com/news/model-context-protocol [Anthropic 2026a] Anthropic. Claude Code. https://www.anthropic.com/product/claude- code. Accessed 2026-05-22. [Anthropic 2026b] Anthropic. Prompt caching. https://platform.claude.com/docs/en/ build-with-claude/prompt-caching. Accessed 2026-05-22. 39 [Anthropic 2026c] Anthropic. Code execution with MCP...

2026
[4]

2026] García-Peñalvo, F

(The Facade pattern and the lineage of naming-as-contribution the liteshell discussion in §3.2 borrows from.) [García-Peñalvo et al. 2026] García-Peñalvo, F. J., Alier, M., Vázquez-Ingelmo, A., García- Holgado, A., Casañ, M. J., and Pereira, J. Evaluación asistida por inteligencia artificial generativa en prácticas de Ingeniería de Software: una prueba de...

2026
[5]

[Gauthier 2026] Gauthier, P

DOI: 10.5944/ried.47173. [Gauthier 2026] Gauthier, P . and the Aider contributors. Aider Documentation. https:// aider.chat/docs/. Accessed 2026-05-22. [Google 2026a] Google. Gemini Developer API pricing. https://ai.google.dev/gemini-api/ docs/pricing. Accessed 2026-05-22. [Google 2026b] Google. Context caching — Gemini API. https://ai.google.dev/gemini-a...

work page doi:10.5944/ried.47173 2026
[6]

SmolAgents — Tools-as-code agents

(The turtles all the way down image is attributed in this source to a lecture by Bertrand Russell; the underlying World Turtle cosmology predates the modern attribution.) 40 [HuggingFace 2024] HuggingFace. SmolAgents — Tools-as-code agents. https://github. com/huggingface/smolagents. Released 2024; Accessed 2026-05-23. (The single-execu - tion-tool / tool...

2024
[7]

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

https:// ghuntley.com/loop/ [IEEE Software 2026] IEEE Computer Society. Call for Papers: Engineering Agentic Systems. IEEE Software special issue. https://www.computer.org/digital-library/magazines/so/ cfp-engineering-agentic-systems. Accessed 2026-05-22. [Inngest 2026] Inngest. Inngest: durable workflow engine. https://www.inngest.com/. Ac- cessed 2026-0...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[8]

Self-Refine: Iterative Refinement with Self-Feedback

[LangChain 2026] LangChain. LangChain documentation. https://docs.langchain.com/. Accessed 2026-05-22. [LangChain 2026b] LangChain. LangGraph overview. https://docs.langchain.com/oss/ python/langgraph/overview. Accessed 2026-05-22. [Legati Workshop 2026] Alier, M. The Legati Workshop — a scaffold for AI-augmented intel - lectual workspaces. GitHub reposit...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[9]

Specification, Version 2025-06-18

[MCP 2025] Model Context Protocol. Specification, Version 2025-06-18. https://modelconte xtprotocol.io/specification/2025-06-18. Accessed 2026-05-22. [MCP 2026] Model Context Protocol. Authorization. https://modelcontextprotocol.io/ specification/draft/basic/authorization. Accessed 2026-05-22. [MindStudio 2026] MindStudio. MCP Servers Use 35× More Tokens ...

2025
[10]

Accessed 2026-06-08

https://github.com/ yoheinakajima/babyagi. Accessed 2026-06-08. 41 [NVD 2024] National Vulnerability Database. CVE-2024-3094. https://nvd.nist.gov/ vuln/detail/CVE-2024-3094. Accessed 2026-05-22. [OpenAI 2022] OpenAI. Introducing ChatGPT. https://openai.com/index/chatgpt/. Pub- lished November 2022; Accessed 2026-05-22. [OpenAI 2023] OpenAI. Function call...

2026
[11]

Inbound Malware Volume Report

[PyPI 2023] PyPI. Inbound Malware Volume Report. https://blog.pypi.org/posts/2023-09- 18-inbound-malware-reporting/. Accessed 2026-05-22. [PyPI 2026] Larson, S. and Fiedler, M. Incident Report: LiteLLM/Telnyx supply-chain attacks, with guidance. PyPI Blog, 2 April

2023
[12]

Why CLI Tools Are Beating MCP for AI Agents

https://blog.pypi.org/posts/2026-04-02-incident- report-litellm-telnyx-supply-chain-attack/ [Reinhard 2026] Reinhard, J. Why CLI Tools Are Beating MCP for AI Agents. https://jannikreinhard.com/2026/02/22/why-cli-tools-are-beating-mcp-for-ai-agents/. Published 2026-02-22; Accessed 2026-05-22. [Restate 2026] Restate. Restate: durable execution for resilient...

work page arXiv 2026
[13]

Toolformer: Language Models Can Teach Themselves to Use Tools

[Schick et al. 2023] Schick, T. et al. Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv:2302.04761,

work page internal anchor Pith review Pith/arXiv arXiv 2023
[14]

Reflexion: Language Agents with Verbal Reinforcement Learning

[Shinn et al. 2023] Shinn, N. et al. Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366,

work page internal anchor Pith review Pith/arXiv arXiv 2023
[15]

OpenClaw: The Viral AI Agent that Broke the Internet

[Steinberger 2026] Steinberger, P . OpenClaw: The Viral AI Agent that Broke the Internet. Lex Fridman Podcast #491,

2026
[16]

Voyager: An Open-Ended Embodied Agent with Large Language Models

Transcript: https://lexfridman.com/peter-steinberger- transcript. Accessed 2026-06-07. [Temporal 2026] Temporal Technologies. Temporal: durable execution platform. https:// temporal.io/. Accessed 2026-06-07. [Vensas 2026] Vensas. MCP vs CLI: Cost Comparison for AI Agent Tooling. https://vensas. de/en/blog/mcp-vs-cli-cost-comparison. Accessed 2026-05-22. [...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[17]

A Survey on Large Language Model based Autonomous Agents

42 [Wang et al. 2023b] Wang, L. et al. A Survey on Large Language Model based Autonomous Agents. arXiv:2308.11432,

work page internal anchor Pith review Pith/arXiv arXiv
[18]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

[Wu et al. 2023] Wu, Q. et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv:2308.08155,

work page internal anchor Pith review Pith/arXiv arXiv 2023
[19]

The Rise and Potential of Large Language Model Based Agents: A Survey

[Xi et al. 2023] Xi, Z. et al. The Rise and Potential of Large Language Model Based Agents: A Survey. arXiv:2309.07864,

work page internal anchor Pith review Pith/arXiv arXiv 2023
[20]

ReAct: Synergizing Reasoning and Acting in Language Models

[Yao et al. 2022] Yao, S. et al. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629,

work page internal anchor Pith review Pith/arXiv arXiv 2022

[1] [1]

[Alier 2026] Alier, M

The references below are in author-year format suitable for markdown reading; the BibTeX equivalents are listed at the end of this section for the eventual arXiv / TeX submission. [Alier 2026] Alier, M. Learning LLMs, RAG, and Building Agents. YouTube lecture playlist, June

2026

[2] [2]

Accessed 2026-06-10

https://www.youtube.com/playlist?list=PLjRDvpoYVcO2eE3RYzZ9b2mWS1 m6UxHCX. Accessed 2026-06-10. [Anomaly 2026] Anomaly. OpenCode Documentation. https://opencode.ai/docs/. Ac - cessed 2026-05-22. [Anthropic 2024] Anthropic. Building Effective AI Agents. Anthropic Research, 19 December

2026

[3] [3]

Claude Code

https://www.anthropic.com/news/model-context-protocol [Anthropic 2026a] Anthropic. Claude Code. https://www.anthropic.com/product/claude- code. Accessed 2026-05-22. [Anthropic 2026b] Anthropic. Prompt caching. https://platform.claude.com/docs/en/ build-with-claude/prompt-caching. Accessed 2026-05-22. 39 [Anthropic 2026c] Anthropic. Code execution with MCP...

2026

[4] [4]

2026] García-Peñalvo, F

(The Facade pattern and the lineage of naming-as-contribution the liteshell discussion in §3.2 borrows from.) [García-Peñalvo et al. 2026] García-Peñalvo, F. J., Alier, M., Vázquez-Ingelmo, A., García- Holgado, A., Casañ, M. J., and Pereira, J. Evaluación asistida por inteligencia artificial generativa en prácticas de Ingeniería de Software: una prueba de...

2026

[5] [5]

[Gauthier 2026] Gauthier, P

DOI: 10.5944/ried.47173. [Gauthier 2026] Gauthier, P . and the Aider contributors. Aider Documentation. https:// aider.chat/docs/. Accessed 2026-05-22. [Google 2026a] Google. Gemini Developer API pricing. https://ai.google.dev/gemini-api/ docs/pricing. Accessed 2026-05-22. [Google 2026b] Google. Context caching — Gemini API. https://ai.google.dev/gemini-a...

work page doi:10.5944/ried.47173 2026

[6] [6]

SmolAgents — Tools-as-code agents

(The turtles all the way down image is attributed in this source to a lecture by Bertrand Russell; the underlying World Turtle cosmology predates the modern attribution.) 40 [HuggingFace 2024] HuggingFace. SmolAgents — Tools-as-code agents. https://github. com/huggingface/smolagents. Released 2024; Accessed 2026-05-23. (The single-execu - tion-tool / tool...

2024

[7] [7]

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

https:// ghuntley.com/loop/ [IEEE Software 2026] IEEE Computer Society. Call for Papers: Engineering Agentic Systems. IEEE Software special issue. https://www.computer.org/digital-library/magazines/so/ cfp-engineering-agentic-systems. Accessed 2026-05-22. [Inngest 2026] Inngest. Inngest: durable workflow engine. https://www.inngest.com/. Ac- cessed 2026-0...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[8] [8]

Self-Refine: Iterative Refinement with Self-Feedback

[LangChain 2026] LangChain. LangChain documentation. https://docs.langchain.com/. Accessed 2026-05-22. [LangChain 2026b] LangChain. LangGraph overview. https://docs.langchain.com/oss/ python/langgraph/overview. Accessed 2026-05-22. [Legati Workshop 2026] Alier, M. The Legati Workshop — a scaffold for AI-augmented intel - lectual workspaces. GitHub reposit...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[9] [9]

Specification, Version 2025-06-18

[MCP 2025] Model Context Protocol. Specification, Version 2025-06-18. https://modelconte xtprotocol.io/specification/2025-06-18. Accessed 2026-05-22. [MCP 2026] Model Context Protocol. Authorization. https://modelcontextprotocol.io/ specification/draft/basic/authorization. Accessed 2026-05-22. [MindStudio 2026] MindStudio. MCP Servers Use 35× More Tokens ...

2025

[10] [10]

Accessed 2026-06-08

https://github.com/ yoheinakajima/babyagi. Accessed 2026-06-08. 41 [NVD 2024] National Vulnerability Database. CVE-2024-3094. https://nvd.nist.gov/ vuln/detail/CVE-2024-3094. Accessed 2026-05-22. [OpenAI 2022] OpenAI. Introducing ChatGPT. https://openai.com/index/chatgpt/. Pub- lished November 2022; Accessed 2026-05-22. [OpenAI 2023] OpenAI. Function call...

2026

[11] [11]

Inbound Malware Volume Report

[PyPI 2023] PyPI. Inbound Malware Volume Report. https://blog.pypi.org/posts/2023-09- 18-inbound-malware-reporting/. Accessed 2026-05-22. [PyPI 2026] Larson, S. and Fiedler, M. Incident Report: LiteLLM/Telnyx supply-chain attacks, with guidance. PyPI Blog, 2 April

2023

[12] [12]

Why CLI Tools Are Beating MCP for AI Agents

https://blog.pypi.org/posts/2026-04-02-incident- report-litellm-telnyx-supply-chain-attack/ [Reinhard 2026] Reinhard, J. Why CLI Tools Are Beating MCP for AI Agents. https://jannikreinhard.com/2026/02/22/why-cli-tools-are-beating-mcp-for-ai-agents/. Published 2026-02-22; Accessed 2026-05-22. [Restate 2026] Restate. Restate: durable execution for resilient...

work page arXiv 2026

[13] [13]

Toolformer: Language Models Can Teach Themselves to Use Tools

[Schick et al. 2023] Schick, T. et al. Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv:2302.04761,

work page internal anchor Pith review Pith/arXiv arXiv 2023

[14] [14]

Reflexion: Language Agents with Verbal Reinforcement Learning

[Shinn et al. 2023] Shinn, N. et al. Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366,

work page internal anchor Pith review Pith/arXiv arXiv 2023

[15] [15]

OpenClaw: The Viral AI Agent that Broke the Internet

[Steinberger 2026] Steinberger, P . OpenClaw: The Viral AI Agent that Broke the Internet. Lex Fridman Podcast #491,

2026

[16] [16]

Voyager: An Open-Ended Embodied Agent with Large Language Models

Transcript: https://lexfridman.com/peter-steinberger- transcript. Accessed 2026-06-07. [Temporal 2026] Temporal Technologies. Temporal: durable execution platform. https:// temporal.io/. Accessed 2026-06-07. [Vensas 2026] Vensas. MCP vs CLI: Cost Comparison for AI Agent Tooling. https://vensas. de/en/blog/mcp-vs-cli-cost-comparison. Accessed 2026-05-22. [...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[17] [17]

A Survey on Large Language Model based Autonomous Agents

42 [Wang et al. 2023b] Wang, L. et al. A Survey on Large Language Model based Autonomous Agents. arXiv:2308.11432,

work page internal anchor Pith review Pith/arXiv arXiv

[18] [18]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

[Wu et al. 2023] Wu, Q. et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv:2308.08155,

work page internal anchor Pith review Pith/arXiv arXiv 2023

[19] [19]

The Rise and Potential of Large Language Model Based Agents: A Survey

[Xi et al. 2023] Xi, Z. et al. The Rise and Potential of Large Language Model Based Agents: A Survey. arXiv:2309.07864,

work page internal anchor Pith review Pith/arXiv arXiv 2023

[20] [20]

ReAct: Synergizing Reasoning and Acting in Language Models

[Yao et al. 2022] Yao, S. et al. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629,

work page internal anchor Pith review Pith/arXiv arXiv 2022