Agents All the Way Down; A Methodology for Building Custom AI Agents from Substrate to Production
Pith reviewed 2026-06-27 09:16 UTC · model grok-4.3
The pith
The Agents All the Way Down methodology uses two preconditions crossed once and three practices repeated to build custom AI agents from substrate to production without frameworks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that custom AI agents are built by first crossing two preconditions once—P1 Substrate framing the LLM as tools, system, and messages under prompt-caching, and P2 Building blocks of function calling, MCP, CLI orchestration, liteshell, agent loop, skills, characters, hooks and scaffolding—then repeating three practices: P3 prototype with a general-purpose agent, P4 harvest-fold-and-ship the result as a CLI using the Turtle pattern, and P5 agent-tests-agent in which a general-purpose agent drives behavioral scenarios. The working loop P3-P4-P5 yields a framework-free methodology whose only demonstrated instance is the AAC agent built in about ten days on the LAMB platform, with
What carries the argument
The Agents All the Way Down methodology consisting of preconditions P1 Substrate and P2 Building blocks crossed once plus the repeated practices P3 prototype, P4 Turtle-pattern harvest-and-ship, and P5 agent-tests-agent.
If this is right
- Multi-agent systems reduce to ordinary CLI composition of the harvested agents.
- Classical unit and integration tests are complemented rather than replaced by agent-driven behavioral scenarios.
- A single developer with an AI pair-programmer can move from prototype to production agent in roughly ten days.
- The same loop can be restarted whenever the agent must incorporate new data sources or tools.
- No external agent framework is required at any stage.
Where Pith is reading between the lines
- The Turtle pattern of harvesting a working prototype into a minimal CLI may generalize to other software artifacts beyond agents.
- Agent-tests-agent could surface integration issues that only appear when the agent interacts with live external services.
- Teams maintaining multiple custom agents could share the same general-purpose tester agent across projects.
- The methodology's emphasis on CLI output suggests it may integrate more easily with existing DevOps pipelines than GUI-centric agent builders.
Load-bearing premise
The patterns observed in one ten-day build of a single agent on the LAMB platform can be applied by other developers to different languages, domains, and projects without further validation.
What would settle it
A second developer following only the stated preconditions and practices to build a different custom agent in a new domain and language, then checking whether the result reaches production without requiring additional unlisted steps or major rework.
Figures
read the original abstract
Custom AI agents areagents that live inside their own application, talk to their own data and tools, enforce their own security boundaries, and carry their own brand and audit trail. What separates them from the general-purpose tier is fit, not capability: each is built for one job, by the engineer who will maintain it. No published practice sets out how to build one end to end. The pieces are everywhere (function-calling APIs, the Model Context Protocol, code agents to pair with), but the practice that chains them lives in podcasts, blogs, and leaked system prompts. This paper writes that practice down as a methodology, Agents All the Way Down: two preconditions crossed once and kept, then three practices repeated for the agent's life. The preconditions are (P1) Substrate, the LLM as a software component, framed as tools, then system, then messages under prompt-caching; and (P2) Building blocks: function calling, MCP, CLI orchestration, the liteshell pattern, the agent loop, skills, characters, hooks, and scaffolding. The practices are (P3) prototype with a general-purpose agent; (P4) harvest, fold, and ship the result as a CLI, the Turtle pattern; and (P5) agent-tests-agent, in which a general-purpose agent drives it through behavioural scenarios, a complement to classical testing, not a replacement. The working loop is P3 to P4 to P5 and back, and one corollary falls out for free: multi-agent orchestration is just CLI composition. The methodology is framework-free by construction. It was distilled from the AAC, a custom agent for the open-source LAMB platform, built in about ten days by one developer with an AI pair-programmer and in production . We present it as a transferable practice, independent of any language or framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce 'Agents All the Way Down', a framework-free methodology for building custom AI agents (those embedded in their own applications with domain-specific data, tools, security, and audit trails). It consists of two preconditions crossed once—(P1) Substrate (LLM framed as tools, then system, then messages under prompt-caching) and (P2) Building blocks (function calling, MCP, CLI orchestration, liteshell, agent loop, skills, characters, hooks, scaffolding)—followed by three repeated practices: (P3) prototype with a general-purpose agent, (P4) Turtle-pattern harvest-and-ship into a CLI, and (P5) agent-tests-agent for behavioral scenarios. The working loop is P3-P4-P5, with the corollary that multi-agent orchestration reduces to CLI composition. The methodology is presented as distilled from and demonstrated by the ten-day construction of the AAC agent for the open-source LAMB platform by one developer using an AI pair-programmer, and asserted to be transferable independent of language or framework.
Significance. If the transferability claim holds, the work would provide a documented, end-to-end practice for an area of software engineering where custom-agent construction currently relies on scattered informal sources. The explicit separation of preconditions from repeatable practices, the Turtle pattern, and the reduction of multi-agent systems to CLI composition offer concrete, actionable structure. The framework-free construction by design is a strength, as is the positioning of agent-tests-agent as a complement rather than replacement for classical testing.
major comments (1)
- [Abstract] Abstract: the central claim that the two preconditions plus three practices 'provide a complete, framework-free practice' that is 'transferable' and 'independent of any language or framework' rests solely on distillation from the single AAC case built in ten days on the LAMB platform; no replication studies, comparative evaluation against ad-hoc or existing agent frameworks, error analysis, success/failure metrics, or external validation across domains or developers are supplied to anchor the transferability step.
minor comments (1)
- [Abstract] Abstract: 'Custom AI agents areagents' contains a missing space and should read 'are agents'.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the methodology's potential contributions and for identifying the key point of evidence strength. We address the concern below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that the two preconditions plus three practices 'provide a complete, framework-free practice' that is 'transferable' and 'independent of any language or framework' rests solely on distillation from the single AAC case built in ten days on the LAMB platform; no replication studies, comparative evaluation against ad-hoc or existing agent frameworks, error analysis, success/failure metrics, or external validation across domains or developers are supplied to anchor the transferability step.
Authors: We agree that the transferability assertion rests on distillation from a single, documented case rather than on replication studies or comparative evaluations. The manuscript presents Agents All the Way Down explicitly as a practitioner-derived methodology (analogous to how many software engineering practices such as test-driven development or continuous integration were first documented from experience before broader empirical study). The framework-free construction and the separation of preconditions from repeatable practices are offered as structural features that support transfer in principle; the AAC implementation serves as the concrete existence proof. We do not claim to have performed the empirical work the referee correctly notes is absent. To address the concern we will revise the abstract and introduction to replace the phrasing 'provide a complete, framework-free practice that is transferable' with the more precise 'document a framework-free methodology distilled from production experience and offered as transferable'. We will also add an explicit Limitations section stating that broader validation across developers and domains remains future work. This is a partial revision because the core claim structure of the paper (methodology distilled from one end-to-end case) is retained. revision: partial
Circularity Check
No circularity: descriptive methodology with no derivations or self-referential reductions
full rationale
The paper presents a descriptive methodology (P1–P5 practices) distilled from one 10-day case study on the LAMB platform. No equations, fitted parameters, predictions, or mathematical derivations exist that could reduce to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing manner. The transferability claim is an assertion, not a self-referential derivation. This matches the default non-circular outcome for non-quantitative papers.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
[Alier 2026] Alier, M
The references below are in author-year format suitable for markdown reading; the BibTeX equivalents are listed at the end of this section for the eventual arXiv / TeX submission. [Alier 2026] Alier, M. Learning LLMs, RAG, and Building Agents. YouTube lecture playlist, June
2026
-
[2]
Accessed 2026-06-10
https://www.youtube.com/playlist?list=PLjRDvpoYVcO2eE3RYzZ9b2mWS1 m6UxHCX. Accessed 2026-06-10. [Anomaly 2026] Anomaly. OpenCode Documentation. https://opencode.ai/docs/. Ac - cessed 2026-05-22. [Anthropic 2024] Anthropic. Building Effective AI Agents. Anthropic Research, 19 December
2026
-
[3]
Claude Code
https://www.anthropic.com/news/model-context-protocol [Anthropic 2026a] Anthropic. Claude Code. https://www.anthropic.com/product/claude- code. Accessed 2026-05-22. [Anthropic 2026b] Anthropic. Prompt caching. https://platform.claude.com/docs/en/ build-with-claude/prompt-caching. Accessed 2026-05-22. 39 [Anthropic 2026c] Anthropic. Code execution with MCP...
2026
-
[4]
2026] García-Peñalvo, F
(The Facade pattern and the lineage of naming-as-contribution the liteshell discussion in §3.2 borrows from.) [García-Peñalvo et al. 2026] García-Peñalvo, F. J., Alier, M., Vázquez-Ingelmo, A., García- Holgado, A., Casañ, M. J., and Pereira, J. Evaluación asistida por inteligencia artificial generativa en prácticas de Ingeniería de Software: una prueba de...
2026
-
[5]
DOI: 10.5944/ried.47173. [Gauthier 2026] Gauthier, P . and the Aider contributors. Aider Documentation. https:// aider.chat/docs/. Accessed 2026-05-22. [Google 2026a] Google. Gemini Developer API pricing. https://ai.google.dev/gemini-api/ docs/pricing. Accessed 2026-05-22. [Google 2026b] Google. Context caching — Gemini API. https://ai.google.dev/gemini-a...
-
[6]
SmolAgents — Tools-as-code agents
(The turtles all the way down image is attributed in this source to a lecture by Bertrand Russell; the underlying World Turtle cosmology predates the modern attribution.) 40 [HuggingFace 2024] HuggingFace. SmolAgents — Tools-as-code agents. https://github. com/huggingface/smolagents. Released 2024; Accessed 2026-05-23. (The single-execu - tion-tool / tool...
2024
-
[7]
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
https:// ghuntley.com/loop/ [IEEE Software 2026] IEEE Computer Society. Call for Papers: Engineering Agentic Systems. IEEE Software special issue. https://www.computer.org/digital-library/magazines/so/ cfp-engineering-agentic-systems. Accessed 2026-05-22. [Inngest 2026] Inngest. Inngest: durable workflow engine. https://www.inngest.com/. Ac- cessed 2026-0...
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[8]
Self-Refine: Iterative Refinement with Self-Feedback
[LangChain 2026] LangChain. LangChain documentation. https://docs.langchain.com/. Accessed 2026-05-22. [LangChain 2026b] LangChain. LangGraph overview. https://docs.langchain.com/oss/ python/langgraph/overview. Accessed 2026-05-22. [Legati Workshop 2026] Alier, M. The Legati Workshop — a scaffold for AI-augmented intel - lectual workspaces. GitHub reposit...
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[9]
Specification, Version 2025-06-18
[MCP 2025] Model Context Protocol. Specification, Version 2025-06-18. https://modelconte xtprotocol.io/specification/2025-06-18. Accessed 2026-05-22. [MCP 2026] Model Context Protocol. Authorization. https://modelcontextprotocol.io/ specification/draft/basic/authorization. Accessed 2026-05-22. [MindStudio 2026] MindStudio. MCP Servers Use 35× More Tokens ...
2025
-
[10]
Accessed 2026-06-08
https://github.com/ yoheinakajima/babyagi. Accessed 2026-06-08. 41 [NVD 2024] National Vulnerability Database. CVE-2024-3094. https://nvd.nist.gov/ vuln/detail/CVE-2024-3094. Accessed 2026-05-22. [OpenAI 2022] OpenAI. Introducing ChatGPT. https://openai.com/index/chatgpt/. Pub- lished November 2022; Accessed 2026-05-22. [OpenAI 2023] OpenAI. Function call...
2026
-
[11]
Inbound Malware Volume Report
[PyPI 2023] PyPI. Inbound Malware Volume Report. https://blog.pypi.org/posts/2023-09- 18-inbound-malware-reporting/. Accessed 2026-05-22. [PyPI 2026] Larson, S. and Fiedler, M. Incident Report: LiteLLM/Telnyx supply-chain attacks, with guidance. PyPI Blog, 2 April
2023
-
[12]
Why CLI Tools Are Beating MCP for AI Agents
https://blog.pypi.org/posts/2026-04-02-incident- report-litellm-telnyx-supply-chain-attack/ [Reinhard 2026] Reinhard, J. Why CLI Tools Are Beating MCP for AI Agents. https://jannikreinhard.com/2026/02/22/why-cli-tools-are-beating-mcp-for-ai-agents/. Published 2026-02-22; Accessed 2026-05-22. [Restate 2026] Restate. Restate: durable execution for resilient...
-
[13]
Toolformer: Language Models Can Teach Themselves to Use Tools
[Schick et al. 2023] Schick, T. et al. Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv:2302.04761,
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[14]
Reflexion: Language Agents with Verbal Reinforcement Learning
[Shinn et al. 2023] Shinn, N. et al. Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366,
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[15]
OpenClaw: The Viral AI Agent that Broke the Internet
[Steinberger 2026] Steinberger, P . OpenClaw: The Viral AI Agent that Broke the Internet. Lex Fridman Podcast #491,
2026
-
[16]
Voyager: An Open-Ended Embodied Agent with Large Language Models
Transcript: https://lexfridman.com/peter-steinberger- transcript. Accessed 2026-06-07. [Temporal 2026] Temporal Technologies. Temporal: durable execution platform. https:// temporal.io/. Accessed 2026-06-07. [Vensas 2026] Vensas. MCP vs CLI: Cost Comparison for AI Agent Tooling. https://vensas. de/en/blog/mcp-vs-cli-cost-comparison. Accessed 2026-05-22. [...
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[17]
A Survey on Large Language Model based Autonomous Agents
42 [Wang et al. 2023b] Wang, L. et al. A Survey on Large Language Model based Autonomous Agents. arXiv:2308.11432,
work page internal anchor Pith review Pith/arXiv arXiv
-
[18]
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
[Wu et al. 2023] Wu, Q. et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv:2308.08155,
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[19]
The Rise and Potential of Large Language Model Based Agents: A Survey
[Xi et al. 2023] Xi, Z. et al. The Rise and Potential of Large Language Model Based Agents: A Survey. arXiv:2309.07864,
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[20]
ReAct: Synergizing Reasoning and Acting in Language Models
[Yao et al. 2022] Yao, S. et al. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629,
work page internal anchor Pith review Pith/arXiv arXiv 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.