Agentic Software: How AI Agents Are Restructuring the Software Paradigm

Zhenfeng Cao

arxiv: 2606.05608 · v2 · pith:O4DXWHAMnew · submitted 2026-06-04 · 💻 cs.SE · cs.AI

Pith reviewed 2026-06-28 00:44 UTC · model grok-4.3

keywords: AI agents, agentic software, software paradigm, runtime logic generation, Agent-as-a-Service, Agentic Engineering, software engineering

The pith

AI agents restructure software by making the agent itself the software with runtime-generated decision logic instead of static code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that AI agents, powered by large language models that dynamically generate and discard code, mark a fundamental change in what software is rather than an incremental advance. In traditional software, humans encode fixed decision logic into code that remains static. In the agentic model, the agent serves as the software and creates its own logic during operation. The paper places this in a line of earlier shifts, from licensed software to SaaS, each of which moved complexity away from users; the agentic shift moves decision-making complexity as well. The author defines Agentic Engineering as a distinct discipline with its own object of study, control model, and human role, supported by a review of benchmark evidence and a four-stage roadmap.

Core claim

The emergence of AI agents constitutes a fundamental restructuring of what software is, not an incremental tool improvement. In traditional deterministic software, code is the carrier of pre-written decision logic. In agentic software, the agent itself is the software, and its decision logic is generated at runtime. The historical progression from licensed software to SaaS to Agent-as-a-Service transfers progressively more complexity away from end-users, ultimately including decision-making itself, and expands software engineering into Agentic Engineering, which focuses on agent systems with LLM-driven control and humans as intent architects.

What carries the argument

The distinction between traditional deterministic software, where code carries pre-written decision logic, and agentic software, where the agent generates decision logic at runtime using LLMs as the reasoning engine.
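The distinction can be made concrete with a minimal Python sketch. It is an illustration only, not code from the paper: `fake_llm_generate` is a hypothetical stand-in for an LLM call and returns a canned string so the example runs offline, and the discount rule is an invented toy task.

```python
from typing import Callable


# Traditional software: the decision logic is written ahead of time and stays fixed.
def static_discount(order_total: float) -> float:
    return 0.10 if order_total >= 100 else 0.0


# Hypothetical stand-in for an LLM. A real agent would send the intent to a model;
# here the "generated" source is a canned string so the sketch runs offline.
def fake_llm_generate(intent: str) -> str:
    return (
        "def decide(order_total):\n"
        "    return 0.10 if order_total >= 100 else 0.0\n"
    )


# Agentic style: the decision logic does not exist until the request arrives.
def agentic_discount(intent: str, order_total: float) -> float:
    source = fake_llm_generate(intent)   # logic is produced at runtime
    namespace: dict = {}
    exec(source, namespace)              # materialize the generated function
    decide: Callable[[float], float] = namespace["decide"]
    result = decide(order_total)
    del namespace                        # the generated code is discarded after use
    return result
```

Both functions return the same value for the same input here, which is the point of the sketch: the difference the paper draws is in where the logic comes from and how long it lives, not in the output of any single call.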

If this is right

Agentic Engineering expands the discipline with agent systems as its core object of study, LLM-driven control models, and humans acting as intent architects instead of code authors.
The service model evolution to Agent-as-a-Service transfers decision-making complexity away from end-users in addition to operational complexity.
Benchmark analyses such as SWE-bench Verified and multi-agent coordination studies reveal both the potential and the present limitations of the agentic approach.
A four-stage roadmap outlines progression toward self-evolving agent ecosystems with practical recommendations for the transition.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Software development practices may shift toward specifying high-level intents and overseeing runtime behavior rather than authoring detailed code.
Verification and security methods will likely need adaptation to handle dynamically generated logic that changes during execution.
Existing static software systems could integrate with agents through hybrid architectures that preserve pre-written components where stability is required.
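As a rough illustration of the last two points, the Python sketch below checks generated source against an allow-list of syntax nodes before running it, and falls back to a pre-written function when the check fails. Everything here is an assumption made for the example: the names `is_permitted`, `static_decide`, and `run_generated` are hypothetical, and an AST allow-list of this kind is not a real security sandbox.

```python
import ast

# Syntax nodes the guard accepts. Anything else (imports, calls, attribute
# access, loops) causes the generated code to be rejected.
ALLOWED_NODES = (
    ast.Module, ast.FunctionDef, ast.arguments, ast.arg, ast.Return,
    ast.IfExp, ast.Compare, ast.Constant, ast.Name, ast.Load,
    ast.GtE, ast.Lt, ast.BinOp, ast.Add, ast.Mult,
)


def is_permitted(source: str) -> bool:
    """Return True only if every node in the parsed source is on the allow-list."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))


def static_decide(x: float) -> float:
    """Pre-written component kept for stability: a conservative default."""
    return 0.0


def run_generated(source: str, x: float) -> float:
    """Run generated logic if it passes the guard, else use the static path."""
    if not is_permitted(source):
        return static_decide(x)
    namespace: dict = {"__builtins__": {}}   # no builtins exposed to generated code
    exec(source, namespace)
    decide = namespace.get("decide")
    if decide is None:                        # generated code defined no entry point
        return static_decide(x)
    return decide(x)
```

The design choice is that the static function is the default and generated logic has to earn its way in, which matches the hybrid reading above rather than anything the paper specifies.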

Load-bearing premise

The assumption that runtime-generated logic by agents creates a structural break from pre-written logic in static code rather than continuing existing automation trends.

What would settle it

A demonstration that fully functional agent behaviors can be replicated exactly using only static pre-written code without runtime generation or LLM involvement.

Figures

Figures reproduced from arXiv: 2606.05608 by Zhenfeng Cao.

**Figure 2.** Agent performance on the EvoClaw benchmark [6]. When evaluated on ...

Original abstract

For over half a century, software engineering has operated on a foundational premise: human engineers decompose problems, encode decision logic into static code, and manually adapt that code as requirements evolve. This paper argues that the emergence of AI agents -- systems where large language models serve as the primary reasoning engine, dynamically generating and discarding code as an instrumental resource -- constitutes a fundamental restructuring of what software is, not an incremental tool improvement. We formalize the distinction between traditional deterministic software and agentic software: in the former, code is the carrier of pre-written decision logic; in the latter, the agent itself is the software, and its decision logic is generated at runtime. We trace the historical arc from licensed software to SaaS to Agent-as-a-Service (AaaS), showing that each shift transferred additional complexity away from end-users -- with the agentic shift transferring not just operational complexity but decision-making complexity itself. We introduce Agentic Engineering as an expansion of the software engineering discipline into a new paradigm, distinct in its core object of study (agent systems rather than static source code), its control model (LLM-driven rather than human-predefined), and its human role (intent architect rather than code author). Through analysis of recent benchmark evidence including SWE-bench Verified, EvoClaw, and LangChain's multi-agent coordination studies, we demonstrate both the transformative potential of the agentic paradigm and its current limitations. We conclude with a four-stage roadmap toward self-evolving agent ecosystems and concrete recommendations for practitioners navigating this transition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reframes AI agents as a paradigm shift in software but the claimed structural break rests on definitions that skip over decades of runtime-adaptive systems.

The main point is that this is a position paper arguing AI agents move software from static pre-written logic to runtime-generated decision making, with the agent itself becoming the software. The distinction is presented as fundamental rather than incremental.

What is new is the coinage of Agentic Engineering and Agent-as-a-Service, plus the explicit historical line from licensed software through SaaS to this stage. The paper also gathers benchmark references such as SWE-bench Verified and multi-agent coordination studies to show both promise and current limits.

It does a reasonable job describing how human roles could shift toward intent specification and outlining a four-stage roadmap. The discussion of limitations draws directly from the cited work rather than overclaiming.

The soft spot is the load-bearing premise that only agentic systems generate logic at runtime while traditional code is purely static. Rule engines, expert systems, and meta-programming have long separated carrier code from dynamically produced behavior. The paper does not engage that history, so the restructuring claim stays definitional instead of demonstrated.
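The objection is easier to see in code. The Python sketch below is a toy rule interpreter, not the API of Drools or any real rule engine; the rule format and the names `OPS`, `evaluate`, `rules_v1`, and `rules_v2` are assumptions made for illustration. The interpreter is fixed carrier code, while the behavior is supplied as data at runtime.

```python
import operator
from typing import Any

# Static carrier code: a tiny interpreter that knows nothing about the domain.
OPS = {">=": operator.ge, "<": operator.lt, "==": operator.eq}


def evaluate(rules: list[dict[str, Any]], facts: dict[str, Any], default: Any = None) -> Any:
    """Return the action of the first rule whose condition holds for the facts."""
    for rule in rules:
        field, op, value = rule["when"]
        if OPS[op](facts[field], value):
            return rule["then"]
    return default


# Behavior arrives as data (hypothetical rules, e.g. loaded from a file or database).
rules_v1 = [{"when": ("total", ">=", 100), "then": "discount"}]
rules_v2 = [{"when": ("total", ">=", 50), "then": "discount"}]

facts = {"total": 75}
print(evaluate(rules_v1, facts, default="no_discount"))  # no_discount
print(evaluate(rules_v2, facts, default="no_discount"))  # discount
```

Swapping `rules_v1` for `rules_v2` changes the decision without touching `evaluate`. The rules are still human-authored, which is the line the paper would need to draw explicitly to separate this older pattern from LLM-generated logic.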

This is for software engineering readers tracking how AI might change the field's object of study. It raises questions worth discussing even if the central distinction needs tighter grounding against prior dynamic systems.

Send it to peer review. The ideas are coherent enough on their own terms to benefit from referee feedback on the historical comparison and evidence.

Referee Report

3 major / 2 minor

Summary. The paper claims that AI agents—where LLMs act as the primary reasoning engine dynamically generating and discarding code—mark a fundamental restructuring of software, not an incremental improvement. It formalizes the distinction between traditional deterministic software (code as carrier of pre-written decision logic) and agentic software (agent as software with runtime-generated logic), traces the arc from licensed software to SaaS to Agent-as-a-Service (AaaS), introduces Agentic Engineering as a new paradigm with distinct objects of study and human roles, analyzes benchmarks such as SWE-bench Verified and EvoClaw to show potential and limitations, and outlines a four-stage roadmap to self-evolving agent ecosystems.

Significance. If the definitional distinction and historical arc can be substantiated against prior adaptive systems, the work would reposition software engineering around runtime agent behavior rather than static code, with practitioner implications for shifting from code authorship to intent architecture. The manuscript offers no machine-checked proofs, reproducible code, or parameter-free derivations; its value would rest on whether the paradigm claim survives engagement with existing literature on dynamic systems.

major comments (3)

[Abstract and formalization section] The central claim that the shift 'in the former, code is the carrier of pre-written decision logic; in the latter, the agent itself is the software, and its decision logic is generated at runtime' constitutes a structural break is load-bearing for the restructuring assertion and the definitions of Agentic Engineering and AaaS. The manuscript asserts this distinction without deriving it from independent external criteria or addressing prior runtime-adaptive systems (rule engines such as Drools, meta-programming, expert systems, or genetic programming) that already separate static carrier code from dynamically generated behavior.
[Historical arc and Agentic Engineering sections] The progression licensed → SaaS → AaaS is presented as successively transferring decision-making complexity itself. This premise is undermined by the absence of engagement with historical precedents in automation that already perform similar transfers of complexity away from static pre-written logic, reducing the claimed ontological change to an increase in scale rather than a paradigm restructuring.
[Benchmark analysis section] References to SWE-bench Verified, EvoClaw, and LangChain multi-agent studies are invoked to demonstrate transformative potential. No specific quantitative results, controls, or comparisons are supplied showing how these benchmarks establish a fundamental structural break beyond incremental automation improvements.

minor comments (2)

[Terminology] The invented terms 'Agentic Engineering' and 'Agent-as-a-Service (AaaS)' are introduced without explicit differentiation from related concepts already present in the autonomous agents and multi-agent systems literature.
[Roadmap] The four-stage roadmap lacks concrete, falsifiable metrics or evaluation criteria for progression between stages.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight areas where the manuscript can more explicitly substantiate its central claims. We address each major comment below and commit to revisions that engage the cited literatures and provide additional detail on benchmarks while preserving the paper's core argument.

Referee: [Abstract and formalization section] The central claim that the shift 'in the former, code is the carrier of pre-written decision logic; in the latter, the agent itself is the software, and its decision logic is generated at runtime' constitutes a structural break is load-bearing for the restructuring assertion and the definitions of Agentic Engineering and AaaS. The manuscript asserts this distinction without deriving it from independent external criteria or addressing prior runtime-adaptive systems (rule engines such as Drools, meta-programming, expert systems, or genetic programming) that already separate static carrier code from dynamically generated behavior.

Authors: We agree that the formalization would be strengthened by explicit derivation from external criteria and direct comparison to prior systems. The manuscript's distinction centers on LLMs functioning as the primary reasoning engine that generates decision logic at runtime without reliance on pre-authored rules or knowledge bases; rule engines and expert systems still encode human-authored logic (even if conditionally applied), meta-programming operates within static language constraints, and genetic programming evolves code offline rather than through continuous LLM-driven runtime generation and discarding. We will revise the formalization section to include a dedicated comparison subsection that derives the structural-break criteria from the nature of the reasoning engine and addresses these precedents.

revision: yes

Referee: [Historical arc and Agentic Engineering sections] The progression licensed → SaaS → AaaS is presented as successively transferring decision-making complexity itself. This premise is undermined by the absence of engagement with historical precedents in automation that already perform similar transfers of complexity away from static pre-written logic, reducing the claimed ontological change to an increase in scale rather than a paradigm restructuring.

Authors: The historical arc focuses on the incremental transfer of decision-making complexity specifically enabled by LLM agents, which move beyond operational delegation to runtime generation of logic itself. We acknowledge that engagement with broader automation history (e.g., expert systems, workflow automation) would better situate this claim. We will expand the historical arc and Agentic Engineering sections to discuss relevant precedents, clarifying why the LLM-driven runtime generation represents a qualitative shift in the object of study and human role rather than merely a scale increase.

revision: yes

Referee: [Benchmark analysis section] References to SWE-bench Verified, EvoClaw, and LangChain multi-agent studies are invoked to demonstrate transformative potential. No specific quantitative results, controls, or comparisons are supplied showing how these benchmarks establish a fundamental structural break beyond incremental automation improvements.

Authors: The benchmark analysis draws on published results from these sources to illustrate both capabilities and current limitations of agentic systems. To make the quantitative grounding explicit, we will revise the section to include specific metrics (e.g., success rates on SWE-bench Verified tasks, coordination metrics from LangChain studies) and note the controls and baselines reported in the original works, while acknowledging that these benchmarks do not yet provide definitive proof of a structural break.

revision: yes

Circularity Check

1 step flagged

Core restructuring claim reduces to self-defined distinction between static code and runtime-generated logic

specific steps

self-definitional [Abstract]
"We formalize the distinction between traditional deterministic software and agentic software: in the former, code is the carrier of pre-written decision logic; in the latter, the agent itself is the software, and its decision logic is generated at runtime."

The paper claims this distinction constitutes a fundamental restructuring of software (not incremental), but the distinction itself is introduced definitionally to separate pre-written from runtime-generated logic; the restructuring conclusion is therefore equivalent to the definitional premise by construction rather than derived from external benchmarks or prior systems.

full rationale

The paper's central argument for a paradigm shift rests on a distinction introduced by definition in the abstract, with the 'fundamental restructuring' asserted as following directly from that definition rather than from independent external criteria or historical comparison. This matches the self-definitional pattern. The benchmark analysis sections appear to contain separate empirical content and are not reduced by construction. No self-citations, fitted predictions, or other enumerated patterns are present in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The argument depends on conceptual definitions of agentic software and Agentic Engineering with no free parameters, machine-checked results, or external data; new terms function as invented framing devices.

axioms (1)

domain assumption: AI agents use large language models as the primary reasoning engine that dynamically generates and discards code as an instrumental resource.
This premise defines the boundary between traditional and agentic software in the abstract.

invented entities (2)

Agentic Engineering (no independent evidence)
purpose: New discipline expanding software engineering to agent systems
Introduced as distinct in object of study, control model, and human role.

Agent-as-a-Service (AaaS) (no independent evidence)
purpose: Next historical stage after SaaS that transfers decision-making complexity
Framed as the current shift in the software delivery arc.

pith-pipeline@v0.9.1-grok · 5796 in / 1339 out tokens · 46041 ms · 2026-06-28T00:44:55.399901+00:00 · methodology

Reference graph

Works this paper leans on

14 extracted references · 2 linked inside Pith

[1] P. Naur and B. Randell, Eds., Software Engineering: Report on a Conference Sponsored by the NATO Science Committee. Garmisch, Germany: NATO, 1968.

[2] F. P. Brooks, The Mythical Man-Month: Essays on Software Engineering. Reading, MA: Addison-Wesley, 1975. (Anniversary Edition with new chapters, 1995.)

[3] A. Karpathy, “Software 2.0,” Medium, Nov. 2017. [Online]. Available: https://karpathy.medium.com/software-2-0-a64152b37c35 (Accessed: June 4, 2026).

[4] Y. Wang, W. Zhong, Y. Huang, E. Shi, M. Yang, J. Chen, H. Li, Y. Ma, Q. Wang, and Z. Zheng, “Agents in Software Engineering: Survey, Landscape, and Vision,” arXiv preprint arXiv:2409.09030, 2024.

[5] Y. Ma, R. Cao, Y. Cao, Y. Zhang, J. Chen, Y. Liu, Y. Liu, B. Li, F. Huang, and Y. Li, “Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement,” arXiv preprint arXiv:2411.00622, 2024.

[6] G. Deng, Z. Chen, Z. Yu, H. Fan, Y. Liu, Y. Yang, D. Parikh, R. Kannan, L. Cong, M. Wang, Q. Zhang, V. Prasanna, X. Tang, and X. Wang, “EvoClaw: Evaluating AI Agents on Continuous Software Evolution,” arXiv preprint arXiv:2603.13428, 2026.

[7] R. Kumar and P. Ramagopal, “Agentic Engineering: How Swarms of AI Agents Are Redefining Software Engineering,” LangChain Blog, Apr. 2026. [Online]. Available: https://www.langchain.com/blog/agentic-engineering-redefining-software-engineering (Accessed: June 4, 2026).

[8] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou, “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” in Advances in Neural Information Processing Systems (NeurIPS), 2022.

[9] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing Reasoning and Acting in Language Models,” in International Conference on Learning Representations (ICLR), 2023.

[10] X. Wang, Y. Wang, Y. Wan, F. Mi, Y. Li, P. Zhou, L. Shang, X. Jiang, and Q. Liu, “SWE-bench: Can Language Models Resolve Real-World GitHub Issues?” in International Conference on Learning Representations (ICLR), 2024.

[11] S. Hong, X. Zheng, J. Chen, Y. Cheng, J. Wang, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou et al., “MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework,” in International Conference on Learning Representations (ICLR), 2024.

[12] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language Models are Few-Shot Learners,” in Advances in Neural Information Processing Systems (NeurIPS), 2020.

[13] T. Guo, X. Chen, Y. Wang, R. Chang, S. Pei, N. V. Chawla, O. Wiest, and X. Zhang, “Large Language Model based Multi-Agents: A Survey of Progress and Challenges,” in International Joint Conference on Artificial Intelligence (IJCAI), 2024. [Online]. Available: https://arxiv.org/abs/2402.01680

[14] Nous Research, “Hermes Agent: The Self-Improving AI Agent,” 2025–2026. [Online]. Available: https://github.com/NousResearch/hermes-agent; Documentation: https://hermes-agent.nousresearch.com/docs (Accessed: June 4, 2026).
