
arxiv: 2606.28791 · v1 · pith:3OQ3KWKInew · submitted 2026-06-27 · 💻 cs.SE

From Determinism to Delegation: AI-Native Software Engineering and the Evolution of the Agentic Engineer

Pith reviewed 2026-06-30 09:01 UTC · model grok-4.3

classification 💻 cs.SE
keywords AI-native software engineering · agentic engineer · paradigm shift · autonomous agents · human-AI collaboration · agent workflows · software engineering roles · outcome ownership

The pith

AI-Native Software Engineering creates the Agentic Engineer whose main output is supervised autonomous systems instead of programs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that large language models enabling multi-step autonomous behavior are driving a shift in software engineering from deterministic code writing to the supervision of probabilistic agent workflows. This change redefines professional roles by moving the primary artifact from traditional programs to agentic systems that incorporate reasoning loops, tool use, and memory. A sympathetic reader would care because the three transitions in work units, correctness standards, and accountability suggest that classical engineering skills must now include governance of uncertain behaviors rather than pure code authorship. The review draws on post-2022 studies to show mixed productivity results and concludes that success depends on symbiosis with established software engineering principles.

Core claim

AI-Native Software Engineering is a paradigm shift rather than a mere tooling advance, creating a new professional archetype: the Agentic Engineer, whose primary artifact is the agentic system rather than the program. The transition occurs through three changes: the unit of work shifts from functions to supervised agent workflows, correctness shifts from binary assertions to statistical evaluation under uncertainty, and accountability shifts from code authorship to outcome ownership. Core mechanisms of autonomous agents include reasoning-acting loops, context engineering, tool use, memory, behavioral drift, and compositional error, all placed within socio-technical human-AI collaboration frameworks.

What carries the argument

The Agentic Engineer archetype, whose work centers on designing and overseeing agentic systems built from reasoning-acting loops, context engineering, tool use, and memory management.
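
To make the mechanism concrete, the following is a minimal sketch of a reasoning-acting loop with memory, tool use, and a human approval gate on irreversible actions. It is not taken from the paper: `scripted_policy` is a hypothetical stand-in for an LLM call, and the tool names, the `Action` fields, and the `approve` callback are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    tool: str                   # name of the tool to call (hypothetical)
    argument: str               # single string argument, for simplicity
    irreversible: bool = False  # True means a human must approve first


@dataclass
class AgentState:
    goal: str
    memory: List[str] = field(default_factory=list)  # log of past observations


def scripted_policy(state: AgentState) -> Optional[Action]:
    """Hypothetical stand-in for the model: picks the next action from a fixed plan."""
    plan = [
        Action("search", state.goal),
        Action("delete_branch", "stale-feature", irreversible=True),
    ]
    step = len(state.memory)
    return plan[step] if step < len(plan) else None


# Illustrative tools; real tools would have side effects.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
    "delete_branch": lambda name: f"deleted {name}",
}


def run_agent(
    goal: str,
    policy: Callable[[AgentState], Optional[Action]],
    tools: Dict[str, Callable[[str], str]],
    approve: Callable[[Action], bool],
    max_steps: int = 10,
) -> List[str]:
    """Reason, gate, act, and record until the policy stops or max_steps is hit."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        action = policy(state)  # reasoning step: choose the next action
        if action is None:
            break
        if action.irreversible and not approve(action):
            # Human gate: the action is recorded as blocked, not executed.
            state.memory.append(f"{action.tool}: blocked by human reviewer")
            continue
        observation = tools[action.tool](action.argument)     # acting step
        state.memory.append(f"{action.tool}: {observation}")  # memory update
    return state.memory


if __name__ == "__main__":
    for line in run_agent("clean up repo", scripted_policy, TOOLS, approve=lambda a: False):
        print(line)
```

In this sketch the supervising engineer's leverage sits in `approve` and `max_steps` rather than in the tool bodies, which is one way to read the shift from authoring functions to overseeing workflows.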

If this is right

  • Engineering value moves from writing deterministic functions to supervising probabilistic workflows that require statistical evaluation.
  • Accountability centers on outcome ownership instead of individual code authorship.
  • Human oversight and governance become the critical competency rather than raw automation.
  • Agentic engineering depends on and extends classical software engineering principles in a symbiotic relationship.
  • Risks such as indirect prompt injection and behavioral drift must be managed through established governance frameworks.
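
As an illustration of what "statistical evaluation" can look like in practice, here is a minimal sketch that replaces a single pass/fail assertion with a confidence interval on an agent's success rate over repeated trials. The Wilson score interval is one common choice, not something the paper prescribes; the function names and the acceptance rule are assumptions made for this example.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p_hat + z ** 2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2)) / denom
    )
    return center - half_width, center + half_width


def meets_threshold(successes: int, trials: int, min_rate: float, z: float = 1.96) -> bool:
    """Accept only if the lower confidence bound clears the required rate."""
    lower, _ = wilson_interval(successes, trials, z)
    return lower >= min_rate


if __name__ == "__main__":
    low, high = wilson_interval(92, 100)
    print(f"observed 92/100, interval [{low:.3f}, {high:.3f}]")
    print("clears 0.80:", meets_threshold(92, 100, 0.80))
    print("clears 0.90:", meets_threshold(92, 100, 0.90))
```

Gating on the lower bound rather than the observed rate means that a run of 92 successes in 100 trials does not by itself justify a claim of 90% reliability, which is the kind of judgment a binary assertion cannot express.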

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Training programs may need to add modules on statistical verification and agent workflow design alongside traditional coding.
  • Organizations could develop new metrics that track supervision effectiveness and outcome reliability instead of lines of code or commit counts.
  • The mixed productivity evidence points to a need for longitudinal studies separating novice gains from expert slowdowns in real projects.
  • If agentic systems proliferate, liability frameworks may shift from code defects to responsibility for delegated decision outcomes.

Load-bearing premise

The three transitions in unit of work, correctness criteria, and accountability represent a fundamental paradigm shift rather than an incremental change in tools and practices.

What would settle it

Empirical observation that experienced developers continue to spend the majority of their time writing and verifying deterministic code, with agent supervision remaining a minor or optional activity even after widespread LLM adoption.

Figures

Figures reproduced from arXiv: 2606.28791 by Mamdouh Alenezi.

Figure 1: The canonical agentic loop. A human-in-the-loop (HITL) gate mediates consequential or irreversible actions
Figure 2: Compositional reliability under the independence assumption of Eq. (5). High per-step success rates decay
Figure 3: The supervised-agency spectrum. Most enterprise deployments in regulated domains operate left of
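
The compositional decay described in the Figure 2 caption can be sketched numerically. Eq. (5) is not reproduced on this page, so the code below assumes the usual product form under independence: if each of n steps succeeds with probability p, the whole workflow succeeds with probability p^n. The function name and the sample values are illustrative only.

```python
def workflow_success(per_step: float, steps: int) -> float:
    """End-to-end success probability for `steps` independent steps.

    Assumes every step succeeds with the same probability `per_step`
    and that failures are independent (the assumption named in the caption).
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    # Tabulate how quickly high per-step reliability erodes with workflow length.
    for p in (0.90, 0.95, 0.99):
        row = ", ".join(f"n={n}: {workflow_success(p, n):.3f}" for n in (1, 10, 20, 50))
        print(f"p={p:.2f} -> {row}")
```

Under these assumptions a 99% per-step rate falls to roughly 60% over 50 steps, and a 95% rate to roughly 36% over 20 steps. Real agent steps are rarely independent, so this is better read as an intuition aid than as a prediction.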
Original abstract

Software engineering is experiencing its most significant transformation since the emergence of high-level programming languages. As large language models (LLMs) increasingly enable sustained, multi-step, tool-mediated execution, engineering value is shifting from writing deterministic code to supervising probabilistic and autonomous behavior. This paper argues that AI-Native Software Engineering is a paradigm shift rather than a mere tooling advance, creating a new professional archetype: the Agentic Engineer, whose primary artifact is the agentic system rather than the program. We characterize this transition through three changes: (i) the unit of work shifts from functions to supervised agent workflows, (ii) correctness shifts from binary assertions to statistical evaluation under uncertainty, and (iii) accountability shifts from code authorship to outcome ownership. Drawing on post-2022 research, we compare traditional and agentic engineering roles and define core mechanisms of autonomous agents, including reasoning-acting loops, context engineering, tool use, memory, behavioral drift, and compositional error. We place human-AI collaboration within socio-technical frameworks and examine mixed empirical evidence. While some studies report productivity gains, others show slowdowns among experienced developers, highlighting disciplined oversight rather than automation as the critical competency. Using established governance frameworks, we identify required skills and risks, including indirect prompt injection. We conclude that the future is one of symbiosis rather than substitution: agentic engineering builds upon and depends on classical software engineering principles.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that software engineering is undergoing a paradigm shift driven by LLMs enabling sustained multi-step tool-mediated execution, moving from deterministic code writing to supervising probabilistic autonomous agents. It introduces the 'Agentic Engineer' as a new professional archetype whose primary artifact is the agentic system. The argument is organized around three transitions: (i) unit of work from functions to supervised agent workflows, (ii) correctness from binary assertions to statistical evaluation under uncertainty, and (iii) accountability from code authorship to outcome ownership. The manuscript synthesizes post-2022 research, defines agent mechanisms (reasoning-acting loops, context engineering, tool use, memory, behavioral drift, compositional error), situates collaboration in socio-technical frameworks, reviews mixed productivity evidence, identifies governance skills and risks such as indirect prompt injection, and concludes that the future involves symbiosis with classical software engineering principles rather than substitution.

Significance. If the interpretive framing is adopted by the community, the paper provides a balanced conceptual synthesis that could help organize discussion on evolving SE roles, education, and governance amid AI integration. It explicitly notes mixed empirical evidence on productivity gains versus slowdowns and emphasizes continuity with traditional principles, avoiding overstatement. The enumeration of specific agent mechanisms and risks offers concrete anchors for subsequent technical work. As a position paper without new derivations or datasets, its contribution is primarily in framing and synthesis rather than falsifiable claims.

minor comments (3)
  1. The three transitions are presented as characterizing the shift, but the manuscript does not provide explicit criteria or examples distinguishing them from incremental tooling changes; adding one or two concrete case contrasts would clarify the paradigm-shift claim without altering scope.
  2. References to 'post-2022 research' and 'established governance frameworks' are invoked repeatedly; expanding the citation list with specific examples in a dedicated related-work subsection would improve traceability for readers.
  3. The term 'Agentic Engineer' is introduced in the abstract and used throughout; a short dedicated paragraph early in the introduction defining its core responsibilities relative to existing roles (e.g., SRE, MLOps) would aid precision.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their balanced and constructive summary of the manuscript, including recognition of its framing as a position paper, the enumeration of agent mechanisms, and the emphasis on mixed empirical evidence and continuity with classical principles. We appreciate the recommendation for minor revision.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is a conceptual position paper synthesizing post-2022 external research into an interpretive argument about three transitions (unit of work, correctness, accountability) framing a paradigm shift to the Agentic Engineer. No equations, derivations, parameter fittings, or quantitative predictions exist; claims rely on cited external literature rather than self-referential definitions or self-citation chains. The central framing is presented as a choice of perspective, not a technical result that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper introduces conceptual distinctions and a new professional archetype without quantitative parameters or formal proofs; it rests on domain assumptions about what constitutes a paradigm shift in engineering practice.

axioms (2)
  • domain assumption: The emergence of sustained, multi-step, tool-mediated LLM execution constitutes a fundamental change in the nature of software engineering work.
    Invoked in the opening characterization of the transformation and the three shifts.
  • domain assumption: Correctness in agentic systems is appropriately evaluated through statistical measures under uncertainty rather than binary assertions.
    Stated as one of the three core changes defining the new paradigm.
invented entities (1)
  • Agentic Engineer (no independent evidence)
    purpose: New professional archetype whose primary artifact is the agentic system.
    Introduced as the central new role created by the paradigm shift; no independent empirical evidence provided for its distinct existence.

pith-pipeline@v0.9.1-grok · 5781 in / 1378 out tokens · 28212 ms · 2026-06-30T09:01:39.738719+00:00 · methodology

