From Determinism to Delegation: AI-Native Software Engineering and the Evolution of the Agentic Engineer

Mamdouh Alenezi

arxiv: 2606.28791 · v1 · pith:3OQ3KWKInew · submitted 2026-06-27 · 💻 cs.SE

Pith reviewed 2026-06-30 09:01 UTC · model grok-4.3

keywords: AI-native software engineering, agentic engineer, paradigm shift, autonomous agents, human-AI collaboration, agent workflows, software engineering roles, outcome ownership

The pith

AI-Native Software Engineering creates the Agentic Engineer whose main output is supervised autonomous systems instead of programs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that large language models enabling multi-step autonomous behavior are driving a shift in software engineering from deterministic code writing to the supervision of probabilistic agent workflows. This change redefines professional roles by moving the primary artifact from traditional programs to agentic systems that incorporate reasoning loops, tool use, and memory. A sympathetic reader would care because the three transitions in work units, correctness standards, and accountability suggest that classical engineering skills must now include governance of uncertain behaviors rather than pure code authorship. The review draws on post-2022 studies to show mixed productivity results and concludes that success depends on symbiosis with established software engineering principles.

Core claim

AI-Native Software Engineering is a paradigm shift rather than a mere tooling advance, creating a new professional archetype: the Agentic Engineer, whose primary artifact is the agentic system rather than the program. The transition occurs through three changes: the unit of work shifts from functions to supervised agent workflows, correctness shifts from binary assertions to statistical evaluation under uncertainty, and accountability shifts from code authorship to outcome ownership. Core mechanisms of autonomous agents include reasoning-acting loops, context engineering, tool use, memory, behavioral drift, and compositional error, all placed within socio-technical human-AI collaboration frameworks.

What carries the argument

The Agentic Engineer archetype, whose work centers on designing and overseeing agentic systems built from reasoning-acting loops, context engineering, tool use, and memory management.
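
As a rough illustration of what overseeing a reasoning-acting loop can look like, the sketch below runs a toy agent whose irreversible actions must pass a human-in-the-loop gate, in the spirit of the loop captioned in Figure 1. It is not the paper's implementation: `Action`, `run_agent`, `scripted_policy`, the tool names, and the `approve` callback are all hypothetical, and a fixed script stands in for the LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    """One step the agent wants to take (hypothetical structure)."""
    tool: str
    argument: str
    irreversible: bool = False


def run_agent(
    policy: Callable[[List[str]], Optional[Action]],
    tools: Dict[str, Callable[[str], str]],
    approve: Callable[[Action], bool],
    max_steps: int = 5,
) -> List[str]:
    """Reason-act loop: pick an action from memory, gate it, run it, remember the result."""
    memory: List[str] = []
    for _ in range(max_steps):
        action = policy(memory)  # "reasoning" step: choose the next action
        if action is None:       # the policy signals that the task is finished
            break
        # Human-in-the-loop gate: irreversible actions need explicit approval.
        if action.irreversible and not approve(action):
            memory.append(f"BLOCKED {action.tool}({action.argument})")
            break
        observation = tools[action.tool](action.argument)  # "acting" step
        memory.append(f"{action.tool}({action.argument}) -> {observation}")
    return memory


def scripted_policy(memory: List[str]) -> Optional[Action]:
    """Stand-in for an LLM: follows a fixed two-step plan."""
    plan = [Action("read", "config.yaml"), Action("deploy", "prod", irreversible=True)]
    return plan[len(memory)] if len(memory) < len(plan) else None


tools = {
    "read": lambda arg: f"contents of {arg}",
    "deploy": lambda arg: f"deployed to {arg}",
}

denied = run_agent(scripted_policy, tools, approve=lambda action: False)
approved = run_agent(scripted_policy, tools, approve=lambda action: True)
print(denied)
print(approved)
```

Under these assumptions the reviewer sees the same read step in both runs, and the deploy step only executes when the `approve` callback returns true, which is where supervision enters the loop.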

If this is right

Engineering value moves from writing deterministic functions to supervising probabilistic workflows that require statistical evaluation.
Accountability centers on outcome ownership instead of individual code authorship.
Human oversight and governance become the critical competency rather than raw automation.
Agentic engineering depends on and extends classical software engineering principles in a symbiotic relationship.
Risks such as indirect prompt injection and behavioral drift must be managed through established governance frameworks.
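
The move from binary assertions to statistical evaluation can be sketched as follows. Instead of asserting that a single run passes, the check repeats a non-deterministic workflow, estimates its pass rate, and accepts it only if the lower bound of a confidence interval clears a threshold. The Wilson score interval is one common choice and is used here as an assumption, not as the paper's method; `evaluate`, `flaky_workflow`, and the 0.85 threshold are hypothetical.

```python
import math
import random
from typing import Callable, Dict, Sequence, Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def evaluate(
    workflow: Callable[[str], bool],
    tasks: Sequence[str],
    runs_per_task: int,
    threshold: float,
) -> Dict[str, object]:
    """Run each task several times and accept only if the lower bound meets the threshold."""
    outcomes = [workflow(task) for task in tasks for _ in range(runs_per_task)]
    successes = sum(outcomes)
    low, high = wilson_interval(successes, len(outcomes))
    return {
        "pass_rate": successes / len(outcomes),
        "ci": (low, high),
        "accept": low >= threshold,
    }


# Hypothetical stand-in for an agent workflow that succeeds about 90% of the time.
rng = random.Random(0)


def flaky_workflow(task: str) -> bool:
    return rng.random() < 0.9


report = evaluate(flaky_workflow, tasks=["t1", "t2", "t3", "t4"], runs_per_task=50, threshold=0.85)
print(report)
```

Gating on the lower bound rather than the observed pass rate is a deliberately conservative design choice: a small sample with a lucky streak does not pass.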

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Training programs may need to add modules on statistical verification and agent workflow design alongside traditional coding.
Organizations could develop new metrics that track supervision effectiveness and outcome reliability instead of lines of code or commit counts.
The mixed productivity evidence points to a need for longitudinal studies separating novice gains from expert slowdowns in real projects.
If agentic systems proliferate, liability frameworks may shift from code defects to responsibility for delegated decision outcomes.

Load-bearing premise

The three transitions in unit of work, correctness criteria, and accountability represent a fundamental paradigm shift rather than an incremental change in tools and practices.

What would settle it

Empirical observation that experienced developers continue to spend the majority of their time writing and verifying deterministic code, with agent supervision remaining a minor or optional activity even after widespread LLM adoption.

Figures

Figures reproduced from arXiv: 2606.28791 by Mamdouh Alenezi.

**Figure 1.** The canonical agentic loop. A human-in-the-loop (HITL) gate mediates consequential or irreversible actions.

**Figure 2.** Compositional reliability under the independence assumption of Eq. (5). High per-step success rates decay …

**Figure 3.** The supervised-agency spectrum. Most enterprise deployments in regulated domains operate left of …
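
The compositional-reliability caption for Figure 2 can be made concrete with a small calculation. Eq. (5) is not reproduced on this page, so the sketch assumes the usual independence model, in which the probability that a whole workflow succeeds is the product of its per-step success probabilities. The function names are hypothetical.

```python
import math
from typing import Sequence


def workflow_success_probability(step_success_rates: Sequence[float]) -> float:
    """End-to-end success probability, assuming steps fail independently."""
    return math.prod(step_success_rates)


def max_steps_for_target(per_step: float, target: float) -> int:
    """Largest n such that per_step**n stays at or above the target."""
    if not (0 < per_step < 1 and 0 < target <= 1):
        raise ValueError("expected 0 < per_step < 1 and 0 < target <= 1")
    return math.floor(math.log(target) / math.log(per_step))


# Even reliable steps compound: twenty steps at 95% each succeed end to end
# only about 36% of the time under this model.
for per_step in (0.99, 0.95, 0.90):
    for n in (5, 10, 20):
        p = workflow_success_probability([per_step] * n)
        print(f"per-step={per_step:.2f} steps={n:2d} end-to-end={p:.3f}")

print(max_steps_for_target(0.99, 0.90))  # how long a 99%-per-step chain can get
```

If step failures are correlated, or if later steps can detect and repair earlier errors, the product is only an approximation, which is one reason a human checkpoint between steps can matter.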

Original abstract

Software engineering is experiencing its most significant transformation since the emergence of high-level programming languages. As large language models (LLMs) increasingly enable sustained, multi-step, tool-mediated execution, engineering value is shifting from writing deterministic code to supervising probabilistic and autonomous behavior. This paper argues that AI-Native Software Engineering is a paradigm shift rather than a mere tooling advance, creating a new professional archetype: the Agentic Engineer, whose primary artifact is the agentic system rather than the program. We characterize this transition through three changes: (i) the unit of work shifts from functions to supervised agent workflows, (ii) correctness shifts from binary assertions to statistical evaluation under uncertainty, and (iii) accountability shifts from code authorship to outcome ownership. Drawing on post-2022 research, we compare traditional and agentic engineering roles and define core mechanisms of autonomous agents, including reasoning-acting loops, context engineering, tool use, memory, behavioral drift, and compositional error. We place human-AI collaboration within socio-technical frameworks and examine mixed empirical evidence. While some studies report productivity gains, others show slowdowns among experienced developers, highlighting disciplined oversight rather than automation as the critical competency. Using established governance frameworks, we identify required skills and risks, including indirect prompt injection. We conclude that the future is one of symbiosis rather than substitution: agentic engineering builds upon and depends on classical software engineering principles.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a conceptual synthesis that labels post-2022 trends as a paradigm shift to the Agentic Engineer but adds no new data, derivations, or mechanisms.

The paper's core move is to argue that AI-native software engineering marks a real break, not just better tools, and that the new role is the Agentic Engineer who supervises probabilistic workflows instead of writing deterministic code. It lays out three transitions—unit of work from functions to agent workflows, correctness from binary to statistical, and accountability from authorship to outcome ownership—and sketches mechanisms like reasoning-acting loops, tool use, memory, and behavioral drift.

It does a fair job pulling together the mixed productivity findings from the literature and stressing that disciplined oversight matters more than raw automation. The discussion of risks such as indirect prompt injection and the placement inside socio-technical governance frameworks is practical and grounded in existing work. The closing emphasis on symbiosis rather than substitution follows directly from the cited evidence.

The limitation is that everything stays at the level of interpretive framing. No new measurements, controlled comparisons, or formal definitions appear in the paper itself; the three transitions are presented as fundamental without showing why they cannot be treated as incremental practice changes. The term Agentic Engineer functions as a label rather than a sharply defined construct with testable properties.

This piece would interest people working on SE curricula, team structures, or risk management in AI-assisted development. Readers wanting concrete mechanisms or fresh empirical results will not find them. I would not send it for peer review in a research journal because it lacks the evidential or formal content that referees need to assess; it could fit a perspectives or industry track instead.

Referee Report

0 major / 3 minor

Summary. The paper claims that software engineering is undergoing a paradigm shift driven by LLMs enabling sustained multi-step tool-mediated execution, moving from deterministic code writing to supervising probabilistic autonomous agents. It introduces the 'Agentic Engineer' as a new professional archetype whose primary artifact is the agentic system. The argument is organized around three transitions: (i) unit of work from functions to supervised agent workflows, (ii) correctness from binary assertions to statistical evaluation under uncertainty, and (iii) accountability from code authorship to outcome ownership. The manuscript synthesizes post-2022 research, defines agent mechanisms (reasoning-acting loops, context engineering, tool use, memory, behavioral drift, compositional error), situates collaboration in socio-technical frameworks, reviews mixed productivity evidence, identifies governance skills and risks such as indirect prompt injection, and concludes that the future involves symbiosis with classical software engineering principles rather than substitution.

Significance. If the interpretive framing is adopted by the community, the paper provides a balanced conceptual synthesis that could help organize discussion on evolving SE roles, education, and governance amid AI integration. It explicitly notes mixed empirical evidence on productivity gains versus slowdowns and emphasizes continuity with traditional principles, avoiding overstatement. The enumeration of specific agent mechanisms and risks offers concrete anchors for subsequent technical work. As a position paper without new derivations or datasets, its contribution is primarily in framing and synthesis rather than falsifiable claims.

minor comments (3)

The three transitions are presented as characterizing the shift, but the manuscript does not provide explicit criteria or examples distinguishing them from incremental tooling changes; adding one or two concrete case contrasts would clarify the paradigm-shift claim without altering scope.
References to 'post-2022 research' and 'established governance frameworks' are invoked repeatedly; expanding the citation list with specific examples in a dedicated related-work subsection would improve traceability for readers.
The term 'Agentic Engineer' is introduced in the abstract and used throughout; a short dedicated paragraph early in the introduction defining its core responsibilities relative to existing roles (e.g., SRE, MLOps) would aid precision.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their balanced and constructive summary of the manuscript, including recognition of its framing as a position paper, the enumeration of agent mechanisms, and the emphasis on mixed empirical evidence and continuity with classical principles. We appreciate the recommendation for minor revision.

Circularity Check

0 steps flagged

No significant circularity detected

The paper is a conceptual position paper synthesizing post-2022 external research into an interpretive argument about three transitions (unit of work, correctness, accountability) framing a paradigm shift to the Agentic Engineer. No equations, derivations, parameter fittings, or quantitative predictions exist; claims rely on cited external literature rather than self-referential definitions or self-citation chains. The central framing is presented as a choice of perspective, not a technical result that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper introduces conceptual distinctions and a new professional archetype without quantitative parameters or formal proofs; it rests on domain assumptions about what constitutes a paradigm shift in engineering practice.

axioms (2)

Domain assumption: The emergence of sustained, multi-step, tool-mediated LLM execution constitutes a fundamental change in the nature of software engineering work.
Invoked in the opening characterization of the transformation and the three shifts.
Domain assumption: Correctness in agentic systems is appropriately evaluated through statistical measures under uncertainty rather than binary assertions.
Stated as one of the three core changes defining the new paradigm.

invented entities (1)

Agentic Engineer (no independent evidence)
Purpose: New professional archetype whose primary artifact is the agentic system.
Introduced as the central new role created by the paradigm shift; no independent empirical evidence provided for its distinct existence.

pith-pipeline@v0.9.1-grok · 5781 in / 1378 out tokens · 28212 ms · 2026-06-30T09:01:39.738719+00:00 · methodology

