Assistance to Autonomy: A Systematic Literature Review of Agentic AI across the Software Development Life Cycle

Helena Holmström Olsson; Jan Bosch; Spyridon Alvanakis Apostolou

arXiv: 2605.15245 · v1 · pith:4A66SVTM · submitted 2026-05-14 · 💻 cs.SE



Pith reviewed 2026-05-19 16:22 UTC · model grok-4.3

classification 💻 cs.SE

keywords: agentic AI, software development life cycle, systematic literature review, output verifiability, AI agents, planner executor reviewer, industrial adoption, SDLC phases


The pith

Output verifiability enables agentic AI adoption mainly in later software development phases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The review synthesizes evidence that agentic AI systems reach industrial maturity where outputs can be checked objectively through execution or tests. Earlier phases such as requirements and design stay mostly academic because their outputs lack this direct feedback. The work shows that teams cope with limitations by restricting agents to bounded, verifiable actions. A reader seeking practical guidance would learn which parts of the development process are ready for agent deployment now. The authors also introduce a multi-agent screening method to handle large numbers of papers efficiently.

Core claim

Output verifiability is the primary enabler of agentic adoption: later SDLC phases, whose outputs are objectively evaluable through executable feedback, demonstrate the highest maturity and industrial presence, while earlier phases remain almost exclusively academic proofs-of-concept. The Planner-Executor-Reviewer role specialization is the dominant architectural pattern, with the Reviewer agent implementing verifiability through executable feedback loops. Across all challenge categories, industrial mitigation strategies converge on confining agent actions to verifiable, bounded spaces.
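To make "confining agent actions to verifiable, bounded spaces" concrete, here is a minimal sketch of one way such confinement could look: an allowlist of actions whose outcomes can be checked objectively, plus a fixed step budget. The class `ActionGuard`, the action names, and the budget are hypothetical illustrations, not mechanisms reported by the reviewed studies.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical allowlist: actions whose outcome can be checked objectively
# (tests pass or fail, a patch applies or does not, the linter reports findings).
VERIFIABLE_ACTIONS = frozenset({"run_tests", "apply_patch", "run_linter"})


@dataclass
class ActionGuard:
    """Confines an agent to an allowlisted action space with a fixed step budget."""

    allowed: Set[str] = field(default_factory=lambda: set(VERIFIABLE_ACTIONS))
    max_steps: int = 10
    steps_taken: int = 0

    def authorize(self, action: str) -> bool:
        # Refuse once the budget is spent, regardless of the action.
        if self.steps_taken >= self.max_steps:
            return False
        # Refuse anything outside the bounded space; rejected requests
        # do not consume the budget.
        if action not in self.allowed:
            return False
        self.steps_taken += 1
        return True


guard = ActionGuard(max_steps=2)
print(guard.authorize("run_tests"))       # True
print(guard.authorize("deploy_to_prod"))  # False: not allowlisted
print(guard.authorize("apply_patch"))     # True
print(guard.authorize("run_linter"))      # False: budget exhausted
```

The design choice here is that the guard fails closed: anything not explicitly allowlisted is refused, which mirrors the review's observation that mitigations restrict agents rather than trying to anticipate every unsafe action.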

What carries the argument

The Planner-Executor-Reviewer role specialization, which divides responsibilities among agents so that a reviewer can apply executable feedback to confirm task completion.
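A minimal sketch of the Planner-Executor-Reviewer loop, assuming the Reviewer's executable feedback is simply running unit tests against the Executor's output. The planner and executor are stubs standing in for LLM calls; `planner`, `make_executor`, `reviewer`, and `run_per_loop` are hypothetical names, not the architecture of any specific reviewed system.

```python
from typing import Callable, List, Optional, Tuple

TestCase = Tuple[tuple, object]  # (positional args, expected result)


def planner(task: str) -> List[str]:
    # Hypothetical stub: a real Planner agent would decompose the task with an LLM.
    return [f"implement {task}"]


def make_executor(candidates: List[str]) -> Callable[[str, Optional[str]], str]:
    # Hypothetical stub: yields successive candidate sources, standing in for an
    # Executor agent that revises its output after receiving reviewer feedback.
    remaining = iter(candidates)

    def executor(step: str, feedback: Optional[str]) -> str:
        return next(remaining)

    return executor


def reviewer(source: str, fn_name: str, tests: List[TestCase]) -> Optional[str]:
    """Executable feedback: run the candidate against tests.

    Returns None if every test passes, otherwise a feedback message.
    A real system would execute untrusted code in a sandbox, not with bare exec().
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        fn = namespace[fn_name]
        for args, expected in tests:
            got = fn(*args)
            if got != expected:
                return f"{fn_name}{args} returned {got!r}, expected {expected!r}"
    except Exception as exc:  # syntax errors, missing function, runtime errors
        return f"execution failed: {exc!r}"
    return None


def run_per_loop(task, fn_name, executor, tests, max_rounds=3) -> Optional[List[int]]:
    """Return the number of Executor rounds each plan step needed, or None on failure."""
    rounds_per_step = []
    for step in planner(task):
        feedback = None
        for round_no in range(1, max_rounds + 1):
            source = executor(step, feedback)
            feedback = reviewer(source, fn_name, tests)
            if feedback is None:
                rounds_per_step.append(round_no)
                break
        else:
            return None  # the Reviewer never accepted this step
    return rounds_per_step


candidates = [
    "def add(a, b):\n    return a - b\n",  # wrong: rejected by the Reviewer
    "def add(a, b):\n    return a + b\n",  # correct: accepted in round 2
]
print(run_per_loop("add", "add", make_executor(candidates), [((1, 2), 3), ((0, 0), 0)]))
```

The point of the sketch is that acceptance is decided by execution, not by another model's opinion; this is what makes the pattern a natural fit for phases whose outputs can be run.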

Load-bearing premise

The 92 manually verified primary studies, selected after multi-agent screening of over 1600 candidates, form a representative sample that accurately captures dominant patterns in both academic and industrial agentic AI use.

What would settle it

Discovery of multiple documented industrial deployments of agentic AI in requirements engineering or high-level design that were missed by the review process.

Figures

Figures reproduced from arXiv: 2605.15245 by Helena Holmström Olsson, Jan Bosch, Spyridon Alvanakis Apostolou.

**Figure 1.** Analytical workflow of the multi-agent collaboration and consensus mechanism. minimal reviewer input, raw publication metadata (title, DOI, abstract, etc.) and a single detailed prompt containing the research purpose, research questions, and selection criteria. The pipeline includes self-curation of missing abstracts, inter-agent classification discussions, and produces binary relevant/irrelevant labels wi…
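The caption mentions inter-agent classification discussions, and the abstract mentions conflict-resolution defaults that minimize false negatives. The sketch below is an assumption about how such a rule could work, not the authors' pipeline: agents vote, re-vote after seeing their peers' votes for a bounded number of rounds, and any disagreement left unresolved defaults to "relevant", so the paper goes to manual verification instead of being discarded. The agent interface and the three toy agents are hypothetical.

```python
from typing import Callable, Dict, List

Paper = Dict[str, str]
# Hypothetical agent interface: sees the paper metadata and its peers' latest
# votes (empty on the first round) and returns True for "relevant".
Agent = Callable[[Paper, List[bool]], bool]


def screen(paper: Paper, agents: List[Agent], max_rounds: int = 3) -> bool:
    """Binary relevance label from iterative voting with an include-on-conflict default."""
    votes = [agent(paper, []) for agent in agents]
    for _ in range(max_rounds - 1):
        if len(set(votes)) == 1:
            break  # consensus reached
        previous = votes
        # Each agent re-votes after seeing the other agents' previous votes.
        votes = [
            agent(paper, previous[:i] + previous[i + 1:])
            for i, agent in enumerate(agents)
        ]
    if len(set(votes)) == 1:
        return votes[0]
    # Conflict-resolution default: unresolved disagreement is labelled relevant,
    # trading extra manual checks for fewer false negatives.
    return True


# Toy agents for illustration only.
def keyword_agent(paper: Paper, peers: List[bool]) -> bool:
    return "agent" in paper["title"].lower()


def skeptic_agent(paper: Paper, peers: List[bool]) -> bool:
    return False


def deferential_agent(paper: Paper, peers: List[bool]) -> bool:
    return any(peers)  # follows any peer that voted relevant


paper = {"title": "Agentic test generation"}
print(screen(paper, [keyword_agent, skeptic_agent]))      # True: unresolved, include
print(screen(paper, [keyword_agent, deferential_agent]))  # True: consensus in round 2
print(screen({"title": "Compiler pass ordering"}, [keyword_agent, skeptic_agent]))  # False
```

Defaulting to inclusion is asymmetric on purpose: a false positive costs one extra manual check, while a false negative silently removes a study from the synthesis.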

Original abstract

Agentic AI in software product development is increasingly adopted by organizations, yet the field lacks a consolidated synthesis of where adoption is mature, which architectural patterns dominate, and what limitations and coping mechanisms exist in industrial deployments. This systematic literature review addresses these gaps by establishing a body of knowledge as a starting point. Following Kitchenham guidelines, we queried four major research databases, obtaining over 1600 candidate publications. To handle this volume, we developed and validated a domain-agnostic multi-agent screening pipeline that extends prior LLM-assisted review tools by combining automatic metadata curation, inter-agent iterative dialogue, and conflict-resolution defaults that minimize false negatives. From the 92 manually verified primary studies, our thematic synthesis reveals that output verifiability is the primary enabler of agentic adoption: later SDLC phases, whose outputs are objectively evaluable through executable feedback, demonstrate the highest maturity and industrial presence, while earlier phases remain almost exclusively academic proofs-of-concept. We identify the Planner-Executor-Reviewer role specialization as the dominant architectural pattern, with the Reviewer agent implementing verifiability through executable feedback loops. Across all challenge categories, industrial mitigation strategies converge on confining agent actions to verifiable, bounded spaces. This study contributes a comprehensive characterization of the current literature on agentic systems in software product development, and a methodological contribution in the form of an AI-assisted tool to automate the screening phase in high-volume SLR domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This review maps agentic AI maturity across SDLC phases and credits verifiability for higher adoption in later phases, but the 92-study sample leaves the main pattern open to questions of retrieval bias.


The main takeaway from this paper is that agentic AI shows higher maturity and industrial use in later stages of the software development life cycle because those phases produce outputs that can be objectively verified through execution and feedback. Earlier phases, such as requirements and design, mostly feature academic prototypes. The authors also describe a multi-agent pipeline for screening the literature.

The work's main strength is its methodological contribution. The authors developed a domain-agnostic pipeline that uses multiple agents for iterative dialogue and conflict resolution to screen more than 1600 papers from four databases, reducing them to 92 manually verified primary studies. This extends previous LLM-assisted review techniques and could be adopted by others facing similar volumes. Their thematic synthesis consolidates observations on architectural patterns, particularly the Planner-Executor-Reviewer specialization, in which the Reviewer enforces verifiability via executable feedback loops. It also notes that industrial strategies focus on keeping agent actions within verifiable bounds.

The weaker point is the central claim that verifiability is the primary enabler. This claim rests on a thematic analysis of the 92 studies, so the concern is whether the selection process captured a representative set, especially of industrial applications in early phases, which might not use standard academic terms or appear in the searched databases. Without reported quantitative validation metrics for the pipeline, inter-rater details, or a sensitivity analysis, the observed phase differences may partly reflect publication and retrieval patterns rather than a genuine effect of verifiability. The authors state that they followed the Kitchenham guidelines and validated the pipeline, but more transparency on both points would help.

This paper is aimed at software engineering researchers and practitioners interested in how AI agents are being integrated into development processes. It provides a useful overview and starting point for understanding current limitations and mitigations, and readers working on similar reviews or on agentic systems would find both the map and the tool description valuable. It merits serious refereeing because it offers a grounded synthesis and a concrete methodological advance, even though the bias discussion needs strengthening. I would recommend sending it to peer review.

Referee Report

2 major / 2 minor

Summary. This systematic literature review follows Kitchenham guidelines to synthesize agentic AI applications across the SDLC. Querying four databases yielded over 1600 candidates; a domain-agnostic multi-agent screening pipeline (with automatic metadata curation, inter-agent dialogue, and conflict resolution) reduced this to 92 manually verified primary studies. Thematic synthesis concludes that output verifiability is the primary enabler of adoption: later SDLC phases exhibit the highest maturity and industrial presence due to executable feedback, while earlier phases remain largely academic proofs-of-concept. The dominant architecture is Planner-Executor-Reviewer specialization, with the Reviewer implementing verifiability via feedback loops. Industrial mitigations converge on confining agents to verifiable, bounded spaces. The work also contributes the AI-assisted screening tool for high-volume SLRs.

Significance. If the synthesis holds, the review consolidates knowledge on adoption patterns, architectural dominance, and practical limitations in agentic AI for software development, offering a useful body of knowledge for researchers and practitioners. Explicit credit is due for the methodological contribution of the validated multi-agent screening pipeline, which extends prior LLM-assisted tools and could improve efficiency in future SLRs. The phase-specific maturity distinction, if robust, provides a falsifiable framing for future empirical work on verifiability as an adoption driver.

major comments (2)

[Methods] Methods section (screening pipeline description): The abstract and methods claim validation of the multi-agent screening pipeline applied to >1600 candidates, yet no quantitative metrics (precision, recall, F1, or inter-rater agreement) or explicit handling of publication bias are reported. This is load-bearing for the central claim, as the representativeness of the final 92 studies directly supports inferences about phase-specific maturity and industrial presence versus academic proofs-of-concept.
[Results] Results/Thematic synthesis section: The conclusion that output verifiability drives higher maturity in later phases rests on thematic analysis of publication counts and descriptions across the 92 studies, without sensitivity analysis on screening thresholds or external validation against industry surveys. If the pipeline systematically under-samples non-academic or early-phase work (due to terminology or venue differences), the observed pattern may reflect retrieval bias rather than a genuine enabler effect.

minor comments (2)

[Abstract] Abstract and introduction: The phrasing 'domain-agnostic multi-agent screening pipeline' is introduced without a forward reference to its detailed specification or limitations in the methods; a brief cross-reference would improve readability.
[Introduction] The paper cites prior LLM-assisted review tools but does not explicitly contrast its conflict-resolution defaults against those baselines; adding one sentence on the incremental extension would clarify novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential impact of our systematic review and the multi-agent screening pipeline. We provide point-by-point responses to the major comments below, indicating where revisions will be made to address the concerns raised.


Referee: [Methods] Methods section (screening pipeline description): The abstract and methods claim validation of the multi-agent screening pipeline applied to >1600 candidates, yet no quantitative metrics (precision, recall, F1, or inter-rater agreement) or explicit handling of publication bias are reported. This is load-bearing for the central claim, as the representativeness of the final 92 studies directly supports inferences about phase-specific maturity and industrial presence versus academic proofs-of-concept.

Authors: We agree with the referee that quantitative metrics are important for validating the screening pipeline and supporting the representativeness of the 92 studies. Although the manuscript emphasizes the pipeline's design features for reducing false negatives and the subsequent manual verification, specific performance metrics were not reported in the initial version. In the revised manuscript, we will add these metrics, including precision, recall, F1-score, and inter-rater agreement (e.g., Cohen's kappa) calculated from a sample of papers screened both by the pipeline and human reviewers. We will also include a discussion of publication bias, detailing our search across multiple databases and efforts to include diverse sources, while acknowledging limitations. revision: yes
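The metrics promised in this response can be computed from paired labels on a sample screened by both the pipeline and human reviewers. A minimal sketch, assuming the human labels are the reference standard and "relevant" is the positive class; the function name and the toy labels are illustrative only, not data from the paper.

```python
from typing import Dict, Sequence


def screening_agreement(pipeline: Sequence[bool], human: Sequence[bool]) -> Dict[str, float]:
    """Precision, recall, F1 and Cohen's kappa of pipeline labels against human labels."""
    if len(pipeline) != len(human) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(pipeline, human))
    tp = sum(1 for p, h in pairs if p and h)
    fp = sum(1 for p, h in pairs if p and not h)
    fn = sum(1 for p, h in pairs if not p and h)
    tn = sum(1 for p, h in pairs if not p and not h)
    n = len(pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Cohen's kappa: observed agreement corrected for agreement expected by chance,
    # where chance agreement uses each rater's own positive/negative rates.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    # If both raters put every item in one class, kappa is undefined; report 1.0.
    kappa = 1.0 if p_expected == 1 else (p_observed - p_expected) / (1 - p_expected)

    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


# Toy labels (invented): True = relevant.
pipeline_labels = [True, True, False, True, False, False, False, False]
human_labels = [True, True, True, False, False, False, False, False]
print(screening_agreement(pipeline_labels, human_labels))
```

With these toy labels there are 2 true positives, 1 false positive, 1 false negative and 4 true negatives, so precision, recall and F1 are all 2/3 and kappa is 7/15 (about 0.47). For a pipeline designed to minimize false negatives, recall is the figure readers would most want to see.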
Referee: [Results] Results/Thematic synthesis section: The conclusion that output verifiability drives higher maturity in later phases rests on thematic analysis of publication counts and descriptions across the 92 studies, without sensitivity analysis on screening thresholds or external validation against industry surveys. If the pipeline systematically under-samples non-academic or early-phase work (due to terminology or venue differences), the observed pattern may reflect retrieval bias rather than a genuine enabler effect.

Authors: We take this concern seriously, as it questions whether the phase-specific maturity pattern is robust or an artifact of screening bias. Our synthesis is grounded in the detailed thematic analysis of the included studies, which consistently show greater industrial adoption and verifiability in later SDLC phases. To strengthen this, we will perform and report a sensitivity analysis varying the screening thresholds in the multi-agent pipeline and examining the impact on the observed distributions. We will also compare our findings with external industry surveys on AI adoption in software development to provide additional validation. We will expand the discussion of potential biases in the revised Limitations section. revision: yes
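One way to run the promised sensitivity analysis is to sweep the inclusion threshold over per-paper relevance scores and check whether the SDLC phase distribution of included studies shifts. The sketch assumes the pipeline can emit a numeric score per candidate, which the summary above does not state; the grouping of "later" phases, the scores and the phase labels are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Candidate = Tuple[float, str]  # (hypothetical relevance score, SDLC phase)

# Assumed grouping of "later" phases, for illustration only.
LATER_PHASES = frozenset({"implementation", "testing", "maintenance"})


def phase_shares(candidates: Iterable[Candidate], threshold: float) -> Dict[str, float]:
    """Share of included studies per SDLC phase at one inclusion threshold."""
    counts = Counter(phase for score, phase in candidates if score >= threshold)
    total = sum(counts.values())
    return {phase: count / total for phase, count in counts.items()} if total else {}


def later_phase_share_by_threshold(
    candidates: List[Candidate], thresholds: Iterable[float]
) -> Dict[float, float]:
    """For each threshold, the fraction of included studies in later SDLC phases."""
    result = {}
    for t in thresholds:
        shares = phase_shares(candidates, t)
        result[t] = sum(v for phase, v in shares.items() if phase in LATER_PHASES)
    return result


# Invented toy candidates, not data from the review.
toy_candidates = [
    (0.9, "testing"),
    (0.8, "implementation"),
    (0.6, "requirements"),
    (0.55, "maintenance"),
    (0.4, "design"),
]
print(later_phase_share_by_threshold(toy_candidates, [0.5, 0.7]))  # {0.5: 0.75, 0.7: 1.0}
```

If the later-phase share stays high across a range of thresholds, the pattern is less likely to be an artifact of the screening cut-off. This check does not address retrieval bias upstream of screening, that is, studies the database queries never returned, which is why the comparison with external industry surveys is a separate step.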

Circularity Check

0 steps flagged

Minor self-citation in methodological extension; central synthesis remains independent


The paper follows standard Kitchenham SLR guidelines to query four databases, applies a multi-agent screening pipeline described as an extension of prior LLM-assisted review tools, manually verifies 92 primary studies, and performs thematic synthesis to identify patterns such as higher maturity in later SDLC phases due to output verifiability. The derivation chain consists of external evidence aggregation rather than any self-referential reduction; the 92 studies are drawn from the indexed literature and the conclusions are inferences from their reported content, not fitted parameters or definitions that presuppose the result. A single reference to extending prior tools constitutes at most a minor non-load-bearing self-citation, consistent with normal scholarly practice and not forcing the phase-maturity claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central synthesis rests on the representativeness of the screened literature and the fidelity of the thematic analysis; the methodological contribution introduces a new screening pipeline whose validation details are not supplied in the abstract.

axioms (1)

Domain assumption: the Kitchenham guidelines provide a valid and complete protocol for conducting systematic literature reviews in software engineering.
Invoked to justify the overall review process and search strategy.

invented entities (1)

Domain-agnostic multi-agent screening pipeline (no independent evidence)
Purpose: automate initial screening of large candidate sets while minimizing false negatives through inter-agent dialogue and conflict resolution.
Presented as an extension of prior LLM-assisted tools; no independent evidence of correctness beyond the authors' validation claim is given in the abstract.



Reference graph

Works this paper leans on

33 extracted references

[1] Acharya, D.B., Kuppan, K., Divya, B.: Agentic AI: Autonomous Intelligence for Complex Goals—A Comprehensive Survey. IEEE Access 13, 18912–18936 (2025), https://ieeexplore.ieee.org/abstract/document/10849561

[2] Adapa, C., Anjana, A., Rahim, R., Victor, A.: A multi-agent AI framework for agile workflow automation, issue resolution, and developer performance evaluation. In: 2025 IEEE International Conference for Women in Innovation, Technology & Entrepreneurship (ICWITE). pp. 1–6. IEEE (2025)

[3] Akbar, M.A., Khan, A.A., Hamza, M., et al.: Agentic AI in Software Engineering: Practitioner Perspectives Across the Software Development Life Cycle (Sep 2025), https://papers.ssrn.com/abstract=5520159

[4] Bandi, A., Kongari, B., Naguru, R., et al.: The Rise of Agentic AI: A Review of Definitions, Frameworks, Architectures, Applications, Evaluation Metrics, and Challenges. Future Internet 17(9) (Sep 2025), https://www.mdpi.com/1999-5903/17/9/404

[5] Becker, J., Rush, N., Barnes, E., Rein, D.: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (Jul 2025), http://arxiv.org/abs/2507.09089

[6] Denyer, D., Tranfield, D., van Aken, J.E.: Developing Design Propositions through Research Synthesis. Organization Studies 29(3), 393–413 (Mar 2008), https://doi.org/10.1177/0170840607088020

[7] Gough, D., Thomas, J., Oliver, S.: An introduction to systematic reviews. SAGE (2017)

[8] Gusenbauer, M., Haddaway, N.R.: Which academic search systems are suitable for systematic reviews or meta-analyses? Research Synthesis Methods 11(2), 181–217 (2020), https://onlinelibrary.wiley.com/doi/abs/10.1002/jrsm.1378

[9] Hariharan, M., Arvapalli, S., Barma, S., Sheela, E.: Agentic RAG for Software Testing with Hybrid Vector-Graph and Multi-Agent Orchestration (Oct 2025), http://arxiv.org/abs/2510.10824

[10] He, J., Treude, C., Lo, D.: LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead (Jul 2025), http://arxiv.org/abs/2404.04834

[11] Hosseini, S., Seilani, H.: The role of agentic AI in shaping a smart future: A systematic review. Array 26, 100399 (Jul 2025), https://www.sciencedirect.com/science/article/pii/S2590005625000268

[12] Hu, Y., Cai, Y., Du, Y., et al.: Self-Evolving Multi-Agent Collaboration Networks for Software Development (Oct 2024), http://arxiv.org/abs/2410.16946

[13] Ji, X., Zhang, L., Zhang, W., et al.: LEMAD: LLM-Empowered Multi-Agent System for Anomaly Detection in Power Grid Services. Electronics 14(15), 3008 (Jan 2025), https://www.mdpi.com/2079-9292/14/15/3008

[14] Jin, H., Sun, Z., Chen, H.: RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance (Oct 2024), http://arxiv.org/abs/2410.01242

[15] Khoee, A.G., Yu, Y., Feldt, R., et al.: GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making (Sep 2024), http://arxiv.org/abs/2408.09785

[16] Kitchenham, B., Charters, S., et al.: Guidelines for performing systematic literature reviews in software engineering. Keele (2007)

[17] Kohl, J., Kruse, O., Mostafa, Y., et al.: Automated structural testing of LLM-based agents: methods, framework, and case studies (Jan 2026), http://arxiv.org/abs/2601.18827

[18] Liu, J., Wang, K., Chen, Y., et al.: Large Language Model-Based Agents for Software Engineering: A Survey (Dec 2025), http://arxiv.org/abs/2409.02977

[19] Mavani, V.K.: Codebase Aware Generative Agents for the SDLC: Automating Documentation, Dependency Analysis and Test Generation. In: 2026 IEEE 5th International Conference on AI in Cybersecurity (ICAIC). pp. 1–4 (Feb 2026), https://ieeexplore.ieee.org/document/11395666

[20] Murali, V., Maddila, C., Ahmad, I., et al.: AI-Assisted Code Authoring at Scale: Fine-Tuning, Deploying, and Mixed Methods Evaluation. Proc. ACM Softw. Eng. 1(FSE), 48:1066–48:1085 (Jul 2024), https://dl.acm.org/doi/10.1145/3643774

[21] Otoum, N., Elkhalili, N.: Methods and Techniques of Agentic Software Engineering: A Systematic Literature Review. IEEE Access 14, 7443–7465 (2026), https://ieeexplore.ieee.org/abstract/document/11343819

[22] Ouzzani, M., Hammady, H., Fedorowicz, Z., Elmagarmid, A.: Rayyan—a web and mobile app for systematic reviews. Systematic Reviews 5(1), 210 (Dec 2016), https://doi.org/10.1186/s13643-016-0384-4

[23] Peng, S., Kalliamvakou, E., Cihon, P., Demirer, M.: The Impact of AI on Developer Productivity: Evidence from GitHub Copilot (Feb 2023), http://arxiv.org/abs/2302.06590

[24] Qin, Y., Wang, S., Lou, Y., et al.: SoapFL: A Standard Operating Procedure for LLM-Based Method-Level Fault Localization. IEEE Transactions on Software Engineering 51(4), 1173–1187 (Apr 2025), https://ieeexplore.ieee.org/document/10891926

[25] Raheem, T., Hossain, G.: Agentic AI Systems: Opportunities, Challenges, and Trustworthiness. In: 2025 IEEE International Conference on Electro Information Technology (eIT). pp. 618–624 (May 2025), https://ieeexplore.ieee.org/abstract/document/11103638

[26] Rouzrokh, P., Khosravi, B., Rouzrokh, P., Shariatnia, M.: LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models (Oct 2025), http://arxiv.org/abs/2501.05468

[27] Sapkota, R., Roumeliotis, K.I., Karkee, M.: AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges. Information Fusion 126, 103599 (Feb 2026), http://arxiv.org/abs/2505.10468

[28] Schneider, J.: Generative to Agentic AI: Survey, Conceptualization, and Challenges (Apr 2025), http://arxiv.org/abs/2504.18875

[29] Tahat, A., Amundson, I., Hardin, D., Cofer, D.: Agree-dog copilot: a neuro-symbolic approach to enhanced model-based systems engineering. In: International Conference on Bridging the Gap between AI and Reality. pp. 117–137. Springer (2025)

[30] Wallace, B.C., Small, K., Brodley, C.E., et al.: Deploying an interactive machine learning system in an evidence-based practice center: abstrackr. In: Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium. pp. 819–824. IHI ’12, Association for Computing Machinery, New York, NY, USA (Jan 2012), https://dl.acm.org/doi/10.1145/2110363.2110464

[31] Wang, L., Ma, C., Feng, X., et al.: A survey on large language model based autonomous agents. Frontiers of Computer Science 18(6), 186345 (Mar 2024), https://doi.org/10.1007/s11704-024-40231-1

[32] Wang, Y., Zhong, W., Huang, Y., et al.: Agents in Software Engineering: Survey, Landscape, and Vision (Sep 2024), http://arxiv.org/abs/2409.09030

[33] Yetiştiren, B., Özsoy, I., Ayerdem, M., Tüzün, E.: Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT (Oct 2023), http://arxiv.org/abs/2304.10778
