
arxiv: 2508.15411 · v3 · submitted 2025-08-21 · 💻 cs.SE · cs.CL · cs.LG · cs.MA

Foundational Design Principles and Patterns for Building Robust and Adaptive GenAI-Native Systems

Pith reviewed 2026-05-18 22:07 UTC · model grok-4.3

classification 💻 cs.SE · cs.CL · cs.LG · cs.MA
keywords GenAI-native systems · software engineering principles · design patterns · AI reliability · self-evolving systems · architectural patterns · robust AI systems · hybrid AI engineering

The pith

GenAI systems achieve robustness by merging their capabilities with classic software engineering principles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that generative AI can be made reliable and efficient only if systems are built as GenAI-native hybrids from the outset, rather than bolting AI onto conventional code. This requires combining AI's flexible cognition with structured engineering practices so that unpredictability is contained and adaptation becomes a built-in property. Five pillars—reliability, excellence, evolvability, self-reliance, and assurance—supply the goals, while patterns such as GenAI-native cells, organic substrates, and programmable routers supply the concrete mechanisms. If the approach works, the resulting systems would maintain consistent performance, improve themselves with minimal external help, and consume resources more predictably. The authors also sketch the supporting software stack and note the wider technical, adoption, economic, and legal consequences that would follow successful adoption.

Core claim

Future GenAI-native systems should integrate GenAI's cognitive capabilities with traditional software engineering principles to create robust, adaptive, and efficient systems. Foundational design principles are organized around five key pillars—reliability, excellence, evolvability, self-reliance, and assurance—while architectural patterns such as GenAI-native cells, organic substrates, and programmable routers provide the structure for resilient, self-evolving behavior. The paper also outlines the main ingredients of a GenAI-native software stack and examines impacts across technical, user, economic, and legal dimensions.

What carries the argument

Five pillars (reliability, excellence, evolvability, self-reliance, assurance), combined with the architectural patterns of GenAI-native cells, organic substrates, and programmable routers, which embed AI capabilities inside disciplined, modular engineering structures.

If this is right

  • Systems maintain reliable output even when underlying AI models behave inconsistently.
  • Self-reliance mechanisms reduce the need for constant human oversight and retraining.
  • Evolvability allows the system to incorporate new capabilities without full redesign.
  • Assurance layers make safety and compliance easier to demonstrate and audit.
  • Overall efficiency improves because internal routing and substrates minimize wasted computation.
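The first bullet, reliable output despite inconsistent models, can be pictured as a deterministic guard around a non-deterministic call. The sketch below is an editorial illustration, not code from the paper: `reliable_call`, `flaky_model`, and the digit-only validator are hypothetical names, and the "model" is a stub that returns a malformed answer before a valid one.

```python
from typing import Callable


def reliable_call(
    generate: Callable[[str], str],
    validate: Callable[[str], bool],
    fallback: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> tuple[str, str]:
    """Return (output, source), where source is 'genai' or 'fallback'."""
    for _ in range(max_attempts):
        try:
            candidate = generate(prompt)
        except Exception:
            continue  # a failed call is treated like an invalid answer
        if validate(candidate):
            return candidate, "genai"
    # Every attempt failed validation: use the deterministic path instead.
    return fallback(prompt), "fallback"


# Hypothetical stand-in for an inconsistent model: bad format first, then valid.
_answers = iter(["maybe?", "42"])


def flaky_model(prompt: str) -> str:
    return next(_answers)


result = reliable_call(flaky_model, str.isdigit, lambda p: "0", "How many items?")
print(result)
```

Bounding the retries and always labeling the source keeps the caller's contract fixed however the model behaves, which is one plausible reading of containing unpredictability.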

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The cell-and-substrate model might map directly onto existing microservice or agent frameworks, turning each AI component into a living unit that grows or contracts.
  • Developers could create measurable benchmarks for each pillar to track whether reliability or evolvability actually increases after adoption.
  • The same patterns could be tried first in narrow domains such as automated testing or customer-support agents to gather early evidence before wider rollout.
  • Legal and regulatory discussions would need concrete metrics for what counts as sufficient assurance in a self-evolving system.

Load-bearing premise

That the five pillars and the patterns of GenAI-native cells, organic substrates, and programmable routers will actually reduce the unpredictability and inefficiency of generative AI when put into practice.

What would settle it

A working prototype built according to the pillars and patterns that shows no measurable improvement in error consistency, adaptation speed, or resource use compared with a conventional GenAI system would disprove the central claim.
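The falsification test above amounts to a comparison on three measures. The sketch below shows one way such a comparison could be scripted. Everything in it is an assumption made for illustration: the metric definitions (spread of error rate across trials, mean adaptation time, mean cost), the 5% improvement threshold, and the trial numbers, which are synthetic and not results from the paper.

```python
from statistics import mean, pstdev


def summarize(runs: list[dict]) -> dict:
    """Collapse per-trial records into three lower-is-better scores."""
    return {
        "error_spread": pstdev([r["error_rate"] for r in runs]),
        "adaptation_s": mean([r["adaptation_s"] for r in runs]),
        "cost": mean([r["cost"] for r in runs]),
    }


def improved_metrics(baseline, prototype, min_relative_gain=0.05):
    """Names of metrics where the prototype beats the baseline by the threshold."""
    base, proto = summarize(baseline), summarize(prototype)
    return [
        name for name in base
        if proto[name] < base[name] * (1 - min_relative_gain)
    ]


# Synthetic trial records, invented purely to exercise the functions.
baseline = [
    {"error_rate": 0.10, "adaptation_s": 40, "cost": 1.0},
    {"error_rate": 0.30, "adaptation_s": 60, "cost": 1.2},
]
prototype = [
    {"error_rate": 0.18, "adaptation_s": 30, "cost": 1.1},
    {"error_rate": 0.22, "adaptation_s": 34, "cost": 1.1},
]

improved = improved_metrics(baseline, prototype)
print(improved or "no measurable improvement: central claim unsupported")
```

An empty result on all three measures is the outcome the paragraph above describes as settling the question against the framework.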

Figures

Figures reproduced from arXiv: 2508.15411 by Frederik Vandeputte.

Figure 1: Example probability density functions of output quality or confidence for outputs produced by typical GenAI and traditional logic-based solutions across the full range of acceptable inputs.
Figure 2: Conceptual view of a self-improving hybrid GenAI-native asset: fast routine traditional processing & interfacing, and slow occasional (semi-)cognitive processing (cf. System 1 and System 2 thinking [28]), with gradual optimization loops.
Figure 3: GenAI-native blueprint comprising several patterns for enhancing resilience to quality and uncertainty variations.
Figure 4: This figure illustrates the inner workings of a GenAI-native service, knowledge, or cyberphysical cell.
Figure 5: The five design pillars and their main direct interdependencies, with the relevant subaspects shown for each pillar.
Figure 6: Illustration of an organic substrate within a GenAI-native system, incorporating multiple GenAI-native design patterns to enhance adaptability, resilience, and functionality.
Figure 7: Illustration of the inherent complexity in reliably and flexibly parsing contact information. The input modality, formatting, and critical information availability can vary substantially. A GenAI-native implementation of this function should ideally balance the flexibility needed to handle a wide range of inputs with the downstream reliability and resilience, even when essential information is missing or i…
Figure 8: Illustration of a very simple web application, where the user, e.g., via chat, can ask the application for additional functionality, both related to the look-and-feel as well as core functionality. Some of these requests may be volatile, whereas others may need to be made persistent, either in the frontend, backend, or both.
Figure 9: Illustration of a simple upgrade scenario, where a planned feature upgrade is announced via some communication channel to other services. These latter services decide whether this announcement is relevant, which possibly may trigger subsequent upgrades.
Figure 10: Illustration of a simple upgrade scenario, where a service detects inefficient use of its services, makes changes to its API and functionality, which subsequently is announced and propagated similarly to the baseline scenario.
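The captions of Figures 2 and 7 together suggest a fast rule-based path with an occasional slower cognitive path, plus explicit handling of missing fields. A minimal sketch of that reading follows. It is not taken from the paper: `Contact`, `parse_contact`, and `cognitive_fallback` are hypothetical names, the fallback stands in for a GenAI call, and the two regular expressions are deliberately simple and would miss many real-world formats.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

# Deliberately simple patterns for the fast, deterministic path.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


@dataclass
class Contact:
    email: Optional[str] = None
    phone: Optional[str] = None
    missing: list = field(default_factory=list)  # fields nobody could fill
    source: str = "rules"                        # which path produced the data


def parse_contact(
    text: str,
    cognitive_fallback: Optional[Callable[[str], dict]] = None,
) -> Contact:
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    contact = Contact(
        email.group() if email else None,
        phone.group() if phone else None,
    )
    # Slow path: only consulted when the rules leave a gap.
    if (contact.email is None or contact.phone is None) and cognitive_fallback:
        guess = cognitive_fallback(text)
        contact.email = contact.email or guess.get("email")
        contact.phone = contact.phone or guess.get("phone")
        contact.source = "rules+genai"
    # Report gaps explicitly so downstream code never has to guess.
    contact.missing = [
        name for name in ("email", "phone") if getattr(contact, name) is None
    ]
    return contact


card = parse_contact("Reach Ada at ada@example.org or +32 2 555 01 23.")
print(card)
```

Returning the `missing` list rather than raising keeps downstream consumers resilient when essential information is absent, which is the trade-off the Figure 7 caption points at.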

Generative AI (GenAI) has emerged as a transformative technology, demonstrating remarkable capabilities across diverse application domains. However, GenAI faces several major challenges in developing reliable and efficient GenAI-empowered systems due to its unpredictability and inefficiency. This paper advocates for a paradigm shift: future GenAI-native systems should integrate GenAI's cognitive capabilities with traditional software engineering principles to create robust, adaptive, and efficient systems. We introduce foundational GenAI-native design principles centered around five key pillars -- reliability, excellence, evolvability, self-reliance, and assurance -- and propose architectural patterns such as GenAI-native cells, organic substrates, and programmable routers to guide the creation of resilient and self-evolving systems. Additionally, we outline the key ingredients of a GenAI-native software stack and discuss the impact of these systems from technical, user adoption, economic, and legal perspectives, underscoring the need for further validation and experimentation. Our work aims to inspire future research and encourage relevant communities to implement and refine this conceptual framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper advocates for integrating GenAI cognitive capabilities with traditional software engineering principles to address unpredictability and inefficiency in GenAI-empowered systems. It proposes five design pillars (reliability, excellence, evolvability, self-reliance, assurance) and three architectural patterns (GenAI-native cells, organic substrates, programmable routers), outlines a GenAI-native software stack, and discusses technical, adoption, economic, and legal impacts while calling for further validation.

Significance. If the framework proves effective upon validation, it could offer a useful high-level structure for designing robust GenAI-native systems and help bridge AI capabilities with established software engineering practices, potentially guiding future research in the area.

major comments (2)
  1. [Foundational Design Principles] The central advocacy for the five pillars rests on their ability to mitigate unpredictability, yet the pillars section defines them internally without reference to established SE metrics, benchmarks, or prior literature on AI reliability (e.g., no citations to work on verifiable AI or adaptive systems).
  2. [Architectural Patterns] The architectural patterns section introduces GenAI-native cells, organic substrates, and programmable routers as solutions but provides no concrete sketches, pseudocode, or illustrative scenarios showing how they would reduce inefficiency or enable self-evolution in practice.
minor comments (2)
  1. [Abstract and Discussion] The abstract states the need for validation but the discussion of impacts could more explicitly tie back to how the pillars and patterns would be evaluated.
  2. [Throughout] Notation for the proposed patterns is introduced without a summary table or diagram, which would aid clarity for readers unfamiliar with the terminology.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight opportunities to strengthen connections to prior literature and to illustrate the proposed patterns more concretely. We address each major comment below and indicate the revisions we will make.

  1. Referee: [Foundational Design Principles] The central advocacy for the five pillars rests on their ability to mitigate unpredictability, yet the pillars section defines them internally without reference to established SE metrics, benchmarks, or prior literature on AI reliability (e.g., no citations to work on verifiable AI or adaptive systems).

    Authors: We agree that explicit links to established software engineering metrics and prior work on verifiable AI and adaptive systems would improve the grounding of the five pillars. Although the pillars were synthesized from the specific challenges of unpredictability and inefficiency in GenAI-empowered systems, we will revise the relevant section to incorporate citations to relevant literature on AI reliability, verifiable AI, and adaptive systems, along with references to standard SE quality metrics where applicable. revision: yes

  2. Referee: [Architectural Patterns] The architectural patterns section introduces GenAI-native cells, organic substrates, and programmable routers as solutions but provides no concrete sketches, pseudocode, or illustrative scenarios showing how they would reduce inefficiency or enable self-evolution in practice.

    Authors: The patterns are presented at a conceptual level to establish foundational ideas rather than as fully specified implementations. We acknowledge that the absence of illustrative scenarios limits the ability to demonstrate practical impact. In the revision we will add concise, high-level scenarios for each pattern that illustrate potential mechanisms for reducing inefficiency and supporting self-evolution, while clarifying that detailed pseudocode or empirical validation lies beyond the scope of this conceptual framework and is left for future work. revision: partial

Circularity Check

0 steps flagged

No significant circularity in conceptual framework proposal


The paper is explicitly a conceptual position paper that introduces five design pillars (reliability, excellence, evolvability, self-reliance, assurance) and three architectural patterns (GenAI-native cells, organic substrates, programmable routers) as a proposed high-level framework. It contains no equations, derivations, empirical predictions, fitted parameters, or self-citations that reduce any central claim to its own inputs by construction. The text states the need for further validation and experimentation rather than asserting that the proposals have been shown to work, keeping the normative suggestions self-contained as original design ideas without circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The framework assumes traditional software engineering principles can be directly adapted to resolve GenAI-specific issues and introduces new architectural concepts without prior empirical support.

axioms (1)
  • domain assumption: GenAI systems face major challenges due to unpredictability and inefficiency that integration with SE principles can resolve.
    Stated as motivation in the abstract for the paradigm shift.
invented entities (3)
  • GenAI-native cells · no independent evidence
    purpose: Basic building blocks for resilient and self-evolving systems
    New architectural pattern proposed without independent evidence or prior literature grounding.
  • organic substrates · no independent evidence
    purpose: Adaptive foundation for system growth and evolution
    Invented concept to support evolvability pillar.
  • programmable routers · no independent evidence
    purpose: Mechanisms to direct and manage information flow in GenAI systems
    Proposed pattern for self-reliance and assurance.
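The ledger describes programmable routers only as mechanisms that direct and manage information flow. One possible interpretation, offered here as an editorial sketch and not as the paper's design, is a dispatcher whose routing rules can be added or removed at runtime, with a GenAI call as the default route. `ProgrammableRouter`, its methods, and the `[model]` stub are all hypothetical.

```python
import re
from typing import Callable

Handler = Callable[[str], str]
Predicate = Callable[[str], bool]


class ProgrammableRouter:
    """Ordered, runtime-editable rules; unmatched requests go to a default."""

    def __init__(self, default: Handler) -> None:
        self._default = default
        self._rules: list[tuple[str, Predicate, Handler]] = []

    def add_rule(self, name: str, matches: Predicate, handler: Handler) -> None:
        self._rules.append((name, matches, handler))

    def remove_rule(self, name: str) -> None:
        self._rules = [rule for rule in self._rules if rule[0] != name]

    def route(self, request: str) -> tuple[str, str]:
        # First matching rule wins; the route name is returned for auditing.
        for name, matches, handler in self._rules:
            if matches(request):
                return name, handler(request)
        return "default", self._default(request)


SUM_RE = re.compile(r"\s*(\d+)\s*\+\s*(\d+)\s*")


def is_sum(request: str) -> bool:
    return SUM_RE.fullmatch(request) is not None


def add(request: str) -> str:
    a, b = SUM_RE.fullmatch(request).groups()
    return str(int(a) + int(b))


# The default handler is a stub standing in for a GenAI model call.
router = ProgrammableRouter(default=lambda r: f"[model] {r}")
router.add_rule("sum", is_sum, add)

print(router.route("2 + 3"))
print(router.route("summarize this"))
```

Sending cheap, well-understood requests to deterministic handlers and reserving the model for the remainder is one way such a router could reduce wasted computation; removing a rule returns that traffic to the default route.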

pith-pipeline@v0.9.0 · 5709 in / 1163 out tokens · 36265 ms · 2026-05-18T22:07:42.836024+00:00 · methodology



Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 9 internal anchors

  1. [1]

    A2A Protocol. 2025. A2A Protocol – Agent-to-Agent Communication Standard. https://a2aprotocol.ai/ Accessed: 2025-06

  2. [2]

    Amazon Web Services. 2025. Amazon Bedrock Agents. https://aws.amazon.com/bedrock/agents/ Accessed: 2025-06

  3. [3]

    Anthropic. 2024. Anthropic introduces the Model Context Protocol. https://www.anthropic.com/news/model-context-protocol Accessed: 2025-04

  4. [4]

    Anthropic. 2025. Claude Code: Your code’s new collaborator. https://www.anthropic.com/claude-code Accessed: 2025-06

  5. [5]

    Marcus Arvan. 2024. ‘Interpretability’ and ‘alignment’ are fool’s errands: a proof that controlling misaligned large language models is the best anyone can hope for. AI & SOCIETY (2024), 1–16

  6. [6]

    Bowen Cao, Deng Cai, Zhisong Zhang, Yuexian Zou, and Wai Lam

  7. [7]

    On the worst prompt performance of large language models. arXiv preprint arXiv:2406.10248 (2024)

  8. [8]

    Yunmo Chen, Tongfei Chen, Harsh Jhamtani, Patrick Xia, Richard Shin, Jason Eisner, and Benjamin Van Durme. 2024. Learning to retrieve iteratively for in-context learning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 7156–7168

  9. [9]

    Yiqun Chen, Lingyong Yan, Weiwei Sun, Xinyu Ma, Yi Zhang, Shuaiqiang Wang, Dawei Yin, Yiming Yang, and Jiaxin Mao. 2025. Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning. arXiv preprint arXiv:2501.15228 (2025)

  10. [10]

    Continue.dev. 2025. Continue.dev: The Open-Source AI Code Assistant. https://www.continue.dev/ Accessed: 2025-06

  11. [11]

    Cursor. 2025. Cursor: The AI Code Editor. https://www.cursor.sh Accessed: 2025-06

  12. [12]

    Emilia David. 2025. ‘Sandbox first’: Andrew Ng’s blueprint for accelerating enterprise AI innovation. https://venturebeat.com/ai/sandbox-first-andrew-ngs-blueprint-for-accelerating-enterprise-ai-innovation/ Accessed: 2025-06

  13. [13]

    European Parliament and Council. 2024. Regulation (EU) 2024/1689 – Artificial Intelligence Act. Official Journal of the European Union, OJ L 1689, 12 July 2024

  14. [14]

    Martin Fowler. 2024. Patterns of Generative AI. https://martinfowler.com/articles/gen-ai-patterns/ Accessed: 2025-06

  15. [15]

    Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. 2023. PAL: Program-aided language models. In International Conference on Machine Learning. PMLR, 10764–10799

  16. [16]

    Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Haofen Wang, and Haofen Wang. 2023. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997 2 (2023)

  17. [17]

    GitHub. 2021. GitHub Copilot: Your AI Pair Programmer. https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/ Accessed: 2025-06

  18. [18]

    Google Cloud. [n. d.]. Vertex AI Agent Builder / Agent Engine

  19. [19]

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. 2025. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025)

  20. [20]

    Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xiangliang Zhang. 2024. Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680 (2024)

  21. [21]

    Shanshan Han, Qifan Zhang, Yuhang Yao, Weizhao Jin, Zhaozhuo Xu, and Chaoyang He. 2024. LLM multi-agent systems: Challenges and open problems. arXiv preprint arXiv:2402.03578 (2024)

  22. [22]

    Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, et al. 2023. Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352 3, 4 (2023), 6

  23. [23]

    Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. 2024. Large language models for software engineering: A systematic literature review. ACM Transactions on Software Engineering and Methodology 33, 8 (2024), 1–79

  24. [24]

    Sheryl Hsu, Omar Khattab, Chelsea Finn, and Archit Sharma. 2024. Grounding by trying: Llms with reinforcement learning-enhanced retrieval. arXiv preprint arXiv:2410.23214 (2024)

  25. [25]

    Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. 2025. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems 43, 2 (2025), 1–55

  26. [26]

    James Huckle and Sean Williams. 2025. Easy Problems that LLMs Get Wrong. In Future of Information and Communication Conference. Springer, 313–332

  27. [27]

    Masaaki Imai. 1986. Kaizen: The Key to Japan’s Competitive Success. McGraw-Hill, New York

  28. [28]

    Anubha Kabra, Sanketh Rangreji, Yash Mathur, Aman Madaan, Emmy Liu, and Graham Neubig. 2023. Program-aided reasoners (better) know what they know. arXiv preprint arXiv:2311.09553 (2023)

  29. [29]

    Daniel Kahneman. 2011. Thinking, Fast and Slow. Macmillan

  30. [30]

    Vivek Ladsariya. 2025. The Future Is Agent-to-Agent: A Call for Founders. https://www.psl.com/feed-posts/the-future-is-agent-to-agent-a-call-for-founders Published March 31, 2025; accessed 2025-07-09

  31. [31]

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems 33 (2020), 9459–9474

  32. [32]

    Lexology. 2025. Agentic AI and EU Legal Considerations. https://www.lexology.com/library/detail.aspx?g=0695e657-7d5a-49e5-9e44-9da10418dc7a Accessed: 2025-06

  33. [33]

    Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, and Zhaopeng Tu. 2023. Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118 (2023)

  34. [34]

    Geoffrey Litt. 2023. LLMs and End-User Programming. https://www.geoffreylitt.com/2023/03/25/llm-end-user-programming Accessed: 2025-04

  35. [35]

    Haoyan Luo and Lucia Specia. 2024. From understanding to utilization: A survey on explainability for large language models. arXiv preprint arXiv:2401.12874 (2024)

  36. [36]

    Samuele Marro, Emanuele La Malfa, Jesse Wright, Guohao Li, Nigel Shadbolt, Michael Wooldridge, and Philip Torr. 2024. A Scalable Communication Protocol for Networks of Large Language Models. arXiv:2410.11905 [cs.AI] https://arxiv.org/abs/2410.11905

  37. [37]

    Grégoire Mialon, Clémentine Fourrier, Thomas Wolf, Yann LeCun, and Thomas Scialom. 2023. Gaia: a benchmark for general ai assistants. In The Twelfth International Conference on Learning Representations

  38. [38]

    Microsoft Azure. 2025. Azure AI Foundry Agent Service. https://learn.microsoft.com/azure/ai-foundry/agents/overview Accessed: 2025-06

  39. [39]

    Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, et al. 2021. Webgpt: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332 (2021)

  40. [40]

    John F. Nash. 1951. Non-cooperative games. Annals of Mathematics 54, 2 (1951), 286–295

  41. [41]

    Sergii Netesanyi. 2024. Top 10 Cloud-Native Best Practices for Application Development. https://www.n-ix.com/cloud-native-best-practices/ Accessed: 2025-04

  42. [42]

    Casey Newton. 2025. The AI browser wars are about to begin. https://www.platformer.news/ai-web-browsers-openai-perplexity-opera/ Accessed: 2025-06

  43. [43]

    Dang Nguyen, Viet Dac Lai, Seunghyun Yoon, Ryan A Rossi, Handong Zhao, Ruiyi Zhang, Puneet Mathur, Nedim Lipka, Yu Wang, Trung Bui, et al. 2024. Dynasaur: Large language agents beyond predefined actions. arXiv preprint arXiv:2411.01747 (2024)

  44. [44]

    OpenAI. 2024. Learning to Reason with LLMs. https://openai.com/index/learning-to-reason-with-llms/ Accessed: 2025-04

  45. [45]

    OpenAI. 2025. Introducing Codex: AI Software Engineering Agent. https://openai.com/codex/ Accessed: 2025-06

  46. [46]

    OpenAI. 2025. Introducing Operator. https://openai.com/index/introducing-operator/ Accessed: 2025-04

  47. [47]

    OWASP Foundation. 2025. OWASP Generative AI Security Project. https://genai.owasp.org/ Accessed: 2025-04

  48. [48]

    Thomas Pyzdek and Paul A. Keller. 2023. The Six Sigma Handbook: A Complete Guide for Green Belts, Black Belts, and Managers at All Levels (6 ed.). McGraw-Hill Education, New York

  49. [49]

    Hongzhou Rao, Yanjie Zhao, Xinyi Hou, Shenao Wang, and Haoyu Wang. 2025. Software Engineering for Large Language Models: Research Status, Challenges and the Road Ahead. arXiv:2506.23762 [cs.SE] https://arxiv.org/abs/2506.23762

  50. [50]

    Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, and Aman Chadha. 2024. A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv preprint arXiv:2402.07927 (2024)

  51. [51]

    Ranjan Sapkota, Konstantinos I. Roumeliotis, and Manoj Karkee

  52. [52]

    Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI. arXiv:2505.19443 [cs.SE] https://arxiv.org/abs/2505.19443

  53. [53]

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems 36 (2023), 68539–68551

  54. [54]

    Amazon Web Services. 2025. What is Cloud Native? https://aws.amazon.com/what-is/cloud-native/ Accessed: 2025-04

  55. [55]

    Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems 36 (2023), 8634–8652

  56. [56]

    StackBlitz. 2025. Bolt.new: AI-Powered Web App Builder. https://bolt.new/ Accessed: 2025-06

  57. [57]

    Maira Ladeira Tanke, Mark Roy, Navneet Sabbineni, and Monica Sunkara. 2024. Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 1. https://aws.amazon.com/blogs/machine-learning/best-practices-for-building-robust-generative-ai-applications-with-amazon-bedrock-agents-part-1/ Accessed: 2025-06

  58. [58]

    Maira Ladeira Tanke, Mark Roy, Navneet Sabbineni, and Monica Sunkara. 2024. Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 2. https://aws.amazon.com/blogs/machine-learning/best-practices-for-building-robust-generative-ai-applications-with-amazon-bedrock-agents-part-2/ Accessed: 2025-06

  59. [59]

    Co Tran, Salman Paracha, Adil Hafeez, and Shuguang Chen. 2025. Arch-Router: Aligning LLM Routing with Human Preferences. arXiv preprint arXiv:2506.16655 (2025)

  60. [60]

    Teng Wang, Wing-Yin Yu, Ruifeng She, Wenhan Yang, Taijie Chen, and Jianping Zhang. 2024. Leveraging large language models for solving rare mip challenges. arXiv preprint arXiv:2409.04464 (2024)

  61. [61]

    Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, and Heng Ji. 2024. Executable Code Actions Elicit Better LLM Agents. In Forty-first International Conference on Machine Learning. https://openreview.net/forum?id=jJ9BoXAfFa

  62. [62]

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022), 24824–24837

  63. [63]

    Scott Wu and Cognition Labs. 2024. Introducing Devin, the First AI Software Engineer. https://cognition.ai/blog/introducing-devin

  64. [64]

    Kangwei Xu, Ruidi Qiu, Zhuorui Zhao, Grace Li Zhang, Ulf Schlichtmann, and Bing Li. 2024. LLM-Aided Efficient Hardware Design Automation. arXiv preprint arXiv:2410.18582 (2024)

  65. [65]

    Ruoxi Xu, Hongyu Lin, Xianpei Han, Jia Zheng, Weixiang Zhou, Le Sun, and Yingfei Sun. 2025. Large Language Models Often Say One Thing and Do Another. arXiv preprint arXiv:2503.07003 (2025)

  66. [66]

    Quanjun Zhang, Chunrong Fang, Yang Xie, Yaxin Zhang, Yun Yang, Weisong Sun, Shengcheng Yu, and Zhenyu Chen. 2023. A survey on large language models for software engineering. arXiv preprint arXiv:2312.15223 (2023)

  67. [67]

    Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, et al. 2023. Siren’s song in the AI ocean: a survey on hallucination in large language models. arXiv preprint arXiv:2309.01219 (2023)

  68. [68]

    Zihan Zhang, Meng Fang, Ling Chen, Mohammad-Reza Namazi-Rad, and Jun Wang. 2023. How do large language models capture the ever-changing world knowledge? a review of recent advances. arXiv preprint arXiv:2310.07343 (2023)

  69. [69]

    Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, and Mengnan Du. 2024. Explainability for large language models: A survey. ACM Transactions on Intelligent Systems and Technology 15, 2 (2024), 1–38

  70. [70]

    Yueheng Zhu, Chao Liu, Xuan He, Xiaoxue Ren, Zhongxin Liu, Ruwei Pan, and Hongyu Zhang. 2025. AdaCoder: An Adaptive Planning and Multi-Agent Framework for Function-Level Code Generation. arXiv preprint arXiv:2504.04220 (2025).