More Is Different: Toward a Theory of Emergence in AI-Native Software Ecosystems
Pith reviewed 2026-05-10 04:12 UTC · model grok-4.3
The pith
AI-native software ecosystems must be analyzed as complex adaptive systems because their failures emerge from agent interactions rather than individual components.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that AI-native software ecosystems display emergent behaviors characteristic of complex adaptive systems, including properties that appear only at the system level such as increasing disorder from interactions and chain reactions across components. It distinguishes these from simpler architectures like microservices by showing how agent autonomy creates novel dynamics. The core advance is a set of seven falsifiable propositions that tie complex systems concepts to observable software changes, along with a way to measure emergence using state variables and coarse-graining functions.
What carries the argument
The mapping of Holland's six properties of complex adaptive systems to observable behaviors in AI software ecosystems, supported by a measurement framework of micro-level state variables and coarse-graining functions.
If this is right
- Governance of AI systems must shift to ecosystem-level monitoring as the primary mechanism rather than component-level controls.
- Existing laws of software evolution will require extension or replacement in domains where autonomous agents interact.
- Failures such as cascade breakdowns can be anticipated from interaction patterns instead of isolated component faults.
- New metrics for architectural entropy and comprehension debt become necessary for ongoing management of these systems.
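The cascade-failure claim above can be made concrete with a toy threshold model (our illustration; the component names and the 0.5 tolerance are invented, and nothing here comes from the paper itself). A component fails once more than a tolerated fraction of its dependencies has failed, so the final failure set is a property of the interaction graph rather than of any single component.

```python
def cascade(deps, seed_failures, tolerance=0.5):
    """deps maps each component to the set of components it depends on.

    Returns the set of failed components after the cascade settles.
    """
    failed = set(seed_failures)
    changed = True
    while changed:
        changed = False
        for comp, requires in deps.items():
            if comp in failed or not requires:
                continue
            # Interaction-driven failure: too many dependencies are down.
            if len(requires & failed) / len(requires) > tolerance:
                failed.add(comp)
                changed = True
    return failed

# Hypothetical four-component ecosystem.
deps = {
    "auth": set(),
    "billing": {"auth"},
    "api": {"auth", "billing"},
    "ui": {"api"},
}

# Seeding the shared dependency ("auth") takes down everything; seeding
# "billing" alone stays contained, because "api" still has a live dependency.
print(sorted(cascade(deps, {"auth"})), sorted(cascade(deps, {"billing"})))
```

The same number of seed faults produces opposite outcomes depending on where they sit in the graph, which is exactly the sense in which interaction patterns, not isolated component faults, would carry the predictive weight.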
Where Pith is reading between the lines
- The same interaction-driven emergence patterns could appear in other autonomous multi-agent setups such as networks of AI services or coordinated robotic systems.
- Practical tools could be built to automatically flag when a software system crosses into complex adaptive behavior through rising coarse-grained measures.
- If the seven propositions are refuted, current engineering practices would be shown to scale to AI agents with only modest adjustments.
Load-bearing premise
The assumption that traditional software engineering theories fall short in explaining and managing the specific failures observed in multi-agent AI systems, requiring instead the complete toolkit of complex adaptive systems theory.
What would settle it
A controlled study of a multi-agent AI software ecosystem in which standard software evolution models alone accurately predict and prevent all observed cascade failures and entropy growth without invoking system-level emergence or new measurement constructs.
Original abstract
Software engineering faces a fundamental challenge: multi-agent AI systems fail in ways that defy explanation by traditional theories. While individual agents perform correctly, their interactions degrade entire ecosystems, revealing a gap in our understanding of software evolution. This paper argues that AI-native software ecosystems must be studied as complex adaptive systems (CAS), where emergent properties like architectural entropy, cascade failures, and comprehension debt arise not from individual components, but from their interactions. We map Holland's six CAS properties onto observable ecosystem dynamics, distinguishing these systems from microservices or open-source networks. To measure causal emergence, we define micro-level state variables, coarse-graining functions, and a tractable measurement framework. Seven falsifiable propositions link CAS theory to software evolution, challenging or extending Lehman's laws where agent-level assumptions fail. If confirmed, these findings would demand a radical shift: ecosystem-level monitoring as the primary governance mechanism for AI-native systems. If refuted, existing theories may only need incremental updates. Either way, this work forces us to ask: Can software engineering's core assumptions survive the age of autonomous agents?
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that AI-native software ecosystems, driven by multi-agent interactions, exhibit emergent degradations (architectural entropy, cascade failures, comprehension debt) unexplained by traditional software engineering theories such as Lehman's laws. It maps Holland's six CAS properties to observable ecosystem dynamics, distinguishes these systems from microservices or open-source networks, defines a measurement framework via micro-level state variables and coarse-graining functions to quantify causal emergence, and advances seven falsifiable propositions that challenge or extend existing laws where agent-level assumptions break down. The work concludes that confirmation would necessitate ecosystem-level monitoring as primary governance, while refutation would permit only incremental theory updates.
Significance. If the mapping and propositions receive independent empirical grounding, the paper could meaningfully extend software evolution theory by supplying a falsifiable bridge to complex adaptive systems, potentially guiding new monitoring practices for autonomous-agent ecosystems. The explicit framing that either outcome (radical shift or incremental update) is informative is a strength of the conceptual approach. No machine-checked proofs, reproducible code, or parameter-free derivations are present, so the significance remains conditional on future validation of the proposed framework.
major comments (3)
- [Abstract] The central assertion that multi-agent AI failures 'defy explanation by traditional theories' and require the full CAS apparatus is load-bearing, yet no concrete counter-example is supplied showing a scenario (e.g., a specific cascade) where dependency-graph models or Lehman's laws demonstrably fail to predict the observed degradation while the CAS mapping succeeds.
- [Measurement framework] The definitions of micro-level state variables, coarse-graining functions, and the tractable measurement framework for causal emergence are presented without any worked example, pseudocode, or demonstration that the resulting quantities are computable from observable software artifacts; this leaves the claim of explanatory gain over existing SE metrics untested.
- [Propositions] The seven falsifiable propositions are stated as linking CAS properties to software evolution and challenging Lehman's laws, but the manuscript supplies neither operationalizations (e.g., how 'architectural entropy' would be measured in a concrete codebase) nor suggested empirical tests, leaving the falsifiability claim unsupported.
minor comments (2)
- [Terminology] The three invented entities (architectural entropy, comprehension debt, cascade failures) are introduced without citations to prior SE literature on analogous concepts such as technical debt or architectural smells; a short differentiation paragraph would clarify novelty.
- [Abstract] The abstract references 'Holland's six CAS properties' without enumerating them; adding the list (aggregation, nonlinearity, flows, diversity, tags, internal models) would aid readers outside the CAS community.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The comments correctly identify areas where the conceptual nature of the manuscript would benefit from additional illustrative material to strengthen the claims. We address each major comment below and will incorporate revisions to enhance clarity and support for the proposed framework.
read point-by-point responses
Referee [Abstract]: The central assertion that multi-agent AI failures 'defy explanation by traditional theories' and require the full CAS apparatus is load-bearing, yet no concrete counter-example is supplied showing a scenario (e.g., a specific cascade) where dependency-graph models or Lehman's laws demonstrably fail to predict the observed degradation while the CAS mapping succeeds.
Authors: We agree that an explicit illustrative counter-example would make the load-bearing claim more accessible. The manuscript distinguishes AI-native ecosystems through multi-agent adaptive interactions that produce non-reducible emergent degradations, unlike the static or non-adaptive assumptions in dependency graphs and Lehman's laws. To address this directly, we will revise the abstract and add a concise hypothetical scenario in the introduction (e.g., a cascade in which agents autonomously refactor interdependent modules, generating architectural entropy unpredictable from the initial dependency model). This will show where the CAS mapping supplies explanatory gain without altering the theoretical focus. Revision: yes.
Referee [Measurement framework]: The definitions of micro-level state variables, coarse-graining functions, and the tractable measurement framework for causal emergence are presented without any worked example, pseudocode, or demonstration that the resulting quantities are computable from observable software artifacts; this leaves the claim of explanatory gain over existing SE metrics untested.
Authors: The framework is presented at a conceptual level to bridge CAS theory with software artifacts. We acknowledge that the lack of a concrete demonstration leaves the computability and gain over existing metrics (such as standard complexity measures) unillustrated. In revision, we will insert a new subsection containing pseudocode for deriving coarse-graining functions from observable data (e.g., agent logs and dependency graphs) and a minimal worked example computing an emergence metric, thereby demonstrating tractability and potential explanatory advantage. Revision: yes.
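As a hedged sketch of what such a worked example might look like (our construction, in the spirit of Hoel et al.'s causal emergence [14]; the toy trace and the count-based coarse-graining function are invented, not taken from the manuscript): coarse-grain each micro snapshot of per-agent activity flags to the number of active agents, then compare how deterministic the transition map is at each level.

```python
from collections import Counter, defaultdict

def determinism(trace):
    """Mean, over observed states, of the max next-state transition
    probability (1.0 means transitions at this level are fully determined)."""
    nxt = defaultdict(Counter)
    for a, b in zip(trace, trace[1:]):
        nxt[a][b] += 1
    return sum(max(c.values()) / sum(c.values()) for c in nxt.values()) / len(nxt)

# Micro states: one activity flag per agent (e.g., derived from agent logs).
micro = [(0, 1), (1, 0), (0, 1), (1, 0), (1, 1), (1, 1)]

# Coarse-graining function: project each snapshot to the active-agent count.
macro = [sum(state) for state in micro]

# In this toy trace the macro-level transition map is more deterministic than
# the micro-level one, the kind of signal a causal-emergence test looks for.
print(determinism(micro), determinism(macro))
```

This is only one candidate operationalization; the point is that both the coarse-graining function and the emergence proxy are computable from observable artifacts, which is what the referee asks the revision to demonstrate.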
Referee [Propositions]: The seven falsifiable propositions are stated as linking CAS properties to software evolution and challenging Lehman's laws, but the manuscript supplies neither operationalizations (e.g., how 'architectural entropy' would be measured in a concrete codebase) nor suggested empirical tests, rendering the falsifiability claim unsupported.
Authors: The propositions are advanced as hypotheses whose falsifiability rests on future empirical validation, consistent with the paper's theoretical orientation. We accept that explicit operationalizations and test outlines would better substantiate this. We will revise the propositions section to include operational definitions (e.g., architectural entropy as Shannon entropy over weighted dependency graphs incorporating agent modification rates) and brief suggested empirical designs, such as longitudinal analysis of AI-agent code repositories, to make the falsifiability claim more concrete. Revision: yes.
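The operationalization the authors propose here (Shannon entropy over a weighted dependency graph) fits in a few lines; the edge weights and module names below are hypothetical numbers for illustration only.

```python
import math

def architectural_entropy(edge_weights):
    """Shannon entropy (bits) of normalized per-edge agent modification rates."""
    total = sum(edge_weights.values())
    return -sum((w / total) * math.log2(w / total)
                for w in edge_weights.values() if w > 0)

# Agent modification rates per dependency edge (invented for illustration).
focused = {("ui", "api"): 9, ("api", "db"): 1}   # edits concentrated on one edge
diffuse = {("ui", "api"): 5, ("api", "db"): 5}   # edits spread evenly

# Evenly spread modification pressure maximizes entropy (1 bit for two edges);
# a rising value across repository snapshots would signal agent edits
# diffusing through the architecture.
print(architectural_entropy(focused), architectural_entropy(diffuse))
```

Tracked longitudinally over repository snapshots, a monotone rise in this quantity would be one concrete, falsifiable reading of the entropy propositions.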
Circularity Check
No circularity: the CAS mapping and propositions are proposed extensions of existing theory, not constructs that reduce to their own inputs.
Full rationale
The paper asserts that multi-agent AI failures defy traditional SE theories like Lehman's laws, then proposes an external mapping of Holland's six CAS properties to define micro-level state variables, coarse-graining functions, and seven falsifiable propositions as a new framework. No derivation, equation, or prediction is shown that reduces by construction to a fitted parameter, self-definition, or self-citation chain; the propositions are explicitly framed as testable challenges to existing models rather than tautologies. The argument remains self-contained against external benchmarks, with any confirmation or refutation left to future empirical work.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Holland's six properties of complex adaptive systems apply directly to observable dynamics in AI-native software ecosystems.
- Domain assumption: Lehman's laws of software evolution require extension or challenge when agent-level assumptions fail in multi-agent AI systems.
invented entities (3)
- architectural entropy: no independent evidence
- comprehension debt: no independent evidence
- cascade failures: no independent evidence
Reference graph
Works this paper leans on
- [1] Philip W. Anderson. 1972. More is different. Science 177, 4047 (1972), 393–396. https://doi.org/10.1126/science.177.4047.393
- [2] Alberto Bacchelli and Christian Bird. 2013. Expectations, Outcomes, and Challenges of Modern Code Review. In Proceedings of the 35th International Conference on Software Engineering (ICSE '13). IEEE Press, 712–721. https://doi.org/10.1109/ICSE.2013.6606617
- [3] Sarah Barke, Michael B. James, and Nadia Polikarpova. 2023. Grounded Copilot: How Programmers Interact with Code-Generating Models. Proceedings of the ACM on Programming Languages 7, OOPSLA1, Article 78 (2023). https://doi.org/10.1145/3586030
- [4] Gordon Baxter and Ian Sommerville. 2011. Socio-technical Systems: From Design Methods to Systems Engineering. Interacting with Computers 23, 1 (2011), 4–17. https://doi.org/10.1016/j.intcom.2010.07.003
- [5] Mehmet Cemri et al. 2025. Why Do Multi-Agent LLM Systems Fail? arXiv:2503.13657 [cs.AI]. https://arxiv.org/abs/2503.13657
- [6] Aaron Clauset, Cosma R. Shalizi, and M. E. J. Newman. 2009. Power-law distributions in empirical data. SIAM Rev. 51, 4 (2009), 661–703. https://doi.org/10.1137/070710111
- [7] DevOps Research and Assessment. 2024. 2024 State of DevOps Report. Industry research. https://cloud.google.com/devops/state-of-devops (Google Cloud, Puppet, Lacework collaboration)
- [8] Edsger W. Dijkstra. 1972. Notes on Structured Programming. In Structured Programming, Ole-Johan Dahl, Edsger W. Dijkstra, and C. A. R. Hoare (Eds.). Academic Press, London, 1–82.
- [9] Carlos Gershenson. 2025. Self-organizing systems: what, how, and why? npj Complexity 2, 10 (2025). https://doi.org/10.1038/s44260-025-00031-5
- [10] GitClear. 2024. The State of Code Quality: 2024-2025 Report. Industry report. https://www.gitclear.com/coding_on_copilot_data_shows_ais_downward_pressure_on_code_quality (analysis of 211 million lines of code)
- [11]
- [12] C. A. R. Hoare. 1969. An Axiomatic Basis for Computer Programming. Commun. ACM 12, 10 (1969), 576–580. https://doi.org/10.1145/363235.363259
- [13]
- [14] Erik P. Hoel, Larissa Albantakis, and Giulio Tononi. 2013. Quantifying causal emergence shows that macro can beat micro. Proceedings of the National Academy of Sciences 110, 49 (2013), 19790–19795. https://doi.org/10.1073/pnas.1314922110
- [15] John H. Holland. 1995. Hidden Order: How Adaptation Builds Complexity. Addison-Wesley, Reading, MA.
- [16] John H. Holland. 1998. Emergence: From Chaos to Order. Oxford University Press, New York, NY.
- [17] Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, et al. 2024. MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework. In Proceedings of the Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=VtmBAGCN7o
- [18] K. Jackson, Bogdan Vasilescu, Paul Ralph, Daniel Russo, et al. 2025. The Impact of Generative AI on Creativity in Software Development: A Research Agenda. ACM Transactions on Software Engineering and Methodology 34, 5 (2025). https://doi.org/10.1145/3712004
- [19] Slinger Jansen, Sjaak Brinkkemper, and Anthony Finkelstein. 2009. Business Network Management as a Survival Strategy: A Tale of Two Software Ecosystems. In Proceedings of the First International Workshop on Software Ecosystems (IWSECO-2009) (CEUR Workshop Proceedings, Vol. 505). CEUR-WS, 34–48.
- [20] Stuart A. Kauffman. 1993. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, New York, NY.
- [21] Kai-Kristian Kemell, Matti Saarikallio, Anh Nguyen-Duc, and Pekka Abrahamsson. 2025. Still just personal assistants? A multiple case study of generative AI adoption in software organizations. Information and Software Technology 186, Article 107805 (2025). https://doi.org/10.1016/j.infsof.2025.107805
- [22] Sukrit Kumar, Drishti Goel, Thomas Zimmermann, Brian Houck, Balasubramanyan Ashok, and Chetan Bansal. 2025. Time Warp: The Gap Between Developers' Ideal vs Actual Workweeks in an AI-Driven Era. In Proceedings of the 47th IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 12–22.
- [23] Meir M. Lehman. 1980. Programs, life cycles, and laws of software evolution. Proc. IEEE 68, 9 (1980), 1060–1076. https://doi.org/10.1109/PROC.1980.11805
- [24] Mark W. Maier. 1998. Architecting principles for systems-of-systems. Systems Engineering 1, 4 (1998), 267–284. https://doi.org/10.1002/(SICI)1520-6858(1998)1:4<267::AID-SYS3>3.0.CO;2-D
- [25] Konstantinos Manikas and Klaus M. Hansen. 2013. Software ecosystems: A systematic literature review. Journal of Systems and Software 86, 5 (2013), 1294–1306. https://doi.org/10.1016/j.jss.2012.12.026
- [26]
- [27] Bertrand Meyer. 1992. Applying "Design by Contract". Computer 25, 10 (1992), 40–51. https://doi.org/10.1109/2.161279
- [28] David L. Parnas. 1972. On the Criteria to Be Used in Decomposing Systems into Modules. Commun. ACM 15, 12 (1972), 1053–1058. https://doi.org/10.1145/361598.361623
- [29] Sida Peng, Eirini Kalliamvakou, Peter Croft, and Mert Demirer. 2023. The Impact of AI on Developer Productivity: Evidence from GitHub Copilot. arXiv:2302.06590 [cs.SE]. https://arxiv.org/abs/2302.06590
- [30] Charles Perrow. 1984. Normal Accidents: Living with High-Risk Technologies. Basic Books, New York, NY.
- [31]
- [32] Peter C. Rigby and Christian Bird. 2013. Convergent Contemporary Software Peer Review Practices. In Proceedings of the 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE). 202–212. https://doi.org/10.1145/2491411.2491444
- [33] Daniel Russo. 2024. Navigating the Complexity of Generative AI Adoption in Software Engineering. ACM Transactions on Software Engineering and Methodology 33, 5, Article 221 (2024). https://doi.org/10.1145/3680471
- [34] Caitlin Sadowski, Emma Söderberg, Luke Church, Michal Sipko, and Alberto Bacchelli. 2018. Modern Code Review: A Case Study at Google. In Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP '18). ACM, 181–190. https://doi.org/10.1145/3183519.3183525
- [35] Simone Scalabrino, Gabriele Bavota, Christopher Vendome, Mario Linares-Vásquez, Denys Poshyvanyk, and Rocco Oliveto. 2021. Automatically Assessing Code Understandability. IEEE Transactions on Software Engineering 47, 3 (2021), 595–613. https://doi.org/10.1109/TSE.2019.2901468
- [36] Dag I. K. Sjøberg, Tore Dybå, Bente C. D. Anda, and Jo E. Hannay. 2008. Building Theories in Software Engineering. In Guide to Advanced Empirical Software Engineering, Forrest Shull, Janice Singer, and Dag I. K. Sjøberg (Eds.). Springer, 312–336. https://doi.org/10.1007/978-1-84800-044-5_12
- [37] Klaas-Jan Stol and Brian Fitzgerald. 2015. Theory-oriented Software Engineering. Science of Computer Programming 101 (2015), 79–98. https://doi.org/10.1016/j.scico.2014.11.010
- [38] Margaret-Anne Storey. 2026. From Technical Debt to Cognitive and Intent Debt: Rethinking Software Health in the Age of AI. arXiv:2603.22106 [cs.SE]. https://arxiv.org/abs/2603.22106
- [39] Minh V. T. Thai, Tue Le, Dung Nguyen Manh, Huy Phan Nhat, and Nghi D. Q. Bui. 2025. SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios. arXiv:2512.18470 [cs.SE]. https://arxiv.org/abs/2512.18470v1 (statistics cited from v1; later versions report updated figures)
- [40] Eric L. Trist and Kenneth W. Bamforth. 1951. Some Social and Psychological Consequences of the Longwall Method of Coal-Getting. Human Relations 4, 1 (1951), 3–38. https://doi.org/10.1177/001872675100400101
- [41] Paul L. Williams and Randall D. Beer. 2010. Nonnegative Decomposition of Multivariate Information. arXiv:1004.2515 [cs.IT]. https://arxiv.org/abs/1004.2515
- [42] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. 2023. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv:2308.08155 [cs.AI]. https://arxiv.org/abs/2308.08155
- [43] Chunqiu Steven Xia, Yinlin Zhang, and Lingming Zhang. 2024. Agentless: Demystifying LLM-based Software Engineering Agents. arXiv:2407.01489 [cs.SE]. https://arxiv.org/abs/2407.01489