pith.

arXiv: 2606.02859 · v1 · pith:RRDD2WLR · submitted 2026-06-01 · 💻 cs.CL · cs.AI · cs.MA

Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions

Pith reviewed 2026-06-28 14:24 UTC · model grok-4.3

classification 💻 cs.CL cs.AI cs.MA
keywords multi-agent systems, economic incentives, decentralized coordination, emergent reasoning, auction mechanisms, wealth-based selection, credit assignment, agent populations

The pith

A population of agents competing via auctions for actions and accumulating wealth develops emergent multi-step reasoning that beats stronger single models on five tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether agents can self-orchestrate stronger collective intelligence through market-like interactions rather than through engineered coordination. Agents bid in auctions for the right to act, exchange payments, and gain or lose wealth based on environmental rewards; wealth determines which agents persist and mutate while bankrupt ones are replaced. Starting from weak agents, these local economic signals produce decentralized credit assignment and long-term planning without any central controller or communication protocol. The resulting population outperforms stronger monolithic baselines on mathematical reasoning, financial research, scientific research, accelerator design, and distributed-system optimization. If the mechanism works as described, multi-agent intelligence can emerge from incentive design alone.

Core claim

Initialized with weak agents, the economy produces emergent multi-step reasoning strategies and outperforms stronger monolithic baselines across five agentic tasks, including mathematical reasoning, financial research, scientific research, accelerator design, and distributed-system optimization. Economic signals induce decentralized credit assignment, driving planning without global orchestration or explicit communication protocols. The population evolves through economic selection where effective agents accumulate wealth and are mutated while ineffective ones go bankrupt and are replaced.

What carries the argument

Auction-based rights to act combined with wealth accumulation and replacement of bankrupt agents, which together create decentralized credit assignment from local incentives.

If this is right

  • Local auction and payment signals produce decentralized credit assignment without any global controller.
  • Long-term planning emerges in the population even though no agent communicates plans explicitly.
  • Performance gains appear across mathematical, financial, scientific, design, and optimization tasks.
  • Economic selection replaces weak agents while preserving and mutating effective ones.
  • Theoretical links connect local incentives directly to measured global task performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same incentive structure might be tested in physical multi-robot settings where communication is costly.
  • Varying the auction rules or wealth thresholds could reveal which parameters most affect emergence speed.
  • If the economy is run for many more generations, new agent specializations might appear beyond the five tasks studied.
  • The approach suggests exploring whether similar selection pressures could improve single-agent systems without adding new agents.

Load-bearing premise

The combination of auction-based action rights and wealth-based replacement will reliably drive credit assignment and long-term planning without global orchestration or explicit communication.

What would settle it

Running the described economy on the five listed tasks and finding no performance gain over the stronger monolithic baselines or no appearance of multi-step reasoning strategies would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.02859 by Ao Qu, Chenyu Wang, Guowei Xu, Han Zheng, Himabindu Lakkaraju, Huangyuan Su, Ju Li, Kushal Chattopadhyay, Paul Pu Liang, Sham Kakade, Vijay Janapa Reddi, Weirui Ye, Yilun Du, Yu Yao, Zhenting Qi, Zihan Wang.

Figure 1
Figure 1: Evolution of an agent society over a stream of tasks. Each panel shows the population at a given stage, where agents are continuously created, selected, connected, and eliminated. As the society encounters more tasks, ineffective agents are removed and corrected, while useful ones persist and diversify, leading to an alive and increasingly structured population.
Figure 2
Figure 2: Auctions. Agents whose wake-up conditions are satisfied become eligible to bid; the highest bidder wins the auction, executes the action, and advances the environment from s_t to s_{t+1}. At each environment step, agents compete for control through an auction (…
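The eligibility-then-auction step described in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Agent` class, its `wakes` predicate, and the fixed `bid` are hypothetical stand-ins for the paper's wake-up conditions and bidding policies, and ties go to whichever agent is listed first.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Agent:
    """Hypothetical agent: a wake-up condition and a fixed bid."""
    name: str
    wakes: Callable[[str], bool]  # is this agent eligible in state s_t?
    bid: float                    # amount offered for the right to act


def run_auction(agents: Sequence[Agent], state: str) -> Optional[Agent]:
    """Return the highest bidder among agents whose wake-up condition holds.

    Returns None if no agent wakes up. max() keeps the first maximal
    element, so ties go to the agent listed first.
    """
    eligible = [a for a in agents if a.wakes(state)]
    if not eligible:
        return None
    return max(eligible, key=lambda a: a.bid)


# Usage: the checker bids most but only wakes up on a draft.
agents = [
    Agent("planner", wakes=lambda s: s == "start", bid=3.0),
    Agent("solver", wakes=lambda s: s == "start", bid=5.0),
    Agent("checker", wakes=lambda s: s == "draft", bid=9.0),
]
winner = run_auction(agents, "start")
print(winner.name if winner else None)  # solver
```

Only eligible agents compete, so a high bid from an agent whose wake-up condition fails in the current state has no effect; the winner would then execute its action and move the environment to s_{t+1}.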
Figure 3
Figure 3: Transactions. Credit assignment naturally emerges as profits flow backward through the action sequence, rewarding agents whose actions enable successful downstream outcomes. … reward r_t. Let a⋆_{t−1} denote the previous winning agent in the same episode. We then apply a bucket-brigade transfer rule (…
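The exact transfer rule is cut off in the excerpt above. The sketch below assumes the classic bucket-brigade form, in which the step-t winner pays its bid to the previous winner a⋆_{t−1} and collects the environment reward r_t; how the first winner's bid is handled is our own assumption (here it simply leaves the economy). The function name and data layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def bucket_brigade(
    wealth: Dict[str, float],
    episode: List[Tuple[str, float, float]],
) -> Dict[str, float]:
    """Return updated wealth after one episode of bucket-brigade transfers.

    `episode` lists (winner, bid, reward) for steps t = 0, 1, ...
    Assumed rule: the step-t winner pays its bid to the step-(t-1) winner
    and receives the reward r_t. The first winner's bid leaves the economy.
    """
    new = dict(wealth)
    prev: Optional[str] = None
    for winner, bid, reward in episode:
        new[winner] -= bid          # pay for the right to act
        if prev is not None:
            new[prev] += bid        # the enabling predecessor is paid
        new[winner] += reward       # environment reward r_t
        prev = winner
    return new


# Usage: only the final step is rewarded, yet the setup agent also profits.
start = {"setup": 10.0, "finish": 10.0}
episode = [("setup", 2.0, 0.0), ("finish", 6.0, 10.0)]
print(bucket_brigade(start, episode))  # {'setup': 14.0, 'finish': 14.0}
```

The setup agent ends 4.0 ahead even though its own step earned no reward: the downstream winner's bid is what carries credit backward. Under this rule, an agent that repeatedly bids more than its action is worth to later bidders would tend to lose wealth.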
Figure 4
Figure 4 shows that, on Finance-Agent-Bench, EOM (Economy of Minds) rises from 45.0% at initialization to 60.0% after 30 training tasks. This outperforms Multi-Agent Debate at 50.0%, REACT at 45.0%, and GEA at 50.0%, even though each partial agent in EOM can access only one tool. On FrontierScience-Research, EOM reaches 8.5% mean accuracy and 20.0% best-run accuracy, compared with 1.8% mean and 5.0% best-run accuracy for GEA under th…
Figure 5
Figure 5: Training dynamics in accelerator design. Per-agent wealth on three representative ResNet-50 kernels. Wealth flows to agents that produce new EDP (energy-delay product) records; rent uniformly deducts wealth. Periodic births spawn good-birth children (⋆, exploitation: mutated from the richest agent) and bad-birth children (+, exploration: amended from the weakest); wealth < 0 triggers bankruptcy (×). Shaded bands are rolling ±1σ. …
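The wealth dynamics in the Figure 5 caption (uniform rent, periodic good and bad births, bankruptcy below zero) can be combined into one selection step. This is a minimal sketch that assumes a particular order of operations (rent, then births, then bankruptcy), a fixed starting wealth for children, and hypothetical `mutate`/`amend` callables standing in for whatever edits the paper applies to an agent's configuration; none of these names come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    name: str
    policy: str    # hypothetical stand-in for an agent's prompt/tool config
    wealth: float


def economy_tick(
    population: List[Agent],
    rent: float,
    birth: bool,
    mutate: Callable[[str], str],
    amend: Callable[[str], str],
    child_wealth: float = 5.0,
) -> List[Agent]:
    """One selection step under assumed rules; wealth is updated in place."""
    # 1. Rent: every agent pays the same amount.
    for a in population:
        a.wealth -= rent
    # 2. Periodic births: exploit the richest agent, explore from the weakest.
    if birth and population:
        richest = max(population, key=lambda a: a.wealth)
        weakest = min(population, key=lambda a: a.wealth)
        population = population + [
            Agent(richest.name + "*", mutate(richest.policy), child_wealth),
            Agent(weakest.name + "+", amend(weakest.policy), child_wealth),
        ]
    # 3. Bankruptcy: wealth below zero removes the agent.
    return [a for a in population if a.wealth >= 0]


# Usage with toy string edits in place of real mutation/amendment.
pop = [Agent("a", "p_a", 10.0), Agent("b", "p_b", 0.5), Agent("c", "p_c", 3.0)]
out = economy_tick(pop, rent=1.0, birth=True,
                   mutate=lambda p: p + "+m", amend=lambda p: p + "+fix")
print([a.name for a in out])  # ['a', 'c', 'a*', 'b+']
```

Agent "b" falls below zero after rent and is removed, but its amended child "b+" survives. Note that picking the richest and weakest agents compares wealth across the whole population, which is the kind of population-level step the referee's second comment asks the authors to account for.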
Figure 6
Figure 6: Easy-to-hard generalization on MATH. Test accuracy across MATH difficulty levels during training. The partial agent population improves not only on the easier levels encountered earlier, but also on harder levels that are initially beyond its capability, indicating that behaviors learned on simple problems can be reused on more difficult ones.
Figure 7
Figure 7: Mechanism, robustness, and generalization analyses. (a) Per-kernel EDP on ResNet-50. Best EDP found by DOSA, REACT, and EOM on log scale; lower is better. (b) Comparison between the default easy-to-hard curriculum and a reversed hard-to-easy curriculum. (c) Adding a strong generalist agent with access to all tools does not automatically dominate specialized agents. Can a complete generalist monopolize the …
Original abstract

How can a population of agents self-orchestrate and self-adapt into stronger collective intelligence without centralized control? Inspired by Friedrich Hayek's economic theory of decentralized coordination in markets, we study this question through an agent economy in which agents compete via auctions for the right to act, exchange payments, and accumulate wealth from environmental rewards. These simple economic signals induce decentralized credit assignment, driving planning without global orchestration or explicit communication protocols. The population evolves through economic selection: effective agents accumulate wealth and are mutated via exploitation, while ineffective ones go bankrupt and are replaced via exploration. We show that, initialized with weak agents, the economy produces emergent multi-step reasoning strategies and outperforms stronger monolithic baselines across five agentic tasks, including mathematical reasoning, financial research, scientific research, accelerator design, and distributed-system optimization. We further provide theoretical insights into how economic dynamics shape agent behaviors, linking local incentives to long-term global performance. Our results suggest a new path to multi-agent intelligence: rather than engineering coordination, we can design decentralized incentive structures under which it automatically emerges.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes an 'Economy of Minds' framework in which a population of agents competes via auctions for action rights, exchanges payments, and accumulates wealth from environmental rewards. Inspired by Hayek, the authors claim that these economic signals induce decentralized credit assignment and emergent multi-step reasoning without global orchestration or explicit communication. Initialized with weak agents, the economy is said to evolve via wealth-based selection (mutation of successful agents, replacement of bankrupt ones) and to outperform stronger monolithic baselines on five agentic tasks: mathematical reasoning, financial research, scientific research, accelerator design, and distributed-system optimization. Theoretical insights link local incentives to global performance.

Significance. If the empirical results and decentralization claims hold, the work would offer a novel incentive-based route to multi-agent intelligence that avoids explicit coordination engineering. This could influence scalable agent systems in complex domains, provided the mechanisms demonstrably operate without hidden central components and the performance gains are robustly documented.

major comments (2)
  1. [Abstract] Abstract: the claim of outperformance 'across five agentic tasks' is asserted without any experimental details, baseline definitions, statistical tests, ablation studies, or quantitative results. This prevents evaluation of whether the data support the central claim that economic interactions produce emergent strategies superior to monolithic baselines.
  2. [Abstract] Abstract (paragraph on economic signals): the assertion that 'auction-based action rights and wealth-based replacement' produce decentralized credit assignment 'without global orchestration' is load-bearing for the no-central-control claim, yet the manuscript supplies no implementation description showing how simultaneous bids are resolved or bankruptcy/replacement is triggered without a shared ledger, central clearinghouse, or population-level ranking step. If either mechanism requires such coordination, the observed improvements may stem from implicit global structure rather than pure local economic signals.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below. The abstract is intentionally concise, but we agree it can be strengthened with additional detail from the body of the paper while staying within its length constraints.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim of outperformance 'across five agentic tasks' is asserted without any experimental details, baseline definitions, statistical tests, ablation studies, or quantitative results. This prevents evaluation of whether the data support the central claim that economic interactions produce emergent strategies superior to monolithic baselines.

    Authors: The abstract summarizes results that are fully detailed in Sections 4 and 5 of the manuscript, including task-specific baselines (e.g., single-agent GPT-4, chain-of-thought, and multi-agent variants without economic mechanisms), performance tables with means and standard deviations over multiple runs, and ablation studies on auction and wealth components. Statistical significance is reported via paired t-tests. To improve evaluability from the abstract alone, we will revise it to include one or two key quantitative results (e.g., average improvement margins) while preserving brevity. revision: partial

  2. Referee: [Abstract] Abstract (paragraph on economic signals): the assertion that 'auction-based action rights and wealth-based replacement' produce decentralized credit assignment 'without global orchestration' is load-bearing for the no-central-control claim, yet the manuscript supplies no implementation description showing how simultaneous bids are resolved or bankruptcy/replacement is triggered without a shared ledger, central clearinghouse, or population-level ranking step. If either mechanism requires such coordination, the observed improvements may stem from implicit global structure rather than pure local economic signals.

    Authors: The Methods section (Section 3) specifies that bids are submitted locally by each agent to a market that clears via a standard second-price auction rule applied only to the current action slot; no agent receives global state or rankings. Wealth is updated from per-task environmental rewards paid directly to the acting agent. Bankruptcy simply removes the agent and samples a new random initialization; there is no population-level fitness ranking or central controller that dictates which agents plan or reason. The simulation infrastructure necessarily resolves bids, but this is limited to market clearing and does not orchestrate multi-step reasoning or credit assignment. We will add a short clarifying paragraph in the revised Methods to emphasize the locality of all decision-making. revision: partial
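The simulated rebuttal describes market clearing by a standard second-price rule applied to the current action slot, while the Figure 2 caption states only that the highest bidder wins. As a point of reference, a sealed-bid second-price (Vickrey) clearing step can be sketched as below; the function name, the `reserve` parameter, and alphabetical tie-breaking are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


def second_price_clear(
    bids: Dict[str, float], reserve: float = 0.0
) -> Optional[Tuple[str, float]]:
    """Clear one action slot with a sealed-bid second-price rule.

    Returns (winner, price), or None if no bid meets the reserve.
    The highest bidder wins and pays the larger of the second-highest
    bid and the reserve. Ties are broken alphabetically by agent name.
    """
    valid = [(name, b) for name, b in bids.items() if b >= reserve]
    if not valid:
        return None
    ranked = sorted(valid, key=lambda nb: (-nb[1], nb[0]))
    winner = ranked[0][0]
    runner_up = ranked[1][1] if len(ranked) > 1 else reserve
    return winner, max(runner_up, reserve)


print(second_price_clear({"planner": 3.0, "solver": 5.0, "checker": 4.0}))
# ('solver', 4.0)
```

Even this minimal clearing step reads every bid for the slot, so it is a shared component in the narrow sense the referee raises; the rebuttal's position is that it only clears the market and does not direct planning or credit assignment.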

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical outcomes

full rationale

The paper presents an empirical setup in which weak agents interact via auctions and wealth accumulation, then reports that emergent strategies appear and outperform monolithic baselines on five tasks. No equations, fitted parameters, or self-referential definitions are supplied that would make any claimed prediction equivalent to its inputs by construction. Theoretical insights are asserted but not shown to reduce to self-citation chains or ansatzes imported from prior author work. The argument is therefore self-contained against external benchmarks and receives the default non-circular finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the premise that market-style signals suffice for credit assignment; no free parameters or invented physical entities are introduced in the abstract.

axioms (1)
  • domain assumption: economic signals via auctions and payments induce decentralized credit assignment without global orchestration
    Invoked as the core inductive step in the abstract description of agent behavior.

pith-pipeline@v0.9.1-grok · 5770 in / 1102 out tokens · 23298 ms · 2026-06-28T14:24:54.749862+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AgentDSE: Reasoning-Augmented Architectural Design Space Exploration

    cs.AR 2026-06 unverdicted novelty 7.0

    AgentDSE uses an LLM agent in a simulator-in-the-loop setup to achieve competitive or superior architectural designs with up to 100x fewer evaluations than traditional black-box optimization methods.

Reference graph

Works this paper leans on

56 extracted references · 28 canonical work pages · cited by 1 Pith paper · 9 internal anchors

  1. [1]

    Eric B. Baum. Toward a model of mind as a laissez-faire economy of idiots. In Proceedings of the 13th International Conference on Machine Learning, pages 20–27, 1996.

  2. [2]

    Eric B. Baum. Toward a model of intelligence as an economy of agents. Machine Learning, 35(2):155–185, 1999.

  3. [3]

    Eric B. Baum and Igor Durdanovic. Evolution of cooperative problem solving in an artificial economy. Neural Computation, 12(12):2743–2775, 2000.

  4. [4]

    Joseph Bejjani, Chase Van Amburg, Chengrui Wang, Chloe Huangyuan Su, Sarah M. Pratt, Yasin Mazloumi, Naeem Khoshnevis, Sham M. Kakade, Kianté Brantley, and Aaron Walsman. The emergence of complex behavior in large-scale ecological environments. arXiv preprint arXiv:2510.18221, 2025.

  5. [5]

    Antoine Bigeard, Langston Nashold, Rayan Krishnan, and Shirley Wu. Finance agent benchmark: Benchmarking llms on real-world financial research tasks. arXiv preprint arXiv:2508.00828, 2025.

  6. [6]

    Mert Cemri, Melissa Z. Pan, Shuyi Yang, Lakshya A. Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, et al. Why do multi-agent llm systems fail? arXiv preprint arXiv:2503.13657, 2025.

  7. [7]

    Yixing Chen, Yiding Wang, Siqi Zhu, Haofei Yu, Tao Feng, Muhan Zhang, Mostofa Patwary, and Jiaxuan You. Multi-agent evolve: Llm self-improve through co-evolution. arXiv preprint arXiv:2510.23595, 2025.

  8. [8]

    Audrey Cheng, Shu Liu, Melissa Pan, Zhifei Li, Bowen Wang, Alex Krentsel, Tian Xia, Mert Cemri, Jongseok Park, Shuo Yang, et al. Barbarians at the gate: How ai is upending systems research. arXiv preprint arXiv:2510.06189, 2025.

  9. [9]

    X. Dai, Y. Zhu, et al. Geoevolver: A self-evolving multi-agent system for earth observation. arXiv preprint arXiv:2602.02559, 2026.

  10. [10]

    M. Bernardine Dias, Robert Zlot, Nidhi Kalra, and Anthony Stentz. Market-based multirobot coordination: A survey and analysis. Proceedings of the IEEE, 94(7):1257–1270, 2006.

  11. [11]

    Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. In Forty-first International Conference on Machine Learning, 2024.

  12. [12]

    Paul Duetting, Vahab Mirrokni, Renato Paes Leme, Haifeng Xu, and Song Zuo. Mechanism design for large language models. In Proceedings of the ACM Web Conference 2024, pages 144–155, 2024.

  13. [13]

    Hasan Genc, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, Pranav Prakash, Jerry Zhao, Daniel Grubb, Harrison Liew, Howard Mao, et al. Gemmini: Enabling systematic deep-learning architecture evaluation via full-stack integration. In 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 769–774. IEEE, 2021.

  14. [14]

    Friedrich A. Hayek. The use of knowledge in society. The American Economic Review, 35(4):519–530, 1945.

  15. [15]

    Friedrich A. Hayek. Law, Legislation and Liberty. University of Chicago Press, 1973.

  16. [16]

    Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the MATH dataset. arXiv preprint arXiv:2103.03874, 2021.

  17. [17]

    John H. Holland. Properties of the bucket brigade algorithm. In Proceedings of the 1st International Conference on Genetic Algorithms, 1985.

  18. [18]

    Charles Hong, Qijing Huang, Grace Dinh, Mahesh Subedar, and Yakun Sophia Shao. Dosa: Differentiable model-based one-loop search for dnn accelerators. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pages 209–224, 2023.

  19. [19]

    Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al. Metagpt: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, 2023.

  20. [20]

    Shariq Iqbal, Robby Costales, and Fei Sha. Alma: Hierarchical learning for composite multi-agent tasks. Advances in Neural Information Processing Systems, 35:7155–7166, 2022.

  21. [21]

    Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, et al. Openvla: An open-source vision-language-action model. arXiv preprint arXiv:2406.09246, 2024.

  22. [22]

    Yubin Kim, Ken Gu, Chanwoo Park, Chunjong Park, Samuel Schmidgall, A. Ali Heydari, Yao Yan, Zhihan Zhang, Yuchen Zhuang, Yun Liu, et al. Towards a science of scaling agent systems. arXiv preprint arXiv:2512.08296, 2025.

  23. [23]

    Shalev Lifshitz, Sheila A. McIlraith, and Yilun Du. Multi-agent verification: Scaling test-time compute with multiple verifiers. arXiv preprint arXiv:2502.20379, 2025.

  24. [24]

    Shuo Liu, Tianle Chen, and Christopher Amato. Improved multi-agent collaboration with multi-turn reinforcement learning. In First Workshop on Multi-Turn Interactions in Large Language Models.

  25. [25]

    Carol Xuan Long, David Simchi-Levi, Feng Zhu, Huangyuan Su, Andre P. Calmon, and Flavio P. Calmon. Reliability and effectiveness of autonomous ai agents in supply chain management. arXiv preprint arXiv:2605.17036, 2026.

  26. [26]

    Marvin Minsky. The Society of Mind. Simon and Schuster, 1988.

  27. [27]

    Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, et al. Alphaevolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025.

  28. [28]

    Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, and John Vian. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In International Conference on Machine Learning, pages 2681–2690. PMLR, 2017.

  29. [29]

    Chanwoo Park, Seungju Han, Xingzhi Guo, Asuman E. Ozdaglar, Kaiqing Zhang, and Joo-Kyung Kim. Maporl: Multi-agent post-co-training for collaborative large language models with reinforcement learning. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 30215–30248, 2025.

  30. [30]

    Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model connected with massive apis. Advances in Neural Information Processing Systems, 37:126544–126565, 2024.

  31. [31]

    Shvetank Prakash, Andrew Cheng, Arya Tschand, Mark Mazumder, Varun Gohil, Jeffrey Ma, Jason Yik, Zishen Wan, Jessica Quaye, Elisavet Lydia Alvanaki, et al. Quarch: A benchmark for evaluating llm reasoning in computer architecture. arXiv preprint arXiv:2510.22087, 2025.

  32. [32]

    Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, et al. Chatdev: Communicative agents for software development. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15174–15186, 2024.

  33. [33]

    Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Yihong Tang, Shao Yong Ong, Fenglu Hong, Kaichen Zhou, Chonghe Jiang, Minwei Kong, et al. Coral: Towards autonomous multi-agent evolution for open-ended discovery. arXiv preprint arXiv:2604.01658, 2026.

  34. [34]

    Jürgen Schmidhuber. The neural bucket brigade: A local learning algorithm for dynamic feedforward and recurrent networks. Connection Science, 1(4):403–412, 1989.

  35. [35]

    Asankhaya Sharma. Openevolve: an open-source evolutionary coding agent, 2025. URL https://github.com/algorithmicsuperintelligence/openevolve.

  36. [36]

    Vighnesh Subramaniam, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Shuang Li, and Igor Mordatch. Multiagent finetuning: Self improvement with diverse reasoning chains. arXiv preprint arXiv:2501.05707, 2025.

  37. [37]

    Abhimanyu Pallavi Sudhir and Long Tran-Thanh. Market-based architectures in rl and beyond. Accepted to AAMAS 2025 Blue Sky Track, abs/2503.05828, 2025.

  38. [38]

    Hanbo Sun, Chenyu Wang, Zhenhua Zhu, Xuefei Ning, Guohao Dai, Huazhong Yang, and Yu Wang. Gibbon: Efficient co-exploration of nn model and processing-in-memory architecture. In 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 867–872. IEEE, 2022.

  39. [39]

    Weiwei Sun, Miao Lu, Zhan Ling, Kang Liu, Xuesong Yao, Yiming Yang, and Jiecao Chen. Scaling long-horizon llm agent via context-folding. arXiv preprint arXiv:2510.11967, 2025.

  40. [40]

    Arya Tschand, Chenyu Wang, Zishen Wan, Andrew Cheng, Ioana Cristescu, Kevin He, Howard Huang, Alexander Ingare, Akseli Kangaslahti, Sara Kangaslahti, Theo Lebryk, Hongjin Lin, Jeffrey Jian Ma, Alexandru Meterez, Clara Mohri, Depen Morwani, Sunny Qin, Roy Rinberg, Paula Rodriguez-Diaz, Alyssa Mia Taliotis, Pernille Undrum Fathi, Rosie Zhao, Todd Zhou, and ... Genai for systems: Recurring challenges and design principles from software to silicon, 2026.

  41. [41]

    Chenyu Wang, Zishen Wan, Hao Kang, Emma Chen, Zhiqiang Xie, Tushar Krishna, Vijay Janapa Reddi, and Yilun Du. Slm-mux: Orchestrating small language models for reasoning. arXiv preprint arXiv:2510.05077, 2025.

  42. [42]

    Jian Sheng Wang. Aesp: A human-sovereign economic protocol for ai agents with privacy-preserving settlement. arXiv preprint arXiv:2603.00318, 2026.

  43. [43]

    Miles Wang, Robi Lin, Kat Hu, Joy Jiao, Neil Chowdhury, Ethan Chang, and Tejal Patwardhan. Frontierscience: Evaluating ai’s ability to perform expert-level scientific tasks. arXiv preprint arXiv:2601.21165, 2026.

  44. [44]

    Yiping Wang, Shao-Rong Su, Zhiyuan Zeng, Eva Xu, Liliang Ren, Xinyu Yang, Zeyi Huang, Xuehai He, Luyao Ma, Baolin Peng, et al. Thetaevolve: Test-time learning on open problems. arXiv preprint arXiv:2511.23473, 2025.

  45. [45]

    Zhaotian Weng, Antonis Antoniades, Deepak Nathani, Zhen Zhang, Xiao Pu, and Xin Eric Wang. Group-evolving agents: Open-ended self-improvement via experience sharing. arXiv preprint arXiv:2602.04837, 2026.

  46. [46]

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversations. In First Conference on Language Modeling, 2024.

  47. [47]

    Minghui Xu. The agent economy: A blockchain-based foundation for autonomous ai agents.

  48. [48]

    Xiangyuan Xue, Yifan Zhou, Guibin Zhang, Zaibin Zhang, Yijiang Li, Chen Zhang, Zhenfei Yin, Philip Torr, Wanli Ouyang, and Lei Bai. Comas: Co-evolving multi-agent systems via interaction rewards. arXiv preprint arXiv:2510.08529, 2025.

  49. [49]

    Yingxuan Yang, Huacan Chai, Shuai Shao, Yuanyi Song, Siyuan Qi, Renting Rui, and Weinan Zhang. Agentnet: Decentralized evolutionary coordination for llm-based multi-agent systems. URL https://arxiv.org/abs/2504.00587.

  51. [51]

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, 2022.

  52. [52]

    Seonghyeon Ye, Yunhao Ge, Kaiyuan Zheng, Shenyuan Gao, Sihyun Yu, George Kurian, Suneel Indupuru, You Liang Tan, Chuning Zhu, Jiannan Xiang, et al. World action models are zero-shot policies. arXiv preprint arXiv:2602.15922, 2026.

  53. [53]

    Mert Yuksekgonul, Daniel Koceja, Xinhao Li, Federico Bianchi, Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou, Carlos Guestrin, and Yu Sun. Learning to discover at test time. arXiv preprint, 2026.

  54. [54]

    Miao Zhang, Junsik Kim, Siyuan Xiang, Jian Gao, and Cheng Cao. Dynamic role assignment for multi-agent debate. arXiv preprint arXiv:2601.17152, 2026.

  55. [55]

    Heng Zhou, Hejia Geng, Xiangyuan Xue, Li Kang, Yiran Qin, Zhiyong Wang, Zhenfei Yin, and Lei Bai. Reso: A reward-driven self-organizing llm-based multi-agent system for reasoning tasks. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 15990–16009, 2025.

  56. [56]

    Andy Zhu and Yingjun Du. A role-aware multi-agent framework for financial education question answering with llms. arXiv preprint arXiv:2509.09727, 2025.