Internal vs. External: Comparing Deliberation and Evolution for Multi-Agent Constitutional Design
Pith reviewed 2026-05-12 02:42 UTC · model grok-4.3
The pith
Evolution outperforms deliberation for multi-agent constitutions in collective-action settings but not in bilateral trading.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In three social environments, external evolution significantly outperforms internal deliberation in collective-action settings (p < 0.01), while neither method improves outcomes in bilateral trading. Evolution reliably discovers punishment as a cooperation-sustaining mechanism, which no deliberation run ever proposes. When incentives are altered by reducing the pool multiplier to 0.75, evolution's advantage inverts and it forces value-destroying cooperation.
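For context, the value-destruction claim follows directly from the canonical linear public-goods payoff. A minimal sketch, assuming the paper uses this standard form (the endowment e, contribution c_i, and group size n are assumptions not stated in the abstract):

```latex
% Canonical linear public-goods payoff (standard form; the paper's exact
% parameterization may differ).
\[
\pi_i = e - c_i + \frac{m}{n}\sum_{j=1}^{n} c_j ,
\qquad
\sum_{i=1}^{n}\pi_i = n e + (m-1)\sum_{j=1}^{n} c_j .
\]
% Each contributed unit changes total welfare by (m - 1): for m > 1
% contribution creates value, while at m = 0.75 it destroys 0.25 units
% per token, so a constitution that enforces contribution becomes harmful.
```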
What carries the argument
The controlled comparison of internal deliberation (agent self-governance) and external evolution (optimization) across simulation environments for constitutional design.
Load-bearing premise
The chosen simulation environments, agent implementations, and operationalizations of deliberation and evolution faithfully represent the real trade-offs in multi-agent constitutional design.
What would settle it
If deliberation runs in the public goods game were to propose punishment mechanisms, or if evolution failed to outperform deliberation in collective-action settings under the same conditions, the central findings would be falsified.
read the original abstract
Multi-agent AI systems need behavioral constitutions, but it is unresolved whether such rules should emerge internally through agent self-governance or be discovered externally through optimization. We present the first controlled comparison of internal deliberation and external evolution across three social environments: a coordination grid-world, an iterated public goods game, and a bilateral trading market. Across 180 simulation runs, evolution significantly outperforms deliberation in collective-action settings (p < 0.01), while neither method improves outcomes in bilateral trading. A multiplier ablation reveals that evolution's advantage inverts when incentives shift: at pool multiplier (m = 0.75) the evolved constitution forces value-destroying cooperation and becomes the worst-performing method. Notably, no deliberation run across thirty trials ever proposed punishment -- the canonical cooperation-sustaining mechanism evolution reliably discovers -- suggesting external optimization wins on peaks while internal self-governance trades peaks for structural responsiveness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a controlled simulation study comparing internal deliberation and external evolutionary optimization for designing behavioral constitutions in multi-agent AI systems. Across three environments (coordination grid-world, iterated public goods game, bilateral trading market) and 180 runs, evolution significantly outperforms deliberation in collective-action settings (p < 0.01) while neither improves bilateral trading outcomes. An ablation on pool multiplier m reveals that evolution's advantage inverts at m=0.75, producing value-destroying cooperation. The paper emphasizes that no deliberation trial proposed punishment—the mechanism reliably discovered by evolution—suggesting external methods reach performance peaks while internal governance prioritizes responsiveness.
Significance. If the simulation results and operationalizations hold, the work supplies empirical evidence on trade-offs in multi-agent constitutional design, showing that external optimization can discover cooperation-sustaining rules like punishment but risks brittleness under incentive shifts, while deliberation avoids such outcomes yet misses canonical mechanisms. The multi-environment design, statistical reporting, and targeted ablation provide a replicable framework for testing internal vs. external approaches, contributing to AI alignment and collective intelligence research with falsifiable simulation-based predictions.
major comments (2)
- [§4] §4 (Deliberation protocol): The claim that zero of thirty deliberation trials proposed punishment is load-bearing for the conclusion that internal governance trades peaks for responsiveness. However, the description of the deliberation action space, constitutional proposal rules, and whether sanctioning mechanisms were feasible outputs is insufficient to rule out that this zero rate is an artifact of the chosen operationalization rather than a general property of deliberation (see the illustrative action-space sketch after this list).
- [§5.2] §5.2 (Multiplier ablation): The inversion at m=0.75 is presented as evidence that evolved constitutions can force value-destroying cooperation, but the section does not clarify whether this ablation was pre-specified or post-hoc, nor does it report full parameter settings, baseline comparisons, or variance across the thirty runs per condition needed to assess robustness.
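To make the first major comment concrete, here is a minimal, purely hypothetical sketch of what an explicitly enumerated proposal action space could look like; the clause names and structure are illustrative assumptions, not the paper's actual protocol.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ClauseType(Enum):
    """Hypothetical categories a deliberating agent could propose.

    Whether the paper's protocol actually exposes a sanctioning clause like
    PUNISH_DEFECTORS is exactly what the referee asks the authors to document.
    """
    CONTRIBUTION_FLOOR = auto()   # "every agent must contribute at least x"
    COMMUNICATION_RULE = auto()   # "agents must announce intended actions"
    ROTATION_SCHEME = auto()      # "agents take turns bearing costs"
    PUNISH_DEFECTORS = auto()     # "agents below the floor lose y payoff"


@dataclass
class ConstitutionalProposal:
    clause: ClauseType
    parameters: dict              # e.g. {"floor": 0.5} or {"penalty": 2.0}
    proposer_id: int


def is_sanction(proposal: ConstitutionalProposal) -> bool:
    """The zero-punishment finding is only meaningful if proposals of this
    kind were valid outputs of the deliberation protocol."""
    return proposal.clause is ClauseType.PUNISH_DEFECTORS
```

If the revised §4 enumerates the space in roughly this style, readers can verify directly whether sanctioning clauses were reachable but never chosen.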
minor comments (3)
- [Abstract and §5] The abstract and results sections report 180 runs and p < 0.01 but omit error bars, confidence intervals, or raw data availability; supplementary material with per-run outcomes would strengthen reproducibility (see the reporting sketch after this list).
- [§3.2] Notation for the pool multiplier m and its role in the public goods game is introduced without an explicit equation linking it to agent payoffs; adding this in §3.2 would improve clarity.
- [§5.3] The bilateral trading environment results are summarized as 'neither method improves outcomes' without a table or figure showing the no-constitution baseline; this comparison should be explicit.
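On the first minor comment, a minimal sketch of the kind of per-condition reporting the supplementary material could include, assuming per-run outcome scores are available as plain lists; the sample numbers are placeholders, not the paper's data.

```python
import statistics


def summarize(per_run_scores: list[float], z: float = 1.96) -> dict:
    """Mean, standard deviation, and a normal-approximation 95% CI
    for one condition's per-run outcomes (e.g. 30 runs)."""
    n = len(per_run_scores)
    mean = statistics.fmean(per_run_scores)
    sd = statistics.stdev(per_run_scores)
    half_width = z * sd / n ** 0.5
    return {"n": n, "mean": mean, "sd": sd,
            "ci95": (mean - half_width, mean + half_width)}


# Placeholder values only; the actual per-run outcomes are what the
# supplementary material would need to publish.
print(summarize([0.61, 0.58, 0.65, 0.60, 0.63]))
```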
Simulated Author's Rebuttal
We thank the referee for the constructive and positive review, which highlights the work's potential contribution to multi-agent constitutional design. We address each major comment point by point below, agreeing where additional clarity is needed and outlining specific revisions.
read point-by-point responses
- Referee: [§4] §4 (Deliberation protocol): The claim that zero of thirty deliberation trials proposed punishment is load-bearing for the conclusion that internal governance trades peaks for responsiveness. However, the description of the deliberation action space, constitutional proposal rules, and whether sanctioning mechanisms were feasible outputs is insufficient to rule out that this zero rate is an artifact of the chosen operationalization rather than a general property of deliberation.
Authors: We agree that the load-bearing nature of this result requires stronger documentation of the deliberation protocol. The current manuscript describes the high-level deliberation process but does not exhaustively enumerate the action space or explicitly confirm that sanctioning mechanisms (including punishment) were valid proposal outputs. In the revised manuscript, we will expand §4 with: (i) a complete specification of the constitutional proposal action space, (ii) explicit confirmation and examples showing that punishment and other sanctions were feasible outputs, and (iii) sample deliberation traces illustrating that agents could and did propose other mechanisms but never punishment across the 30 trials. These additions will allow readers to evaluate whether the zero rate reflects the internal deliberation dynamics rather than an operationalization artifact. We will also add a limitations paragraph noting that the finding is specific to the tested environments and protocol. revision: yes
- Referee: [§5.2] §5.2 (Multiplier ablation): The inversion at m=0.75 is presented as evidence that evolved constitutions can force value-destroying cooperation, but the section does not clarify whether this ablation was pre-specified or post-hoc, nor does it report full parameter settings, baseline comparisons, or variance across the thirty runs per condition needed to assess robustness.
Authors: We acknowledge that §5.2 would benefit from greater transparency on experimental design and reporting. The multiplier ablation was pre-specified as part of the robustness analysis in the original experimental plan (detailed in the methods), but this is not stated explicitly in the results section, and variance statistics plus full parameter tables are omitted. In the revision we will: (i) state that the ablation was pre-specified, (ii) provide the complete parameter settings (including the full range of m values tested and all other fixed parameters), (iii) add direct baseline comparisons to the primary m=1 condition, and (iv) report variance measures (standard deviations, confidence intervals, and per-condition statistical tests) across the 30 runs. These changes will enable proper assessment of the inversion's robustness. revision: yes
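To illustrate the mechanism behind the inversion discussed above, a quick numerical check using the canonical linear public-goods payoff; the group size, endowment, and the m = 1.5 comparison value are assumptions chosen for illustration, not the paper's parameters.

```python
def total_welfare(contributions: list[float], endowment: float, m: float) -> float:
    """Sum of payoffs under the canonical linear public-goods payoff:
    pi_i = endowment - c_i + m * sum(c) / n."""
    n = len(contributions)
    pot_share = m * sum(contributions) / n
    return sum(endowment - c + pot_share for c in contributions)


# Hypothetical parameters: 4 agents, endowment 10.
full_coop = [10.0] * 4
defect = [0.0] * 4

for m in (1.5, 0.75):
    gain = total_welfare(full_coop, 10.0, m) - total_welfare(defect, 10.0, m)
    print(f"m={m}: forced full cooperation changes total welfare by {gain:+.1f}")
# m=1.5: +20.0 (cooperation creates value); m=0.75: -10.0 (cooperation destroys
# value), so an evolved constitution that enforces contribution becomes the
# worst-performing option.
```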
Circularity Check
No circularity: empirical simulation results independent of inputs
full rationale
The paper reports outcomes from 180 controlled simulation runs across three environments, with statistical comparisons (e.g., p < 0.01) and an ablation on the pool multiplier. No equations, fitted parameters, or derivations are presented that reduce the central claims to self-referential definitions or prior self-citations. The key observation (zero punishment proposals in deliberation trials) is stated as a direct empirical count from the runs, not constructed by re-labeling inputs. The claims rest on the paper's own explicit simulation protocols rather than on external benchmarks, so the evidence is self-contained.
Axiom & Free-Parameter Ledger
free parameters (1)
- pool multiplier m