arxiv: 2605.29062 · v1 · pith:F3X3W5CZnew · submitted 2026-05-27 · 💻 cs.CL

Bosses, Kings, and the Commons: Cooperation Under Power Asymmetry in LLM Societies

Pith reviewed 2026-06-29 12:29 UTC · model grok-4.3

classification 💻 cs.CL
keywords LLM agents · power asymmetry · commons management · cooperation breakdown · multi-agent simulation · resource sustainability · generative agents · governance simulation

The pith

Power asymmetry in LLM agent societies degrades the survival rate of shared resources by up to 87.3% relative to symmetric settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests how large language models act as agents managing shared resources when one agent holds extra control over extraction and outcomes. It builds a simulation framework that places a boss or king agent alongside symmetric worker or peasant agents who all draw from the same pool. Results across eleven models show sharp drops in cooperation, with survival rates falling as much as 87.3 percent compared to equal-power versions. This setup matters because real commons like fisheries and forests often operate under similar imbalances, and LLMs are increasingly used in synthetic governance tests. The work highlights the need to account for power differences when evaluating AI agent groups.

Core claim

Across eleven state-of-the-art models, introducing asymmetric power leads to severe breakdowns in cooperation and sustainability, with up to an 87.3% degradation in survival rate relative to symmetric settings. The simulation incorporates an agent with asymmetric power (boss or king) into a society of symmetric agents (workers or peasants), where all agents extract from a shared resource, collectively determining its sustainability over time.
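One plausible reading of "degradation in survival rate relative to symmetric settings" is a relative drop. The formula below is a sketch under that assumption; the symbols $S_{\text{sym}}$, $S_{\text{asym}}$, and $D$ are introduced here for illustration and do not come from the paper, which may define the quantity differently.

```latex
% Assumed definition: relative drop in survival rate.
% S_sym  = survival rate in the symmetric condition
% S_asym = survival rate in the asymmetric condition
D = \frac{S_{\text{sym}} - S_{\text{asym}}}{S_{\text{sym}}} \times 100\%

% Hypothetical numbers, not taken from the paper:
% S_sym = 0.80, S_asym = 0.20  =>  D = (0.80 - 0.20) / 0.80 \times 100\% = 75\%
```

Under this reading, a given percentage can come from very different absolute survival rates, which is one reason the per-condition rates matter alongside the headline figure.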

What carries the argument

Sovereignty over the Commons Simulation (SovSim), a generative multi-agent simulation framework that adds one agent with extra control over resource extraction and collective outcomes to a group of symmetric agents.
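The loop described here can be sketched as a toy common-pool game. This is a minimal illustration, not the SovSim implementation: the 12 rounds, the $12 collapse threshold, the roughly $120 capacity, and extraction in multiples of 3 are taken from the figure captions, while the regeneration rule (`REGEN_FACTOR`), the strict "below threshold" collapse test, and the two fixed policies (`modest`, `greedy_half`) are assumptions standing in for LLM agents.

```python
ROUNDS = 12              # rounds per game (Figure 3 caption)
COLLAPSE_THRESHOLD = 12  # pool value marking collapse (Figure 3 caption)
CAPACITY = 120           # approximate pool capacity (Figure 4 caption)
STEP = 3                 # extractions are multiples of 3 (Figure 2 caption)
REGEN_FACTOR = 2         # ASSUMPTION: pool doubles each round, capped at capacity


def modest(pool, n_agents):
    """Hypothetical symmetric policy: always extract two units of STEP."""
    return 2 * STEP


def greedy_half(pool, others):
    """Hypothetical dominant policy: after seeing the others' choices,
    take half of what would remain, rounded down to a multiple of STEP."""
    remaining = max(0, pool - sum(others))
    return (remaining // 2 // STEP) * STEP


def simulate(n_symmetric, symmetric_policy, dominant_policy=None):
    """Return the number of rounds completed before the pool collapses."""
    pool = CAPACITY
    for round_idx in range(ROUNDS):
        extractions = [symmetric_policy(pool, n_symmetric)
                       for _ in range(n_symmetric)]
        if dominant_policy is not None:
            # The dominant agent moves after observing the symmetric agents.
            extractions.append(dominant_policy(pool, extractions))
        pool -= min(pool, sum(extractions))
        if pool < COLLAPSE_THRESHOLD:  # ASSUMPTION: strict inequality
            return round_idx
        pool = min(CAPACITY, pool * REGEN_FACTOR)
    return ROUNDS


print(simulate(4, modest))               # 12: the pool survives every round
print(simulate(4, modest, greedy_half))  # 4: the pool collapses in round index 4
```

With these hand-written policies, adding a single greedy dominant agent is enough to collapse a pool that four modest agents sustain indefinitely. In the paper the policies are LLM decisions, so this only shows the mechanics of the comparison.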

If this is right

  • LLM societies need symmetric power distributions to sustain cooperative norms around shared resources.
  • Asymmetric power structures produce faster resource depletion and lower group survival in multi-agent simulations.
  • Standard evaluations of LLM agents must add power asymmetry tests to measure realistic governance performance.
  • Ostrom-style self-governance findings may not transfer to LLM groups once one agent gains disproportionate control.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • AI systems placed in real decision roles could encounter parallel cooperation failures if power concentrates in one agent.
  • Varying how power asymmetry is implemented could reveal the level of control at which cooperation breaks down.
  • Adding human overrides or mixed agent types could reduce the observed sustainability losses in follow-up tests.

Load-bearing premise

The paper assumes that giving a boss or king agent extra control is a faithful model of real-world power imbalances, so that LLM behavior under this rule reflects how such imbalances would actually play out.

What would settle it

Running the same simulations but replacing the boss or king control with a different mechanism, such as random outcome influence or voting weight, and checking whether the 87.3 percent degradation still appears.

Figures

Figures reproduced from arXiv: 2605.29062 by Abhilekh Borah.

Figure 1: SOVSIM is grounded in the study of asymmetric power in social dilemmas, motivated by the “bosses and kings” experimental paradigm (Cox et al., 2011), which shows how differences in authority among agents can significantly alter efficiency and collective outcomes in common-pool resource settings. As shown in the figure, agents with equal power first decide how much to extract from a shared resource (commons…
Figure 2: Overview of the SOVSIM workflow for common-pool resource games. Given a shared pool (center) with an initial value, agents interact over repeated rounds (left), where symmetric agents (peasants or workers) independently decide how much to extract from the pool in multiples of 3. In asymmetric game conditions such as KCPR and BCPR (see Section 2.2), a dominant agent (boss or king) observes others’ extractio…
Figure 3: Dynamics of the shared resource pool across four game conditions. Each plot shows the evolution of the pool value over 12 rounds across multiple LLM agents. The dashed red line denotes the collapse threshold ($12). Across conditions, increasing power asymmetry (BCPR, KCPR and KCPR-M) leads to earlier and more frequent resource collapse, while symmetric agents in CPR sustain the pool near capacity. Shaded r…
Figure 4: Agent-level resource extraction trajectories and pool dynamics in the King Common Pool Resource (KCPR) game for (a) GPT-4o and (b) o3. Out of 5 simulation seeds, we show runs where the system survives until the final round (the two best-performing models). (a) GPT-4o: Peasants extract consistently at moderate levels (similar values each round), keeping the pool near capacity (∼ $120). (b) o3: Peasant extra…
Figure 5: Task reasoning accuracy vs. survival time across four reasoning tasks: (a) sustainable extraction choice (KCPR), (b) misrepresentation detection (KCPR-M), (c) pool regeneration computation (KCPR), and (d) multi-round payoff maximization (KCPR). Each point corresponds to a model, plotting task accuracy (x-axis) against achieved survival time (y-axis). Higher reasoning accuracy does not consistently translat…
Figure 6: Role-label ablation results on KCPR comparing Role-labelled (King and Peasant) settings against Neutral…
Figure 7: Social Value Orientation (SVO) classification…
Figure 8: Social Value Orientation (SVO) classification…
Figure 9: Agent-level extraction dynamics in BCPR, showing dominant agents consistently extracting more than…
Figure 10: Agent-level extraction dynamics in KCPR, showing dominant agents extracting heavily in early rounds:…
Figure 11: Agent-level extraction dynamics in KCPR-M, showing dominant agents extracting heavily in the earliest…
Figure 12: Agent-level resource extraction trajectories and pool dynamics in the Boss Common Pool Resource…
Figure 13: Agent-level resource extraction trajectories and pool dynamics in the King Common Pool Resource with…
Figure 14: Task reasoning accuracy vs. survival time across two reasoning tasks: (a) sustainable extraction choice…
Figure 15: Task reasoning accuracy vs. survival time across three reasoning tasks: (a) sustainable extraction choice…
Figure 16: Task reasoning accuracy vs. survival time across four reasoning tasks: (a) sustainable extraction choice…
Original abstract

Communities can sustainably manage shared resources (commons) through self-governance and cooperative norms, a central finding of Ostrom's theory of self-governance. However, real-world commons (e.g., fisheries, forests, and irrigation systems) are often governed under asymmetric power structures, where certain individuals or institutions possess disproportionate control over resource extraction and collective outcomes. As Large Language Models (LLMs) are increasingly explored as agents in synthetic governance simulations, understanding how LLM societies behave under asymmetric power structures is becoming increasingly important, yet existing evaluations largely ignore such asymmetries. We introduce Sovereignty over the Commons Simulation (SovSim), a generative multi-agent simulation framework that incorporates an agent with asymmetric power (boss or king) into a society of symmetric agents (workers or peasants), where all agents extract from a shared resource, collectively determining its sustainability over time. Across eleven state-of-the-art models, we find that introducing asymmetric power leads to severe breakdowns in cooperation and sustainability, with up to an 87.3% degradation in survival rate relative to symmetric settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SovSim, a generative multi-agent simulation in which LLM agents extract from a shared resource, comparing symmetric settings to asymmetric ones that include a 'boss' or 'king' agent with extra control over extraction and outcomes. Across eleven state-of-the-art models, it reports that asymmetric power produces severe breakdowns in cooperation, with up to an 87.3% degradation in survival rate relative to the symmetric baseline.

Significance. If the central comparison is valid, the result supplies concrete evidence that power asymmetries can destabilize cooperation even in LLM societies, extending Ostrom-style commons research into synthetic governance and flagging a practical risk for multi-agent LLM deployments.

major comments (2)
  1. [Abstract / Methods] Abstract and Methods: the headline 87.3% degradation is reported without any description of the number of runs per condition, variance across runs, statistical tests, or controls for prompt sensitivity; without these the quantitative claim cannot be evaluated.
  2. [Simulation Design] Simulation Design (SovSim): this concern is load-bearing: the manuscript must explicitly confirm that symmetric agents retain identical action spaces, state observations, prompt templates, and extraction mechanics in both conditions; any change to the base agents' decision environment would confound attribution of the survival-rate drop to power asymmetry alone.
minor comments (2)
  1. Define 'survival rate' precisely and state how collective outcomes are computed when the asymmetric agent intervenes.
  2. Add a table or figure showing per-model degradation percentages with error bars rather than a single aggregate figure.
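The per-model reporting requested above could be produced along the following lines. This is a sketch only: it assumes "survival rate" means the fraction of seeded runs in which the pool lasts to the final round (the report itself asks for a precise definition), it uses a simple binomial standard error as the error bar, and the model names and outcomes are hypothetical placeholders rather than results from the paper. Five seeds per condition mirrors the count mentioned in the Figure 4 caption.

```python
from math import sqrt
from statistics import mean


def rate_and_se(survived_flags):
    """Survival rate and binomial standard error over seeded runs."""
    n = len(survived_flags)
    p = mean([1.0 if flag else 0.0 for flag in survived_flags])
    return p, sqrt(p * (1 - p) / n)


def degradation_table(runs):
    """runs maps model -> {"symmetric": [bool, ...], "asymmetric": [bool, ...]}."""
    table = {}
    for model, cond in runs.items():
        p_sym, se_sym = rate_and_se(cond["symmetric"])
        p_asym, se_asym = rate_and_se(cond["asymmetric"])
        # Relative drop is undefined when nothing survives in the baseline.
        drop = 100 * (p_sym - p_asym) / p_sym if p_sym > 0 else None
        table[model] = {
            "sym_rate": p_sym, "sym_se": se_sym,
            "asym_rate": p_asym, "asym_se": se_asym,
            "degradation_pct": drop,
        }
    return table


# Hypothetical outcomes for two placeholder models, five seeds each.
runs = {
    "model_a": {"symmetric": [True] * 5,
                "asymmetric": [True, False, False, False, False]},
    "model_b": {"symmetric": [True, True, True, True, False],
                "asymmetric": [False] * 5},
}
for model, row in degradation_table(runs).items():
    print(model, row)
```

With only five runs per cell the standard errors are wide, which illustrates why a single aggregate percentage is hard to interpret without per-model spread.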

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments, which highlight important aspects of experimental rigor and design clarity. We address each point below.

  1. Referee: [Abstract / Methods] Abstract and Methods: the headline 87.3% degradation is reported without any description of the number of runs per condition, variance across runs, statistical tests, or controls for prompt sensitivity; without these the quantitative claim cannot be evaluated.

    Authors: We agree that these details are required to allow proper evaluation of the quantitative results. The current manuscript reports the 87.3% figure in the abstract without accompanying statistical information in the Methods section. In the revised version we will expand the Methods section to specify the number of independent runs per condition, report means together with variance or standard deviation across runs, describe the statistical tests used to compare conditions, and document the controls applied for prompt sensitivity (identical base prompts with only the minimal modifications required for the asymmetric agent). revision: yes

  2. Referee: [Simulation Design] Simulation Design (SovSim): this concern is load-bearing: the manuscript must explicitly confirm that symmetric agents retain identical action spaces, state observations, prompt templates, and extraction mechanics in both conditions; any change to the base agents' decision environment would confound attribution of the survival-rate drop to power asymmetry alone.

    Authors: This concern is valid and central to the causal claim. In SovSim the worker/peasant agents are given identical action spaces, state observations, prompt templates, and extraction mechanics in the symmetric and asymmetric conditions; the only difference is the additional capabilities granted to the boss or king agent. We will add an explicit paragraph in the revised Methods section stating this equivalence to remove any ambiguity. revision: yes
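The first response promises statistical tests for comparing conditions without naming them. When several conditions or models are compared at once, a multiple-comparison correction is a common companion; the reference list below includes Holm (1979), which suggests, but does not confirm, that the paper uses the Holm step-down procedure. The sketch below implements that standard procedure; the function name `holm_reject` and the p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans, aligned with p_values, marking which
    hypotheses are rejected while controlling the family-wise error rate.
    """
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is tested at alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values for four pairwise comparisons.
print(holm_reject([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

Here the two smallest p-values pass their thresholds (0.0125 and about 0.0167), the third (0.03) exceeds 0.025, and testing stops there.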

Circularity Check

0 steps flagged

No circularity: empirical simulation results measured directly from runs

The paper reports measured outcomes (survival rates, cooperation breakdowns) from running LLM agents in SovSim under symmetric vs. asymmetric power conditions. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains are present. The 87.3% degradation figure is a direct empirical comparison within the same simulation framework, not a reduction to inputs by construction. Self-citations (if any) are not load-bearing for the central claim. This matches the default expectation of an honest non-finding for simulation-based work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that the SovSim power-asymmetry rule set is a valid proxy for real governance asymmetries and that LLM next-token behavior under that rule set generalizes to the intended phenomenon. No free parameters, axioms, or invented entities are described in the abstract.

pith-pipeline@v0.9.1-grok · 5712 in / 1271 out tokens · 18251 ms · 2026-06-29T12:29:41.656290+00:00 · methodology

Reference graph

Works this paper leans on

7 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1]

    Aanisha Bhattacharyya, Abhilekh Borah, Yaman Kumar Singla, Rajiv Ratn Shah, Changyou Chen, and Balaji Krishnamurthy

    Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3):337–351. Aanisha Bhattacharyya, Abhilekh Borah, Yaman Kumar Singla, Rajiv Ratn Shah, Changyou Chen, and Balaji Krishnamurthy. 2026. Social agents: Collective intelligence improves LLM predictions. In Proceedings of the International Conference on Learning Repr...

  2. [2]

    Revealed altruism. Econometrica, 76(1):31–69. James C. Cox, Elinor Ostrom, and James M. Walker

  3. [3]

    Working Paper 2011-06, Experimental Economics Center, Georgia State University

    Bosses and kings: Asymmetric power in paired common pool and public good games. Working Paper 2011-06, Experimental Economics Center, Georgia State University. DeepSeek-AI. 2025. Deepseek-v3.2: Sparse mixture-of-experts scaling and training. https://www.deepseek.com/. Technical report. Jinhao Duan, Renming Zhang, James Diffenderfer, Bhavya Kailkhura, ...

  4. [4]

    The Llama 3 Herd of Models

    Association for Computing Machinery. Gemma Team, Google DeepMind. 2025. Gemma 3: An open language model family. https://ai.google.dev/gemma. Technical report. Google DeepMind. 2026. Gemini 3.1: Multimodal frontier models. https://deepmind.google/technologies/gemini/. Model release. Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Ab...

  5. [5]

    GPT-4o System Card

    Deception abilities emerged in large language models. Proceedings of the National Academy of Sciences, 121(24). Garrett Hardin. 1968. The tragedy of the commons. Science, 162(3859):1243–1248. Sture Holm. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70. Tiancheng Hu, Joachim Baumann, Lorenzo L...

  6. [6]

    Joon Sung Park, Joseph C

    Covenants with and without a sword: Self-governance is possible. American Political Science Review, 86(2):404–417. Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interf...

  7. [7]

    Bosses and Kings

    measures LLM simulation fidelity against twenty behavioural human datasets and reports that even the strongest models achieve only modest alignment with empirical human distributions. Across all these settings, agents either (i) operate under equal action spaces with simultaneous decisions, or (ii) face institutional mechanisms (sanctions, negotiation...