Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games
Pith reviewed 2026-05-10 17:54 UTC · model grok-4.3
The pith
Large language models show high baseline similarity in their actions and, like humans, adjust it in response to coordination incentives, but they are less able than humans to sustain differences when divergence is rewarded.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We distinguish primary algorithmic monoculture -- baseline action similarity -- from strategic algorithmic monoculture, whereby agents adjust similarity in response to incentives. We implement a simple experimental design that cleanly separates these forces, and deploy it on human and large language model (LLM) subjects. LLMs exhibit high levels of baseline similarity (primary monoculture) and, like humans, they regulate it in response to coordination incentives (strategic monoculture). While LLMs coordinate extremely well on similar actions, they lag behind humans in sustaining heterogeneity when divergence is rewarded.
What carries the argument
The experimental design that isolates primary algorithmic monoculture (default similarity across choices) from strategic algorithmic monoculture (changes in similarity driven by payoffs for matching or differing) in coordination games.
Load-bearing premise
The coordination games cleanly separate baseline similarity from incentive-driven adjustments, and LLM behavior in the lab represents how deployed AI agents would act in multi-agent environments.
What would settle it
An observation that LLMs keep the same level of action similarity no matter whether payoffs reward matching choices or differing choices, or that they sustain as much action heterogeneity as human subjects when divergence pays off.
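One way to make this falsification condition concrete is sketched below. The notation is an assumption of this note, not the paper's: $A^{s}_{c}$ denotes the mean agreement rate for subject type $s$ (human or LLM) in condition $c$, where $Q$, $C$, and $D$ stand for the question-only, coordinated convergence, and coordinated divergence conditions named in the pre-registration excerpts further down.

```latex
\begin{align*}
\text{primary monoculture:}\quad
  & A^{s}_{Q} \\
\text{strategic adjustment:}\quad
  & \Delta^{s}_{C} = A^{s}_{C} - A^{s}_{Q},
    \qquad
    \Delta^{s}_{D} = A^{s}_{Q} - A^{s}_{D} \\
\text{no strategic monoculture in LLMs:}\quad
  & \Delta^{\mathrm{LLM}}_{C} = \Delta^{\mathrm{LLM}}_{D} = 0 \\
\text{no heterogeneity gap:}\quad
  & A^{\mathrm{LLM}}_{D} \le A^{\mathrm{human}}_{D}
\end{align*}
```

Under this reading, either of the last two lines holding in the data (up to sampling noise) would count against the corresponding claim.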
Original abstract
AI agents increasingly operate in multi-agent environments where outcomes depend on coordination. We distinguish primary algorithmic monoculture -- baseline action similarity -- from strategic algorithmic monoculture, whereby agents adjust similarity in response to incentives. We implement a simple experimental design that cleanly separates these forces, and deploy it on human and large language model (LLM) subjects. LLMs exhibit high levels of baseline similarity (primary monoculture) and, like humans, they regulate it in response to coordination incentives (strategic monoculture). While LLMs coordinate extremely well on similar actions, they lag behind humans in sustaining heterogeneity when divergence is rewarded.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLMs exhibit high baseline action similarity (primary algorithmic monoculture) in coordination games and, like humans, strategically adjust this similarity in response to incentives (strategic algorithmic monoculture). Experiments show LLMs coordinate well on similar actions but lag humans in sustaining heterogeneity when divergence is rewarded. The design is presented as cleanly separating the two forms of monoculture via baseline and incentive conditions applied to both human and LLM subjects.
Significance. If the separation holds and results replicate, the work is significant for multi-agent AI research: it provides empirical grounding for concerns about algorithmic monoculture while showing that LLMs can respond to coordination incentives. The human-LLM comparison is a strength, offering a template for studying strategic behavior in deployed agents. The experimental paradigm is simple and potentially replicable, which adds value if controls for prompt sensitivity are strengthened.
major comments (2)
- [Experimental Design] Experimental Design section: The baseline LLM condition uses a single prompting regime without reported ablations on temperature, few-shot examples, or prompt variants. This is load-bearing for the central claim because the separation of primary from strategic monoculture requires showing that high baseline similarity persists across neutral framings while only the incentive condition modulates it; without such checks, regulation could be an artifact of task description rather than genuine strategic response.
- [Results] Results section: The claim that LLMs lag humans in sustaining heterogeneity when divergence is rewarded is presented without reported effect sizes, confidence intervals, or per-game statistical comparisons between conditions. This weakens assessment of whether the difference is practically meaningful or driven by specific game parameters.
minor comments (3)
- [Abstract] Abstract: Could briefly note the number of LLM queries or human participants per condition to give readers immediate scale.
- [Figures] Figures: Ensure action-similarity plots include error bars or individual-subject traces and clearly distinguish baseline vs. incentive conditions in legends.
- [Introduction] Introduction: The terms 'primary' and 'strategic' algorithmic monoculture are introduced clearly but would benefit from an explicit one-sentence contrast with related concepts such as model collapse or output homogenization.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive comments, which help clarify the robustness of our experimental separation between primary and strategic algorithmic monoculture. We agree that additional prompting ablations and enhanced statistical reporting will strengthen the manuscript. Below we respond point-by-point to the major comments and indicate the revisions we will implement.
- Referee: [Experimental Design] Experimental Design section: The baseline LLM condition uses a single prompting regime without reported ablations on temperature, few-shot examples, or prompt variants. This is load-bearing for the central claim because the separation of primary from strategic monoculture requires showing that high baseline similarity persists across neutral framings while only the incentive condition modulates it; without such checks, regulation could be an artifact of task description rather than genuine strategic response.
  Authors: We acknowledge that the original submission reported a single prompting regime for the baseline LLM condition. To directly address this concern, we will add a new appendix with systematic ablations: varying temperature from 0.0 to 1.0, including zero- and few-shot variants, and testing alternative neutral prompt phrasings that preserve the coordination task without incentive language. These checks will confirm that baseline similarity remains high and stable across neutral framings, while the incentive condition continues to produce the observed strategic adjustment. This revision will make the separation between primary and strategic monoculture more robust and less vulnerable to prompt-specific artifacts.
  Revision: yes
- Referee: [Results] Results section: The claim that LLMs lag humans in sustaining heterogeneity when divergence is rewarded is presented without reported effect sizes, confidence intervals, or per-game statistical comparisons between conditions. This weakens assessment of whether the difference is practically meaningful or driven by specific game parameters.
  Authors: We agree that the results section would benefit from more granular statistical detail. In the revised manuscript we will add: (i) effect sizes (Cohen's d) for all human-LLM comparisons in the heterogeneity-rewarded conditions, (ii) 95% confidence intervals around mean similarity and coordination rates, and (iii) per-game pairwise statistical tests (t-tests or non-parametric equivalents with appropriate corrections) between baseline and incentive conditions for both subject types. These additions will allow readers to evaluate both statistical significance and practical magnitude of the observed human-LLM gap in sustaining beneficial heterogeneity.
  Revision: yes
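The statistics promised in this response can be sketched in a few lines. The code below is a minimal illustration using only the Python standard library, not the authors' analysis code: the function names are hypothetical, Cohen's d uses the pooled-standard-deviation convention, the confidence interval uses a normal approximation, and the input numbers are made up purely to exercise the functions.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d with a pooled standard deviation (one common convention)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def mean_ci95_normal(x):
    """Normal-approximation 95% confidence interval for the mean (z = 1.96)."""
    m = mean(x)
    half_width = 1.96 * math.sqrt(variance(x) / len(x))
    return m - half_width, m + half_width


# Hypothetical per-question agreement rates in a divergence-rewarded condition.
# These values are invented for illustration and are NOT results from the paper.
human_div = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09]
llm_div = [0.30, 0.25, 0.35, 0.28, 0.40, 0.33]

d = cohens_d(llm_div, human_div)
t_stat, dof = welch_t(llm_div, human_div)
ci_low, ci_high = mean_ci95_normal(llm_div)
```

With only a dozen questions per condition, a t-based interval would be more appropriate than the 1.96 multiplier used here; the normal approximation is kept only to stay within the standard library.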
Circularity Check
No circularity: purely experimental design with no derivations or self-referential claims
The paper is an empirical study that implements a coordination game experiment to measure baseline action similarity (primary monoculture) versus incentive-driven adjustment (strategic monoculture) in LLMs and humans. No equations, parameters, or derivations are present in the provided text or abstract. The central distinction is operationalized directly through the experimental conditions rather than derived from prior results or self-citations in a load-bearing way. Claims rest on observed behavior under different incentive structures, which are falsifiable via the experiment itself and do not reduce to definitional equivalence or fitted inputs renamed as predictions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard assumptions in coordination game theory
Forward citations
Cited by 1 Pith paper
- Expressing Social Emotions: Misalignment Between LLMs and Human Cultural Emotion Norms
Frontier LLMs over-express engaging emotions relative to disengaging ones and generate deterministic responses that fail to match the cultural and individual diversity observed in human social emotion expression.
Reference graph
Works this paper leans on
- [1] Have any data been collected for this study already? No, no data have been collected for this study yet.
- [2] What's the main question being asked or hypothesis being tested in this study? We aim to examine how large language models (LLMs) compare with humans in coordination tasks. To do so, we will consider two distinct settings. In the coordinated convergence setting, agents (i.e., humans or LLMs) will be incentivized to select the same action, whereas in the co...
- [3] Describe the key dependent variable(s) specifying how they will be measured. We have a panel of 12 open-ended questions with multiple valid answers. We will elicit responses to these questions from human participants and from different LLMs by querying their APIs. Our primary outcome is the agreement rate (Mehta et al., 1994), which measures the probability that two a...
- [4] How many and which conditions will participants be assigned to? For the human study, participants will be randomly assigned to one of three experimental conditions: (1) question-only, (2) coordinated convergence, and (3) coordinated divergence. In the question-only condition, participants simply answer each question without receiving any coordination inst...
- [5] Specify exactly which analyses you will conduct to examine the main question/hypothesis. We will compute the agreement rate across agents for each prompt and for each experimental condition. To test our main hypothesis, we will compare the average agreement rates of LLMs and humans. Specifically, we will conduct two-sample tests of means (i.e., Welch's t-t...
- [6] Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations. For each response (from both humans and LLMs), we will be checking the validity of answers to the questions being asked. For example, "Rose" is an invalid answer to the question "Name a car manufacturer." These validity checks will be conducted i...
- [7] How many observations will be collected or what will determine sample size? No need to justify decision, but be precise about exactly how the number will be determined. For each of the three treatments with humans, we aim to collect responses from 100 participants (total = 300). Participants will be recruited through Prolific following our pre-specified inc...
- [8] Anything else you would like to pre-register? (e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?) We shall be conducting the following secondary analyses on LLMs' behavior in coordinated convergence and divergence tasks:
- [9] The effect of belief: In contrast to the original instruction where LLMs are told they are participating with "another LLM," we will evaluate the difference in behavior when told they are playing against (i) "another person," and (ii) "a copy of yourself." We expect that the level of coordinated convergence will not change with (i), and will increase with (...
- [10] The effect of temperature: We shall sample responses from LLMs at the lowest temperature (0) and at higher-than-default temperatures (1 for open-source models and 2 for closed-source models). We expect that coordinated convergence will improve and coordinated divergence will decrease at lower temperatures, while coordinated convergence will decrease and co...
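The agreement rate named in item [3] can be illustrated with a short sketch. The excerpt above is truncated, so the exact definition used in the paper is not visible here; the code assumes the pairwise reading, namely the probability that two distinct agents drawn at random gave the same answer. The function name, the case and whitespace normalization, and the sample answers are all assumptions made for illustration.

```python
from collections import Counter


def agreement_rate(answers):
    """Probability that two distinct agents, drawn without replacement,
    gave the same answer (assumed pairwise reading of the agreement rate)."""
    n = len(answers)
    if n < 2:
        raise ValueError("need at least two answers")
    # Assumption: answers are matched after trimming whitespace and lowercasing.
    counts = Counter(a.strip().lower() for a in answers)
    # c * (c - 1) counts the ordered pairs of agents that share an answer.
    matching_pairs = sum(c * (c - 1) for c in counts.values())
    return matching_pairs / (n * (n - 1))


# Hypothetical responses to "Name a car manufacturer." (not data from the study).
question_only = ["Toyota", "Toyota", "Ford", "toyota", "BMW"]
rate = agreement_rate(question_only)  # 6 matching ordered pairs out of 20
```

Computing this rate per question and per condition, then averaging, yields the kind of per-condition summary that item [5] describes comparing between humans and LLMs.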