When Numbers Start Talking: Implicit Numerical Coordination Among LLM-Based Agents
Pith reviewed 2026-05-16 16:36 UTC · model grok-4.3
The pith
LLM-based agents develop covert numerical signals that coordinate their actions in game settings even when explicit communication is restricted or absent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In LLM-driven multi-agent systems, covert numerical signals arise in canonical game-theoretic settings. When explicit communication is restricted or removed, agents produce measurable numerical patterns in their actions that influence coordination and strategic outcomes, with the strength and effect of these signals depending on game type, one-shot versus repeated interaction, and heterogeneity of agent personalities.
What carries the argument
Covert numerical signals embedded in action choices that operate as non-linguistic coordination channels across communication regimes.
If this is right
- Coordination success increases in repeated games because agents can learn to read and respond to each other's numerical patterns over time.
- Heterogeneous agent personalities produce distinct signal styles that alter which strategies become stable.
- Outcomes under restricted communication converge toward those under explicit communication when numerical signals are available.
- Strategic payoffs shift measurably once agents begin to exploit the implicit channel in the tested settings.
Where Pith is reading between the lines
- Designers of multi-agent AI systems may need to monitor numerical outputs for unintended coordination even when text channels are blocked.
- The same mechanism could appear in non-game domains such as resource allocation or negotiation tasks where agents output quantities.
- Testing whether signal emergence persists across model sizes or training regimes would clarify robustness.
Load-bearing premise
That the numerical patterns observed in agent outputs are reliable emergent coordination devices rather than side effects of prompting or model idiosyncrasies.
What would settle it
Re-running the same four games with identical prompts but replacing the LLMs with fixed random number generators or non-LLM rule-based agents; absence of comparable coordination effects would falsify the claim that the signals are LLM-specific.
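The random-agent control described above can be sketched roughly as follows. This is a minimal illustration rather than the paper's protocol: the action space `1..10`, the function name `baseline_match_rate`, and the use of "both agents output the same number" as the coordination criterion are all assumptions made for the example.

```python
import random


def baseline_match_rate(num_rounds, action_space, seed=0):
    """Fraction of rounds in which two independent uniform-random
    agents happen to pick the same number (hypothetical criterion)."""
    rng_a = random.Random(seed)      # stand-in for agent A
    rng_b = random.Random(seed + 1)  # stand-in for agent B
    matches = 0
    for _ in range(num_rounds):
        a = rng_a.choice(action_space)
        b = rng_b.choice(action_space)
        matches += (a == b)
    return matches / num_rounds


# Assumed action space: integers 1..10 (not taken from the paper).
actions = list(range(1, 11))
rate = baseline_match_rate(10_000, actions)
print(f"random-agent match rate: {rate:.3f}")
print(f"chance level for uniform choices: {1 / len(actions):.3f}")
```

Under these assumptions, an LLM condition would need a match rate clearly above this chance floor before its numerical patterns could be read as coordination at all.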
Original abstract
LLM-based agents increasingly operate in multi-agent environments where strategic interaction and coordination are required. While existing work has largely focused on individual agents or on interacting agents sharing explicit communication, less is known about how interacting agents coordinate implicitly. In particular, agents may engage in covert communication, relying on indirect or non-linguistic signals embedded in their actions rather than on explicit messages. This paper presents a game-theoretic study of covert communication in LLM-driven multi-agent systems. We analyse interactions across four canonical game-theoretic settings under different communication regimes, including explicit, restricted, and absent communication. Considering heterogeneous agent personalities and both one-shot and repeated games, we characterise when covert signals emerge and how they shape coordination and strategic outcomes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a game-theoretic analysis of covert numerical communication among LLM-based agents in multi-agent settings. It studies four canonical games under varying communication regimes (explicit, restricted, absent), incorporating heterogeneous agent personalities and both one-shot and repeated interactions. The central contribution is a characterization of the conditions under which covert signals emerge and their effects on coordination success and strategic payoffs.
Significance. If the empirical patterns hold under the described protocols, this work offers a useful framework for detecting and quantifying implicit coordination in LLM multi-agent systems. Grounding the analysis in canonical games with explicit metrics for signal emergence and outcome impact, plus reproducible experimental setups across personality prompts and game lengths, positions it as a concrete step toward understanding non-explicit communication mechanisms that could inform safer multi-agent AI design.
major comments (3)
- [§4.1] §4.1 (Signal detection methodology): The definition of covert numerical signals via action-sequence correlations does not include a control condition with agents prompted to output random numbers; without this, it is difficult to rule out that observed alignments are artifacts of shared LLM training distributions rather than strategic implicit coordination.
- [§5.3] §5.3 (Absent-communication regime results): The claim that numerical coordination emerges reliably in the no-communication condition rests on payoff improvements, but the manuscript reports no cross-model ablation (e.g., GPT-4 vs. Claude vs. Llama) or prompt-variation sweeps; this leaves open whether the characterization generalizes or is sensitive to model-specific quirks.
- [Table 3] Table 3 (repeated-game coordination rates): The reported 18–32 % coordination lift under restricted communication lacks statistical significance tests or confidence intervals across the 50 runs per condition; without these, the quantitative characterization of how signals shape outcomes remains under-supported for the central claim.
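One way to probe the §4.1 concern is a shuffle control. Permuting one agent's action sequence keeps its marginal distribution (for example, a shared preference for a particular number) but destroys any round-by-round coupling. The sketch below is a generic permutation test on Pearson correlation, not the manuscript's method; the function names and the example sequences are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def permutation_p_value(xs, ys, num_shuffles=2000, seed=0):
    """Share of shuffles whose |correlation| reaches the observed one.
    Shuffling ys preserves its marginal distribution but breaks the
    round-by-round pairing with xs."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(num_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (num_shuffles + 1)


# Hypothetical action sequences for two agents over 12 rounds.
agent_a = [5, 5, 2, 5, 3, 5, 5, 2, 5, 5, 3, 5]
agent_b = [5, 5, 3, 5, 2, 5, 5, 3, 5, 5, 2, 5]
print("permutation p-value:", permutation_p_value(agent_a, agent_b))
```

A plain shuffle ignores autocorrelation within each sequence, so for repeated games a block-wise shuffle may be a more appropriate null; this sketch does not attempt that.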
minor comments (3)
- [Introduction] The four canonical games are listed only in §3.2; moving an explicit enumeration (Prisoner’s Dilemma, Stag Hunt, Battle of the Sexes, Coordination game) to the abstract or introduction would improve immediate clarity.
- [Figure 2] Figure 2 caption does not define the y-axis units for “signal strength”; adding this would prevent reader misinterpretation of the plotted values.
- [Related Work] A few citations (e.g., [15] on LLM prompting) are from 2022; updating or supplementing with 2024 references on multi-agent LLM coordination would strengthen the related-work section.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below, indicating where revisions will be made to improve the manuscript.
-
Referee: [§4.1] §4.1 (Signal detection methodology): The definition of covert numerical signals via action-sequence correlations does not include a control condition with agents prompted to output random numbers; without this, it is difficult to rule out that observed alignments are artifacts of shared LLM training distributions rather than strategic implicit coordination.
Authors: We agree that a control condition with agents prompted to generate random numbers independently would help isolate strategic coordination from potential training-distribution artifacts. In the revised manuscript we will add this control to the signal-detection methodology in §4.1, compute the corresponding correlation baselines, and report the comparative results. revision: yes
-
Referee: [§5.3] §5.3 (Absent-communication regime results): The claim that numerical coordination emerges reliably in the no-communication condition rests on payoff improvements, but the manuscript reports no cross-model ablation (e.g., GPT-4 vs. Claude vs. Llama) or prompt-variation sweeps; this leaves open whether the characterization generalizes or is sensitive to model-specific quirks.
Authors: We acknowledge that the current experiments are limited to a single model family. To address generalizability we will run the no-communication conditions on Claude-3 and include a modest prompt-variation sweep. A full cross-model ablation across every game and condition is computationally prohibitive, but the added results will be reported for the key absent-communication settings in §5.3. revision: partial
-
Referee: [Table 3] Table 3 (repeated-game coordination rates): The reported 18–32 % coordination lift under restricted communication lacks statistical significance tests or confidence intervals across the 50 runs per condition; without these, the quantitative characterization of how signals shape outcomes remains under-supported for the central claim.
Authors: We agree that statistical support is required. In the revised manuscript we will augment Table 3 (and all related quantitative results) with 95 % bootstrap confidence intervals and report p-values from paired statistical tests comparing coordination rates and payoffs across communication regimes. revision: yes
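The bootstrap intervals promised for Table 3 could be computed along the following lines. This is a sketch under stated assumptions, not the authors' analysis code: it assumes each of the 50 runs yields one coordination rate per condition, that runs are paired across conditions (for instance by seed or prompt), and that a percentile bootstrap is acceptable. The function name and the generated data are hypothetical.

```python
import random


def bootstrap_ci_paired_diff(treated, control, num_resamples=5000,
                             alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference
    (treated - control). Returns (mean_diff, lower, upper)."""
    if len(treated) != len(control):
        raise ValueError("paired samples must have equal length")
    rng = random.Random(seed)
    diffs = [t - c for t, c in zip(treated, control)]
    n = len(diffs)
    means = []
    for _ in range(num_resamples):
        # Resample runs with replacement, keeping pairs intact.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * num_resamples)]
    upper = means[int((1 - alpha / 2) * num_resamples) - 1]
    return sum(diffs) / n, lower, upper


# Hypothetical per-run coordination rates for 50 paired runs.
data_rng = random.Random(42)
absent = [data_rng.uniform(0.30, 0.50) for _ in range(50)]
restricted = [a + data_rng.uniform(0.10, 0.35) for a in absent]

mean_diff, lo, hi = bootstrap_ci_paired_diff(restricted, absent)
print(f"mean lift: {mean_diff:.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
```

If the runs are not actually paired across conditions, resampling each condition independently would be the more defensible variant.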
Circularity Check
No significant circularity in empirical characterization
The paper is an empirical study that characterises covert numerical signals in LLM agents through experimental protocols across four game-theoretic settings, using heterogeneous personality prompts, one-shot vs repeated conditions, and quantitative metrics for signal emergence and outcome impact. No derivation chain, equations, fitted parameters, or self-citations are invoked to reduce any prediction or result to its inputs by construction; the central claims follow directly from the described experimental setup and observed agent behaviours without self-referential definitions or load-bearing reductions.
Reference graph
Works this paper leans on
-
[1]
[Abaku et al., 2024] E.A. Abaku, T.E. Edunjobi, and A.C. Odimarha. Theoretical approaches to AI in supply chain optimization: Pathways to efficiency and resilience. Int. J. Sci. Tech. Res. Archive, 6(1):092–107, 2024.
- [2]
-
[3]
[Andras et al., 2018] Peter Andras, Lukas Esterle, Michael Guckert, The Anh Han, Peter R Lewis, Kristina Milanovic, et al. Trusting intelligent machines: Deepening trust within socio-technical systems. IEEE Tech. Soc. Magazine, 37(4):76–83, 2018.
-
[4]
[Bahtizin et al., 2019] A.R. Bahtizin, V.Y. Bortalevich, E.L. Loginov, and A.I. Soldatov. Using artificial intelligence to optimize intermodal networking of organizational agents within the digital economy. In J. Phys: conference series, volume 1327, page 012042. IOP Publishing, 2019.
-
[5]
[Balabanova et al., 2025] N. Balabanova, A. Bashir, P. Bova, A. Buscemi, T. Cimpeanu, H.C. da Fonseca, et al. Media
Table 4: Top 5 most frequent symbols in repeated games for covert communication types (D or H), with their frequency over total messages. Columns: Game, then 1st to 5th for C (D) and 1st to 5th for C (H). Surviving row fragment: H5|83.4% 2|5.6% 3|1.9% 200|1.2% 100|1.1% 5A|12....
-
[6]
[Brooks, 2022] W. Brooks. Artificial bias: the ethical concerns of ai-driven dispute resolution in family matters. J. Disp. Resol., page 117, 2022.
-
[7]
[Buscemi and Proverbio, 2024] A. Buscemi and D. Proverbio. Large language models’ detection of political orientation in newspapers. arXiv:2406.00018, 2024.
- [8]
- [9]
-
[10]
[Cloud et al., 2025] Alex Cloud, Minh Le, James Chua, Jan Betley, Anna Sztyber-Betley, Jacob Hilton, Samuel Marks, and Owain Evans. Subliminal learning: Language models transmit behavioral traits via hidden signals in data. arXiv preprint arXiv:2507.14805, 2025.
- [11]
-
[12]
[Falcão Filho, 2024] H.A. Falcão Filho. Making sense of negotiation and AI: The blossoming of a new collaboration. Int. J. Commerce Contract., 8(1-2):44–64, 2024.
-
[13]
[Fan et al., 2024] C. Fan, Z. Tariq, N. Saadiq Bhuiyan, M.G. Yankoski, and T.W. Ford. Comp-husim: Persistent digital personality simulation platform. In Proc. 32nd ACM Conf. User Mod., Adapt. Person., pages 98–101, 2024.
-
[14]
[Farrell and Rabin, 1996] Joseph Farrell and Matthew Rabin. Cheap talk. Journal of Economic Perspectives, 10(3):103–118, 1996.
-
[15]
[Fontana et al., 2024] N. Fontana, F. Pierri, and L.M. Aiello. Nicer than humans: How do large language models behave in the prisoner’s dilemma? arXiv:2406.13605, 2024.
-
[16]
[Fujimoto and Ito, 2024] Yuma Fujimoto and Sosuke Ito. Game-theoretical approach to minimum entropy productions in information thermodynamics. Physical Review Research, 6(1):013023, 2024.
-
[17]
[Fulgu and Capraro, 2024] R.A. Fulgu and V. Capraro. Surprising gender biases in gpt. Comp. Human Beha. Rep., 16:100533, 2024.
-
[18]
[Hammond et al., 2025] L. Hammond, A. Chan, J. Clifton, J. Hoelscher-Obermaier, A. Khan, E. McLean, et al. Multi-agent risks from advanced ai. arXiv:2502.14143, 2025.
-
[19]
[Han et al., 2026] The Anh Han, Zhao Song, Theodor Cimpeanu, Manh Hong Duong, Marcus Krellner, Valerio Capraro, and Matjaz Perc. Cooperation versus social welfare. Physics of Life Reviews, 56:33–60, 2026.
-
[20]
[Han, 2022] T.A. Han. Emergent behaviours in multi-agent systems with evolutionary game theory. AI Commun., 35(4), 2022.
- [21]
-
[22]
[He et al., 2025] L. He, G. Sun, D. Niyato, H. Du, F. Mei, J. Kang, et al. Generative ai for game theory-based mobile networking. IEEE Wireless Commun., 32(1):122–130, 2025.
-
[23]
[Hernandez-Lagos, 2019] Pablo Hernandez-Lagos. Cooperative initiative through pre-play communication in simple games. Journal of Behavioral and Experimental Economics, 80:108–120, 2019.
- [24]
-
[25]
[Li et al., 2023b] Y. Li, Y. Du, K. Zhou, J. Wang, W.X. Zhao, and J.-R. Wen. Evaluating object hallucination in large vision-language models. arXiv:2305.10355, 2023.
-
[26]
[Lu et al., 2024] Y. Lu, A. Aleta, C. Du, L. Shi, and Y. Moreno. Llms and generative agent-based models for complex systems research. Phys. Life Rev., 2024.
-
[27]
[Min, 2010] H. Min. Artificial intelligence in supply chain management: theory and applications. Int. J. Logistics: Res. Appl., 13(1):13–39, 2010.
-
[28]
[Newsham et al., 2025] Lewis Newsham, Ryan Hyland, and Daniel Prince. Inducing personality in llm-based honeypot agents: Measuring the effect on human-like agenda generation. arXiv:2503.19752, 2025.
- [29]
-
[30]
[Patel and Trivedi, 2020] N. Patel and S. Trivedi. Leveraging predictive modeling, machine learning personalization, NLP customer support, and AI chatbots to increase customer loyalty. Empir. Quests Manage. Essenc., 3(3):1–24, 2020.
-
[31]
[Ramachandran et al., 2022] D. Ramachandran, A. Keshari, and M. Kumar Tiwari. Contract price negotiation using an ai-based chatbot. In Int. Conf. Data An. Pub. Proc. Supply Chain, pages 303–310. Springer, 2022.
-
[32]
[Sigmund, 2010] Karl Sigmund. The calculus of selfishness. Princeton University Press, 2010.
-
[33]
[Skyrms, 2010] Brian Skyrms. Signals: Evolution, learning, and information. Oxford University Press, 2010.
-
[34]
[Song et al., 2026] Zhao Song, Chen Shen, and The Anh Han. Network reciprocity turns cheap talk into a force for cooperation. Journal of Theoretical Biology, 617:112303, 2026.
-
[35]
[Stewart et al., 2024] A.J. Stewart, A.A. Arechar, D.G. Rand, and J.B. Plotkin. The distorting effects of producer strategies: Why engagement does not reveal consumer preferences for misinformation. Proc. Natl. Acad. Sci., 121(10):e2315195121, 2024.
- [36]
-
[37]
[Sun et al., 2025] Haoran Sun, Yusen Wu, Yukun Cheng, and Xu Chu. Game theory meets large language models: A systematic survey. In IJCAI, pages 10669–10677, 2025.
-
[38]
[Talajić et al., 2024] M. Talajić, I. Vrankić, and M. Pejić Bach. Strategic management of workforce diversity: An evolutionary game theory approach as a foundation for ai-driven systems. Information, 15(6):366, 2024.
-
[39]
[Tessler et al., 2024] M.H. Tessler, M.A. Bakker, D. Jarrett, H. Sheahan, M.J. Chadwick, et al. Ai can help humans find common ground in democratic deliberation. Science, 386(6719):eadq2852, 2024.
-
[40]
[Wang et al., 2015] Z. Wang, S. Kokubo, M. Jusup, and J. Tanimoto. Universal scaling for the dilemma strength in evolutionary games. Phys. Life Rev., 14:1–30, 2015.
- [41]
-
[42]
[Willis et al., 2025] R. Willis, Y. Du, J.Z. Leibo, and M. Luck. Will systems of llm agents cooperate: An investigation into a social dilemma. arXiv:2501.16173, 2025.