pith. sign in

arxiv: 2601.03846 · v2 · submitted 2026-01-07 · 💻 cs.MA · cs.AI

When Numbers Start Talking: Implicit Numerical Coordination Among LLM-Based Agents

Pith reviewed 2026-05-16 16:36 UTC · model grok-4.3

classification 💻 cs.MA cs.AI
keywords LLM agentsmulti-agent systemscovert communicationgame theoryimplicit coordinationnumerical signalsstrategic interaction
0
0 comments X

The pith

LLM-based agents develop covert numerical signals that coordinate their actions in game settings even when explicit communication is restricted or absent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how LLM agents interact in four standard game-theoretic environments under explicit, limited, and no-communication conditions. It finds that agents embed measurable numerical patterns in their choices that function as implicit signals, shaping alignment and payoffs especially in repeated play and with varied agent personalities. A sympathetic reader would care because these hidden channels could influence real-world deployments of multiple AI systems that must reach joint decisions without direct messaging.

Core claim

In LLM-driven multi-agent systems, covert numerical signals arise in canonical game-theoretic settings. When explicit communication is restricted or removed, agents produce measurable numerical patterns in their actions that influence coordination and strategic outcomes, with the strength and effect of these signals depending on game type, one-shot versus repeated interaction, and heterogeneity of agent personalities.

What carries the argument

Covert numerical signals embedded in action choices that operate as non-linguistic coordination channels across communication regimes.

If this is right

  • Coordination success increases in repeated games because agents can learn to read and respond to each other's numerical patterns over time.
  • Heterogeneous agent personalities produce distinct signal styles that alter which strategies become stable.
  • Outcomes under restricted communication converge toward those under explicit communication when numerical signals are available.
  • Strategic payoffs shift measurably once agents begin to exploit the implicit channel in the tested settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Designers of multi-agent AI systems may need to monitor numerical outputs for unintended coordination even when text channels are blocked.
  • The same mechanism could appear in non-game domains such as resource allocation or negotiation tasks where agents output quantities.
  • Testing whether signal emergence persists across model sizes or training regimes would clarify robustness.

Load-bearing premise

That the numerical patterns observed in agent outputs are reliable emergent coordination devices rather than side effects of prompting or model idiosyncrasies.

What would settle it

Re-running the same four games with identical prompts but replacing the LLMs with fixed random number generators or non-LLM rule-based agents; absence of comparable coordination effects would falsify the claim that the signals are LLM-specific.

Figures

Figures reproduced from arXiv: 2601.03846 by Alessandro Di Stefano, Alessio Buscemi, Daniele Proverbio, German Castignani, Pietro Li\`o, The-Anh Han.

Figure 1
Figure 1. Figure 1: Mean cooperation (pure Cooperation = 1, pure Defection = 0) by communication type (axes of the radar plot, see Tab. 1 for legend) [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
read the original abstract

LLMs-based agents increasingly operate in multi-agent environments where strategic interaction and coordination are required. While existing work has largely focused on individual agents or on interacting agents sharing explicit communication, less is known about how interacting agents coordinate implicitly. In particular, agents may engage in covert communication, relying on indirect or non-linguistic signals embedded in their actions rather than on explicit messages. This paper presents a game-theoretic study of covert communication in LLM-driven multi-agent systems. We analyse interactions across four canonical game-theoretic settings under different communication regimes, including explicit, restricted, and absent communication. Considering heterogeneous agent personalities and both one-shot and repeated games, we characterise when covert signals emerge and how they shape coordination and strategic outcomes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript presents a game-theoretic analysis of covert numerical communication among LLM-based agents in multi-agent settings. It studies four canonical games under varying communication regimes (explicit, restricted, absent), incorporating heterogeneous agent personalities and both one-shot and repeated interactions. The central contribution is a characterization of the conditions under which covert signals emerge and their effects on coordination success and strategic payoffs.

Significance. If the empirical patterns hold under the described protocols, this work offers a useful framework for detecting and quantifying implicit coordination in LLM multi-agent systems. Grounding the analysis in canonical games with explicit metrics for signal emergence and outcome impact, plus reproducible experimental setups across personality prompts and game lengths, positions it as a concrete step toward understanding non-explicit communication mechanisms that could inform safer multi-agent AI design.

major comments (3)
  1. [§4.1] §4.1 (Signal detection methodology): The definition of covert numerical signals via action-sequence correlations does not include a control condition with agents prompted to output random numbers; without this, it is difficult to rule out that observed alignments are artifacts of shared LLM training distributions rather than strategic implicit coordination.
  2. [§5.3] §5.3 (Absent-communication regime results): The claim that numerical coordination emerges reliably in the no-communication condition rests on payoff improvements, but the manuscript reports no cross-model ablation (e.g., GPT-4 vs. Claude vs. Llama) or prompt-variation sweeps; this leaves open whether the characterization generalizes or is sensitive to model-specific quirks.
  3. [Table 3] Table 3 (repeated-game coordination rates): The reported 18–32 % coordination lift under restricted communication lacks statistical significance tests or confidence intervals across the 50 runs per condition; without these, the quantitative characterization of how signals shape outcomes remains under-supported for the central claim.
minor comments (3)
  1. [Introduction] The four canonical games are listed only in §3.2; moving an explicit enumeration (Prisoner’s Dilemma, Stag Hunt, Battle of the Sexes, Coordination game) to the abstract or introduction would improve immediate clarity.
  2. [Figure 2] Figure 2 caption does not define the y-axis units for “signal strength”; adding this would prevent reader misinterpretation of the plotted values.
  3. [Related Work] A few citations (e.g., [15] on LLM prompting) are from 2022; updating or supplementing with 2024 references on multi-agent LLM coordination would strengthen the related-work section.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below, indicating where revisions will be made to improve the manuscript.

read point-by-point responses
  1. Referee: [§4.1] §4.1 (Signal detection methodology): The definition of covert numerical signals via action-sequence correlations does not include a control condition with agents prompted to output random numbers; without this, it is difficult to rule out that observed alignments are artifacts of shared LLM training distributions rather than strategic implicit coordination.

    Authors: We agree that a control condition with agents prompted to generate random numbers independently would help isolate strategic coordination from potential training-distribution artifacts. In the revised manuscript we will add this control to the signal-detection methodology in §4.1, compute the corresponding correlation baselines, and report the comparative results. revision: yes

  2. Referee: [§5.3] §5.3 (Absent-communication regime results): The claim that numerical coordination emerges reliably in the no-communication condition rests on payoff improvements, but the manuscript reports no cross-model ablation (e.g., GPT-4 vs. Claude vs. Llama) or prompt-variation sweeps; this leaves open whether the characterization generalizes or is sensitive to model-specific quirks.

    Authors: We acknowledge that the current experiments are limited to a single model family. To address generalizability we will run the no-communication conditions on Claude-3 and include a modest prompt-variation sweep. A full cross-model ablation across every game and condition is computationally prohibitive, but the added results will be reported for the key absent-communication settings in §5.3. revision: partial

  3. Referee: [Table 3] Table 3 (repeated-game coordination rates): The reported 18–32 % coordination lift under restricted communication lacks statistical significance tests or confidence intervals across the 50 runs per condition; without these, the quantitative characterization of how signals shape outcomes remains under-supported for the central claim.

    Authors: We agree that statistical support is required. In the revised manuscript we will augment Table 3 (and all related quantitative results) with 95 % bootstrap confidence intervals and report p-values from paired statistical tests comparing coordination rates and payoffs across communication regimes. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical characterization

full rationale

The paper is an empirical study that characterises covert numerical signals in LLM agents through experimental protocols across four game-theoretic settings, using heterogeneous personality prompts, one-shot vs repeated conditions, and quantitative metrics for signal emergence and outcome impact. No derivation chain, equations, fitted parameters, or self-citations are invoked to reduce any prediction or result to its inputs by construction; the central claims follow directly from the described experimental setup and observed agent behaviours without self-referential definitions or load-bearing reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no free parameters, axioms, or invented entities are stated or derivable from the provided text.

pith-pipeline@v0.9.0 · 5430 in / 1013 out tokens · 44467 ms · 2026-05-16T16:36:25.339260+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

  1. [1]

    Abaku, T.E

    [Abakuet al., 2024 ] E.A. Abaku, T.E. Edunjobi, and A.C. Odimarha. Theoretical approaches to AI in supply chain optimization: Pathways to efficiency and resilience.Int. J. Sci. Tech. Res. Archive, 6(1):092–107,

  2. [2]

    [Aliet al., 2025 ] R. Ali, F. Caso, C. Irwin, and P. Li `o. Entropy-lens: The information signature of transformer computations.arXiv:2502.16570,

  3. [3]

    Trusting intelligent machines: Deepening trust within socio-technical systems.IEEE Tech

    [Andraset al., 2018 ] Peter Andras, Lukas Esterle, Michael Guckert, The Anh Han, Peter R Lewis, Kristina Milanovic, et al. Trusting intelligent machines: Deepening trust within socio-technical systems.IEEE Tech. Soc. Maga- zine, 37(4):76–83,

  4. [4]

    Bahtizin, V .Y

    [Bahtizinet al., 2019 ] A.R. Bahtizin, V .Y . Bortalevich, E.L. Loginov, and A.I. Soldatov. Using artificial intelligence to optimize intermodal networking of organizational agents within the digital economy. InJ. Phys: conference series, volume 1327, page 012042. IOP Publishing,

  5. [5]

    Balabanova, A

    [Balabanovaet al., 2025 ] N. Balabanova, A. Bashir, P. Bova, A. Buscemi, T. Cimpeanu, H.C. da Fonseca, et al. Media Table 4: Top 5 most frequent symbols in repeated games for covert communication types (D or H), with their frequency over total messages. C (D) C (H) Game1st 2nd 3rd 4th 5th 1st 2nd 3rd 4th 5th H5|83.4% 2|5.6% 3|1.9% 200|1.2% 100|1.1% 5A|12....

  6. [6]

    [Brooks, 2022] W. Brooks. Artificial bias: the ethical con- cerns of ai-driven dispute resolution in family matters.J. Disp. Resol., page 117,

  7. [7]

    Buscemi and D

    [Buscemi and Proverbio, 2024] A. Buscemi and D. Prover- bio. Large language models’ detection of political orienta- tion in newspapers.arxiv:2406.00018,

  8. [8]

    [Chaffer, 2025] T.J. Chaffer. Governing the agent-to- agent economy of trust via progressive decentralization. arXiv:2501.16606,

  9. [9]

    [Chenet al., 2023 ] X. Chen, D. Simchi-Levi, and Y . Wang. Utility fairness in contextual dynamic pricing with demand learning.arXiv:2311.16528,

  10. [10]

    Subliminal learning: Language models transmit behavioral traits via hidden signals in data.arXiv preprint arXiv:2507.14805,

    [Cloudet al., 2025 ] Alex Cloud, Minh Le, James Chua, Jan Betley, Anna Sztyber-Betley, Jacob Hilton, Samuel Marks, and Owain Evans. Subliminal learning: Language models transmit behavioral traits via hidden signals in data.arXiv preprint arXiv:2507.14805,

  11. [11]

    [Elet al., 2025 ] B. El, D. Choudhury, P. Li`o, and C.K. Joshi. Towards mechanistic interpretability of graph transformers via attention graphs.arXiv:2502.12352,

  12. [12]

    Falc˜ao Filho

    [Falc˜ao Filho, 2024] H.A. Falc˜ao Filho. Making sense of ne- gotiation and AI: The blossoming of a new collaboration. Int. J. Commerce Contract., 8(1-2):44–64,

  13. [13]

    [Fanet al., 2024 ] C. Fan, Z. Tariq, N. Saadiq Bhuiyan, M.G. Yankoski, and T.W. Ford. Comp-husim: Persistent digital personality simulation platform. InProc. 32nd ACM Conf. User Mod., Adapt. Person., pages 98–101,

  14. [14]

    Cheap talk.Journal of Economic perspectives, 10(3):103–118,

    [Farrell and Rabin, 1996] Joseph Farrell and Matthew Ra- bin. Cheap talk.Journal of Economic perspectives, 10(3):103–118,

  15. [15]

    N icer than humans: How do large lan- guage models behave in the prisoner’s dilemma? arXiv preprint arXiv:2406.13605 ,

    [Fontanaet al., 2024 ] N. Fontana, F. Pierri, and L.M. Aiello. Nicer than humans: How do large language models behave in the prisoner’s dilemma?arXiv:2406.13605,

  16. [16]

    Game-theoretical approach to minimum entropy produc- tions in information thermodynamics.Physical Review Research, 6(1):013023,

    [Fujimoto and Ito, 2024] Yuma Fujimoto and Sosuke Ito. Game-theoretical approach to minimum entropy produc- tions in information thermodynamics.Physical Review Research, 6(1):013023,

  17. [17]

    Fulgu and V

    [Fulgu and Capraro, 2024] R.A. Fulgu and V . Capraro. Sur- prising gender biases in gpt.Comp. Human Beha. Rep., 16:100533,

  18. [18]

    arXiv preprint arXiv:2502.14143 , year=

    [Hammondet al., 2025 ] L. Hammond, A. Chan, J. Clifton, J. Hoelscher-Obermaier, A. Khan, E. McLean, et al. Multi- agent risks from advanced ai.arXiv:2502.14143,

  19. [19]

    Cooperation versus social wel- fare.Physics of Life Reviews, 56:33–60,

    [Hanet al., 2026 ] The Anh Han, Zhao Song, Theodor Cim- peanu, Manh Hong Duong, Marcus Krellner, Valerio Capraro, and Matjaz Perc. Cooperation versus social wel- fare.Physics of Life Reviews, 56:33–60,

  20. [20]

    [Han, 2022] T.A. Han. Emergent behaviours in multi-agent systems with evolutionary game theory.AI Commun., 35(4),

  21. [21]

    He and C

    [He and Zhang, 2024] Z. He and C. Zhang. Afspp: Agent framework for shaping preference and personality with large language models.arXiv:2401.02870,

  22. [22]

    [Heet al., 2025 ] L. He, G. Sun, D. Niyato, H. Du, F. Mei, J. Kang, et al. Generative ai for game theory-based mo- bile networking.IEEE Wireless Commun., 32(1):122–130,

  23. [23]

    Cooper- ative initiative through pre-play communication in sim- ple games.Journal of Behavioral and Experimental Eco- nomics, 80:108–120,

    [Hernandez-Lagos, 2019] Pablo Hernandez-Lagos. Cooper- ative initiative through pre-play communication in sim- ple games.Journal of Behavioral and Experimental Eco- nomics, 80:108–120,

  24. [24]

    [Liet al., 2023a ] J. Li, X. Cheng, W.X. Zhao, J.-Y . Nie, and J.-R. Wen. Halueval: A large-scale hallucina- tion evaluation benchmark for large language models. arXiv:2305.11747,

  25. [25]

    [Liet al., 2023b ] Y . Li, Y . Du, K. Zhou, J. Wang, W.X. Zhao, and J.-R. Wen. Evaluating object hallucination in large vision-language models.arXiv:2305.10355,

  26. [26]

    [Luet al., 2024 ] Y . Lu, A. Aleta, C. Du, L. Shi, and Y . Moreno. Llms and generative agent-based models for complex systems research.Phys. Life Rev.,

  27. [27]

    [Min, 2010] H. Min. Artificial intelligence in supply chain management: theory and applications.Int. J. Logistics: Res. Appl., 13(1):13–39,

  28. [28]

    Inducing personality in llm-based honeypot agents: Measuring the effect on human-like agenda gener- ation.arXiv:2503.19752,

    [Newshamet al., 2025 ] Lewis Newsham, Ryan Hyland, and Daniel Prince. Inducing personality in llm-based honeypot agents: Measuring the effect on human-like agenda gener- ation.arXiv:2503.19752,

  29. [29]

    Owen.Game theory

    [Owen, 2013] G. Owen.Game theory. Emerald Group Pub- lishing,

  30. [30]

    Patel and S

    [Patel and Trivedi, 2020] N. Patel and S. Trivedi. Leverag- ing predictive modeling, machine learning personaliza- tion, NLP customer support, and AI chatbots to increase customer loyalty.Empir. Quests Manage. Essenc., 3(3):1– 24,

  31. [31]

    Ramachandran, A

    [Ramachandranet al., 2022 ] D. Ramachandran, A. Keshari, and M. Kumar Tiwari. Contract price negotiation using an ai-based chatbot. InInt. Conf. Data An. Pub. Proc. Supply Chain, pages 303–310. Springer,

  32. [32]

    Princeton University Press,

    [Sigmund, 2010] Karl Sigmund.The calculus of selfishness. Princeton University Press,

  33. [33]

    Oxford University Press,

    [Skyrms, 2010] Brian Skyrms.Signals: Evolution, learning, and information. Oxford University Press,

  34. [34]

    Network reciprocity turns cheap talk into a force for cooperation.Journal of Theoretical Biology, 617:112303,

    [Songet al., 2026 ] Zhao Song, Chen Shen, and The Anh Han. Network reciprocity turns cheap talk into a force for cooperation.Journal of Theoretical Biology, 617:112303,

  35. [35]

    Stewart, A.A

    [Stewartet al., 2024 ] A.J. Stewart, A.A. Arechar, D.G. Rand, and J.B. Plotkin. The distorting effects of pro- ducer strategies: Why engagement does not reveal con- sumer preferences for misinformation.Proc. Natl. Acad. Sci., 121(10):e2315195121,

  36. [36]

    Stone, E

    [Stoneet al., 2020 ] M. Stone, E. Aravopoulou, Y . Ekinci, G. Evans, M. Hobbs, A. Labib, et al. Artificial intelli- gence (ai) in strategic marketing decision-making: a re- search agenda.The Bottom Line, 33(2):183–200,

  37. [37]

    Game theory meets large language models: A systematic survey

    [Sunet al., 2025 ] Haoran Sun, Yusen Wu, Yukun Cheng, and Xu Chu. Game theory meets large language models: A systematic survey. InIJCAI, pages 10669–10677,

  38. [38]

    Talaji ´c, I

    [Talaji´cet al., 2024 ] M. Talaji ´c, I.. Vranki ´c, and M. Peji ´c Bach. Strategic management of workforce diversity: An evolutionary game theory approach as a foundation for ai-driven systems.Information, 15(6):366,

  39. [39]

    Tessler, M.A

    [Tessleret al., 2024 ] M.H. Tessler, M.A. Bakker, D. Jarrett, H. Sheahan, M.J. Chadwick, et al. Ai can help humans find common ground in democratic deliberation.Science, 386(6719):eadq2852,

  40. [40]

    [Wanget al., 2015 ] Z. Wang, S. Kokubo, M. Jusup, and J. Tanimoto. Universal scaling for the dilemma strength in evolutionary games.Phys. Life Rev., 14:1–30,

  41. [41]

    [Wanget al., 2024 ] Z. Wang, R. Song, C. Shen, S. Yin, Z. Song, B. Battu, et al. Large language models overcome the machine penalty when acting fairly but not when acting selfishly or altruistically.arXiv:2410.03724,

  42. [42]

    Will systems of llm agents cooperate: An investigation into a social dilemma,

    [Williset al., 2025 ] R. Willis, Y . Du, J.Z. Leibo, and M. Luck. Will systems of llm agents cooperate: An inves- tigation into a social dilemma.arXiv:2501.16173, 2025