arxiv: 2511.21783 · v2 · submitted 2025-11-26 · ⚛️ physics.soc-ph · cs.GT

NetworkGames: Simulating Cooperation in Network Games with Personality-driven LLM Agents

Xuan Qiu

Pith reviewed 2026-05-17 04:52 UTC · model grok-4.3

classification ⚛️ physics.soc-ph cs.GT

keywords: LLM agents, network games, iterated prisoner's dilemma, cooperation, MBTI personalities, network topology, social simulation, collective welfare

The pith

Cooperative outcomes in network games depend on connectivity and personality placement, not dyadic interactions alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper simulates the iterated prisoner's dilemma using LLM agents each assigned a distinct MBTI personality type. Agents interact on different network structures such as small-world and scale-free graphs. Results show that overall cooperation levels cannot be forecast from pairwise personality preferences. Network topology and the locations of specific personality types jointly determine collective welfare. Small-world networks lower cooperation while placing pro-social types in central hubs of scale-free networks raises it.

Core claim

In a population of LLM agents with MBTI personalities playing iterated prisoner's dilemma on networks, macro-level cooperative outcomes are co-determined by the network's connectivity and the spatial distribution of personalities rather than being predictable from the dyadic interaction matrix alone.
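One way to read "predictable from the dyadic interaction matrix alone" is as a baseline that averages matrix entries over the edges of a network. The following is a minimal sketch under that assumption, not the paper's procedure. The function name `dyadic_prediction`, the two made-up types "A" and "B", and every number in the matrix are illustrative placeholders rather than values from the paper.

```python
from typing import Dict, List, Tuple

def dyadic_prediction(
    edges: List[Tuple[int, int]],
    types: Dict[int, str],
    coop: Dict[str, Dict[str, float]],
) -> float:
    """Average the pairwise cooperation rates over both directions of every edge."""
    total = 0.0
    for i, j in edges:
        total += coop[types[i]][types[j]]  # i acting toward j
        total += coop[types[j]][types[i]]  # j acting toward i
    return total / (2 * len(edges))

# Placeholder matrix for two made-up types (assumed values, not from the paper).
coop = {"A": {"A": 0.8, "B": 0.4}, "B": {"A": 0.2, "B": 0.2}}
edges = [(0, 1), (1, 2)]          # a three-node path
types = {0: "A", 1: "B", 2: "A"}  # which type sits on which node
predicted = dyadic_prediction(edges, types, coop)
```

Comparing `predicted` with the cooperation rate observed in an actual network run would give the gap that the core claim attributes to connectivity and placement.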

What carries the argument

Message-passing process on graphs where LLM policies govern actions between heterogeneous agents situated in structures such as small-world or scale-free networks.
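The message-passing view can be sketched as a synchronous round in which each node reads what a neighbor did to it previously and chooses an action for that edge. This is a hedged illustration only: `stub_policy` is a hypothetical stand-in for an LLM call, the labels "prosocial" and "selfish" are placeholders for MBTI prompts, and the payoff values are the conventional textbook ones (5, 3, 1, 0), assumed here because the page does not state the paper's values.

```python
from typing import Callable, Dict, List, Tuple

# Conventional Prisoner's Dilemma payoffs (assumed, not taken from the paper).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

Graph = Dict[int, List[int]]
History = Dict[Tuple[int, int], List[str]]  # (actor, neighbor) -> actor's past moves

def stub_policy(personality: str, opponent_moves: List[str]) -> str:
    """Hypothetical stand-in for an LLM: 'prosocial' plays tit-for-tat, others defect."""
    if personality == "prosocial":
        return opponent_moves[-1] if opponent_moves else "C"
    return "D"

def play_round(
    graph: Graph,
    personality: Dict[int, str],
    history: History,
    policy: Callable[[str, List[str]], str] = stub_policy,
) -> Dict[int, int]:
    """Play one synchronous round on every edge and return each node's payoff."""
    # Step 1: every node picks a move per neighbor from that neighbor's past moves.
    moves = {}
    for i, neighbors in graph.items():
        for j in neighbors:
            moves[(i, j)] = policy(personality[i], history.get((j, i), []))
    # Step 2: score each directed pair against the reverse direction.
    payoffs = {i: 0 for i in graph}
    for (i, j), move in moves.items():
        payoffs[i] += PAYOFF[(move, moves[(j, i)])][0]
    # Step 3: record moves only after all decisions, so the round stays synchronous.
    for key, move in moves.items():
        history.setdefault(key, []).append(move)
    return payoffs
```

Swapping `stub_policy` for a function that prompts a model with a personality description and the edge history would keep the same loop structure.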

If this is right

Small-world networks reduce collective cooperation relative to other topologies.
Placing pro-social personalities at hub positions in scale-free networks substantially raises overall cooperation.
Baseline dyadic interaction matrices between all 16 personality pairs fail to predict group outcomes.
The co-determination pattern holds across multiple LLM architectures and scaled network sizes.
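The hub-placement idea in the list above can be illustrated with a small sketch. It assumes a Barabási-Albert-style preferential attachment generator written with the standard library only, and a generic "prosocial" / "other" labelling; which MBTI types count as pro-social and how the paper builds its graphs are not stated on this page, so `preferential_attachment_graph` and `place_on_hubs` are hypothetical helpers.

```python
import random
from typing import Dict, Set

def preferential_attachment_graph(n: int, m: int, seed: int = 0) -> Dict[int, Set[int]]:
    """Grow a graph where each new node links to m existing nodes,
    picked with probability proportional to their current degree."""
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    targets = list(range(m))  # the first new node links to the m seed nodes
    endpoints = []            # each node appears once per incident edge
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        endpoints.extend(targets)
        endpoints.extend([new] * m)
        # Sampling from the endpoint list is degree-proportional sampling.
        chosen: Set[int] = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = sorted(chosen)
    return adj

def place_on_hubs(adj: Dict[int, Set[int]], prosocial_count: int) -> Dict[int, str]:
    """Label the highest-degree nodes 'prosocial' and all others 'other'."""
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), v))
    hubs = set(ranked[:prosocial_count])
    return {v: ("prosocial" if v in hubs else "other") for v in adj}

adj = preferential_attachment_graph(50, 2, seed=1)
labels = place_on_hubs(adj, 5)
```

A random-placement control would shuffle the same labels across nodes, so that only the positions of the pro-social agents differ between conditions.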

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Platform designers could raise group cooperation by engineering connection patterns or seeding certain user types in central positions.
Similar topology-plus-placement effects might appear in human networks if personality distributions align with network roles.
The approach could be tested on dynamic networks that evolve over time or on other repeated games.

Load-bearing premise

LLM agents given MBTI labels produce cooperation behavior representative enough of humans to support claims about real social networks.

What would settle it

The claim would be contradicted if human subjects, playing the same iterated prisoner's dilemma on identical small-world and scale-free networks with personality labels assigned, showed collective cooperation levels that match dyadic predictions regardless of topology or placement.

Figures

Figures reproduced from arXiv: 2511.21783 by Xuan Qiu.

**Figure 1.** Heatmap of cooperation rates for all 16 × 16 personality pairings. Rows: actor personality; columns: opponent personality. Axis labels: Player Personality Type, Opponent Personality Type.

**Figure 2.** Heatmap of total payoffs for all 16 × 16 personality pairings. Rows: actor personality; columns: opponent personality.

**Figure 3.** Ranking of 16 Personalities by Average Cooperation

**Figure 4.** Ranking of 16 Personalities by Total Payoff. Pro…

**Figure 5.** Temporal evolution of edge interaction types in the…

**Figure 6.** Final network snapshot of the Small-World topology

**Figure 7.** Temporal evolution of "Both Defect" edge rates

**Figure 8.** Final network snapshot of the Pro-Social Dom…

Original abstract

While Large Language Models (LLMs) have been extensively tested in dyadic game-theoretic scenarios, their collective behavior within complex network games remains surprisingly unexplored. To bridge this gap, we present NetworkGames, a framework connecting Generative Agents and Geometric Deep Learning. By formalizing social simulation as a message-passing process governed by LLM policies, we investigate how node heterogeneity (MBTI personalities) and network topology co-determine collective welfare. We instantiate a population of LLM agents, each endowed with a distinct personality from the MBTI taxonomy, and situate them in various network structures (e.g., small-world and scale-free). Through extensive simulations of the Iterated Prisoner's Dilemma, we first establish a baseline dyadic interaction matrix, revealing nuanced cooperative preferences between all 16 personality pairs. We then demonstrate that macro-level cooperative outcomes are not predictable from dyadic interactions alone; they are co-determined by the network's connectivity and the spatial distribution of personalities. For instance, we find that small-world networks are detrimental to cooperation, while strategically placing pro-social personalities in hub positions within scale-free networks can significantly promote cooperative behavior. We validate the robustness of these findings through extensive stress tests across multiple LLM architectures, scaled network sizes, varying random seeds, and comprehensive ablation studies. Our findings offer significant implications for designing healthier online social environments and forecasting collective behavior. We open-source our framework to facilitate research into the social physics of AI societies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces NetworkGames, a framework for simulating Iterated Prisoner's Dilemma games on networks using LLM agents assigned MBTI personality types. It first constructs a 16x16 dyadic cooperation matrix from pairwise LLM interactions, then runs network simulations on small-world and scale-free topologies to show that macro-level cooperation rates are co-determined by network connectivity and the spatial placement of personality types rather than being fully predictable from the dyadic matrix alone. Examples include detrimental effects of small-world structure and benefits from placing pro-social types at hubs in scale-free networks. The work reports stress tests across LLM architectures, network sizes, random seeds, and ablations, and releases the code.

Significance. If the simulation results prove robust, the paper would advance the study of heterogeneous agent behavior in networked social dilemmas by demonstrating the joint role of topology and individual traits. The open-sourced framework and multi-LLM validation are concrete strengths that support reproducibility. However, the broader significance for real social networks and collective welfare forecasting is limited by the absence of calibration to human data, reducing direct applicability to the claimed implications for online environments.

major comments (2)

[Abstract and stress tests section] Abstract and the section describing stress tests and ablation studies: the claim of 'extensive stress tests' and robustness is load-bearing for the central assertion that macro outcomes are not predictable from the dyadic matrix alone, yet no quantitative effect sizes, cooperation-rate differences with standard deviations, error bars, or explicit controls for LLM stochasticity (e.g., temperature, number of independent runs, or seed averaging) are reported. This prevents assessment of whether observed topology and placement effects exceed simulation noise.
[Implications section] The section on implications and discussion: the extension to 'designing healthier online social environments' and 'forecasting collective behavior' rests on the assumption that MBTI-labeled LLM agents replicate human personality-driven cooperation patterns in iterated PD. No calibration against human experimental data on personality and network IPD is provided, which is load-bearing for any claim beyond pure simulation.
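The statistics requested in the first major comment can be sketched as follows: a per-condition mean and sample standard deviation across independent seeds, plus a standardized effect size between two conditions. This is an assumption about what such reporting could look like, not the authors' analysis. The helper names and the cooperation rates below are made-up placeholders, not results from the paper.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List, Tuple

def summarize(runs: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sample standard deviation) of cooperation rate per condition."""
    return {name: (mean(values), stdev(values)) for name, values in runs.items()}

def cohens_d(a: List[float], b: List[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Placeholder per-seed cooperation rates (illustrative only, not paper data).
runs = {
    "hub_placement": [0.5, 0.6, 0.7],
    "random_placement": [0.3, 0.4, 0.5],
}
summary = summarize(runs)
effect = cohens_d(runs["hub_placement"], runs["random_placement"])
```

With only a handful of seeds the standard deviation itself is noisy, so reporting the number of runs alongside these figures matters as much as the figures themselves.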

minor comments (2)

[Dyadic matrix construction] The description of how the 16x16 dyadic matrix entries are computed (e.g., exact prompt template, number of iterations per pair, and cooperation metric) should be expanded for reproducibility, as this baseline underpins all network comparisons.
[Figures] Figure captions for network visualizations and cooperation heatmaps could include explicit axis labels and legend definitions to improve clarity.
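For the first minor comment, one plausible cooperation metric is the percentage of rounds in which the actor cooperates, collected per (actor type, opponent type) pair. The sketch below assumes that definition and a simple move-log format; `cooperation_rate`, `dyadic_matrix`, and the example logs are hypothetical and do not reproduce the paper's prompts, iteration counts, or data.

```python
from typing import Dict, List, Tuple

def cooperation_rate(moves: List[str]) -> float:
    """Percentage of moves that are 'C' (cooperate) in one actor's move log."""
    return 100.0 * sum(move == "C" for move in moves) / len(moves)

def dyadic_matrix(logs: Dict[Tuple[str, str], List[str]]) -> Dict[str, Dict[str, float]]:
    """Build matrix[actor][opponent] from logs keyed by (actor type, opponent type)."""
    matrix: Dict[str, Dict[str, float]] = {}
    for (actor, opponent), moves in logs.items():
        matrix.setdefault(actor, {})[opponent] = cooperation_rate(moves)
    return matrix

# Made-up five-round logs for one pairing, seen from each side (not paper data).
logs = {
    ("INTJ", "INFJ"): ["C", "D", "D", "D", "D"],
    ("INFJ", "INTJ"): ["C", "C", "D", "C", "D"],
}
matrix = dyadic_matrix(logs)
```

The matrix is asymmetric by construction, which matches the row (actor) versus column (opponent) reading of the heatmaps.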

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our manuscript. We address each of the major comments point by point below, and we are committed to improving the paper accordingly.

Point-by-point responses

Referee: [Abstract and stress tests section] Abstract and the section describing stress tests and ablation studies: the claim of 'extensive stress tests' and robustness is load-bearing for the central assertion that macro outcomes are not predictable from the dyadic matrix alone, yet no quantitative effect sizes, cooperation-rate differences with standard deviations, error bars, or explicit controls for LLM stochasticity (e.g., temperature, number of independent runs, or seed averaging) are reported. This prevents assessment of whether observed topology and placement effects exceed simulation noise.

Authors: We acknowledge the validity of this observation. Although the manuscript describes performing stress tests across multiple LLM architectures, network sizes, random seeds, and ablation studies, it does not provide the quantitative details such as effect sizes, standard deviations, or error bars for the reported cooperation rates. We will revise the relevant sections to include these statistical measures, explicitly report the number of independent runs, temperature settings used for the LLMs, and any seed averaging procedures. This will strengthen the evidence that the topology and placement effects are robust and exceed simulation noise.

revision: yes

Referee: [Implications section] The section on implications and discussion: the extension to 'designing healthier online social environments' and 'forecasting collective behavior' rests on the assumption that MBTI-labeled LLM agents replicate human personality-driven cooperation patterns in iterated PD. No calibration against human experimental data on personality and network IPD is provided, which is load-bearing for any claim beyond pure simulation.

Authors: We agree that direct applicability to human social networks requires calibration with human data, which is not provided in this work. Our study focuses on the behavior of LLM agents with assigned personality types in network games as a means to explore emergent collective behaviors in artificial agent systems. We will revise the implications and discussion section to more clearly delineate the scope as a simulation study of AI agents, note the absence of human calibration, and suggest that future work could involve such validation to extend implications to real-world online environments. This maintains the contribution while avoiding overstatement.

revision: partial

Circularity Check

0 steps flagged

No circularity: macro outcomes shown via direct network simulations, not reductions to fitted inputs or self-citations

The paper constructs a dyadic interaction matrix directly from LLM queries on MBTI personality pairs, then runs explicit network simulations of the Iterated Prisoner's Dilemma on small-world and scale-free graphs with varying personality placements. The central demonstration—that collective cooperation is co-determined by topology and spatial distribution rather than dyadic matrix alone—follows from comparing simulation outputs across network conditions, with no equations, parameter fitting, or predictions that collapse back to the inputs by construction. No self-citation chains or uniqueness theorems are invoked as load-bearing; results are presented as empirical simulation outputs validated by ablation and robustness checks. The derivation is therefore self-contained as a comparative simulation study.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the assumption that MBTI labels produce stable, interpretable behavioral differences in LLM agents and that message-passing on networks faithfully captures social interaction dynamics; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

domain assumption: MBTI personality types induce consistent and distinguishable cooperative preferences in LLM agents across game iterations.
Invoked when constructing the dyadic interaction matrix and when attributing network-level outcomes to personality distribution.

domain assumption: Network topology and agent placement can be varied independently of the underlying LLM policy.
Required to isolate the claimed co-determination effect.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

macro-level cooperative outcomes are not predictable from dyadic interactions alone; they are co-determined by the network's connectivity and the spatial distribution of personalities

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Pre-trained Models to Large Language Models: A Comprehensive Survey of AI-Driven Psychological Computing
cs.CY · 2026-03 · unverdicted · novelty 6.0

The paper introduces a new taxonomy that groups AI-driven psychological computing tasks by their underlying computational patterns into four categories and reviews over 300 works from the pre-trained model to LLM eras.
