SOM: Structured Opponent Modeling for LLM-based Agents via Structural Causal Model

Kaiqi Huang; Lei Cui; Likun Yang; Pei Xu; Shiyue Cao; Xiaotang Chen

arxiv: 2605.07301 · v1 · submitted 2026-05-08 · 💻 cs.AI

Shiyue Cao, Pei Xu, Likun Yang, Lei Cui, Xiaotang Chen, Kaiqi Huang

Pith reviewed 2026-05-11 01:16 UTC · model grok-4.3

classification: 💻 cs.AI

keywords: opponent modeling, structural causal model, LLM agents, multi-agent systems, strategic decision making, prediction accuracy


The pith

Structured Opponent Modeling builds an explicit causal graph of opponents before the LLM makes predictions, separating construction from reasoning to improve accuracy in multi-agent settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Structured Opponent Modeling to help LLM-based agents predict what other agents will do during interactions. Current approaches combine modeling and prediction in one implicit step, which can falter when situations shift. SOM first uses a Structural Causal Model to map directed connections from opponents' observations to their actions, creating a clear graph. The LLM then follows those specific paths for its predictions rather than relying on general context. Experiments across several multi-agent benchmarks show this yields more accurate and stable results than existing methods.

Core claim

SOM is a two-stage framework where a Structural Causal Model captures directed links between opponents' observations and actions to produce an explicit opponent representation, after which the LLM performs structured reasoning along the derived pathways to improve prediction accuracy and stability over entangled implicit approaches.

What carries the argument

The Structural Causal Model, a graph that represents directed dependencies among variables, which explicitly links opponents' observations to actions and supplies clear reasoning paths for the LLM during prediction.
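
As a rough illustration of the idea above, the following minimal sketch stores an opponent model as a directed graph and lists every path from an observation node to the action node, so that the paths can be pasted into an LLM prompt. The node names, the `build_graph`, `paths_to`, and `render_pathways` helpers, and the plain-text rendering are all assumptions made for illustration; the paper's actual graph format and prompting scheme are not described on this page.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an adjacency list from (parent, child) pairs."""
    graph = defaultdict(list)
    for parent, child in edges:
        graph[parent].append(child)
    return graph


def paths_to(graph, source, target, _prefix=None):
    """Return every directed simple path from source to target."""
    prefix = (_prefix or []) + [source]
    if source == target:
        return [prefix]
    found = []
    for child in graph.get(source, []):
        if child not in prefix:  # skip nodes already on the path (cycle guard)
            found.extend(paths_to(graph, child, target, prefix))
    return found


def render_pathways(graph, observations, action):
    """Render one 'a -> b -> c' line per observation-to-action path."""
    lines = []
    for obs in observations:
        for path in paths_to(graph, obs, action):
            lines.append(" -> ".join(path))
    return "\n".join(lines)


# Hypothetical opponent model for a number-guessing game (names are illustrative).
edges = [
    ("other_players_previous_choices", "lowest_choice"),
    ("lowest_choice", "action"),
    ("historical_choices", "action"),
]
graph = build_graph(edges)
prompt_block = render_pathways(
    graph, ["other_players_previous_choices", "historical_choices"], "action"
)
print(prompt_block)
```

Keeping the graph as data and rendering the paths only at prediction time mirrors the separation of construction from prediction that the page describes: the same graph can be reused across rounds while the prompt changes.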

If this is right

Opponent predictions become more accurate because the LLM reasons along explicit causal pathways instead of entangled context.
Decision-making adapts better when interactions change because the model structure remains separate from the current prediction step.
Performance exceeds that of prior LLM reasoning baselines across diverse multi-agent test environments.
Strategic choices in games and other interactive settings gain reliability from the added structure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The separation of model construction from prediction may apply to other LLM agent tasks where causal relations among observed variables matter.
If the graph can be updated online, agents could maintain useful models even as opponents alter their strategies mid-interaction.
Tracing decisions back through the graph offers a potential way to explain why an agent chose one action over another.

Load-bearing premise

A Structural Causal Model can be built reliably from observed interactions so that it reflects the actual cause-and-effect links between what opponents see and what they do, and the LLM can follow those links without introducing new mistakes of its own.

What would settle it

A controlled multi-agent benchmark run where predictions made by following the SCM pathways prove less accurate or less stable than predictions made by a baseline LLM using only implicit contextual reasoning.
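
The check described above can be made concrete with a small sketch. It assumes that accuracy is the fraction of correctly predicted opponent actions in a run and that stability is measured as the population standard deviation of accuracy across runs; both definitions, and the function names, are assumptions, since the page does not state the paper's metrics. The numbers in the usage lines are invented placeholders, not results from the paper.

```python
from statistics import mean, pstdev


def accuracy(predicted, actual):
    """Fraction of rounds in which the predicted action matched the real one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def summarize(run_accuracies):
    """Mean accuracy and its spread (population std dev) across runs."""
    return {"mean": mean(run_accuracies), "spread": pstdev(run_accuracies)}


def claim_is_undermined(scm_runs, baseline_runs):
    """True if SCM-guided prediction is less accurate or less stable."""
    scm, base = summarize(scm_runs), summarize(baseline_runs)
    return scm["mean"] < base["mean"] or scm["spread"] > base["spread"]


# Placeholder per-run accuracies, invented purely to exercise the functions.
print(accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"]))
print(claim_is_undermined([0.5, 0.5], [0.75, 0.25]))
```

A raw comparison like this ignores run-to-run noise, so in practice it would be paired with a significance test before drawing any conclusion.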

Figures

Figures reproduced from arXiv: 2605.07301 by Kaiqi Huang, Lei Cui, Likun Yang, Pei Xu, Shiyue Cao, Xiaotang Chen.

**Figure 2.** Illustration of the opponent modeling pipeline of SOM. SOM operates in two explicit stages. First, it constructs the ...

**Figure 3.** Action prediction deviation and win rate over ...

**Figure 4.** Win Rate against different opponents in Undercover game. The performance of SOM and baseline methods is evaluated ...

**Figure 1.** A visualization of the causal graph generated by SOM for an opponent in the Guessing 0.8 of the ...

**Figure 2.** A visualization of the causal graph generated by SOM for an opponent in the Survival Auction ...

Original abstract

Accurately predicting opponents' behavior from interactions is a fundamental capability for large language model (LLM)-based agents in multi-agent and game-theoretic environments. Existing approaches often entangle opponent modeling with prediction, relying on implicit contextual reasoning and limiting adaptability in dynamic interactions. To this end, we propose Structured Opponent Modeling (SOM), a two-stage opponent modeling framework that distinctly separates opponent model construction and opponent prediction. At the construction stage, SOM employs a Structural Causal Model (SCM), a graph-based formalism for representing dependencies among variables, to capture directed links between opponents' observations and actions, yielding an explicit and structured opponent representation. At the prediction stage, the LLM performs structured reasoning along clear pathways derived from the SCM, improving both prediction accuracy and stability. Extensive experiments on diverse multi-agent benchmarks demonstrate that SOM consistently outperforms state-of-the-art LLM-based reasoning baselines, enabling more accurate and adaptable strategic decision-making in complex and dynamic multi-agent interactions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Structured Opponent Modeling (SOM), a two-stage framework for LLM-based agents in multi-agent environments. The construction stage builds a Structural Causal Model (SCM) to explicitly represent directed dependencies between opponents' observations and actions derived from interaction data. The prediction stage then has the LLM perform structured reasoning along the SCM pathways rather than relying on implicit contextual reasoning. The central claim is that this separation yields more accurate and stable opponent predictions, with extensive experiments on diverse multi-agent benchmarks showing consistent outperformance over state-of-the-art LLM-based reasoning baselines.

Significance. If the SCM construction reliably recovers true causal structure and the structured reasoning step demonstrably improves predictions, the framework would offer a more interpretable and potentially more robust alternative to purely implicit opponent modeling in dynamic multi-agent settings. The explicit separation of construction and prediction stages is a clear conceptual contribution that could influence future work on causal representations in LLM agents.

major comments (2)

[Construction stage description] The description of the construction stage does not specify the procedure for inferring the directed edges of the SCM from observational interaction data, nor does it include identifiability checks, intervention-based validation, or handling of confounders, simultaneous moves, or partial observability. Without such details, it is unclear whether the recovered graph accurately reflects causal influences rather than spurious correlations, which directly affects the validity of the subsequent claim that LLM reasoning along these pathways improves accuracy.
[Experiments section] The experimental results section asserts that SOM 'consistently outperforms state-of-the-art LLM-based reasoning baselines' on 'diverse multi-agent benchmarks' but provides no information on the specific benchmarks, baselines, metrics (e.g., prediction accuracy, adaptability measures), number of runs, statistical significance tests, or controls for confounds. This absence makes it impossible to verify whether the data support the central performance claim.
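
One way the requested significance testing could look is a paired sign-flip permutation test on per-seed scores, sketched below. The choice of test, the function name, and the integer scores in the usage line are assumptions for illustration; nothing on this page says which test, if any, the authors used.

```python
from itertools import product


def paired_sign_flip_p_value(method_scores, baseline_scores):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis that both methods perform alike, each
    per-seed difference is equally likely to be positive or negative,
    so every sign pattern is enumerated (2**n patterns, fine for small n).
    """
    if len(method_scores) != len(baseline_scores) or not method_scores:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
    return extreme / total


# Four seeds where the method wins by one point each time (invented data).
print(paired_sign_flip_p_value([3, 3, 3, 3], [2, 2, 2, 2]))  # 0.125
```

The usage line also shows why the number of runs matters: with only four paired seeds the smallest attainable two-sided p-value is 2/16 = 0.125, so no outcome could reach the conventional 0.05 threshold.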

minor comments (1)

[Abstract] The abstract would benefit from a concise statement of how the SCM is constructed (e.g., learning algorithm or assumptions) to allow readers to immediately assess the framework's feasibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will incorporate the suggested clarifications into the revised manuscript.

point-by-point responses

Referee: [Construction stage description] The description of the construction stage does not specify the procedure for inferring the directed edges of the SCM from observational interaction data, nor does it include identifiability checks, intervention-based validation, or handling of confounders, simultaneous moves, or partial observability. Without such details, it is unclear whether the recovered graph accurately reflects causal influences rather than spurious correlations, which directly affects the validity of the subsequent claim that LLM reasoning along these pathways improves accuracy.

Authors: We agree that the submitted manuscript presents the construction stage at a high level without specifying the edge-inference procedure or addressing identifiability, confounders, simultaneous moves, and partial observability. In the revision we will expand the relevant section to describe the exact method (conditional-independence tests combined with domain-specific temporal ordering on interaction logs), state the identifiability assumptions, discuss potential confounders and how they are mitigated via proxy variables, and explain how simultaneous moves are handled by imposing a canonical ordering derived from the environment. These additions will make the causal claims more transparent and verifiable.
Revision: yes
Referee: [Experiments section] The experimental results section asserts that SOM 'consistently outperforms state-of-the-art LLM-based reasoning baselines' on 'diverse multi-agent benchmarks' but provides no information on the specific benchmarks, baselines, metrics (e.g., prediction accuracy, adaptability measures), number of runs, statistical significance tests, or controls for confounds. This absence makes it impossible to verify whether the data support the central performance claim.

Authors: We acknowledge that the experimental section in the current version lacks the requested specifics. The revised manuscript will include: (i) explicit names and descriptions of all benchmarks, (ii) a complete list of baselines with citations, (iii) the precise metrics (prediction accuracy, stability, adaptability), (iv) the number of independent runs and random seeds, (v) statistical significance tests with p-values, and (vi) controls for prompt and model-version confounds. These details will allow readers to fully assess the performance claims.
Revision: yes
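
The edge-inference recipe named in the first response (conditional-independence tests combined with temporal ordering) can be sketched in a heavily simplified form. The version below is an assumption-laden stand-in, not the authors' procedure: it treats variables as numeric, uses a Fisher-z test on (partial) Pearson correlation, conditions on at most one earlier variable, and only allows edges from earlier to later variables. The function names, the `alpha` threshold, and the synthetic chain data are all hypothetical. It also assumes no variable is constant or perfectly collinear with another.

```python
import math
from itertools import product


def pearson(u, v):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def ci_p_value(data, u, v, given=None):
    """Two-sided Fisher-z p-value for 'u independent of v (given one variable)'."""
    r = pearson(data[u], data[v])
    k = 0
    if given is not None:
        ruz = pearson(data[u], data[given])
        rvz = pearson(data[v], data[given])
        # First-order partial correlation of u and v given one variable.
        r = (r - ruz * rvz) / math.sqrt((1 - ruz ** 2) * (1 - rvz ** 2))
        k = 1
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(len(data[u]) - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def infer_edges(data, temporal_order, alpha=0.01):
    """Keep edge u -> v (u earlier than v) only if dependence survives every test."""
    edges = []
    for j, v in enumerate(temporal_order):
        earlier = temporal_order[:j]
        for u in earlier:
            conditioning = [None] + [z for z in earlier if z != u]
            if all(ci_p_value(data, u, v, z) < alpha for z in conditioning):
                edges.append((u, v))
    return edges


# Synthetic chain observation -> intermediate -> action with balanced noise terms.
rows = list(product((-1, 1), repeat=3)) * 8
data = {
    "observation": [x for x, _, _ in rows],
    "intermediate": [x + e1 for x, e1, _ in rows],
    "action": [x + e1 + e2 for x, e1, e2 in rows],
}
print(infer_edges(data, ["observation", "intermediate", "action"]))
```

On this synthetic chain the direct observation-to-action edge is dropped because the two become uncorrelated once the intermediate variable is accounted for. Real interaction logs would raise exactly the issues the referee lists (hidden confounders, simultaneous moves, partial observability), none of which this sketch addresses.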

Circularity Check

0 steps flagged

No circularity: framework validated by external benchmarks

full rationale

The paper introduces SOM as a two-stage separation of SCM-based opponent model construction from LLM structured reasoning along derived pathways. No equations, fitted parameters, or self-citations are shown that reduce the claimed performance gains to a definitional identity or input by construction. The central claims rest on empirical results from diverse multi-agent benchmarks, which are independent of the framework's internal definitions. This is the standard non-circular case for a new modeling proposal whose value is demonstrated externally rather than derived tautologically.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the assumption that opponent behaviors admit an explicit causal graph representation that LLMs can usefully follow; this is a domain assumption rather than a derived result.

axioms (1)

domain assumption: Structural Causal Models can be constructed from interaction data to accurately represent directed dependencies between opponents' observations and actions.
Invoked in the construction stage of the framework without further justification in the abstract.

invented entities (1)

Structured Opponent Modeling (SOM) two-stage framework · no independent evidence
purpose: To separate explicit causal model construction from LLM-based prediction for improved opponent behavior forecasting.
New method introduced in the paper; no independent evidence outside the claimed experiments is provided.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "SOM employs a Structural Causal Model (SCM), a graph-based formalism for representing dependencies among variables, to capture directed links between opponents' observations and actions"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "the LLM performs structured reasoning along clear pathways derived from the SCM"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, et al. 2024. Graph of thoughts: Solving elaborate problems with large language models. InProceedings of the AAAI conference on artificial intelligence, Vol. 38. 17682–17690

work page 2024

[2] [2]

Federico Bianchi, Patrick John Chia, Mert Yuksekgonul, Jacopo Tagliabue, Dan Jurafsky, and James Zou. 2024. How well can llms negotiate? negotiationarena platform and analysis.arXiv preprint arXiv:2402.05863(2024)

work page arXiv 2024

[3] [3]

Micah Carroll, Rohin Shah, Mark K Ho, Tom Griffiths, Sanjit Seshia, Pieter Abbeel, and Anca Dragan. 2019. On the utility of learning about humans for human-ai coordination.Advances in neural information processing systems32 (2019)

work page 2019

[4] [4]

Pei Chen, Boran Han, and Shuai Zhang. 2024. CoMM: Collaborative multi-agent, multi-reasoning-path prompting for complex problem solving.arXiv preprint arXiv:2404.17729(2024)

work page arXiv 2024

[5] [5]

Haobo Fu, Ye Tian, Hongxiang Yu, Weiming Liu, Shuang Wu, Jiechao Xiong, Ying Wen, Kai Li, Junliang Xing, Qiang Fu, et al. 2022. Greedy when sure and conservative when uncertain about the opponents. InInternational Conference on Machine Learning. PMLR, 6829–6848

work page 2022

[6] [6]

Zhenyu Guan, Xiangyu Kong, Fangwei Zhong, and Yizhou Wang. 2024. Richelieu: Self-evolving llm-based agents for ai diplomacy.Advances in Neural Information Processing Systems37 (2024), 123471–123497

work page 2024

[7] [7]

Jiaxian Guo, Bo Yang, Paul Yoo, Bill Yuchen Lin, Yusuke Iwasawa, and Yutaka Matsuo. 2023. Suspicion-agent: Playing imperfect information games with theory of mind aware gpt-4.arXiv preprint arXiv:2309.17277(2023)

work page arXiv 2023

[8] [8]

2023.Large language models as simulated economic agents: What can we learn from homo silicus?Technical Report

John J Horton. 2023.Large language models as simulated economic agents: What can we learn from homo silicus?Technical Report. National Bureau of Economic Research

work page 2023

[9] [9]

Yudong Hu, Congying Han, Haoran Li, and Tiande Guo. 2023. Modeling opponent learning in multiagent repeated games.Applied Intelligence53, 13 (2023), 17194– 17210

work page 2023

[10] [10]

Shima Imani, Liang Du, and Harsh Shrivastava. 2023. Mathprompter: Mathe- matical reasoning using large language models.arXiv preprint arXiv:2303.05398 (2023)

work page arXiv 2023

[11] [11]

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation.ACM computing surveys55, 12 (2023), 1–38

work page 2023

[12] [12]

Yuheng Jing, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, and Jian Cheng. 2026. An Open-Ended Learning Framework for Opponent Modeling. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 23222–23230

work page 2026

[13] [13]

Dong Ki Kim, Miao Liu, Matthew D Riemer, Chuangchuang Sun, Marwa Abdulhai, Golnaz Habibi, Sebastian Lopez-Cot, Gerald Tesauro, and Jonathan How. 2021. A policy gradient algorithm for learning to learn in multiagent reinforcement learning. InInternational Conference on Machine Learning. PMLR, 5541–5550

work page 2021

[14] [14]

Alain Ledoux. 1981. Concours résultats complets.Les victimes se sont plu à jouer le14 (1981), 10–11

work page 1981

[15] [15]

Nian Li, Chen Gao, Mingyu Li, Yong Li, and Qingmin Liao. 2024. Econagent: large language model-empowered agents for simulating macroeconomic activities. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 15523–15536

work page 2024

[16] [16]

Naiqi Li, Peiyuan Liu, Zheng Liu, Tao Dai, Yong Jiang, and Shu-Tao Xia. 2025. Logic-of-Thought: Empowering Large Language Models with Logic Programs for Solving Puzzles in Natural Language.arXiv preprint arXiv:2505.16114(2025). https://arxiv.org/abs/2505.16114

work page arXiv 2025

[17] [17]

Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2023. Lost in the middle: How language models use long contexts.arXiv preprint arXiv:2307.03172(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[18] [18]

Christopher Lu, Timon Willi, Christian A Schroeder De Witt, and Jakob Foerster

work page

[19] [19]

InInternational Conference on Machine Learning

Model-free opponent shaping. InInternational Conference on Machine Learning. PMLR, 14398–14411

work page

[20] [20]

Shaoguang Mao, Yuzhe Cai, Yan Xia, Wenshan Wu, Xun Wang, Fengyi Wang, Tao Ge, and Furu Wei. 2024. ALYMPICS: LLM Agents Meet Game Theory–Exploring Strategic Decision-Making with AI Agents.arXiv preprint arXiv:2311.03220 (2024)

work page arXiv 2024

[21] [21]

Meta Fundamental AI Research Diplomacy Team (FAIR), Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, et al . 2022. Human-level play in the game of Diplomacy by combining language models with strategic reasoning.Science378, 6624 (2022), 1067–1074

work page 2022

[22] [22]

Samer Nashed and Shlomo Zilberstein. 2022. A survey of opponent modeling in adversarial domains.Journal of Artificial Intelligence Research73 (2022), 277–327

work page 2022

[23] [23]

Georgios Papoudakis, Filippos Christianos, and Stefano Albrecht. 2021. Agent modelling under partial observability for deep reinforcement learning.Advances in Neural Information Processing Systems34 (2021), 19210–19222

work page 2021

[24] [24]

J. Pearl. 2000.Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, NY, USA

work page 2000

[25] [25]

Sumedh Rasal. 2024. Llm harmony: Multi-agent communication for problem solving.arXiv preprint arXiv:2401.01312(2024)

work page arXiv 2024

[26] [26]

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems 36 (2023), 8634–8652

[27]

Kai Sun, Yifan Ethan Xu, Hanwen Zha, Yue Liu, and Xin Luna Dong. 2023. Head-to-tail: How knowledgeable are large language models (llm)? AKA will llms replace knowledge graphs? arXiv preprint arXiv:2308.10168 (2023)

[28]

William Vickrey. 1961. Counterspeculation, auctions, and competitive sealed tenders. The Journal of finance 16, 1 (1961), 8–37

[29]

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2022. Self-Consistency Improves Chain of Thought Reasoning in Language Models. arXiv preprint arXiv:2203.11171 (2022). https://arxiv.org/abs/2203.11171

[30]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022), 24824–24837

[31]

Shuang Wu, Liwen Zhu, Tao Yang, Shiwei Xu, Qiang Fu, Yang Wei, and Haobo Fu. 2024. Enhance reasoning for large language models in the game werewolf. arXiv preprint arXiv:2402.02330 (2024)

[32]

Zhe Wu, Kai Li, Hang Xu, Yifan Zang, Bo An, and Junliang Xing. 2022. L2E: Learning to exploit your opponent. In 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8

[33]

Lin Xu, Zhiyuan Hu, Daquan Zhou, Hongyu Ren, Zhen Dong, Kurt Keutzer, See Kiong Ng, and Jiashi Feng. 2023. Magic: Investigation of large language model powered multi-agent in cognition, adaptability, rationality and collaboration. arXiv preprint arXiv:2311.08562 (2023)

[34]

Yuzhuang Xu, Shuo Wang, Peng Li, Fuwen Luo, Xiaolong Wang, Weidong Liu, and Yang Liu. 2023. Exploring large language models for communication games: An empirical study on werewolf. arXiv preprint arXiv:2309.04658 (2023)

[35]

Zelai Xu, Chao Yu, Fei Fang, Yu Wang, and Yi Wu. 2023. Language agents with reinforcement learning for strategic play in the werewolf game. arXiv preprint arXiv:2310.18940 (2023)

[36]

Likun Yang, Pei Xu, Shiyue Cao, Yongjian Ren, Xiaotang Chen, and Kaiqi Huang. 2025. Uncertainty-Aware Opponent Modeling for Deep Reinforcement Learning. In Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems. 2217–2225

[38]

Yaodong Yang and Jun Wang. 2020. An overview of multi-agent reinforcement learning from game theoretical perspective. arXiv preprint arXiv:2011.00583 (2020)

[39]

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of thoughts: Deliberate problem solving with large language models. Advances in neural information processing systems 36 (2023), 11809–11822

[40]

Xiaopeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, and Zongqing Lu. 2022. Model-based opponent modeling. Advances in Neural Information Processing Systems 35 (2022), 28208–28221

[41]

XiaoPeng Yu, Wanpeng Zhang, and Zongqing Lu. 2025. LLM-Based Explicit Models of Opponents for Multi-Agent Games. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 892–911

[42]

Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, et al. 2024. Proagent: building proactive cooperative agents with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 17591–17599

[43]

Yilun Zhang, Yujun Cai, Yifei Li, and Yaodong Yang. 2024. On the Diagram of Thought. arXiv preprint arXiv:2409.10038 (2024). https://arxiv.org/abs/2409.10038

[44]

Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, and Furu Wei. 2024. LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models. arXiv preprint arXiv:2404.01230 (2024)

[45]

Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Man Lan, and Furu Wei. 2024. K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning. arXiv preprint arXiv:2402.01521 (2024)

[46]

Luisa Zintgraf, Sam Devlin, Kamil Ciosek, Shimon Whiteson, and Katja Hofmann. 2021. Deep interactive bayesian reinforcement learning via meta-learning. arXiv preprint arXiv:2101.03864 (2021)

Appendix A Environment

This section provides a detailed description of the experimental environments used for our evaluation, including the game rules, specific confi...

other_players_previous_choices -> lowest_choice -> action; historical_choices -> action
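The path specification `other_players_previous_choices -> lowest_choice -> action; historical_choices -> action` can be read as a set of directed edges in a causal graph. The following is a minimal sketch of that reading, assuming `->` separates consecutive nodes in a chain and `;` separates chains. The function name `parse_scm_paths` and the parent-set representation are hypothetical illustrations, not taken from the paper.

```python
from collections import defaultdict


def parse_scm_paths(spec: str) -> dict[str, set[str]]:
    """Parse a spec like 'a -> b -> c; d -> c' into a child -> parents mapping.

    Assumption (not from the paper): '->' links consecutive nodes in a chain,
    and ';' separates independent chains.
    """
    parents: dict[str, set[str]] = defaultdict(set)
    for chain in spec.split(";"):
        # Split one chain into node names, dropping empty fragments.
        nodes = [name.strip() for name in chain.split("->") if name.strip()]
        # Each adjacent pair (src, dst) is one directed edge src -> dst.
        for src, dst in zip(nodes, nodes[1:]):
            parents[dst].add(src)
    return dict(parents)


spec = (
    "other_players_previous_choices -> lowest_choice -> action; "
    "historical_choices -> action"
)
graph = parse_scm_paths(spec)

# Print the direct causes of each variable in a stable order.
for child in sorted(graph):
    print(child, "<-", sorted(graph[child]))
```

Under this reading, `action` has two direct parents (`lowest_choice` and `historical_choices`), while `other_players_previous_choices` influences `action` only through `lowest_choice`.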
