Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia

Davi Bastos Costa; Renato Vicente

arxiv: 2509.23023 · v3 · pith:43ALCLR2new · submitted 2025-09-27 · 💻 cs.AI

Pith reviewed 2026-05-18 13:21 UTC · model grok-4.3

classification 💻 cs.AI

keywords Mini-Mafia · large language models · social deduction · multi-agent interaction · deception detection · analytical prediction · win-rate model · benchmark

The pith

An analytical formula with three parameters per model predicts mafia win rates across all language model combinations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Mini-Mafia, a four-player simplification of the social deduction game that reduces multi-turn play to one critical exchange. It shows that the mafia win probability p follows the formula logit(p) = v × (m - d), where m, d, and v are model-specific measures of deception, disclosure, and detection. Bayesian inference from gameplay data estimates these three numbers per model, allowing every possible three-model matchup to be predicted without direct testing. A sympathetic reader would care because the result replaces exhaustive empirical trials with a compact theoretical account of how agent capabilities shape group outcomes in interactive settings.

Core claim

In the Mini-Mafia setting the mafia win-rate p is given by the analytical expression logit(p) = v × (m - d), where the parameters m, d, and v quantify the mafioso's deception, the detective's disclosure, and the villager's detection. Bayesian inference from observed gameplay yields these parameters for each model, allowing accurate prediction of all tournament outcomes using only 3I parameters for I models and yielding a 76.6 percent reduction in Brier score relative to a random baseline in cross-validation.
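As a minimal sketch of the formula above, the logit can be inverted with the logistic function to recover the win probability. The function name `mafia_win_prob` and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math


def mafia_win_prob(m: float, d: float, v: float) -> float:
    """Invert logit(p) = v * (m - d) to obtain the mafia win probability p."""
    logit = v * (m - d)
    # Logistic (inverse-logit) function.
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical parameter values, chosen only for illustration.
p_even = mafia_win_prob(m=1.0, d=1.0, v=2.0)  # deception equals disclosure
p_edge = mafia_win_prob(m=1.5, d=1.0, v=2.0)  # mafioso has an edge
```

Two properties follow directly from the functional form: when m equals d the predicted win rate is 0.5 whatever the value of v, and when v is 0 it is 0.5 whatever the values of m and d. The villager's parameter therefore acts as a scale on how strongly the gap between deception and disclosure moves the outcome.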

What carries the argument

The logit-linear formula logit(p) = v × (m - d) that collapses the multi-turn game to the outcome probability of one critical exchange among mafioso, detective, and villager.

If this is right

For any collection of I models, all I^3 possible assignments of models to the mafioso, detective, and villager roles can be predicted from only 3I parameters.
Models can be ranked by role-specific strengths, such as Grok 3 Mini as the strongest detector and Claude Sonnet 4 as near-random in detection.
The Mini-Mafia Benchmark supplies a data-efficient method to evaluate language-model interactive capabilities without exhaustive matchup simulation.
The analytical description supports principled comparisons that isolate deception, disclosure, and detection contributions to collective results.
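The parameter count in the first point can be illustrated with a short sketch that fills the full table of role assignments from a small parameter dictionary. The model names and the values of m, d, and v below are placeholders, not estimates reported in the paper.

```python
import math
from itertools import product

# Hypothetical per-model parameters (m, d, v); names and values are placeholders.
params = {
    "model_a": {"m": 0.8, "d": 0.3, "v": 1.5},
    "model_b": {"m": 0.2, "d": 0.9, "v": 0.7},
    "model_c": {"m": 0.5, "d": 0.5, "v": 2.0},
}


def predict_all(params):
    """Predict the mafia win rate for every (mafioso, detective, villager) triple."""
    table = {}
    for maf, det, vil in product(params, repeat=3):
        logit = params[vil]["v"] * (params[maf]["m"] - params[det]["d"])
        table[(maf, det, vil)] = 1.0 / (1.0 + math.exp(-logit))
    return table


predictions = predict_all(params)
n_models = len(params)
n_parameters = 3 * n_models    # 3I fitted scalars
n_matchups = len(predictions)  # I**3 predicted role assignments
```

With three models this gives 9 parameters covering 27 role assignments; the gap widens quickly as I grows, since one count is linear and the other cubic.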

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the single-exchange reduction holds, analogous critical-point analyses could be applied to other multi-agent tasks such as negotiation or debate.
The three parameters might guide selection of model teams for tasks that require complementary social skills.
Extending the same fitting procedure to larger player counts or altered game rules would test how far the low-dimensional description remains valid.
The framework suggests that many collective outcomes in language-model social games may be governed by a few role-specific parameters combined in a simple form rather than by higher-order emergent interactions.

Load-bearing premise

The full dynamics of the game reduce without loss of accuracy to a single critical exchange whose outcome probability is captured exactly by the form v × (m - d) in logit space.

What would settle it

A new collection of repeated games among previously unseen model triples in which the observed mafia win frequencies deviate substantially from the probabilities predicted by the fitted parameters m, d, and v would falsify the claim.
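One way such a check could be set up is sketched below: compare the observed mafia wins for a held-out triple with the binomial expectation under the fitted parameters. The helper names, the example numbers, and the threshold of 3 are assumptions made for illustration; the paper's own test procedure is not described here.

```python
import math


def predicted_prob(m, d, v):
    """Mafia win probability implied by logit(p) = v * (m - d)."""
    return 1.0 / (1.0 + math.exp(-v * (m - d)))


def z_score(wins, games, p):
    """Standardized gap between observed mafia wins and the binomial expectation."""
    expected = games * p
    std = math.sqrt(games * p * (1.0 - p))
    return (wins - expected) / std


# Hypothetical held-out triple: fitted parameters and an observed record.
p_hat = predicted_prob(m=1.0, d=0.5, v=1.2)
z = z_score(wins=70, games=100, p=p_hat)
flagged = abs(z) > 3.0  # illustrative threshold, not from the paper
```

This sketch treats the fitted parameters as exact. A fuller check would also propagate the posterior uncertainty in m, d, and v, which widens the range of outcomes that count as consistent with the model.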

Figures

Figures reproduced from arXiv: 2509.23023 by Davi Bastos Costa, Renato Vicente.

**Figure 2.** Detect performance: (a) aggregated scores across all backgrounds, Eq.

**Figure 3.** Disclose performance: (a) aggregated scores across all backgrounds, Eq.

**Figure 4.** Complete mafioso performance results across all detective and villager backgrounds.

**Figure 5.** Complete villager performance results across all mafioso and detective backgrounds.

**Figure 6.** Complete detective performance results across all mafioso and villager backgrounds.

**Figure 7.** Comparison of methodological approaches for deceive, detect and disclose capability

Original abstract

Large language models are increasingly deployed in multi-agent settings whose outcomes hinge on social intelligence, motivating evaluations of their interactive capabilities; yet existing studies remain overwhelmingly empirical, leaving us without a theoretical understanding of how agent interactions determine collective outcomes. To address this, we introduce Mini-Mafia, a four-player simplification of the social deduction game Mafia in which a fixed night phase reduces the game to a single critical exchange among a mafioso, a detective, and a villager. In this setting, we show that the mafia win-rate p is predicted by the analytical formula logit(p) = v × (m - d), where m, d, and v represent the mafioso's deception, the detective's disclosure, and the villager's detection capabilities. We turn this analytical framework into the Mini-Mafia Benchmark, where Bayesian inference over gameplay data yields per-model estimates of the intrinsic parameters m, d, and v. For I models, only 3I parameters suffice to predict the outcomes of all I^3 tournament combinations; and in 5-fold cross-validation the formula achieves a 76.6% Brier-score reduction over a random baseline. The benchmark also reveals counterintuitive results: Grok 3 Mini is the strongest detector and GPT-5 Mini the strongest discloser, both ahead of DeepSeek V3.1, Claude Opus 4, and Claude Sonnet 4; while Claude Sonnet 4 is the weakest detector, near random chance. Together, these results show that Mini-Mafia, a simple but nontrivial multi-agent system, admits an analytical description and serves as a principled benchmark for language model interactions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper reduces LLM social deduction to three fitted parameters per model that predict tournament outcomes with solid cross-validation gains, but the exact functional form needs more justification to hold up.

The main thing here is that the authors have a compact analytical model for how LLMs handle deception and detection in a stripped-down Mafia game, using just three fitted parameters per model to cover all combinations, and the 5-fold cross-validation shows a 76.6% Brier improvement over baseline. That reduction from I^3 outcomes to 3I parameters is the core move, along with the specific rankings like Grok 3 Mini leading on detection and Claude Sonnet 4 near chance.

Referee Report

2 major / 2 minor

Summary. The paper introduces Mini-Mafia, a four-player simplification of the Mafia game reduced to a single critical exchange among a mafioso, a detective, and a villager. It claims that the mafia win-rate p follows the analytical formula logit(p) = v × (m - d), where m, d, and v are per-model parameters for deception, disclosure, and detection. Bayesian inference on gameplay data yields these parameters; for I models only 3I scalars suffice to predict all I^3 tournament outcomes. 5-fold cross-validation shows a 76.6% Brier-score reduction over a random baseline, and the benchmark produces model rankings (e.g., Grok 3 Mini strongest detector, GPT-5 Mini strongest discloser, Claude Sonnet 4 weakest detector).
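The Brier-score reduction quoted in the summary can be made concrete with a small sketch. It assumes the random baseline is a constant prediction of 0.5 for every game, which is a guess about the paper's setup rather than something stated here; the outcomes and probabilities are toy values.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def brier_reduction(model_probs, baseline_probs, outcomes):
    """Fractional reduction in Brier score relative to a baseline predictor."""
    base = brier_score(baseline_probs, outcomes)
    return 1.0 - brier_score(model_probs, outcomes) / base


# Toy data: 1 = mafia win, 0 = town win. Values are illustrative only.
outcomes = [1, 0, 1, 1, 0]
model_probs = [0.9, 0.2, 0.8, 0.7, 0.1]
baseline_probs = [0.5] * len(outcomes)  # assumed form of a "random" baseline

reduction = brier_reduction(model_probs, baseline_probs, outcomes)
```

Under that assumption the baseline Brier score is 0.25 on any set of binary outcomes, so a 76.6% reduction would correspond to a score of about 0.25 × (1 - 0.766) ≈ 0.0585. If the paper scores aggregated win rates rather than individual games, or uses a different baseline, the absolute numbers would differ.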

Significance. If the result holds, the work supplies a rare analytical, parameter-efficient account of LLM social intelligence in a multi-agent setting, moving beyond purely empirical evaluations. The cross-validation evidence, the reduction from I^3 to 3I parameters, and the falsifiable predictions constitute clear strengths. The counterintuitive capability rankings further illustrate the benchmark's potential utility for the field.

major comments (2)

[Abstract / analytical framework] Abstract and the section presenting the analytical framework: the formula logit(p) = v × (m - d) is asserted to capture the outcome of the full game via a single critical exchange, yet no derivation steps or explicit checks confirming the absence of residual pairwise interactions or higher-order terms across model triples are provided. This assumption is load-bearing for the claim that 3I parameters predict every I^3 combination without loss of accuracy.
[Cross-validation results] Cross-validation results (5-fold): while the 76.6% Brier reduction is reported, the manuscript does not include residual diagnostics or comparisons against models that add interaction terms to test whether the exact multiplicative logit form is misspecified versus merely adequate within the observed range. Such a check is required to substantiate that no model-pair-specific deviations exist.

minor comments (2)

Notation for the parameters m, d, v would benefit from an early explicit table or equation block defining each quantity and its estimation procedure.
The discussion of counterintuitive rankings (Grok 3 Mini, GPT-5 Mini, Claude variants) could be expanded with brief qualitative examples of the observed behaviors to aid interpretability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and describe the revisions we will make to strengthen the manuscript.

Referee: [Abstract / analytical framework] Abstract and the section presenting the analytical framework: the formula logit(p) = v × (m - d) is asserted to capture the outcome of the full game via a single critical exchange, yet no derivation steps or explicit checks confirming the absence of residual pairwise interactions or higher-order terms across model triples are provided. This assumption is load-bearing for the claim that 3I parameters predict every I^3 combination without loss of accuracy.

Authors: We agree that the current presentation would benefit from explicit derivation steps. The formula is derived from the game's reduction to a single decisive exchange in which the mafioso's deception advantage is modulated by the detective's disclosure and the villager's detection; under the assumption that other interactions are negligible due to the fixed night phase and role assignments, the net logit effect takes the multiplicative form v × (m - d). In the revision we will add a dedicated derivation subsection that walks through this reasoning from the game rules, together with post-hoc residual analyses across held-out model triples to verify that higher-order terms do not materially improve fit or predictive accuracy. revision: yes
Referee: [Cross-validation results] Cross-validation results (5-fold): while the 76.6% Brier reduction is reported, the manuscript does not include residual diagnostics or comparisons against models that add interaction terms to test whether the exact multiplicative logit form is misspecified versus merely adequate within the observed range. Such a check is required to substantiate that no model-pair-specific deviations exist.

Authors: We accept that residual diagnostics and explicit misspecification tests against richer models are necessary to substantiate the claim. In the revised manuscript we will include (i) residual plots and summary statistics from the Bayesian posterior predictive checks and (ii) a direct comparison of the base model against versions augmented with pairwise interaction terms (e.g., m·d, m·v). These additions will quantify whether the simple multiplicative form is adequate or whether systematic deviations appear for particular model combinations. revision: yes
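The comparison proposed in item (ii) could look roughly like the sketch below, which adds a single interaction term to the base logit and compares binomial log-likelihoods. The coefficient `beta_md`, the record format, and all numbers are hypothetical; the rebuttal does not specify how the augmented models would be parameterized or fitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def base_logit(m, d, v):
    return v * (m - d)


def augmented_logit(m, d, v, beta_md=0.0):
    """Base form plus one hypothetical interaction term beta_md * m * d."""
    return base_logit(m, d, v) + beta_md * m * d


def log_likelihood(records, logit_fn, **extra):
    """Binomial log-likelihood of (m, d, v, wins, games) records, up to a constant."""
    total = 0.0
    for m, d, v, wins, games in records:
        p = sigmoid(logit_fn(m, d, v, **extra))
        total += wins * math.log(p) + (games - wins) * math.log(1.0 - p)
    return total


# Toy records with made-up parameters and counts.
records = [
    (1.0, 0.5, 1.2, 66, 100),
    (0.4, 0.9, 0.8, 41, 100),
    (0.7, 0.7, 2.0, 52, 100),
]

ll_base = log_likelihood(records, base_logit)
ll_aug_zero = log_likelihood(records, augmented_logit, beta_md=0.0)
ll_aug = log_likelihood(records, augmented_logit, beta_md=0.1)
```

Setting the extra coefficient to zero recovers the base model, so the two are nested. A fitted nested model never has a lower in-sample likelihood than the base model, which is why the comparison is more informative on held-out data or with a criterion that penalizes the extra parameters.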

Circularity Check

0 steps flagged

No circularity: functional form is a modeling assumption validated by generalization on held-out data rather than a tautological fit.

full rationale

The paper presents logit(p) = v × (m - d) as an analytical formula that collapses the multi-turn game to a single critical exchange and estimates the three per-model scalars via Bayesian inference on observed gameplay. It then tests predictive accuracy on held-out tournament combinations through 5-fold cross-validation, reporting a 76.6% Brier-score reduction. Because the evaluation explicitly withholds data from parameter estimation and measures out-of-sample performance, the reported predictions are not equivalent to the inputs by construction. No self-citation chain, uniqueness theorem, or ansatz imported from prior author work is invoked to justify the functional form; the derivation chain therefore remains self-contained against the external empirical benchmark.
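The separation between fitting and evaluation that this rationale relies on can be sketched as a plain k-fold split. The function name, the seed, and the choice of 27 items are assumptions for illustration; whether the paper splits individual games or whole tournament combinations is not stated in this summary.

```python
import random


def k_fold_splits(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs that partition range(n_items)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 27 hypothetical tournament cells (for example, 3 models in 3 roles).
splits = list(k_fold_splits(27, k=5))
```

In each round the parameters would be estimated from the training indices only and scored on the test indices, so no held-out outcome contributes to the estimate used to predict it.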

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The framework rests on three fitted parameters per model plus the structural assumption that game outcomes reduce to the stated logit form; no new physical entities are postulated.

free parameters (3)

m (mafioso deception)
Intrinsic parameter estimated via Bayesian inference from gameplay data for each model's performance in the mafioso role.
d (detective disclosure)
Intrinsic parameter estimated via Bayesian inference from gameplay data for each model's performance in the detective role.
v (villager detection)
Intrinsic parameter estimated via Bayesian inference from gameplay data for each model's performance in the villager role.

axioms (2)

domain assumption: The four-player game with fixed night phase reduces to a single critical exchange among mafioso, detective, and villager.
This reduction is invoked to derive the analytical formula for win probability.
ad hoc to paper: Win probability follows exactly logit(p) = v × (m - d) with no additional interaction terms.
Presented as the predictive analytical formula without further derivation shown in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Paper passage: the mafia win-rate p is predicted by the analytical formula logit(p) = v × (m - d)

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Paper passage: For I models, only 3I parameters suffice to predict the outcomes of all I³ tournament combinations

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models
cs.CL 2025-11 unverdicted novelty 6.0

LLM moral robustness under persona role-play is largely determined by model family with Claude models most consistent, while susceptibility shows little family dependence.
