Revac: A Social Deduction Reasoning Agent
Pith reviewed 2026-05-10 01:51 UTC · model grok-4.3
The pith
A multi-module AI agent for social deduction games combines memory-based player profiling, social-graph analysis, and dynamic tone selection, and took first place in the Social Deduction track of the MindGames Arena competition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Revac-8 evolved from a simple two-stage reasoning system into a multi-module architecture that integrates memory-based player profiling, social-graph analysis of accusations and defenses, and dynamic tone selection for communication. It achieved first place in the Social Deduction track of the MindGames Arena competition.
What carries the argument
The argument rests on the multi-module architecture, which integrates memory-based player profiling to track player histories, social-graph analysis to examine accusation and defense patterns, and dynamic tone selection to adjust communication.
If this is right
- Memory of prior statements enables agents to build profiles that improve role inference over time.
- Social-graph analysis reveals consistent or suspicious interaction patterns that aid elimination decisions.
- Dynamic tone selection increases the agent's ability to gather information and influence group outcomes.
- These components together produce stronger results than simpler two-stage reasoning in deceptive environments.
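The social-graph idea in the list above can be illustrated with a minimal sketch. The paper's actual data structures are not described here, so everything below is an assumption: the `SocialGraph` class, its method names, and the player names are hypothetical, and the graph simply counts directed accusation and defense events.

```python
from collections import defaultdict


class SocialGraph:
    """Hypothetical directed graph of who accused or defended whom."""

    def __init__(self):
        # edges[kind][source][target] -> number of recorded interactions
        self.edges = {
            "accuse": defaultdict(lambda: defaultdict(int)),
            "defend": defaultdict(lambda: defaultdict(int)),
        }

    def record(self, kind, source, target):
        """Store one interaction; kind must be 'accuse' or 'defend'."""
        if kind not in self.edges:
            raise ValueError(f"unknown interaction kind: {kind}")
        self.edges[kind][source][target] += 1

    def accusations_received(self, player):
        """Total number of accusations aimed at a player."""
        return sum(
            targets.get(player, 0) for targets in self.edges["accuse"].values()
        )

    def mutual_defenders(self):
        """Pairs of players who have each defended the other at least once."""
        defend = self.edges["defend"]
        pairs = set()
        for a, targets in defend.items():
            for b in targets:
                if a != b and defend.get(b, {}).get(a, 0) > 0:
                    pairs.add(tuple(sorted((a, b))))
        return pairs


# Illustrative usage with made-up players and events.
graph = SocialGraph()
graph.record("accuse", "alice", "bob")
graph.record("accuse", "carol", "bob")
graph.record("defend", "bob", "dave")
graph.record("defend", "dave", "bob")
print(graph.accusations_received("bob"))
print(graph.mutual_defenders())
```

Counting mutual defenses is only one possible signal. Whether Revac-8 uses anything like it is not stated in the abstract.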
Where Pith is reading between the lines
- The same modular structure could be tested in other interactive domains that involve trust and misinformation, such as automated negotiation.
- Controlled experiments matching the agent against human teams would clarify whether competition performance generalizes beyond AI opponents.
- Adding deeper language models to the communication module might further close gaps in interpreting subtle human cues.
Load-bearing premise
Success against other AI agents in one specific competition demonstrates effective reasoning under uncertainty and deception that transfers to other settings or against human players.
What would settle it
Direct head-to-head games between Revac-8 and human players in Mafia, with recorded win rates and accuracy at identifying hidden roles.
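Recorded win rates from such games are only informative with an uncertainty estimate. As a sketch, assuming each game is an independent win-or-loss trial, a Wilson score interval gives a range for the true win rate. The function name `wilson_interval` and the game counts in the usage lines are hypothetical and are not results from the paper.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate (z=1.96 is roughly 95% coverage)."""
    if games <= 0:
        raise ValueError("games must be positive")
    if not 0 <= wins <= games:
        raise ValueError("wins must be between 0 and games")
    p = wins / games
    denom = 1 + z**2 / games
    centre = (p + z**2 / (2 * games)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    )
    return centre - half_width, centre + half_width


# Made-up example: 30 wins in 50 games against human players.
low, high = wilson_interval(30, 50)
print(f"win rate 0.60, interval ({low:.3f}, {high:.3f})")
```

With only a few dozen games the interval is wide, which is why a single tournament placement says little about the underlying win rate.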
Original abstract
Social deduction games such as Mafia present a unique AI challenge: players must reason under uncertainty, interpret incomplete and intentionally misleading information, evaluate human-like communication, and make strategic elimination decisions. Unlike deterministic board games, success in Mafia depends not on perfect information or brute-force search, but on inference, memory, and adaptability in the presence of deception. This work presents the design and evaluation of Revac-8, an AI agent developed for the Social Deduction track of the MindGames Arena competition, where it achieved first place. The final agent evolved from a simple two-stage reasoning system into a multi-module architecture that integrates memory-based player profiling, social-graph analysis of accusations and defenses, and dynamic tone selection for communication. These results highlight the importance of structured memory and adaptive communication for achieving strong performance in high-stakes social environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Revac-8, an AI agent for social deduction games such as Mafia in the MindGames Arena competition. It describes the agent's evolution from a simple two-stage reasoning system into a multi-module architecture incorporating memory-based player profiling, social-graph analysis of accusations and defenses, and dynamic tone selection for communication, claiming this design achieved first place in the Social Deduction track.
Significance. If the performance attribution holds after proper documentation, the work would illustrate the utility of structured memory and adaptive communication for AI reasoning under uncertainty and deception, with potential relevance to multi-agent systems. The external competition outcome provides an independent benchmark, though the absence of internal validation limits claims about component contributions or generalizability.
major comments (2)
- [Abstract] Abstract: The first-place result is asserted without any evaluation metrics, baselines, win rates, statistical details, or ablation studies. This is load-bearing for the central claim that the multi-module additions (memory profiling, social-graph analysis, dynamic tone) drove the outcome, as success could arise from unstated factors such as rule-specific tuning or variance.
- [Evaluation / Results (implied)] No section provides quantitative comparisons between the final multi-module agent and the initial two-stage system, or against other competition entrants. Without such data, the attribution of performance gains to the specific modules cannot be evaluated.
minor comments (2)
- [Title / Abstract] The title refers to 'Revac' while the abstract uses 'Revac-8'; standardize the agent name and clarify versioning.
- [Abstract] The abstract mentions 'high-stakes social environments' without defining the competition rules or game parameters, which would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on evaluation and attribution. We address each major point below and will revise the manuscript to incorporate additional discussion of limitations and available context where feasible.
- Referee: [Abstract] Abstract: The first-place result is asserted without any evaluation metrics, baselines, win rates, statistical details, or ablation studies. This is load-bearing for the central claim that the multi-module additions (memory profiling, social-graph analysis, dynamic tone) drove the outcome, as success could arise from unstated factors such as rule-specific tuning or variance.
  Authors: We agree that the abstract would be strengthened by more context. The first-place result is from the MindGames Arena Social Deduction track, which functions as an external benchmark. However, the competition does not release detailed per-round win rates, statistical tests, or entrant baselines in a form suitable for inclusion. We will revise the abstract to describe the competition format and ranking more precisely while moderating language on module contributions to avoid implying isolated causal effects. This acknowledges that unstated factors could contribute to the outcome.
  revision: partial
- Referee: [Evaluation / Results (implied)] No section provides quantitative comparisons between the final multi-module agent and the initial two-stage system, or against other competition entrants. Without such data, the attribution of performance gains to the specific modules cannot be evaluated.
  Authors: We acknowledge the absence of such comparisons. The agent evolved iteratively during the competition, and no formal ablation studies or controlled experiments against the two-stage prototype were conducted due to time constraints. Detailed performance data from other entrants is limited to final rankings. We will add a dedicated subsection discussing the development stages, qualitative observations from testing, and explicit limitations on attributing gains to individual modules (memory profiling, social-graph analysis, dynamic tone). This will clarify that the competition outcome supports the overall architecture but does not enable component-level evaluation.
  revision: partial
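The controlled comparison that the referee requests, and that the authors say was not run, could be analysed along the following lines. This is a sketch under stated assumptions: games are treated as independent, a pooled two-proportion z-test stands in for whatever analysis the authors might choose, and the function name and all game counts are hypothetical.

```python
import math


def two_proportion_z_test(wins_a, games_a, wins_b, games_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    if games_a <= 0 or games_b <= 0:
        raise ValueError("both game counts must be positive")
    p_a = wins_a / games_a
    p_b = wins_b / games_b
    pooled = (wins_a + wins_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    if se == 0:
        raise ValueError("win rates are all 0 or all 1; test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up ablation: full agent wins 60/100, two-stage baseline wins 45/100.
z, p = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Repeating the same comparison with one module disabled at a time would give the component-level evidence that the competition ranking alone cannot provide.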
Circularity Check
No significant circularity: external competition result anchors the claim
The manuscript describes an empirical agent design process for Revac-8 in a social deduction competition, noting its evolution from a two-stage system to a multi-module architecture with memory profiling, social-graph analysis, and tone selection, culminating in a first-place finish. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central performance claim is tied directly to an external, independently verifiable competition outcome rather than any internal reduction or ansatz smuggled through prior work, leaving the derivation chain self-contained.