
arXiv: 2605.22826 · v1 · pith:C4LUW4KOnew · submitted 2026-04-09 · 💻 cs.CL · cs.AI · cs.GT · cs.MA

Evaluating Large Language Models in a Complex Hidden Role Game

Pith reviewed 2026-05-25 00:34 UTC · model grok-4.3

classification 💻 cs.CL, cs.AI, cs.GT, cs.MA
keywords large language models, deception, Secret Hitler, AI safety, social deduction games, strategic reasoning, hidden role games, multi-turn manipulation

The pith

Large language models fail to sustain deception or strategic impact when playing fascists in Secret Hitler.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates LLMs inside the hidden-role game Secret Hitler to test their ability to reason, persuade, and deceive across multiple turns. It introduces three new metrics—Role Identification Accuracy, Deception Retention Rate, and Game State Impact Rate—and compares model performance against both rule-based agents and recorded human games. Results show models achieve far lower voting alignment with expert decisions than rule-based agents, produce negative impact scores as fascists, and end games roughly 40 percent sooner than humans. Standard reasoning aids such as chain-of-thought prompting and added memory yield no gains and can reduce fascist win rates by up to 23.2 percent. The work positions the game and metrics as a reproducible testbed for tracking when models acquire complex manipulative skills.

Core claim

Benchmarking reveals a clear gap between conversational fluency and strategic depth: rule-based agents match expert human voting decisions 86.7 percent of the time, while Llama 3.1 70B reaches only 59.7 percent accuracy. Models assigned fascist roles consistently generate negative Game State Impact Rates, fail to retain deception, and produce games about 40 percent shorter than human games. Neither chain-of-thought prompting nor internal memory improves outcomes, and these techniques can degrade fascist win rates by as much as 23.2 percent.
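
As a rough illustration of what "matching expert voting decisions" could mean operationally, the sketch below scores an agent's votes against expert votes recorded on the same game states. The function name `voting_alignment` and the sample votes are hypothetical; how the paper selects and pairs expert decisions is not described in this summary.

```python
from typing import Sequence


def voting_alignment(agent_votes: Sequence[bool], expert_votes: Sequence[bool]) -> float:
    """Fraction of positions where the agent's vote equals the expert's vote.

    True means "Ja" (approve the proposed government), False means "Nein".
    """
    if len(agent_votes) != len(expert_votes):
        raise ValueError("vote sequences must have the same length")
    if not agent_votes:
        raise ValueError("need at least one vote")
    matches = sum(a == e for a, e in zip(agent_votes, expert_votes))
    return matches / len(agent_votes)


# Hypothetical data: five expert votes and an agent's votes on the same states.
expert = [True, False, True, True, False]
agent = [True, True, True, False, False]
print(voting_alignment(agent, expert))  # 3 of 5 votes match -> 0.6
```

Under this reading, the reported 86.7 and 59.7 percent figures would each be one such fraction computed over all evaluated votes.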

What carries the argument

The Secret Hitler game environment together with the three custom metrics (Role Identification Accuracy, Deception Retention Rate, Game State Impact Rate) that track how well agents identify roles, maintain lies, and alter overall game trajectories.
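
The figure captions describe Game State Impact Rate as the average change (delta) in game state score caused by a model's actions, with positive values meaning the action helped the model's own party. They describe the underlying Game State Evaluation as higher-is-better for liberals. A minimal sketch under those two statements follows; the function name, the sign convention for fascist roles (negating the delta), and the sample scores are assumptions made for illustration, not the paper's formula.

```python
from typing import Sequence, Tuple


def game_state_impact_rate(
    transitions: Sequence[Tuple[float, float]], party: str
) -> float:
    """Mean per-action change in game state score, oriented toward `party`.

    Each transition is (score_before, score_after) around one action by the agent.
    Scores are assumed to favor liberals when higher, so deltas are negated
    for fascist agents (an assumption made for this sketch).
    """
    if party not in ("liberal", "fascist"):
        raise ValueError("party must be 'liberal' or 'fascist'")
    if not transitions:
        raise ValueError("need at least one transition")
    sign = 1.0 if party == "liberal" else -1.0
    deltas = [sign * (after - before) for before, after in transitions]
    return sum(deltas) / len(deltas)


# Hypothetical scores: a fascist agent whose actions push the state toward liberals.
fascist_actions = [(-0.25, 0.0), (0.0, 0.25)]
print(game_state_impact_rate(fascist_actions, "fascist"))  # -0.25
```

A negative value for a fascist agent, as in the example, corresponds to the pattern the summary reports: actions that on average help the opposing party.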

If this is right

  • Current LLM architectures remain ineffective at complex multi-turn manipulation.
  • Rule-based agents align with human expert decisions more closely than any tested model.
  • Common reasoning enhancements produce no benefit and can actively harm performance in deception tasks.
  • The released framework supplies a standardized, reproducible environment for measuring future progress in deceptive capability.
  • Detecting the point at which models master these behaviors will matter for alignment monitoring.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same metrics could be ported to other hidden-role games to test whether the observed gap generalizes beyond Secret Hitler.
  • The performance drop when models are given chain-of-thought or memory hints at limits in long-horizon consistency rather than simple knowledge retrieval.
  • If models later close the gap on these metrics, the framework would provide an early-warning signal for real-world manipulation risks.
  • Human players may already be using cues that current models do not yet replicate to detect and counter model deception.

Load-bearing premise

The rules and scoring of Secret Hitler serve as a faithful stand-in for general deceptive capability rather than merely testing surface fluency or prompt sensitivity.

What would settle it

Repeated trials in which an LLM achieves fascist win rates, deception retention, and game lengths statistically indistinguishable from or superior to those of human players would falsify the ineffectiveness claim.
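
One way such a comparison could be run on game lengths is a two-sided permutation test on the difference in means, sketched below. The function name, the resampling count, and both samples are hypothetical, and none of the numbers come from the paper.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    a: Sequence[float], b: Sequence[float], n_resamples: int = 10_000, seed: int = 0
) -> float:
    """Two-sided permutation test for a difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical game lengths in rounds; not data from the paper.
human_lengths = [10, 12, 11, 9, 13, 12, 10, 11]
llm_lengths = [6, 7, 8, 6, 7, 5, 8, 7]
p = permutation_p_value(human_lengths, llm_lengths)
print(f"p = {p:.4f}")
```

A large p-value here only means no difference was detected. Showing that two groups are statistically indistinguishable would more properly call for an equivalence test with a pre-stated margin.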

Figures

Figures reproduced from arXiv: 2605.22826 by Niklas Bauer.

Figure 1: Secret Hitler is used as a shared testing ground for two major research pillars of LLMs: reasoning about hidden information and deception as a means of persuasion in social deduction games. Modern generative models produce human-like text and solve complex language understanding and reasoning problems [OpenAI et al., 2023, Brown et al., 2020]. Their increase in popularity in recent years also raises concer…
Figure 2: Simple example of one turn in Secret Hitler. The rotating President proposes a Chancellor. Everyone on the table votes for the two being in a government. The president forwards two cards in secret. The chancellor plays one card. Discussions between the actions are not shown. Strategic depth arises from policy outcomes: as more fascist policies are enacted, the President gains investigative or executive pow…
Figure 3: Architecture overview of my Secret Hitler LLM framework showing the core modules for game management and agent interaction, along with available player types and evaluation metrics. 3.2.1 Game Environment To provide an architectural overview, this section begins with a description of the core game environment underlying the framework. The environment includes a fully implemented rules engine that adheres t…
Figure 4: Game State Impact Rate (GSIR) by five different language models. Measured is the average …
Figure 5: Tracking Game State Evaluations for n = 297 games of Qwen 3 32B playing against four reputation-based agents per round (light lines). The plot also shows mean curves for the three roles (solid lines). The Game State Evaluation is computed after each round, with higher values indicating a more favorable position for liberals, and lower values favoring fascists. The values represent the average score across …
Figure 6: Role Identification Accuracy (RIA) of tested LLMs when playing as Liberal as the rounds …
Figure 7: Ablation study of prompting strategies and techniques on …
Figure 8: Game State Impact Rate (GSIR) of Llama 3.3 70B by role across different prompting strategies, as described in Section 3. It is the average impact (delta) on game state scores by the models’ actions. Positive values indicate beneficial actions for Llama 3.3 70B’s party, while negative values represent harmful actions. The top row shows the baseline GSIR and broken down for each role. Then, the bottom row sh…
Figure 9: Deception Retention Rate (DRR) averaged across game rounds for different models.
Figure 10: Absolute counts of persuasion categories based on messages by …
Figure 11: Average Uses of Persuasion Techniques with annotated persuasion categories by …
Figure 12: Tracking the mean number of policies played at certain points in the game, separated for …
Figure 13: Radar Chart of the relative frequency of persuasion techniques across different models …
Figure 14: Average uses of persuasion techniques in messages by Human players across different Elo …
Figure 15: Tracking Game State Evaluations of four different models playing against four reputation …
Figure 16: Detailed counts of LLM-Annotated persuasion categories based on messages by …
Figure 17: Relative frequency of persuasion techniques across different models and human players.
Original abstract

Quantifying the deceptive potential of Large Language Models (LLMs) is critical for AI safety, yet difficult to achieve in uncontrolled environments. This work investigates the reasoning, persuasion, and deceptive capabilities of LLMs within the social deduction game Secret Hitler. I introduce an open-source framework and novel metrics to measure performance: Role Identification Accuracy, Deception Retention Rate, and Game State Impact Rate. By benchmarking models against rule-based algorithms and human games, I identify a gap between conversational ability and strategic depth. The study also analyzes the impact of reasoning-enhancement techniques on win rates and strategic reasoning. Neither Chain-of-Thought prompting nor internal memory bring improvements in performance, with up to 23.2% worse win rates for fascist roles. While rule-based agents align with expert human voting decisions 86.7% of the time, models like Llama 3.1 70B achieve only a 59.7% accuracy. Models playing as Fascists consistently yield negative impact scores and fail to sustain deception, resulting in roughly 40% shorter games compared to humans. These findings suggest that current architectures remain ineffective at complex, multi-turn manipulation. As capabilities advance, detecting when models begin to master these deceptive behaviors is crucial. The developed framework serves as a reproducible testbed for future alignment research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces an open-source framework for evaluating LLMs in the social deduction game Secret Hitler and defines three new metrics (Role Identification Accuracy, Deception Retention Rate, Game State Impact Rate) to quantify reasoning, persuasion, and deception. It benchmarks multiple LLMs against rule-based agents and human games, reports that LLMs achieve only 59.7% voting alignment (vs. 86.7% for rule-based), produce negative fascist impact scores, yield ~40% shorter games, and show no gains (or up to 23.2% worse win rates) from Chain-of-Thought or internal memory, concluding that current architectures remain ineffective at complex multi-turn manipulation.

Significance. If the metrics are shown to isolate deception from state-tracking, the work supplies a reproducible testbed and concrete empirical baselines that could support future alignment research on multi-agent deception. The open-source framework and direct comparison to rule-based and human performance are strengths.

major comments (3)
  1. [Metrics definitions and results (abstract and §4)] The central claim that models are ineffective at sustaining multi-turn deception rests on Deception Retention Rate and Game State Impact Rate serving as valid proxies. These metrics are computed inside a partially observed game whose state must be maintained by the LLM across turns; without an ablation that supplies ground-truth state mid-game (or tests deception in an otherwise identical fully-observed setting), low scores are consistent with known state-tracking collapse rather than inability to generate persuasive language given perfect information.
  2. [Abstract and experimental results] The reported 59.7% voting alignment for Llama 3.1 70B versus 86.7% for rule-based agents, the 23.2% worse win rates under reasoning enhancements, and the 40% shorter games are presented without sample sizes, error bars, number of games per condition, or statistical tests, making it impossible to determine whether the performance gap is robust or sensitive to post-hoc prompt or game-rule choices.
  3. [Reasoning-enhancement experiments] The claim that neither Chain-of-Thought nor internal memory improves performance is load-bearing for the broader conclusion about current architectures, yet the paper supplies no description of how these techniques were implemented, how many trials were run, or whether the 23.2% degradation is measured on the same metric suite or on win rate alone.
minor comments (1)
  1. [Metric definitions] Clarify the exact formulas for Role Identification Accuracy, Deception Retention Rate, and Game State Impact Rate, including how votes, actions, and game logs are mapped to each score.
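
The uncertainty reporting requested in major comment 2 could look something like the following sketch, which computes a Wilson score interval for a binomial proportion such as a voting-alignment or win rate. The counts are hypothetical, chosen only to approximate the reported 86.7% and 59.7% rates; the actual number of votes or games per condition is not given in this summary.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half, center + half


# Hypothetical counts chosen only to approximate the reported rates;
# the true sample sizes are not stated in this summary.
for label, k, n in [("rule-based", 260, 300), ("LLM", 179, 300)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k / n:.3f} [{lo:.3f}, {hi:.3f}]")
```

How wide the intervals are, and whether they overlap, depends directly on the per-condition sample size, which is why the referee asks for it.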

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point-by-point below, with revisions planned where the concerns identify gaps in the current manuscript.

Point-by-point responses
  1. Referee: [Metrics definitions and results (abstract and §4)] The central claim that models are ineffective at sustaining multi-turn deception rests on Deception Retention Rate and Game State Impact Rate serving as valid proxies. These metrics are computed inside a partially observed game whose state must be maintained by the LLM across turns; without an ablation that supplies ground-truth state mid-game (or tests deception in an otherwise identical fully-observed setting), low scores are consistent with known state-tracking collapse rather than inability to generate persuasive language given perfect information.

    Authors: The metrics are defined and computed within the standard partially-observed rules of Secret Hitler precisely because the task requires both state tracking and deception; isolating one from the other would change the evaluation target. We agree, however, that the current text does not sufficiently discuss this potential confound. We will add an explicit limitations paragraph noting that low scores may partly reflect state-tracking failures and outlining future ablations that supply ground-truth state. revision: yes

  2. Referee: [Abstract and experimental results] The reported 59.7% voting alignment for Llama 3.1 70B versus 86.7% for rule-based agents, the 23.2% worse win rates under reasoning enhancements, and the 40% shorter games are presented without sample sizes, error bars, number of games per condition, or statistical tests, making it impossible to determine whether the performance gap is robust or sensitive to post-hoc prompt or game-rule choices.

    Authors: The experimental section of the manuscript states the total number of games run per model and condition, but we acknowledge that error bars, per-condition sample sizes, and statistical tests are not reported. In the revision we will add these details together with the results of appropriate significance tests. revision: yes

  3. Referee: [Reasoning-enhancement experiments] The claim that neither Chain-of-Thought nor internal memory improves performance is load-bearing for the broader conclusion about current architectures, yet the paper supplies no description of how these techniques were implemented, how many trials were run, or whether the 23.2% degradation is measured on the same metric suite or on win rate alone.

    Authors: We will expand the methods subsection on reasoning enhancements to describe the exact prompting templates used for Chain-of-Thought, the memory mechanism, the number of trials per condition, and to state explicitly that the 23.2% figure is the change in fascist win rate. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical benchmarking with external baselines

full rationale

The paper introduces a game framework and three metrics (Role Identification Accuracy, Deception Retention Rate, Game State Impact Rate) then reports empirical results by running LLMs, rule-based agents, and referencing human games. No equations, fitted parameters, self-citations, or derivations are present in the provided text. Claims rest on direct comparison to independent external baselines rather than any reduction of outputs to inputs by construction. This matches the default case of a self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical evaluation study; no free parameters, mathematical axioms, or invented entities are invoked in the abstract.




Example game states with Game State Evaluation scores

Extracted from the paper; entries ending in "..." were truncated at extraction.

  1. Early Game (-0.229): In an opening situation at round 1 with 0L–0F policies enacted, a starting deck composition of 6L–11F cards, a liberal president, no unlocked powers, and no role information available, the deck’s fascist bias creates a slightly unfavorable position for liberals despite the balanced policy track.
  2. Mid-Game Crisis (-0.457): A representative mid-game state at round 7 features 1L–3F policies enacted, a fascist president holding execution power, and liberals correctly identifying the fascist president. Despite accurate role identification by liberals, the combination of policy disadvantage, poor deck composition, and dangerous executive power in fascist...
  3. Balanced Mid-Game (+0.037): Another mid-game configuration at round 6 contains 2L–2F policies enacted, a liberal president without powers, and liberals correctly identifying the fascist player. The policy track appears balanced, but the heavily fascist-biased deck composition counteracts the liberal president advantage, resulting in a nearly neutral score ...
  4. Hitler Danger (-0.326): A different example at round 8 with 1L–3F policies enacted, a liberal president holding investigate power, and liberals misidentifying Hitler as liberal after three fascist policies illustrates the impact of misinformation. This misidentification creates substantial election risk, overwhelming the liberal president’s investigative advantage and producing a moderately fascist-favored score.
  5. Late Game Liberal Advantage (+0.531): A late-game scenario at round 10 shows liberals with 4L–2F policies enacted (one away from victory). With a liberal president, no unlocked powers, and liberals correctly identifying both Hitler and the fascist player, the strong policy advantage and excellent role information outweigh the poor deck state, yielding a mo...
  6. Dire Situation (-0.579): In a high-pressure late-game position at round 12, 1L–5F policies have been enacted (fascists one away from victory), a fascist president wielding execution power, and some role identification by liberals. The imminent fascist policy victory combined with executive control in fascist hands produces a strongly fascist-favored score,...