
arxiv: 2605.22841 · v1 · pith:5MURIHQFnew · submitted 2026-05-11 · ⚛️ physics.soc-ph · cs.AI · cs.CL · cs.GT · cs.MA · econ.GN · q-fin.EC

Strategic Coercion Within Alliances: The Greenland Sovereignty Game as an AI Stress Test

Pith reviewed 2026-05-25 00:44 UTC · model grok-4.3

keywords: Greenland sovereignty · LLM simulation · alliance coercion · inverse game theory · escalation behavior · multi-agent games · utility parameters · NATO dynamics

The pith

Frontier LLMs escalate more under coercion and rarely allow peaceful U.S. acquisition of Greenland in simulations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper places eight frontier large language models into roles in three linked games that model a U.S. effort to acquire Greenland from Denmark. It shows that every model chooses more escalatory actions when the scenario is presented as coercive pressure from the stronger ally. The authors apply inverse game theory to the resulting action sequences to extract each model's weights on material self-interest, reciprocity, inequality aversion, norm respect, and commitment consistency. This setup is offered as a structural benchmark for how current models handle alliance tensions and collective-action problems.

Core claim

In 3,604 completed games, coercion framing raised the rate of four-action escalation sequences from 10.7 percent to 28.6 percent across all eight models; peaceful U.S. acquisition occurred in only 1.9 percent of clean games and was produced by only three of the eight models.
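
As a quick arithmetic reading of the two rates quoted above, the shift is roughly 17.9 percentage points, or about a 2.7-fold increase. The sketch below uses only those two percentages; the variable names are illustrative and not taken from the paper.

```python
# Rates quoted in the core claim; variable names are illustrative.
baseline_rate = 0.107   # four-action escalation without coercion framing
coercion_rate = 0.286   # four-action escalation under coercion framing

point_shift = (coercion_rate - baseline_rate) * 100   # percentage points
fold_change = coercion_rate / baseline_rate           # relative increase

print(f"shift: {point_shift:.1f} points, fold change: {fold_change:.2f}x")
```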

What carries the argument

Inverse game theory recovering five structural utility parameters (alpha, beta, gamma, delta, eta) from action sequences generated by a multi-agent simulation of asymmetric coercion, NATO assurance, and triadic extensive-form games.
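
To make the idea of recovering utility weights from observed actions concrete, here is a minimal sketch. It assumes a logit (softmax) choice model with a utility that is linear in five features, fitted by maximum likelihood. That model is a stand-in chosen for illustration, not necessarily the estimator the authors use. The feature values, the "true" weights, and all function names are hypothetical; only the five parameter names come from the paper.

```python
import numpy as np

# Parameter names follow the paper; everything else is a hypothetical setup.
PARAM_NAMES = ("alpha", "beta", "gamma", "delta", "eta")

def choice_probs(weights, features):
    """Softmax choice probabilities; features has shape (n_obs, n_actions, 5)."""
    utilities = features @ weights                        # (n_obs, n_actions)
    utilities = utilities - utilities.max(axis=-1, keepdims=True)
    expu = np.exp(utilities)
    return expu / expu.sum(axis=-1, keepdims=True)

def fit_weights(features, chosen, steps=500, learning_rate=0.5):
    """Maximum-likelihood weights for the logit model via gradient ascent."""
    n_obs = features.shape[0]
    weights = np.zeros(features.shape[-1])
    chosen_features = features[np.arange(n_obs), chosen]  # (n_obs, 5)
    for _ in range(steps):
        probs = choice_probs(weights, features)
        expected_features = np.einsum("na,nak->nk", probs, features)
        # Gradient of the mean log-likelihood: observed minus expected features.
        weights += learning_rate * (chosen_features - expected_features).mean(axis=0)
    return weights

def simulate_choices(true_weights, features, rng):
    """Draw actions from the logit model using the Gumbel-max trick."""
    noise = rng.gumbel(size=features.shape[:2])
    return np.argmax(features @ true_weights + noise, axis=1)

rng = np.random.default_rng(0)
true_weights = np.array([1.0, 0.5, -0.5, 0.8, 0.3])       # made-up values
features = rng.normal(size=(5000, 4, 5))                  # synthetic action features
chosen = simulate_choices(true_weights, features, rng)
recovered = fit_weights(features, chosen)
for name, value in zip(PARAM_NAMES, recovered):
    print(f"{name}: {value:.2f}")
```

On synthetic data the fitted weights should land close to the generating ones. With real LLM transcripts, the fit only describes behavior under the tested prompts.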

If this is right

  • Coercion framing produces higher escalation rates in every tested model.
  • Models trained in different regions display distinct power-weight profiles when assigned the U.S. role.
  • Prompts that invoke jus cogens and self-determination norms return escalation rates close to baseline.
  • Only a minority of frontier models generate sequences that reach peaceful acquisition.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Differences in training data may shape how models balance alliance norms against unilateral power advantages.
  • The recovered parameters could be tested as predictors of how the same models respond to other territorial or alliance disputes.
  • Extending the games to include explicit enforcement mechanisms would clarify whether the observed escalation stems from missing collective-action constraints.

Load-bearing premise

LLM action sequences produced under role prompts correspond to stable underlying preferences that can be represented by a small set of fixed utility weights.

What would settle it

Repeated runs of the identical game setup with the same model and prompt yielding substantially different recovered utility parameters would show that the parameters do not capture stable preferences.
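
One way such a repeat-run check could be operationalized is sketched below. It assumes each run yields one vector of recovered parameters, and it flags instability when any parameter's range across runs exceeds a chosen fraction of its mean. The 10 percent tolerance, the function names, and the example numbers are all hypothetical.

```python
import numpy as np

def max_relative_spread(estimates):
    """Per-parameter (max - min) / |mean| across runs.

    estimates: array-like of shape (n_runs, n_params). A parameter whose
    mean is near zero makes this ratio unreliable.
    """
    estimates = np.asarray(estimates, dtype=float)
    spread = estimates.max(axis=0) - estimates.min(axis=0)
    return spread / np.abs(estimates.mean(axis=0))

def is_stable(estimates, tolerance=0.10):
    """True when every parameter varies by at most `tolerance` across runs."""
    return bool(np.all(max_relative_spread(estimates) <= tolerance))

# Hypothetical recovered parameters from three repeated runs (two parameters shown).
tight_runs = [[1.00, 0.50], [1.02, 0.49], [0.98, 0.51]]
loose_runs = [[1.00, 0.50], [0.20, 0.90]]

print(is_stable(tight_runs), is_stable(loose_runs))
```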

Figures

Figures reproduced from arXiv: 2605.22841 by Peyton Williams, Rommin Adl.

Figure 1. Triadic extensive-form game (Game 3). The United States (Stage 1) chooses Coerce …
Original abstract

What happens when the strongest alliance member pressures a weaker member over territory and strategic control? We examine the Greenland sovereignty crisis as a stress test for LLM geopolitics, centered on the 2019-2026 U.S. push to acquire Greenland from the Kingdom of Denmark. The crisis nests two collective-action problems: Arctic strategic control and whether NATO can enforce alliance norms against the dominant member. We develop three games (asymmetric coercion; a NATO assurance game with a critical-mass tipping point; a triadic extensive-form game with social preferences) and test them with a multi-agent simulation in which eight frontier LLMs play six geopolitical roles (United States, Denmark, Greenland, NATO, Russia, Canada) across 3,604 completed games and 108,120 action observations. Using inverse game theory, we recover each model's structural utility parameters (alpha, beta, gamma, delta, eta) for material self-interest, reciprocity, inequality aversion, norm respect, and commitment consistency. Three findings stand out. First, all eight models become more escalatory under coercion framing (four-action escalation rises from 10.7% to 28.6%). Second, Chinese-origin models show systematically different power-weight profiles from Western-origin models when playing the U.S. role. Third, peaceful US acquisition emerges in only 1.9% of clean games and only 3 of 8 frontier models ever achieve it, most prominently DeepSeek V3.2, which executes a stable five-round playbook through the metropole. Prompts emphasizing jus cogens and self-determination reduce escalation back near baseline in the English-only confirmatory sample; multilingual contrasts are reported as exploratory sensitivity checks. We position this as a structural benchmark for LLM geopolitical behavior, complementing action-frequency benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper simulates the 2019-2026 Greenland sovereignty crisis via three custom games (asymmetric coercion, NATO assurance with tipping point, triadic extensive-form with social preferences) played by eight frontier LLMs across six roles in 3,604 games. It applies inverse game theory to recover five structural utility parameters (alpha, beta, gamma, delta, eta) from the resulting action sequences and reports three main results: coercion framing raises four-action escalation from 10.7% to 28.6%, Chinese-origin models differ from Western-origin models in power weighting when playing the U.S. role, and peaceful U.S. acquisition occurs in only 1.9% of clean games with only three models ever succeeding.

Significance. If the core methodological assumptions hold, the work supplies a quantitative structural benchmark for LLM geopolitical play that goes beyond action-frequency counts and could inform both AI safety research and alliance theory. The explicit recovery of parameters for material interest, reciprocity, inequality aversion, norm respect, and commitment consistency is a strength when accompanied by reproducible code or falsifiable out-of-sample tests.

major comments (3)
  1. [Methods] Methods section: the manuscript states that 3,604 games and 108,120 observations were obtained but supplies no explicit description of prompt templates, temperature settings, exclusion criteria, or statistical controls for multiple comparisons. Without these details it is impossible to evaluate whether the reported escalation shift or the 1.9% peaceful-acquisition rate survives prompt rephrasing or different sampling procedures.
  2. [Results] Inverse-game-theory recovery (results and parameter tables): the five parameters (alpha, beta, gamma, delta, eta) are estimated directly from the same LLM action sequences they are then used to rationalize. This is standard in inverse game theory but renders the parameters fitted quantities rather than independently validated predictions; no out-of-sample hold-out games or prompt-invariance checks are reported to test whether the recovered utilities remain stable under surface rewording of the coercion frame.
  3. [Results] Coercion-framing result (abstract and § on escalation): the claim that all eight models become more escalatory rests on the assumption that the recovered parameters capture stable geopolitical preferences rather than prompt-conditioned next-token behavior. Because the framing change itself is a prompt modification, the 10.7% to 28.6% increase cannot be attributed to structural utilities until invariance under prompt variation is demonstrated.
minor comments (2)
  1. [Abstract] The abstract reports 108,120 action observations but the text does not clarify how many observations per game or per role are retained after any filtering.
  2. [Figures/Tables] Figure legends and table captions should explicitly state the number of games underlying each percentage (e.g., the 1.9% peaceful-acquisition figure).
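
On the first minor comment, the two totals in the abstract divide evenly, which a reader can verify directly. The check below uses only the reported counts. Reading the result as six roles acting over five rounds is an assumption that happens to fit the numbers; the source does not state it.

```python
# Totals as reported in the abstract.
completed_games = 3_604
action_observations = 108_120

per_game, remainder = divmod(action_observations, completed_games)
print(per_game, remainder)

# Assumption (not stated in the source): 6 roles x 5 rounds would give 30 per game.
assumed_roles, assumed_rounds = 6, 5
print(per_game == assumed_roles * assumed_rounds)
```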

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We have revised the manuscript to improve methodological transparency and add validation analyses for the inverse game theory parameters. Point-by-point responses to the major comments are provided below.

  1. Referee: [Methods] Methods section: the manuscript states that 3,604 games and 108,120 observations were obtained but supplies no explicit description of prompt templates, temperature settings, exclusion criteria, or statistical controls for multiple comparisons. Without these details it is impossible to evaluate whether the reported escalation shift or the 1.9% peaceful-acquisition rate survives prompt rephrasing or different sampling procedures.

    Authors: We agree that these implementation details are necessary for reproducibility. In the revised manuscript we have added a new 'Implementation Details' subsection to Methods. It now specifies the full prompt templates (reproduced verbatim in Appendix A), temperature settings (uniformly 0.7), exclusion criteria (invalid JSON outputs or games exceeding 15 rounds discarded, affecting 1.8% of runs), and statistical controls (Bonferroni correction applied to the three primary comparisons). These additions enable direct evaluation of robustness to prompt variation. revision: yes

  2. Referee: [Results] Inverse-game-theory recovery (results and parameter tables): the five parameters (alpha, beta, gamma, delta, eta) are estimated directly from the same LLM action sequences they are then used to rationalize. This is standard in inverse game theory but renders the parameters fitted quantities rather than independently validated predictions; no out-of-sample hold-out games or prompt-invariance checks are reported to test whether the recovered utilities remain stable under surface rewording of the coercion frame.

    Authors: The referee is correct that the parameters are recovered from the observed sequences. To address this, the revision adds out-of-sample validation: 25% of games per model are held out, parameters estimated on the remainder are used to predict hold-out actions (average accuracy 76% vs. 52% null), and prompt-invariance checks are reported by rewording the coercion frame in 200 new games (parameters stable within 8%). These results appear in §4.3 and new Appendix C. revision: yes

  3. Referee: [Results] Coercion-framing result (abstract and § on escalation): the claim that all eight models become more escalatory rests on the assumption that the recovered parameters capture stable geopolitical preferences rather than prompt-conditioned next-token behavior. Because the framing change itself is a prompt modification, the 10.7% to 28.6% increase cannot be attributed to structural utilities until invariance under prompt variation is demonstrated.

    Authors: We accept that stronger evidence of invariance is required. The revision includes a new sensitivity analysis using three alternative paraphrases of the coercion frame (semantic content preserved). The escalation increase persists (average 26.4%). We have also clarified in the text that the recovered parameters describe observed behavior under the tested prompts rather than claiming deep, prompt-invariant preferences. This tempers the interpretation while preserving the empirical finding. revision: yes
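
The validation steps named in the responses above (a Bonferroni correction over three primary comparisons and a 25% hold-out of games) can be sketched as follows. This is a minimal illustration rather than the authors' code: the 0.05 family-wise level, the function names, the seed, and the example action labels are assumptions.

```python
import random

def bonferroni_threshold(family_alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return family_alpha / n_comparisons

def split_holdout(game_ids, holdout_fraction=0.25, seed=0):
    """Shuffle game ids and return (training ids, hold-out ids)."""
    ids = list(game_ids)
    random.Random(seed).shuffle(ids)
    n_holdout = round(len(ids) * holdout_fraction)
    return ids[n_holdout:], ids[:n_holdout]

def accuracy(predicted, actual):
    """Share of hold-out actions that the fitted model predicted correctly."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Three primary comparisons, assuming a 0.05 family-wise level.
print(bonferroni_threshold(0.05, 3))

# Hypothetical example: 100 games, a quarter held out.
train_ids, holdout_ids = split_holdout(range(100))
print(len(train_ids), len(holdout_ids))

# Hypothetical predicted vs. observed actions on three hold-out moves.
print(accuracy(["coerce", "negotiate", "negotiate"],
               ["coerce", "negotiate", "coerce"]))
```

Splitting by game rather than by individual action keeps all moves from one game on the same side of the split, which avoids leaking within-game context into the hold-out set.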

Circularity Check

0 steps flagged

No significant circularity detected


The paper's quantitative claims (escalation rising from 10.7% to 28.6%, peaceful acquisition at 1.9%) are direct empirical counts of action sequences across the 3,604 completed games and 108,120 observations under the described framings. Inverse game theory recovery of parameters (alpha, beta, gamma, delta, eta) is applied after the fact as a descriptive fit to those same sequences but is not invoked to derive or force the reported frequencies. No self-citations, uniqueness theorems, ansatzes, or self-definitional reductions appear in the abstract or described method; the simulation outputs stand as independent observations against the game rules and prompts.

Axiom & Free-Parameter Ledger

5 free parameters (1 entry) · 1 axiom · 0 invented entities

The central claims rest on five fitted utility parameters recovered from LLM-generated data and on the assumption that the three constructed games faithfully represent the real alliance dynamics; no new entities are postulated.

free parameters (1)
  • alpha, beta, gamma, delta, eta
    Utility weights for material self-interest, reciprocity, inequality aversion, norm respect, and commitment consistency recovered via inverse game theory from the 108,120 action observations.
axioms (1)
  • domain assumption: The three game structures (asymmetric coercion, NATO assurance with tipping point, triadic extensive-form) accurately capture the collective-action problems in the Greenland sovereignty crisis.
    Invoked when the authors nest the historical episode inside these games without external validation of model fidelity.

pith-pipeline@v0.9.0 · 5875 in / 1411 out tokens · 34723 ms · 2026-05-25T00:44:22.959500+00:00 · methodology


