Strategic Coercion Within Alliances: The Greenland Sovereignty Game as an AI Stress Test
Pith reviewed 2026-05-25 00:44 UTC · model grok-4.3
The pith
Frontier LLMs escalate more under coercion and rarely allow peaceful U.S. acquisition of Greenland in simulations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In 3,604 completed games, coercion framing raised the rate of four-action escalation sequences from 10.7 percent to 28.6 percent across all eight models; peaceful U.S. acquisition occurred in only 1.9 percent of clean games and was produced by only three of the eight models.
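How large is the headline shift? A pooled two-proportion z-test can measure it once the per-condition denominators are known. The review does not report how the 3,604 games divide between baseline and coercion framing. The sketch below therefore assumes an even split (1,802 games each) and treats each game as one Bernoulli trial. Both are illustrative assumptions. The Bonferroni threshold uses the three primary comparisons mentioned in the author rebuttal.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test of p2 vs p1; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Reported escalation rates; the per-condition game counts are an ASSUMPTION
# (an even split of the 3,604 completed games, one trial per game).
baseline_rate, coercion_rate = 0.107, 0.286
n_baseline = n_coercion = 3_604 // 2

z, p = two_proportion_z(baseline_rate, n_baseline, coercion_rate, n_coercion)
bonferroni_alpha = 0.05 / 3  # three primary comparisons
risk_ratio = coercion_rate / baseline_rate

print(f"z = {z:.1f}, p = {p:.1e}, below Bonferroni alpha: {p < bonferroni_alpha}")
print(f"risk ratio = {risk_ratio:.2f}, difference = {coercion_rate - baseline_rate:.3f}")
```

Under this assumed split the shift lies far beyond the corrected threshold. As the referee notes, the more fragile question is whether the shift survives prompt rephrasing. A significance test on a single frame cannot answer that.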
What carries the argument
Inverse game theory recovering five structural utility parameters (alpha, beta, gamma, delta, eta) from action sequences generated by a multi-agent simulation of three games: asymmetric coercion, NATO assurance, and a triadic extensive-form game.
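The sketch below shows one way to recover utility weights from observed choices. It assumes a conditional-logit (softmax) choice rule and the paper's linear utility form, u_i(a_t) = α·Δx_i(a_t) + β·κ_t·Δx_j(a_t) − γ·|Δx_i − Δx_j| − δ·(norm violation) − η·(lies). The review does not specify the authors' estimator. The choice model, synthetic data generator, feature ranges, and all function names here are therefore hypothetical. The sketch simulates choices from known weights and checks that maximum-likelihood gradient ascent recovers them. This only demonstrates identifiability on synthetic data. It does not show that LLM play has stable weights, which is the review's load-bearing premise.

```python
import math
import random

PARAM_NAMES = ("alpha", "beta", "gamma", "delta", "eta")


def features(dx_i, dx_j, kappa, norm_viol, lies):
    """Map one candidate action to phi so that theta . phi equals
    alpha*dx_i + beta*kappa*dx_j - gamma*|dx_i - dx_j| - delta*norm_viol - eta*lies."""
    return (dx_i, kappa * dx_j, -abs(dx_i - dx_j), -norm_viol, -lies)


def choice_probs(theta, options):
    """Softmax (logit) choice probabilities over the candidate actions."""
    utils = [sum(t * f for t, f in zip(theta, phi)) for phi in options]
    top = max(utils)  # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utils]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(theta, n_decisions, n_options=4, seed=0):
    """Hypothetical data generator: random candidate actions, logit choices."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_decisions):
        options = [
            features(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.random(),
                     rng.randint(0, 1), rng.randint(0, 1))
            for _ in range(n_options)
        ]
        chosen = rng.choices(range(n_options), weights=choice_probs(theta, options))[0]
        data.append((options, chosen))
    return data


def fit_logit(data, n_params=5, lr=3.0, iters=100):
    """Maximum-likelihood fit by gradient ascent on the concave logit log-likelihood."""
    theta = [0.0] * n_params
    for _ in range(iters):
        grad = [0.0] * n_params
        for options, chosen in data:
            probs = choice_probs(theta, options)
            for k in range(n_params):
                expected = sum(p * phi[k] for p, phi in zip(probs, options))
                grad[k] += options[chosen][k] - expected  # observed minus expected
        theta = [t + lr * g / len(data) for t, g in zip(theta, grad)]
    return theta


true_theta = (1.0, 0.5, 0.8, 1.2, 0.6)  # hypothetical "ground truth" weights
estimate = fit_logit(simulate(true_theta, n_decisions=2000, seed=7))
for name, truth, est in zip(PARAM_NAMES, true_theta, estimate):
    print(f"{name}: true={truth:.2f} estimated={est:.2f}")
```

The gradient is the observed feature vector minus its expectation under the current softmax. The fit therefore settles when the model's expected features match the observed ones. Applying this to LLM transcripts would require coding Δx_i, Δx_j, κ_t, and the norm-violation and lie indicators for every candidate action. The review does not describe that coding.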
If this is right
- Coercion framing produces higher escalation rates in every tested model.
- Chinese-origin and Western-origin models display distinct power-weight profiles when assigned the U.S. role.
- Prompts that invoke jus cogens and self-determination norms bring escalation rates back close to baseline, at least in the English-only confirmatory sample.
- Only a minority of frontier models generate sequences that reach peaceful acquisition.
Where Pith is reading between the lines
- Differences in training data may shape how models balance alliance norms against unilateral power advantages.
- The recovered parameters could be tested as predictors of how the same models respond to other territorial or alliance disputes.
- Extending the games to include explicit enforcement mechanisms would clarify whether the observed escalation stems from missing collective-action constraints.
Load-bearing premise
LLM action sequences produced under role prompts correspond to stable underlying preferences that can be represented by a small set of fixed utility weights.
What would settle it
Repeated runs of the identical game setup with the same model and prompt yielding substantially different recovered utility parameters would show that the parameters do not capture stable preferences.
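One way to make this test operational is to refit the five weights on independent repeated runs and measure the largest relative change. The 8% tolerance echoes the stability figure in the author rebuttal. The parameter vectors below are made-up placeholders. The floor on the denominator is an added assumption that keeps near-zero weights from inflating the ratio.

```python
def max_relative_drift(theta_a, theta_b, floor=0.05):
    """Largest |a - b| / max(|a|, floor) across matched parameters."""
    return max(abs(a - b) / max(abs(a), floor) for a, b in zip(theta_a, theta_b))


def is_stable(theta_a, theta_b, tolerance=0.08):
    """True if no parameter moves by more than `tolerance` in relative terms."""
    return max_relative_drift(theta_a, theta_b) <= tolerance


# Hypothetical (alpha, beta, gamma, delta, eta) fits from two repeated runs.
run_1 = (1.00, 0.40, 0.25, 0.80, 0.50)
run_2 = (1.04, 0.42, 0.24, 0.78, 0.53)
print(max_relative_drift(run_1, run_2), is_stable(run_1, run_2))
```

A single pair of runs says little. With sampling at temperature 0.7, drift should be judged against the spread across many reruns, not against zero.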
Original abstract
What happens when the strongest alliance member pressures a weaker member over territory and strategic control? We examine the Greenland sovereignty crisis as a stress test for LLM geopolitics, centered on the 2019-2026 U.S. push to acquire Greenland from the Kingdom of Denmark. The crisis nests two collective-action problems: Arctic strategic control and whether NATO can enforce alliance norms against the dominant member. We develop three games (asymmetric coercion; a NATO assurance game with a critical-mass tipping point; a triadic extensive-form game with social preferences) and test them with a multi-agent simulation in which eight frontier LLMs play six geopolitical roles (United States, Denmark, Greenland, NATO, Russia, Canada) across 3,604 completed games and 108,120 action observations. Using inverse game theory, we recover each model's structural utility parameters (alpha, beta, gamma, delta, eta) for material self-interest, reciprocity, inequality aversion, norm respect, and commitment consistency. Three findings stand out. First, all eight models become more escalatory under coercion framing (four-action escalation rises from 10.7% to 28.6%). Second, Chinese-origin models show systematically different power-weight profiles from Western-origin models when playing the U.S. role. Third, peaceful US acquisition emerges in only 1.9% of clean games and only 3 of 8 frontier models ever achieve it, most prominently DeepSeek V3.2, which executes a stable five-round playbook through the metropole. Prompts emphasizing jus cogens and self-determination reduce escalation back near baseline in the English-only confirmatory sample; multilingual contrasts are reported as exploratory sensitivity checks. We position this as a structural benchmark for LLM geopolitical behavior, complementing action-frequency benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper simulates the 2019-2026 Greenland sovereignty crisis via three custom games (asymmetric coercion, NATO assurance with a critical-mass tipping point, triadic extensive-form with social preferences) played by eight frontier LLMs across six roles in 3,604 games. It applies inverse game theory to recover five structural utility parameters (alpha, beta, gamma, delta, eta) from the resulting action sequences and reports three main results: coercion framing raises four-action escalation from 10.7% to 28.6%, Chinese-origin models differ from Western-origin models in power weighting when playing the U.S. role, and peaceful U.S. acquisition occurs in only 1.9% of clean games, with only three models ever succeeding.
Significance. If the core methodological assumptions hold, the work supplies a quantitative structural benchmark for LLM geopolitical play that goes beyond action-frequency counts and could inform both AI safety research and alliance theory. The explicit recovery of parameters for material interest, reciprocity, inequality aversion, norm respect, and commitment consistency is a strength when accompanied by reproducible code or falsifiable out-of-sample tests.
major comments (3)
- [Methods] Methods section: the manuscript states that 3,604 games and 108,120 observations were obtained but supplies no explicit description of prompt templates, temperature settings, exclusion criteria, or statistical controls for multiple comparisons. Without these details it is impossible to evaluate whether the reported escalation shift or the 1.9% peaceful-acquisition rate survives prompt rephrasing or different sampling procedures.
- [Results] Inverse-game-theory recovery (results and parameter tables): the five parameters (alpha, beta, gamma, delta, eta) are estimated directly from the same LLM action sequences they are then used to rationalize. This is standard in inverse game theory but renders the parameters fitted quantities rather than independently validated predictions; no out-of-sample hold-out games or prompt-invariance checks are reported to test whether the recovered utilities remain stable under surface rewording of the coercion frame.
- [Results] Coercion-framing result (abstract and § on escalation): the claim that all eight models become more escalatory rests on the assumption that the recovered parameters capture stable geopolitical preferences rather than prompt-conditioned next-token behavior. Because the framing change itself is a prompt modification, the 10.7% to 28.6% increase cannot be attributed to structural utilities until invariance under prompt variation is demonstrated.
minor comments (2)
- [Abstract] The abstract reports 108,120 action observations but the text does not clarify how many observations per game or per role are retained after any filtering.
- [Figures/Tables] Figure legends and table captions should explicitly state the number of games underlying each percentage (e.g., the 1.9% peaceful-acquisition figure).
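Simple arithmetic can cross-check the reported totals. 108,120 observations over 3,604 games is exactly 30 per game. That would fit six roles acting five times each, but the review does not say how observations are counted, so the per-role split is an assumption. The same check shows why the request for denominators matters. If every completed game counted as clean, the 1.9% figure would correspond to 67 to 70 games. The count would be lower if clean games are a strict subset.

```python
GAMES = 3_604
OBSERVATIONS = 108_120
ROLES = 6  # ASSUMPTION: every role contributes equally to the observation count

per_game, leftover = divmod(OBSERVATIONS, GAMES)
per_role_per_game = per_game / ROLES

# Game counts k whose share of all completed games rounds to 1.9%.
# Using all 3,604 games as the denominator is an upper-bound assumption.
peaceful_counts = [k for k in range(GAMES + 1) if round(100 * k / GAMES, 1) == 1.9]

print(per_game, leftover, per_role_per_game)
print(peaceful_counts)
```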
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We have revised the manuscript to improve methodological transparency and add validation analyses for the inverse game theory parameters. Point-by-point responses to the major comments are provided below.
Point-by-point responses
- Referee: [Methods] Methods section: the manuscript states that 3,604 games and 108,120 observations were obtained but supplies no explicit description of prompt templates, temperature settings, exclusion criteria, or statistical controls for multiple comparisons. Without these details it is impossible to evaluate whether the reported escalation shift or the 1.9% peaceful-acquisition rate survives prompt rephrasing or different sampling procedures.
  Authors: We agree that these implementation details are necessary for reproducibility. In the revised manuscript we have added a new 'Implementation Details' subsection to Methods. It now specifies the full prompt templates (reproduced verbatim in Appendix A), temperature settings (uniformly 0.7), exclusion criteria (invalid JSON outputs or games exceeding 15 rounds discarded, affecting 1.8% of runs), and statistical controls (Bonferroni correction applied to the three primary comparisons). These additions enable direct evaluation of robustness to prompt variation. Revision: yes.
- Referee: [Results] Inverse-game-theory recovery (results and parameter tables): the five parameters (alpha, beta, gamma, delta, eta) are estimated directly from the same LLM action sequences they are then used to rationalize. This is standard in inverse game theory but renders the parameters fitted quantities rather than independently validated predictions; no out-of-sample hold-out games or prompt-invariance checks are reported to test whether the recovered utilities remain stable under surface rewording of the coercion frame.
  Authors: The referee is correct that the parameters are recovered from the observed sequences. To address this, the revision adds out-of-sample validation: 25% of games per model are held out, and parameters estimated on the remainder are used to predict hold-out actions (average accuracy 76% versus 52% for a null baseline). Prompt invariance is checked by rewording the coercion frame in 200 new games (recovered parameters stable within 8%). These results appear in §4.3 and a new Appendix C. Revision: yes.
- Referee: [Results] Coercion-framing result (abstract and § on escalation): the claim that all eight models become more escalatory rests on the assumption that the recovered parameters capture stable geopolitical preferences rather than prompt-conditioned next-token behavior. Because the framing change itself is a prompt modification, the 10.7% to 28.6% increase cannot be attributed to structural utilities until invariance under prompt variation is demonstrated.
  Authors: We accept that stronger evidence of invariance is required. The revision includes a new sensitivity analysis using three alternative paraphrases of the coercion frame (semantic content preserved). The escalation increase persists (average 26.4% across the three paraphrases). We have also clarified in the text that the recovered parameters describe observed behavior under the tested prompts rather than deep, prompt-invariant preferences. This tempers the interpretation while preserving the empirical finding. Revision: yes.
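The hold-out check in the second response is most convincing when whole games, not individual actions, are held out, because actions within one game are correlated. The sketch below performs a per-model 25% game-level split and scores predictions against a majority-action null. The rebuttal does not define its null baseline, so the majority-action baseline, the model labels, and all function names here are assumptions.

```python
import random
from collections import Counter


def holdout_split(games_by_model, holdout_frac=0.25, seed=0):
    """Hold out a fixed share of each model's games (whole games, not actions)."""
    rng = random.Random(seed)
    train, test = {}, {}
    for model, game_ids in games_by_model.items():
        ids = sorted(game_ids)  # deterministic order before shuffling
        rng.shuffle(ids)
        n_test = round(holdout_frac * len(ids))
        test[model], train[model] = ids[:n_test], ids[n_test:]
    return train, test


def accuracy(predicted, observed):
    """Share of held-out actions predicted exactly."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def majority_baseline(train_actions, test_actions):
    """Null model: always predict the most common training action."""
    top_action = Counter(train_actions).most_common(1)[0][0]
    return accuracy([top_action] * len(test_actions), test_actions)


# Hypothetical game IDs for two models.
games = {"model_a": list(range(40)), "model_b": list(range(100, 120))}
train, test = holdout_split(games, seed=1)
print({m: (len(train[m]), len(test[m])) for m in games})
print(majority_baseline(["escalate", "escalate", "negotiate"], ["escalate", "negotiate"]))
```

A predictor built from the fitted weights would replace the baseline's constant guess. Scoring both predictors on the same held-out games is what makes an accuracy comparison such as 76% versus 52% interpretable.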
Circularity Check
No significant circularity detected
Rationale
The paper's quantitative claims (escalation rising from 10.7% to 28.6%, peaceful acquisition at 1.9%) are direct empirical counts of action sequences across the 3,604 completed games and 108,120 observations under the described framings. Inverse game theory recovery of parameters (alpha, beta, gamma, delta, eta) is applied after the fact as a descriptive fit to those same sequences but is not invoked to derive or force the reported frequencies. No self-citations, uniqueness theorems, ansatzes, or self-definitional reductions appear in the abstract or described method; the simulation outputs stand as independent observations against the game rules and prompts.
Axiom & Free-Parameter Ledger
free parameters (1)
- alpha, beta, gamma, delta, eta
axioms (1)
- domain assumption: The three game structures (asymmetric coercion, NATO assurance with tipping point, triadic extensive-form) accurately capture the collective-action problems in the Greenland sovereignty crisis.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "Using inverse game theory, we recover each model's structural utility parameters (α, β, γ, δ, η)"
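- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: $u_i(a_t) = \alpha\,\Delta x_i(a_t) + \beta\,\kappa_t\,\Delta x_j(a_t) - \gamma\,\lvert \Delta x_i - \Delta x_j \rvert - \delta \cdot \text{norm viol} - \eta \cdot \text{lies}$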
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.