Strategic Preemption Under Shared Catastrophic Risk: The Suicide Region and the Race to Artificial General Intelligence
Pith reviewed 2026-05-21 17:58 UTC · model grok-4.3
The pith
In the AGI race, shared existential risks cancel from players' decisions and create a suicide region of forced early deployment despite negative value.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When the cost of global ruin is embedded in both players' payoffs, the risk term mathematically cancels from the equilibrium indifference condition of the continuous-time preemption game. This cancellation produces a suicide region in which competitive pressures force early AGI deployment even though the risk-adjusted net present value remains negative.
What carries the argument
The equilibrium indifference condition of the preemption game, from which the shared systemic ruin term cancels.
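A reduced-form sketch of how such a cancellation can arise; this is an illustration, not the paper's own derivation. All symbols other than D, I, S and π are assumptions: V is the current prize value, π is treated as a constant payoff scale, S in [0, 1/2) is read as the follower's prize share, and q is a ruin probability assumed identical for both roles. The (1-S, S) split is chosen only because it reproduces the threshold V_P^* = I / ((1-2S) π(τ)) quoted in the paper passage further down this page.

```latex
\begin{align*}
L(V) &= (1-S)\,\pi V - I - qD      && \text{leader deploys now} \\
F(V) &= S\,\pi V - qD              && \text{follower is preempted} \\
L(V) - F(V) &= (1-2S)\,\pi V - I   && \text{the shared term $qD$ cancels} \\
V_P^* &= \frac{I}{(1-2S)\,\pi}     && \text{from } L(V_P^*) = F(V_P^*) \\
L(V_P^*) &= \frac{S}{1-2S}\,I - qD && \text{leader's risk-adjusted value at } V_P^*
\end{align*}
```

In this sketch the threshold does not contain D while the leader's value at the threshold does, so that value is negative whenever qD > S I / (1-2S); in the winner-takes-all case S = 0 it equals -qD.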
If this is right
- Warning shots or sub-existential disasters leave the winner-takes-all structure unchanged and therefore fail to slow acceleration.
- The race stops only when the cost of ruin is internalized to each player, making safety research economically necessary before deployment.
- A critical private liability threshold exists that restores the option value of waiting.
- Targeted liability or insurance mechanisms can shift the equilibrium back toward safer research sequences.
Where Pith is reading between the lines
- If the ruin cost is only partially shared or imperfectly correlated with speed, the cancellation weakens and some value of waiting may reappear.
- Treaties that assign ex-post liability for global harm could replicate the internalization effect without requiring perfect symmetry in D.
- The same cancellation logic may apply to other winner-takes-all technological races that carry correlated global downside.
Load-bearing premise
The model assumes a systemic ruin parameter D that is correlated with development velocity and shared globally across players.
What would settle it
Empirical observation that competing actors delay AGI deployment when they acknowledge correlated global catastrophe risks would contradict the cancellation result.
Original abstract
We analyze a continuous-time preemption game with shared catastrophic externalities. When the cost of catastrophe is embedded in both players' payoffs, the risk term cancels out in the equilibrium indifference condition. This creates a "suicide region" where competitive pressures force rational agents to deploy despite negative risk-adjusted net present values. We apply this framework to the race for artificial general intelligence (AGI). We show that this suicide region widens as the cost of systemic ruin grows: higher catastrophic risk does not deter the race but instead enlarges the set of conditions under which rational actors deploy despite negative social value. We characterize the resulting welfare distortion against a social planner's benchmark and demonstrate how two complementary mechanisms - private liability and prize-sharing - can close the suicide region. Private liability raises the cost of unsafe deployment while prize-sharing reduces the strategic imperative to deploy first. "Warning shots" (sub-existential disasters) will fail to deter AGI acceleration, as the winner-takes-all nature of the race remains intact.
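The widening claim can be illustrated numerically in a static toy model. The sketch below is not the paper's model: the payoff forms, the function names and every parameter value are hypothetical, and the ruin probability q is assumed to be the same for leader and follower. It computes the preemption threshold (which does not depend on D) and the prize value at which the leader's risk-adjusted NPV reaches zero (which does), and reports the gap between them.

```python
def preemption_threshold(I, pi, S):
    """Prize value V at which leader and follower payoffs are equal.

    Assumed toy payoffs: leader (1-S)*pi*V - I - q*D, follower S*pi*V - q*D.
    The shared q*D term cancels, so D is not an argument of this function.
    """
    if not 0.0 <= S < 0.5:
        raise ValueError("S must lie in [0, 0.5)")
    return I / ((1.0 - 2.0 * S) * pi)


def zero_npv_boundary(I, pi, S, q, D):
    """Prize value V at which the leader's risk-adjusted NPV is exactly zero."""
    return (I + q * D) / ((1.0 - S) * pi)


def suicide_region_width(I, pi, S, q, D):
    """Length of the interval of V where deployment occurs but NPV < 0."""
    gap = zero_npv_boundary(I, pi, S, q, D) - preemption_threshold(I, pi, S)
    return max(0.0, gap)


# Hypothetical parameter values, chosen only for illustration.
I, pi, S, q = 1.0, 1.0, 0.0, 0.1
for D in (0.0, 5.0, 10.0, 20.0):
    print(f"D={D:5.1f}  threshold={preemption_threshold(I, pi, S):.3f}  "
          f"width={suicide_region_width(I, pi, S, q, D):.3f}")
```

Under these assumptions the threshold stays fixed while the zero-NPV boundary moves out linearly in D, so the interval of prize values with deployment at negative NPV grows as the ruin cost grows.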
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper models the race to AGI between two sovereign actors as a continuous-time preemption game with endogenous existential risk. A systemic ruin parameter D, correlated with development velocity and shared globally, is embedded in both players' payoffs. The central claim is that this risk term cancels from the equilibrium indifference condition between investing immediately (leader value) and waiting (follower continuation value), producing a 'suicide region' in which rational preemption forces early deployment despite negative risk-adjusted NPV. The manuscript further argues that sub-existential warning shots fail to deter acceleration and derives a private liability threshold plus mechanism-design interventions to restore the option value of waiting.
Significance. If the cancellation result is rigorously established, the work supplies a game-theoretic account of why observed AGI acceleration can be consistent with rational behavior under shared catastrophic risk, extending real-options analysis to races with global externalities. The proposed liability threshold and safety-research prerequisites offer concrete policy levers. The model is falsifiable via its predicted dependence of the suicide region on the correlation structure of D and velocity.
major comments (2)
- [§3.2, Eq. (15)] (indifference condition): the derivation asserts that D cancels because it enters symmetrically in leader and follower continuation values. However, the leader's stopping time τ_L precedes the follower's τ_F, so the integrated hazard rates over [0,τ] differ unless the post-deployment ruin probability is explicitly independent of role and velocity. The manuscript must display the explicit integral expressions for both continuation values and show that the D terms are identical after substitution; without this step the cancellation is not guaranteed by global sharing alone.
- [§4.1, Proposition 2] (suicide region): the existence of the region where NPV < 0 yet investment occurs is load-bearing on the cancellation result. If the D integrals do not cancel, the threshold reverts to the standard real-options form and the suicide region disappears. A direct comparison of the derived threshold with and without the symmetry assumption on D would clarify the scope of the result.
minor comments (2)
- [§2] Notation for the hazard rate λ(v) and its dependence on velocity v is introduced in §2 but used without re-statement in the continuation-value integrals of §3; a brief reminder equation would improve readability.
- [Figure 2] The investment-space diagram labels the suicide region but does not indicate the numerical values of D and the correlation parameter used to generate the boundaries; adding these parameters would allow readers to reproduce the plotted region.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments, which help clarify the scope of our cancellation result. We address each major comment below, indicating planned revisions to strengthen the exposition without altering the core model.
Point-by-point responses
- Referee: [§3.2, Eq. (15)] (indifference condition): the derivation asserts that D cancels because it enters symmetrically in leader and follower continuation values. However, the leader's stopping time τ_L precedes the follower's τ_F, so the integrated hazard rates over [0,τ] differ unless the post-deployment ruin probability is explicitly independent of role and velocity. The manuscript must display the explicit integral expressions for both continuation values and show that the D terms are identical after substitution; without this step the cancellation is not guaranteed by global sharing alone.
  Authors: We agree that explicit verification is needed. In the model, D represents a global existential cost realized upon AGI deployment by either player, with the hazard rate λ(t) driven by cumulative velocity up to the first stopping time. Because the post-deployment ruin is triggered globally and independently of which actor leads (the deployed AGI affects the shared world state), the integrated term -∫ D · λ(s) ds from 0 to τ_L in the leader value equals the corresponding term in the follower continuation value after the leader's deployment (the follower then faces the same global D from τ_L onward). We will insert the full integral expressions for V_L and V_F immediately before Eq. (15), substitute the common D factor, and demonstrate algebraic cancellation under the global-sharing assumption. This addition will also note the modeling choice that post-deployment risk does not depend on role.
  revision: yes
- Referee: [§4.1, Proposition 2] (suicide region): the existence of the region where NPV < 0 yet investment occurs is load-bearing on the cancellation result. If the D integrals do not cancel, the threshold reverts to the standard real-options form and the suicide region disappears. A direct comparison of the derived threshold with and without the symmetry assumption on D would clarify the scope of the result.
  Authors: We accept that the suicide region is conditional on the cancellation. In the revised manuscript we will add a short subsection after Proposition 2 that compares the equilibrium investment threshold under (i) the baseline global D with role-independent post-deployment ruin (yielding the suicide region where investment occurs for NPV < 0) and (ii) a counterfactual where D is either private or role-dependent (in which case the threshold reverts to the standard real-options form with no suicide region). This comparison will be presented both analytically and via a numerical illustration to delineate the precise conditions under which the result holds.
  revision: yes
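The comparison between cases (i) and (ii) can be sketched in a static toy model. This is a hypothetical reduced form rather than the authors' planned subsection: it ignores discounting and option-value dynamics, so the private-D case reduces to a plain break-even rule instead of a full real-options threshold. The function names, the payoff forms and the parameter values are all assumptions made for illustration.

```python
def thresholds(I, pi, S, q, D, shared):
    """Return (preemption threshold, leader break-even) for prize value V.

    Assumed toy payoffs:
      leader   = (1-S)*pi*V - I - q*D
      follower = S*pi*V - q*D   if shared (global ruin cost)
      follower = S*pi*V         if not shared (ruin cost private to the deployer)
    """
    if not 0.0 <= S < 0.5:
        raise ValueError("S must lie in [0, 0.5)")
    leader_risk = q * D
    follower_risk = q * D if shared else 0.0
    # Indifference leader == follower, solved for V.
    preempt = (I + leader_risk - follower_risk) / ((1.0 - 2.0 * S) * pi)
    # Leader's risk-adjusted NPV equals zero at this V.
    breakeven = (I + leader_risk) / ((1.0 - S) * pi)
    return preempt, breakeven


def has_suicide_region(I, pi, S, q, D, shared):
    """True if players deploy at some V where the leader's NPV is negative."""
    preempt, breakeven = thresholds(I, pi, S, q, D, shared)
    return preempt < breakeven


# Hypothetical parameter values, chosen only for illustration.
params = dict(I=1.0, pi=1.0, S=0.1, q=0.1, D=10.0)
for shared in (True, False):
    label = "shared D " if shared else "private D"
    preempt, breakeven = thresholds(shared=shared, **params)
    print(label, f"preempt={preempt:.3f}", f"breakeven={breakeven:.3f}",
          "suicide region:", has_suicide_region(shared=shared, **params))
```

In this toy the private case always has a preemption threshold at or above the break-even point, because the same numerator I + qD is divided by (1-2S) instead of (1-S), so no deployment occurs at negative NPV. With shared D the numerator of the preemption threshold loses the qD term and the region opens.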
Circularity Check
Cancellation of systemic ruin parameter D from indifference condition is imposed by symmetric global-sharing assumption rather than derived from differential stopping times
specific steps
- self-definitional [Abstract]
"As the disutility of catastrophe is embedded in both players' payoffs, the risk term mathematically cancels out of the equilibrium indifference condition. This creates a 'suicide region' in the investment space where competitive pressures force rational agents to deploy AGI systems early, despite a negative risk-adjusted net present value."
The paper claims the risk term cancels because D is embedded in both payoffs, directly yielding the suicide region. In a preemption game, however, leader and follower continuation values integrate the hazard over different intervals (earlier deployment for leader). Global sharing alone does not equate these integrals unless the model additionally assumes the ruin probability is role-independent and identical regardless of who moves first. The cancellation is therefore imposed by the symmetric embedding assumption rather than derived, making the suicide region result reduce to that modeling choice.
full rationale
The paper's central result—the suicide region where agents deploy despite negative NPV—rests on the claim that D cancels from the equilibrium indifference condition because it is embedded in both players' payoffs. This cancellation is asserted to follow from D being systemic, correlated with velocity, and shared globally. However, in a continuous-time preemption game the leader's stopping time precedes the follower's, so the continuation values contain distinct integrals over the hazard rate. Cancellation therefore requires an additional modeling restriction that the post-deployment ruin probability is independent of role and identical for both players. The abstract presents this cancellation as a mathematical consequence of embedding, but the skeptic analysis shows it is not automatic from global sharing alone. The result is therefore partially forced by the choice of how D enters the leader and follower values, producing a circularity score of 6.
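The objection can be made concrete in a static reduced form. This is an illustration under assumed notation, not the paper's Eq. (15): q_L and q_F stand for the ruin probabilities entering the leader's and follower's values, V is the prize value, π a constant payoff scale, I the deployment cost, and S in [0, 1/2) the follower's prize share. All of these are hypothetical symbols.

```latex
\begin{align*}
L(V) &= (1-S)\,\pi V - I - q_L D, \qquad F(V) = S\,\pi V - q_F D \\
L(V) - F(V) &= (1-2S)\,\pi V - I - (q_L - q_F)\,D \\
V^* &= \frac{I + (q_L - q_F)\,D}{(1-2S)\,\pi}
\end{align*}
```

D drops out of the threshold only when q_L = q_F, which is the role-independence restriction described above; if q_L > q_F the threshold rises with D and the ruin cost does deter deployment.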
Axiom & Free-Parameter Ledger
free parameters (1)
- systemic ruin parameter D
axioms (1)
- domain assumption: AGI race modeled as a continuous-time preemption game between sovereign actors
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "As the disutility of catastrophe is embedded in both players' payoffs, the risk term mathematically cancels out of the equilibrium indifference condition... V_P^* = I / ((1-2S) π(τ))"
- IndisputableMonolith/Foundation/BranchSelection.lean, branch_selection (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "The race is modelled as a symmetric, continuous-time stochastic game... payoff structures (3) and (4)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.