pith. sign in

arxiv: 2602.16234 · v2 · submitted 2026-02-18 · 💻 cs.GT

Computing Equilibria in Games with Stochastic Action Sets

Pith reviewed 2026-05-15 21:29 UTC · model grok-4.3

classification 💻 cs.GT
keywords stochastic action setsNash equilibriumzero-sum gamesregret minimizationmultiplicative weights updateonline learninggame theory
0
0 comments X

The pith

Nash equilibria in two-player zero-sum games with stochastic action sets can be represented by a vector the size of the action set.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Games with Stochastic Action Sets to capture scenarios where each player's available actions are randomly restricted by an exogenous process. In the standard formulation the space of strategies and equilibria grows exponentially with the number of possible subsets, rendering computation intractable. Under the assumption that action restrictions occur independently across players, the authors show that equilibria in two-player zero-sum instances admit a compact representation whose dimension equals the number of original actions. They also supply an algorithm, SI-MWU, that minimizes sleeping internal regret and converges to equilibrium at rate O(sqrt(log |A_i| / T)), after which a stochastic approximation step recovers the equilibrium vector.

Core claim

Under the assumption that action availabilities are independent between players, Nash equilibria in two-player zero-sum Games with Stochastic Action Sets can be compactly represented by a vector of size |A_i|. The SI-MWU algorithm minimizes sleeping internal regret and converges to these equilibria with high probability at rate O(sqrt(log |A_i| / T)). The sequence of SI-MWU iterates can be fed into a stochastic approximation procedure that recovers the compact Nash equilibrium.

What carries the argument

SI-MWU, an online algorithm that minimizes sleeping internal regret so that its iterates converge to the compactly represented equilibrium without enumerating action subsets.

If this is right

  • Equilibrium computation and representation avoid the exponential blow-up that would otherwise be caused by enumerating all feasible action subsets.
  • Regret bounds depend only logarithmically on the size of the underlying action set rather than on the number of subsets.
  • The stochastic approximation recovery step turns online iterates into an exact compact equilibrium description.
  • The same compact vector can be used directly as a mixed strategy without maintaining subset-level probabilities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The independence assumption may permit analogous compact representations for other equilibrium notions such as coarse correlated equilibria.
  • The approach could scale equilibrium finding to real-world settings such as resource allocation or bidding where availability is random but independent.
  • Relaxing independence to allow limited correlation might still preserve polynomial-size representations if the correlation structure is low-rank.

Load-bearing premise

The random restrictions on each player's actions must be chosen independently of the restrictions on every other player.

What would settle it

Exhibit a concrete two-player zero-sum instance in which action availabilities are correlated across players and no vector whose length equals |A_i| represents a Nash equilibrium, or simulate SI-MWU on an independent instance and check whether the observed convergence rate matches O(sqrt(log |A_i| / T)).

read the original abstract

The study of learning in games typically assumes that each player always has access to all of their actions. However, in many practical scenarios, players' available actions might be restricted due to exogenous stochasticity. To model this setting, for a game $\mathcal{G}_{\mathrm{orig}}$ with action set $A_i$ for each player $i$, we introduce the corresponding Game with Stochastic Action Sets (GSAS) which is parametrized by a probability distribution over the players' set of possible action subsets $\mathcal{S}_i \subseteq 2^{\vert A_i\vert}\backslash\{\varnothing\}$. In a GSAS, players' strategies and Nash equilibria (NE) admit prohibitively large representations, and existing algorithms for NE computation scale poorly. Under the assumption that action availabilities are independent between players, we show that NE in two-player zero-sum (2p0s) GSAS can be compactly represented by a vector of size $\vert A_i\vert$, overcoming the na\"ive exponential-sized representation. Computationally, we introduce an efficient algorithm called SI-MWU that minimizes sleeping internal regret, converging to NE with high probability in 2p0s-GSAS with rate $O(\sqrt{\log\vert A_i\vert/T})$. Finally, using the SI-MWU iterates, we develop a procedure based on stochastic approximation to recover compactly represented NE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript introduces Games with Stochastic Action Sets (GSAS), extending standard games by drawing each player's realized action set stochastically from a distribution over non-empty subsets of A_i. Under the assumption that these realizations are chosen independently across players, it proves that Nash equilibria of two-player zero-sum GSAS admit a compact representation as a single vector x in the simplex over the full action set A_i, with strategies obtained by renormalization onto each realized subset. The SI-MWU algorithm is introduced to minimize sleeping internal regret and is shown to produce iterates that converge to such equilibria with high probability at rate O(sqrt(log |A_i| / T)); a stochastic-approximation post-processing step then recovers the compact equilibrium vector from the iterates.

Significance. If the representation theorem and regret bound hold, the work supplies a computationally tractable method for equilibrium computation in a practically relevant class of games that capture exogenous stochastic action availability. The compact |A_i|-sized encoding is a clear theoretical advance over the naive exponential representation, and the convergence rate is consistent with existing sleeping-regret analyses. The reliance on standard regret-minimization primitives is a methodological strength that may facilitate extensions to other stochastic game models.

major comments (1)
  1. [Abstract / Representation theorem] Abstract and representation theorem: the central claim that a single vector x of size |A_i| defines an equilibrium strategy for every pair of realized subsets S_i, S_j via renormalization is load-bearing. The derivation must explicitly invoke independence to decouple the conditional expected payoffs; without a displayed equation showing that the best-response condition holds simultaneously for all realizations, the size reduction cannot be verified.
minor comments (3)
  1. [Abstract] Abstract: the notation for the distribution over subsets (parametrized by a probability distribution over S_i) should be introduced with one clarifying sentence on support and independence.
  2. [Algorithm section] Algorithm description: SI-MWU should be expanded on first use (e.g., Sleeping Internal Multiplicative Weights Update) and the precise sleeping-regret definition referenced to a standard citation.
  3. [Abstract / Convergence theorem] Convergence statement: the high-probability qualifier on the O(sqrt(log |A_i| / T)) rate should be tied to a specific theorem number rather than left in the abstract.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment of the work's significance, and recommendation for minor revision. We address the major comment below.

read point-by-point responses
  1. Referee: [Abstract / Representation theorem] Abstract and representation theorem: the central claim that a single vector x of size |A_i| defines an equilibrium strategy for every pair of realized subsets S_i, S_j via renormalization is load-bearing. The derivation must explicitly invoke independence to decouple the conditional expected payoffs; without a displayed equation showing that the best-response condition holds simultaneously for all realizations, the size reduction cannot be verified.

    Authors: We agree that the independence assumption is essential to the representation theorem and that making the decoupling explicit strengthens the presentation. The proof of Theorem 1 already invokes independence to show that the conditional expected payoff factors as the inner product of the renormalized strategies (i.e., the expectation over independent subset realizations separates, so that the best-response condition for the compact x holds uniformly for every pair of realized subsets). To address the concern directly, we will add a displayed equation in the revised manuscript that isolates this step: under independence, E[u_i(x_{S_i}, x_{S_j}) | S_i, S_j] = x_{S_i}^T A x_{S_j} for the renormalized vectors, confirming that the single vector x simultaneously satisfies the Nash condition across all realizations. This clarifies the size reduction without altering the result. revision: yes

Circularity Check

0 steps flagged

No significant circularity; compact representation and regret bound follow directly from stated independence assumption plus standard sleeping-regret analysis

full rationale

The derivation begins with an explicit modeling assumption (action availabilities independent across players) and shows that this decouples conditional payoffs, permitting any equilibrium to be encoded by a single vector x in the simplex over A_i whose renormalization onto each realized S_i yields the strategy. The SI-MWU algorithm is defined as a multiplicative-weights procedure that controls sleeping internal regret; its O(sqrt(log |A_i|/T)) rate is obtained by the usual potential-function argument applied to the sleeping-expert setting. No equation equates a fitted quantity to a claimed prediction, no load-bearing step rests on a self-citation, and no uniqueness theorem is imported from prior work by the same authors. The chain is therefore self-contained against external regret-minimization results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the independence of action-availability draws across players and on standard no-regret bounds for sleeping experts; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Action availabilities are drawn independently across players
    Invoked to obtain the compact vector representation of size |A_i| and the regret convergence rate.

pith-pipeline@v0.9.0 · 5540 in / 1242 out tokens · 19071 ms · 2026-05-15T21:29:58.113287+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.