Computing Equilibria in Games with Stochastic Action Sets
Pith reviewed 2026-05-15 21:29 UTC · model grok-4.3
The pith
Nash equilibria in two-player zero-sum games with stochastic action sets can be represented by a vector the size of the action set.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the assumption that action availabilities are independent between players, Nash equilibria in two-player zero-sum Games with Stochastic Action Sets can be compactly represented by a vector of size |A_i|. The SI-MWU algorithm minimizes sleeping internal regret and converges to these equilibria with high probability at rate O(sqrt(log |A_i| / T)). The sequence of SI-MWU iterates can be fed into a stochastic approximation procedure that recovers the compact Nash equilibrium.
What carries the argument
SI-MWU, an online algorithm that minimizes sleeping internal regret so that its iterates converge to the compactly represented equilibrium without enumerating action subsets.
If this is right
- Equilibrium computation and representation avoid the exponential blow-up that would otherwise be caused by enumerating all feasible action subsets.
- Regret bounds depend only logarithmically on the size of the underlying action set rather than on the number of subsets.
- The stochastic approximation recovery step turns online iterates into an exact compact equilibrium description.
- The same compact vector can be used directly as a mixed strategy without maintaining subset-level probabilities.
Where Pith is reading between the lines
- The independence assumption may permit analogous compact representations for other equilibrium notions such as coarse correlated equilibria.
- The approach could scale equilibrium finding to real-world settings such as resource allocation or bidding where availability is random but independent.
- Relaxing independence to allow limited correlation might still preserve polynomial-size representations if the correlation structure is low-rank.
Load-bearing premise
The random restrictions on each player's actions must be chosen independently of the restrictions on every other player.
What would settle it
Exhibit a concrete two-player zero-sum instance in which action availabilities are correlated across players and no vector whose length equals |A_i| represents a Nash equilibrium, or simulate SI-MWU on an independent instance and check whether the observed convergence rate matches O(sqrt(log |A_i| / T)).
read the original abstract
The study of learning in games typically assumes that each player always has access to all of their actions. However, in many practical scenarios, players' available actions might be restricted due to exogenous stochasticity. To model this setting, for a game $\mathcal{G}_{\mathrm{orig}}$ with action set $A_i$ for each player $i$, we introduce the corresponding Game with Stochastic Action Sets (GSAS) which is parametrized by a probability distribution over the players' set of possible action subsets $\mathcal{S}_i \subseteq 2^{\vert A_i\vert}\backslash\{\varnothing\}$. In a GSAS, players' strategies and Nash equilibria (NE) admit prohibitively large representations, and existing algorithms for NE computation scale poorly. Under the assumption that action availabilities are independent between players, we show that NE in two-player zero-sum (2p0s) GSAS can be compactly represented by a vector of size $\vert A_i\vert$, overcoming the na\"ive exponential-sized representation. Computationally, we introduce an efficient algorithm called SI-MWU that minimizes sleeping internal regret, converging to NE with high probability in 2p0s-GSAS with rate $O(\sqrt{\log\vert A_i\vert/T})$. Finally, using the SI-MWU iterates, we develop a procedure based on stochastic approximation to recover compactly represented NE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Games with Stochastic Action Sets (GSAS), extending standard games by drawing each player's realized action set stochastically from a distribution over non-empty subsets of A_i. Under the assumption that these realizations are chosen independently across players, it proves that Nash equilibria of two-player zero-sum GSAS admit a compact representation as a single vector x in the simplex over the full action set A_i, with strategies obtained by renormalization onto each realized subset. The SI-MWU algorithm is introduced to minimize sleeping internal regret and is shown to produce iterates that converge to such equilibria with high probability at rate O(sqrt(log |A_i| / T)); a stochastic-approximation post-processing step then recovers the compact equilibrium vector from the iterates.
Significance. If the representation theorem and regret bound hold, the work supplies a computationally tractable method for equilibrium computation in a practically relevant class of games that capture exogenous stochastic action availability. The compact |A_i|-sized encoding is a clear theoretical advance over the naive exponential representation, and the convergence rate is consistent with existing sleeping-regret analyses. The reliance on standard regret-minimization primitives is a methodological strength that may facilitate extensions to other stochastic game models.
major comments (1)
- [Abstract / Representation theorem] Abstract and representation theorem: the central claim that a single vector x of size |A_i| defines an equilibrium strategy for every pair of realized subsets S_i, S_j via renormalization is load-bearing. The derivation must explicitly invoke independence to decouple the conditional expected payoffs; without a displayed equation showing that the best-response condition holds simultaneously for all realizations, the size reduction cannot be verified.
minor comments (3)
- [Abstract] Abstract: the notation for the distribution over subsets (parametrized by a probability distribution over S_i) should be introduced with one clarifying sentence on support and independence.
- [Algorithm section] Algorithm description: SI-MWU should be expanded on first use (e.g., Sleeping Internal Multiplicative Weights Update) and the precise sleeping-regret definition referenced to a standard citation.
- [Abstract / Convergence theorem] Convergence statement: the high-probability qualifier on the O(sqrt(log |A_i| / T)) rate should be tied to a specific theorem number rather than left in the abstract.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment of the work's significance, and recommendation for minor revision. We address the major comment below.
read point-by-point responses
-
Referee: [Abstract / Representation theorem] Abstract and representation theorem: the central claim that a single vector x of size |A_i| defines an equilibrium strategy for every pair of realized subsets S_i, S_j via renormalization is load-bearing. The derivation must explicitly invoke independence to decouple the conditional expected payoffs; without a displayed equation showing that the best-response condition holds simultaneously for all realizations, the size reduction cannot be verified.
Authors: We agree that the independence assumption is essential to the representation theorem and that making the decoupling explicit strengthens the presentation. The proof of Theorem 1 already invokes independence to show that the conditional expected payoff factors as the inner product of the renormalized strategies (i.e., the expectation over independent subset realizations separates, so that the best-response condition for the compact x holds uniformly for every pair of realized subsets). To address the concern directly, we will add a displayed equation in the revised manuscript that isolates this step: under independence, E[u_i(x_{S_i}, x_{S_j}) | S_i, S_j] = x_{S_i}^T A x_{S_j} for the renormalized vectors, confirming that the single vector x simultaneously satisfies the Nash condition across all realizations. This clarifies the size reduction without altering the result. revision: yes
Circularity Check
No significant circularity; compact representation and regret bound follow directly from stated independence assumption plus standard sleeping-regret analysis
full rationale
The derivation begins with an explicit modeling assumption (action availabilities independent across players) and shows that this decouples conditional payoffs, permitting any equilibrium to be encoded by a single vector x in the simplex over A_i whose renormalization onto each realized S_i yields the strategy. The SI-MWU algorithm is defined as a multiplicative-weights procedure that controls sleeping internal regret; its O(sqrt(log |A_i|/T)) rate is obtained by the usual potential-function argument applied to the sleeping-expert setting. No equation equates a fitted quantity to a claimed prediction, no load-bearing step rests on a self-citation, and no uniqueness theorem is imported from prior work by the same authors. The chain is therefore self-contained against external regret-minimization results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Action availabilities are drawn independently across players
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Under the assumption that action availabilities are independent between players, we show that NE in two-player zero-sum (2p0s) GSAS can be compactly represented by a vector of size |A_i|
-
IndisputableMonolith/Foundation/DimensionForcing.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
SI-MWU that minimizes sleeping internal regret, converging to NE with high probability in 2p0s-GSAS with rate O(sqrt(log |A_i|/T))
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.