Computing Equilibria in Games with Stochastic Action Sets

Chun Kai Ling; Ryann Sim; Thomas Schwarz

arxiv: 2602.16234 · v2 · submitted 2026-02-18 · 💻 cs.GT

Computing Equilibria in Games with Stochastic Action Sets

Thomas Schwarz , Ryann Sim , Chun Kai Ling This is my paper

Pith reviewed 2026-05-15 21:29 UTC · model grok-4.3

classification 💻 cs.GT

keywords stochastic action setsNash equilibriumzero-sum gamesregret minimizationmultiplicative weights updateonline learninggame theory

0 comments

The pith

Nash equilibria in two-player zero-sum games with stochastic action sets can be represented by a vector the size of the action set.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Games with Stochastic Action Sets to capture scenarios where each player's available actions are randomly restricted by an exogenous process. In the standard formulation the space of strategies and equilibria grows exponentially with the number of possible subsets, rendering computation intractable. Under the assumption that action restrictions occur independently across players, the authors show that equilibria in two-player zero-sum instances admit a compact representation whose dimension equals the number of original actions. They also supply an algorithm, SI-MWU, that minimizes sleeping internal regret and converges to equilibrium at rate O(sqrt(log |A_i| / T)), after which a stochastic approximation step recovers the equilibrium vector.

Core claim

Under the assumption that action availabilities are independent between players, Nash equilibria in two-player zero-sum Games with Stochastic Action Sets can be compactly represented by a vector of size |A_i|. The SI-MWU algorithm minimizes sleeping internal regret and converges to these equilibria with high probability at rate O(sqrt(log |A_i| / T)). The sequence of SI-MWU iterates can be fed into a stochastic approximation procedure that recovers the compact Nash equilibrium.

What carries the argument

SI-MWU, an online algorithm that minimizes sleeping internal regret so that its iterates converge to the compactly represented equilibrium without enumerating action subsets.

If this is right

Equilibrium computation and representation avoid the exponential blow-up that would otherwise be caused by enumerating all feasible action subsets.
Regret bounds depend only logarithmically on the size of the underlying action set rather than on the number of subsets.
The stochastic approximation recovery step turns online iterates into an exact compact equilibrium description.
The same compact vector can be used directly as a mixed strategy without maintaining subset-level probabilities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The independence assumption may permit analogous compact representations for other equilibrium notions such as coarse correlated equilibria.
The approach could scale equilibrium finding to real-world settings such as resource allocation or bidding where availability is random but independent.
Relaxing independence to allow limited correlation might still preserve polynomial-size representations if the correlation structure is low-rank.

Load-bearing premise

The random restrictions on each player's actions must be chosen independently of the restrictions on every other player.

What would settle it

Exhibit a concrete two-player zero-sum instance in which action availabilities are correlated across players and no vector whose length equals |A_i| represents a Nash equilibrium, or simulate SI-MWU on an independent instance and check whether the observed convergence rate matches O(sqrt(log |A_i| / T)).

read the original abstract

The study of learning in games typically assumes that each player always has access to all of their actions. However, in many practical scenarios, players' available actions might be restricted due to exogenous stochasticity. To model this setting, for a game $\mathcal{G}_{\mathrm{orig}}$ with action set $A_i$ for each player $i$, we introduce the corresponding Game with Stochastic Action Sets (GSAS) which is parametrized by a probability distribution over the players' set of possible action subsets $\mathcal{S}_i \subseteq 2^{\vert A_i\vert}\backslash\{\varnothing\}$. In a GSAS, players' strategies and Nash equilibria (NE) admit prohibitively large representations, and existing algorithms for NE computation scale poorly. Under the assumption that action availabilities are independent between players, we show that NE in two-player zero-sum (2p0s) GSAS can be compactly represented by a vector of size $\vert A_i\vert$, overcoming the na\"ive exponential-sized representation. Computationally, we introduce an efficient algorithm called SI-MWU that minimizes sleeping internal regret, converging to NE with high probability in 2p0s-GSAS with rate $O(\sqrt{\log\vert A_i\vert/T})$. Finally, using the SI-MWU iterates, we develop a procedure based on stochastic approximation to recover compactly represented NE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a compact vector representation for NE in 2p0s games with stochastic action sets plus an SI-MWU algorithm that works when availabilities are independent across players.

read the letter

The main thing to know is that this paper introduces a model for games where players' action sets are stochastic, and under the assumption that these sets are chosen independently for each player, it gives a compact representation for equilibria in the two-player zero-sum case along with an algorithm to compute them. What is new is the GSAS framework itself, which parametrizes the game by distributions over possible action subsets for each player. They show that independence allows representing a Nash equilibrium by a single vector in the simplex over all actions, rather than having to specify probabilities for every possible subset combination. The SI-MWU algorithm minimizes sleeping internal regret and achieves a convergence rate of order square root of log of action set size over time, which they use to recover the equilibrium via stochastic approximation. This approach handles a practical issue that comes up in applications like dynamic resource games or scheduling with random constraints. The compact representation is a solid technical contribution because it avoids the exponential blowup that would otherwise occur. The regret analysis builds on standard techniques for sleeping experts, so it feels like a natural extension rather than something forced. The soft spots are mostly around the independence assumption. As the stress test notes, if action availabilities are correlated across players, the best response conditions link the realizations, and the single-vector trick no longer works. That limits how broadly the results apply without additional work. The abstract also does not mention any experimental results, so we do not know how the algorithm behaves on concrete instances or whether the hidden constants make it usable in moderate-sized games. If the proofs have no gaps, the theory looks fine, but that is hard to judge from the abstract alone. This paper is for algorithmic game theory researchers focused on regret-based learning and equilibrium finding in non-standard game models. A reader who wants to extend multiplicative weights methods to settings with random action availability would get direct value from the SI-MWU construction. It is coherent enough and introduces enough new elements that it deserves a serious referee, even if revisions will be needed to clarify the scope of the independence assumption. I would recommend putting it through peer review.

Referee Report

1 major / 3 minor

Summary. The manuscript introduces Games with Stochastic Action Sets (GSAS), extending standard games by drawing each player's realized action set stochastically from a distribution over non-empty subsets of A_i. Under the assumption that these realizations are chosen independently across players, it proves that Nash equilibria of two-player zero-sum GSAS admit a compact representation as a single vector x in the simplex over the full action set A_i, with strategies obtained by renormalization onto each realized subset. The SI-MWU algorithm is introduced to minimize sleeping internal regret and is shown to produce iterates that converge to such equilibria with high probability at rate O(sqrt(log |A_i| / T)); a stochastic-approximation post-processing step then recovers the compact equilibrium vector from the iterates.

Significance. If the representation theorem and regret bound hold, the work supplies a computationally tractable method for equilibrium computation in a practically relevant class of games that capture exogenous stochastic action availability. The compact |A_i|-sized encoding is a clear theoretical advance over the naive exponential representation, and the convergence rate is consistent with existing sleeping-regret analyses. The reliance on standard regret-minimization primitives is a methodological strength that may facilitate extensions to other stochastic game models.

major comments (1)

[Abstract / Representation theorem] Abstract and representation theorem: the central claim that a single vector x of size |A_i| defines an equilibrium strategy for every pair of realized subsets S_i, S_j via renormalization is load-bearing. The derivation must explicitly invoke independence to decouple the conditional expected payoffs; without a displayed equation showing that the best-response condition holds simultaneously for all realizations, the size reduction cannot be verified.

minor comments (3)

[Abstract] Abstract: the notation for the distribution over subsets (parametrized by a probability distribution over S_i) should be introduced with one clarifying sentence on support and independence.
[Algorithm section] Algorithm description: SI-MWU should be expanded on first use (e.g., Sleeping Internal Multiplicative Weights Update) and the precise sleeping-regret definition referenced to a standard citation.
[Abstract / Convergence theorem] Convergence statement: the high-probability qualifier on the O(sqrt(log |A_i| / T)) rate should be tied to a specific theorem number rather than left in the abstract.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment of the work's significance, and recommendation for minor revision. We address the major comment below.

read point-by-point responses

Referee: [Abstract / Representation theorem] Abstract and representation theorem: the central claim that a single vector x of size |A_i| defines an equilibrium strategy for every pair of realized subsets S_i, S_j via renormalization is load-bearing. The derivation must explicitly invoke independence to decouple the conditional expected payoffs; without a displayed equation showing that the best-response condition holds simultaneously for all realizations, the size reduction cannot be verified.

Authors: We agree that the independence assumption is essential to the representation theorem and that making the decoupling explicit strengthens the presentation. The proof of Theorem 1 already invokes independence to show that the conditional expected payoff factors as the inner product of the renormalized strategies (i.e., the expectation over independent subset realizations separates, so that the best-response condition for the compact x holds uniformly for every pair of realized subsets). To address the concern directly, we will add a displayed equation in the revised manuscript that isolates this step: under independence, E[u_i(x_{S_i}, x_{S_j}) | S_i, S_j] = x_{S_i}^T A x_{S_j} for the renormalized vectors, confirming that the single vector x simultaneously satisfies the Nash condition across all realizations. This clarifies the size reduction without altering the result. revision: yes

Circularity Check

0 steps flagged

No significant circularity; compact representation and regret bound follow directly from stated independence assumption plus standard sleeping-regret analysis

full rationale

The derivation begins with an explicit modeling assumption (action availabilities independent across players) and shows that this decouples conditional payoffs, permitting any equilibrium to be encoded by a single vector x in the simplex over A_i whose renormalization onto each realized S_i yields the strategy. The SI-MWU algorithm is defined as a multiplicative-weights procedure that controls sleeping internal regret; its O(sqrt(log |A_i|/T)) rate is obtained by the usual potential-function argument applied to the sleeping-expert setting. No equation equates a fitted quantity to a claimed prediction, no load-bearing step rests on a self-citation, and no uniqueness theorem is imported from prior work by the same authors. The chain is therefore self-contained against external regret-minimization results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the independence of action-availability draws across players and on standard no-regret bounds for sleeping experts; no free parameters or invented entities are introduced.

axioms (1)

domain assumption Action availabilities are drawn independently across players
Invoked to obtain the compact vector representation of size |A_i| and the regret convergence rate.

pith-pipeline@v0.9.0 · 5540 in / 1242 out tokens · 19071 ms · 2026-05-15T21:29:58.113287+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Under the assumption that action availabilities are independent between players, we show that NE in two-player zero-sum (2p0s) GSAS can be compactly represented by a vector of size |A_i|
IndisputableMonolith/Foundation/DimensionForcing.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

SI-MWU that minimizes sleeping internal regret, converging to NE with high probability in 2p0s-GSAS with rate O(sqrt(log |A_i|/T))

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.