Finite Regret and Cycles with Fixed Step-Size via Alternating Gradient Descent-Ascent
Pith reviewed 2026-05-24 23:49 UTC · model grok-4.3
The pith
Alternating updates let gradient descent achieve bounded regret with any fixed step size.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Switching from simultaneous to alternating updates in gradient descent-ascent yields finite regret for any agent using gradient descent against an arbitrary opponent and produces bounded cycling strategies when both players alternate in adversarial zero-sum games.
What carries the argument
Alternating updates in gradient descent-ascent, with players revising strategies sequentially rather than at the same time.
If this is right
- An agent using gradient descent obtains bounded regret against any opponent update rule.
- Both players' strategies remain bounded when they both use alternating gradient descent-ascent in zero-sum games.
- The strategies of both players enter cycles rather than diverge under the alternating rule with fixed step size.
- Finite regret is obtained without any requirement to decrease the step size over time.
Where Pith is reading between the lines
- Alternating updates could be applied to stabilize other first-order methods in repeated games.
- Enforcing turn order in distributed learning systems might prevent divergence in larger populations.
- Cycling dynamics suggest periodic rather than static approach to equilibrium points.
- The same alternation principle may extend to non-zero-sum or multi-player settings.
Load-bearing premise
Players must strictly alternate their updates rather than updating simultaneously.
What would settle it
A numerical simulation of a bilinear game in which one player uses alternating gradient descent and the cumulative regret grows linearly without bound over thousands of turns.
read the original abstract
Gradient descent is arguably one of the most popular online optimization methods with a wide array of applications. However, the standard implementation where agents simultaneously update their strategies yields several undesirable properties; strategies diverge away from equilibrium and regret grows over time. In this paper, we eliminate these negative properties by introducing a different implementation to obtain finite regret via arbitrary fixed step-size. We obtain this surprising property by having agents take turns when updating their strategies. In this setting, we show that an agent that uses gradient descent obtains bounded regret -- regardless of how their opponent updates their strategies. Furthermore, we show that in adversarial settings that agents' strategies are bounded and cycle when both are using the alternating gradient descent algorithm.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that replacing simultaneous updates with strict alternation in gradient descent-ascent for two-player games yields finite (bounded) regret for a gradient-descent player against an arbitrary opponent, using any fixed step size. It further claims that when both players use alternating gradient descent-ascent, the joint strategy trajectory remains bounded and enters a cycle.
Significance. If the claims hold, the result is significant: it supplies a minimal structural change (alternation) that converts the well-known divergence and linear regret of simultaneous GDA into bounded regret and cycling, without requiring step-size decay or other modifications. This would be directly useful for online learning in games and could influence practical implementations of adversarial training.
major comments (2)
- [§3] The central bounded-regret claim (abstract and §3) is stated for arbitrary opponent updates, yet the proof sketch appears to rely on the opponent’s update occurring only after the GD player has moved; the manuscript should explicitly state the information structure (perfect vs. delayed observation of the opponent’s last move) and verify that the regret bound remains finite under the weaker assumption.
- [Theorem 1] Theorem 1 (or equivalent) asserts bounded regret independent of the opponent’s algorithm, but the derivation must be checked against the case in which the opponent also uses a fixed-step gradient step; if the two players’ step sizes differ by an arbitrary factor, the claimed bound may grow with that factor and should be stated explicitly.
minor comments (2)
- Notation for the alternating schedule (who moves first, how ties are broken) is introduced only informally; a short pseudocode block or explicit timing diagram would remove ambiguity.
- The cycling claim in the two-player case is illustrated only for a single low-dimensional example; a brief statement of the period or the invariant set would help readers assess generality.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and constructive comments. Below we respond point by point to the major comments.
read point-by-point responses
-
Referee: [§3] The central bounded-regret claim (abstract and §3) is stated for arbitrary opponent updates, yet the proof sketch appears to rely on the opponent’s update occurring only after the GD player has moved; the manuscript should explicitly state the information structure (perfect vs. delayed observation of the opponent’s last move) and verify that the regret bound remains finite under the weaker assumption.
Authors: We agree that the information structure should be stated explicitly. The alternating update model in the paper assumes sequential moves in which each player observes the opponent's most recent strategy before updating (perfect observation of the last move). The proof in §3 relies on this ordering. We will add a clear statement of this assumption in §3. The weaker delayed-observation model is outside the scope of the claimed result; the paper does not assert the bound under simultaneous information, so no verification for that case is required. revision: yes
-
Referee: [Theorem 1] Theorem 1 (or equivalent) asserts bounded regret independent of the opponent’s algorithm, but the derivation must be checked against the case in which the opponent also uses a fixed-step gradient step; if the two players’ step sizes differ by an arbitrary factor, the claimed bound may grow with that factor and should be stated explicitly.
Authors: The theorem asserts that regret remains bounded (does not grow linearly with time) for any fixed opponent update rule. When the opponent also employs alternating GDA, the finite bound value does depend on the ratio of the two step sizes. We will revise the theorem statement and add a remark that explicitly notes this dependence and verifies the two-player alternating-GDA case. The central claim of time-independent boundedness is unaffected. revision: yes
Circularity Check
No significant circularity; derivation self-contained
full rationale
The paper's central claims rest on the explicit structural assumption of strict alternation in updates, presented as the key change that yields bounded regret for gradient descent against arbitrary opponents and cycling behavior when both alternate. No equations or steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations. The alternation is an independent modeling choice, not derived from the target regret bounds. The argument is internally consistent without circular reduction, consistent with the reader's assessment of score 2.0.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Players update in strict alternation rather than simultaneously
- domain assumption The setting is adversarial / two-player zero-sum
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/ArrowOfTime.lean; Cost/FunctionalEquation.lean; Foundation/RealityFromDistinction.leanreality_from_one_distinction; Jcost functional uniqueness; Hamiltonian energy conservation + Poincaré recurrence from volume + bounded orbits matches?
matchesMATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.
Lemma 7: If both agents use (AltGD), ... ||xt1||²/η1 + ||xt2||²/η2 + <xt1,A xt2> = constant (energy conservation). Thm 8-9: bounded orbits iff √(η1 η2) ≤ 2/||A||. Cor 6: Poincaré recurrence.
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean; Cost modulesJ(x) = ½(x + x⁻¹) - 1 forces reciprocal cost and symplectic structure echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
AltGD obtained via Verlet/leapfrog symplectic integration of ContinuousGD; preserves volume and approximately conserves energy for all time (unlike Euler/SimGD).
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.