Finite Regret and Cycles with Fixed Step-Size via Alternating Gradient Descent-Ascent

Gauthier Gidel; Georgios Piliouras; James P. Bailey

arxiv: 1907.04392 · v1 · pith:TPUJAOHVnew · submitted 2019-07-09 · 💻 cs.GT · math.DS· math.OC

Finite Regret and Cycles with Fixed Step-Size via Alternating Gradient Descent-Ascent

James P. Bailey , Gauthier Gidel , Georgios Piliouras This is my paper

Pith reviewed 2026-05-24 23:49 UTC · model grok-4.3

classification 💻 cs.GT math.DSmath.OC

keywords gradient descentregret minimizationalternating updatesgame theoryonline optimizationzero-sum gamesfixed step size

0 comments

The pith

Alternating updates let gradient descent achieve bounded regret with any fixed step size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that when players take turns updating their strategies with gradient descent, one player obtains bounded regret no matter what update rule the opponent uses. This holds for any fixed step size and eliminates the divergence and growing regret that occur under simultaneous updates. In fully competitive zero-sum settings where both players alternate, their strategies stay bounded and enter cycles instead of diverging. The result matters for repeated interactions where reliable performance guarantees are needed even against arbitrary opponents without requiring step sizes to shrink over time.

Core claim

Switching from simultaneous to alternating updates in gradient descent-ascent yields finite regret for any agent using gradient descent against an arbitrary opponent and produces bounded cycling strategies when both players alternate in adversarial zero-sum games.

What carries the argument

Alternating updates in gradient descent-ascent, with players revising strategies sequentially rather than at the same time.

If this is right

An agent using gradient descent obtains bounded regret against any opponent update rule.
Both players' strategies remain bounded when they both use alternating gradient descent-ascent in zero-sum games.
The strategies of both players enter cycles rather than diverge under the alternating rule with fixed step size.
Finite regret is obtained without any requirement to decrease the step size over time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Alternating updates could be applied to stabilize other first-order methods in repeated games.
Enforcing turn order in distributed learning systems might prevent divergence in larger populations.
Cycling dynamics suggest periodic rather than static approach to equilibrium points.
The same alternation principle may extend to non-zero-sum or multi-player settings.

Load-bearing premise

Players must strictly alternate their updates rather than updating simultaneously.

What would settle it

A numerical simulation of a bilinear game in which one player uses alternating gradient descent and the cumulative regret grows linearly without bound over thousands of turns.

read the original abstract

Gradient descent is arguably one of the most popular online optimization methods with a wide array of applications. However, the standard implementation where agents simultaneously update their strategies yields several undesirable properties; strategies diverge away from equilibrium and regret grows over time. In this paper, we eliminate these negative properties by introducing a different implementation to obtain finite regret via arbitrary fixed step-size. We obtain this surprising property by having agents take turns when updating their strategies. In this setting, we show that an agent that uses gradient descent obtains bounded regret -- regardless of how their opponent updates their strategies. Furthermore, we show that in adversarial settings that agents' strategies are bounded and cycle when both are using the alternating gradient descent algorithm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Alternating updates give bounded regret for fixed-step GD against any opponent and bounded cycles when both alternate, a direct structural fix for simultaneous-play divergence.

read the letter

The main result is that strict alternation between the two players lets a gradient-descent agent keep finite regret at any fixed step size no matter how the opponent updates, and that both players cycling in a bounded way when they both alternate. This is the concrete advance over the simultaneous-update literature the abstract contrasts with. The paper does a clean job showing how the change in update order removes the usual divergence and linear-regret problems without forcing the step size to zero. That is useful for anyone who wants constant-step methods that still have no-regret guarantees in game settings. The cycling claim in adversarial cases adds a picture of long-run behavior that simultaneous analyses often lack. The soft spots are modest but real. The strength of the “regardless of opponent” guarantee rests on whatever payoff structure and continuity assumptions are used in the proofs; if those are narrow (bilinear zero-sum, for example) the result is narrower than the abstract phrasing suggests. The cycling bound also needs to be checked for dependence on step size or initial conditions that might not be fully general. No obvious circularity or invented quantities appear. This is for readers working on online learning in games and regret bounds under adversarial play. Someone already following the simultaneous GD literature would get immediate value from seeing what alternation buys. It deserves a serious referee because the central claim is stated sharply, the contrast with prior work is clear, and the idea is simple enough to check once the derivations are in hand.

Referee Report

2 major / 2 minor

Summary. The paper claims that replacing simultaneous updates with strict alternation in gradient descent-ascent for two-player games yields finite (bounded) regret for a gradient-descent player against an arbitrary opponent, using any fixed step size. It further claims that when both players use alternating gradient descent-ascent, the joint strategy trajectory remains bounded and enters a cycle.

Significance. If the claims hold, the result is significant: it supplies a minimal structural change (alternation) that converts the well-known divergence and linear regret of simultaneous GDA into bounded regret and cycling, without requiring step-size decay or other modifications. This would be directly useful for online learning in games and could influence practical implementations of adversarial training.

major comments (2)

[§3] The central bounded-regret claim (abstract and §3) is stated for arbitrary opponent updates, yet the proof sketch appears to rely on the opponent’s update occurring only after the GD player has moved; the manuscript should explicitly state the information structure (perfect vs. delayed observation of the opponent’s last move) and verify that the regret bound remains finite under the weaker assumption.
[Theorem 1] Theorem 1 (or equivalent) asserts bounded regret independent of the opponent’s algorithm, but the derivation must be checked against the case in which the opponent also uses a fixed-step gradient step; if the two players’ step sizes differ by an arbitrary factor, the claimed bound may grow with that factor and should be stated explicitly.

minor comments (2)

Notation for the alternating schedule (who moves first, how ties are broken) is introduced only informally; a short pseudocode block or explicit timing diagram would remove ambiguity.
The cycling claim in the two-player case is illustrated only for a single low-dimensional example; a brief statement of the period or the invariant set would help readers assess generality.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments. Below we respond point by point to the major comments.

read point-by-point responses

Referee: [§3] The central bounded-regret claim (abstract and §3) is stated for arbitrary opponent updates, yet the proof sketch appears to rely on the opponent’s update occurring only after the GD player has moved; the manuscript should explicitly state the information structure (perfect vs. delayed observation of the opponent’s last move) and verify that the regret bound remains finite under the weaker assumption.

Authors: We agree that the information structure should be stated explicitly. The alternating update model in the paper assumes sequential moves in which each player observes the opponent's most recent strategy before updating (perfect observation of the last move). The proof in §3 relies on this ordering. We will add a clear statement of this assumption in §3. The weaker delayed-observation model is outside the scope of the claimed result; the paper does not assert the bound under simultaneous information, so no verification for that case is required. revision: yes
Referee: [Theorem 1] Theorem 1 (or equivalent) asserts bounded regret independent of the opponent’s algorithm, but the derivation must be checked against the case in which the opponent also uses a fixed-step gradient step; if the two players’ step sizes differ by an arbitrary factor, the claimed bound may grow with that factor and should be stated explicitly.

Authors: The theorem asserts that regret remains bounded (does not grow linearly with time) for any fixed opponent update rule. When the opponent also employs alternating GDA, the finite bound value does depend on the ratio of the two step sizes. We will revise the theorem statement and add a remark that explicitly notes this dependence and verifies the two-player alternating-GDA case. The central claim of time-independent boundedness is unaffected. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The paper's central claims rest on the explicit structural assumption of strict alternation in updates, presented as the key change that yields bounded regret for gradient descent against arbitrary opponents and cycling behavior when both alternate. No equations or steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations. The alternation is an independent modeling choice, not derived from the target regret bounds. The argument is internally consistent without circular reduction, consistent with the reader's assessment of score 2.0.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The result rests on standard domain assumptions of two-player adversarial (likely zero-sum) games and the alternating update protocol; no free parameters or invented entities are visible in the abstract.

axioms (2)

domain assumption Players update in strict alternation rather than simultaneously
This is the core modeling choice that enables the finite-regret claim.
domain assumption The setting is adversarial / two-player zero-sum
Referenced by the phrase 'adversarial settings'.

pith-pipeline@v0.9.0 · 5656 in / 1057 out tokens · 20333 ms · 2026-05-24T23:49:32.136996+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArrowOfTime.lean; Cost/FunctionalEquation.lean; Foundation/RealityFromDistinction.lean reality_from_one_distinction; Jcost functional uniqueness; Hamiltonian energy conservation + Poincaré recurrence from volume + bounded orbits matches

?

matches
MATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.

Lemma 7: If both agents use (AltGD), ... ||xt1||²/η1 + ||xt2||²/η2 + <xt1,A xt2> = constant (energy conservation). Thm 8-9: bounded orbits iff √(η1 η2) ≤ 2/||A||. Cor 6: Poincaré recurrence.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean; Cost modules J(x) = ½(x + x⁻¹) - 1 forces reciprocal cost and symplectic structure echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

AltGD obtained via Verlet/leapfrog symplectic integration of ContinuousGD; preserves volume and approximately conserves energy for all time (unlike Euler/SimGD).

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.