pith. the verified trust layer for science. sign in

arxiv: 2604.19065 · v1 · submitted 2026-04-21 · 💻 cs.GT · cs.SY· eess.SY· math.OC· stat.ML

Last-Iterate Guarantees for Learning in Co-coercive Games

Pith reviewed 2026-05-10 02:05 UTC · model grok-4.3

classification 💻 cs.GT cs.SYeess.SYmath.OCstat.ML
keywords co-coercive gameslast-iterate convergencestochastic gradient descentNash equilibrianon-vanishing noisefinite-time boundsgame theory
0
0 comments X p. Extension

The pith

Stochastic gradient descent achieves O(log t / t^{1/3}) last-iterate convergence in co-coercive games under non-vanishing noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that vanilla stochastic gradient descent converges in the last iterate to the set of Nash equilibria for co-coercive games, which generalize strongly monotone games and include cases with multiple equilibria such as certain quadratic games and smooth concave potential games. It works under a noise model where the second moment of the feedback noise scales affinely with the squared norm of the current iterate, without requiring the noise to vanish near equilibrium. This produces a finite-time last-iterate rate of O(log(t)/t^{1/3}) together with almost-sure convergence of the iterates and standard time-average convergence. A reader would care because the noise model matches settings with large or unbounded action spaces where earlier vanishing-noise assumptions do not apply.

Core claim

In co-coercive games, stochastic gradient descent with a decreasing step-size schedule yields last-iterate convergence to the Nash set at rate O(log(t)/t^{1/3}) when the second moment of the noisy gradient feedback is bounded by a constant plus a term linear in the squared norm of the iterate. The same analysis also establishes almost-sure convergence of the sequence to the set of Nash equilibria and time-average convergence to that set.

What carries the argument

The co-coercivity of the game mapping together with the affine noise-second-moment bound, analyzed through a step-size schedule of order t to the minus two-thirds.

Load-bearing premise

The second moment of the noise is at most an affine function of the squared norm of the iterates.

What would settle it

Numerical runs in which the noise variance is forced to grow quadratically with the squared iterate norm and the observed last-iterate distance to equilibrium is checked for violation of the O(log t / t^{1/3}) rate.

read the original abstract

We establish finite-time last-iterate guarantees for vanilla stochastic gradient descent in co-coercive games under noisy feedback. This is a broad class of games that is more general than strongly monotone games, allows for multiple Nash equilibria, and includes examples such as quadratic games with negative semidefinite interaction matrices and potential games with smooth concave potentials. Prior work in this setting has relied on relative noise models, where the noise vanishes as iterates approach equilibrium, an assumption that is often unrealistic in practice. We work instead under a substantially more general noise model in which the second moment of the noise is allowed to scale affinely with the squared norm of the iterates, an assumption natural in learning with unbounded action spaces. Under this model, we prove a last-iterate bound of order $O(\log(t)/t^{1/3})$, the first such bound for co-coercive games under non-vanishing noise. We additionally establish almost sure convergence of the iterates to the set of Nash equilibria and derive time-average convergence guarantees.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proves finite-time last-iterate convergence for vanilla SGD on co-coercive games under a non-vanishing noise model in which the noise second moment scales affinely with ||x_t||^2. It obtains an O(log t / t^{1/3}) last-iterate rate to the Nash set, almost-sure convergence of the iterates, and time-average convergence. The setting generalizes strongly monotone games and includes quadratic games with negative-semidefinite interaction matrices and smooth potential games.

Significance. If the stated bounds are correct, the result supplies the first last-iterate rate for co-coercive games that does not require relative (vanishing) noise. The noise model is natural for unbounded action spaces and the proof technique appears to combine a supermartingale moment bound with a Lyapunov argument that yields the 1/3 exponent. The paper also supplies almost-sure convergence, which is a useful complement to the finite-time guarantee.

major comments (2)
  1. [§4.2 and Theorem 4.3] §4.2, the moment-control argument: the supermartingale bound on E[||x_t||^2] is derived under the affine noise assumption, yet the subsequent last-iterate telescoping in Theorem 4.3 appears to invoke a uniform bound on the second moments to control the noise terms inside the Lyapunov decrease. These two steps must be sequenced explicitly so that the rate derivation does not presuppose the very boundedness it is used to establish.
  2. [Theorem 3.1 and Lemma 4.1] Theorem 3.1 (last-iterate bound): the O(log t / t^{1/3}) rate is stated to hold for any step-size sequence satisfying a specific summability condition, but the proof sketch does not verify that the chosen step sizes simultaneously satisfy the moment-control hypothesis of Lemma 4.1. A single explicit step-size schedule that works for both parts should be exhibited.
minor comments (2)
  1. [§2.2] Notation: the symbol β is used both for the co-coercivity constant and for a step-size parameter in §2.2; a distinct symbol would improve readability.
  2. [Figure 1] Figure 1: the y-axis label on the right-hand plot is missing the log scale indicator that is described in the caption.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major comment below. The concerns raised are valid points about proof presentation and can be resolved by reordering arguments for clarity and exhibiting an explicit step-size schedule; these are expository improvements that do not alter the results.

read point-by-point responses
  1. Referee: [§4.2 and Theorem 4.3] §4.2, the moment-control argument: the supermartingale bound on E[||x_t||^2] is derived under the affine noise assumption, yet the subsequent last-iterate telescoping in Theorem 4.3 appears to invoke a uniform bound on the second moments to control the noise terms inside the Lyapunov decrease. These two steps must be sequenced explicitly so that the rate derivation does not presuppose the very boundedness it is used to establish.

    Authors: We agree that the current presentation could be misread as circular. The supermartingale argument in §4.2 in fact yields a t-independent bound E[||x_t||^2] ≤ M almost surely under the affine noise model (with M depending only on the initial condition, the co-coercivity constant, and the noise coefficients). This bound is then invoked in the Lyapunov telescoping of Theorem 4.3 to absorb the noise terms. To eliminate any ambiguity, we will (i) promote the moment bound to a standalone lemma placed before Theorem 4.3, (ii) explicitly cite it at the start of the proof of Theorem 4.3 (“By the moment-control lemma, E[||x_s||^2] ≤ M for all s ≤ t, which permits the uniform bound …”), and (iii) add a short remark after the lemma stating that the bound is established independently of the subsequent rate analysis. These changes are purely organizational. revision: yes

  2. Referee: [Theorem 3.1 and Lemma 4.1] Theorem 3.1 (last-iterate bound): the O(log t / t^{1/3}) rate is stated to hold for any step-size sequence satisfying a specific summability condition, but the proof sketch does not verify that the chosen step sizes simultaneously satisfy the moment-control hypothesis of Lemma 4.1. A single explicit step-size schedule that works for both parts should be exhibited.

    Authors: We concur that an explicit schedule is useful. Theorem 3.1 requires a step-size sequence satisfying ∑ η_t = ∞ together with ∑ η_t^{3/2} < ∞ (to obtain the log t / t^{1/3} rate). Lemma 4.1 requires in addition that ∑ η_t^2 < ∞ to close the supermartingale argument. The schedule η_t = t^{-2/3} (log(t+2))^{-2} for t ≥ 1 satisfies all three conditions simultaneously: the first sum diverges (integral test yields t^{1/3} growth), while the second and third sums converge (exponents -4/3 and -4/3). We will insert this concrete choice as a corollary to Theorem 3.1, verify that it meets the hypothesis of Lemma 4.1, and include the short calculation in the appendix. The general statement of Theorem 3.1 remains unchanged. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The abstract presents the O(log(t)/t^{1/3}) last-iterate bound as derived from co-coercivity plus the explicit affine noise model (second moment scaling as O(1 + ||x_t||^2)). No equations, definitions, or self-citations in the provided text reduce this bound to a fitted quantity, a self-referential assumption, or a prior result by the same authors. The noise model is stated as an input assumption rather than derived from the rate, and almost-sure convergence is listed as an additional result. The derivation chain therefore remains self-contained against external benchmarks such as prior vanishing-noise analyses; the skeptic concern about moment control is a standard proof obligation but does not manifest as a circular reduction in the given material.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the definition of co-coercivity and the affine noise model; both are domain assumptions rather than derived quantities. No free parameters are fitted and no new entities are introduced.

axioms (2)
  • domain assumption The game is co-coercive
    Core structural assumption that enables the analysis and is broader than strong monotonicity.
  • domain assumption Second moment of noise scales affinely with squared norm of iterates
    The key modeling choice that replaces vanishing-noise assumptions and is natural for unbounded action spaces.

pith-pipeline@v0.9.0 · 5494 in / 1374 out tokens · 57058 ms · 2026-05-10T02:05:17.207053+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references · 3 canonical work pages

  1. [1]

    A control theoretic approach to noncooperative game design,

    T. Alpcan, L. Pavel, and N. Stefanovic, “A control theoretic approach to noncooperative game design,” inDecision and Control, 2009 held jointly with the 2009 28th Chinese Control Conference. CDC/CCC

  2. [2]

    IEEE, 2009, pp

    Proceedings of the 48th IEEE Conference on. IEEE, 2009, pp. 8575–8580

  3. [3]

    Distributed welfare games,

    J. R. Marden and A. Wierman, “Distributed welfare games,”Opera- tions Research, vol. 61, no. 1, pp. 155–168, 2013

  4. [4]

    Geometric convergence of gradient play algorithms for distributed Nash equilibrium seeking,

    T. Tatarenko, W. Shi, and A. Nedi ´c, “Geometric convergence of gradient play algorithms for distributed Nash equilibrium seeking,” IEEE Transactions on Automatic Control, vol. 66, no. 11, pp. 5342– 5353, 2020

  5. [5]

    On gradient-based learning in continuous games,

    E. Mazumdar, L. J. Ratliff, and S. S. Sastry, “On gradient-based learning in continuous games,”SIAM Journal on Mathematics of Data Science, vol. 2, no. 1, pp. 103–131, 2020

  6. [6]

    Learning in games with continuous action sets and unknown payoff functions,

    P. Mertikopoulos and Z. Zhou, “Learning in games with continuous action sets and unknown payoff functions,”Mathematical Program- ming, vol. 173, no. 1, pp. 465–507, 2019

  7. [7]

    Learning to control unknown strongly monotone games,

    S. Chandak, I. Bistritz, and N. Bambos, “Learning to control unknown strongly monotone games,”IEEE Transactions on Control of Network Systems, 2026

  8. [8]

    Existence and uniqueness of equilibrium points for concave n-person games,

    J. B. Rosen, “Existence and uniqueness of equilibrium points for concave n-person games,”Econometrica: Journal of the Econometric Society, pp. 520–534, 1965

  9. [9]

    Robust power management via learning and game design,

    Z. Zhou, P. Mertikopoulos, A. L. Moustakas, N. Bambos, and P. Glynn, “Robust power management via learning and game design,”Opera- tions Research, vol. 69, no. 1, pp. 331–345, 2021

  10. [10]

    Finite-time last- iterate convergence for multi-agent learning in games,

    T. Lin, Z. Zhou, P. Mertikopoulos, and M. Jordan, “Finite-time last- iterate convergence for multi-agent learning in games,” inProceedings of the 37th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, H. D. III and A. Singh, Eds., vol. 119. PMLR, 13–18 Jul 2020, pp. 6161–6171

  11. [11]

    Stochastic gradient descent-ascent and consensus optimization for smooth games: Convergence analysis under expected co-coercivity,

    N. Loizou, H. Berard, G. Gidel, I. Mitliagkas, and S. Lacoste-Julien, “Stochastic gradient descent-ascent and consensus optimization for smooth games: Convergence analysis under expected co-coercivity,” Advances in Neural Information Processing Systems, vol. 34, pp. 19 095–19 108, 2021

  12. [12]

    Convergence rate analysis of several splitting schemes,

    D. Davis and W. Yin, “Convergence rate analysis of several splitting schemes,”Splitting methods in communication, imaging, science, and engineering, pp. 115–163, 2016

  13. [13]

    Convergence rates with inexact non-expansive operators,

    J. Liang, J. Fadili, and G. Peyr ´e, “Convergence rates with inexact non-expansive operators,”Mathematical Programming, vol. 159, pp. 403–434, 2016

  14. [14]

    Bandit learning in concave n-person games,

    M. Bravo, D. Leslie, and P. Mertikopoulos, “Bandit learning in concave n-person games,”Advances in Neural Information Processing Systems, vol. 31, 2018

  15. [15]

    On the rate of convergence of payoff-based algorithms to nash equilibrium in strongly monotone games,

    T. Tatarenko and M. Kamgarpour, “On the rate of convergence of payoff-based algorithms to nash equilibrium in strongly monotone games,”arXiv preprint arXiv:2202.11147, 2022

  16. [16]

    Convergence rate of generalized nash equilibrium learning in strongly monotone games with linear constraints,

    ——, “Convergence rate of generalized nash equilibrium learning in strongly monotone games with linear constraints,”arXiv preprint arXiv:2507.12112, 2025

  17. [17]

    Cycles in adversarial regularized learning,

    P. Mertikopoulos, C. Papadimitriou, and G. Piliouras, “Cycles in adversarial regularized learning,” inProceedings of the twenty-ninth annual ACM-SIAM symposium on discrete algorithms. SIAM, 2018, pp. 2703–2717

  18. [18]

    Last-iterate convergence of optimistic gradient method for monotone variational inequalities,

    E. Gorbunov, A. Taylor, and G. Gidel, “Last-iterate convergence of optimistic gradient method for monotone variational inequalities,” Advances in neural information processing systems, vol. 35, pp. 21 858–21 870, 2022

  19. [19]

    Learning nash equilibria in mono- tone games,

    T. Tatarenko and M. Kamgarpour, “Learning nash equilibria in mono- tone games,” in2019 IEEE 58th Conference on Decision and Control (CDC). IEEE, 2019, pp. 3104–3109

  20. [20]

    Tight last-iterate conver- gence of the extragradient and the optimistic gradient descent-ascent algorithm for constrained monotone variational inequalities,

    Y . Cai, A. Oikonomou, and W. Zheng, “Tight last-iterate conver- gence of the extragradient and the optimistic gradient descent-ascent algorithm for constrained monotone variational inequalities,”arXiv preprint arXiv:2204.09228, 2022

  21. [21]

    Facchinei and J.-S

    F. Facchinei and J.-S. Pang,Finite-dimensional variational inequalities and complementarity problems. Springer Science & Business Media, 2007

  22. [22]

    Nesterovet al.,Lectures on convex optimization

    Y . Nesterovet al.,Lectures on convex optimization. Springer, 2018, vol. 137

  23. [23]

    A convergence theorem for non neg- ative almost supermartingales and some applications,

    H. Robbins and D. Siegmund, “A convergence theorem for non neg- ative almost supermartingales and some applications,” inOptimizing methods in statistics. Elsevier, 1971, pp. 233–257

  24. [24]

    Borkar,Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition, ser

    V . Borkar,Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition, ser. Texts and Readings in Mathematics. Hindustan Book Agency, 2022

  25. [25]

    J.-J. E. Slotine and W. Li,Applied nonlinear control. Prentice hall Englewood Cliffs, NJ, 1991, vol. 199, no. 1. APPENDIXI PROOFS FORTHEOREMS1, 2, 3 A.Proof for Theorem 1 Proof.Let us first analyze solutions of the following ODE. ˙x(t) =v(x(t)).(5) DefineV(x) =∥x−x ∗∥2 for somex ∗ ∈ X ∗. Then we have thatV(x)≥0for allx∈R N d andV(x)→ ∞asx→ ∞. Also, note t...