Last-Iterate Convergence of Anchored Gradient Descent

Weiqiang Zheng; Yang Cai

arxiv: 2604.12235 · v1 · submitted 2026-04-14 · 🧮 math.OC

Last-Iterate Convergence of Anchored Gradient Descent

Yang Cai , Weiqiang Zheng This is my paper

Pith reviewed 2026-05-10 16:17 UTC · model grok-4.3

classification 🧮 math.OC

keywords monotone inclusionanchored gradient descentlast-iterate convergencetangent residualvariational inequalitiessaddle-point problems

0 comments

The pith

Anchored gradient descent achieves last-iterate convergence at rate O(1/sqrt(T)) for the monotone inclusion 0 in F(z) + A(z).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends anchored gradient descent, previously limited to unconstrained problems, to the monotone inclusion framework 0 in F(z) + A(z) where F is monotone and Lipschitz while A is maximally monotone. This setting covers constrained variational inequalities and regularized saddle-point problems. The authors prove that the anchored method alone produces last-iterate convergence measured by the tangent residual at rate O(1/sqrt(T)). A reader would care because vanilla gradient descent is known to diverge on the same problems, so the result isolates anchoring as a sufficient stabilizing mechanism.

Core claim

We extend the anchored gradient descent method to the monotone inclusion problem 0 in F(z) + A(z) and prove a last-iterate convergence rate of O(1/sqrt(T)) in terms of the tangent residual. The proof builds on the recent unconstrained analysis and incorporates operator tools to handle the general case that includes constraints and regularization.

What carries the argument

Anchored gradient descent, the iterative update that incorporates a fixed anchor point alongside the monotone operator to drive the sequence toward solutions of the inclusion.

If this is right

Anchored gradient descent converges last-iterate on constrained monotone variational inequalities.
The same rate holds for convex-concave saddle-point problems that include constraints or regularization.
Anchoring by itself suffices for last-iterate convergence even though vanilla gradient descent diverges.
The result applies to any maximally monotone A paired with a monotone Lipschitz F.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method might serve as a simpler alternative to optimism-based schemes like extragradient in settings where implementation cost matters.
Similar anchoring could be tested on non-monotone or stochastic variants of the inclusion problem.
If the rate is tight, it would match known lower bounds for last-iterate convergence in monotone problems.

Load-bearing premise

The proof techniques from the unconstrained setting transfer directly to the general inclusion problem without new technical obstacles.

What would settle it

A counterexample monotone inclusion where the tangent residual of the anchored iterates stays bounded away from zero or decays slower than O(1/sqrt(T)) would falsify the claimed convergence.

Figures

Figures reproduced from arXiv: 2604.12235 by Weiqiang Zheng, Yang Cai.

read the original abstract

We study the monotone inclusion problem $0\in F(z)+A(z)$, where $F$ is monotone and Lipschitz, and $A$ is maximally monotone, a framework that encompasses monotone variational inequalities and convex-concave saddle-point problems with constraints or regularization. It is well known that vanilla gradient descent diverges for this problem, whereas optimism-based methods such as Extragradient and accelerated methods that combine both optimism and anchoring, such as Extra Anchored Gradient, achieve last-iterate convergence. However, the anchoring-only method, anchored gradient descent, has been studied only in the unconstrained setting [RYY19, SST+26]. In this note, we extend the anchored gradient descent method to the monotone inclusion problem and prove a last-iterate convergence rate of $O(1/\sqrt{T})$ in terms of the tangent residual. We build on the recent proof in the unconstrained setting [SST+26] and use techniques from [COZ24] to extend it to the general inclusion setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Anchored gradient descent extends to monotone inclusions with last-iterate rates, but the resolvent handling needs close inspection.

read the letter

Anchored gradient descent now has a last-iterate O(1/√T) guarantee for the monotone inclusion 0 ∈ F(z) + A(z). That is the central new result. The paper adapts the unconstrained analysis to the setting with a maximally monotone operator A by incorporating the resolvent step. This is useful because it covers constrained variational inequalities and regularized problems without needing optimism or acceleration. The authors credit the prior work on anchoring and use operator techniques to make the extension. The focus on the tangent residual as the convergence measure fits the inclusion framework well. One soft spot is the reliance on transferring the proof. The resolvent of A is a nonlinear operation, so the inner-product arguments or potential functions from the unconstrained case may require additional bounds to control the error. The abstract mentions using techniques from [COZ24], but the strength of the result depends on whether those techniques fully absorb the extra terms without degrading the rate or adding assumptions. If the full paper shows explicit inequalities that close this gap, the claim holds; if not, it could be a minor technical hole. This work is for researchers in optimization and monotone operators who want simpler last-iterate methods for saddle-point and variational inequality problems. A reader who knows the recent anchored GD papers will see the value immediately. The citation pattern looks appropriate, building on the right prior results without circularity. I would bring this to a reading group to go over the proof steps in detail. I would not cite it in my own papers until I verify the resolvent estimates. It deserves peer review because it addresses a documented gap with a straightforward method.

Referee Report

2 major / 2 minor

Summary. The paper extends anchored gradient descent to the monotone inclusion problem 0 ∈ F(z) + A(z), where F is monotone and L-Lipschitz and A is maximally monotone. It claims a last-iterate O(1/√T) convergence rate on the tangent residual by adapting the unconstrained Lyapunov analysis of [SST+26] together with operator techniques from [COZ24].

Significance. If the extension is fully rigorous, the result would establish that anchoring alone suffices for last-iterate convergence in the general inclusion setting (covering constrained VIs and regularized saddle-point problems), complementing optimism-based methods. The manuscript correctly credits the source papers and avoids parameter fitting.

major comments (2)

[§3] §3, update (3) and the subsequent inner-product estimate: the resolvent step J_ηA introduces a non-expansive but nonlinear perturbation whose interaction with the anchoring term is not automatically controlled by the unconstrained estimates in [SST+26]. The manuscript must derive an explicit inequality showing that the additional error term remains O(1/√T) without degrading the rate or requiring extra assumptions on A.
[Theorem 1] Theorem 1 (main rate): the proof sketch invokes the tangent-residual bound from [COZ24] but does not verify that the composite map (forward step on F followed by resolvent of A) preserves the exact Lyapunov decrease used in the unconstrained case. A concrete expansion of the inner-product term involving (z_{t+1} - z_t) and the anchoring difference is needed.

minor comments (2)

[§2] The definition of the tangent residual (Eq. (2)) should include a brief comparison to the standard VI gap function to aid readers from the variational-inequality literature.
[§2, §4] Notation for the step-size η and the anchoring parameter should be introduced once in §2 and used consistently; a small number of inconsistent subscripts appear in the displayed equations of §4.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive comments. We appreciate the opportunity to clarify and strengthen the proof details. Below we address each major comment.

read point-by-point responses

Referee: [§3] §3, update (3) and the subsequent inner-product estimate: the resolvent step J_ηA introduces a non-expansive but nonlinear perturbation whose interaction with the anchoring term is not automatically controlled by the unconstrained estimates in [SST+26]. The manuscript must derive an explicit inequality showing that the additional error term remains O(1/√T) without degrading the rate or requiring extra assumptions on A.

Authors: We agree that an explicit derivation is necessary to rigorously control the perturbation from the resolvent step. In the revised manuscript, we will update equation (3) and the following inner-product estimate by decomposing the terms involving the resolvent J_ηA. Leveraging the non-expansiveness of the resolvent and the monotonicity of A, combined with the Lipschitz property of F, we will show that the additional error term is bounded by a quantity that does not degrade the O(1/√T) rate. This will be done without imposing extra assumptions on A beyond maximal monotonicity. revision: yes
Referee: [Theorem 1] Theorem 1 (main rate): the proof sketch invokes the tangent-residual bound from [COZ24] but does not verify that the composite map (forward step on F followed by resolvent of A) preserves the exact Lyapunov decrease used in the unconstrained case. A concrete expansion of the inner-product term involving (z_{t+1} - z_t) and the anchoring difference is needed.

Authors: We acknowledge that the proof sketch in Theorem 1 would benefit from a more detailed verification of the Lyapunov decrease for the composite map. We will include a concrete expansion of the inner-product term (z_{t+1} - z_t) dotted with the anchoring difference in the revised proof. This expansion will demonstrate that the decrease property carries over from the unconstrained setting [SST+26], using the tangent residual bound from [COZ24] to handle the inclusion. The anchoring term's contribution will be shown to remain dominant, preserving the convergence rate. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on independent external citations with no author overlap or self-referential reduction

full rationale

The paper extends anchored gradient descent to the monotone inclusion 0 ∈ F(z) + A(z) and claims an O(1/√T) last-iterate rate on the tangent residual. It explicitly states that the proof builds on the unconstrained analysis from [SST+26] together with operator tools from [COZ24]. These cited works have no author overlap with the present authors (Cai, Zheng). No equation in the provided abstract or description defines the claimed rate in terms of a fitted parameter, renames a known result, or reduces the central bound to a self-citation chain whose load-bearing step is unverified within the paper. The derivation is therefore not equivalent to its inputs by construction, and the extension is presented as a transfer of independent prior techniques rather than a self-contained re-derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions of monotone operator theory plus the transferability of proof techniques from two cited papers. No new entities are postulated and no parameters are fitted to data.

axioms (2)

domain assumption F is monotone and L-Lipschitz continuous
Standard assumption stated in the abstract for the monotone inclusion problem.
domain assumption A is maximally monotone
Defines the inclusion problem 0 ∈ F(z) + A(z) and is required for the operator-theoretic setting.

pith-pipeline@v0.9.0 · 5463 in / 1276 out tokens · 72054 ms · 2026-05-10T16:17:43.578094+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Advancing Mathematics Research with AI-Driven Formal Proof Search
cs.AI 2026-05 unverdicted novelty 7.0

LLM-based agents in Lean solved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures at a few hundred dollars each.

Reference graph

Works this paper leans on

8 extracted references · 8 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

arXiv preprint arXiv:2203.10947 , year=

[BCN22] Radu Ioan Bot, Ern ¨o Robert Csetnek, and Dang-Khoa Nguyen. “Fast OGDA in continuous and discrete time”. In:arXiv preprint arXiv:2203.10947(2022). [BNZ23] Radu Ioan Bot, Dang-Khoa Nguyen, and Chunxiang Zong. “Fast Forward-Backward splitting for monotone inclusions with a convergence rate of the tangent residual ofo(1/k)”. In:arXiv preprint arXiv:2...

work page arXiv 2022
[2]

Accelerated Single-Call Methods for Constrained Min-Max Optimization

[CZ23a] Yang Cai and Weiqiang Zheng. “Accelerated Single-Call Methods for Constrained Min-Max Optimization”. In:International Conference on Learning Representations (ICLR)(2023). [CZ23b] Yang Cai and Weiqiang Zheng. “Doubly optimal no-regret learning in monotone games”. In: International Conference on Machine Learning. PMLR. 2023, pp. 3507–3524. [DP18] Co...

work page 2023
[3]

Extragradient method:O(1/K)last- iterate convergence for monotone variational inequalities and connections with cocoercivity

[GLG22] Eduard Gorbunov, Nicolas Loizou, and Gauthier Gidel. “Extragradient method:O(1/K)last- iterate convergence for monotone variational inequalities and connections with cocoercivity”. In:International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR. 2022, pp. 366–402. [GPD+20] Noah Golowich, Sarath Pattathil, Constantinos Daskala...

work page 2022
[4]

Tight last-iterate convergence rates for no-regret learning in multi-player games

[GPD20] Noah Golowich, Sarath Pattathil, and Constantinos Daskalakis. “Tight last-iterate convergence rates for no-regret learning in multi-player games”. In:Advances in neural information pro- cessing systems (NeurIPS)(2020). 10 [GTG22] Eduard Gorbunov, Adrien Taylor, and Gauthier Gidel. “Last-Iterate Convergence of Optimistic Gradient Method for Monoton...

work page 2020
[5]

Fixed points of nonexpanding maps

[Hal67] Benjamin Halpern. “Fixed points of nonexpanding maps”. In:Bulletin of the American Math- ematical Society73.6 (1967), pp. 957–961. [KG22] Dmitry Kovalev and Alexander Gasnikov. “The first optimal algorithm for smooth and strongly- convex-strongly-concave minimax optimization”. In:Advances in Neural Information Process- ing Systems35 (2022), pp. 14...

work page 1967
[6]

Proximal algorithms

[PB14] Neal Parikh and Stephen Boyd. “Proximal algorithms”. In:Foundations and trends® in Opti- mization1.3 (2014), pp. 127–239. [Pop80] Leonid Denisovich Popov. “A modification of the Arrow-Hurwicz method for search of saddle points”. In:Mathematical notes of the Academy of Sciences of the USSR28.5 (1980), pp. 845–

work page 2014
[7]

Ryu, Kun Yuan, and Wotao Yin

[RYY19] Ernest K Ryu, Kun Yuan, and Wotao Yin. “Ode analysis of stochastic gradient methods with optimism and anchoring for minimax problems”. In:arXiv preprint arXiv:1905.10899(2019). [SST+26] Anja Surina, Arun Suggala, George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Fran- cisco J. R. Ruiz, Pushmeet Kohli, and Swarat Chaudhuri.An Improved Last-Iter...

work page arXiv 1905
[8]

An Improved Last-Iterate Convergence Rate for Anchored Gradient Descent Ascent

arXiv:2604.03782 [math.OC]. [TN26] Quoc Tran-Dinh and Nghia Nguyen-Trung. “Revisiting Extragradient-Type Methods: Part 1—Gen- eralizations and Sublinear Convergence Rates: Q. Tran-Dinh, N. Nguyen-Trung”. In:Compu- tational Optimization and Applications(2026), pp. 1–70. [Tra24] Quoc Tran-Dinh. “From Halpern’s fixed-point iterations to Nesterov’s accelerate...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[1] [1]

arXiv preprint arXiv:2203.10947 , year=

[BCN22] Radu Ioan Bot, Ern ¨o Robert Csetnek, and Dang-Khoa Nguyen. “Fast OGDA in continuous and discrete time”. In:arXiv preprint arXiv:2203.10947(2022). [BNZ23] Radu Ioan Bot, Dang-Khoa Nguyen, and Chunxiang Zong. “Fast Forward-Backward splitting for monotone inclusions with a convergence rate of the tangent residual ofo(1/k)”. In:arXiv preprint arXiv:2...

work page arXiv 2022

[2] [2]

Accelerated Single-Call Methods for Constrained Min-Max Optimization

[CZ23a] Yang Cai and Weiqiang Zheng. “Accelerated Single-Call Methods for Constrained Min-Max Optimization”. In:International Conference on Learning Representations (ICLR)(2023). [CZ23b] Yang Cai and Weiqiang Zheng. “Doubly optimal no-regret learning in monotone games”. In: International Conference on Machine Learning. PMLR. 2023, pp. 3507–3524. [DP18] Co...

work page 2023

[3] [3]

Extragradient method:O(1/K)last- iterate convergence for monotone variational inequalities and connections with cocoercivity

[GLG22] Eduard Gorbunov, Nicolas Loizou, and Gauthier Gidel. “Extragradient method:O(1/K)last- iterate convergence for monotone variational inequalities and connections with cocoercivity”. In:International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR. 2022, pp. 366–402. [GPD+20] Noah Golowich, Sarath Pattathil, Constantinos Daskala...

work page 2022

[4] [4]

Tight last-iterate convergence rates for no-regret learning in multi-player games

[GPD20] Noah Golowich, Sarath Pattathil, and Constantinos Daskalakis. “Tight last-iterate convergence rates for no-regret learning in multi-player games”. In:Advances in neural information pro- cessing systems (NeurIPS)(2020). 10 [GTG22] Eduard Gorbunov, Adrien Taylor, and Gauthier Gidel. “Last-Iterate Convergence of Optimistic Gradient Method for Monoton...

work page 2020

[5] [5]

Fixed points of nonexpanding maps

[Hal67] Benjamin Halpern. “Fixed points of nonexpanding maps”. In:Bulletin of the American Math- ematical Society73.6 (1967), pp. 957–961. [KG22] Dmitry Kovalev and Alexander Gasnikov. “The first optimal algorithm for smooth and strongly- convex-strongly-concave minimax optimization”. In:Advances in Neural Information Process- ing Systems35 (2022), pp. 14...

work page 1967

[6] [6]

Proximal algorithms

[PB14] Neal Parikh and Stephen Boyd. “Proximal algorithms”. In:Foundations and trends® in Opti- mization1.3 (2014), pp. 127–239. [Pop80] Leonid Denisovich Popov. “A modification of the Arrow-Hurwicz method for search of saddle points”. In:Mathematical notes of the Academy of Sciences of the USSR28.5 (1980), pp. 845–

work page 2014

[7] [7]

Ryu, Kun Yuan, and Wotao Yin

[RYY19] Ernest K Ryu, Kun Yuan, and Wotao Yin. “Ode analysis of stochastic gradient methods with optimism and anchoring for minimax problems”. In:arXiv preprint arXiv:1905.10899(2019). [SST+26] Anja Surina, Arun Suggala, George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Fran- cisco J. R. Ruiz, Pushmeet Kohli, and Swarat Chaudhuri.An Improved Last-Iter...

work page arXiv 1905

[8] [8]

An Improved Last-Iterate Convergence Rate for Anchored Gradient Descent Ascent

arXiv:2604.03782 [math.OC]. [TN26] Quoc Tran-Dinh and Nghia Nguyen-Trung. “Revisiting Extragradient-Type Methods: Part 1—Gen- eralizations and Sublinear Convergence Rates: Q. Tran-Dinh, N. Nguyen-Trung”. In:Compu- tational Optimization and Applications(2026), pp. 1–70. [Tra24] Quoc Tran-Dinh. “From Halpern’s fixed-point iterations to Nesterov’s accelerate...

work page internal anchor Pith review Pith/arXiv arXiv 2026