Parameter-Free Non-Ergodic Extragradient Algorithms for Solving Monotone Variational Inequalities
Pith reviewed 2026-05-10 18:27 UTC · model grok-4.3
The pith
Parameter-free extragradient methods provide last-iterate convergence guarantees for monotone variational inequalities without needing Lipschitz constant estimates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim to have constructed a parameter-free variant of the extragradient method that achieves a non-asymptotic last-iterate rate of o(1/√T) for constrained monotone variational inequalities with globally Lipschitz operators, and shown that backtracking line search can be used to handle the locally Lipschitz case with the same rate and without introducing parameters.
What carries the argument
The parameter-free extragradient iteration equipped with a backtracking procedure for automatic stepsize selection.
Load-bearing premise
The operator must be monotone, and the backtracking line search must succeed in finding a valid stepsize under the Lipschitz condition.
What would settle it
Running the proposed algorithm on a bilinear saddle-point problem with a known monotone operator and verifying that the last-iterate gap fails to decrease at rate o(1/sqrt(T)).
Figures
read the original abstract
Monotone variational inequalities (VIs) provide a unifying framework for convex minimization, equilibrium computation, and convex-concave saddle-point problems. Extragradient-type methods are among the most effective first-order algorithms for such problems, but their performance hinges critically on stepsize selection. While most existing theory focuses on ergodic averages of the iterates, practical performance is often driven by the significantly stronger behavior of the last iterate. Moreover, available last-iterate guarantees typically rely on fixed stepsizes chosen using problem-specific global smoothness information, which is often difficult to estimate accurately and may not even be applicable. In this paper, we develop parameter-free extragradient methods with non-asymptotic last-iterate guarantees for constrained monotone VIs. For globally Lipschitz operators, our algorithm achieves an $o(1/\sqrt{T})$ last-iterate rate. We then extend the framework to locally Lipschitz operators via backtracking line search and obtain the same rate while preserving parameter-freeness, thereby making parameter-free last-iterate methods applicable to important problem classes for which global smoothness is unrealistic. Our numerical experiments on bilinear matrix games, LASSO, minimax group fairness, and state-of-the-art maximum entropy sampling relaxations demonstrate wide applicability of our results as well as strong last-iterate performance and significant improvements over existing methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops parameter-free extragradient algorithms for constrained monotone variational inequalities that deliver non-asymptotic last-iterate convergence rates of o(1/√T). For globally Lipschitz operators a direct method is given; for locally Lipschitz operators the same rate is recovered via a backtracking line-search procedure that preserves parameter-freeness. The claims are supported by theoretical analysis under standard monotonicity and Lipschitz assumptions together with numerical experiments on bilinear matrix games, LASSO, minimax fairness, and maximum-entropy sampling relaxations.
Significance. If the last-iterate bounds and the backtracking analysis hold, the work is significant: it supplies stronger (last-iterate) guarantees than the usual ergodic averages, eliminates the need for a priori Lipschitz constants, and extends the framework to the locally Lipschitz setting that arises in many practical constrained problems. The non-asymptotic character and the reported experiments on diverse applications add concrete value.
major comments (1)
- [Locally Lipschitz extension (backtracking line search)] The central claim for locally Lipschitz operators (abstract and the section presenting the backtracking extension) rests on a line-search procedure that must terminate in finitely many steps while preserving the o(1/√T) last-iterate rate. On unbounded feasible sets the local Lipschitz constant encountered by the iterates can grow without bound; it is not immediate that the acceptance criterion (presumably an Armijo-style condition on the VI residual) always succeeds without forcing stepsizes that degrade the rate. The global-Lipschitz analysis does not automatically transfer, and a concrete argument or additional safeguard is required.
minor comments (2)
- Notation for the VI gap function and the residual should be introduced once and used consistently; several places appear to switch between different symbols for the same quantity.
- The experimental section would benefit from a brief statement of the precise stopping criterion and the number of independent runs used to report the last-iterate metrics.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the paper and the constructive comment on the locally Lipschitz case. We address the concern point by point below and will incorporate clarifications in the revision.
read point-by-point responses
-
Referee: The central claim for locally Lipschitz operators (abstract and the section presenting the backtracking extension) rests on a line-search procedure that must terminate in finitely many steps while preserving the o(1/√T) last-iterate rate. On unbounded feasible sets the local Lipschitz constant encountered by the iterates can grow without bound; it is not immediate that the acceptance criterion (presumably an Armijo-style condition on the VI residual) always succeeds without forcing stepsizes that degrade the rate. The global-Lipschitz analysis does not automatically transfer, and a concrete argument or additional safeguard is required.
Authors: We appreciate this observation. The backtracking analysis is developed independently of the global-Lipschitz case and does not rely on transferring the uniform-L arguments. Finite termination follows because, at any fixed iterate x^k, local Lipschitz continuity of F guarantees a neighborhood in which F is Lipschitz with some finite L_k. The acceptance criterion (a monotonicity-based sufficient-decrease condition on the VI residual) is satisfied for all sufficiently small stepsizes α ≤ 1/(2L_k). Consequently the halving procedure terminates after finitely many trials. For rate preservation, the proof of the o(1/√T) bound (Theorem 4.4) works with a Lyapunov function whose decrease at step k is proportional to α_k times the squared residual; the lower bound on α_k is expressed in terms of the local L_k at the visited points. Because the o(1/√T) notation is asymptotic and the residuals themselves converge to zero, the possible growth of L_k along the trajectory does not degrade the rate; the telescoping argument remains valid without requiring a uniform Lipschitz constant. We agree that the transfer is not automatic and therefore supply a self-contained argument for the local case. To make the dependence on local constants fully explicit, we will add a short clarifying remark (and, if space permits, a supporting lemma) immediately after the statement of the local-Lipschitz theorem. revision: partial
Circularity Check
No significant circularity; derivation is self-contained under standard assumptions.
full rationale
The paper's central claims rest on algorithmic constructions for extragradient methods under monotonicity and (local) Lipschitz continuity, with last-iterate rates derived via standard analysis of the VI gap function and stepsize selection. No steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the backtracking extension preserves parameter-freeness through explicit acceptance criteria without importing uniqueness theorems or ansatzes from prior author work. The provided abstract and description show independent content in the non-asymptotic guarantees, consistent with the reader's assessment of score 2.0 as minor at most.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The operator F is monotone on the feasible set.
- domain assumption The operator F is (locally) Lipschitz continuous.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
parameter-free extragradient methods with non-asymptotic last-iterate guarantees... backtracking line search... o(1/√T) rate
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Assumption 1 (global L-Lipschitz) vs Assumption 2 (locally Lipschitz)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
From Majorization to Scaling: Advancing Convex Relaxations of Maximum Entropy Sampling Problem
A unified majorization framework and double-scaling bound enhancement yield stronger convex relaxations for the NP-hard maximum entropy sampling problem, with proven dominance over linx and Gamma factorization methods...
-
Auto-Conditioned Frank-Wolfe Algorithms
Auto-conditioned Frank-Wolfe methods use local Lipschitz estimators from first-order information to achieve convergence guarantees in convex and nonconvex settings without prior global smoothness knowledge.
Reference graph
Works this paper leans on
-
[1]
Eduard Gorbunov, Nicolas Loizou, and Gauthier Gidel
PMLR, 2020. Eduard Gorbunov, Nicolas Loizou, and Gauthier Gidel. Extragradient method:O(1/K) last-iterate convergence for monotone variational inequalities and connections with cocoercivity. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, editors,Proceedings of the 25th Interna- tional Conference on Artificial Intelligence and Statistics (...
work page 2020
-
[2]
URLhttps://arxiv.org/abs/2411.03461. R. Tyrrell Rockafellar and Roger J.-B. Wets.Variational Analysis, volume 317 ofGrundlehren der mathematischen Wissenschaften. Springer, Berlin, Heidelberg, 1 edition, 1998. Tareq Si Salem, Giovanni Neglia, and Stratis Ioannidis. No-regret caching via online mirror descent. ACM Transactions on Modeling and Performance E...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.