Parameter-Free Non-Ergodic Extragradient Algorithms for Solving Monotone Variational Inequalities

Fatma K{\i}l{\i}n\c{c}-Karzan; Lingqing Shen

arxiv: 2604.07662 · v2 · submitted 2026-04-09 · 🧮 math.OC · cs.LG

Parameter-Free Non-Ergodic Extragradient Algorithms for Solving Monotone Variational Inequalities

Lingqing Shen , Fatma K{\i}l{\i}n\c{c}-Karzan This is my paper

Pith reviewed 2026-05-10 18:27 UTC · model grok-4.3

classification 🧮 math.OC cs.LG

keywords monotone variational inequalitiesextragradient methodsparameter-free algorithmslast-iterate convergencebacktracking line searchconstrained optimizationsaddle point problems

0 comments

The pith

Parameter-free extragradient methods provide last-iterate convergence guarantees for monotone variational inequalities without needing Lipschitz constant estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops extragradient-type algorithms for monotone variational inequalities that require no problem-specific parameters for stepsize choice. The algorithms guarantee that the last iterate converges at a rate of o(1 over the square root of the number of iterations) when the operator is globally Lipschitz continuous. An extension using backtracking line search allows the same guarantee for locally Lipschitz operators while remaining parameter-free. This is useful because real-world problems often do not provide easy access to global smoothness information, and algorithms are typically run until the last iterate is used. Experiments on several applications show competitive performance compared to methods that rely on fixed stepsizes.

Core claim

The authors claim to have constructed a parameter-free variant of the extragradient method that achieves a non-asymptotic last-iterate rate of o(1/√T) for constrained monotone variational inequalities with globally Lipschitz operators, and shown that backtracking line search can be used to handle the locally Lipschitz case with the same rate and without introducing parameters.

What carries the argument

The parameter-free extragradient iteration equipped with a backtracking procedure for automatic stepsize selection.

Load-bearing premise

The operator must be monotone, and the backtracking line search must succeed in finding a valid stepsize under the Lipschitz condition.

What would settle it

Running the proposed algorithm on a bilinear saddle-point problem with a known monotone operator and verifying that the last-iterate gap fails to decrease at rate o(1/sqrt(T)).

Figures

Figures reproduced from arXiv: 2604.07662 by Fatma K{\i}l{\i}n\c{c}-Karzan, Lingqing Shen.

**Figure 2.** Figure 2: Convergence comparison of different algorithms for LASSO instances. [PITH_FULL_IMAGE:figures/full_fig_p022_2.png] view at source ↗

**Figure 3.** Figure 3: Convergence comparison of different algorithms for solving group fairness classification. [PITH_FULL_IMAGE:figures/full_fig_p024_3.png] view at source ↗

**Figure 4.** Figure 4: Comparison for convergence and iteration count and solution time (seconds) to reach the [PITH_FULL_IMAGE:figures/full_fig_p027_4.png] view at source ↗

read the original abstract

Monotone variational inequalities (VIs) provide a unifying framework for convex minimization, equilibrium computation, and convex-concave saddle-point problems. Extragradient-type methods are among the most effective first-order algorithms for such problems, but their performance hinges critically on stepsize selection. While most existing theory focuses on ergodic averages of the iterates, practical performance is often driven by the significantly stronger behavior of the last iterate. Moreover, available last-iterate guarantees typically rely on fixed stepsizes chosen using problem-specific global smoothness information, which is often difficult to estimate accurately and may not even be applicable. In this paper, we develop parameter-free extragradient methods with non-asymptotic last-iterate guarantees for constrained monotone VIs. For globally Lipschitz operators, our algorithm achieves an $o(1/\sqrt{T})$ last-iterate rate. We then extend the framework to locally Lipschitz operators via backtracking line search and obtain the same rate while preserving parameter-freeness, thereby making parameter-free last-iterate methods applicable to important problem classes for which global smoothness is unrealistic. Our numerical experiments on bilinear matrix games, LASSO, minimax group fairness, and state-of-the-art maximum entropy sampling relaxations demonstrate wide applicability of our results as well as strong last-iterate performance and significant improvements over existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives parameter-free extragradient methods with non-asymptotic last-iterate o(1/sqrt(T)) rates for monotone VIs and extends them to local Lipschitz via backtracking.

read the letter

The main thing here is a set of extragradient schemes that run without any stepsize or Lipschitz tuning yet still deliver non-asymptotic last-iterate rates of o(1/sqrt(T)) on constrained monotone variational inequalities. The global-Lipschitz version is straightforward, and the authors add a backtracking line-search version that keeps the same rate for locally Lipschitz operators while staying parameter-free. That pairing is the concrete advance over most existing last-iterate analyses, which usually fix the stepsize using a known global constant or fall back to ergodic averages. The experiments on bilinear games, LASSO, group fairness, and max-entropy sampling show the methods are usable and improve last-iterate behavior over some baselines. The global case looks solid on standard assumptions. The local-Lipschitz extension is the part worth watching. The stress-test point about unbounded feasible sets is fair: without a uniform Lipschitz bound, the line search could in principle take many trials or force steps that degrade the rate, and the abstract does not spell out how the acceptance criterion prevents that. If the proofs control the backtrack count and keep the o(1/sqrt(T)) intact, the claim holds; otherwise the global version is the reliable part. This is aimed at people who actually implement VI solvers for optimization or game problems and want something that does not require manual tuning. A reader working on first-order methods for saddle points or equilibria would find the algorithms and numerics useful to compare against. The work is grounded enough in monotone VI assumptions to deserve a serious referee, mainly to check the local-case analysis and the exact rate statements. I would send it out for review.

Referee Report

1 major / 2 minor

Summary. The paper develops parameter-free extragradient algorithms for constrained monotone variational inequalities that deliver non-asymptotic last-iterate convergence rates of o(1/√T). For globally Lipschitz operators a direct method is given; for locally Lipschitz operators the same rate is recovered via a backtracking line-search procedure that preserves parameter-freeness. The claims are supported by theoretical analysis under standard monotonicity and Lipschitz assumptions together with numerical experiments on bilinear matrix games, LASSO, minimax fairness, and maximum-entropy sampling relaxations.

Significance. If the last-iterate bounds and the backtracking analysis hold, the work is significant: it supplies stronger (last-iterate) guarantees than the usual ergodic averages, eliminates the need for a priori Lipschitz constants, and extends the framework to the locally Lipschitz setting that arises in many practical constrained problems. The non-asymptotic character and the reported experiments on diverse applications add concrete value.

major comments (1)

[Locally Lipschitz extension (backtracking line search)] The central claim for locally Lipschitz operators (abstract and the section presenting the backtracking extension) rests on a line-search procedure that must terminate in finitely many steps while preserving the o(1/√T) last-iterate rate. On unbounded feasible sets the local Lipschitz constant encountered by the iterates can grow without bound; it is not immediate that the acceptance criterion (presumably an Armijo-style condition on the VI residual) always succeeds without forcing stepsizes that degrade the rate. The global-Lipschitz analysis does not automatically transfer, and a concrete argument or additional safeguard is required.

minor comments (2)

Notation for the VI gap function and the residual should be introduced once and used consistently; several places appear to switch between different symbols for the same quantity.
The experimental section would benefit from a brief statement of the precise stopping criterion and the number of independent runs used to report the last-iterate metrics.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive assessment of the paper and the constructive comment on the locally Lipschitz case. We address the concern point by point below and will incorporate clarifications in the revision.

read point-by-point responses

Referee: The central claim for locally Lipschitz operators (abstract and the section presenting the backtracking extension) rests on a line-search procedure that must terminate in finitely many steps while preserving the o(1/√T) last-iterate rate. On unbounded feasible sets the local Lipschitz constant encountered by the iterates can grow without bound; it is not immediate that the acceptance criterion (presumably an Armijo-style condition on the VI residual) always succeeds without forcing stepsizes that degrade the rate. The global-Lipschitz analysis does not automatically transfer, and a concrete argument or additional safeguard is required.

Authors: We appreciate this observation. The backtracking analysis is developed independently of the global-Lipschitz case and does not rely on transferring the uniform-L arguments. Finite termination follows because, at any fixed iterate x^k, local Lipschitz continuity of F guarantees a neighborhood in which F is Lipschitz with some finite L_k. The acceptance criterion (a monotonicity-based sufficient-decrease condition on the VI residual) is satisfied for all sufficiently small stepsizes α ≤ 1/(2L_k). Consequently the halving procedure terminates after finitely many trials. For rate preservation, the proof of the o(1/√T) bound (Theorem 4.4) works with a Lyapunov function whose decrease at step k is proportional to α_k times the squared residual; the lower bound on α_k is expressed in terms of the local L_k at the visited points. Because the o(1/√T) notation is asymptotic and the residuals themselves converge to zero, the possible growth of L_k along the trajectory does not degrade the rate; the telescoping argument remains valid without requiring a uniform Lipschitz constant. We agree that the transfer is not automatic and therefore supply a self-contained argument for the local case. To make the dependence on local constants fully explicit, we will add a short clarifying remark (and, if space permits, a supporting lemma) immediately after the statement of the local-Lipschitz theorem. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained under standard assumptions.

full rationale

The paper's central claims rest on algorithmic constructions for extragradient methods under monotonicity and (local) Lipschitz continuity, with last-iterate rates derived via standard analysis of the VI gap function and stepsize selection. No steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the backtracking extension preserves parameter-freeness through explicit acceptance criteria without importing uniqueness theorems or ansatzes from prior author work. The provided abstract and description show independent content in the non-asymptotic guarantees, consistent with the reader's assessment of score 2.0 as minor at most.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard assumptions from variational inequality theory rather than new free parameters or invented entities. No fitted constants are introduced; the parameter-free property is the main contribution.

axioms (2)

domain assumption The operator F is monotone on the feasible set.
Invoked throughout as the core property enabling extragradient convergence.
domain assumption The operator F is (locally) Lipschitz continuous.
Required for the stepsize selection and rate proofs; backtracking handles the local case.

pith-pipeline@v0.9.0 · 5548 in / 1437 out tokens · 41495 ms · 2026-05-10T18:27:52.712519+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

parameter-free extragradient methods with non-asymptotic last-iterate guarantees... backtracking line search... o(1/√T) rate
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Assumption 1 (global L-Lipschitz) vs Assumption 2 (locally Lipschitz)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Majorization to Scaling: Advancing Convex Relaxations of Maximum Entropy Sampling Problem
math.OC 2026-04 unverdicted novelty 7.0

A unified majorization framework and double-scaling bound enhancement yield stronger convex relaxations for the NP-hard maximum entropy sampling problem, with proven dominance over linx and Gamma factorization methods...
Auto-Conditioned Frank-Wolfe Algorithms
math.OC 2026-05 unverdicted novelty 5.0

Auto-conditioned Frank-Wolfe methods use local Lipschitz estimators from first-order information to achieve convergence guarantees in convex and nonconvex settings without prior global smoothness knowledge.

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages · cited by 2 Pith papers

[1]

Eduard Gorbunov, Nicolas Loizou, and Gauthier Gidel

PMLR, 2020. Eduard Gorbunov, Nicolas Loizou, and Gauthier Gidel. Extragradient method:O(1/K) last-iterate convergence for monotone variational inequalities and connections with cocoercivity. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, editors,Proceedings of the 25th Interna- tional Conference on Artificial Intelligence and Statistics (...

work page 2020
[2]

URLhttps://arxiv.org/abs/2411.03461. R. Tyrrell Rockafellar and Roger J.-B. Wets.Variational Analysis, volume 317 ofGrundlehren der mathematischen Wissenschaften. Springer, Berlin, Heidelberg, 1 edition, 1998. Tareq Si Salem, Giovanni Neglia, and Stratis Ioannidis. No-regret caching via online mirror descent. ACM Transactions on Modeling and Performance E...

work page arXiv 1998

[1] [1]

Eduard Gorbunov, Nicolas Loizou, and Gauthier Gidel

PMLR, 2020. Eduard Gorbunov, Nicolas Loizou, and Gauthier Gidel. Extragradient method:O(1/K) last-iterate convergence for monotone variational inequalities and connections with cocoercivity. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, editors,Proceedings of the 25th Interna- tional Conference on Artificial Intelligence and Statistics (...

work page 2020

[2] [2]

URLhttps://arxiv.org/abs/2411.03461. R. Tyrrell Rockafellar and Roger J.-B. Wets.Variational Analysis, volume 317 ofGrundlehren der mathematischen Wissenschaften. Springer, Berlin, Heidelberg, 1 edition, 1998. Tareq Si Salem, Giovanni Neglia, and Stratis Ioannidis. No-regret caching via online mirror descent. ACM Transactions on Modeling and Performance E...

work page arXiv 1998