RAMPAGE: RAndomized Mid-Point for debiAsed Gradient Extrapolation

Abolfazl Hashemi; Antesh Upadhyay; Behzad Sharif; M. Berk Sahin; Zhankun Luo

arxiv: 2603.22155 · v2 · submitted 2026-03-23 · 💻 cs.LG · math.OC


Zhankun Luo, M. Berk Sahin, Antesh Upadhyay, Behzad Sharif, Abolfazl Hashemi

Pith reviewed 2026-05-15 00:22 UTC · model grok-4.3


keywords: variational inequalities, extragradient, unbiased estimation, variance reduction, antithetic sampling, convergence rates, root finding, convex-concave games


The pith

Randomized mid-point sampling produces unbiased extragradient updates for variational inequalities with O(1/k) convergence

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that classic extragradient can introduce discretization bias on nonlinear vector fields. It replaces the fixed extrapolation step with randomized mid-point sampling to keep updates unbiased. The variance-reduced variant RAMPAGE+ adds antithetic sampling so negative correlation cancels first-order variance terms, turning the method into an unbiased geometric path-integrator. Both achieve O(1/k) rates for root finding under co-coercive, co-hypomonotone, and generalized Lipschitz regimes, plus guarantees for smooth convex-concave games. A reader would care because bias in long iterative solvers can shift solutions away from true equilibria in optimization and game problems.

Core claim

Extragradient may suffer from discretization bias when applied to non-linear vector fields. RAMPAGE resolves this via randomized mid-point sampling to achieve unbiased updates, while RAMPAGE+ leverages antithetic sampling to act as an unbiased geometric path-integrator that completely removes internal first-order terms from the variance, yielding provable O(1/k) convergence for root finding under co-coercive, co-hypomonotone, and generalized Lipschitzness regimes as well as for stochastic and deterministic smooth convex-concave games.

What carries the argument

Randomized mid-point sampling in the extrapolation step, which evaluates the vector field at a stochastic convex combination between the current iterate and its extrapolated point to debias the update.
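A minimal sketch of what such a step could look like, under stated assumptions. The function name `rampage_step`, the uniform draw of the mixing weight `t`, the step size, and the descent sign convention `x - gamma * F(x)` are illustrative choices, not taken from the paper (the rebuttal text further down writes the extrapolated point as x_k + γ F(x_k)). The test field is an elementwise tanh, chosen only because it is a simple nonlinear co-coercive operator with its root at the origin.

```python
import math
import random

def rampage_step(F, x, gamma, rng):
    """One hypothetical randomized mid-point step (an assumed form, not the paper's exact update)."""
    # Extrapolated point, as in classic extragradient (descent sign convention assumed).
    x_extra = [xi - gamma * fi for xi, fi in zip(x, F(x))]
    # Stochastic convex combination between the iterate and its extrapolation.
    t = rng.random()
    z = [(1.0 - t) * xi + t * ei for xi, ei in zip(x, x_extra)]
    # Update from the original iterate using the field evaluated at the sampled point.
    return [xi - gamma * fi for xi, fi in zip(x, F(z))]

def tanh_field(v):
    """Elementwise tanh: gradient of sum(log(cosh(v_i))), so monotone and 1-Lipschitz."""
    return [math.tanh(vi) for vi in v]

rng = random.Random(0)
point = [2.0, -1.0]
for _ in range(200):
    point = rampage_step(tanh_field, point, 0.5, rng)

# Distance to the root at the origin after 200 steps.
distance = math.hypot(point[0], point[1])
```

Setting `t = 1.0` instead of sampling it recovers the classic extragradient step, which makes the two easy to compare side by side.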

If this is right

Unbiased updates prevent accumulation of discretization error over many iterations.
RAMPAGE+ removes first-order variance contributions through negative correlation.
O(1/k) rates hold for root finding in co-coercive, co-hypomonotone, and generalized Lipschitz regimes.
Symmetrically scaled variants extend the approach to constrained variational inequalities.
The methods apply to both stochastic and deterministic smooth convex-concave games with deterministic bounds in several cases.
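The variance claim for RAMPAGE+ in the list above can be illustrated on a scalar toy problem. The sketch below compares a single uniform sample of the field along a segment with the average of an antithetic pair (`t` and `1 - t`). The segment endpoints, the two test fields, and all function names are assumptions made for illustration; the paper's actual estimator may differ.

```python
import math
import random

def single_sample(g, rng):
    """Plain randomized mid-point estimate of the integral of g over [0, 1]."""
    return g(rng.random())

def antithetic_pair(g, rng):
    """Average of g at t and at its mirror image 1 - t (antithetic sampling)."""
    t = rng.random()
    return 0.5 * (g(t) + g(1.0 - t))

def estimator_variance(estimator, g, n_draws, seed):
    """Empirical (population) variance of an estimator over n_draws seeded draws."""
    rng = random.Random(seed)
    values = [estimator(g, rng) for _ in range(n_draws)]
    mean = sum(values) / n_draws
    return sum((v - mean) ** 2 for v in values) / n_draws

# Field values along the segment x + t*d for two scalar fields (illustrative choices).
x, d = 0.3, -0.4

def linear_path(t):
    return 2.0 * (x + t * d)        # linear field F(u) = 2u

def nonlinear_path(t):
    return math.tanh(x + t * d)     # nonlinear field F(u) = tanh(u)

var_single_linear = estimator_variance(single_sample, linear_path, 20000, seed=1)
var_anti_linear = estimator_variance(antithetic_pair, linear_path, 20000, seed=1)
var_single_nonlinear = estimator_variance(single_sample, nonlinear_path, 20000, seed=1)
var_anti_nonlinear = estimator_variance(antithetic_pair, nonlinear_path, 20000, seed=1)
```

For the linear field the antithetic average is constant up to rounding, so its variance vanishes. For the tanh field only curvature contributes, which is why the antithetic variance comes out far smaller than the single-sample variance.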

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The geometric path-integrator view may suggest similar debiasing for other first-order discretization schemes in numerical ODE methods.
Antithetic sampling could combine with momentum or other accelerators to further lower variance in high-dimensional equilibrium problems.
Empirical runs on quadratic and mildly nonlinear monotone operators would directly measure residual bias after many steps.

Load-bearing premise

That randomized mid-point sampling with antithetic variance reduction produces unbiased updates and removes first-order variance terms without introducing new biases under the stated co-coercivity and Lipschitz regimes.

What would settle it

A numerical test or closed-form calculation on a simple nonlinear co-coercive operator showing that the expected RAMPAGE update does not match the continuous vector field or converges to the wrong fixed point.
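A check of this kind could be sketched as follows, assuming the randomized mid-point draws its mixing weight uniformly on [0, 1] and using the descent sign convention for the extrapolation (both are assumptions, not details confirmed by the text above). The operator F(u) = tanh(u) is the gradient of the convex function log(cosh(u)) with a 1-Lipschitz derivative, hence co-coercive, and its average along a segment has a closed form to compare against.

```python
import math
import random

def field(u):
    """F(u) = tanh(u): gradient of log(cosh(u)), monotone and 1-Lipschitz."""
    return math.tanh(u)

def line_integral_average(x, d):
    """Closed form of the integral of tanh(x + t*d) over t in [0, 1]."""
    return (math.log(math.cosh(x + d)) - math.log(math.cosh(x))) / d

x, gamma = 1.5, 1.0
d = -gamma * field(x)             # displacement to the extrapolated point (assumed sign)

# Monte Carlo mean of the field at a uniformly random point on the segment.
rng = random.Random(0)
n_draws = 200_000
mc_mean = sum(field(x + rng.random() * d) for _ in range(n_draws)) / n_draws

exact = line_integral_average(x, d)
endpoint = field(x + d)           # what a fixed extrapolation step evaluates

mc_gap = abs(mc_mean - exact)          # shrinks with n_draws if the mean is unbiased
endpoint_gap = abs(endpoint - exact)   # does not shrink: a systematic offset
```

This only tests the mean of the sampled field value under the assumed sampling scheme. It says nothing about whether the full iteration converges to the right fixed point, which would need a separate long-run experiment.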

Original abstract

A celebrated method for Variational Inequalities (VIs) is Extragradient (EG), which can be viewed as a standard discrete-time integration scheme. With this view in mind, in this paper we show that EG may suffer from discretization bias when applied to non-linear vector fields, conservative or otherwise. To resolve this discretization shortcoming, we introduce RAndomized Mid-Point for debiAsed Gradient Extrapolation (RAMPAGE) and its variance-reduced counterpart, RAMPAGE+, which leverages antithetic sampling. In contrast with EG, both methods are unbiased. Furthermore, leveraging negative correlation, RAMPAGE+ acts as an unbiased, geometric path-integrator that completely removes internal first-order terms from the variance, provably improving upon RAMPAGE. We further demonstrate that both methods enjoy provable $\mathcal{O}(1/k)$ convergence guarantees for a range of problems including root finding under co-coercive, co-hypomonotone, and generalized Lipschitzness regimes. Furthermore, we introduce symmetrically scaled variants to extend our results to constrained VIs. Finally, we provide convergence guarantees of both methods for stochastic and deterministic smooth convex-concave games. Somewhat interestingly, despite being a randomized method, RAMPAGE+ attains purely deterministic bounds for a number of the studied settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

RAMPAGE gives a randomized midpoint fix to make Extragradient unbiased for nonlinear VIs, with an antithetic version that removes first-order variance.


The main thing here is a direct fix for discretization bias in Extragradient by sampling a random midpoint instead of the usual two-step extrapolation. RAMPAGE claims unbiasedness via linearity of expectation, and RAMPAGE+ adds antithetic sampling to cancel the leading variance term, yielding an unbiased geometric integrator. Both get O(1/k) rates for root finding under co-coercive, co-hypomonotone, and generalized Lipschitz regimes, plus extensions to constrained VIs via symmetric scaling and to stochastic/deterministic smooth convex-concave games. The deterministic bounds from a randomized method are the part that catches attention if the analysis holds.

Referee Report

2 major / 3 minor

Summary. The manuscript claims that Extragradient (EG) incurs discretization bias on non-linear vector fields when viewed as a discrete integration scheme for variational inequalities. It introduces RAMPAGE, which employs randomized mid-point sampling to produce unbiased gradient extrapolations, and RAMPAGE+, which augments this with antithetic sampling to cancel first-order variance contributions while remaining unbiased. Both methods are asserted to achieve O(1/k) convergence for root-finding under co-coercive, co-hypomonotone, and generalized Lipschitz regimes, with extensions to symmetrically scaled variants for constrained VIs and to stochastic/deterministic smooth convex-concave games; RAMPAGE+ is further claimed to deliver deterministic bounds despite randomization.

Significance. If the unbiasedness via linearity of expectation and the variance cancellation hold, the work supplies a principled Monte Carlo debiasing technique that strengthens convergence analysis for non-linear VIs beyond standard EG. The deterministic guarantees for a randomized method and the coverage of multiple regimes (including games) represent concrete strengths that could influence practical algorithm design in optimization.

major comments (2)

[§3] §3, unbiasedness argument: the claim that randomized mid-point sampling yields an unbiased estimator of the path integral relies on linearity of expectation independent of F; however, for non-conservative fields the integration path must be explicitly specified to ensure the expectation equals the continuous operator without residual discretization error.
[Theorem 4.1] Theorem 4.1 (O(1/k) rate under co-hypomonotonicity): the proof sketch invokes the variance reduction of RAMPAGE+ to tighten the bound, but the step from negative correlation to complete removal of first-order terms requires an explicit variance expansion (analogous to Eq. (12) for RAMPAGE) to confirm no higher-order residuals remain under the stated Lipschitz regime.

minor comments (3)

[§5] The symmetrically scaled variants for constrained VIs are introduced in §5 but lack a clear statement of how the scaling parameter is selected (e.g., via projection or normalization) and whether it preserves the unbiasedness property.
Notation for the generalized Lipschitzness regime should be aligned with standard references (e.g., explicit constant L vs. local Lipschitz) to avoid ambiguity when comparing to co-coercive assumptions.
[Figure 2] Figure 2 (convergence plots) would benefit from error bars or multiple random seeds to illustrate the variance reduction claimed for RAMPAGE+ over RAMPAGE.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive assessment and constructive comments, which help strengthen the presentation of our unbiased debiasing approach. We address each major comment below and will incorporate the suggested clarifications and expansions in the revised manuscript.


Referee: [§3] §3, unbiasedness argument: the claim that randomized mid-point sampling yields an unbiased estimator of the path integral relies on linearity of expectation independent of F; however, for non-conservative fields the integration path must be explicitly specified to ensure the expectation equals the continuous operator without residual discretization error.

Authors: We agree that the integration path should be stated explicitly for rigor, particularly when the vector field is non-conservative. In the revised manuscript we will define the path as the straight-line segment from the current point x_k to the extrapolated point x_k + γ F(x_k). Using only linearity of expectation, we will prove that the expectation of the randomized midpoint estimator equals the line integral of F along this segment, with no residual discretization bias in the mean. The argument does not rely on path independence or conservativeness and therefore holds for general Lipschitz fields.
revision: yes

Referee: [Theorem 4.1] Theorem 4.1 (O(1/k) rate under co-hypomonotonicity): the proof sketch invokes the variance reduction of RAMPAGE+ to tighten the bound, but the step from negative correlation to complete removal of first-order terms requires an explicit variance expansion (analogous to Eq. (12) for RAMPAGE) to confirm no higher-order residuals remain under the stated Lipschitz regime.

Authors: We appreciate the suggestion to make the variance analysis fully explicit. In the revision we will insert a detailed variance expansion for RAMPAGE+ (modeled on Eq. (12) for RAMPAGE) that isolates the cross term arising from antithetic sampling. Under the generalized Lipschitz assumption this cross term exactly cancels the leading first-order variance contribution, leaving only O(γ²) residuals that are absorbed into the existing O(1/k) bound. The expanded derivation confirms that no uncancelled first-order terms survive, thereby justifying the tightened rate stated in Theorem 4.1.
revision: yes
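
The two points raised in these responses can be written out in a few lines. This is a sketch under assumptions the page does not state: the mixing weight t is uniform on [0, 1], d_k denotes the displacement from x_k to the extrapolated point, and F is twice continuously differentiable along the segment (stronger than the Lipschitz-type regimes named above). The symbols g, d_k, and R are introduced here for illustration and are not the paper's notation.

```latex
\begin{align*}
g(t) &:= F(x_k + t\,d_k), \qquad t \sim \mathrm{Unif}[0,1], \\
\mathbb{E}_t\bigl[g(t)\bigr] &= \int_0^1 F(x_k + t\,d_k)\,dt
  && \text{(mean equals the segment average)}, \\
g(t) &= g(\tfrac12) + (t - \tfrac12)\,g'(\tfrac12) + R(t),
  && \|R(t)\| = O(\|d_k\|^2), \\
\tfrac12\bigl[g(t) + g(1-t)\bigr] &= g(\tfrac12) + \tfrac12\bigl[R(t) + R(1-t)\bigr]
  && \text{(first-order term cancels)}.
\end{align*}
```

The second line uses nothing about F beyond integrability along the segment, so it does not depend on the field being conservative. In the last line the term linear in t - 1/2 appears with opposite signs in g(t) and g(1-t), so the only randomness left in the antithetic average sits in the second-order remainder.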

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper establishes unbiasedness of the RAMPAGE estimator directly from linearity of expectation applied to randomized midpoint sampling of the vector field, a standard Monte Carlo identity that holds independently of the field's linearity or the target result. Variance reduction in RAMPAGE+ follows from explicit cancellation of first-order terms under antithetic sampling, again by direct algebraic expansion without parameter fitting or self-referential definitions. O(1/k) convergence rates are then obtained as standard extensions once unbiasedness is granted, under the stated co-coercivity, co-hypomonotonicity, and Lipschitz assumptions; these steps do not reduce to fitted inputs, self-citations, or ansatzes imported from prior work by the same authors. The argument structure is self-contained against external benchmarks such as classical extragradient analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard domain assumptions from variational inequality theory rather than new free parameters or invented entities.

axioms (2)

domain assumption: Vector field satisfies co-coercivity, co-hypomonotonicity, or generalized Lipschitzness.
Invoked to obtain O(1/k) convergence for root finding.
domain assumption: Smooth convex-concave structure for game settings.
Used for stochastic and deterministic game convergence guarantees.

pith-pipeline@v0.9.0 · 5548 in / 1409 out tokens · 47783 ms · 2026-05-15T00:22:16.083637+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Unified High-Probability Analysis of Stochastic Variance-Reduced Estimation
cs.LG · 2026-05 · unverdicted · novelty 7.0

A unified recursion framework for stochastic variance-reduced estimation yields high-probability bounds and the first Õ(ε^{-3}) oracle complexity for stochastic optimization with expectation constraints.