Convergence Analysis of the Wasserstein Proximal Algorithm beyond Geodesic Convexity

Shuailong Zhu; Xiaohui Chen

arxiv: 2501.14993 · v3 · pith:I74WVEBSnew · submitted 2025-01-25 · 🧮 math.OC · stat.ML

Convergence Analysis of the Wasserstein Proximal Algorithm beyond Geodesic Convexity

Shuailong Zhu , Xiaohui Chen This is my paper

Pith reviewed 2026-05-23 05:07 UTC · model grok-4.3

classification 🧮 math.OC stat.ML

keywords Wasserstein proximal algorithmPolyak-Łojasiewicz inequalityconvergence analysisgeodesic convexitymean-field neural networksWasserstein gradient flowsoptimization on metric spaces

0 comments

The pith

The Wasserstein proximal algorithm achieves linear convergence under a Polyak-Łojasiewicz inequality without needing geodesic convexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes the convergence of the proximal algorithm in Wasserstein space for minimizing functionals on probability measures. It shows that a Wasserstein version of the Polyak-Łojasiewicz inequality is enough for the algorithm to converge linearly and without bias. This removes the need for strong geodesic convexity assumed in earlier work. The result is motivated by mean-field neural network training and is supported by numerical experiments showing faster convergence than noisy gradient descent.

Core claim

Under a natural Wasserstein analog of the Euclidean Polyak-Łojasiewicz inequality, the proximal algorithm achieves an unbiased and linear convergence rate that improves upon existing rates under strong geodesic convexity. The analysis extends to the inexact proximal algorithm for geodesically semiconvex objectives.

What carries the argument

The Wasserstein analog of the Polyak-Łojasiewicz inequality, which bounds the functional value by the squared Wasserstein gradient norm and enables linear convergence of the proximal map iterations.

Load-bearing premise

The objective functional satisfies a natural Wasserstein analog of the Euclidean Polyak-Łojasiewicz inequality.

What would settle it

A concrete functional on the space of probability measures that satisfies the Wasserstein PL inequality but for which the proximal algorithm fails to exhibit linear convergence.

read the original abstract

The proximal algorithm is a powerful tool to minimize nonlinear and nonsmooth functionals in a general metric space. Motivated by the recent progress in studying the training dynamics of the noisy gradient descent algorithm on two-layer neural networks in the mean-field regime, we provide in this paper a simple and self-contained analysis for the convergence of the general-purpose Wasserstein proximal algorithm without assuming geodesic convexity of the objective functional. Under a natural Wasserstein analog of the Euclidean Polyak-{\L}ojasiewicz inequality, we establish that the proximal algorithm achieves an unbiased and linear convergence rate. Our convergence rate improves upon existing rates of the proximal algorithm for solving Wasserstein gradient flows under strong geodesic convexity. We also extend our analysis to the inexact proximal algorithm for geodesically semiconvex objectives. In our numerical experiments, proximal training demonstrates a faster convergence rate than the noisy gradient descent algorithm on mean-field neural networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper proves linear convergence for the Wasserstein proximal algorithm under a PL-type condition without geodesic convexity and claims a better rate than the strongly convex case.

read the letter

The main thing to know is that this work drops the geodesic convexity assumption that earlier papers on Wasserstein proximal methods relied on. It replaces that with a Wasserstein version of the Polyak-Łojasiewicz inequality and derives an unbiased linear rate from it. The rate is stated to improve on what was known under strong geodesic convexity. The proof is presented as self-contained, and the authors also cover the inexact proximal case for geodesically semiconvex objectives. The numerical experiments on mean-field neural networks show the proximal method beating noisy gradient descent in convergence speed, which is a concrete check on the motivating application. That part is useful because it ties the theory back to the training dynamics the abstract mentions. The central assumption is the PL inequality itself. The paper treats it as given rather than deriving verifiable conditions for when it holds in the neural-net setting, so the result stays conditional. Readers will still need to confirm the inequality separately for their functionals. The numerics are limited to a few examples and do not include a direct rate comparison against the strong-convexity baseline, which would have strengthened the improvement claim. Overall the argument is straightforward and avoids obvious circularity. This is for people working on distribution optimization and mean-field limits. It is worth sending to a serious referee because it cleanly relaxes a standard hypothesis with a direct proof and some supporting experiments.

Referee Report

0 major / 1 minor

Summary. The manuscript analyzes convergence of the Wasserstein proximal algorithm for minimizing functionals over probability measures. It establishes an unbiased linear convergence rate under a Wasserstein analog of the Polyak-Łojasiewicz inequality, without assuming geodesic convexity of the objective. The claimed rate improves on prior results that require strong geodesic convexity. The analysis is extended to the inexact proximal algorithm under geodesic semiconvexity, and numerical experiments on mean-field neural network training are included to compare against noisy gradient descent.

Significance. If the central derivation holds, the result meaningfully widens the scope of proximal methods in Wasserstein space by replacing strong geodesic convexity with a PL-type condition that is explicitly invoked and may apply to non-convex problems such as mean-field neural network training. The self-contained nature of the analysis and the explicit improvement claim over the convexity case are positive features.

minor comments (1)

[Abstract] Abstract: the phrase 'unbiased and linear convergence rate' should be defined more precisely (e.g., whether 'unbiased' refers to the absence of an additive error term in the rate or to a property of the proximal map).

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, significance assessment, and recommendation of minor revision. The report contains no enumerated major comments, so we have no specific points requiring point-by-point rebuttal. We remain available to address any minor issues or clarifications the referee or editor may wish to raise.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained from external PL assumption

full rationale

The paper derives linear convergence of the Wasserstein proximal algorithm from an explicitly stated external assumption (the Wasserstein analog of the Polyak-Łojasiewicz inequality) without geodesic convexity. This is a standard implication in optimization analysis: the rate follows directly from the inequality via the proximal map definition, with no self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the central claim to unverified inputs. The improvement over strong geodesic convexity cases is a comparison of rates under different assumptions, not a circular reduction. The manuscript structure and abstract confirm the result is conditional on the stated weakest assumption rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the Wasserstein PL inequality as a domain assumption; no free parameters or invented entities are indicated in the abstract.

axioms (1)

domain assumption The objective functional satisfies a natural Wasserstein analog of the Euclidean Polyak-Łojasiewicz inequality.
This condition is invoked to obtain the linear convergence rate without geodesic convexity.

pith-pipeline@v0.9.0 · 5682 in / 1058 out tokens · 54808 ms · 2026-05-23T05:07:24.804138+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Under a natural Wasserstein analog of the Euclidean Polyak-Łojasiewicz inequality, the proximal algorithm achieves an unbiased and linear convergence rate.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Wasserstein proximal algorithm (2) ... JKO scheme

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Saddle Points Toward Global Minima: A Newton-Type Method on Wasserstein Space
math.OC 2026-05 unverdicted novelty 7.0

Introduces WSFN, a Newton-type method on Wasserstein space that escapes saddle points in polynomial time and achieves linear convergence to global minimizers under benign landscape assumptions.