Sobolev Gradient Ascent for Optimal Transport: Barycenter Optimization and Convergence Analysis

Bohan Zhou; Changbo Zhu; Kaheon Kim; Xiaohui Chen

arxiv: 2505.13660 · v2 · submitted 2025-05-19 · 🧮 math.OC · cs.LG· stat.ML

Sobolev Gradient Ascent for Optimal Transport: Barycenter Optimization and Convergence Analysis

Kaheon Kim , Bohan Zhou , Changbo Zhu , Xiaohui Chen This is my paper

Pith reviewed 2026-05-22 13:50 UTC · model grok-4.3

classification 🧮 math.OC cs.LGstat.ML

keywords Wasserstein barycenterOptimal transportSobolev gradient ascentDual formulationConvergence analysisGrid discretizationConstraint-free optimization

0 comments

The pith

A Sobolev gradient ascent computes Wasserstein barycenters without c-concavity projections on dual potentials.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a constraint-free concave dual formulation for the Wasserstein barycenter and adapts gradient ascent to the Sobolev geometry on regular grids. This produces a simple algorithm whose iterates converge globally to the exact barycenter at the same rate as Euclidean subgradient descent on nonsmooth convex problems. The central simplification is that the usual c-concavity projection step on Kantorovich dual potentials is no longer required for convergence, which removes a major computational bottleneck present in earlier primal and dual solvers. Numerical tests show faster and more accurate results than existing barycenter methods on grid data.

Core claim

The paper introduces a new constraint-free concave dual formulation for the Wasserstein barycenter. Tailoring vanilla dual gradient ascent to the Sobolev geometry yields a scalable Sobolev gradient ascent algorithm that computes exact barycenters for distributions discretized on a regular grid. Global convergence is proved at the classical subgradient rate, and the c-concavity projection operator on the Kantorovich dual potentials is shown to be unnecessary for guaranteeing convergence.

What carries the argument

The Sobolev gradient ascent (SGA) algorithm, which replaces the c-concavity projection with the Sobolev inner product structure while preserving global convergence of the dual ascent iterates.

If this is right

SGA converges globally to the exact barycenter at the same rate as classical subgradient descent for nonsmooth convex functions.
The algorithm avoids the expensive c-concavity projection required by prior dual methods, yielding simpler code and lower per-iteration cost.
The method applies directly to any collection of measures discretized on the same regular grid.
Empirical results show faster and more accurate barycenter computation than existing optimal-transport solvers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The Sobolev replacement for c-concavity may extend to other dual ascent schemes in optimal transport where projection costs dominate.
Similar geometry-driven ascent could simplify computation in related problems such as multimarginal transport or unbalanced barycenters.
Testing the method on non-grid or continuous measures would clarify whether the grid assumption is essential or merely convenient.

Load-bearing premise

Input distributions are discretized over a regular grid and the Sobolev inner product suffices to replace the c-concavity projection while still guaranteeing global convergence.

What would settle it

Run SGA on a regular-grid barycenter instance without any projection step and observe whether the dual potentials fail to produce the correct barycenter or converge slower than the subgradient rate.

read the original abstract

This paper introduces a new constraint-free concave dual formulation for the Wasserstein barycenter. Tailoring the vanilla dual gradient ascent algorithm to the Sobolev geometry, we derive a scalable Sobolev gradient ascent (SGA) algorithm to compute the barycenter for input distributions discretized over a regular grid. Despite the algorithmic simplicity, we provide a global convergence analysis that achieves the same rate as the classical subgradient descent methods for minimizing nonsmooth convex functions in the Euclidean space. A central feature of our SGA algorithm is that the computationally expensive $c$-concavity projection operator enforced on the Kantorovich dual potentials is unnecessary to guarantee convergence, leading to significant algorithmic and theoretical simplifications over all existing primal and dual methods for computing the exact barycenter. Our numerical experiments demonstrate the superior empirical performance of SGA over the existing optimal transport barycenter solvers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's Sobolev gradient ascent for barycenters claims to drop the c-concavity projection while keeping subgradient convergence, but this needs verification that the limit solves the original problem.

read the letter

The colleague should know that the paper introduces a new constraint-free dual formulation for the Wasserstein barycenter problem and uses a tailored Sobolev gradient ascent procedure on regular grids to compute it. The central claim is that this setup allows dropping the expensive c-concavity projection on the Kantorovich potentials without losing the global convergence guarantee at the classical subgradient rate.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a constraint-free concave dual formulation for the Wasserstein barycenter problem. Tailoring gradient ascent to the Sobolev geometry on regular-grid discretizations, it proposes the Sobolev Gradient Ascent (SGA) algorithm and claims global convergence to the exact barycenter at the classical nonsmooth subgradient rate O(1/sqrt(k)), without requiring the c-concavity projection step on Kantorovich potentials that is standard in prior dual methods. Numerical experiments are presented to demonstrate empirical superiority over existing OT barycenter solvers.

Significance. If the central claim holds, the removal of the c-concavity projection while retaining global convergence would constitute a meaningful algorithmic and theoretical simplification for exact barycenter computation on grids. The work would then offer a scalable alternative to both primal and dual OT methods, with the Sobolev structure providing a natural replacement for explicit feasibility enforcement. The empirical results lend support, but the significance hinges on whether the analysis rigorously establishes that the unconstrained Sobolev iterates converge to the true dual optimum.

major comments (2)

[§4] §4 (Global convergence analysis, main theorem on subgradient rate): The proof that SGA attains the O(1/sqrt(k)) rate without c-concavity projection assumes the Sobolev inner product on the regular grid yields ascent directions that remain valid subgradients of the dual functional and that limit points recover the exact barycenter value. However, the discretization of the Sobolev quadratic form does not automatically enforce pointwise c-concavity; an explicit argument is needed showing either that the limit satisfies the c-concavity constraint or that the objective gap to the true Wasserstein barycenter vanishes at the claimed rate.
[Algorithm 1 and discretization section] Algorithm 1 and discretization section: The claim that the Sobolev geometry replaces the projection operator rests on the regular-grid assumption, yet no quantitative bound is given on how grid resolution interacts with the Sobolev metric to control the distance between the discrete dual functional and its continuous counterpart. Without such an error analysis, the global convergence guarantee may apply only to the discrete unconstrained problem rather than the exact continuous barycenter.

minor comments (2)

[Preliminaries] Notation: Define the discrete Sobolev inner product explicitly (including the precise finite-difference or finite-element realization) in the preliminaries so that the gradient computation in SGA is immediately reproducible.
[Numerical experiments] Experiments: Add a table or figure caption entry stating the grid sizes, number of input measures, and exact comparison metrics (e.g., Wasserstein distance to ground-truth barycenter) used for the reported runtime and accuracy plots.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. We address each major comment point by point below, providing clarifications and indicating where revisions will be made to the manuscript.

read point-by-point responses

Referee: §4 (Global convergence analysis, main theorem on subgradient rate): The proof that SGA attains the O(1/sqrt(k)) rate without c-concavity projection assumes the Sobolev inner product on the regular grid yields ascent directions that remain valid subgradients of the dual functional and that limit points recover the exact barycenter value. However, the discretization of the Sobolev quadratic form does not automatically enforce pointwise c-concavity; an explicit argument is needed showing either that the limit satisfies the c-concavity constraint or that the objective gap to the true Wasserstein barycenter vanishes at the claimed rate.

Authors: We appreciate this comment on the convergence analysis. In Section 4, we prove that the iterates generated by SGA converge to a maximizer of the dual functional at the rate O(1/sqrt(k)) using the properties of the Sobolev inner product, which ensures that the directions are valid ascent directions corresponding to subgradients. Regarding c-concavity, since the dual formulation is unconstrained and concave, any limit point that achieves the optimal dual value corresponds to a c-concave potential by the theory of optimal transport. We will revise the manuscript to include an explicit remark after the main theorem explaining why the limit point satisfies the c-concavity condition implicitly through optimality. revision: yes
Referee: Algorithm 1 and discretization section: The claim that the Sobolev geometry replaces the projection operator rests on the regular-grid assumption, yet no quantitative bound is given on how grid resolution interacts with the Sobolev metric to control the distance between the discrete dual functional and its continuous counterpart. Without such an error analysis, the global convergence guarantee may apply only to the discrete unconstrained problem rather than the exact continuous barycenter.

Authors: Thank you for pointing this out. The manuscript is concerned with the exact computation of the Wasserstein barycenter for measures discretized on a fixed regular grid. The Sobolev gradient ascent is defined directly on this discrete setting, and the convergence analysis applies to the discrete dual problem without needing to approximate the continuous case. We will add a clarification in the discretization section and the introduction to emphasize that the guarantees are for the discrete problem, which is the standard setting for grid-based OT computations. A full error analysis for grid refinement is beyond the current scope but could be an interesting direction for future research. revision: partial

Circularity Check

0 steps flagged

No load-bearing circularity; convergence analysis adapts classical subgradient results to Sobolev geometry without reducing to self-fit or self-citation

full rationale

The provided abstract and context describe a new constraint-free dual formulation for the Wasserstein barycenter, followed by a Sobolev gradient ascent algorithm whose global convergence is analyzed at the classical O(1/sqrt(k)) rate for nonsmooth convex minimization. No equations, derivations, or self-citations are exhibited that define the target barycenter value in terms of the algorithm's output, fit a parameter to a subset and rename it a prediction, or import a uniqueness theorem solely from the authors' prior work. The central simplification (omitting c-concavity projection) is presented as a consequence of the Sobolev inner product on the regular grid, with the rate comparison drawn from external Euclidean subgradient theory. This leaves the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard optimal-transport duality and Sobolev-space geometry; no new free parameters, invented entities, or ad-hoc axioms are visible in the abstract.

axioms (2)

domain assumption The proposed dual formulation for the Wasserstein barycenter is concave and constraint-free.
This is the central modeling choice that removes the need for c-concavity projections.
domain assumption Sobolev gradient steps on the dual potentials converge globally at the classical subgradient rate for nonsmooth convex problems.
The convergence analysis is asserted to match Euclidean subgradient descent without additional regularity assumptions beyond grid discretization.

pith-pipeline@v0.9.0 · 5685 in / 1400 out tokens · 36644 ms · 2026-05-22T13:50:53.387119+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

new constraint-free concave dual formulation ... Sobolev gradient ascent (SGA) algorithm ... c-concavity projection operator ... unnecessary ... global convergence rate that matches ... subgradient descent methods
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat.equivNat unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Algorithm 3 SGA ... ∇fi D = (-Δ)^{-1} δDfi ... convergence rate ... M / sqrt(T)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.