Provably convergent stochastic fixed-point algorithm for free-support Wasserstein barycenter of continuous non-parametric measures
Pith reviewed 2026-05-19 13:24 UTC · model grok-4.3
The pith
A stochastic fixed-point iteration converges almost surely to the 2-Wasserstein barycenter of continuous non-parametric measures when optimal transport map errors remain controlled.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop an estimator-based stochastic fixed-point framework for approximately computing the 2-Wasserstein barycenter of continuous, non-parametric probability measures. We establish almost sure convergence of the iterates and identify sufficient conditions for geometric rates of convergence under controlled errors in optimal transport map estimation. The framework is realized by a concrete algorithm that employs a modified entropic OT map estimator and applies to input measures satisfying Caffarelli-type regularity conditions.
What carries the argument
Stochastic fixed-point iteration that replaces exact optimal transport maps with a modified entropic estimator whose approximation error is controlled at each step.
If this is right
- The algorithm produces a sequence of measures that converges almost surely to the true barycenter.
- Geometric convergence holds once the per-step OT map error is kept below a paper-derived tolerance.
- The method extends to any collection of continuous measures inside the dense Caffarelli-regular subset of the Wasserstein space.
- Benchmark instances with known approximate barycenters can be generated to test estimation accuracy and sampling quality.
Where Pith is reading between the lines
- The same controlled-error argument may apply to other fixed-point schemes that rely on transport maps, such as those arising in distributionally robust optimization.
- Because the regularity set is dense, the algorithm can serve as a practical default for many real-world continuous measures even when exact regularity is not verified.
- The synthetic benchmark procedure offers a template for constructing test suites that compare multiple barycenter methods on identical, non-trivial instances.
Load-bearing premise
The input measures obey Caffarelli-type regularity conditions so that the entropic optimal transport map estimator has controlled error.
What would settle it
Numerical runs on measures that violate Caffarelli regularity in which the stochastic iterates diverge or fail to approach the known barycenter once the allowed estimation error exceeds the derived threshold.
read the original abstract
We develop an estimator-based stochastic fixed-point framework for approximately computing the 2-Wasserstein barycenter of continuous, non-parametric probability measures. Notably, we provide the first rigorous convergence analysis for implementable estimator-based stochastic extensions of the fixed-point iterative scheme proposed by \'Alvarez-Esteban, del Barrio, Cuesta-Albertos, and Matr\'an (2016). In particular, we establish almost sure convergence, and identify sufficient conditions for geometric rates of convergence under controlled errors in optimal transport (OT) map estimation. We subsequently propose a concrete, provably convergent, and computationally tractable stochastic algorithm that accommodates input measures satisfying Caffarelli-type regularity conditions, which form a dense subset of the Wasserstein space. This algorithm leverages a modified entropic OT map estimator to enable efficient and scalable implementation. To facilitate quantitative evaluation, we further propose a novel and efficient procedure for synthetically generating benchmark instances, in which the input measures exhibit non-trivial features and the corresponding barycenters are approximately known. Numerical experiments on both synthetic and real-world datasets demonstrate the strong computational efficiency, estimation accuracy, and sampling flexibility of our approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops an estimator-based stochastic fixed-point framework for computing the 2-Wasserstein barycenter of continuous non-parametric probability measures. It provides the first rigorous almost-sure convergence analysis for implementable stochastic extensions of the 2016 fixed-point scheme of Alvarez-Esteban et al., under controlled errors in optimal transport map estimation, and identifies sufficient conditions for geometric convergence rates. A concrete algorithm is proposed that uses a modified entropic OT map estimator for input measures satisfying Caffarelli-type regularity (a dense subset of the Wasserstein space), together with a synthetic benchmark generation procedure and numerical experiments on synthetic and real data.
Significance. If the central convergence result holds, the work supplies the first provably convergent, computationally tractable stochastic algorithm for free-support Wasserstein barycenters of continuous measures, extending the 2016 deterministic fixed-point iteration to the estimator-based setting while preserving almost-sure convergence. The explicit sufficient conditions for geometric rates and the benchmark-generation procedure are useful additions to the optimal transport and stochastic approximation literature.
major comments (1)
- [Convergence analysis section / main convergence theorem] The almost-sure convergence of the stochastic fixed-point iteration (presumably Theorem 3.1 or the main result in the convergence analysis section) requires that the cumulative OT-map estimation error sequence satisfy a summability condition (e.g., ∑ ε_n < ∞ a.s.) for the invoked stochastic approximation theorem to yield a.s. convergence rather than convergence in probability. The paper invokes Caffarelli regularity to guarantee that the modified entropic OT map estimator has controlled error, yet no explicit quantitative bound is supplied showing that the error (as a function of regularization parameter, sample size, and dimension) decays sufficiently rapidly to meet this summability requirement uniformly over the regularity class. If the error decays only like 1/√n or slower, the a.s. claim may not hold.
minor comments (2)
- [Numerical experiments / benchmark section] The synthetic benchmark generation procedure is described as producing instances with 'approximately known' barycenters; a short paragraph clarifying the validation method (e.g., how the reference barycenter is computed or approximated) would improve reproducibility.
- [Algorithm description] Notation for the modified entropic estimator (regularization parameter, sample sizes) should be introduced consistently in the algorithm description and reused in the error-bound statements.
Simulated Author's Rebuttal
We thank the referee for the careful reading of our manuscript and the constructive comment on the convergence analysis. We address the point below and indicate the revisions we will make.
read point-by-point responses
-
Referee: The almost-sure convergence of the stochastic fixed-point iteration (presumably Theorem 3.1 or the main result in the convergence analysis section) requires that the cumulative OT-map estimation error sequence satisfy a summability condition (e.g., ∑ ε_n < ∞ a.s.) for the invoked stochastic approximation theorem to yield a.s. convergence rather than convergence in probability. The paper invokes Caffarelli regularity to guarantee that the modified entropic OT map estimator has controlled error, yet no explicit quantitative bound is supplied showing that the error (as a function of regularization parameter, sample size, and dimension) decays sufficiently rapidly to meet this summability requirement uniformly over the regularity class. If the error decays only like 1/√n or slower, the a.s. claim may not hold.
Authors: We thank the referee for this precise observation. Theorem 3.1 applies a stochastic approximation result whose almost-sure convergence indeed requires the summability condition ∑ ε_n < ∞ a.s. on the cumulative OT-map estimation errors. The manuscript states that Caffarelli regularity ensures the modified entropic estimator produces controlled errors and that the algorithm selects sequences of regularization parameters and sample sizes to meet the summability requirement. We acknowledge, however, that the current version does not supply explicit quantitative bounds on the estimator error (in terms of regularization, sample size, and dimension) that verify summability uniformly over the regularity class. In the revision we will add a dedicated subsection (and supporting appendix) that derives the convergence rate of the modified entropic OT map estimator under Caffarelli conditions. We will then show that polynomial growth of the per-iteration sample size (e.g., n_k = k^3) combined with a suitable decay of the regularization parameter yields an error sequence satisfying ∑ ε_n < ∞ a.s., thereby confirming the almost-sure convergence claim for the concrete algorithm. revision: yes
Circularity Check
Convergence analysis extends standard stochastic approximation with external 2016 fixed-point reference
full rationale
The derivation chain invokes standard stochastic approximation and fixed-point theory for a.s. convergence under controlled OT-map errors. The 2016 Alvarez-Esteban et al. scheme is treated as external prior work by non-overlapping authors. Caffarelli regularity is invoked only to justify applicability of a modified entropic estimator with bounded error; this is an enabling assumption rather than a self-referential reduction. No equations or steps reduce the claimed convergence to a fitted parameter or self-citation chain. The central result retains independent mathematical content.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Caffarelli-type regularity conditions on the input measures
- standard math Existence and uniqueness of the 2-Wasserstein barycenter for the given measures
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We establish almost sure convergence... under controlled errors in optimal transport (OT) map estimation... Caffarelli-type regularity conditions
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
modified entropic OT map estimator... Sinkhorn’s algorithm
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
M. Agueh and G. Carlier. Barycenters in the Wasserstein space.SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011
work page 2011
-
[2]
J. M. Altschuler and E. Boix-Adser `a. Wasserstein barycenters are NP-hard to compute.SIAM Journal on Mathematics of Data Science, 4(1):179–203, 2022
work page 2022
-
[3]
P. C. ´Alvarez-Esteban, E. del Barrio, J. Cuesta-Albertos, and C. Matr ´an. A fixed-point approach to barycenters in Wasserstein space.Journal of Mathematical Analysis and Applications, 441(2):744–762, 2016
work page 2016
-
[4]
L. Ambrosio, N. Gigli, and G. Savar ´e.Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2008
work page 2008
-
[5]
J. Backhoff, J. Fontbona, G. Rios, and F. Tobar. Stochastic gradient descent for barycenters in Wasserstein space.J. Appl. Probab., 62(1):15–43, 2025
work page 2025
- [6]
-
[7]
V . I. Bogachev.Measure theory. Vol. I, II. Springer-Verlag, Berlin, 2007
work page 2007
-
[8]
L. A. Caffarelli. A localization property of viscosity solutions to the Monge-Amp `ere equation and their strict convexity.Annals of Mathematics, 131(1):129–134, 1990
work page 1990
-
[9]
L. A. Caffarelli. Some regularity properties of solutions of Monge-Amp`ere equation.Communications on Pure and Applied Mathematics, 44(8-9):965–969, 1991
work page 1991
-
[10]
L. A. Caffarelli. The regularity of mappings with a convex potential.Journal of the American Mathemat- ical Society, 5(1):99–104, 1992
work page 1992
-
[11]
L. A. Caffarelli. Boundary regularity of maps with convex potentials–II.Annals of Mathematics, 144(3): 453–496, 1996
work page 1996
-
[12]
T. Campbell and T. Broderick. Bayesian coreset construction via greedy iterative geodesic ascent. In International Conference on Machine Learning, pages 698–706. PMLR, 2018
work page 2018
-
[13]
G. Carlier, K. Eichinger, and A. Kroshnin. Entropic-Wasserstein barycenters: PDE characterization, reg- ularity, and CLT.SIAM Journal on Mathematical Analysis, 53(5):5880–5914, 2021
work page 2021
-
[14]
G. Carlier, A. Delalande, and Q. M ´erigot. Quantitative stability of barycenters in the Wasserstein space. Probab. Theory Related Fields, 188(3-4):1257–1286, 2024
work page 2024
-
[15]
B. Carpenter, A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell. Stan: A probabilistic programming language.Journal of statistical software, 76: 1–32, 2017
work page 2017
- [16]
-
[17]
arXiv preprint arXiv:2407.18163 , volume=
S. Chewi, J. Niles-Weed, and P. Rigollet. Statistical optimal transport.Preprint, arXiv:2407.18163, 2024
-
[18]
L. Chizat. Doubly regularized entropic Wasserstein barycenter.Foundations of Computational Mathe- matics, 2025
work page 2025
- [19]
- [20]
- [21]
- [22]
-
[23]
A. D. D. Craik. Prehistory of Fa `a di Bruno’s formula.The American Mathematical Monthly, 112(2): 119–130, 2005
work page 2005
-
[24]
J. A. Cuesta-Albertos, L. R ¨uschendorf, and A. Tuero-D´ıaz. Optimal coupling of multivariate distributions and stochastic processes.J. Multivariate Anal., 46(2):335–361, 1993
work page 1993
-
[25]
M. Curmei and G. Hall. Shape-constrained regression using sum of squares polynomials.Operations Research, 73(1):543–559, 2023. PROV ABLY CONVERGENT ALGORITHM FOR FREE-SUPPORT W ASSERSTEIN BARYCENTER 65
work page 2023
-
[26]
M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. InAdvances in Neural Information Processing Systems, volume 26, 2013
work page 2013
-
[27]
M. Cuturi and A. Doucet. Fast computation of Wasserstein barycenters. InInternational conference on machine learning, pages 685–693. PMLR, 2014
work page 2014
-
[28]
M. Cuturi and G. Peyr ´e. Semidual regularized optimal transport.SIAM Review, 60(4):941–965, 2018
work page 2018
- [29]
-
[30]
N. Deb, P. Ghosal, and B. Sen. Rates of estimation of optimal transport maps using plug-in estimators via barycentric projections. InAdvances in Neural Information Processing Systems, volume 34, pages 29736–29753, 2021
work page 2021
-
[31]
A. L. Dontchev and R. T. Rockafellar.Implicit functions and solution mappings: A view from variational analysis. Springer Monographs in Mathematics. Springer, Dordrecht, 2009
work page 2009
-
[32]
P. Dvurechensky, A. Gasnikov, and A. Kroshnin. Computational optimal transport: Complexity by accel- erated gradient descent is better than by Sinkhorn’s algorithm. In J. Dy and A. Krause, editors,Proceed- ings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 1367–1376. PMLR, 10–15 Jul 2018
work page 2018
-
[33]
L. C. Evans.Partial Differential Equations, volume 19 ofGraduate Studies in Mathematics. American Mathematical Society, 2nd edition, 2010
work page 2010
-
[34]
J. Fan, A. Taghvaei, and Y . Chen. Scalable computations of Wasserstein barycenter via input convex neural networks. InProceedings of the 38th International Conference on Machine Learning, volume 139, pages 1571–1581. PMLR, 2021
work page 2021
-
[35]
Feydy.Geometric Data Analysis, Beyond Convolutions
J. Feydy.Geometric Data Analysis, Beyond Convolutions. Phd thesis, Universit ´e Paris-Saclay, France, 2020
work page 2020
- [36]
- [37]
-
[38]
R. Flamary, K. Lounici, and A. Ferrari. Concentration bounds for linear Monge mapping estimation and optimal transport domain adaptation.Preprint, arXiv:1905.10155, 2019
-
[39]
R. Flamary, N. Courty, A. Gramfort, M. Z. Alaya, A. Boisbunon, S. Chambon, L. Chapel, A. Corenflos, K. Fatras, N. Fournier, et al. POT: Python optimal transport.Journal of Machine Learning Research, 22 (78):1–8, 2021
work page 2021
-
[40]
N. Fournier and A. Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability theory and related fields, 162(3-4):707–738, 2015
work page 2015
-
[41]
J. Franklin and J. Lorenz. On the scaling of multidimensional matrices.Linear Algebra Appl., 114/115: 717–735, 1989
work page 1989
-
[42]
A. Genevay, M. Cuturi, G. Peyr ´e, and F. Bach. Stochastic optimization for large-scale optimal transport. InAdvances in Neural Information Processing Systems, volume 29, 2016
work page 2016
-
[43]
P. Ghosal and M. Nutz. On the convergence rate of Sinkhorn’s algorithm.Mathematics of Operations Research, 2025
work page 2025
-
[44]
P. Ghosal and B. Sen. Multivariate ranks and quantiles using optimal transport: consistency, rates and nonparametric testing.Ann. Statist., 50(2):1012–1037, 2022
work page 2022
-
[45]
N. Gigli. On H ¨older continuity-in-time of the optimal transport map towards measures along a curve. Proceedings of the Edinburgh Mathematical Society, 54(2):401–409, 2011
work page 2011
-
[46]
A. Gonz ´alez-Sanz, L. De Lara, L. B ´ethune, and J.-M. Loubes. GAN estimation of Lipschitz optimal transport maps.Preprint, arXiv:2202.07965, 2022
-
[47]
J.-C. H ¨utter and P. Rigollet. Minimax estimation of smooth optimal transport maps.The Annals of Statistics, 49(2):1166–1194, 2021
work page 2021
- [48]
-
[49]
L. V . Kantorovich. On a problem of Monge.CR (Doklady) Acad. Sci. URSS (NS), 3:225–226, 1948. 66 Z. CHEN, A. NEUFELD, AND Q. XIANG
work page 1948
-
[50]
H. Karcher. Riemannian center of mass and mollifier smoothing.Communications on Pure and Applied Mathematics, 30(5):509–541, 1977
work page 1977
-
[51]
K. Kim, R. Yao, C. Zhu, and X. Chen. Optimal transport barycenter via nonconvex-concave minimax optimization. InProceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 30879–30899. PMLR, 2025
work page 2025
-
[52]
A. Korotin, V . Egiazarian, L. Li, and E. Burnaev. Wasserstein iterative networks for barycenter estimation. InAdvances in Neural Information Processing Systems, volume 35, pages 15672–15686, 2022
work page 2022
- [53]
-
[54]
T. Le Gouic and J.-M. Loubes. Existence and consistency of Wasserstein barycenters.Probab. Theory Related Fields, 168(3-4):901–917, 2017
work page 2017
-
[55]
L. Li, A. Genevay, M. Yurochkin, and J. M. Solomon. Continuous regularized Wasserstein barycenters. InAdvances in Neural Information Processing Systems, volume 33, pages 17755–17765, 2020
work page 2020
-
[56]
A. Makkuva, A. Taghvaei, S. Oh, and J. Lee. Optimal transport mapping via input convex neural networks. InInternational Conference on Machine Learning, pages 6672–6681. PMLR, 2020
work page 2020
- [57]
-
[58]
A. J. McNeil, R. Frey, and P. Embrechts.Quantitative risk management: concepts, techniques and tools- revised edition. Princeton University Press, 2015
work page 2015
-
[59]
S. Mendelson and A. Pajor. On singular values of matrices with independent rows.Bernoulli, 12(5): 761–773, 2006
work page 2006
-
[60]
S. Minsker, S. Srivastava, L. Lin, and D. Dunson. Scalable and robust bayesian inference via the median posterior. InInternational conference on machine learning, pages 1656–1664. PMLR, 2014
work page 2014
- [61]
-
[62]
B. Muzellec, A. Vacher, F. Bach, F.-X. Vialard, and A. Rudi. Near-optimal estimation of smooth transport maps with kernel sums-of-squares.Preprint, arXiv:2112.01907, 2021
-
[63]
Nesterov.Introductory lectures on convex optimization: A basic course, volume 87
Y . Nesterov.Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2004
work page 2004
-
[64]
V . M. Panaretos and Y . Zemel.An invitation to statistics in Wasserstein space. Springer, 2020
work page 2020
-
[65]
F.-P. Paty, A. d’Aspremont, and M. Cuturi. Regularity as regularization: Smooth and strongly convex Brenier potentials in optimal transport. InProceedings of the 23rd International Conference on Artificial Intelligence and Statistics, volume 108, pages 1222–1232. PMLR, 2020
work page 2020
-
[66]
E. V . Petracou, A. Xepapadeas, and A. N. Yannacopoulos. Decision making under model uncertainty: Fr´echet–Wasserstein mean preferences.Management Science, 68(2):1195–1211, 2022
work page 2022
-
[67]
G. Peyr ´e and M. Cuturi. Computational optimal transport: With applications to data science.Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019
work page 2019
-
[68]
B. T. Polyak. Gradient methods for the minimisation of functionals.USSR Computational Mathematics and Mathematical Physics, 3(4):864–878, 1963
work page 1963
-
[69]
A.-A. Pooladian and J. Niles-Weed. Entropic estimation of optimal transport maps.Preprint, arXiv:2109.12004, 2024
-
[70]
J. Rabin, G. Peyr ´e, J. Delon, and M. Bernot. Wasserstein barycenter and its application to texture mixing. InScale Space and Variational Methods in Computer Vision: Third International Conference, SSVM 2011, Ein-Gedi, Israel, May 29–June 2, 2011, Revised Selected Papers 3, pages 435–446. Springer, 2012
work page 2011
-
[71]
R. T. Rockafellar.Convex Analysis:(PMS-28). Princeton university press, 1970
work page 1970
-
[72]
R. T. Rockafellar and R. J.-B. Wets.Variational analysis, volume 317 ofGrundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1998
work page 1998
-
[73]
Y . Rychener, A. Esteban-P´erez, J. M. Morales, and D. Kuhn. Wasserstein distributionally robust optimiza- tion with heterogeneous data sources.Preprint, arXiv:2407.13582, 2024
-
[74]
F. Santambrogio.Optimal transport for applied mathematicians, volume 87 ofProgress in Nonlinear Differential Equations and their Applications. Birkh¨auser/Springer, Cham, 2015
work page 2015
- [75]
-
[76]
S. Srivastava, V . Cevher, Q. Dinh, and D. Dunson. W ASP: Scalable Bayes via barycenters of subset poste- riors. InProceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, volume 38, pages 912–920. PMLR, 2015
work page 2015
-
[77]
S. Srivastava, C. Li, and D. B. Dunson. Scalable Bayes via barycenter in Wasserstein space.Journal of Machine Learning Research, 19(8):1–35, 2018
work page 2018
-
[78]
B. Tas ¸kesen, S. Shafieezadeh-Abadeh, and D. Kuhn. Semi-discrete optimal transport: hardness, regular- ization and numerical solution.Mathematical Programming, 199:1033–1106, 2023
work page 2023
- [79]
-
[80]
B. Tas ¸kesen, S. Shafieezadeh-Abadeh, D. Kuhn, and K. Natarajan. Discrete optimal transport with inde- pendent marginals is #P-hard.SIAM Journal on Optimization, 33(2):589–614, 2023
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.