Geometry-Aware Decentralized Sinkhorn for Wasserstein Barycenters
Pith reviewed 2026-05-21 23:03 UTC · model grok-4.3
The pith
A decentralized Sinkhorn algorithm approximates Wasserstein barycenters by turning geometric means into log-domain averages for gossip-based consensus.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The centralized Sinkhorn iteration for the entropic Wasserstein barycenter can be decentralized because the geometric mean of the local measures equals the exponential of their arithmetic mean in log space; agents therefore run local Sinkhorn updates interleaved with gossip steps that drive all agents toward consensus on the log-scale quantities, and the resulting sequence stays within a provably bounded distance of the centralized solution.
What carries the argument
Log-domain reformulation of the geometric mean into an arithmetic average that enables gossip consensus on Sinkhorn iterates.
If this is right
- The approximation error remains linearly bounded by the three tunable communication parameters.
- Message count grows near-linearly with the number of agents rather than quadratically.
- The algorithm continues to function under asynchronous updates and occasional packet drops.
- Accuracy stays close to the centralized result across different network topologies.
Where Pith is reading between the lines
- The same log-domain trick could be tested on other barycenter problems that involve geometric means.
- In practice one could tune the trigger threshold and quantization level to meet explicit energy or bandwidth budgets while staying inside a chosen error tolerance.
- Sensor fusion tasks that already use Wasserstein distances might adopt this gossip version to avoid a single point of failure.
Load-bearing premise
The geometric mean of the input distributions can be rewritten exactly as an arithmetic average after a logarithm is taken.
What would settle it
Compare the output of the decentralized procedure against the true centralized Sinkhorn barycenter while systematically increasing the consensus tolerance, event threshold, or quantization bits and check whether the observed error grows linearly with those parameters.
Figures
read the original abstract
Distributed systems require fusing heterogeneous local probability distributions into a global summary over sparse and unreliable communication networks. Traditional consensus algorithms, which average distributions in Euclidean space, ignore their inherent geometric structure, leading to misleading results. Wasserstein barycenters offer a geometry-aware alternative by minimizing optimal transport costs, but their entropic approximations via the Sinkhorn algorithm typically require centralized coordination. This paper proposes a fully decentralized Sinkhorn algorithm that reformulates the centralized geometric mean as an arithmetic average in the log-domain, enabling approximation through local gossip protocols. Agents exchange log-messages with neighbors, interleaving consensus phases with local updates to mimic centralized iterations without a coordinator. To optimize bandwidth, we integrate event-triggered transmissions and b-bit quantization, providing tunable trade-offs between accuracy and communication while accommodating asynchrony and packet loss. Under mild assumptions, we prove convergence to a neighborhood of the centralized entropic barycenter, with bias linearly dependent on consensus tolerance, trigger threshold, and quantization error. Complexity scales near-linearly with network size. Simulations confirm near-centralized accuracy with significantly fewer messages, across various topologies and conditions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a fully decentralized Sinkhorn algorithm for entropic Wasserstein barycenters. It reformulates the centralized geometric mean as an arithmetic average in the log-domain to enable gossip-based local consensus, interleaving these steps with local Sinkhorn updates. Event-triggered communication and b-bit quantization are added for bandwidth efficiency and robustness to asynchrony and packet loss. A convergence result is stated showing that the iterates approach a neighborhood of the centralized solution, with bias linear in the consensus tolerance, trigger threshold, and quantization error; complexity is claimed to scale near-linearly with network size. Simulations are reported to confirm near-centralized accuracy with substantially fewer messages.
Significance. If the reformulation and convergence analysis are valid, the work would provide a practical geometry-aware method for distributed fusion of heterogeneous measures over sparse networks, avoiding the need for a coordinator. The explicit bias bounds in terms of tunable parameters and the communication reductions constitute clear strengths for systems applications. The near-linear scaling claim, if supported, would further strengthen the contribution.
major comments (1)
- [Abstract and §3 (reformulation and proof sketch)] Abstract and the central reformulation (weakest assumption noted in the reader report): the claim that the geometric mean of measures can be exactly recast as an arithmetic mean after taking logs, allowing purely local gossip to approximate the centralized Sinkhorn iteration, is placed in tension by the normalization step. Each Sinkhorn update produces a measure that must integrate to one; the resulting global normalizing constant c yields an additive log c term that cannot be recovered from neighbor averages alone. This global coupling appears to prevent the exact equivalence required for the gossip protocol to mimic the centralized geometric mean without additional coordination or correction steps.
minor comments (2)
- [Algorithm description] Clarify the precise discrete support sizes and the exact form of the local normalization used in the decentralized updates; this would help readers assess whether the log-domain arithmetic average holds exactly or only approximately.
- [Numerical experiments] The simulation section would benefit from reporting the precise values of consensus tolerance, trigger threshold, and quantization bits b used in each experiment, together with the number of Monte Carlo runs.
Simulated Author's Rebuttal
We thank the referee for the careful review and constructive comments on our manuscript. We address the major comment point-by-point below, providing clarifications on the reformulation and its relation to the convergence analysis. Where appropriate, we have made revisions to improve clarity without altering the core technical claims.
read point-by-point responses
-
Referee: [Abstract and §3 (reformulation and proof sketch)] Abstract and the central reformulation (weakest assumption noted in the reader report): the claim that the geometric mean of measures can be exactly recast as an arithmetic mean after taking logs, allowing purely local gossip to approximate the centralized Sinkhorn iteration, is placed in tension by the normalization step. Each Sinkhorn update produces a measure that must integrate to one; the resulting global normalizing constant c yields an additive log c term that cannot be recovered from neighbor averages alone. This global coupling appears to prevent the exact equivalence required for the gossip protocol to mimic the centralized geometric mean without additional coordination or correction steps.
Authors: We appreciate this observation on the normalization constant. The manuscript does not claim an exact recasting of the geometric mean; the abstract explicitly states that the log-domain reformulation 'enables approximation through local gossip protocols.' In §3, the centralized geometric mean is expressed via the log-average, but we immediately note that the subsequent normalization to restore unit mass introduces a global additive term. Our gossip-based consensus approximates the log-average locally, while the normalization discrepancy is treated as an additional perturbation. The convergence theorem then bounds the total error (including this term) by a neighborhood whose radius scales linearly with the consensus tolerance, event-trigger threshold, and quantization bits. Thus the global coupling does not invalidate the protocol; it is absorbed into the proven bias bound. In the revised manuscript we have added an explicit paragraph in §3 clarifying that the equivalence is approximate and showing how the normalization error enters the Lyapunov analysis. revision: partial
Circularity Check
No significant circularity; derivation self-contained via explicit reformulation and parameter-derived bounds
full rationale
The paper's core step is an explicit reformulation of the centralized geometric mean of measures into a log-domain arithmetic average, which then enables gossip-based consensus to approximate Sinkhorn iterations. Convergence is proved to a neighborhood of the centralized entropic barycenter, with the bias bound derived directly from the stated algorithm parameters (consensus tolerance, trigger threshold, quantization error) rather than fitted to outcomes. No load-bearing step reduces by construction to a self-definition, a renamed fit, or a self-citation chain; the assumptions and proof are presented as independent mathematical content within the paper.
Axiom & Free-Parameter Ledger
free parameters (3)
- consensus tolerance
- trigger threshold
- quantization bits b
axioms (2)
- domain assumption Mild assumptions on network connectivity and communication allow gossip protocols to achieve approximate consensus.
- domain assumption The log-domain reformulation of the geometric mean is valid for the entropic regularized Wasserstein barycenter.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
the centralized geometric mean as an arithmetic average in the log-domain, enabling approximation through local gossip protocols
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanembed_add unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
v ← (∏ K⊤ui)1/N ; log v = (1/N) ∑ si
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Reza Olfati-Saber, J. A. Fax, and Richard M. Murray. Consensus and cooperation in networked multi-agent systems.Proceedings of the IEEE, 95(1):215–233, 2007
work page 2007
-
[2]
Communication-efficient learning of deep networks from decentralized data
Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. InArtificial intelligence and statistics, pages 1273–1282. PMLR, 2017. Fig. 5: Accuracy vs support sized
work page 2017
-
[3]
Randomized gossip algorithms.IEEE Transactions on Information Theory, 52(6):2508–2530, 2006
Stephen Boyd, Arpita Ghosh, Balaji Prabhakar, and Devavrat Shah. Randomized gossip algorithms.IEEE Transactions on Information Theory, 52(6):2508–2530, 2006
work page 2006
-
[4]
C ´edric Villani.Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009
work page 2009
-
[5]
Barycenters in the Wasserstein space.SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011
Martial Agueh and Guillaume Carlier. Barycenters in the Wasserstein space.SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011
work page 2011
-
[6]
Justin Solomon, Fernando de Goes, Gabriel Peyr ´e, Marco Cuturi, Adrian Butscher, Andy Nguyen, Tao Du, and Leonidas Guibas. Convolutional wasserstein distances: Efficient optimal transportation on geometric domains.ACM Transactions on Graphics, 34(4):66:1– 66:11, 2015. (Proc. SIGGRAPH)
work page 2015
-
[7]
Geoffrey Schiebinger, Jian Shu, Marcin Tabaka, Brian Cleary, Vidya Subramanian, Aryeh Solomon, Joshua Gould, Siyan Liu, Stacie Lin, Peter Berube, et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell, 176(4):928–943, 2019
work page 2019
-
[8]
Computational optimal transport
Gabriel Peyr ´e and Marco Cuturi. Computational optimal transport. Foundations and Trends in Machine Learning, 11(5–6):355–607, 2019
work page 2019
-
[9]
Nicolas Papadakis, Edoardo Provenzi, and Vicent Caselles. Optimal transport for diffeomorphic registration.International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 212–219, 2014
work page 2014
-
[10]
Sinkhorn distances: Lightspeed computation of optimal transport
Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. InAdvances in Neural Information Processing Systems (NeurIPS), volume 26, 2013
work page 2013
-
[11]
Fast decentralized optimal transport via ADMM
Xiangfeng Ye, Jian Li, Hongyuan Zhang, and Yinyu Wu. Fast decentralized optimal transport via ADMM. InProceedings of the International Conference on Machine Learning (ICML), pages 4024– 4033, 2017
work page 2017
-
[12]
Pavel Dvurechensky, Alexander Gasnikov, and Alexey Kroshnin. Decentralized distributed optimization with accelerated ADMM for Wasserstein barycenters.arXiv preprint arXiv:1806.07396, 2018
-
[13]
Decentralized optimal transport on general graphs
Alexander Rogozin, Valentina Bochko, Pavel Dvurechensky, and Alexander Gasnikov. Decentralized optimal transport on general graphs. InAdvances in Neural Information Processing Systems (NeurIPS), volume 34, pages 15960–15971, 2021
work page 2021
-
[14]
Federated learning with Wasserstein barycenters
Matthew Staib, Sashank J Reddi, Satyen Kale, Sanjiv Kumar, and Suvrit Sra. Federated learning with Wasserstein barycenters. In International Conference on Learning Representations (ICLR), 2021
work page 2021
-
[15]
Ruidi Li and Yin Tat Lee. Federated optimization of Wasserstein distributionally robust models.arXiv preprint arXiv:2102.05628, 2021
-
[16]
Paulo Tabuada. Event-triggered real-time scheduling of stabilizing control tasks.IEEE Transactions on Automatic Control, 52(9):1680– 1685, 2007
work page 2007
-
[17]
Tuncer Can Aysal and Kenneth E Barner. Compressive wireless sensing.42nd Annual Conference on Information Sciences and Systems, pages 134–139, 2008
work page 2008
-
[18]
Luis Caicedo Torres, Luiz Manella Pereira, and M Hadi Amini. A survey on optimal transport for machine learning: Theory and applications.arXiv preprint arXiv:2106.01963, 2021
-
[19]
From word embeddings to document distances
Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. From word embeddings to document distances. InInternational Conference on Machine Learning, pages 957–966, 2015
work page 2015
-
[20]
Ali Baheri. Wasserstein-barycenter consensus for cooperative multi- agent reinforcement learning.arXiv preprint arXiv:2506.12497, 2025
-
[21]
Ali Baheri. Understanding reward ambiguity through optimal transport theory in inverse reinforcement learning.arXiv preprint arXiv:2310.12055, 2023
-
[22]
WA VE: Wasser- stein adaptive value estimation for actor-critic reinforcement learning
Ali Baheri, Zahra Sharooei, and Chirayu Salgarkar. WA VE: Wasser- stein adaptive value estimation for actor-critic reinforcement learning. Proceedings of Machine Learning Research vol, 283:1–12, 2025
work page 2025
-
[23]
Ali Baheri. Risk-aware reinforcement learning through optimal trans- port theory.arXiv preprint arXiv:2309.06239, 2023
-
[24]
Princeton University Press, 2016
Alfred Galichon.Optimal transport methods in economics. Princeton University Press, 2016
work page 2016
-
[25]
Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyr ´e. Iterative bregman projections for regularized transportation problems.SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015
work page 2015
-
[26]
Fast computation of wasserstein barycenters
Marco Cuturi and Arnaud Doucet. Fast computation of wasserstein barycenters. InProceedings of the 31st International Conference on Machine Learning (ICML), volume 32 ofProceedings of Machine Learning Research, pages 685–693, 2014
work page 2014
-
[27]
Bernhard Schmitzer. Stabilized sparse scaling algorithms for entropy regularized transport problems.SIAM Journal on Scientific Computing, 41(3):A1443–A1481, 2019
work page 2019
-
[28]
Nicolas Bonneel, Julien Rabin, Gabriel Peyr ´e, and Hanspeter Pfister. Sliced and radon wasserstein barycenters of measures.Journal of Mathematical Imaging and Vision, 51(1):22–45, 2015
work page 2015
-
[29]
Stochastic wasserstein barycenters
Sebastian Claici, Edward Chien, and Justin Solomon. Stochastic wasserstein barycenters. InProceedings of the 35th International Conference on Machine Learning (ICML), volume 80 ofProceedings of Machine Learning Research, pages 999–1008, 2018
work page 2018
-
[30]
Fast linear iterations for distributed averaging.Systems & Control Letters, 53(1):65–78, 2004
Lin Xiao and Stephen Boyd. Fast linear iterations for distributed averaging.Systems & Control Letters, 53(1):65–78, 2004
work page 2004
-
[31]
Quantized consensus.Automatica, 43(7):1192–1203, 2007
Anand Kashyap, Tamer Bas ¸ar, and Rayadurgam Srikant. Quantized consensus.Automatica, 43(7):1192–1203, 2007
work page 2007
-
[32]
C ´esar A Uribe, Soomin Lee, Alexander Gasnikov, and Angelia Nedi´c. A dual approach for optimal algorithms in distributed opti- mization over networks.IEEE Transactions on Automatic Control, 63(10):3265–3280, 2018
work page 2018
-
[33]
Cambridge University Press, 2012
Bas Lemmens and Roger Nussbaum.Nonlinear Perron-Frobenius Theory, volume 189. Cambridge University Press, 2012
work page 2012
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.