pith. sign in

arxiv: 2509.14521 · v3 · pith:74EBRHWVnew · submitted 2025-09-18 · 📡 eess.SY · cs.SY

Geometry-Aware Decentralized Sinkhorn for Wasserstein Barycenters

Pith reviewed 2026-05-21 23:03 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords Wasserstein barycenterSinkhorn algorithmdecentralized consensusoptimal transportgossip protocolsevent-triggered communicationquantized messages
0
0 comments X

The pith

A decentralized Sinkhorn algorithm approximates Wasserstein barycenters by turning geometric means into log-domain averages for gossip-based consensus.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a fully decentralized Sinkhorn algorithm so that agents on a network can compute an entropic Wasserstein barycenter using only neighbor-to-neighbor exchanges. The key step rewrites the geometric mean of distributions as an arithmetic average of their logarithms, which lets standard gossip protocols replace the usual centralized coordination. Event-triggered messages and b-bit quantization are added to limit communication while still handling asynchrony and lost packets. Under mild conditions the iterates converge to a neighborhood of the true centralized barycenter, and the size of that neighborhood grows linearly with the allowed consensus error, the trigger threshold, and the quantization error. The method therefore gives a practical geometry-aware way to fuse local distributions over sparse, unreliable networks.

Core claim

The centralized Sinkhorn iteration for the entropic Wasserstein barycenter can be decentralized because the geometric mean of the local measures equals the exponential of their arithmetic mean in log space; agents therefore run local Sinkhorn updates interleaved with gossip steps that drive all agents toward consensus on the log-scale quantities, and the resulting sequence stays within a provably bounded distance of the centralized solution.

What carries the argument

Log-domain reformulation of the geometric mean into an arithmetic average that enables gossip consensus on Sinkhorn iterates.

If this is right

  • The approximation error remains linearly bounded by the three tunable communication parameters.
  • Message count grows near-linearly with the number of agents rather than quadratically.
  • The algorithm continues to function under asynchronous updates and occasional packet drops.
  • Accuracy stays close to the centralized result across different network topologies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same log-domain trick could be tested on other barycenter problems that involve geometric means.
  • In practice one could tune the trigger threshold and quantization level to meet explicit energy or bandwidth budgets while staying inside a chosen error tolerance.
  • Sensor fusion tasks that already use Wasserstein distances might adopt this gossip version to avoid a single point of failure.

Load-bearing premise

The geometric mean of the input distributions can be rewritten exactly as an arithmetic average after a logarithm is taken.

What would settle it

Compare the output of the decentralized procedure against the true centralized Sinkhorn barycenter while systematically increasing the consensus tolerance, event threshold, or quantization bits and check whether the observed error grows linearly with those parameters.

Figures

Figures reproduced from arXiv: 2509.14521 by Ali Baheri, Alireza Vahid, David Millard.

Figure 1
Figure 1. Figure 1: Convergence traces (log residual) for always-gossip [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Centralized vs decentralized barycenter overlap on [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 5
Figure 5. Figure 5: Accuracy vs support size d. [3] Stephen Boyd, Arpita Ghosh, Balaji Prabhakar, and Devavrat Shah. Randomized gossip algorithms. IEEE Transactions on Information Theory, 52(6):2508–2530, 2006. [4] Cedric Villani. ´ Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009. [5] Martial Agueh and Guillaume Carlier. Barycenters in the Wasserstein space. … view at source ↗
read the original abstract

Distributed systems require fusing heterogeneous local probability distributions into a global summary over sparse and unreliable communication networks. Traditional consensus algorithms, which average distributions in Euclidean space, ignore their inherent geometric structure, leading to misleading results. Wasserstein barycenters offer a geometry-aware alternative by minimizing optimal transport costs, but their entropic approximations via the Sinkhorn algorithm typically require centralized coordination. This paper proposes a fully decentralized Sinkhorn algorithm that reformulates the centralized geometric mean as an arithmetic average in the log-domain, enabling approximation through local gossip protocols. Agents exchange log-messages with neighbors, interleaving consensus phases with local updates to mimic centralized iterations without a coordinator. To optimize bandwidth, we integrate event-triggered transmissions and b-bit quantization, providing tunable trade-offs between accuracy and communication while accommodating asynchrony and packet loss. Under mild assumptions, we prove convergence to a neighborhood of the centralized entropic barycenter, with bias linearly dependent on consensus tolerance, trigger threshold, and quantization error. Complexity scales near-linearly with network size. Simulations confirm near-centralized accuracy with significantly fewer messages, across various topologies and conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a fully decentralized Sinkhorn algorithm for entropic Wasserstein barycenters. It reformulates the centralized geometric mean as an arithmetic average in the log-domain to enable gossip-based local consensus, interleaving these steps with local Sinkhorn updates. Event-triggered communication and b-bit quantization are added for bandwidth efficiency and robustness to asynchrony and packet loss. A convergence result is stated showing that the iterates approach a neighborhood of the centralized solution, with bias linear in the consensus tolerance, trigger threshold, and quantization error; complexity is claimed to scale near-linearly with network size. Simulations are reported to confirm near-centralized accuracy with substantially fewer messages.

Significance. If the reformulation and convergence analysis are valid, the work would provide a practical geometry-aware method for distributed fusion of heterogeneous measures over sparse networks, avoiding the need for a coordinator. The explicit bias bounds in terms of tunable parameters and the communication reductions constitute clear strengths for systems applications. The near-linear scaling claim, if supported, would further strengthen the contribution.

major comments (1)
  1. [Abstract and §3 (reformulation and proof sketch)] Abstract and the central reformulation (weakest assumption noted in the reader report): the claim that the geometric mean of measures can be exactly recast as an arithmetic mean after taking logs, allowing purely local gossip to approximate the centralized Sinkhorn iteration, is placed in tension by the normalization step. Each Sinkhorn update produces a measure that must integrate to one; the resulting global normalizing constant c yields an additive log c term that cannot be recovered from neighbor averages alone. This global coupling appears to prevent the exact equivalence required for the gossip protocol to mimic the centralized geometric mean without additional coordination or correction steps.
minor comments (2)
  1. [Algorithm description] Clarify the precise discrete support sizes and the exact form of the local normalization used in the decentralized updates; this would help readers assess whether the log-domain arithmetic average holds exactly or only approximately.
  2. [Numerical experiments] The simulation section would benefit from reporting the precise values of consensus tolerance, trigger threshold, and quantization bits b used in each experiment, together with the number of Monte Carlo runs.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and constructive comments on our manuscript. We address the major comment point-by-point below, providing clarifications on the reformulation and its relation to the convergence analysis. Where appropriate, we have made revisions to improve clarity without altering the core technical claims.

read point-by-point responses
  1. Referee: [Abstract and §3 (reformulation and proof sketch)] Abstract and the central reformulation (weakest assumption noted in the reader report): the claim that the geometric mean of measures can be exactly recast as an arithmetic mean after taking logs, allowing purely local gossip to approximate the centralized Sinkhorn iteration, is placed in tension by the normalization step. Each Sinkhorn update produces a measure that must integrate to one; the resulting global normalizing constant c yields an additive log c term that cannot be recovered from neighbor averages alone. This global coupling appears to prevent the exact equivalence required for the gossip protocol to mimic the centralized geometric mean without additional coordination or correction steps.

    Authors: We appreciate this observation on the normalization constant. The manuscript does not claim an exact recasting of the geometric mean; the abstract explicitly states that the log-domain reformulation 'enables approximation through local gossip protocols.' In §3, the centralized geometric mean is expressed via the log-average, but we immediately note that the subsequent normalization to restore unit mass introduces a global additive term. Our gossip-based consensus approximates the log-average locally, while the normalization discrepancy is treated as an additional perturbation. The convergence theorem then bounds the total error (including this term) by a neighborhood whose radius scales linearly with the consensus tolerance, event-trigger threshold, and quantization bits. Thus the global coupling does not invalidate the protocol; it is absorbed into the proven bias bound. In the revised manuscript we have added an explicit paragraph in §3 clarifying that the equivalence is approximate and showing how the normalization error enters the Lyapunov analysis. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via explicit reformulation and parameter-derived bounds

full rationale

The paper's core step is an explicit reformulation of the centralized geometric mean of measures into a log-domain arithmetic average, which then enables gossip-based consensus to approximate Sinkhorn iterations. Convergence is proved to a neighborhood of the centralized entropic barycenter, with the bias bound derived directly from the stated algorithm parameters (consensus tolerance, trigger threshold, quantization error) rather than fitted to outcomes. No load-bearing step reduces by construction to a self-definition, a renamed fit, or a self-citation chain; the assumptions and proof are presented as independent mathematical content within the paper.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The approach depends on standard domain assumptions from distributed consensus and entropic optimal transport rather than new free parameters or invented entities. Tunable thresholds are presented as design choices for accuracy-communication trade-offs.

free parameters (3)
  • consensus tolerance
    Tunable parameter that directly scales the bias in the convergence neighborhood.
  • trigger threshold
    Tunable parameter controlling when messages are sent in the event-triggered scheme.
  • quantization bits b
    Tunable parameter for b-bit quantization that trades communication cost against approximation error.
axioms (2)
  • domain assumption Mild assumptions on network connectivity and communication allow gossip protocols to achieve approximate consensus.
    Invoked to guarantee that local exchanges can mimic centralized Sinkhorn iterations.
  • domain assumption The log-domain reformulation of the geometric mean is valid for the entropic regularized Wasserstein barycenter.
    Central premise that enables the arithmetic-average interpretation used for decentralization.

pith-pipeline@v0.9.0 · 5724 in / 1505 out tokens · 56630 ms · 2026-05-21T23:03:54.933966+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

  1. [1]

    Reza Olfati-Saber, J. A. Fax, and Richard M. Murray. Consensus and cooperation in networked multi-agent systems.Proceedings of the IEEE, 95(1):215–233, 2007

  2. [2]

    Communication-efficient learning of deep networks from decentralized data

    Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. InArtificial intelligence and statistics, pages 1273–1282. PMLR, 2017. Fig. 5: Accuracy vs support sized

  3. [3]

    Randomized gossip algorithms.IEEE Transactions on Information Theory, 52(6):2508–2530, 2006

    Stephen Boyd, Arpita Ghosh, Balaji Prabhakar, and Devavrat Shah. Randomized gossip algorithms.IEEE Transactions on Information Theory, 52(6):2508–2530, 2006

  4. [4]

    Springer, Berlin, 2009

    C ´edric Villani.Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009

  5. [5]

    Barycenters in the Wasserstein space.SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011

    Martial Agueh and Guillaume Carlier. Barycenters in the Wasserstein space.SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011

  6. [6]

    Convolutional wasserstein distances: Efficient optimal transportation on geometric domains.ACM Transactions on Graphics, 34(4):66:1– 66:11, 2015

    Justin Solomon, Fernando de Goes, Gabriel Peyr ´e, Marco Cuturi, Adrian Butscher, Andy Nguyen, Tao Du, and Leonidas Guibas. Convolutional wasserstein distances: Efficient optimal transportation on geometric domains.ACM Transactions on Graphics, 34(4):66:1– 66:11, 2015. (Proc. SIGGRAPH)

  7. [7]

    Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming

    Geoffrey Schiebinger, Jian Shu, Marcin Tabaka, Brian Cleary, Vidya Subramanian, Aryeh Solomon, Joshua Gould, Siyan Liu, Stacie Lin, Peter Berube, et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell, 176(4):928–943, 2019

  8. [8]

    Computational optimal transport

    Gabriel Peyr ´e and Marco Cuturi. Computational optimal transport. Foundations and Trends in Machine Learning, 11(5–6):355–607, 2019

  9. [9]

    Optimal transport for diffeomorphic registration.International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 212–219, 2014

    Nicolas Papadakis, Edoardo Provenzi, and Vicent Caselles. Optimal transport for diffeomorphic registration.International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 212–219, 2014

  10. [10]

    Sinkhorn distances: Lightspeed computation of optimal transport

    Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. InAdvances in Neural Information Processing Systems (NeurIPS), volume 26, 2013

  11. [11]

    Fast decentralized optimal transport via ADMM

    Xiangfeng Ye, Jian Li, Hongyuan Zhang, and Yinyu Wu. Fast decentralized optimal transport via ADMM. InProceedings of the International Conference on Machine Learning (ICML), pages 4024– 4033, 2017

  12. [12]

    Decentralized distributed optimization with accelerated ADMM for Wasserstein barycenters.arXiv preprint arXiv:1806.07396, 2018

    Pavel Dvurechensky, Alexander Gasnikov, and Alexey Kroshnin. Decentralized distributed optimization with accelerated ADMM for Wasserstein barycenters.arXiv preprint arXiv:1806.07396, 2018

  13. [13]

    Decentralized optimal transport on general graphs

    Alexander Rogozin, Valentina Bochko, Pavel Dvurechensky, and Alexander Gasnikov. Decentralized optimal transport on general graphs. InAdvances in Neural Information Processing Systems (NeurIPS), volume 34, pages 15960–15971, 2021

  14. [14]

    Federated learning with Wasserstein barycenters

    Matthew Staib, Sashank J Reddi, Satyen Kale, Sanjiv Kumar, and Suvrit Sra. Federated learning with Wasserstein barycenters. In International Conference on Learning Representations (ICLR), 2021

  15. [15]

    Federated optimization of Wasserstein distributionally robust models.arXiv preprint arXiv:2102.05628, 2021

    Ruidi Li and Yin Tat Lee. Federated optimization of Wasserstein distributionally robust models.arXiv preprint arXiv:2102.05628, 2021

  16. [16]

    Event-triggered real-time scheduling of stabilizing control tasks.IEEE Transactions on Automatic Control, 52(9):1680– 1685, 2007

    Paulo Tabuada. Event-triggered real-time scheduling of stabilizing control tasks.IEEE Transactions on Automatic Control, 52(9):1680– 1685, 2007

  17. [17]

    Compressive wireless sensing.42nd Annual Conference on Information Sciences and Systems, pages 134–139, 2008

    Tuncer Can Aysal and Kenneth E Barner. Compressive wireless sensing.42nd Annual Conference on Information Sciences and Systems, pages 134–139, 2008

  18. [18]

    C., Pereira, L

    Luis Caicedo Torres, Luiz Manella Pereira, and M Hadi Amini. A survey on optimal transport for machine learning: Theory and applications.arXiv preprint arXiv:2106.01963, 2021

  19. [19]

    From word embeddings to document distances

    Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. From word embeddings to document distances. InInternational Conference on Machine Learning, pages 957–966, 2015

  20. [20]

    Wasserstein-barycenter consensus for cooperative multi- agent reinforcement learning.arXiv preprint arXiv:2506.12497, 2025

    Ali Baheri. Wasserstein-barycenter consensus for cooperative multi- agent reinforcement learning.arXiv preprint arXiv:2506.12497, 2025

  21. [21]

    Understanding reward ambiguity through optimal transport theory in inverse reinforcement learning.arXiv preprint arXiv:2310.12055, 2023

    Ali Baheri. Understanding reward ambiguity through optimal transport theory in inverse reinforcement learning.arXiv preprint arXiv:2310.12055, 2023

  22. [22]

    WA VE: Wasser- stein adaptive value estimation for actor-critic reinforcement learning

    Ali Baheri, Zahra Sharooei, and Chirayu Salgarkar. WA VE: Wasser- stein adaptive value estimation for actor-critic reinforcement learning. Proceedings of Machine Learning Research vol, 283:1–12, 2025

  23. [23]

    Risk-aware reinforcement learning through optimal trans- port theory.arXiv preprint arXiv:2309.06239, 2023

    Ali Baheri. Risk-aware reinforcement learning through optimal trans- port theory.arXiv preprint arXiv:2309.06239, 2023

  24. [24]

    Princeton University Press, 2016

    Alfred Galichon.Optimal transport methods in economics. Princeton University Press, 2016

  25. [25]

    Iterative bregman projections for regularized transportation problems.SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015

    Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyr ´e. Iterative bregman projections for regularized transportation problems.SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015

  26. [26]

    Fast computation of wasserstein barycenters

    Marco Cuturi and Arnaud Doucet. Fast computation of wasserstein barycenters. InProceedings of the 31st International Conference on Machine Learning (ICML), volume 32 ofProceedings of Machine Learning Research, pages 685–693, 2014

  27. [27]

    Stabilized sparse scaling algorithms for entropy regularized transport problems.SIAM Journal on Scientific Computing, 41(3):A1443–A1481, 2019

    Bernhard Schmitzer. Stabilized sparse scaling algorithms for entropy regularized transport problems.SIAM Journal on Scientific Computing, 41(3):A1443–A1481, 2019

  28. [28]

    Sliced and radon wasserstein barycenters of measures.Journal of Mathematical Imaging and Vision, 51(1):22–45, 2015

    Nicolas Bonneel, Julien Rabin, Gabriel Peyr ´e, and Hanspeter Pfister. Sliced and radon wasserstein barycenters of measures.Journal of Mathematical Imaging and Vision, 51(1):22–45, 2015

  29. [29]

    Stochastic wasserstein barycenters

    Sebastian Claici, Edward Chien, and Justin Solomon. Stochastic wasserstein barycenters. InProceedings of the 35th International Conference on Machine Learning (ICML), volume 80 ofProceedings of Machine Learning Research, pages 999–1008, 2018

  30. [30]

    Fast linear iterations for distributed averaging.Systems & Control Letters, 53(1):65–78, 2004

    Lin Xiao and Stephen Boyd. Fast linear iterations for distributed averaging.Systems & Control Letters, 53(1):65–78, 2004

  31. [31]

    Quantized consensus.Automatica, 43(7):1192–1203, 2007

    Anand Kashyap, Tamer Bas ¸ar, and Rayadurgam Srikant. Quantized consensus.Automatica, 43(7):1192–1203, 2007

  32. [32]

    A dual approach for optimal algorithms in distributed opti- mization over networks.IEEE Transactions on Automatic Control, 63(10):3265–3280, 2018

    C ´esar A Uribe, Soomin Lee, Alexander Gasnikov, and Angelia Nedi´c. A dual approach for optimal algorithms in distributed opti- mization over networks.IEEE Transactions on Automatic Control, 63(10):3265–3280, 2018

  33. [33]

    Cambridge University Press, 2012

    Bas Lemmens and Roger Nussbaum.Nonlinear Perron-Frobenius Theory, volume 189. Cambridge University Press, 2012