Geometry-Aware Decentralized Sinkhorn for Wasserstein Barycenters

Ali Baheri; Alireza Vahid; David Millard

arxiv: 2509.14521 · v3 · pith:74EBRHWVnew · submitted 2025-09-18 · 📡 eess.SY · cs.SY

Geometry-Aware Decentralized Sinkhorn for Wasserstein Barycenters

Ali Baheri , David Millard , Alireza Vahid This is my paper

Pith reviewed 2026-05-21 23:03 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords Wasserstein barycenterSinkhorn algorithmdecentralized consensusoptimal transportgossip protocolsevent-triggered communicationquantized messages

0 comments

The pith

A decentralized Sinkhorn algorithm approximates Wasserstein barycenters by turning geometric means into log-domain averages for gossip-based consensus.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a fully decentralized Sinkhorn algorithm so that agents on a network can compute an entropic Wasserstein barycenter using only neighbor-to-neighbor exchanges. The key step rewrites the geometric mean of distributions as an arithmetic average of their logarithms, which lets standard gossip protocols replace the usual centralized coordination. Event-triggered messages and b-bit quantization are added to limit communication while still handling asynchrony and lost packets. Under mild conditions the iterates converge to a neighborhood of the true centralized barycenter, and the size of that neighborhood grows linearly with the allowed consensus error, the trigger threshold, and the quantization error. The method therefore gives a practical geometry-aware way to fuse local distributions over sparse, unreliable networks.

Core claim

The centralized Sinkhorn iteration for the entropic Wasserstein barycenter can be decentralized because the geometric mean of the local measures equals the exponential of their arithmetic mean in log space; agents therefore run local Sinkhorn updates interleaved with gossip steps that drive all agents toward consensus on the log-scale quantities, and the resulting sequence stays within a provably bounded distance of the centralized solution.

What carries the argument

Log-domain reformulation of the geometric mean into an arithmetic average that enables gossip consensus on Sinkhorn iterates.

If this is right

The approximation error remains linearly bounded by the three tunable communication parameters.
Message count grows near-linearly with the number of agents rather than quadratically.
The algorithm continues to function under asynchronous updates and occasional packet drops.
Accuracy stays close to the centralized result across different network topologies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same log-domain trick could be tested on other barycenter problems that involve geometric means.
In practice one could tune the trigger threshold and quantization level to meet explicit energy or bandwidth budgets while staying inside a chosen error tolerance.
Sensor fusion tasks that already use Wasserstein distances might adopt this gossip version to avoid a single point of failure.

Load-bearing premise

The geometric mean of the input distributions can be rewritten exactly as an arithmetic average after a logarithm is taken.

What would settle it

Compare the output of the decentralized procedure against the true centralized Sinkhorn barycenter while systematically increasing the consensus tolerance, event threshold, or quantization bits and check whether the observed error grows linearly with those parameters.

Figures

Figures reproduced from arXiv: 2509.14521 by Ali Baheri, Alireza Vahid, David Millard.

**Figure 2.** Figure 2: Centralized vs decentralized barycenter overlap on [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 5.** Figure 5: Accuracy vs support size d. [3] Stephen Boyd, Arpita Ghosh, Balaji Prabhakar, and Devavrat Shah. Randomized gossip algorithms. IEEE Transactions on Information Theory, 52(6):2508–2530, 2006. [4] Cedric Villani. ´ Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009. [5] Martial Agueh and Guillaume Carlier. Barycenters in the Wasserstein space. … view at source ↗

read the original abstract

Distributed systems require fusing heterogeneous local probability distributions into a global summary over sparse and unreliable communication networks. Traditional consensus algorithms, which average distributions in Euclidean space, ignore their inherent geometric structure, leading to misleading results. Wasserstein barycenters offer a geometry-aware alternative by minimizing optimal transport costs, but their entropic approximations via the Sinkhorn algorithm typically require centralized coordination. This paper proposes a fully decentralized Sinkhorn algorithm that reformulates the centralized geometric mean as an arithmetic average in the log-domain, enabling approximation through local gossip protocols. Agents exchange log-messages with neighbors, interleaving consensus phases with local updates to mimic centralized iterations without a coordinator. To optimize bandwidth, we integrate event-triggered transmissions and b-bit quantization, providing tunable trade-offs between accuracy and communication while accommodating asynchrony and packet loss. Under mild assumptions, we prove convergence to a neighborhood of the centralized entropic barycenter, with bias linearly dependent on consensus tolerance, trigger threshold, and quantization error. Complexity scales near-linearly with network size. Simulations confirm near-centralized accuracy with significantly fewer messages, across various topologies and conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a decentralized Sinkhorn method for entropic Wasserstein barycenters by recasting geometric means as log-domain averages handled via gossip, with event-triggering and quantization added for efficiency, but the normalization steps in Sinkhorn likely complicate the exact reformulation.

read the letter

The main takeaway is that this work develops a fully decentralized version of the Sinkhorn algorithm for computing entropic Wasserstein barycenters. It turns the centralized geometric mean into an arithmetic average after taking logs so that local gossip protocols can approximate the needed consensus without a coordinator, then folds in event-triggered b-bit quantized messages to cut communication while handling asynchrony and packet loss. They also give a convergence result to a neighborhood of the true barycenter whose bias scales linearly with the consensus tolerance, trigger threshold, and quantization error, plus simulations that show near-centralized accuracy with far fewer messages on different network topologies.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a fully decentralized Sinkhorn algorithm for entropic Wasserstein barycenters. It reformulates the centralized geometric mean as an arithmetic average in the log-domain to enable gossip-based local consensus, interleaving these steps with local Sinkhorn updates. Event-triggered communication and b-bit quantization are added for bandwidth efficiency and robustness to asynchrony and packet loss. A convergence result is stated showing that the iterates approach a neighborhood of the centralized solution, with bias linear in the consensus tolerance, trigger threshold, and quantization error; complexity is claimed to scale near-linearly with network size. Simulations are reported to confirm near-centralized accuracy with substantially fewer messages.

Significance. If the reformulation and convergence analysis are valid, the work would provide a practical geometry-aware method for distributed fusion of heterogeneous measures over sparse networks, avoiding the need for a coordinator. The explicit bias bounds in terms of tunable parameters and the communication reductions constitute clear strengths for systems applications. The near-linear scaling claim, if supported, would further strengthen the contribution.

major comments (1)

[Abstract and §3 (reformulation and proof sketch)] Abstract and the central reformulation (weakest assumption noted in the reader report): the claim that the geometric mean of measures can be exactly recast as an arithmetic mean after taking logs, allowing purely local gossip to approximate the centralized Sinkhorn iteration, is placed in tension by the normalization step. Each Sinkhorn update produces a measure that must integrate to one; the resulting global normalizing constant c yields an additive log c term that cannot be recovered from neighbor averages alone. This global coupling appears to prevent the exact equivalence required for the gossip protocol to mimic the centralized geometric mean without additional coordination or correction steps.

minor comments (2)

[Algorithm description] Clarify the precise discrete support sizes and the exact form of the local normalization used in the decentralized updates; this would help readers assess whether the log-domain arithmetic average holds exactly or only approximately.
[Numerical experiments] The simulation section would benefit from reporting the precise values of consensus tolerance, trigger threshold, and quantization bits b used in each experiment, together with the number of Monte Carlo runs.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and constructive comments on our manuscript. We address the major comment point-by-point below, providing clarifications on the reformulation and its relation to the convergence analysis. Where appropriate, we have made revisions to improve clarity without altering the core technical claims.

read point-by-point responses

Referee: [Abstract and §3 (reformulation and proof sketch)] Abstract and the central reformulation (weakest assumption noted in the reader report): the claim that the geometric mean of measures can be exactly recast as an arithmetic mean after taking logs, allowing purely local gossip to approximate the centralized Sinkhorn iteration, is placed in tension by the normalization step. Each Sinkhorn update produces a measure that must integrate to one; the resulting global normalizing constant c yields an additive log c term that cannot be recovered from neighbor averages alone. This global coupling appears to prevent the exact equivalence required for the gossip protocol to mimic the centralized geometric mean without additional coordination or correction steps.

Authors: We appreciate this observation on the normalization constant. The manuscript does not claim an exact recasting of the geometric mean; the abstract explicitly states that the log-domain reformulation 'enables approximation through local gossip protocols.' In §3, the centralized geometric mean is expressed via the log-average, but we immediately note that the subsequent normalization to restore unit mass introduces a global additive term. Our gossip-based consensus approximates the log-average locally, while the normalization discrepancy is treated as an additional perturbation. The convergence theorem then bounds the total error (including this term) by a neighborhood whose radius scales linearly with the consensus tolerance, event-trigger threshold, and quantization bits. Thus the global coupling does not invalidate the protocol; it is absorbed into the proven bias bound. In the revised manuscript we have added an explicit paragraph in §3 clarifying that the equivalence is approximate and showing how the normalization error enters the Lyapunov analysis. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via explicit reformulation and parameter-derived bounds

full rationale

The paper's core step is an explicit reformulation of the centralized geometric mean of measures into a log-domain arithmetic average, which then enables gossip-based consensus to approximate Sinkhorn iterations. Convergence is proved to a neighborhood of the centralized entropic barycenter, with the bias bound derived directly from the stated algorithm parameters (consensus tolerance, trigger threshold, quantization error) rather than fitted to outcomes. No load-bearing step reduces by construction to a self-definition, a renamed fit, or a self-citation chain; the assumptions and proof are presented as independent mathematical content within the paper.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The approach depends on standard domain assumptions from distributed consensus and entropic optimal transport rather than new free parameters or invented entities. Tunable thresholds are presented as design choices for accuracy-communication trade-offs.

free parameters (3)

consensus tolerance
Tunable parameter that directly scales the bias in the convergence neighborhood.
trigger threshold
Tunable parameter controlling when messages are sent in the event-triggered scheme.
quantization bits b
Tunable parameter for b-bit quantization that trades communication cost against approximation error.

axioms (2)

domain assumption Mild assumptions on network connectivity and communication allow gossip protocols to achieve approximate consensus.
Invoked to guarantee that local exchanges can mimic centralized Sinkhorn iterations.
domain assumption The log-domain reformulation of the geometric mean is valid for the entropic regularized Wasserstein barycenter.
Central premise that enables the arithmetic-average interpretation used for decentralization.

pith-pipeline@v0.9.0 · 5724 in / 1505 out tokens · 56630 ms · 2026-05-21T23:03:54.933966+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the centralized geometric mean as an arithmetic average in the log-domain, enabling approximation through local gossip protocols
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_add unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

v ← (∏ K⊤ui)1/N ; log v = (1/N) ∑ si

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

[1]

Reza Olfati-Saber, J. A. Fax, and Richard M. Murray. Consensus and cooperation in networked multi-agent systems.Proceedings of the IEEE, 95(1):215–233, 2007

work page 2007
[2]

Communication-efficient learning of deep networks from decentralized data

Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. InArtificial intelligence and statistics, pages 1273–1282. PMLR, 2017. Fig. 5: Accuracy vs support sized

work page 2017
[3]

Randomized gossip algorithms.IEEE Transactions on Information Theory, 52(6):2508–2530, 2006

Stephen Boyd, Arpita Ghosh, Balaji Prabhakar, and Devavrat Shah. Randomized gossip algorithms.IEEE Transactions on Information Theory, 52(6):2508–2530, 2006

work page 2006
[4]

Springer, Berlin, 2009

C ´edric Villani.Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009

work page 2009
[5]

Barycenters in the Wasserstein space.SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011

Martial Agueh and Guillaume Carlier. Barycenters in the Wasserstein space.SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011

work page 2011
[6]

Convolutional wasserstein distances: Efficient optimal transportation on geometric domains.ACM Transactions on Graphics, 34(4):66:1– 66:11, 2015

Justin Solomon, Fernando de Goes, Gabriel Peyr ´e, Marco Cuturi, Adrian Butscher, Andy Nguyen, Tao Du, and Leonidas Guibas. Convolutional wasserstein distances: Efficient optimal transportation on geometric domains.ACM Transactions on Graphics, 34(4):66:1– 66:11, 2015. (Proc. SIGGRAPH)

work page 2015
[7]

Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming

Geoffrey Schiebinger, Jian Shu, Marcin Tabaka, Brian Cleary, Vidya Subramanian, Aryeh Solomon, Joshua Gould, Siyan Liu, Stacie Lin, Peter Berube, et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell, 176(4):928–943, 2019

work page 2019
[8]

Computational optimal transport

Gabriel Peyr ´e and Marco Cuturi. Computational optimal transport. Foundations and Trends in Machine Learning, 11(5–6):355–607, 2019

work page 2019
[9]

Optimal transport for diffeomorphic registration.International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 212–219, 2014

Nicolas Papadakis, Edoardo Provenzi, and Vicent Caselles. Optimal transport for diffeomorphic registration.International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 212–219, 2014

work page 2014
[10]

Sinkhorn distances: Lightspeed computation of optimal transport

Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. InAdvances in Neural Information Processing Systems (NeurIPS), volume 26, 2013

work page 2013
[11]

Fast decentralized optimal transport via ADMM

Xiangfeng Ye, Jian Li, Hongyuan Zhang, and Yinyu Wu. Fast decentralized optimal transport via ADMM. InProceedings of the International Conference on Machine Learning (ICML), pages 4024– 4033, 2017

work page 2017
[12]

Decentralized distributed optimization with accelerated ADMM for Wasserstein barycenters.arXiv preprint arXiv:1806.07396, 2018

Pavel Dvurechensky, Alexander Gasnikov, and Alexey Kroshnin. Decentralized distributed optimization with accelerated ADMM for Wasserstein barycenters.arXiv preprint arXiv:1806.07396, 2018

work page arXiv 2018
[13]

Decentralized optimal transport on general graphs

Alexander Rogozin, Valentina Bochko, Pavel Dvurechensky, and Alexander Gasnikov. Decentralized optimal transport on general graphs. InAdvances in Neural Information Processing Systems (NeurIPS), volume 34, pages 15960–15971, 2021

work page 2021
[14]

Federated learning with Wasserstein barycenters

Matthew Staib, Sashank J Reddi, Satyen Kale, Sanjiv Kumar, and Suvrit Sra. Federated learning with Wasserstein barycenters. In International Conference on Learning Representations (ICLR), 2021

work page 2021
[15]

Federated optimization of Wasserstein distributionally robust models.arXiv preprint arXiv:2102.05628, 2021

Ruidi Li and Yin Tat Lee. Federated optimization of Wasserstein distributionally robust models.arXiv preprint arXiv:2102.05628, 2021

work page arXiv 2021
[16]

Event-triggered real-time scheduling of stabilizing control tasks.IEEE Transactions on Automatic Control, 52(9):1680– 1685, 2007

Paulo Tabuada. Event-triggered real-time scheduling of stabilizing control tasks.IEEE Transactions on Automatic Control, 52(9):1680– 1685, 2007

work page 2007
[17]

Compressive wireless sensing.42nd Annual Conference on Information Sciences and Systems, pages 134–139, 2008

Tuncer Can Aysal and Kenneth E Barner. Compressive wireless sensing.42nd Annual Conference on Information Sciences and Systems, pages 134–139, 2008

work page 2008
[18]

C., Pereira, L

Luis Caicedo Torres, Luiz Manella Pereira, and M Hadi Amini. A survey on optimal transport for machine learning: Theory and applications.arXiv preprint arXiv:2106.01963, 2021

work page arXiv 2021
[19]

From word embeddings to document distances

Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. From word embeddings to document distances. InInternational Conference on Machine Learning, pages 957–966, 2015

work page 2015
[20]

Wasserstein-barycenter consensus for cooperative multi- agent reinforcement learning.arXiv preprint arXiv:2506.12497, 2025

Ali Baheri. Wasserstein-barycenter consensus for cooperative multi- agent reinforcement learning.arXiv preprint arXiv:2506.12497, 2025

work page arXiv 2025
[21]

Understanding reward ambiguity through optimal transport theory in inverse reinforcement learning.arXiv preprint arXiv:2310.12055, 2023

Ali Baheri. Understanding reward ambiguity through optimal transport theory in inverse reinforcement learning.arXiv preprint arXiv:2310.12055, 2023

work page arXiv 2023
[22]

WA VE: Wasser- stein adaptive value estimation for actor-critic reinforcement learning

Ali Baheri, Zahra Sharooei, and Chirayu Salgarkar. WA VE: Wasser- stein adaptive value estimation for actor-critic reinforcement learning. Proceedings of Machine Learning Research vol, 283:1–12, 2025

work page 2025
[23]

Risk-aware reinforcement learning through optimal trans- port theory.arXiv preprint arXiv:2309.06239, 2023

Ali Baheri. Risk-aware reinforcement learning through optimal trans- port theory.arXiv preprint arXiv:2309.06239, 2023

work page arXiv 2023
[24]

Princeton University Press, 2016

Alfred Galichon.Optimal transport methods in economics. Princeton University Press, 2016

work page 2016
[25]

Iterative bregman projections for regularized transportation problems.SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015

Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyr ´e. Iterative bregman projections for regularized transportation problems.SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015

work page 2015
[26]

Fast computation of wasserstein barycenters

Marco Cuturi and Arnaud Doucet. Fast computation of wasserstein barycenters. InProceedings of the 31st International Conference on Machine Learning (ICML), volume 32 ofProceedings of Machine Learning Research, pages 685–693, 2014

work page 2014
[27]

Stabilized sparse scaling algorithms for entropy regularized transport problems.SIAM Journal on Scientific Computing, 41(3):A1443–A1481, 2019

Bernhard Schmitzer. Stabilized sparse scaling algorithms for entropy regularized transport problems.SIAM Journal on Scientific Computing, 41(3):A1443–A1481, 2019

work page 2019
[28]

Sliced and radon wasserstein barycenters of measures.Journal of Mathematical Imaging and Vision, 51(1):22–45, 2015

Nicolas Bonneel, Julien Rabin, Gabriel Peyr ´e, and Hanspeter Pfister. Sliced and radon wasserstein barycenters of measures.Journal of Mathematical Imaging and Vision, 51(1):22–45, 2015

work page 2015
[29]

Stochastic wasserstein barycenters

Sebastian Claici, Edward Chien, and Justin Solomon. Stochastic wasserstein barycenters. InProceedings of the 35th International Conference on Machine Learning (ICML), volume 80 ofProceedings of Machine Learning Research, pages 999–1008, 2018

work page 2018
[30]

Fast linear iterations for distributed averaging.Systems & Control Letters, 53(1):65–78, 2004

Lin Xiao and Stephen Boyd. Fast linear iterations for distributed averaging.Systems & Control Letters, 53(1):65–78, 2004

work page 2004
[31]

Quantized consensus.Automatica, 43(7):1192–1203, 2007

Anand Kashyap, Tamer Bas ¸ar, and Rayadurgam Srikant. Quantized consensus.Automatica, 43(7):1192–1203, 2007

work page 2007
[32]

A dual approach for optimal algorithms in distributed opti- mization over networks.IEEE Transactions on Automatic Control, 63(10):3265–3280, 2018

C ´esar A Uribe, Soomin Lee, Alexander Gasnikov, and Angelia Nedi´c. A dual approach for optimal algorithms in distributed opti- mization over networks.IEEE Transactions on Automatic Control, 63(10):3265–3280, 2018

work page 2018
[33]

Cambridge University Press, 2012

Bas Lemmens and Roger Nussbaum.Nonlinear Perron-Frobenius Theory, volume 189. Cambridge University Press, 2012

work page 2012

[1] [1]

Reza Olfati-Saber, J. A. Fax, and Richard M. Murray. Consensus and cooperation in networked multi-agent systems.Proceedings of the IEEE, 95(1):215–233, 2007

work page 2007

[2] [2]

Communication-efficient learning of deep networks from decentralized data

Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. InArtificial intelligence and statistics, pages 1273–1282. PMLR, 2017. Fig. 5: Accuracy vs support sized

work page 2017

[3] [3]

Randomized gossip algorithms.IEEE Transactions on Information Theory, 52(6):2508–2530, 2006

Stephen Boyd, Arpita Ghosh, Balaji Prabhakar, and Devavrat Shah. Randomized gossip algorithms.IEEE Transactions on Information Theory, 52(6):2508–2530, 2006

work page 2006

[4] [4]

Springer, Berlin, 2009

C ´edric Villani.Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009

work page 2009

[5] [5]

Barycenters in the Wasserstein space.SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011

Martial Agueh and Guillaume Carlier. Barycenters in the Wasserstein space.SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011

work page 2011

[6] [6]

Convolutional wasserstein distances: Efficient optimal transportation on geometric domains.ACM Transactions on Graphics, 34(4):66:1– 66:11, 2015

Justin Solomon, Fernando de Goes, Gabriel Peyr ´e, Marco Cuturi, Adrian Butscher, Andy Nguyen, Tao Du, and Leonidas Guibas. Convolutional wasserstein distances: Efficient optimal transportation on geometric domains.ACM Transactions on Graphics, 34(4):66:1– 66:11, 2015. (Proc. SIGGRAPH)

work page 2015

[7] [7]

Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming

Geoffrey Schiebinger, Jian Shu, Marcin Tabaka, Brian Cleary, Vidya Subramanian, Aryeh Solomon, Joshua Gould, Siyan Liu, Stacie Lin, Peter Berube, et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell, 176(4):928–943, 2019

work page 2019

[8] [8]

Computational optimal transport

Gabriel Peyr ´e and Marco Cuturi. Computational optimal transport. Foundations and Trends in Machine Learning, 11(5–6):355–607, 2019

work page 2019

[9] [9]

Optimal transport for diffeomorphic registration.International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 212–219, 2014

Nicolas Papadakis, Edoardo Provenzi, and Vicent Caselles. Optimal transport for diffeomorphic registration.International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 212–219, 2014

work page 2014

[10] [10]

Sinkhorn distances: Lightspeed computation of optimal transport

Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. InAdvances in Neural Information Processing Systems (NeurIPS), volume 26, 2013

work page 2013

[11] [11]

Fast decentralized optimal transport via ADMM

Xiangfeng Ye, Jian Li, Hongyuan Zhang, and Yinyu Wu. Fast decentralized optimal transport via ADMM. InProceedings of the International Conference on Machine Learning (ICML), pages 4024– 4033, 2017

work page 2017

[12] [12]

Decentralized distributed optimization with accelerated ADMM for Wasserstein barycenters.arXiv preprint arXiv:1806.07396, 2018

Pavel Dvurechensky, Alexander Gasnikov, and Alexey Kroshnin. Decentralized distributed optimization with accelerated ADMM for Wasserstein barycenters.arXiv preprint arXiv:1806.07396, 2018

work page arXiv 2018

[13] [13]

Decentralized optimal transport on general graphs

Alexander Rogozin, Valentina Bochko, Pavel Dvurechensky, and Alexander Gasnikov. Decentralized optimal transport on general graphs. InAdvances in Neural Information Processing Systems (NeurIPS), volume 34, pages 15960–15971, 2021

work page 2021

[14] [14]

Federated learning with Wasserstein barycenters

Matthew Staib, Sashank J Reddi, Satyen Kale, Sanjiv Kumar, and Suvrit Sra. Federated learning with Wasserstein barycenters. In International Conference on Learning Representations (ICLR), 2021

work page 2021

[15] [15]

Federated optimization of Wasserstein distributionally robust models.arXiv preprint arXiv:2102.05628, 2021

Ruidi Li and Yin Tat Lee. Federated optimization of Wasserstein distributionally robust models.arXiv preprint arXiv:2102.05628, 2021

work page arXiv 2021

[16] [16]

Event-triggered real-time scheduling of stabilizing control tasks.IEEE Transactions on Automatic Control, 52(9):1680– 1685, 2007

Paulo Tabuada. Event-triggered real-time scheduling of stabilizing control tasks.IEEE Transactions on Automatic Control, 52(9):1680– 1685, 2007

work page 2007

[17] [17]

Compressive wireless sensing.42nd Annual Conference on Information Sciences and Systems, pages 134–139, 2008

Tuncer Can Aysal and Kenneth E Barner. Compressive wireless sensing.42nd Annual Conference on Information Sciences and Systems, pages 134–139, 2008

work page 2008

[18] [18]

C., Pereira, L

Luis Caicedo Torres, Luiz Manella Pereira, and M Hadi Amini. A survey on optimal transport for machine learning: Theory and applications.arXiv preprint arXiv:2106.01963, 2021

work page arXiv 2021

[19] [19]

From word embeddings to document distances

Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. From word embeddings to document distances. InInternational Conference on Machine Learning, pages 957–966, 2015

work page 2015

[20] [20]

Wasserstein-barycenter consensus for cooperative multi- agent reinforcement learning.arXiv preprint arXiv:2506.12497, 2025

Ali Baheri. Wasserstein-barycenter consensus for cooperative multi- agent reinforcement learning.arXiv preprint arXiv:2506.12497, 2025

work page arXiv 2025

[21] [21]

Understanding reward ambiguity through optimal transport theory in inverse reinforcement learning.arXiv preprint arXiv:2310.12055, 2023

Ali Baheri. Understanding reward ambiguity through optimal transport theory in inverse reinforcement learning.arXiv preprint arXiv:2310.12055, 2023

work page arXiv 2023

[22] [22]

WA VE: Wasser- stein adaptive value estimation for actor-critic reinforcement learning

Ali Baheri, Zahra Sharooei, and Chirayu Salgarkar. WA VE: Wasser- stein adaptive value estimation for actor-critic reinforcement learning. Proceedings of Machine Learning Research vol, 283:1–12, 2025

work page 2025

[23] [23]

Risk-aware reinforcement learning through optimal trans- port theory.arXiv preprint arXiv:2309.06239, 2023

Ali Baheri. Risk-aware reinforcement learning through optimal trans- port theory.arXiv preprint arXiv:2309.06239, 2023

work page arXiv 2023

[24] [24]

Princeton University Press, 2016

Alfred Galichon.Optimal transport methods in economics. Princeton University Press, 2016

work page 2016

[25] [25]

Iterative bregman projections for regularized transportation problems.SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015

Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyr ´e. Iterative bregman projections for regularized transportation problems.SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015

work page 2015

[26] [26]

Fast computation of wasserstein barycenters

Marco Cuturi and Arnaud Doucet. Fast computation of wasserstein barycenters. InProceedings of the 31st International Conference on Machine Learning (ICML), volume 32 ofProceedings of Machine Learning Research, pages 685–693, 2014

work page 2014

[27] [27]

Stabilized sparse scaling algorithms for entropy regularized transport problems.SIAM Journal on Scientific Computing, 41(3):A1443–A1481, 2019

Bernhard Schmitzer. Stabilized sparse scaling algorithms for entropy regularized transport problems.SIAM Journal on Scientific Computing, 41(3):A1443–A1481, 2019

work page 2019

[28] [28]

Sliced and radon wasserstein barycenters of measures.Journal of Mathematical Imaging and Vision, 51(1):22–45, 2015

Nicolas Bonneel, Julien Rabin, Gabriel Peyr ´e, and Hanspeter Pfister. Sliced and radon wasserstein barycenters of measures.Journal of Mathematical Imaging and Vision, 51(1):22–45, 2015

work page 2015

[29] [29]

Stochastic wasserstein barycenters

Sebastian Claici, Edward Chien, and Justin Solomon. Stochastic wasserstein barycenters. InProceedings of the 35th International Conference on Machine Learning (ICML), volume 80 ofProceedings of Machine Learning Research, pages 999–1008, 2018

work page 2018

[30] [30]

Fast linear iterations for distributed averaging.Systems & Control Letters, 53(1):65–78, 2004

Lin Xiao and Stephen Boyd. Fast linear iterations for distributed averaging.Systems & Control Letters, 53(1):65–78, 2004

work page 2004

[31] [31]

Quantized consensus.Automatica, 43(7):1192–1203, 2007

Anand Kashyap, Tamer Bas ¸ar, and Rayadurgam Srikant. Quantized consensus.Automatica, 43(7):1192–1203, 2007

work page 2007

[32] [32]

A dual approach for optimal algorithms in distributed opti- mization over networks.IEEE Transactions on Automatic Control, 63(10):3265–3280, 2018

C ´esar A Uribe, Soomin Lee, Alexander Gasnikov, and Angelia Nedi´c. A dual approach for optimal algorithms in distributed opti- mization over networks.IEEE Transactions on Automatic Control, 63(10):3265–3280, 2018

work page 2018

[33] [33]

Cambridge University Press, 2012

Bas Lemmens and Roger Nussbaum.Nonlinear Perron-Frobenius Theory, volume 189. Cambridge University Press, 2012

work page 2012