Implementation of batched Sinkhorn iterations for entropy-regularized Wasserstein loss
Pith reviewed 2026-05-25 11:44 UTC · model grok-4.3
The pith
A PyTorch implementation computes entropy-regularized Wasserstein loss via batched Sinkhorn iterations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The report reviews the calculation of entropy-regularised Wasserstein loss introduced by Cuturi and documents a practical implementation in PyTorch.
What carries the argument
Batched Sinkhorn iterations that solve the entropy-regularized optimal transport problem between pairs of distributions.
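For orientation, a minimal sketch of what batched Sinkhorn iterations can look like in PyTorch. This is not the notebook's code. The function name `batched_sinkhorn_loss`, its parameters, and the choice of a single cost matrix shared across the batch are assumptions made for illustration.

```python
import torch

def batched_sinkhorn_loss(a, b, cost, eps=0.1, n_iters=200):
    """Entropy-regularized transport cost <P, C> for a batch of histogram pairs.

    a:    (B, n) source histograms, each row sums to 1
    b:    (B, m) target histograms, each row sums to 1
    cost: (n, m) ground-cost matrix, shared by every pair (an assumption)
    """
    K = torch.exp(-cost / eps)            # Gibbs kernel, shape (n, m)
    v = torch.ones_like(b)
    for _ in range(n_iters):
        u = a / (v @ K.T)                 # u = a / (K v), shape (B, n)
        v = b / (u @ K)                   # v = b / (K^T u), shape (B, m)
    # Transport plans diag(u) K diag(v), one per pair: shape (B, n, m)
    plan = u.unsqueeze(2) * K.unsqueeze(0) * v.unsqueeze(1)
    loss = (plan * cost).sum(dim=(1, 2))  # one loss value per pair
    return loss, plan

# Hypothetical toy data: 4 pairs of histograms on 5 grid points in [0, 1]
torch.manual_seed(0)
x = torch.linspace(0, 1, 5, dtype=torch.float64)
cost = (x[:, None] - x[None, :]) ** 2
a = torch.softmax(torch.randn(4, 5, dtype=torch.float64), dim=1)
b = torch.softmax(torch.randn(4, 5, dtype=torch.float64), dim=1)
loss, plan = batched_sinkhorn_loss(a, b, cost)
```

The whole batch is handled by two matrix products per iteration, which is where the saving over a per-pair Python loop would come from. The updates run in the linear domain, so the sketch is only reasonable when `cost / eps` stays moderate.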
If this is right
- Multiple sample pairs can be processed in a single forward pass, reducing overhead in training loops.
- The loss becomes available as a drop-in component inside existing PyTorch models for tasks such as generative modeling.
- Users obtain a concrete reference point for verifying custom re-implementations of the same regularized distance.
- The code supports direct experimentation with the entropy regularization parameter on real data.
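The drop-in and experimentation points above can be illustrated with a short sketch. It assumes a linear-domain batched loss written for this example (`sinkhorn_loss` is a hypothetical name, not the notebook's API) and uses random logits as a stand-in for model outputs. Gradients are obtained by letting autograd unroll the iterations.

```python
import torch

def sinkhorn_loss(a, b, cost, eps, n_iters=100):
    """Batched linear-domain Sinkhorn loss, shape (B,). Illustrative only."""
    K = torch.exp(-cost / eps)
    v = torch.ones_like(b)
    for _ in range(n_iters):
        u = a / (v @ K.T)
        v = b / (u @ K)
    plan = u.unsqueeze(2) * K.unsqueeze(0) * v.unsqueeze(1)
    return (plan * cost).sum(dim=(1, 2))

torch.manual_seed(0)
x = torch.linspace(0, 1, 6, dtype=torch.float64)
cost = (x[:, None] - x[None, :]) ** 2
# Stand-in for model outputs; a real model would produce these logits
logits = torch.randn(3, 6, dtype=torch.float64, requires_grad=True)
target = torch.softmax(torch.randn(3, 6, dtype=torch.float64), dim=1)

results = {}
for eps in (0.05, 0.1, 0.5):          # sweep the regularization strength
    logits.grad = None                # clear gradients from the previous run
    pred = torch.softmax(logits, dim=1)
    loss = sinkhorn_loss(pred, target, cost, eps).mean()
    loss.backward()                   # autograd differentiates through the loop
    results[eps] = (loss.item(), logits.grad.norm().item())
```

Each entry of `results` holds the mean loss and the gradient norm for one value of `eps`, which is enough to see how the regularization strength changes both the objective and the training signal.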
Where Pith is reading between the lines
- The notebook could serve as a starting template for porting the same algorithm to other automatic-differentiation frameworks.
- Once integrated, the loss enables direct comparisons of transport-based objectives against standard divergence measures on the same datasets.
- The implementation invites tests on whether the batched version preserves the same convergence behavior as the scalar version for large batch sizes.
Load-bearing premise
The original Sinkhorn iterations translate directly into stable batched PyTorch code without further numerical safeguards.
What would settle it
Executing the notebook on standard uniform distributions and checking whether the returned loss values match those from an independent reference implementation of the same algorithm.
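A check of this kind could be sketched as follows. Both functions below are written for this example and are assumptions: `sinkhorn_batched` stands in for the implementation under test, and `sinkhorn_reference` is a deliberately simple single-pair version with explicit Python loops, used here in place of a genuinely independent library.

```python
import math
import torch

def sinkhorn_batched(a, b, cost, eps, n_iters):
    """Batched linear-domain Sinkhorn; returns one loss per pair, shape (B,)."""
    K = torch.exp(-cost / eps)
    v = torch.ones_like(b)
    for _ in range(n_iters):
        u = a / (v @ K.T)
        v = b / (u @ K)
    plan = u.unsqueeze(2) * K.unsqueeze(0) * v.unsqueeze(1)
    return (plan * cost).sum(dim=(1, 2))

def sinkhorn_reference(a, b, cost, eps, n_iters):
    """Single-pair reference on plain Python lists, no tensors involved."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    v = [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

n, m, batch = 4, 6, 3
x = torch.linspace(0, 1, n, dtype=torch.float64)
y = torch.linspace(0, 1, m, dtype=torch.float64)
cost = (x[:, None] - y[None, :]) ** 2
a = torch.full((batch, n), 1.0 / n, dtype=torch.float64)  # uniform sources
b = torch.full((batch, m), 1.0 / m, dtype=torch.float64)  # uniform targets

batched = sinkhorn_batched(a, b, cost, eps=0.1, n_iters=100)
reference = [
    sinkhorn_reference(a[k].tolist(), b[k].tolist(), cost.tolist(), 0.1, 100)
    for k in range(batch)
]
# Largest disagreement between the batched and the per-pair result
max_gap = max(abs(batched[k].item() - reference[k]) for k in range(batch))
```

With the same `eps` and iteration count on both sides, `max_gap` should sit at floating-point rounding level. A larger gap would point to a batching bug such as a transposed kernel or a wrong broadcast axis.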
Original abstract
In this report, we review the calculation of entropy-regularised Wasserstein loss introduced by Cuturi and document a practical implementation in PyTorch. Code is available at https://github.com/t-vi/pytorch-tvmisc/blob/master/wasserstein-distance/Pytorch_Wasserstein.ipynb
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reviews the entropy-regularized Wasserstein loss introduced by Cuturi and documents a practical PyTorch implementation of batched Sinkhorn iterations, with code provided in an accompanying notebook.
Significance. A correct and numerically stable batched implementation would be useful for PyTorch users applying optimal transport losses in machine learning, as it directly translates a known algorithm into a common framework. The public code link is a strength for reproducibility.
Major comments (1)
- [Implementation section / notebook] The implementation description provides no indication of log-domain stabilization (e.g., log-sum-exp) or other guards against overflow/underflow in the u/v scaling vector updates. This is load-bearing for the central claim of a practical implementation, as standard primal Sinkhorn iterations are known to be unstable for small epsilon or large dynamic range in the cost matrix (see Cuturi 2013, §3).
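To make the referee's point concrete, here is a hedged sketch of a log-domain variant that iterates dual potentials `f` and `g` (with `u = exp(f / eps)` and `v = exp(g / eps)`) using `torch.logsumexp`. It is an illustration under stated assumptions, not the notebook's code: the name `batched_sinkhorn_log` is hypothetical, the cost matrix is shared across the batch, and all histogram entries are assumed strictly positive so that their logarithms are finite.

```python
import torch

def batched_sinkhorn_log(a, b, cost, eps=0.1, n_iters=200):
    """Log-domain Sinkhorn for a batch of pairs; assumes a > 0 and b > 0."""
    log_a, log_b = a.log(), b.log()
    C = cost.unsqueeze(0)                 # (1, n, m), broadcast over the batch
    g = torch.zeros_like(b)               # dual potential for targets, (B, m)
    for _ in range(n_iters):
        # f_i = eps * (log a_i - logsumexp_j((g_j - C_ij) / eps))
        f = eps * (log_a - torch.logsumexp((g.unsqueeze(1) - C) / eps, dim=2))
        # g_j = eps * (log b_j - logsumexp_i((f_i - C_ij) / eps))
        g = eps * (log_b - torch.logsumexp((f.unsqueeze(2) - C) / eps, dim=1))
    plan = ((f.unsqueeze(2) + g.unsqueeze(1) - C) / eps).exp()  # (B, n, m)
    return (plan * cost).sum(dim=(1, 2)), plan

# Hypothetical stress case: disjoint supports and a small eps
torch.manual_seed(0)
x = torch.linspace(0, 1, 5, dtype=torch.float64)
y = torch.linspace(2, 3, 5, dtype=torch.float64)
cost = (x[:, None] - y[None, :]) ** 2     # every entry lies in [1, 9]
eps = 1e-3
# Every entry of the Gibbs kernel underflows to 0.0 in float64 here, so the
# linear-domain update u = a / (K v) would divide by zero.
kernel_max = torch.exp(-cost / eps).max().item()
a = torch.full((2, 5), 0.2, dtype=torch.float64)
b = torch.softmax(torch.randn(2, 5, dtype=torch.float64), dim=1)
loss, plan = batched_sinkhorn_log(a, b, cost, eps=eps)
```

The log-domain updates never form `exp(-cost / eps)` on its own, so the loss stays finite in a regime where the kernel is identically zero. The price is a `(B, n, m)` intermediate per iteration instead of two matrix products.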
Simulated Author's Rebuttal
We thank the referee for their review and for highlighting an important aspect of numerical stability in the Sinkhorn algorithm. We address the single major comment below.
Point-by-point responses
- Referee: [Implementation section / notebook] The implementation description provides no indication of log-domain stabilization (e.g., log-sum-exp) or other guards against overflow/underflow in the u/v scaling vector updates. This is load-bearing for the central claim of a practical implementation, as standard primal Sinkhorn iterations are known to be unstable for small epsilon or large dynamic range in the cost matrix (see Cuturi 2013, §3).
Authors: We agree that the absence of any discussion of numerical stabilization weakens the manuscript's claim of documenting a 'practical' implementation. The notebook code performs the standard primal updates in the linear domain without explicit log-sum-exp or other guards, which can indeed lead to overflow for small epsilon. We will revise the manuscript to add a short subsection on numerical considerations (including the known limitations of the provided code and references to stabilized variants) and will update the notebook with an optional log-domain path. This constitutes a major revision to the text and code.
Revision: yes
Circularity Check
No circularity: direct implementation of external prior work (Cuturi)
Full rationale
The paper is an implementation report that reviews the entropy-regularized Wasserstein loss from Cuturi (external citation) and provides PyTorch code for batched Sinkhorn iterations. No new derivations, fitted parameters, self-citations, or ansatzes are introduced. The central content is a translation of published prior work into code, with no load-bearing steps that reduce to the paper's own inputs by construction. This matches the default non-circular case for implementation notes.
Reference graph
Works this paper leans on
- [1] M. Arjovsky et al., Wasserstein GAN, arXiv 1701.07875
- [2] M. Cuturi, Sinkhorn Distances: Lightspeed Computation of Optimal Transport, NIPS 2013
- [3] D. Daza, Approximating Wasserstein distances with PyTorch, blog entry at https://dfdazac.github.io/sinkhorn.html, 2019
- [4] G. Peyré and M. Cuturi, Computational Optimal Transport, arXiv 1803.00567 (v3)
- [5] J. Franklin and J. Lorenz, On the Scaling of Multidimensional Matrices, Linear Algebra and its Applications, 114/115 (1989)
- [6] C. Frogner et al., Learning with a Wasserstein Loss, NIPS 2015
- [7] I. Gulrajani et al., Improved Training of Wasserstein GANs, NIPS 2017
- [8] G. Luise et al., Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance, NeurIPS 2018
- [9] T. Miyato et al., Spectral Normalization for Generative Adversarial Networks, ICLR 2018
- [10] Y. Rubner et al., The Earth Mover’s Distance, MultiDimensional Scaling, and Color-Based Image Retrieval, Proceedings of the ARPA Image Understanding Workshop, 1997
- [11] B. Schmitzer, Stabilized sparse scaling algorithms for entropy regularized transport problems, arXiv 1610.06519
- [12] T. Viehmann, Batch Sinkhorn Iteration Wasserstein Distance, PyTorch code and notebook, 2017, https://github.com/t-vi/pytorch-tvmisc/blob/ae4d94597751f98d4a0d7b10188dd02c13a0c6fd/wasserstein-distance/Pytorch_Wasserstein.ipynb
- [13] C. Villani, Optimal Transport - Old and New, Springer, 2009