Sliced-Regularized Optimal Transport

Khai Nguyen

arxiv: 2604.23944 · v3 · pith:WQSSGNYSnew · submitted 2026-04-27 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-21 08:59 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords optimal transport · sliced optimal transport · regularization · Sinkhorn algorithm · transport plan · color transfer · gradient flows · divergence

The pith

SROT regularizes optimal transport toward a sliced OT plan rather than an independent coupling, so that it approximates the exact plan more closely than entropic OT.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces sliced-regularized optimal transport (SROT) as a new regularized OT formulation. Instead of the independent coupling used in entropic OT, SROT regularizes the transport plan toward a smoothened sliced OT plan. This produces plans that are closer to the exact OT solution at the same regularization level and that also improve upon the sliced reference itself. The work supplies a dual formulation, a Sinkhorn-style algorithm that preserves scalability, and the induced SROT divergence along with its properties. Experiments on synthetic data, color transfer, and gradient flows confirm that SROT outperforms both entropic OT and plain sliced OT in approximating exact transport.

Core claim

SROT is obtained by adding to the classical OT objective a regularization term that measures divergence from a smoothened sliced OT plan. The formulation admits an explicit dual problem and is solved by a Sinkhorn-style iteration that retains the scalability of entropic OT. Under identical regularization strength the resulting plan lies closer to the exact OT plan than the entropic solution, and the plan also refines the sliced reference. The induced SROT divergence is shown to be a valid divergence with favorable topological and computational properties.
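The Sinkhorn-style iteration can be illustrated with a minimal NumPy sketch. It assumes the regularizer is a KL divergence to a strictly positive reference plan, in which case the usual Gibbs kernel is simply reweighted by the reference. The function name `sinkhorn_with_reference` and its parameters are hypothetical, and the paper's actual algorithm may differ in detail.

```python
import numpy as np

def sinkhorn_with_reference(C, a, b, ref, eps, n_iters=1000):
    """Approximately solve min <C, P> + eps * KL(P || ref) over couplings of (a, b).

    `ref` must be strictly positive (hypothetical helper, not the paper's code).
    """
    K = ref * np.exp(-C / eps)      # Gibbs kernel reweighted by the reference plan
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)             # rescale rows to match marginal a
        v = b / (K.T @ u)           # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
X, Y = rng.random((5, 2)), rng.random((4, 2))
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)   # squared Euclidean cost
a, b = np.full(5, 1 / 5), np.full(4, 1 / 4)

# With the independent coupling as reference, this is ordinary entropic OT.
plan = sinkhorn_with_reference(C, a, b, np.outer(a, b), eps=0.5)
print(plan.sum(axis=1), plan.sum(axis=0))
```

Each iteration costs two matrix-vector products whatever reference is used, which is one way to read the claim that the scalability of entropic OT is retained.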

What carries the argument

The SROT objective that penalizes deviation of the transport plan from a smoothened sliced OT plan instead of from the independent coupling.
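In symbols, and assuming the regularizer is a KL divergence (the notation here is illustrative and not taken from the paper), the entropic and sliced-regularized problems for discrete marginals differ only in the reference plan:

```latex
\begin{aligned}
\text{EOT:}\quad  & \min_{\pi \in \Pi(a,b)} \; \langle C, \pi \rangle
                    + \varepsilon\, \mathrm{KL}\big(\pi \,\|\, a \otimes b\big), \\
\text{SROT:}\quad & \min_{\pi \in \Pi(a,b)} \; \langle C, \pi \rangle
                    + \varepsilon\, \mathrm{KL}\big(\pi \,\|\, \pi_{\mathrm{ref}}\big),
\end{aligned}
```

where $\Pi(a,b)$ is the set of couplings with marginals $a$ and $b$, $C$ is the cost matrix, $\varepsilon > 0$ is the regularization strength, and $\pi_{\mathrm{ref}}$ stands for the smoothened sliced OT plan.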

If this is right

SROT can replace entropic OT in applications that need transport plans closer to the exact optimum while keeping the same computational cost.
The SROT divergence supplies a new way to compare probability measures that inherits scalability from sliced OT yet improves upon it.
Gradient flows driven by the SROT divergence can be expected to follow trajectories closer to those of the unregularized Wasserstein gradient flow.
Color-transfer tasks obtain mappings that better preserve the geometry of the source and target distributions than either entropic OT or plain sliced OT.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The post-Bayesian view of SROT may allow principled incorporation of other structured priors beyond sliced OT.
Because the method improves upon its own reference, iterative refinement that feeds the SROT plan back as the new sliced prior could be explored.
The topological properties of the SROT divergence suggest it could metrize weak convergence on compact domains with explicit rates.

Load-bearing premise

Regularizing the transport plan toward a smoothened sliced OT plan produces a strictly better reference than the independent coupling and does not introduce new biases that cancel the accuracy gains.
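To make the premise concrete, here is a rough sketch of how a smoothened sliced reference could be built. It is an assumption-laden illustration, not the paper's construction: it assumes uniform weights on equally many points, averages the one-dimensional sorted matchings over random directions, and smooths by mixing with the independent coupling using a hypothetical weight `tau`. The source does not specify the smoothing operator.

```python
import numpy as np

def sliced_reference_plan(X, Y, n_proj=100, tau=0.1, seed=0):
    """Average of 1D OT plans over random projections, then smoothed.

    Assumptions (not from the paper): uniform weights, X and Y have the same
    number of points, smoothing = convex mixture with the independent coupling.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    plan = np.zeros((n, n))
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)   # random direction on the unit sphere
        ix = np.argsort(X @ theta)       # order of source points along theta
        iy = np.argsort(Y @ theta)       # order of target points along theta
        plan[ix, iy] += 1.0 / n          # 1D OT: k-th smallest goes to k-th smallest
    plan /= n_proj
    independent = np.full((n, n), 1.0 / n**2)
    return (1.0 - tau) * plan + tau * independent   # strictly positive reference

rng = np.random.default_rng(1)
n = 8
X = rng.normal(size=(n, 3))
Y = rng.normal(size=(n, 3)) + 1.0
ref = sliced_reference_plan(X, Y)
print(ref.sum(axis=0), ref.sum(axis=1))
```

Because every term in the average is itself a coupling, the result keeps the prescribed marginals, and the mixture keeps every entry positive so that a KL penalty toward it stays finite.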

What would settle it

On any small discrete problem where the exact OT plan can be computed by linear programming, measure the Frobenius distance of the SROT plan to the exact plan and compare it to the distance achieved by entropic OT at the same regularization parameter.
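A minimal sketch of this check follows, under simplifying assumptions that are also flagged in the comments: uniform weights on equally many points (so the exact plan can be found by enumerating permutations, which stands in for the linear program here), a KL regularizer, and a hypothetical mixture-based smoothing of the sliced plan. All function names are illustrative. The script prints the two Frobenius distances and makes no claim about which is smaller.

```python
import itertools
import numpy as np

def exact_plan_uniform(C):
    """Exact OT plan for uniform weights and equal sizes, by brute force.

    In this setting the LP has an optimum at a scaled permutation matrix,
    so enumerating permutations is exact (only feasible for small n).
    """
    n = C.shape[0]
    best = min(itertools.permutations(range(n)),
               key=lambda p: C[np.arange(n), list(p)].sum())
    P = np.zeros((n, n))
    P[np.arange(n), list(best)] = 1.0 / n
    return P

def sliced_reference(X, Y, n_proj=100, tau=0.2, seed=0):
    """Averaged 1D sorted matchings, mixed with the independent coupling.

    The mixture smoothing and `tau` are assumptions, not the paper's operator.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    plan = np.zeros((n, n))
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        plan[np.argsort(X @ theta), np.argsort(Y @ theta)] += 1.0 / n
    return (1.0 - tau) * plan / n_proj + tau / n**2

def sinkhorn(C, a, b, ref, eps, n_iters=2000):
    """Sinkhorn scaling for min <C, P> + eps * KL(P || ref)."""
    K = ref * np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
n = 6
X = rng.normal(size=(n, 2))
Y = rng.normal(size=(n, 2)) + 1.0
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
C /= C.max()                                   # normalize costs to [0, 1]
a = b = np.full(n, 1.0 / n)
eps = 0.5                                      # same nominal strength for both

exact = exact_plan_uniform(C)
eot = sinkhorn(C, a, b, np.outer(a, b), eps)   # independent reference
srot = sinkhorn(C, a, b, sliced_reference(X, Y), eps)

dist_eot = np.linalg.norm(eot - exact)         # Frobenius norm for 2D arrays
dist_srot = np.linalg.norm(srot - exact)
print(f"eps={eps}: ||EOT - OT||_F = {dist_eot:.4f}, ||SROT - OT||_F = {dist_srot:.4f}")
```

Repeating the run over several values of `eps`, seeds, and problem sizes would give a more convincing picture than any single comparison.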

Figures

Figures reproduced from arXiv: 2604.23944 by Khai Nguyen.

**Figure 1.** (a) An example of SOT plan (adapting from [ …

**Figure 2.** Visualization of transportation plans from OT, SOT, EOT, and SROT for synthetic …

**Figure 3.** Ablation studies of varying the regularization strengths ( …

**Figure 4.** Color transfer results of OT, SOT, EOT, and SROT.

Adjacent text from §4.2 (Color Transfer): Color transfer is formulated as an OT problem by representing each image as a weighted point cloud in the normalized RGB space, [0, 1]^3. Each image is discretized into K = 256 colors via median-cut quantization without dithering, yielding palette centroids (atoms) and normalized bin frequencies (weights). We compare three couplings: (i) …

**Figure 5.** Gradient flows of Sinkhorn divergence and SR divergence with Wasserstein distance as neutral evaluation metric.

Adjacent text: … flow converge faster in the sense of Wasserstein distance, … From §5 (Conclusion): We propose sliced-regularized optimal transport (SROT), a framework that leverages a smoothened SOT plan as an informative prior to improve regularized OT. SROT retains the computational efficiency of EOT via a Sinkhorn-style…

**Figure 6.** Ablation study of varying the number of projections …

**Figure 7.** Computational speed measurement when varying the number of projections …

**Figure 8.** Gradient flows of Sinkhorn divergence and SR divergence with Wasserstein distance as neutral evaluation metric.

Adjacent text: … two-rings case. This may be due to the greater smoothness of the corresponding SOT plan. For both uniform and softmin initializations, SROT improves as L increases, even when the initial plans do not. Overall, we recommend uniform SOT as a default choice, while noting that it may not be optimal a…

Original abstract

We propose a new regularized optimal transport (OT) formulation, termed sliced-regularized optimal transport (SROT). Unlike entropic OT (EOT), which regularizes the transport plan toward an independent coupling, SROT regularizes it toward a smoothened sliced OT (SOT) plan. To the best of our knowledge, SROT is the first approach to leverage a version of SOT plan as a reference to improve classical OT. We provide a formal definition of SROT, derive its dual formulation, and provide a post-Bayesian interpretation of SROT. We then develop a Sinkhorn-style algorithm for efficient computation, retaining the same scalability advantages as EOT. By incorporating a scalable SOT plan as a prior, SROT yields more accurate approximations of the exact OT plan than EOT under the same level of regularization. Moreover, the resulting transport plan improves upon the reference SOT plan itself. We further introduce the corresponding OT divergence induced by SROT, named SROT divergence, and analyze its topological and computational properties. Finally, we validate our approach through experiments on synthetic datasets and color transfer tasks, demonstrating that SROT is better than both EOT and SOT in approximating exact OT. Additional experiments on gradient flows further highlight the advantages of SROT divergence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes sliced-regularized optimal transport (SROT), which regularizes the transport plan toward a smoothened sliced OT (SOT) plan rather than the independent coupling used in entropic OT (EOT). It provides a formal definition, derives the dual formulation with a post-Bayesian interpretation, develops a Sinkhorn-style algorithm, introduces the induced SROT divergence and analyzes its properties, and validates the method on synthetic datasets, color transfer tasks, and gradient flows, claiming that SROT approximates the exact OT plan more accurately than EOT at the same regularization level and improves upon the reference SOT plan.

Significance. If the accuracy improvements are confirmed under controlled comparisons, SROT could provide a practical, scalable regularized OT variant that leverages the computational advantages of sliced OT while achieving higher fidelity to the unregularized optimum. The dual derivation, Sinkhorn-style solver, and topological analysis of the induced divergence are positive elements that support potential adoption in ML applications involving transport plans.

major comments (2)

[Abstract and §6] Abstract and §6 (Experiments): The central claim that SROT yields more accurate approximations of the exact OT plan than EOT 'under the same level of regularization' requires explicit verification that effective regularization strength is matched. Because the SOT reference is already an approximation to OT, the same nominal ε may induce different KL-to-reference values or marginal violations than in EOT; the manuscript should report these quantities (or equivalent bias measures) for both methods across the tested ε values to substantiate that reported gains are not due to weaker effective regularization in SROT.
[§3] §3 (Formal definition): The smoothing operation applied to the SOT plan to obtain the reference distribution is not fully specified with respect to additional hyperparameters or their impact on the overall regularization; if any smoothing bandwidth is chosen separately from ε, this should be stated explicitly and its sensitivity analyzed, as it could affect the claimed parameter-free character relative to EOT.
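The two diagnostics requested in the first major comment are straightforward to compute for discrete plans. The sketch below is one possible implementation. The helper names `kl_to_reference` and `marginal_violation` are hypothetical, and the L1 norm for marginal errors is an assumption, since the report does not fix a specific measure.

```python
import numpy as np

def kl_to_reference(plan, ref):
    """KL(plan || ref) for two plans of equal total mass; `ref` must be positive
    wherever `plan` is positive. Zero entries of `plan` contribute nothing."""
    mask = plan > 0
    return float(np.sum(plan[mask] * np.log(plan[mask] / ref[mask])))

def marginal_violation(plan, a, b):
    """Sum of L1 errors of the row and column marginals against (a, b)."""
    row_err = np.abs(plan.sum(axis=1) - a).sum()
    col_err = np.abs(plan.sum(axis=0) - b).sum()
    return float(row_err + col_err)

# Toy example: a deterministic matching compared with the independent coupling.
n = 4
a = b = np.full(n, 1.0 / n)
plan = np.eye(n) / n
ref = np.outer(a, b)

kl = kl_to_reference(plan, ref)           # equals log(4) for this example
viol = marginal_violation(plan, a, b)     # the identity matching is feasible
print(kl, viol)
```

Reporting both numbers for EOT (against the independent coupling) and for SROT (against its sliced reference) at each tested ε would show whether the two methods are being compared at similar effective regularization.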

minor comments (3)

[§3] Notation for the smoothed SOT reference and the SROT plan should be introduced with a clear table or diagram in the definition section to avoid ambiguity when comparing to EOT.
[§6] In the color transfer and gradient flow experiments, include quantitative tables with standard deviations over multiple runs rather than relying solely on qualitative visuals.
[§4] The dual derivation in §4 would benefit from an explicit statement of the optimality conditions linking the dual variables to the transport plan.
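For orientation, the optimality conditions requested in the last comment take the following standard form when the regularizer is a KL divergence to a reference plan of unit mass. This is a textbook derivation under that assumption, with illustrative notation, and is not copied from the paper:

```latex
\begin{aligned}
\text{Dual:}\quad
  & \max_{f,\,g}\; \langle f, a \rangle + \langle g, b \rangle
    - \varepsilon \sum_{i,j} \pi_{\mathrm{ref},ij}
      \exp\!\Big(\tfrac{f_i + g_j - C_{ij}}{\varepsilon}\Big) + \varepsilon, \\
\text{Primal-dual link:}\quad
  & \pi^{\star}_{ij} = \pi_{\mathrm{ref},ij}
      \exp\!\Big(\tfrac{f^{\star}_i + g^{\star}_j - C_{ij}}{\varepsilon}\Big), \\
\text{Marginal conditions:}\quad
  & \sum_{j} \pi^{\star}_{ij} = a_i, \qquad \sum_{i} \pi^{\star}_{ij} = b_j .
\end{aligned}
```

Setting $u_i = \exp(f_i/\varepsilon)$ and $v_j = \exp(g_j/\varepsilon)$ turns the marginal conditions into the alternating row and column rescalings of a Sinkhorn-style iteration on the kernel $\pi_{\mathrm{ref},ij}\exp(-C_{ij}/\varepsilon)$.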

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Referee: [Abstract and §6] Abstract and §6 (Experiments): The central claim that SROT yields more accurate approximations of the exact OT plan than EOT 'under the same level of regularization' requires explicit verification that effective regularization strength is matched. Because the SOT reference is already an approximation to OT, the same nominal ε may induce different KL-to-reference values or marginal violations than in EOT; the manuscript should report these quantities (or equivalent bias measures) for both methods across the tested ε values to substantiate that reported gains are not due to weaker effective regularization in SROT.

Authors: We agree that comparing at the same nominal value of the regularization parameter ε does not automatically guarantee matched effective regularization strength, since the references differ (independent coupling versus smoothed SOT plan). In the revised manuscript we will augment §6 with explicit reporting of the KL divergence to the respective reference and the marginal constraint violation errors for both SROT and EOT, evaluated across the full range of tested ε values. These additional bias measures will allow readers to verify that the reported accuracy gains are not attributable to weaker effective regularization in SROT.

revision: yes

Referee: [§3] §3 (Formal definition): The smoothing operation applied to the SOT plan to obtain the reference distribution is not fully specified with respect to additional hyperparameters or their impact on the overall regularization; if any smoothing bandwidth is chosen separately from ε, this should be stated explicitly and its sensitivity analyzed, as it could affect the claimed parameter-free character relative to EOT.

Authors: We acknowledge that the precise smoothing procedure applied to the SOT plan, including any bandwidth parameter and its relation to ε, was not stated with sufficient explicitness in §3. In the revision we will provide the exact mathematical definition of the smoothing operator, clarify the rule used to select any bandwidth (including whether it is tied to ε), and add a short sensitivity study with respect to that bandwidth in the experimental section. This will make the regularization fully transparent and allow direct comparison with the parameter-free nature of EOT.

revision: yes

Circularity Check

0 steps flagged

SROT formulation and dual derivation are self-contained without reduction to inputs by construction

full rationale

The paper introduces SROT by defining regularization toward a smoothed SOT plan as an explicit modeling choice distinct from EOT's independent coupling, then derives the dual formulation and Sinkhorn-style solver from this definition. Accuracy claims relative to exact OT and improvements over the SOT reference are presented as outcomes of the new prior and validated experimentally rather than forced by the equations themselves. No self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations appear in the derivation chain; the central formulation adds independent content and remains falsifiable against external OT benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The proposal rests on standard optimal transport duality and the existence of a well-defined sliced OT plan; the key modeling choice is the new regularization target, which is introduced without additional free parameters beyond the usual regularization strength.

free parameters (1)

regularization strength
Controls the trade-off between fidelity to the SOT reference and smoothness; chosen per experiment but not fitted to the final accuracy metric in the abstract description.

axioms (2)

standard math: Existence and uniqueness properties of the dual formulation for the proposed regularized objective
Invoked when deriving the dual of SROT, standard in convex OT theory.
domain assumption: The sliced OT plan can be meaningfully smoothened while remaining a valid coupling
Required for the reference distribution in the regularization term.

pith-pipeline@v0.9.0 · 5745 in / 1448 out tokens · 54454 ms · 2026-05-21T08:59:41.666934+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ASAP: Amortized Doubly-Stochastic Attention via Sliced Dual Projection
cs.LG 2026-05 conditional novelty 7.0

ASAP amortizes Sinkhorn-based doubly-stochastic attention by learning a parametric map from 1D potentials to the Sinkhorn dual and reconstructing the plan via two-sided entropic c-transform, delivering 5.3x faster inf...

[1] [1]

Altschuler, J

J. Altschuler, J. Niles-Weed, and P. Rigollet. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. InAdvances in Neural Information Processing Systems, pages 1964–1974, 2017. (Cited on pages 2 and 8.)

work page 1964

[2] [2]

Arjovsky, S

M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223, 2017. (Cited on page 1.)

work page 2017

[3] [3]

Benamou, G

J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré. Iterative Bregman projections for regularized transportation problems.SIAM Journal on Scientific Computing, 37(2):A1111– A1138, 2015. (Cited on pages 1, 7, and 14.)

work page 2015

[4] [4]

Bernton, P

E. Bernton, P. E. Jacob, M. Gerber, and C. P. Robert. Approximate Bayesian computation with the Wasserstein distance.Journal of the Royal Statistical Society Series B: Statistical Methodology, 81(2):235–269, 2019. (Cited on page 1.)

work page 2019

[5] [5]

Bernton, P

E. Bernton, P. E. Jacob, M. Gerber, and C. P. Robert. On parameter estimation with the Wasserstein distance.Information and Inference: A Journal of the IMA, 8(4):657–676, 2019. (Cited on page 1.)

work page 2019

[6] [6]

P. G. Bissiri, C. C. Holmes, and S. G. Walker. A general framework for updating belief distribu- tions.Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(5):1103–1130,

work page

[7] [7]

Bonet, L

C. Bonet, L. Drumetz, and N. Courty. Sliced-Wasserstein distances and flows on Cartan- Hadamard manifolds.Journal of Machine Learning Research, 26(32):1–76, 2025. (Cited on page 4.)

work page 2025

[8] [8]

Bonneel, J

N. Bonneel, J. Rabin, G. Peyré, and H. Pfister. Sliced and Radon Wasserstein barycenters of measures.Journal of Mathematical Imaging and Vision, 1(51):22–45, 2015. (Cited on page 4.)

work page 2015

[9] [9]

L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming.USSR Computational Mathematics and Mathematical Physics, 7(3):200–217, 1967. (Cited on page 15.)

work page 1967

[10] [10]

Bunne, S

C. Bunne, S. G. Stark, G. Gut, J. S. Del Castillo, M. Levesque, K.-V. Lehmann, L. Pelkmans, A. Krause, and G. Rätsch. Learning single-cell perturbation responses using neural optimal transport.Nature methods, 20(11):1759–1768, 2023. (Cited on page 1.)

work page 2023

[11] [11]

Catalano, H

M. Catalano, H. Lavenant, A. Lijoi, and I. Prünster. A Wasserstein index of dependence for random measures.Journal of the American Statistical Association, 119(547):2396–2406, 2024. (Cited on page 1.)

work page 2024

[12] M. Catalano, A. Lijoi, and I. Prünster. Measuring dependence in the Wasserstein distance for Bayesian nonparametric models. The Annals of Statistics, 49(5):2916–2947, 2021. (Cited on page 1.)

[13] L. Chapel, R. Tavenard, and S. Vaiter. Differentiable generalized sliced Wasserstein plans. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. (Cited on pages 2 and 4.)

[14] L. Chizat, G. Peyré, B. Schmitzer, and F.-X. Vialard. Scaling algorithms for unbalanced optimal transport problems. Mathematics of Computation, 87(314):2563–2609, 2018. (Cited on page 2.)

[15] N. Courty, R. Flamary, A. Habrard, and A. Rakotomamonjy. Joint distribution optimal transportation for domain adaptation. In Advances in Neural Information Processing Systems, pages 3730–3739, 2017. (Cited on page 1.)

[16] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., 2013. (Cited on pages 1, 2, and 3.)

[17] B. B. Damodaran, B. Kellenberger, R. Flamary, D. Tuia, and N. Courty. Deepjdot: Deep joint distribution optimal transport for unsupervised domain adaptation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 447–463, 2018. (Cited on page 1.)

[18] J. Feydy, B. Charlier, F.-X. Vialard, and G. Peyré. Optimal transport for diffeomorphic registration. In Medical Image Computing and Computer Assisted Intervention - MICCAI 2017: 20th International Conference, Quebec City, QC, Canada, September 11-13, 2017, Proceedings, Part I 20, pages 291–299. Springer, 2017. (Cited on page 1.)

[19] J. Feydy, T. Séjourné, F.-X. Vialard, S.-i. Amari, A. Trouve, and G. Peyré. Interpolating between optimal transport and mmd using sinkhorn divergences. In K. Chaudhuri and M. Sugiyama, editors, Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 268...

[20] P. Freulon, N. Georgakis, and V. Panaretos. Entropic optimal transport beyond product reference couplings: the Gaussian case on Euclidean space. arXiv preprint arXiv:2507.01709,

[21] A. Genevay, L. Chizat, F. Bach, M. Cuturi, and G. Peyré. Sample complexity of sinkhorn divergences. In The 22nd international conference on artificial intelligence and statistics, pages 1574–1583. PMLR, 2019. (Cited on page 1.)

[22] A. Genevay, M. Cuturi, G. Peyré, and F. Bach. Stochastic optimization for large-scale optimal transport. Advances in neural information processing systems, 29, 2016. (Cited on page 7.)

[23] A. Genevay, G. Peyré, and M. Cuturi. Learning generative models with Sinkhorn divergences. In International Conference on Artificial Intelligence and Statistics, pages 1608–1617. PMLR, (Cited on pages 1, 3, and 8.)

[25] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. The Journal of Machine Learning Research, 13(1):723–773, 2012. (Cited on page 8.)

[26] P. He, O. Khangaonkar, H. Pirsiavash, Y. Bai, and S. Kolouri. Sinkhorn-drifting generative models. arXiv preprint arXiv:2603.12366, 2026. (Cited on page 1.)

[27] S. Kolouri, K. Nadjahi, U. Simsekli, R. Badeau, and G. Rohde. Generalized sliced Wasserstein distances. In Advances in Neural Information Processing Systems, pages 261–272, 2019. (Cited on page 4.)

[28] S. Kolouri, S. R. Park, M. Thorpe, D. Slepcev, and G. K. Rohde. Optimal mass transport: Signal processing and machine-learning applications. IEEE signal processing magazine, 34(4):43–59,

[29] Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023. (Cited on page 1.)

[30] T. Liu, J. Puigcerver, and M. Blondel. Sparsity-constrained optimal transport. In The Eleventh International Conference on Learning Representations, 2023. (Cited on page 2.)

[31] X. Liu, R. D. Martin, Y. Bai, A. Shahbazi, M. Thorpe, A. Aldroubi, and S. Kolouri. Expected sliced transport plans. In The Thirteenth International Conference on Learning Representations, (Cited on pages 2 and 4.)

[33] D. A. Lorenz, P. Manns, and C. Meyer. Quadratically regularized optimal transport. Applied Mathematics & Optimization, 83(3):1919–1949, 2021. (Cited on pages 1 and 2.)

[34] G. Mahey, L. Chapel, G. Gasso, C. Bonet, and N. Courty. Fast optimal transport through sliced Wasserstein generalized geodesics. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 35350–35385, 2023. (Cited on pages 2 and 4.)

[35] T. Manole, S. Balakrishnan, J. Niles-Weed, and L. Wasserman. Plugin estimation of smooth optimal transport maps. The Annals of Statistics, 52(3):966–998, 2024. (Cited on page 1.)

[36] B. Muzellec and M. Cuturi. Subspace detours: Building transport plans that are optimal on subspace projections. In Advances in Neural Information Processing Systems, pages 6917–6928,

[37] K. Nguyen. An introduction to sliced optimal transport: foundations, advances, extensions, and applications. Foundations and Trends® in Computer Graphics and Vision, 17(3-4):171–391, (Cited on pages 2 and 6.)

[39] K. Nguyen and P. Mueller. Summarizing nonparametric Bayesian mixture posteriors–sliced optimal transport metrics for Gaussian mixtures. Journal of Computational and Graphical Statistics, (just-accepted):1–22, 2026. (Cited on page 4.)

[40] K. Nguyen, H. Nguyen, and N. Ho. Fast estimation of Wasserstein distances via regression on sliced Wasserstein distances. In The Fourteenth International Conference on Learning Representations, 2026. (Cited on page 2.)

[41] K. Nguyen, Y. Ni, and P. Mueller. Vertical consensus inference for high-dimensional random partition. arXiv preprint arXiv:2603.27864, 2026. (Cited on page 1.)

[42] G. Peyré, M. Cuturi, et al. Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019. (Cited on page 2.)

[43] A.-A. Pooladian, H. Ben-Hamu, C. Domingo-Enrich, B. Amos, Y. Lipman, and R. T. Chen. Multisample flow matching: Straightening flows with minibatch couplings. In International Conference on Machine Learning, pages 28100–28127. PMLR, 2023. (Cited on page 1.)

[44] J. Rabin, G. Peyré, J. Delon, and M. Bernot. Wasserstein barycenter and its application to texture mixing. In Scale Space and Variational Methods in Computer Vision: Third International Conference, SSVM 2011, Ein-Gedi, Israel, May 29–June 2, 2011, Revised Selected Papers 3, pages 435–446. Springer, One New York Plaza, Suite 4600, New York, NY 10004-1562, 2...

[45] P. Rigollet and A. J. Stromme. On the sample complexity of entropic optimal transport. The Annals of Statistics, 53(1):61–90, 2025. (Cited on page 1.)

[46] H. E. Robbins. An empirical Bayes approach to statistics. In Breakthroughs in Statistics: Foundations and Basic Theory, pages 388–394. Springer, 1992. (Cited on page 5.)

[47] M. Rowland, J. Hron, Y. Tang, K. Choromanski, T. Sarlos, and A. Weller. Orthogonal estimation of Wasserstein distances. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 186–195. PMLR, 2019. (Cited on page 4.)

[48] F. Santambrogio. Optimal transport for applied mathematicians. Birkhäuser, NY, 55(58-63):94,

[49] M. Scetbon and M. Cuturi. Low-rank optimal transport: Approximation, statistics and debiasing. Advances in Neural Information Processing Systems, 35:6802–6814, 2022. (Cited on page 2.)

[50] G. Schiebinger, J. Shu, M. Tabaka, B. Cleary, V. Subramanian, A. Solomon, J. Gould, S. Liu, S. Lin, P. Berube, et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell, 176(4):928–943, 2019. (Cited on page 1.)

[51] R. Sinkhorn. Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4):402–405, 1967. (Cited on page 7.)

[52] R. Sinkhorn and P. Knopp. Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, 21(2):343–348, 1967. (Cited on page 1.)

[53] J. Solomon, F. De Goes, G. Peyré, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas. Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains. ACM Transactions on Graphics (ToG), 34(4):1–11, 2015. (Cited on page 1.)

[54] J. Solomon, G. Peyré, V. G. Kim, and S. Sra. Entropic metric alignment for correspondence problems. ACM Transactions on Graphics (TOG), 35(4):72, 2016. (Cited on page 1.)

[55] E. Tanguy, L. Chapel, and J. Delon. Sliced optimal transport plans. arXiv preprint arXiv:2508.01243, 2025. (Cited on pages 2 and 4.)

[56] J. Thornton and M. Cuturi. Rethinking initialization of the Sinkhorn algorithm. In International Conference on Artificial Intelligence and Statistics, pages 8682–8698. PMLR, 2023. (Cited on page 2.)

[57] A. Tong, K. Fatras, N. Malkin, G. Huguet, Y. Zhang, J. Rector-Brooks, G. Wolf, and Y. Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport. Transactions on Machine Learning Research, 2024. Expert Certification. (Cited on page 1.)

[58] M.-P. Truong and K. Nguyen. Amortized optimal transport from sliced potentials. arXiv preprint arXiv:2604.15114, 2026. (Cited on page 2.)

[59] C. Villani. Topics in optimal transportation. Number 58. American Mathematical Soc., 2003. (Cited on page 1.)

[60] C. Villani. Optimal transport: old and new, volume 338. Springer, One New York Plaza, Suite 4600, New York, NY 10004-1562, 2009. (Cited on pages 1 and 3.)