pith. sign in

arxiv: 2604.12996 · v1 · submitted 2026-04-14 · 🧮 math.ST · stat.TH

Equivalence of optimal transport problems to regularization on the family of f-divergences

Pith reviewed 2026-05-10 13:48 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords optimal transportf-divergenceregularizationequivalencePolish spacesbounded costunique minimizer
0
0 comments X

The pith

An optimal transport problem regularized by one f-divergence shares the same unique minimizer as another regularized by a different g-divergence after transforming the cost function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proves that an optimal transport problem regularized by a given f-divergence admits the same solution as another optimal transport problem regularized by a different g-divergence when the cost function is appropriately transformed. The equivalence is established in Polish spaces with bounded cost functions, where the minimizer is assumed unique. A reader would care because it shows that the choice of regularizer from the f-divergence family is interchangeable up to cost adjustment rather than producing fundamentally different outcomes. This unifies a range of regularized transport formulations that might otherwise appear distinct.

Core claim

An OT problem regularized by a given f-divergence admits the same solution as another OT problem regularized by a different g-divergence, under an appropriate transformation of the cost function. This structural equivalence between OT problems regularized by distinct divergences, in the sense of sharing the same unique minimizer, is demonstrated within the framework of Polish spaces with bounded cost functions.

What carries the argument

The cost function transformation that equates the regularized problems so they share the same unique minimizer.

Load-bearing premise

The minimizer is unique and the cost functions are bounded on Polish spaces.

What would settle it

Construct a concrete Polish space example with bounded costs where the transformed cost still produces distinct minimizers for the two regularized problems.

read the original abstract

This work establishes that an optimal transport~(OT) problem regularized by a given $f$-divergence admits the same solution as another OT problem regularized by a different $g$-divergence, under an appropriate transformation of the cost function. This structural equivalence between OT problems regularized by distinct divergences, in the sense of sharing the same unique minimizer, is demonstrated within the framework of Polish spaces with bounded cost functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims that an optimal transport (OT) problem regularized by a given f-divergence admits the same unique minimizer as another OT problem regularized by a different g-divergence, provided the cost function is appropriately transformed. The result is set in Polish spaces with bounded measurable cost functions and relies on uniqueness of the minimizer; it is presented as a structural identity rather than an approximation.

Significance. If the equivalence holds, it supplies a direct relation between distinct f-divergence regularizations of OT, which may allow results or algorithms developed for one divergence to be transferred to another via cost adjustment. This is a clean variational identity with potential utility in analysis and computation of regularized transport problems.

minor comments (2)
  1. The abstract states the equivalence but does not indicate the explicit form of the cost transformation or the relation between f and g; including a short illustrative example or the key identity would improve accessibility.
  2. The uniqueness assumption on the minimizer is central; a brief discussion of when this holds (e.g., strict convexity of the divergences or strict positivity of the cost) would strengthen the statement of the result.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending minor revision. We appreciate the recognition that the established equivalence supplies a direct relation between distinct f-divergence regularizations of optimal transport, which may facilitate transferring results or algorithms via cost adjustment. This is presented as a structural identity rather than an approximation, holding in Polish spaces with bounded measurable costs under uniqueness of the minimizer.

Circularity Check

0 steps flagged

No significant circularity; direct structural equivalence

full rationale

The paper establishes a mathematical equivalence: an f-divergence-regularized OT problem with cost c shares the same unique minimizer as a g-divergence-regularized OT problem with transformed cost c'. This is shown directly in Polish spaces with bounded measurable costs under the uniqueness assumption. No derivation step reduces by construction to a fitted parameter, self-citation chain, or renamed input; the result is a variational identity proven from the definitions of the regularized problems. The argument is self-contained against external benchmarks and does not invoke load-bearing self-citations or ansatzes smuggled via prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract identifies no free parameters, no ad-hoc axioms beyond standard OT assumptions (Polish spaces, bounded costs), and no invented entities. The result is framed as a structural equivalence within existing theory.

pith-pipeline@v0.9.0 · 5368 in / 1000 out tokens · 49839 ms · 2026-05-10T13:48:05.633484+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

  1. [1]

    Wasserstein auto-encoders,

    I. Tolstikhin, O. Bousquet, S. Gelly, and B. Sch ¨olkopf, “Wasserstein auto-encoders,” inProceedings of the International Conference on Learning Representations (ICLR), May 2018

  2. [2]

    Wasserstein GAN,

    M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” inPro- ceedings of the International Conference on Machine Learning (ICML), Jul. 2017, pp. 214–223

  3. [3]

    Optimal transport for domain adaptation,

    N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, “Optimal transport for domain adaptation,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 9, pp. 1852–1865, Sep. 2017

  4. [4]

    Learning with a Wasserstein loss,

    C. Frogner, C. Zhang, H. Mobahi, M. Araya-Polo, and T. Poggio, “Learning with a Wasserstein loss,” inProceedings of the 29th Interna- tional Conference on Neural Information Processing Systems - Volume 2, Dec. 2015, pp. 2053–2061

  5. [5]

    Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains,

    J. Solomon, F. de Goes, G. Peyr ´e, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas, “Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains,”ACM Transactions on Graphics (TOG), vol. 34, no. 4, pp. 1–11, Jul. 2015

  6. [6]

    A survey on optimal transport for machine learning: Theory and applications,

    L. M. Pereira and M. H. Amini, “A survey on optimal transport for machine learning: Theory and applications,”IEEE Access, vol. 13, pp. 26 506–26 526, Jan. 2025

  7. [7]

    Villani,Optimal Transport: Old and New, 1st ed

    C. Villani,Optimal Transport: Old and New, 1st ed. Berlin, Heidelberg: Springer-Verlag, 2009

  8. [8]

    Peyr ´e and M

    G. Peyr ´e and M. Cuturi,Computational Optimal Transport, 1st ed. Hanover, MA, USA: Foundations and Trends in Machine Learning, 2020

  9. [9]

    Smooth and sparse optimal transport,

    M. Blondel, V . Seguy, and A. Rolet, “Smooth and sparse optimal transport,” inProceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Mar. 2018, pp. 314–323

  10. [10]

    Sinkhorn distances: Lightspeed computation of optimal transportation distances,

    M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transportation distances,” inProceedings of the International Conference on Neural Information Processing Systems (NeurIPS), vol. 2, Dec. 2013, pp. 2292–2300

  11. [11]

    Optimal transport losses and sinkhorn algorithm with general convex regularization,

    S. Di Marino and A. Gerolin, “Optimal transport losses and sinkhorn algorithm with general convex regularization,”arXiv preprint arXiv:2007.00976, Jul. 2020

  12. [12]

    Optimal transport with f- divergence regularization and generalized sinkhorn algorithm,

    D. Terj ´ek and D. Gonz ´alez-S´anchez, “Optimal transport with f- divergence regularization and generalized sinkhorn algorithm,” inPro- ceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 151, Mar. 2022, pp. 5135–5165

  13. [13]

    Catoni,PAC-Bayesian Supervised Classification: The Thermodynam- ics of Statistical Learning, 1st ed

    O. Catoni,PAC-Bayesian Supervised Classification: The Thermodynam- ics of Statistical Learning, 1st ed. Beachwood, OH, USA: Institute of Mathematical Statistics, 2007, vol. 56

  14. [14]

    Empirical risk minimization with relative entropy regularization,

    S. M. Perlaza, G. Bisson, I. Esnaola, A. Jean-Marie, and S. Rini, “Empirical risk minimization with relative entropy regularization,”IEEE Transactions on Information Theory, vol. 70, no. 7, pp. 5122–5161, Jul. 2024

  15. [15]

    Empirical risk minimization with relative entropy regularization Type-II,

    F. Daunas, I. Esnaola, S. M. Perlaza, and H. V . Poor, “Empirical risk minimization with relative entropy regularization Type-II,” INRIA, Centre Inria d’Universit´e C ˆote d’Azur, Sophia Antipolis, France, Tech. Rep. RR-9508, May 2023

  16. [16]

    Equivalence of empirical risk minimization to regularization on the family off−divergences,

    ——, “Equivalence of empirical risk minimization to regularization on the family off−divergences,” inProceedings of the IEEE International Symposium on Information Theory (ISIT), Jul. 2024, pp. 759–764

  17. [17]

    Empirical risk minimization with f-divergence regularization in statistical learning,

    ——, “Empirical risk minimization with f-divergence regularization in statistical learning,” INRIA, Centre Inria d’Universit ´e C ˆote d’Azur, Sophia Antipolis, France, Tech. Rep. RR-9521, Oct. 2023

  18. [18]

    Asymmetry of the relative entropy in the regularization of empirical risk minimization,

    ——, “Asymmetry of the relative entropy in the regularization of empirical risk minimization,”IEEE Transactions on Information Theory, vol. 71, no. 8, pp. 6198–6226, Aug. 2025

  19. [19]

    Zalinescu,Convex analysis in general vector spaces, 1st ed

    C. Zalinescu,Convex analysis in general vector spaces, 1st ed. Singa- pore: World Scientific, 2002

  20. [20]

    Partially-finite programming inl 1 and the existence of maximum entropy estimates,

    J. M. Borwein and A. S. Lewis, “Partially-finite programming inl 1 and the existence of maximum entropy estimates,”SIAM Journal on Optimization, vol. 3, no. 2, pp. 248–267, May 1993

  21. [21]

    f-divergence for convex bodies,

    E. M. Werner, “f-divergence for convex bodies,” inAsymptotic Geo- metric Analysis: Proceedings of the Fall 2010 Fields Institute Thematic Program. New York, NY: Springer, 2013, pp. 381–395

  22. [22]

    On optimal transport with f-divergence regularization,

    M. Nicaise, Y . Bermudez, and S. M. Perlaza, “On optimal transport with f-divergence regularization,” INRIA, Centre Inria d’Universit´e Cˆote d’Azur, Sophia Antipolis, France, Tech. Rep. RR-9607, January 2026

  23. [23]

    Information-type measures of difference of probability distri- butions and indirect observations,

    I. Csisz ´ar, “Information-type measures of difference of probability distri- butions and indirect observations,”Studia Scientiarum Mathematicarum Hungarica, vol. 2, pp. 299–318, 1967

  24. [24]

    Optimal bounds between f-divergences and integral probability metrics,

    R. Agrawal and T. Horel, “Optimal bounds between f-divergences and integral probability metrics,”Journal of Machine Learning Research, vol. 22, no. 1, pp. 5662–5720, Jan. 2021