Equivalence of optimal transport problems to regularization on the family of f-divergences
Pith reviewed 2026-05-10 13:48 UTC · model grok-4.3
The pith
An optimal transport problem regularized by one f-divergence shares the same unique minimizer as another regularized by a different g-divergence after transforming the cost function.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An OT problem regularized by a given f-divergence admits the same solution as another OT problem regularized by a different g-divergence, under an appropriate transformation of the cost function. This structural equivalence between OT problems regularized by distinct divergences, in the sense of sharing the same unique minimizer, is demonstrated within the framework of Polish spaces with bounded cost functions.
What carries the argument
The cost function transformation that equates the regularized problems so they share the same unique minimizer.
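The summary does not spell out the transformation itself. A first-order sketch shows one way such a cost can be built. It may differ from the paper's own construction, and it rests on assumptions not stated in the summary:

- f and g are differentiable and strictly convex on (0, ∞), and g is superlinear.
- The f-regularized minimizer π* has a density r > 0 with respect to μ⊗ν.
- Dual potentials α, β exist, and all integrals are finite.

Write J_f(π) = ∫ c dπ + λ D_f(π ‖ μ⊗ν) and J_g(π) = ∫ c' dπ + λ' D_g(π ‖ μ⊗ν) over couplings π ∈ Π(μ, ν), with (α ⊕ β)(x, y) = α(x) + β(y).

```latex
\begin{aligned}
&\text{(i) } c + \lambda f'(r) = \alpha \oplus \beta,
\qquad r := \tfrac{\mathrm{d}\pi^\star}{\mathrm{d}(\mu\otimes\nu)} > 0,\\[4pt]
&\text{(ii) } c' := -\lambda' g'(r)
= -\lambda' g'\!\Big((f')^{-1}\big(\tfrac{\alpha \oplus \beta - c}{\lambda}\big)\Big),\\[4pt]
&\text{(iii) } J_g(\pi) - J_g(\pi^\star)
\ge \int \big(c' + \lambda' g'(r)\big)(s - r)\,\mathrm{d}(\mu\otimes\nu) = 0,
\qquad s := \tfrac{\mathrm{d}\pi}{\mathrm{d}(\mu\otimes\nu)}.
\end{aligned}
```

Step (iii) uses g(s) ≥ g(r) + g'(r)(s − r). It is restricted to π ≪ μ⊗ν, because other couplings have infinite g-divergence when g is superlinear. Strict convexity of g then makes π* the unique minimizer of J_g. Adding any α'(x) + β'(y) to c' leaves the minimizer unchanged, since its integral is the same for every coupling. Whether c' is bounded, as the framework requires, depends on r staying in a range where g' is bounded.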
Load-bearing premise
The minimizer is unique and the cost functions are bounded on Polish spaces.
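One standard sufficient condition for the uniqueness part of this premise is offered here as an assumption-laden sketch, not as the paper's argument. Assume f is strictly convex and superlinear, so that every coupling with finite J_f(π) := ∫ c dπ + λ D_f(π ‖ μ⊗ν) is absolutely continuous with respect to μ⊗ν. Two distinct such couplings π_1, π_2 then have densities that differ on a set of positive μ⊗ν-measure. Strict convexity of f therefore makes D_f, and hence J_f (whose cost term is linear in π), strictly convex along the segment between them. Since Π(μ, ν) is convex, two distinct minimizers would contradict minimality:

```latex
J_f\!\left(\tfrac{1}{2}\pi_1 + \tfrac{1}{2}\pi_2\right)
< \tfrac{1}{2}\,J_f(\pi_1) + \tfrac{1}{2}\,J_f(\pi_2)
= \min_{\pi \in \Pi(\mu,\nu)} J_f(\pi).
```

This argument does not address whether a minimizer exists at all.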
What would settle it
Construct a concrete Polish space example with bounded costs where the transformed cost still produces distinct minimizers for the two regularized problems.
Original abstract
This work establishes that an optimal transport (OT) problem regularized by a given $f$-divergence admits the same solution as another OT problem regularized by a different $g$-divergence, under an appropriate transformation of the cost function. This structural equivalence between OT problems regularized by distinct divergences, in the sense of sharing the same unique minimizer, is demonstrated within the framework of Polish spaces with bounded cost functions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that an optimal transport (OT) problem regularized by a given f-divergence admits the same unique minimizer as another OT problem regularized by a different g-divergence, provided the cost function is appropriately transformed. The result is set in Polish spaces with bounded measurable cost functions and relies on uniqueness of the minimizer; it is presented as a structural identity rather than an approximation.
Significance. If the equivalence holds, it supplies a direct relation between distinct f-divergence regularizations of OT, which may allow results or algorithms developed for one divergence to be transferred to another via cost adjustment. This is a clean variational identity with potential utility in analysis and computation of regularized transport problems.
Minor comments (2)
- The abstract states the equivalence but does not indicate the explicit form of the cost transformation or the relation between f and g; including a short illustrative example or the key identity would improve accessibility.
- The uniqueness assumption on the minimizer is central; a brief discussion of when this holds (e.g., strict convexity of the divergences or strict positivity of the cost) would strengthen the statement of the result.
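The following sketch is a small numerical illustration of the claim on finite spaces, which are Polish and on which every cost is bounded. It is not the paper's code. The choices are assumptions made for illustration:

- f(t) = t log t (relative entropy), solved with Sinkhorn iterations as in Cuturi [10].
- g(t) = (t − 1)² (Pearson χ²).
- The hypothetical helper `transformed_cost` builds c' = −λ' g'(dπ*/d(μ⊗ν)), that is, a transformation with zero dual potentials read off a computed minimizer π*, rather than the paper's explicit formula.

The script then checks that perturbing π* along a marginal-preserving direction increases both regularized objectives.

```python
import numpy as np


def sinkhorn_kl(c, mu, nu, lam, n_iter=2000):
    """Approximate argmin over couplings of <c, pi> + lam * KL(pi || mu x nu).

    Every iterate has the Gibbs form diag(a) (P * exp(-c / lam)) diag(b), so
    c + lam * log(pi / P) = alpha(x) + beta(y) holds exactly; the loop only
    drives the marginals of pi toward (mu, nu).
    """
    P = np.outer(mu, nu)              # reference measure mu (x) nu
    K = P * np.exp(-c / lam)          # Gibbs kernel relative to P
    a, b = np.ones_like(mu), np.ones_like(nu)
    for _ in range(n_iter):
        a = mu / (K @ b)              # match the row marginal
        b = nu / (K.T @ a)            # match the column marginal
    return a[:, None] * K * b[None, :]


def regularized_objective(pi, c, mu, nu, lam, gen):
    """<c, pi> + lam * D_gen(pi || mu x nu) for a convex generator `gen`."""
    P = np.outer(mu, nu)
    return np.sum(c * pi) + lam * np.sum(P * gen(pi / P))


def transformed_cost(pi_star, mu, nu, lam_g, g_prime):
    """Hypothetical c' = -lam_g * g'(d pi_star / d(mu x nu)) (zero dual potentials)."""
    r = pi_star / np.outer(mu, nu)
    return -lam_g * g_prime(r)


def kl_gen(t):
    """f(t) = t log t, whose f-divergence is the relative entropy."""
    return t * np.log(t)


def chi2_gen(t):
    """g(t) = (t - 1)^2, whose g-divergence is Pearson's chi-square."""
    return (t - 1.0) ** 2


def chi2_gen_prime(t):
    """Derivative g'(t) = 2 (t - 1)."""
    return 2.0 * (t - 1.0)


rng = np.random.default_rng(0)
n = 4
mu = rng.uniform(0.5, 1.5, n)
mu /= mu.sum()
nu = rng.uniform(0.5, 1.5, n)
nu /= nu.sum()
c = rng.uniform(0.0, 1.0, (n, n))     # bounded cost on a finite space
lam_f, lam_g = 0.5, 1.0

pi_star = sinkhorn_kl(c, mu, nu, lam_f)                             # KL-regularized minimizer
c_prime = transformed_cost(pi_star, mu, nu, lam_g, chi2_gen_prime)  # cost for the chi^2 problem

# Direction with zero row and column sums: pi_star + eps * E keeps the marginals of pi_star.
E = np.zeros((n, n))
E[0, 0] = E[1, 1] = 1.0
E[0, 1] = E[1, 0] = -1.0

eps = 1e-4
for pi in (pi_star + eps * E, pi_star - eps * E):
    kl_gap = (regularized_objective(pi, c, mu, nu, lam_f, kl_gen)
              - regularized_objective(pi_star, c, mu, nu, lam_f, kl_gen))
    chi2_gap = (regularized_objective(pi, c_prime, mu, nu, lam_g, chi2_gen)
                - regularized_objective(pi_star, c_prime, mu, nu, lam_g, chi2_gen))
    print(f"KL gap {kl_gap:.2e}, chi^2 gap {chi2_gap:.2e}")  # both should be positive
```

For the χ² generator the check is in fact global. With P = μ⊗ν and r = π*/P, substituting c' gives J_g(π) = λ' Σ_ij P_ij (π_ij/P_ij − r_ij)² + const, which is minimized over couplings only at π = π*. For the KL side, the two perturbations are spot checks. Global optimality there follows from the exact Gibbs form of π* together with convexity.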
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and for recommending minor revision. We appreciate the recognition that the established equivalence supplies a direct relation between distinct f-divergence regularizations of optimal transport, which may facilitate transferring results or algorithms via cost adjustment. This is presented as a structural identity rather than an approximation, holding in Polish spaces with bounded measurable costs under uniqueness of the minimizer.
Circularity Check
No significant circularity; direct structural equivalence
The paper establishes a mathematical equivalence: an f-divergence-regularized OT problem with cost c shares the same unique minimizer as a g-divergence-regularized OT problem with transformed cost c'. This is shown directly in Polish spaces with bounded measurable costs under the uniqueness assumption. No derivation step reduces by construction to a fitted parameter, self-citation chain, or renamed input; the result is a variational identity proven from the definitions of the regularized problems. The argument is self-contained: it does not invoke load-bearing self-citations or ansatzes smuggled in via prior work.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schölkopf, “Wasserstein auto-encoders,” in Proceedings of the International Conference on Learning Representations (ICLR), May 2018.
- [2] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” in Proceedings of the International Conference on Machine Learning (ICML), Jul. 2017, pp. 214–223.
- [3] N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, “Optimal transport for domain adaptation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 9, pp. 1852–1865, Sep. 2017.
- [4] C. Frogner, C. Zhang, H. Mobahi, M. Araya-Polo, and T. Poggio, “Learning with a Wasserstein loss,” in Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 2, Dec. 2015, pp. 2053–2061.
- [5] J. Solomon, F. de Goes, G. Peyré, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas, “Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains,” ACM Transactions on Graphics (TOG), vol. 34, no. 4, pp. 1–11, Jul. 2015.
- [6] L. M. Pereira and M. H. Amini, “A survey on optimal transport for machine learning: Theory and applications,” IEEE Access, vol. 13, pp. 26506–26526, Jan. 2025.
- [7] C. Villani, Optimal Transport: Old and New, 1st ed. Berlin, Heidelberg: Springer-Verlag, 2009.
- [8] G. Peyré and M. Cuturi, Computational Optimal Transport, 1st ed. Hanover, MA, USA: Foundations and Trends in Machine Learning, 2020.
- [9] M. Blondel, V. Seguy, and A. Rolet, “Smooth and sparse optimal transport,” in Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Mar. 2018, pp. 314–323.
- [10] M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transportation distances,” in Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), vol. 2, Dec. 2013, pp. 2292–2300.
- [11] S. Di Marino and A. Gerolin, “Optimal transport losses and Sinkhorn algorithm with general convex regularization,” arXiv preprint arXiv:2007.00976, Jul. 2020.
- [12] D. Terjék and D. González-Sánchez, “Optimal transport with f-divergence regularization and generalized Sinkhorn algorithm,” in Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 151, Mar. 2022, pp. 5135–5165.
- [13] O. Catoni, PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, 1st ed. Beachwood, OH, USA: Institute of Mathematical Statistics, 2007, vol. 56.
- [14] S. M. Perlaza, G. Bisson, I. Esnaola, A. Jean-Marie, and S. Rini, “Empirical risk minimization with relative entropy regularization,” IEEE Transactions on Information Theory, vol. 70, no. 7, pp. 5122–5161, Jul. 2024.
- [15] F. Daunas, I. Esnaola, S. M. Perlaza, and H. V. Poor, “Empirical risk minimization with relative entropy regularization Type-II,” INRIA, Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France, Tech. Rep. RR-9508, May 2023.
- [16] F. Daunas, I. Esnaola, S. M. Perlaza, and H. V. Poor, “Equivalence of empirical risk minimization to regularization on the family of f-divergences,” in Proceedings of the IEEE International Symposium on Information Theory (ISIT), Jul. 2024, pp. 759–764.
- [17] F. Daunas, I. Esnaola, S. M. Perlaza, and H. V. Poor, “Empirical risk minimization with f-divergence regularization in statistical learning,” INRIA, Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France, Tech. Rep. RR-9521, Oct. 2023.
- [18] F. Daunas, I. Esnaola, S. M. Perlaza, and H. V. Poor, “Asymmetry of the relative entropy in the regularization of empirical risk minimization,” IEEE Transactions on Information Theory, vol. 71, no. 8, pp. 6198–6226, Aug. 2025.
- [19] C. Zalinescu, Convex analysis in general vector spaces, 1st ed. Singapore: World Scientific, 2002.
- [20] J. M. Borwein and A. S. Lewis, “Partially-finite programming in $L_1$ and the existence of maximum entropy estimates,” SIAM Journal on Optimization, vol. 3, no. 2, pp. 248–267, May 1993.
- [21] E. M. Werner, “f-divergence for convex bodies,” in Asymptotic Geometric Analysis: Proceedings of the Fall 2010 Fields Institute Thematic Program. New York, NY: Springer, 2013, pp. 381–395.
- [22] M. Nicaise, Y. Bermudez, and S. M. Perlaza, “On optimal transport with f-divergence regularization,” INRIA, Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France, Tech. Rep. RR-9607, Jan. 2026.
- [23] I. Csiszár, “Information-type measures of difference of probability distributions and indirect observations,” Studia Scientiarum Mathematicarum Hungarica, vol. 2, pp. 299–318, 1967.
- [24] R. Agrawal and T. Horel, “Optimal bounds between f-divergences and integral probability metrics,” Journal of Machine Learning Research, vol. 22, no. 1, pp. 5662–5720, Jan. 2021.