Equivalence of optimal transport problems to regularization on the family of f-divergences

Maxime Nicaise; Samir M. Perlaza; Yaiza Bermudez

arXiv: 2604.12996 · v1 · submitted 2026-04-14 · math.ST · stat.TH

Pith reviewed 2026-05-10 13:48 UTC · model grok-4.3

classification: math.ST · stat.TH

keywords: optimal transport, f-divergence, regularization, equivalence, Polish spaces, bounded cost, unique minimizer

The pith

An optimal transport problem regularized by one f-divergence shares its unique minimizer with another problem, regularized by a different g-divergence, once the cost function is transformed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proves that an optimal transport problem regularized by a given f-divergence admits the same solution as another optimal transport problem regularized by a different g-divergence when the cost function is appropriately transformed. The equivalence is established in Polish spaces with bounded cost functions, where the minimizer is assumed to be unique. A reader would care because it shows that regularizers from the f-divergence family are interchangeable up to a cost adjustment, rather than producing fundamentally different outcomes. This unifies a range of regularized transport formulations that might otherwise appear distinct.

Core claim

An OT problem regularized by a given f-divergence admits the same solution as another OT problem regularized by a different g-divergence, under an appropriate transformation of the cost function. This structural equivalence between OT problems regularized by distinct divergences, in the sense of sharing the same unique minimizer, is demonstrated within the framework of Polish spaces with bounded cost functions.

What carries the argument

The cost function transformation that equates the regularized problems so they share the same unique minimizer.
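One way to see why such a transformation can exist is through first-order optimality conditions. The derivation below is a sketch under assumptions that this summary does not state: a fixed reference measure Q (for instance μ⊗ν), a regularization weight λ > 0, differentiable generators f and g with g strictly convex, and a minimizer whose density r* = dP*/dQ is strictly positive. The resulting c' is one candidate construction, not necessarily the transformation used in the paper.

```latex
% Sketch under assumed regularity; not taken verbatim from the paper.
\begin{align*}
&\min_{P \in \Pi(\mu,\nu)} \int c \,\mathrm{d}P + \lambda D_f(P \,\|\, Q),
  \qquad D_f(P \,\|\, Q) = \int f\!\Bigl(\frac{\mathrm{d}P}{\mathrm{d}Q}\Bigr)\mathrm{d}Q, \\
&\text{stationarity at } r^\star = \frac{\mathrm{d}P^\star}{\mathrm{d}Q}:\quad
  c(x,y) + \lambda f'\bigl(r^\star(x,y)\bigr) = \varphi(x) + \psi(y), \\
&c'(x,y) := c(x,y) + \lambda\bigl[f'\bigl(r^\star(x,y)\bigr) - g'\bigl(r^\star(x,y)\bigr)\bigr]
  \;\Longrightarrow\;
  c'(x,y) + \lambda g'\bigl(r^\star(x,y)\bigr) = \varphi(x) + \psi(y).
\end{align*}
```

The last line is the stationarity condition of the g-regularized problem with cost c', and P* already satisfies its marginal constraints, so strict convexity of g makes P* its unique minimizer. Two caveats follow from this sketch: c' is defined through P* itself, and c' is bounded only if f'(r*) − g'(r*) is. Adding any separable term a(x) + b(y) to c' shifts the objective by the constant ∫a dμ + ∫b dν on Π(μ,ν) and leaves the minimizer unchanged.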

Load-bearing premise

The minimizer is unique and the cost functions are bounded on Polish spaces.

What would settle it

Construct a concrete Polish-space example with bounded costs where the transformed cost still produces distinct minimizers for the two regularized problems.
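This test can be run numerically on a small finite space, since finite sets are Polish and every cost on them is bounded. The Python sketch below rests on assumptions that are not taken from the paper. The reference measure is Q = μ⊗ν, f(t) = t log t (KL), and g(t) = (t − 1)²/2 (half the Pearson χ² generator). λ = 0.5 and the random data are hypothetical. The transformed cost is the first-order candidate c' = c + λ(f'(r*) − g'(r*)), where r* = dP*/dQ is the density of the KL minimizer. The sketch solves the KL problem with Sinkhorn iterations, solves the g-problem with c' independently through its interior closed form, and compares the two couplings. A counterexample of the kind described would show up as a large gap.

```python
import numpy as np

# Hypothetical finite instance: finite sets are Polish and every cost on them is bounded.
rng = np.random.default_rng(0)
n, m, lam = 5, 4, 0.5
mu = rng.random(n) + 0.1
mu /= mu.sum()
nu = rng.random(m) + 0.1
nu /= nu.sum()
c = rng.random((n, m))
Q = np.outer(mu, nu)  # assumed reference measure Q = mu x nu for both divergences


def solve_kl(cost, iters=2000):
    """Sinkhorn iterations: minimizer of <cost, P> + lam * KL(P || Q), i.e. f(t) = t log t."""
    K = np.exp(-cost / lam)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    return u[:, None] * K * v[None, :]


def solve_chi2_interior(cost):
    """Closed-form minimizer of <cost, P> + lam * sum(Q * g(P / Q)), g(t) = (t - 1)**2 / 2.

    The formula satisfies the marginals and makes cost + lam * g'(P / Q) separable, so by
    strict convexity it is the unique minimizer whenever it is entrywise nonnegative.
    """
    row = cost @ nu        # sum_j nu_j cost_ij
    col = mu @ cost        # sum_i mu_i cost_ij
    mean = mu @ cost @ nu  # sum_ij mu_i nu_j cost_ij
    return Q * (1.0 + (row[:, None] + col[None, :] - mean - cost) / lam)


def separable_residual(M):
    """Zero exactly when M_ij = a_i + b_j, i.e. when M is a valid sum of dual potentials."""
    return np.abs(M - (M @ nu)[:, None] - (mu @ M)[None, :] + mu @ M @ nu).max()


P_f = solve_kl(c)                        # f-problem (KL) with the original cost c
r = P_f / Q                              # density dP*/dQ, strictly positive here
f_prime = np.log(r) + 1.0                # f(t) = t log t
g_prime = r - 1.0                        # g(t) = (t - 1)^2 / 2
c_prime = c + lam * (f_prime - g_prime)  # candidate transformed cost

P_g = solve_chi2_interior(c_prime)       # g-problem solved independently with c'
assert P_g.min() >= 0.0                  # the closed form is valid only in this case
print("max |P_g - P_f|:", np.abs(P_g - P_f).max())
print("KL stationarity residual, cost c:  ", separable_residual(c + lam * f_prime))
print("chi2 stationarity residual, cost c:", separable_residual(c + lam * g_prime))
```

On this instance the gap between the two couplings should be at the level of floating-point error, while the χ² stationarity residual for the untransformed cost c is not, which indicates that the cost transformation is doing real work. The closed form is used only because it is exact for this g and this Q. Other generators would need an iterative solver, such as a generalized Sinkhorn scheme.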

Original abstract

This work establishes that an optimal transport (OT) problem regularized by a given $f$-divergence admits the same solution as another OT problem regularized by a different $g$-divergence, under an appropriate transformation of the cost function. This structural equivalence between OT problems regularized by distinct divergences, in the sense of sharing the same unique minimizer, is demonstrated within the framework of Polish spaces with bounded cost functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note: private letter to a colleague

The paper shows that f-divergence regularized OT with cost c shares its unique minimizer with g-regularized OT under a transformed cost, but the result stays inside standard Polish-space assumptions.

The central claim is that an OT problem regularized by an f-divergence with cost c has the same unique minimizer as another OT problem regularized by a g-divergence once the cost is replaced by a suitable c'. The setting is Polish spaces with bounded measurable costs, under the assumption that the minimizer is unique. The equivalence is presented as a direct structural identity between the two variational problems rather than as an approximation or a limit result. From the description, the argument follows from the definitions of the regularized functionals and does not appear to introduce circularity or hidden fitting steps. The stress-test note also flags no internal inconsistency, which matches what is visible.

The paper therefore gives a clean reparameterization that lets one swap between divergence regularizers by changing the cost. That is useful for organizing choices in the OT literature, especially when one regularizer is easier to analyze than another. The setting is standard and the assumptions are stated explicitly, so the mathematics holds up on its own terms. The main limitations are the reliance on uniqueness, which does not always hold without extra conditions, and the restriction to bounded costs. There is also no discussion of how to construct the transformed cost explicitly for concrete f and g, nor any numerical checks or application examples. If computing the transformed cost turns out to be as hard as solving the original problem, the practical payoff shrinks.

This is a specialized note that will mainly interest people already working on divergence-regularized transport or related variational problems in statistics and machine learning. A reader looking for new algorithms or statistical rates will not find them here. It is still worth sending to peer review: the claim is precise, the framework is standard, and the equivalence is stated cleanly enough to be checked or extended by others.
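On the missing explicit construction for concrete f and g: if one adopts the first-order candidate c' = c + λ(f'(r*) − g'(r*)), with r* = dP*/dQ the density of the f-problem's minimizer, then a concrete pair needs only the derivative of each generator. This construction is an assumption derived from stationarity, not the paper's stated transformation. The hypothetical helper below lists a few standard generators with their derivatives. It checks each derivative by central finite differences and assembles c' from a given density r.

```python
import numpy as np

# Standard f-divergence generators (convex, f(1) = 0) and their derivatives on t > 0.
GENERATORS = {
    "kl": (lambda t: t * np.log(t), lambda t: np.log(t) + 1.0),
    "reverse_kl": (lambda t: -np.log(t), lambda t: -1.0 / t),
    "pearson_chi2": (lambda t: (t - 1.0) ** 2, lambda t: 2.0 * (t - 1.0)),
    "squared_hellinger": (lambda t: (np.sqrt(t) - 1.0) ** 2, lambda t: 1.0 - 1.0 / np.sqrt(t)),
}


def transformed_cost(c, r, lam, f_name, g_name):
    """Hypothetical helper: c' = c + lam * (f'(r) - g'(r)), r = dP*/dQ of the f-problem's minimizer."""
    f_prime = GENERATORS[f_name][1]
    g_prime = GENERATORS[g_name][1]
    return c + lam * (f_prime(r) - g_prime(r))


# Central finite-difference check of each derivative at a few points.
t = np.array([0.3, 1.0, 2.5])
h = 1e-6
for name, (gen, der) in GENERATORS.items():
    numeric = (gen(t + h) - gen(t - h)) / (2.0 * h)
    print(f"{name:18s} max derivative error: {np.abs(numeric - der(t)).max():.2e}")
```

The helper does not remove the practical concern raised in the letter: r* still comes from solving the f-regularized problem first.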

Referee Report

0 major / 2 minor

Summary. The paper claims that an optimal transport (OT) problem regularized by a given f-divergence admits the same unique minimizer as another OT problem regularized by a different g-divergence, provided the cost function is appropriately transformed. The result is set in Polish spaces with bounded measurable cost functions and relies on uniqueness of the minimizer; it is presented as a structural identity rather than an approximation.

Significance. If the equivalence holds, it supplies a direct relation between distinct f-divergence regularizations of OT, which may allow results or algorithms developed for one divergence to be transferred to another via cost adjustment. This is a clean variational identity with potential utility in analysis and computation of regularized transport problems.

minor comments (2)

The abstract states the equivalence but does not indicate the explicit form of the cost transformation or the relation between f and g; including a short illustrative example or the key identity would improve accessibility.
The uniqueness assumption on the minimizer is central; a brief discussion of when this holds (e.g., strict convexity of the divergences or strict positivity of the cost) would strengthen the statement of the result.
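The second minor comment can be made concrete. Uniqueness typically comes from strict convexity of the generator, and it can fail without it. The minimal sketch below uses hypothetical data: two-point spaces with uniform marginals, an assumed reference measure Q = μ⊗ν, cost c = [[0, 1], [1, 0]], λ = 1, and the total-variation generator f(t) = |t − 1|/2, which is convex but not strictly convex. Every coupling has the form P(a) = [[a, 1/2 − a], [1/2 − a, a]] for a in [0, 1/2], and the regularized objective equals 1 − 2a + 2|a − 1/4|, which is constant on [1/4, 1/2].

```python
import numpy as np

# Hypothetical two-point example; Q = mu x nu is an assumed reference measure.
mu = np.array([0.5, 0.5])
nu = np.array([0.5, 0.5])
Q = np.outer(mu, nu)
c = np.array([[0.0, 1.0], [1.0, 0.0]])
lam = 1.0


def coupling(a):
    """All couplings of two uniform two-point marginals, parameterized by a in [0, 1/2]."""
    return np.array([[a, 0.5 - a], [0.5 - a, a]])


def objective(P):
    """<c, P> + lam * D_TV(P || Q), with f(t) = |t - 1| / 2 (convex, not strictly convex)."""
    tv = np.sum(Q * 0.5 * np.abs(P / Q - 1.0))
    return np.sum(c * P) + lam * tv


grid = np.linspace(0.0, 0.5, 11)
values = np.array([objective(coupling(a)) for a in grid])
minimizers = grid[np.isclose(values, values.min())]
print("optimal value:", values.min())
print("grid points attaining it:", minimizers)
```

Replacing f with a strictly convex generator such as t log t makes the objective strictly convex on the set of couplings, so a single coupling would remain. That is the kind of condition the comment asks the authors to spell out.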

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending minor revision. We appreciate the recognition that the established equivalence supplies a direct relation between distinct f-divergence regularizations of optimal transport, which may facilitate transferring results or algorithms via cost adjustment. This is presented as a structural identity rather than an approximation, holding in Polish spaces with bounded measurable costs under uniqueness of the minimizer.

Circularity Check

0 steps flagged

No significant circularity; direct structural equivalence

The paper establishes a mathematical equivalence: an f-divergence-regularized OT problem with cost c shares the same unique minimizer as a g-divergence-regularized OT problem with transformed cost c'. This is shown directly in Polish spaces with bounded measurable costs under the uniqueness assumption. No derivation step reduces by construction to a fitted parameter, self-citation chain, or renamed input; the result is a variational identity proven from the definitions of the regularized problems. The argument is self-contained and does not invoke load-bearing self-citations or ansatzes smuggled in via prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract identifies no free parameters, no ad-hoc axioms beyond standard OT assumptions (Polish spaces, bounded costs), and no invented entities. The result is framed as a structural equivalence within existing theory.

pith-pipeline@v0.9.0 · 5368 in / 1000 out tokens · 49839 ms · 2026-05-10T13:48:05.633484+00:00 · methodology

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

[1] I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schölkopf, “Wasserstein auto-encoders,” in Proceedings of the International Conference on Learning Representations (ICLR), May 2018.

[2] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” in Proceedings of the International Conference on Machine Learning (ICML), Jul. 2017, pp. 214–223.

[3] N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, “Optimal transport for domain adaptation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 9, pp. 1852–1865, Sep. 2017.

[4] C. Frogner, C. Zhang, H. Mobahi, M. Araya-Polo, and T. Poggio, “Learning with a Wasserstein loss,” in Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 2, Dec. 2015, pp. 2053–2061.

[5] J. Solomon, F. de Goes, G. Peyré, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas, “Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains,” ACM Transactions on Graphics (TOG), vol. 34, no. 4, pp. 1–11, Jul. 2015.

[6] L. M. Pereira and M. H. Amini, “A survey on optimal transport for machine learning: Theory and applications,” IEEE Access, vol. 13, pp. 26506–26526, Jan. 2025.

[7] C. Villani, Optimal Transport: Old and New, 1st ed. Berlin, Heidelberg: Springer-Verlag, 2009.

[8] G. Peyré and M. Cuturi, Computational Optimal Transport, 1st ed. Hanover, MA, USA: Foundations and Trends in Machine Learning, 2020.

[9] M. Blondel, V. Seguy, and A. Rolet, “Smooth and sparse optimal transport,” in Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Mar. 2018, pp. 314–323.

[10] M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transportation distances,” in Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), vol. 2, Dec. 2013, pp. 2292–2300.

[11] S. Di Marino and A. Gerolin, “Optimal transport losses and sinkhorn algorithm with general convex regularization,” arXiv preprint arXiv:2007.00976, Jul. 2020.

[12] D. Terjék and D. González-Sánchez, “Optimal transport with f-divergence regularization and generalized sinkhorn algorithm,” in Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 151, Mar. 2022, pp. 5135–5165.

[13] O. Catoni, PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, 1st ed. Beachwood, OH, USA: Institute of Mathematical Statistics, 2007, vol. 56.

[14] S. M. Perlaza, G. Bisson, I. Esnaola, A. Jean-Marie, and S. Rini, “Empirical risk minimization with relative entropy regularization,” IEEE Transactions on Information Theory, vol. 70, no. 7, pp. 5122–5161, Jul. 2024.

[15] F. Daunas, I. Esnaola, S. M. Perlaza, and H. V. Poor, “Empirical risk minimization with relative entropy regularization Type-II,” INRIA, Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France, Tech. Rep. RR-9508, May 2023.

[16] ——, “Equivalence of empirical risk minimization to regularization on the family of f-divergences,” in Proceedings of the IEEE International Symposium on Information Theory (ISIT), Jul. 2024, pp. 759–764.

[17] ——, “Empirical risk minimization with f-divergence regularization in statistical learning,” INRIA, Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France, Tech. Rep. RR-9521, Oct. 2023.

[18] ——, “Asymmetry of the relative entropy in the regularization of empirical risk minimization,” IEEE Transactions on Information Theory, vol. 71, no. 8, pp. 6198–6226, Aug. 2025.

[19] C. Zalinescu, Convex analysis in general vector spaces, 1st ed. Singapore: World Scientific, 2002.

[20] J. M. Borwein and A. S. Lewis, “Partially-finite programming in L1 and the existence of maximum entropy estimates,” SIAM Journal on Optimization, vol. 3, no. 2, pp. 248–267, May 1993.

[21] E. M. Werner, “f-divergence for convex bodies,” in Asymptotic Geometric Analysis: Proceedings of the Fall 2010 Fields Institute Thematic Program. New York, NY: Springer, 2013, pp. 381–395.

[22] M. Nicaise, Y. Bermudez, and S. M. Perlaza, “On optimal transport with f-divergence regularization,” INRIA, Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France, Tech. Rep. RR-9607, Jan. 2026.

[23] I. Csiszár, “Information-type measures of difference of probability distributions and indirect observations,” Studia Scientiarum Mathematicarum Hungarica, vol. 2, pp. 299–318, 1967.

[24] R. Agrawal and T. Horel, “Optimal bounds between f-divergences and integral probability metrics,” Journal of Machine Learning Research, vol. 22, no. 1, pp. 5662–5720, Jan. 2021.
