arxiv: 2605.05569 · v2 · submitted 2026-05-07 · 🧮 math.OC · cs.LG


Stability of the Monge Map in Semi-Dual Optimal Transport

Anton Selitskiy, David Millard


Pith reviewed 2026-05-12 01:44 UTC · model grok-4.3

classification 🧮 math.OC cs.LG

Keywords: optimal transport, Monge map, semi-dual formulation, saddle-point structure, convergence conditions, constrained optimization


The pith

The semi-dual optimal transport formulation allows Monge maps to converge under conditions that do not require the dual potential to be optimal.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines the semi-dual version of the optimal transport problem and shows it possesses a degenerate saddle-point structure. That structure makes its numerical solution equivalent to solving a constrained optimization problem. The authors then derive necessary and sufficient conditions under which the Monge map converges even when the dual potential has not reached optimality. These conditions clarify why practical algorithms typically need more iterations to stabilize the transport map than to stabilize the potential.

Core claim

The semi-dual formulation of the optimal transport problem has a degenerate saddle-point structure, and its numerical solution is equivalent to solving a constrained optimization problem. Necessary and sufficient conditions are derived for the convergence of Monge maps without requiring optimality of the dual potential.

What carries the argument

The degenerate saddle-point structure of the semi-dual formulation, which reduces numerical solution to a constrained optimization problem and separates convergence of the Monge map from optimality of the dual potential.
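In the discrete setting, the constrained-optimization reading can be checked directly. The sketch below assumes the max-min semi-dual form F(ψ, T) = ∫[c(x, T(x)) − ψ(T(x))] dμ(x) + ∫ψ dν (minimized over T, maximized over ψ). It also assumes uniform empirical measures μ and ν on point sets X and Y, a map T given as an index assignment, and a quadratic cost. These choices and the helper names are hypothetical and do not come from the paper. The code checks that F(ψ, T) equals the transport cost of T plus ⟨ψ, ν − T#μ⟩, so ψ acts as a Lagrange multiplier for the push-forward constraint T#μ = ν. It also checks that F stops depending on ψ once T is a bijection, which gives T#μ = ν.

```python
import random

# Hypothetical helpers; the semi-dual form of F used here is an assumption.


def semi_dual_objective(psi, T, X, Y, cost):
    """F(psi, T) = mean_i [c(x_i, y_T[i]) - psi(y_T[i])] + mean_j psi(y_j).

    mu and nu are uniform on the points X and Y, T[i] is the target index
    assigned to source point i, and psi[j] is the potential value at Y[j].
    """
    n, m = len(X), len(Y)
    transport_part = sum(cost(X[i], Y[T[i]]) - psi[T[i]] for i in range(n)) / n
    return transport_part + sum(psi) / m


def lagrangian_form(psi, T, X, Y, cost):
    """The same quantity written as cost(T) + <psi, nu - T#mu>."""
    n, m = len(X), len(Y)
    transport_cost = sum(cost(X[i], Y[T[i]]) for i in range(n)) / n
    pushforward = [0.0] * m                 # masses of T#mu on the points of Y
    for i in range(n):
        pushforward[T[i]] += 1.0 / n
    violation = [1.0 / m - pushforward[j] for j in range(m)]  # nu - T#mu
    return transport_cost + sum(psi[j] * violation[j] for j in range(m))


def quadratic_cost(x, y):
    return 0.5 * (x - y) ** 2


if __name__ == "__main__":
    random.seed(0)
    X = [random.gauss(0.0, 1.0) for _ in range(6)]
    Y = [random.gauss(1.0, 2.0) for _ in range(6)]
    T = [random.randrange(6) for _ in range(6)]   # arbitrary, usually not a bijection
    psi = [random.uniform(-5.0, 5.0) for _ in range(6)]
    print(semi_dual_objective(psi, T, X, Y, quadratic_cost),
          lagrangian_form(psi, T, X, Y, quadratic_cost))
```

The two printed values agree for any T. When T is a permutation, the multiplier term vanishes and F reduces to the transport cost whatever ψ is.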

If this is right

Numerically solving the semi-dual optimal transport problem is equivalent to solving a constrained optimization problem.
Under the derived conditions, the Monge map can converge even when the dual potential is not optimal.
The degenerate saddle-point structure would explain why algorithms often need more iterations to update the transport map than the potential.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Algorithms could be redesigned to update the map and potential on separate schedules once the conditions are checked.
The same separation of convergence rates may appear in other saddle-point formulations of transport problems with similar degeneracy.
Numerical tests on standard benchmark costs could verify whether the necessary and sufficient conditions hold in practice.
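Below is a minimal sketch of a separate-schedule scheme on a toy problem chosen purely for illustration (not the paper's experiments). The toy problem uses μ = N(0, 1), ν = N(m_ν, s_ν²), the quadratic cost c(x, y) = |x − y|²/2, an affine map T(x) = a·x + b and a quadratic potential ψ(y) = p·y²/2 + q·y. All of these are assumptions, as is the max-min form F(ψ, T) = E_μ[c(x, T(x)) − ψ(T(x))] + E_ν[ψ(y)]. For each gradient-ascent step on the potential (step η_ψ), the map takes K gradient-descent steps (step η_t). This loosely mirrors the parameters K and η_ψ/η_t named in Figure 2, although their exact meaning in the paper is assumed here. The function names are hypothetical. F is linear in (p, q), so its ψ-gradients are exactly the mean and second-moment gaps between T#μ and ν.

```python
# Toy semi-dual OT with separate update schedules for the map and the potential.
# Assumptions (illustrative only): mu = N(0, 1), nu = N(m_nu, s_nu^2),
# c(x, y) = |x - y|^2 / 2, T(x) = a*x + b, psi(y) = p*y^2/2 + q*y, and
#   F(psi, T) = E_mu[c(x, T(x)) - psi(T(x))] + E_nu[psi(y)]   (min over T, max over psi).


def semi_dual_grads(a, b, p, q, m_nu, s_nu):
    """Closed-form partial derivatives of F for the toy model."""
    m2_nu = s_nu ** 2 + m_nu ** 2           # E_nu[y^2]
    grad_a = a * (1.0 - p) - 1.0            # dF/da
    grad_b = b * (1.0 - p) - q              # dF/db
    grad_p = 0.5 * (m2_nu - a * a - b * b)  # dF/dp: second-moment gap between nu and T#mu
    grad_q = m_nu - b                       # dF/dq: mean gap between nu and T#mu
    return grad_a, grad_b, grad_p, grad_q


def run_two_timescale(m_nu=1.0, s_nu=2.0, K=20, eta_t=0.1, eta_psi=0.05, outer_steps=3000):
    """K descent steps on the map per single ascent step on the potential."""
    a, b, p, q = 1.0, 0.0, 0.0, 0.0         # start from the identity map and psi = 0
    for _ in range(outer_steps):
        for _ in range(K):
            ga, gb, _, _ = semi_dual_grads(a, b, p, q, m_nu, s_nu)
            a -= eta_t * ga
            b -= eta_t * gb
        _, _, gp, gq = semi_dual_grads(a, b, p, q, m_nu, s_nu)
        p += eta_psi * gp
        q += eta_psi * gq
    return a, b, p, q


def errors(a, b, p, q, m_nu, s_nu):
    """Squared errors ||T - T*||^2_{L2(mu)} and ||grad psi - grad psi*||^2_{L2(nu)}."""
    a_star, b_star = s_nu, m_nu                      # T*(x) = s_nu * x + m_nu
    p_star, q_star = 1.0 - 1.0 / s_nu, m_nu / s_nu   # grad psi*(y) = p* y + q*
    map_err = (a - a_star) ** 2 + (b - b_star) ** 2  # uses E_mu[x] = 0, E_mu[x^2] = 1
    dp, dq = p - p_star, q - q_star
    pot_err = dp * dp * (s_nu ** 2 + m_nu ** 2) + 2.0 * dp * dq * m_nu + dq * dq
    return map_err, pot_err


if __name__ == "__main__":
    a, b, p, q = run_two_timescale()
    print(errors(a, b, p, q, m_nu=1.0, s_nu=2.0))
```

With these defaults, the iterates should approach the analytic saddle a = s_ν, b = m_ν, p = 1 − 1/s_ν, q = m_ν/s_ν. Changing K, η_t and η_ψ is a quick way to see how the split between map and potential updates changes the transient. The two error measures match the quantities named in Figure 2, evaluated in closed form for this toy model only.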

Load-bearing premise

The cost function and marginal measures satisfy the standard regularity assumptions used in optimal transport.

What would settle it

A concrete numerical example in which the Monge map fails to converge when the stated necessary and sufficient conditions hold, or converges when those conditions are violated.

Figures

Figures reproduced from arXiv: 2605.05569 by Anton Selitskiy, David Millard.

**Figure 1.** Convergence of the transport map with divergence of the potential.

**Figure 2.** $\|t - T^\star\|^2_{L^2(\mu)}$ and $\|\nabla\psi - \nabla\psi^\star\|^2_{L^2(\nu)}$ with respect to $K$ and $\eta_\psi/\eta_t$. This viewpoint also explains the empirical observation of Makkuva et al. [2020] that unconstrained neural parameterizations of the transport map (e.g., $\nabla v$ in …

**Figure 5.** Appendix E.4, OTM: OTM uses the same max-correlation objective but adds the published gradient-optimality penalty from optimal transport modeling [Rout et al., 2021]. In our setup this penalty has weight 0.1. The results are shown in …

**Figure 3.** Convergence behavior of the transport map and potential for the OTP method.

**Figure 4.** Convergence behavior of the transport map and potential for the Monge Map method.

**Figure 5.** Convergence behavior of the transport map and potential for the Max Correlation method.

**Figure 6.** Convergence behavior of the transport map and potential for the OTM method.

Original abstract

This paper shows that the semi-dual formulation of the optimal transport problem has a degenerate saddle-point structure, and that its numerical solution is equivalent to solving a constrained optimization problem. We derive necessary and sufficient conditions for the convergence of Monge maps without requiring optimality of the dual potential. This analysis helps explain why, in practice, numerical algorithms often require more iterations to update the transport map than the potential.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames the semi-dual OT problem as a degenerate saddle point whose numerical solution equals a constrained optimization problem, then gives necessary and sufficient conditions for Monge map convergence that do not require dual optimality.


The main point is that the semi-dual formulation has a degenerate saddle-point structure, and solving it numerically is equivalent to a constrained optimization problem. From this the authors derive necessary and sufficient conditions for convergence of the Monge map even when the dual potential is not yet optimal. This setup is used to explain why practical algorithms typically need more iterations to stabilize the transport map than the potential itself.

Referee Report

2 major / 1 minor

Summary. The paper shows that the semi-dual formulation of the optimal transport problem has a degenerate saddle-point structure and that its numerical solution is equivalent to solving a constrained optimization problem. The authors derive necessary and sufficient conditions for the convergence of Monge maps that do not require optimality of the dual potential. They use this analysis to explain why numerical algorithms often need more iterations to update the transport map than the potential.

Significance. If the derived conditions hold under the stated regularity assumptions on the cost and marginals, the work could offer a useful theoretical lens on stability and convergence rates in semi-dual OT algorithms. The link between the degenerate saddle-point structure and the observed disparity in iteration counts for the map versus the potential is a concrete practical insight that may inform algorithm design. The manuscript does not mention machine-checked proofs or reproducible code, so those strengths are not credited here.

major comments (2)

Abstract: The abstract asserts a derivation of necessary and sufficient conditions for Monge-map convergence, but supplies no proof outline, no statement of assumptions, and no verification steps; therefore the support for the central claim cannot be assessed.
The equivalence between the numerical solution of the semi-dual problem and the constrained optimization problem is presented as a consequence of the degenerate saddle-point structure. However, the abstract does not show how the degeneracy is exploited or which additional technical conditions are required, even though this mechanism is load-bearing for the main result.

minor comments (1)

The abstract could be expanded to list the key regularity assumptions on the cost function and marginal measures that underpin the derivation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive feedback on the abstract and the presentation of our main results. We address each major comment below and will revise the manuscript accordingly to improve clarity while preserving the paper's focus.


Referee: Abstract: The abstract asserts a derivation of necessary and sufficient conditions for Monge-map convergence, but supplies no proof outline, no statement of assumptions, and no verification steps; therefore the support for the central claim cannot be assessed.

Authors: We agree that the abstract, as a concise summary, omits a proof outline and explicit assumptions. The full manuscript states the necessary and sufficient conditions in Theorem 3.2 under the assumptions of a C^2 strictly convex cost satisfying the twist condition and positive continuous densities for the marginals. The derivation proceeds by linearizing the semi-dual optimality conditions around the degenerate saddle point and identifying the kernel of the Hessian with respect to the map variables. We will revise the abstract to include a brief outline: 'Under standard regularity assumptions on the cost and marginals, we derive necessary and sufficient conditions for Monge map convergence by analyzing the degenerate saddle-point structure, without requiring dual potential optimality.' This revision will allow readers to better assess the central claim. revision: yes
Referee: The equivalence between the numerical solution of the semi-dual problem and the constrained optimization problem is presented as following from the degenerate saddle-point structure, but the precise mechanism by which degeneracy is exploited (and any additional technical conditions required) is not visible in the abstract and must be load-bearing for the main result.

Authors: The equivalence is established in Theorem 2.1 by showing that the semi-dual objective's saddle-point Hessian is singular in the directions of the transport map (which is the gradient of the potential), allowing reduction to a constrained optimization problem over the potential alone. This exploits the fact that variations in the map are constrained by the Monge relation, and holds under the twist condition on the cost (already stated in Section 2). We agree the abstract does not make the mechanism visible and will add a clarifying phrase: 'by exploiting the degeneracy of the saddle-point Hessian to establish equivalence to a constrained optimization problem.' No further technical conditions are required beyond those in the main text. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained


The paper derives necessary and sufficient conditions for Monge-map convergence from the degenerate saddle-point structure of the semi-dual formulation and its equivalence to a constrained optimization problem. These steps rely on standard regularity assumptions for costs and measures rather than any fitted parameters, self-definitions, or load-bearing self-citations. No equation or claim reduces to its own inputs by construction, and the central results are presented as consequences of the problem structure without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon, a public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear (relation between the paper passage and the cited Recognition theorem)

Paper passage: Theorem 3(iv): F(ψ⋆, T⋆) = F(ψ, T⋆) for all ψ; at the optimal transport map the objective becomes independent of the potential.
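Statement (iv) follows in one line from the push-forward identity under an assumed form of the objective: the standard max-min semi-dual F(ψ, T) = ∫[c(x, T(x)) − ψ(T(x))] dμ(x) + ∫ψ dν. The paper's notation may differ. The derivation below sketches that step and is not the paper's proof. It holds for any ψ that is integrable with respect to ν.

```latex
\begin{aligned}
F(\psi, T^\star) &= \int c\bigl(x, T^\star(x)\bigr)\,d\mu(x) - \int \psi\bigl(T^\star(x)\bigr)\,d\mu(x) + \int \psi(y)\,d\nu(y) \\
&= \int c\bigl(x, T^\star(x)\bigr)\,d\mu(x) - \int \psi\,d\bigl(T^\star_{\#}\mu\bigr) + \int \psi\,d\nu \\
&= \int c\bigl(x, T^\star(x)\bigr)\,d\mu(x) \qquad \text{because } T^\star_{\#}\mu = \nu .
\end{aligned}
```

The right-hand side does not involve ψ, so F(ψ⋆, T⋆) = F(ψ, T⋆). The objective is flat along every potential direction at T⋆, which is one way to see why the saddle point is degenerate.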

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

79 extracted references

[1] Monge, G. Histoire de l’Académie Royale des Sciences avec les Mémoires de Mathématique & de Physique, Paris.
[2] Kantorovich, Leonid V. Dokl. Akad. Nauk SSSR.
[3] Kantorovich, L. V. and Rubinshtein, G. Sh. Dokl. Akad. Nauk SSSR.
[4] Brenier, Yann. Comm. Pure Appl. Math.
[5] Benamou, J.-D. and Brenier, Y. Numer. Math.
[6] McCann, Robert. Duke Mathematical Journal.
[7] Pavliotis, G. A.
[8] Lipman, Yaron and Chen, Ricky T. Q. and Ben-Hamu, Heli and Nickel, Maximilian and Le, Matt. https://arxiv.org/abs/2210.02747.
[9] Rockafellar, R. Tyrrell and Wets, Roger J.-B.
[10] Santambrogio, F.
[11] Chen, Yongxin and Georgiou, Tryphon T. SIAM Review.
[12] Vershik, A. M. Russian Mathematical Surveys.
[13] Kantorovich, L. V. Uspekhi Matematicheskikh Nauk.
[14] Bogachev, V. I. and R… Kolmogorov Problems on Equations for Stationary and Transition Probabilities of Diffusion Processes. 2023.
[15] Pratelli, Aldo. Annales de l'Institut Henri Poincaré (C) Analyse Non Linéaire.
[16] Bogachev, V. I. and Kalinin, A. N. and Popova, S. N. Journal of Mathematical Sciences, 2019.
[17] Ambrosio, L. and Pratelli, A. Optimal Transportation and Applications, 2003.
[18] Vaserstein, Leonid N. Problemy Peredachi Informatsii.
[19] Dobrushin, R. L. Theory of Probability and Its Applications.
[20] Fréchet, M. C. R. Acad. Sci. Paris.
[21] Mart… Wasserstein Generative Adversarial Networks. 2017.
[22] Ishaan Gulrajani and Faruk Ahmed and Mart… Improved Training of Wasserstein GANs. 2017.
[23] Brian D. O. Anderson. Stochastic Processes and their Applications, 1982.
[24] Rockafellar, R. Tyrrell and Wets, Roger J.-B. Pacific Journal of Mathematics.
[25] Generative Modeling with Optimal Transport Maps. arXiv preprint arXiv:2110.02999.
[26] Existence, Duality, and Cyclical Monotonicity for Weak Transport Costs. Calculus of Variations and Partial Differential Equations, 2019.
[27] Rockafellar, R. Tyrrell. Nonlinear Operators and the Calculus of Variations, 1976.
[28] Kallenberg, Olav. 2002.
[29] Sinkhorn Distances: Lightspeed Computation of Optimal Transport. Advances in Neural Information Processing Systems.
[30] Sinkhorn, Richard. The American Mathematical Monthly.
[31] Training-Free Voice Conversion with Factorized Optimal Transport. Interspeech.
[32] Optimal Transport for Domain Adaptation. IEEE TPAMI.
[33] X-vectors: Robust DNN embeddings for speaker recognition. ICASSP.
[34] HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. Advances in Neural Information Processing Systems.
[35] DiffWave: A Versatile Diffusion Model for Audio Synthesis. International Conference on Learning Representations (ICLR).
[36] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
[37] Denoising Diffusion Probabilistic Models. 2020.
[38] Score-Based Generative Modeling through Stochastic Differential Equations. ICLR.
[39] Schr… Sitzungsberichte der Preussischen Akademie der Wissenschaften, Physikalisch-mathematische Klasse.
[40] Chetrite, Rapha… E. Schr… The European Physical Journal H.
[41] A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete and Continuous Dynamical Systems.
[42] Peluchetti, Stefano. arXiv preprint arXiv:2106.01357.
[43] Hitchcock, F. L. Journal of Mathematics and Physics, 1941.
[44] Kantorovich, Leonid V. 1939.
[45] Kantorovich, Leonid V. Management Science.
[46] Rubinshtein, G. S. Vestnik Leningrad University. Mathematics, Mechanics, Astronomy, 1958.
[47] Dantzig, George B.
[48] A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers. Proc. ICLR.
[49] Villani, C. Optimal Transport. Old and New, 2009.
[50] Neural Optimal Transport. Proc. ICLR.
[51] Optimal Transport Mapping via Input Convex Neural Networks. Proceedings of the 37th International Conference on Machine Learning, 2020.
[52] Do Neural Optimal Transport Solvers Work? A Continuous Wasserstein-2 Benchmark. NeurIPS.
[53] Optimal Transport Mapping via Input Convex Neural Networks. ICML.
[54] Wasserstein-2 Generative Networks. arXiv preprint arXiv:1909.13082.
[55] Semi-dual Regularized Optimal Transport. SIAM Review.
[56] A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers. arXiv.
[57] Parameter Tuning and Model Selection in Optimal Transport with Semi-dual Brenier Formulation. arXiv:2112.07275.
[58] Riemannian Neural Optimal Transport. arXiv:2602.03566.
[59] Three-Player Wasserstein GAN via Amortised Duality. arXiv.
[60] Wasserstein GAN with Quadratic Transport Cost. arXiv.
[61] Neural Monge Map Estimation and Its Applications. arXiv preprint arXiv:2106.03812.
[62] Generative Modeling through the Semi-dual Formulation of Unbalanced Optimal Transport. ICLR.
[63] … and Jalali, A. 2-Wasserstein approximation via restricted convex potentials with application to improved training for GANs. arXiv preprint arXiv:1902.07197.
[64] Ambrosio, L. Optimal Transportation and Applications, 2003.
[65] Overcoming Spurious Solutions in Semi-Dual Neural Optimal Transport: A Smoothing Approach for Learning the Optimal Transport Plan. NeurIPS.
[66] On… Proceedings of the Edinburgh Mathematical Society, 2011. doi:10.1017/S001309150800117X.
[67] The… Bull. Amer. Math. Soc., 2014. doi:10.1090/S0273-0979-2014-01459-4.
[68] (q,p)-Wasserstein GANs: Comparing Ground Metrics for Wasserstein GANs. arXiv preprint arXiv:1902.03642.
[69] The geometry of optimal transportation. Acta Math., 1996.
[70] Vershik, Anatoly M. The Mathematical Intelligencer, 2013.
[71] von Neumann, John. Annals of Mathematics.
[72] Martin Heusel and Hubert Ramsauer and Thomas Unterthiner and Bernhard Nessler and G… GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. 2017. arXiv:1706.08500.
[73] H… Minimax estimation of smooth optimal transport maps.
[74] Bogachev, Vladimir I. and Kolesnikov, Aleksandr V. Russian Math. Surveys.
[75] Figalli, Alessio and Kim, Young-Heon and McCann, Robert J. Archive for Rational Mechanics and Analysis.
[76] Evans, Lawrence C. Current developments in mathematics. Papers from the conference held in Cambridge, MA, USA, 1997.
[77] Neural optimal transport. arXiv preprint arXiv:2201.12220.
[78] Neural Monge map estimation and its applications. Transactions on Machine Learning Research.
[79] A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers. arXiv preprint arXiv:2502.01310.