Control, Optimal Transport and Neural Differential Equations in Supervised Learning

Minh-Binh Tran; Minh-Nhat Phung

arxiv: 2503.15105 · v4 · pith:VMNOQKRRnew · submitted 2025-03-19 · 🧮 math.NA · cs.LG · cs.NA · math.OC

Pith reviewed 2026-05-22 23:47 UTC · model grok-4.3

classification 🧮 math.NA · cs.LG · cs.NA · math.OC

keywords unbalanced optimal transport · neural differential equations · Sinkhorn algorithm · continuum limit · transport dynamics · Pearson divergence · convergence estimates

The pith

The paper constructs neural differential equations whose flows converge to the true unbalanced optimal transport dynamics in the continuum.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that starts from a discrete unbalanced optimal transport problem using Pearson divergence and generalizes it to the continuum setting. A numerical scheme modeled on the Sinkhorn algorithm solves the resulting minimization problem, with a proof of convergence and explicit error bounds. Numerical solutions supply vector fields that define a transport equation, from which a neural differential equation is constructed so that its flow approaches the true UOT dynamics under a suitable limiting regime. This construction supplies a rigorous bridge between discrete optimal transport computations and continuous neural models used in machine learning and control.
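To make the discrete starting point concrete, the sketch below sets up a small unbalanced transport problem with Pearson (chi-squared) marginal penalties and minimizes it by projected gradient descent. It is an illustration under stated assumptions, not the paper's method: the objective (transport cost plus one half of each Pearson penalty), the helper names, the toy data, and the use of projected gradient descent in place of the paper's Sinkhorn-type scheme are all choices made here for illustration.

```python
import numpy as np

def pearson(p, q):
    """Pearson chi-squared divergence sum_i (p_i - q_i)^2 / q_i, for q > 0."""
    return float(np.sum((p - q) ** 2 / q))

def uot_objective(P, C, a, b):
    """Assumed objective: <C, P> + 1/2 chi2(P 1 | a) + 1/2 chi2(P^T 1 | b)."""
    return (float(np.sum(C * P))
            + 0.5 * pearson(P.sum(axis=1), a)
            + 0.5 * pearson(P.sum(axis=0), b))

def uot_gradient(P, C, a, b):
    """Gradient of the assumed objective with respect to the plan P."""
    row = (P.sum(axis=1) - a) / a
    col = (P.sum(axis=0) - b) / b
    return C + row[:, None] + col[None, :]

def solve_uot_pgd(C, a, b, n_iter=2000):
    """Projected gradient descent onto P >= 0 (a stand-in solver, not Sinkhorn)."""
    n, m = C.shape
    # 1 / (upper bound on the Lipschitz constant of the gradient)
    step = 1.0 / (m / a.min() + n / b.min())
    P = np.outer(a, b)
    for _ in range(n_iter):
        P = np.maximum(P - step * uot_gradient(P, C, a, b), 0.0)
    return P

# Hypothetical toy data: two 1D grids whose total masses differ (1.0 vs 1.2).
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 6)
C = (x[:, None] - y[None, :]) ** 2
a = np.full(5, 1.0 / 5)
b = np.full(6, 1.2 / 6)

P0 = np.outer(a, b)
P = solve_uot_pgd(C, a, b)
print("objective at start:", uot_objective(P0, C, a, b))
print("objective at end:  ", uot_objective(P, C, a, b))
```

Because the marginals are only penalized rather than constrained, the two measures may carry different total mass, which is what distinguishes the unbalanced problem from classical optimal transport.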

Core claim

We develop a novel framework for approximating unbalanced optimal transport (UOT) in the continuum using Neural ODEs. By generalizing a discrete UOT problem with Pearson divergence, we constructively design vector fields for Neural ODEs that converge to the true UOT dynamics. We design a numerical scheme inspired by the Sinkhorn algorithm to solve the corresponding minimization problem and rigorously prove its convergence, providing explicit error estimates. From the obtained numerical solutions, we derive vector fields defining the transport dynamics and construct the corresponding transport equation. Finally, from the numerically obtained transport equation, we construct a neural ODE whose flow converges to the true transport dynamics in an appropriate limiting regime.

What carries the argument

The Sinkhorn-inspired numerical scheme that produces discrete solutions from which vector fields are extracted to define both the transport equation and the limiting neural differential equation.
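For orientation, here is a minimal sketch of the classical entropic Sinkhorn iteration for balanced discrete optimal transport (reference [22]), the algorithm the paper's scheme is described as being modeled on. It is not the paper's Pearson-divergence scheme; the function name, the regularization value `eps`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iter=200):
    """Classical entropic Sinkhorn for balanced OT (a and b have equal mass).

    Alternately rescales rows and columns of the Gibbs kernel K = exp(-C/eps)
    so that the plan diag(u) K diag(v) matches the marginals a and b.
    """
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)      # enforce row sums equal to a
        v = b / (K.T @ u)    # enforce column sums equal to b
    return u[:, None] * K * v[None, :]

# Hypothetical toy data: uniform weights on two 1D grids, squared-distance cost.
x = np.linspace(0.0, 1.0, 6)
y = np.linspace(0.0, 1.0, 8)
C = (x[:, None] - y[None, :]) ** 2
a = np.full(6, 1.0 / 6)
b = np.full(8, 1.0 / 8)

P = sinkhorn(a, b, C)
print("max row-sum error:", np.abs(P.sum(axis=1) - a).max())
print("max col-sum error:", np.abs(P.sum(axis=0) - b).max())
```

The alternating row and column rescaling is the structural feature that a Sinkhorn-type scheme inherits; in the unbalanced setting the exact marginal constraints are replaced by divergence penalties.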

If this is right

Explicit error estimates from the Sinkhorn scheme give quantitative control on how well the neural ODE approximates the transport.
The derived transport equation supplies a continuous dynamical model that can be inserted into supervised learning pipelines.
The limiting convergence justifies using the neural ODE as a practical surrogate for solving continuum UOT problems.
The same construction extends the classical Sinkhorn method from discrete to continuous unbalanced settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The framework may allow Neural ODEs to replace separate optimal transport solvers inside end-to-end training loops.
Because the method produces explicit vector fields, it could be combined with control-theoretic objectives that act on the same dynamics.
Numerical checks on low-dimensional examples would directly test whether the proven convergence rate appears in practice.
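One way such a numerical check could be set up is to record the error at successive values of an index L and fit a geometric decay to it. The sketch below assumes the errors behave like C · r^(L/2), mirroring the form of the bound quoted in the Lean section further down; the meaning of L and r in the paper is not given on this page, so the function name and the synthetic data are hypothetical.

```python
import numpy as np

def fit_geometric_rate(indices, errors):
    """Fit errors ~ C * r**(L/2) by least squares on log(errors).

    log(error) = log(C) + (L/2) * log(r), so the slope in L equals log(r)/2.
    Returns the estimated (r, C).
    """
    slope, intercept = np.polyfit(indices, np.log(errors), 1)
    return float(np.exp(2.0 * slope)), float(np.exp(intercept))

# Synthetic errors generated with r = 0.64 and C = 3.0 (not measured data).
L = np.arange(1, 21)
errors = 3.0 * 0.64 ** (L / 2)

r_hat, C_hat = fit_geometric_rate(L, errors)
print("estimated r:", r_hat)
print("estimated C:", C_hat)
```

On real solver output the fitted r would be compared with the rate predicted by the theory; a fitted value well above the predicted one would indicate that the bound is not being attained.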

Load-bearing premise

The vector fields obtained by generalizing the discrete Pearson-divergence UOT problem to the continuum actually generate a neural ODE flow that converges to the true transport dynamics.

What would settle it

Compute the trajectory distance between the flow of the constructed neural ODE and the known solution of the continuous UOT problem on a simple test density; check whether this distance tends to zero under the stated limiting regime.
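A toy version of this test can be sketched in one dimension, where the balanced optimal transport map between two Gaussians is affine and known in closed form. Everything here is an assumption made for illustration: the balanced (not unbalanced) setting, the particular Gaussians, the grid-sampled velocity field standing in for a numerically obtained field, and the forward Euler integrator standing in for the neural ODE flow.

```python
import numpy as np

# Balanced OT map from N(0, 1) to N(1, 4) in 1D: T(x) = A * x + B.
A, B = 2.0, 1.0

def exact_map(x):
    return A * x + B

def velocity(t, y):
    """Velocity of the displacement interpolation T_t = (1 - t) id + t T.

    A particle at position y at time t started from x = (y - t*B) / (1 - t + t*A)
    and moves with constant velocity T(x) - x = (A - 1) * x + B.
    """
    x = (y - t * B) / ((1.0 - t) + t * A)
    return (A - 1.0) * x + B

def gridded_velocity(t, y, h):
    """Velocity sampled at the nearest point of a spatial grid with spacing h."""
    return velocity(t, np.round(y / h) * h)

def flow_error(h, n_steps=1000):
    """Max distance at t = 1 between the approximate flow and the exact map."""
    x0 = np.linspace(-3.0, 3.0, 201)
    y = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        y = y + dt * gridded_velocity(k * dt, y, h)   # forward Euler step
    return float(np.abs(y - exact_map(x0)).max())

for h in (0.5, 0.1, 0.01):
    print(f"grid spacing {h}: max endpoint error {flow_error(h):.4f}")
```

The quantity of interest is whether the endpoint error shrinks as the grid spacing h is refined; the same comparison applied to the paper's constructed field against a known continuum UOT solution would be the decisive check.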

Original abstract

We study the fundamental computational problem of approximating optimal transport (OT) equations using neural differential equations (Neural ODEs). More specifically, we develop a novel framework for approximating unbalanced optimal transport (UOT) in the continuum using Neural ODEs. By generalizing a discrete UOT problem with Pearson divergence, we constructively design vector fields for Neural ODEs that converge to the true UOT dynamics, thereby advancing the mathematical foundations of computational transport and machine learning. To this end, we design a numerical scheme inspired by the Sinkhorn algorithm to solve the corresponding minimization problem and rigorously prove its convergence, providing explicit error estimates. From the obtained numerical solutions, we derive vector fields defining the transport dynamics and construct the corresponding transport equation. Finally, from the numerically obtained transport equation, we construct a neural differential equation whose flow converges to the true transport dynamics in an appropriate limiting regime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper claims a new framework turning a discrete Pearson UOT problem into continuum Neural ODE dynamics via a Sinkhorn scheme with proofs, but the key generalization step looks thin on justification.

The main thing to know is that the authors start from a discrete unbalanced optimal transport problem using Pearson divergence, run a Sinkhorn-style iterative scheme on it, prove convergence with explicit error bounds, then extract vector fields from the numerical output to define a transport equation and finally a Neural ODE whose flow is asserted to recover the true continuum UOT dynamics in a limit. That pipeline is the claimed novelty. The discrete analysis appears to be standard numerical analysis work done carefully, and supplying error estimates is a plus that gives the scheme some concrete value. The construction also tries to make the link to Neural ODEs explicit rather than hand-wavy, which is better than many papers that just invoke the name.

The soft spot sits exactly where the stress-test note flags it. The abstract says the vector fields are “constructively designed” to converge to the true dynamics, yet the discrete convergence result does not by itself control the discretization error when passing to the continuum or verify that the resulting fields satisfy the continuous first-order optimality conditions. Without seeing the full derivations it is unclear whether they close that gap with additional estimates or simply assume the limit behaves well. That step carries the load, so if the details are missing or informal the central claim weakens.

The paper is aimed at people working at the overlap of optimal transport numerics, Neural ODEs, and supervised learning models that use transport costs. A reader already familiar with Sinkhorn and Neural ODE literature will see the intended contribution quickly. It shows honest engagement with the tools it builds on and does not appear circular or self-referential. I would send it to peer review so that experts can check whether the continuum limit argument is actually carried through with the necessary controls.

Referee Report

1 major / 2 minor

Summary. The manuscript develops a novel framework for approximating unbalanced optimal transport (UOT) in the continuum using Neural ODEs. It generalizes a discrete UOT problem with Pearson divergence to constructively design vector fields for Neural ODEs claimed to converge to the true UOT dynamics. A Sinkhorn-inspired numerical scheme is designed to solve the minimization problem, with rigorous convergence proofs and explicit error estimates provided. From the numerical solutions, vector fields and the corresponding transport equation are derived, and a Neural ODE is constructed whose flow converges to the true transport dynamics in an appropriate limiting regime.

Significance. If the generalization and convergence results hold, the work would strengthen the mathematical foundations linking discrete optimal transport algorithms to continuous Neural ODE models, with the rigorous convergence proofs and explicit error estimates for the discrete scheme constituting a clear strength. This could have implications for computational transport problems in machine learning.

major comments (1)

[Abstract (generalization from discrete to continuum)] The load-bearing generalization step from the discrete Pearson UOT problem (for which the Sinkhorn-style scheme has convergence and error bounds) to continuum vector fields is not justified in a manner that controls discretization error or verifies that the derived vector fields satisfy the continuum optimality conditions (such as first-order conditions involving the c-transform or unbalanced marginal constraints). The discrete analysis does not automatically transfer to the claimed convergence of the Neural ODE flow to true UOT dynamics.

minor comments (2)

[Abstract] The abstract is information-dense; separating the discrete scheme, continuum generalization, and Neural ODE construction into distinct sentences would improve readability.
Notation for the Pearson divergence and the limiting regime should be introduced with explicit definitions early in the text to aid readers unfamiliar with the specific UOT variant.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback on the manuscript. We address the major comment below.

Referee: [Abstract (generalization from discrete to continuum)] The load-bearing generalization step from the discrete Pearson UOT problem (for which the Sinkhorn-style scheme has convergence and error bounds) to continuum vector fields is not justified in a manner that controls discretization error or verifies that the derived vector fields satisfy the continuum optimality conditions (such as first-order conditions involving the c-transform or unbalanced marginal constraints). The discrete analysis does not automatically transfer to the claimed convergence of the Neural ODE flow to true UOT dynamics.

Authors: We agree that the passage from the discrete Pearson UOT problem to the continuum vector fields and Neural ODE flow requires explicit justification, including discretization error control and verification that the derived fields satisfy continuum optimality conditions. The manuscript constructs the vector fields from the discrete dual potentials obtained via the Sinkhorn scheme on successively refined grids, then defines the transport equation and Neural ODE to match the interpolated field, with convergence claimed in the joint limit of grid size to zero and Sinkhorn iterations to infinity. However, the current text does not provide a detailed derivation showing how the discrete first-order conditions (via the Pearson divergence) pass to the continuum c-transform conditions or unbalanced marginal constraints, nor does it supply explicit bounds on the discretization error. We will revise the manuscript by adding a dedicated subsection that derives the continuum optimality conditions from the discrete ones and establishes the necessary error estimates under suitable regularity assumptions on the data.

revision: yes

Circularity Check

0 steps flagged

No circularity detected; derivation chain is self-contained

The paper first solves a discrete UOT minimization problem via a Sinkhorn-inspired scheme, proves its convergence with explicit error estimates, then generalizes the obtained solutions to construct continuum vector fields, a transport equation, and finally a Neural ODE whose flow is stated to converge to the true dynamics in a limiting regime. None of these steps reduce the claimed continuum convergence result to the discrete inputs by construction, self-definition, or self-citation chain. The discrete proof stands independently, and the generalization step is presented as a constructive design rather than a tautological renaming or fitted-parameter prediction. The derivation therefore qualifies as self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available; no specific free parameters, invented entities, or non-standard axioms are mentioned. Standard mathematical assumptions for ODE flows and numerical convergence are implicitly used.

axioms (1)

standard math: Standard assumptions on existence, uniqueness, and convergence for solutions of neural differential equations and numerical schemes for minimization problems.
Invoked to support the claimed convergence of the vector fields and the numerical scheme to true UOT dynamics.

pith-pipeline@v0.9.0 · 5685 in / 1336 out tokens · 38372 ms · 2026-05-22T23:47:41.939157+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We fully generalize the UOT problem considered in [59] … to the continuum case … replace KL divergence with Pearson divergence … $d_C(f,g) := \inf$ … $+ \tfrac{1}{2}F(\gamma_x \mid fL) + \tfrac{1}{2}F(\gamma_y \mid gL)$

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Numerical Sinkhorn-type Algorithm … $\|k_j^* - \bar{k}_j^*\|_{L^2} \lesssim r^{L/2}$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages · 1 internal anchor

[1] A. Agrachev and A. Sarychev. Control on the manifolds of mappings with a view to the deep learning. Journal of Dynamical and Control Systems, 28(4):989–1008, 2022.
[2] A. Alcalde, G. Fantuzzi, and E. Zuazua. Clustering in pure-attention hardmax transformers and its role in sentiment analysis. arXiv preprint arXiv:2407.01602, 2024.
[3] J. Altschuler, F. Bach, A. Rudi, and J. Niles-Weed. Massively scalable sinkhorn distances via the nyström method. Advances in neural information processing systems, 32, 2019.
[4] J. Altschuler, J. Niles-Weed, and P. Rigollet. Near-linear time approximation algorithms for optimal transport via sinkhorn iteration. Advances in neural information processing systems, 30, 2017.
[5] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. International conference on machine learning, 2017.
[6] R. Baghel and S. Mondal. Inequality restricted minimum density power divergence estimation for panel count data. arXiv preprint arXiv:2503.21534, 2024.
[7] Y. Balaji, R. Chellappa, and S. Feizi. Robust optimal transport with applications in generative modeling. arXiv preprint, 2020.
[8] J.-D. Benamou, B. D. Froese, and A. M. Oberman. Two numerical methods for the elliptic Monge-Ampère equation. ESAIM: Mathematical Modelling and Numerical Analysis, 44(4):737–758, 2010.
[9] J.-D. Benamou, B. D. Froese, and A. M. Oberman. Numerical solution of the optimal transportation problem using the monge–ampère equation. Journal of Computational Physics, 260:107–126, 2014.
[10] M. Benning, E. Celledoni, M. J. Ehrhardt, B. Owren, and C.-B. Schönlieb. Deep learning as optimal control problems: Models and numerical methods. arXiv preprint arXiv:1904.05657, 2019.
[11] R. J. Berman. The sinkhorn algorithm, parabolic optimal transport and geometric monge–ampère equations. Numerische Mathematik, 145(4):771–836, 2020.
[12] M. Blondel, V. Seguy, and A. Rolet. Smooth and sparse optimal transport. Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 84:880–889, 2018. PMLR.
[13] Y. Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Communications on Pure and Applied Mathematics, 44(4):375–417, 1991.
[14] S. Brenner, L.-Y. Sung, Z. Tan, and H. Zhang. A nonlinear least-squares convexity enforcing C^0 interior penalty method for the monge–ampère equation on strictly convex smooth planar domains. Communications of the American Mathematical Society, 4(14):607–640, 2024.
[15] T. Bui-Thanh. A unified and constructive framework for the universality of neural networks. IMA Journal of Applied Mathematics, 89(1):197–230, 2024.
[16] L. A. Caffarelli and R. J. McCann. Free boundaries in optimal transport and monge-ampère obstacle problems. Annals of Mathematics, 171(2):673–730, 2010.
[17] L. Chapel, R. Flamary, H. Wu, C. Févotte, and G. Gasso. Unbalanced optimal transport through non-negative penalized linear regression. Advances in Neural Information Processing Systems, 34:23270–23282, 2021.
[18] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural ordinary differential equations. Advances in neural information processing systems, 31, 2018.
[19] L. Chizat, G. Peyré, B. Schmitzer, and F.-X. Vialard. Scaling algorithms for unbalanced optimal transport problems. Mathematics of computation, 87(314):2563–2609, 2018.
[20] Charles K. Chui and Xin Li. Approximation by ridge functions and neural networks with one hidden layer. J. Approx. Theory, 70(2):131–141, August 1992.
[21] I. Csiszár and P. C. Shields. Information theory and statistics: A tutorial. Foundations and Trends in Communications and Information Theory, 1(4):417–528, 2004.
[22] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems, 26, 2013.
[23] C. Villani. Topics in Optimal Transportation, volume 58. Graduate Studies in Mathematics, 2003.
[24] C. Villani. Optimal Transport: Old and New. Springer Berlin, Heidelberg, 2008.
[25] D. Cordero-Erausquin, W. Gangbo, and C. Houdré. Inequalities for generalized entropy and optimal transportation. Contemp. Math., 353, 05 2003.
[26] G. De Philippis and A. Figalli. Second order stability for the monge–ampère equation and strong sobolev convergence of optimal transport maps. Analysis and PDE, 6:993–1000, August 2013.
[27] G. De Philippis and A. Figalli. W^{2,1} regularity for solutions of monge-ampère equation. Inventiones Mathematicae, 192:55–60, April 2013.
[28] R. J. DiPerna and P. L. Lions. Ordinary differential equations, transport theory and sobolev spaces. Inventiones Mathematicae, 98:511–547, October 1989.
[29] J. Dolbeault, B. Nazaret, and G. Savaré. A new class of transport distances between measures. Calculus of Variations and Partial Differential Equations, 34(2):193–231, 2009.
[30] D. Ruiz-Balet and E. Zuazua. Neural ode control for classification, approximation, and transport. SIAM Review, 65(3):735–773, 2023.
[31] D. Ruiz-Balet and E. Zuazua. Control of neural transport for normalising flows. Journal de Mathématiques Pures et Appliquées, 181:58–90, 2024.
[32] J. Duchi and H. Namkoong. Learning models with uniform performance via distributionally robust optimization. arXiv preprint arXiv:1810.08750, 2018.
[33] K. Elamvazhuthi, B. Gharesifard, A. L. Bertozzi, and S. Osher. Neural ode control for trajectory approximation of continuity equation. IEEE Control Systems Letters, 6:3152–3157, 2022.
[34] K. Fatras, T. Séjourné, R. Flamary, and N. Courty. Unbalanced minibatch optimal transport; applications to domain adaptation. In International Conference on Machine Learning, pages 3186–3197. PMLR, 2021.
[35] C. Finlay, J.-H. Jacobsen, L. Nurbekyan, and A. Oberman. How to train your neural ode: the world of jacobian and kinetic regularization. In International conference on machine learning, pages 3154–3164. PMLR, 2020.
[36] F. Otto. The geometry of dissipative evolution equations: The porous medium equation. Communications in Partial Differential Equations, 26(1-2):101–174, 2001.
[37] C. Frogner, C. Zhang, H. Mobahi, M. Araya, and T. A. Poggio. Learning with a wasserstein loss. Advances in Neural Information Processing Systems, 2015.
[38] T. Fukunaga and H. Kasai. Block-coordinate frank-wolfe algorithm and convergence analysis for semi-relaxed optimal transport problem. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5433–5437. IEEE, 2022.
[39] G. B. Folland. Real Analysis: Modern Techniques and Their Application, 2nd Edition. John Wiley & Sons, 1999.
[40] A. Genevay, M. Cuturi, G. Peyré, and F. Bach. Stochastic optimization for large-scale optimal transport. Advances in neural information processing systems, 29, 2016.
[41] P. Gordaliza, E. Del Barrio, G. Fabrice, and J.-M. Loubes. Obtaining fairness using optimal transport theory. International Conference on Machine Learning, 2019.
[42] S. Guminov, P. Dvurechensky, N. Tupitsa, and A. Gasnikov. On a combination of alternating minimization and nesterov’s momentum. In International conference on machine learning, pages 3886–3898. PMLR, 2021.
[43] E. Haber and L. Ruthotto. Stable architectures for deep neural networks. Inverse problems, 34(1):014004, 2017.
[44] I. Ekeland and R. Témam. Convex analysis and variational problems. Society for Industrial and Applied Mathematics, 1999.
[45] J.-F. Jabir, D. Siska, and L. Szpruch. Mean-field neural odes via relaxed optimal control. arXiv preprint arXiv:1912.05475, 2019.
[46] I. Kobyzev, S. J. D. Prince, and M. A. Brubaker. Normalizing flows: An introduction and review of current methods. IEEE transactions on pattern analysis and machine intelligence, 43(11):3964–3979, 2020.
[47] L. A. Caffarelli. Boundary regularity of maps with convex potentials–ii. Annals of Mathematics, 144(3):453–496, 1996.
[48] G. Lan and Y. Zhou. Random gradient extrapolation for distributed and stochastic optimization. SIAM Journal on Optimization, 28(4):2753–2782, 2018.
[49] T. Le, Y. Yamada, and T. Q. Nguyen. Robustness in optimal transport: Beyond plug-and-play. arXiv preprint, 2021.
[50] J.-D. Lee, C. Lim, and S. J. Wright. On the convergence of primal-dual hybrid gradient algorithms for total variation image restoration. Journal of Mathematical Imaging and Vision, 61(2):236–250, 2019.
[51] Q. Li, L. Chen, and C. Tai. Maximum principle based algorithms for deep learning. Journal of Machine Learning Research, 18(165):1–29, 2018.
[52] M. Liero, A. Mielke, and G. Savaré. Optimal transport in competition with reaction: The hellinger–kantorovich distance and geodesic curves. SIAM Journal on Mathematical Analysis, 48(4):2869–2911, 2016.
[53] M. Liero, A. Mielke, and G. Savaré. Optimal transport in competition with reaction: The hellinger-kantorovich distance and the evolution of distributions. Archive for Rational Mechanics and Analysis, 225(1):417–465, 2017.
[54] L. V. Kantorovich. On the translocation of masses. J. Math. Sci., 133:1381–1382, 2006.
[55] S. Maniglia. Probabilistic representation and uniqueness results for measure-valued solutions of transport equations. Journal de Mathématiques Pures et Appliquées, 87(6):601–626, 2007.
[56] Declan McNamara, Jackson Loper, and Jeffrey Regier. Sequential monte carlo for inclusive kl minimization in amortized variational inference. In Proceedings of the 41st International Conference on Machine Learning, 2024.
[57] H. N. Mhaskar. On the degree of approximation in multivariate weighted approximation. In Martin D. Buhmann and Detlef H. Mache, editors, Advanced Problems in Constructive Approximation, pages 129–141, Basel, 2003. Birkhäuser Basel.
[58] M. Liero, A. Mielke, and G. Savaré. Optimal entropy-transport problems and a new hellinger-kantorovich distance between positive measures. Invent. math., 211:969–1117, 03 2018.
[59] Q. M. Nguyen, H. H. Nguyen, Y. Zhou, and L. M. Nguyen. On unbalanced optimal transport: Gradient methods, sparsity and approximation error. J. Mach. Learn. Res., 24:384:1–384:41, 2022.
[60] S. Nowozin, B. Cseke, and R. Tomioka. f-gan: Training generative neural samplers using variational divergence minimization. In Advances in Neural Information Processing Systems (NeurIPS), pages 271–279, 2016.
[61] C. H. Papadimitriou and K. Steiglitz. Combinatorial optimization: algorithms and complexity. Courier Corporation, 1998.
[62] S. E. Reed and R. J. Marks II. On the effectiveness of the pearson chi-square test for neural network optimization. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, volume 6, pages 4025–4029. IEEE, 1999.
[63] R. J. McCann. Existence and uniqueness of monotone measure-preserving maps. Duke Math. J., 80-2:309–323, 11 1995.
[64] M. E. Sander, P. Ablin, M. Blondel, and G. Peyré. Momentum residual neural networks. In International Conference on Machine Learning, pages 9276–9287. PMLR, 2021.
[65] M. Scetbon, M. Cuturi, and G. Peyré. Low-rank sinkhorn factorization. In International Conference on Machine Learning, pages 9344–9354. PMLR, 2021.
[66] G. Schiebinger, J. Shu, M. Tabaka, B. Cleary, V. Subramanian, A. Solomon, J. Gould, S. Liu, S. Lin, P. Berube, L. Lee, J. Chen, J. Brumbaugh, P. Rigollet, K. L. Hoehn, O. Rozenblatt-Rosen, A. Regev, and E. S. Lander. Optimal-transport analysis of single-cell gene expression data. Nature, 566(7744):380–385, 2019.
[67] B. Schmitzer. Stabilized sparse scaling algorithms for entropy regularized transport problems. SIAM Journal on Scientific Computing, 41(3):A1443–A1481, 2019.
[68] T. Séjourné, F.-X. Vialard, and G. Peyré. Faster unbalanced optimal transport: Translation invariant sinkhorn and 1-d frank-wolfe. In International Conference on Artificial Intelligence and Statistics, pages 4995–5021. PMLR, 2022.
[69] T. Si, Y. Wang, L. Zhang, E. Richmond, T.-H. Ahn, and H. Gong. Multivariate time series change-point detection with a novel pearson-like scaled bregman divergence. Stats, 7(2):462–480, 2024.
[70] S. Simons. Minimax and Monotonicity. Springer Berlin, Heidelberg, 1998.
[71] R. Sinkhorn. Diagonal equivalence to matrices with prescribed row and column sums. ii. Proceedings of the American Mathematical Society, 45(2):195–198, 1974.
[72] X. Su and H. Kasai. Accelerating unbalanced optimal transport problem using dynamic penalty updating. In 2024 International Joint Conference on Neural Networks (IJCNN), pages 1–6. IEEE, 2024.
[73] M. Sugiyama, T. Suzuki, and T. Kanamori. Density Ratio Estimation in Machine Learning. Cambridge University Press, 2012.
[74] P. Tabuada and B. Gharesifard. Universal approximation power of deep residual neural networks via nonlinear control theory. arXiv preprint arXiv:2007.06007, 2020.
[75] I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schoelkopf. Wasserstein auto-encoders. International Conference on Learning Representations, 2018.
[76] B. Wang, Z. Shi, and S. Osher. Resnets ensemble via the feynman-kac formalism to improve natural and robust accuracies. Advances in Neural Information Processing Systems, 32, 2019.
[77] E. Weinan. A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics, 5(1):1–11, 2017.
[78] K. D. Yang and C. Uhler. Scalable unbalanced optimal transport using generative adversarial networks. International Conference on Learning Representations (ICLR), 2019. OpenReview.net.
[79] H. Zimmermann, C. A. Naesseth, and J.-W. van de Meent. Variational inference with sequential sample-average approximations. In Advances in Neural Information Processing Systems, 2024.
[80] E. Zuazua. Progress and future directions in machine learning through control theory. In FGS 2024 French-German-Spanish Conference on Optimization, 2024.

Showing first 80 references.


work page 2018

[13] [13]

Y. Brenier. Polar factorization and monotone rearrang ement of vector-valued functions. Communications on Pure and Applied Mathematics , 44(4):375–417, 1991

work page 1991

[14] [14]

Brenner, L.-Y

S. Brenner, L.-Y. Sung, Z. Tan, and H. Zhang. A nonlinear least-squares convexity enforcing co interior penalty method for the monge–amp` ere equation on strictly convex sm ooth planar domains. Communications of the American Mathematical Society , 4(14):607–640, 2024

work page 2024

[15] [15]

Bui-Thanh

T. Bui-Thanh. A uniﬁed and constructive framework for t he universality of neural networks. IMA Journal of Applied Mathematics, 89(1):197–230, 2024

work page 2024

[16] [16]

L. A. Caﬀarelli and R. J. McCann. Free boundaries in opti mal transport and monge-amp` ere obstacle problems. Annals of Mathematics , 171(2):673–730, 2010

work page 2010

[17] [17]

Chapel, R

L. Chapel, R. Flamary, H. Wu, C. F´ evotte, and G. Gasso. U nbalanced optimal transport through non-negative penalized linear regression. Advances in Neural Information Processing Systems , 34:23270–23282, 2021

work page 2021

[18] [18]

R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duve naud. Neural ordinary diﬀerential equations. Advances in neural information processing systems , 31, 2018

work page 2018

[19] [19]

Chizat, G

L. Chizat, G. Peyr´ e, B. Schmitzer, and F.-X. Vialard. S caling algorithms for unbalanced optimal transport problems. Mathematics of computation , 87(314):2563–2609, 2018

work page 2018

[20] [20]

Chui and Xin Li

Charles K. Chui and Xin Li. Approximation by ridge funct ions and neural networks with one hidden layer. J. Approx. Theory, 70(2):131–141, August 1992

work page 1992

[21] [21]

Csisz´ ar and P

I. Csisz´ ar and P. C. Shields. Information theory and st atistics: A tutorial. Foundations and Trends in Commu- nications and Information Theory , 1(4):417–528, 2004

work page 2004

[22] [22]

M. Cuturi. Sinkhorn distances: Lightspeed computatio n of optimal transport. Advances in neural information processing systems, 26, 2013

work page 2013

[23] [23]

Topics in Optimal Transportation , volume 58

C.Villani. Topics in Optimal Transportation , volume 58. Graduate Studies in Mathematics, 2003

work page 2003

[24] [24]

Optimal Transport: Old and New

C.Villani. Optimal Transport: Old and New . Springer Berlin, Heidelberg, 2008

work page 2008

[25] [25]

Inequ alities for generalized entropy and optimal transporta- tion

D.Cordero-Erausquin, W.Gangbo, and C.Houdr´ e. Inequ alities for generalized entropy and optimal transporta- tion. Contemp. Math. , 353, 05 2003

work page 2003

[26] [26]

De Philippis and A

G. De Philippis and A. Figalli. Second order stability f or the monge–amp` ere equation and strong sobolev con- vergence of optimal transport maps. Analysis and PDE , 6:993–1000, August 2013

work page 2013

[27] [27]

De Philippis and A

G. De Philippis and A. Figalli. W 2, 1 regularity for solutions of monge-amp` ere equation. Inventiones Mathemat- icae, 192:55–60, April 2013

work page 2013

[28] [28]

DiPerna and P.L

R.J. DiPerna and P.L. Lions. Ordinary diﬀerential equa tions, transport theory and sobolev spaces. Inventiones Mathematicae, 98:511–547, October 1989

work page 1989

[29] [29]

Dolbeault, B

J. Dolbeault, B. Nazaret, and G. Savar´ e. A new class of t ransport distances between measures. Calculus of Variations and Partial Diﬀerential Equations , 34(2):193–231, 2009

work page 2009

[30] [30]

Neural ode control for clas siﬁcation, approximation, and transport

D.Ruiz-Balet and E.Zuazua. Neural ode control for clas siﬁcation, approximation, and transport. SIAM Review , 65(3):735–773, 2023

work page 2023

[31] [31]

Control of neural transpor t for normalising ﬂows

D.Ruiz-Balet and E.Zuazua. Control of neural transpor t for normalising ﬂows. Journal de Math´ ematiques Pures et Appliqu´ ees, 181:58–90, 2024

work page 2024

[32] [32]

Duchi and H

J. Duchi and H. Namkoong. Learning models with uniform p erformance via distributionally robust optimization. arXiv preprint arXiv:1810.08750 , 2018

work page arXiv 2018

[33] [33]

Elamvazhuthi, B

K. Elamvazhuthi, B. Gharesifard, A. L. Bertozzi, and S. Osher. Neural ode control for trajectory approximation of continuity equation. IEEE Control Systems Letters , 6:3152–3157, 2022

work page 2022

[34] [34]

Fatras, T

K. Fatras, T. S´ ejourn´ e, R. Flamary, and N. Courty. Unb alanced minibatch optimal transport; applications to domain adaptation. In International Conference on Machine Learning , pages 3186–3197. PMLR, 2021

work page 2021

[35] [35]

Finlay, J.-H

C. Finlay, J.-H. Jacobsen, L. Nurbekyan, and A. Oberman . How to train your neural ode: the world of jacobian and kinetic regularization. In International conference on machine learning , pages 3154–3164. PMLR, 2020

work page 2020

[36] [36]

The geometry of dissipative evolution equatio ns: The porous medium equation

F.Otto. The geometry of dissipative evolution equatio ns: The porous medium equation. Communications in Partial Diﬀerential Equations , 26(1-2):101–174, 2001

work page 2001

[37] [37]

Frogner, C

C. Frogner, C. Zhang, H. Mobahi, M. Araya, and T. A. Poggi o. Learning with a wassenstein loss. Advances in Neural Information Processing Systems , 2015

work page 2015

[38] [38]

Fukunaga and H

T. Fukunaga and H. Kasai. Block-coordinate frank-wolf e algorithm and convergence analysis for semi-relaxed optimal transport problem. In ICASSP 2022-2022 IEEE International Conference on Acousti cs, Speech and Signal Processing (ICASSP) , pages 5433–5437. IEEE, 2022. CONTROLS, OPTIMAL TRANSPORT, NEURAL NETWORKS 43

work page 2022

[39] [39]

Real Analysis: Modern Techniques and Their Application, 2n d Edition

G.B.Folland. Real Analysis: Modern Techniques and Their Application, 2n d Edition . John Wiley & Sons, 1999

work page 1999

[40] [40]

Genevay, M

A. Genevay, M. Cuturi, G. Peyr´ e, and F. Bach. Stochasti c optimization for large-scale optimal transport. Ad- vances in neural information processing systems , 29, 2016

work page 2016

[41] [41]

Gordaliza, E

P. Gordaliza, E. Del Barrio, G. Fabrice, and J.-M. Loube s. Obtaining fairness using optimal transport theory. International Conference on Machine Learning , 2019

work page 2019

[42] [42]

Guminov, P

S. Guminov, P. Dvurechensky, N. Tupitsa, and A. Gasniko v. On a combination of alternating minimization and nesterov’s momentum. In International conference on machine learning , pages 3886–3898. PMLR, 2021

work page 2021

[43] [43]

Haber and L

E. Haber and L. Ruthotto. Stable architectures for deep neural networks. Inverse problems , 34(1):014004, 2017

work page 2017

[44] [44]

T´ emam I

R. T´ emam I. Ekeland.Convex analysis and variational problems . Society for Industrial and Applied Mathematics, 1999

work page 1999

[45] [45]

Jabir, D

J.-F. Jabir, D. Siska, and L. Szpruch. Mean-ﬁeld neural odes via relaxed optimal control. arXiv preprint arXiv:1912.05475, 2019

work page arXiv 1912

[46] [46]

Kobyzev, S

I. Kobyzev, S. J. D. Prince, and M. A. Brubaker. Normaliz ing ﬂows: An introduction and review of current methods. IEEE transactions on pattern analysis and machine intellig ence, 43(11):3964–3979, 2020

work page 2020

[47] [47]

Boundary regularity of maps with conve x potentials–ii

L.A.Caﬀarelli. Boundary regularity of maps with conve x potentials–ii. Annals of Mathematics , 144(3):453–496, 1996

work page 1996

[48] [48]

Lan and Y

G. Lan and Y. Zhou. Random gradient extrapolation for di stributed and stochastic optimization. SIAM Journal on Optimization , 28(4):2753–2782, 2018

work page 2018

[49] [49]

T. Le, Y. Yamada, and T. Q. Nguyen. Robustness in optimal transport: Beyond plug-and-play. arXiv preprint , 2021

work page 2021

[50] [50]

J.-D. Lee, C. Lim, and S. J. Wright. On the convergence of primal-dual hybrid gradient algorithms for total variation image restoration. Journal of Mathematical Imaging and Vision , 61(2):236–250, 2019

work page 2019

[51] [51]

Q. Li, L. Chen, and C. Tai. Maximum principle based algor ithms for deep learning. Journal of Machine Learning Research, 18(165):1–29, 2018

work page 2018

[52] [52]

Liero, A

M. Liero, A. Mielke, and G. Savar´ e. Optimal transport i n competition with reaction: The hellinger–kantorovich distance and geodesic curves. SIAM Journal on Mathematical Analysis , 48(4):2869–2911, 2016

work page 2016

[53] [53]

Liero, A

M. Liero, A. Mielke, and G. Savar´ e. Optimal transport i n competition with reaction: The hellinger-kantorovich distance and the evolution of distributions. Archive for Rational Mechanics and Analysis , 225(1):417–465, 2017

work page 2017

[54] [54]

On the translocation of masses

L.V.Kantorovich. On the translocation of masses. J.Math.Sci., 133:1381–1382, 2006

work page 2006

[55] [55]

Maniglia

S. Maniglia. Probabilistic representation and unique ness results for measure-valued solutions of transport equ a- tions. Journal de Math´ ematiques Pures et Appliqu´ ees, 87(6):601–626, 2007

work page 2007

[56] [56]

Seq uential monte carlo for inclusive kl minimization in amortized variational inference

Declan McNamara, Jackson Loper, and Jeﬀrey Regier. Seq uential monte carlo for inclusive kl minimization in amortized variational inference. In Proceedings of the 41st International Conference on Machin e Learning, 2024

work page 2024

[57] [57]

H. N. Mhaskar. On the degree of approximation in multiva riate weighted approximation. In Martin D. Buhmann and Detlef H. Mache, editors, Advanced Problems in Constructive Approximation , pages 129–141, Basel, 2003. Birkh¨ auser Basel

work page 2003

[58] [58]

Optimal entropy-transp ort problems and a new hellinger - kantorovich distance between positive measures

G.Savar´ e M.Liero, A.Mielke. Optimal entropy-transp ort problems and a new hellinger - kantorovich distance between positive measures. Invent. math. , 211:969–1117, 03 2018

work page 2018

[59] [59]

Q. M. Nguyen, H. H. Nguyen, Y. Zhou, and L. M. Nguyen. On un balanced optimal transport: Gradient methods, sparsity and approximation error. J. Mach. Learn. Res. , 24:384:1–384:41, 2022

work page 2022

[60] [60]

Nowozin, B

S. Nowozin, B. Cseke, and R. Tomioka. f-gan: Training ge nerative neural samplers using variational divergence minimization. In Advances in Neural Information Processing Systems (NeurIP S), pages 271–279, 2016

work page 2016

[61] [61]

C. H. Papadimitriou and K. Steiglitz. Combinatorial optimization: algorithms and complexity . Courier Corpora- tion, 1998

work page 1998

[62] [62]

S. E. Reed and R. J. Marks II. On the eﬀectiveness of the pe arson chi-square test for neural network optimization. In Proceedings of the IEEE-INNS-ENNS International Joint Con ference on Neural Networks , volume 6, pages 4025–4029. IEEE, 1999

work page 1999

[63] [63]

Existence and uniqueness of monotone meas ure-preserving maps

R.J.McCann. Existence and uniqueness of monotone meas ure-preserving maps. Duke Math. J. , 80-2:309–323, 11 1995

work page 1995

[64] [64]

M. E. Sander, P. Ablin, M. Blondel, and G. Peyr´ e. Moment um residual neural networks. In International Conference on Machine Learning , pages 9276–9287. PMLR, 2021

work page 2021

[65] [65]

Scetbon, M

M. Scetbon, M. Cuturi, and G. Peyr´ e. Low-rank sinkhorn factorization. In International Conference on Machine Learning, pages 9344–9354. PMLR, 2021

work page 2021

[66] [66]

Schiebinger, J

G. Schiebinger, J. Shu, M. Tabaka, B. Cleary, V. Subrama nian, A. Solomon, J. Gould, S. Liu, S. Lin, P. Berube, L. Lee, J. Chen, J. Brumbaugh, P. Rigollet, K. L. Hoehn, O. Roz enblatt-Rosen, A. Regev, and E. S. Lander. Optimal-transport analysis of single-cell gene expressio n data. Nature, 566(7744):380–385, 2019

work page 2019

[67] [67]

Schmitzer

B. Schmitzer. Stabilized sparse scaling algorithms fo r entropy regularized transport problems. SIAM Journal on Scientiﬁc Computing , 41(3):A1443–A1481, 2019

work page 2019

[68] [68]

S´ ejourn´ e, F.-X

T. S´ ejourn´ e, F.-X. Vialard, and G. Peyr´ e. Faster unbalanced optimal transport: Translation invariant sinkhor n and 1-d frank-wolfe. In International Conference on Artiﬁcial Intelligence and St atistics, pages 4995–5021. PMLR, 2022. 44 M.-N. PHUNG AND M.-B. TRAN

work page 2022

[69] [69]

T. Si, Y. Wang, L. Zhang, E. Richmond, T.-H. Ahn, and H. Go ng. Multivariate time series change-point detection with a novel pearson-like scaled bregman divergence. Stats, 7(2):462–480, 2024

work page 2024

[70] [70]

S. Simon. Minimax and Mononicity . Springer Berlin, Heidelberg, 1998

work page 1998

[71] [71]

Sinkhorn

R. Sinkhorn. Diagonal equivalence to matrices with pre scribed row and column sums. ii. Proceedings of the American Mathematical Society , 45(2):195–198, 1974

work page 1974

[72] [72]

Su and H

X. Su and H. Kasai. Accelerating unbalanced optimal tra nsport problem using dynamic penalty updating. In 2024 International Joint Conference on Neural Networks (IJ CNN), pages 1–6. IEEE, 2024

work page 2024

[73] [73]

Sugiyama, T

M. Sugiyama, T. Suzuki, and T. Kanamori. Density Ratio Estimation in Machine Learning . Cambridge University Press, 2012

work page 2012

[74] [74]

Tabuada and B

P. Tabuada and B. Gharesifard. Universal approximatio n power of deep residual neural networks via nonlinear control theory. arXiv preprint arXiv:2007.06007 , 2020

work page arXiv 2007

[75] [75]

Tolstikhin, O

I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schoelkopf . Wasserstein auto-encoders. International Conference on Learning Representations, 2018

work page 2018

[76] [76]

B. Wang, Z. Shi, and S. Osher. Resnets ensemble via the fe ynman-kac formalism to improve natural and robust accuracies. Advances in Neural Information Processing Systems , 32, 2019

work page 2019

[77] [77]

E. Weinan. A proposal on machine learning via dynamical systems. Communications in Mathematics and Sta- tistics, 5(1):1–11, 2017

work page 2017

[78] [78]

K. D. Yang and C. Uhler. Scalable unbalanced optimal tra nsport using generative adversarial networks. Inter- national Conference on Learning Representations (ICLR) , 2019. OpenReview.net

work page 2019

[79] [79]

Zimmermann, C

H. Zimmermann, C. A. Naesseth, and J.-W. van de Meent. Va riational inference with sequential sample-average approximations. In Advances in Neural Information Processing Systems , 2024

work page 2024

[80] [80]

E. Zuazua. Progress and future directions in machine le arning through control theory. In FGS 2024 French- German-Spanish Conference on Optimization , 2024

work page 2024