
arxiv: 2503.15105 · v4 · pith:VMNOQKRRnew · submitted 2025-03-19 · 🧮 math.NA · cs.LG · cs.NA · math.OC

Control, Optimal Transport and Neural Differential Equations in Supervised Learning

Pith reviewed 2026-05-22 23:47 UTC · model grok-4.3

classification 🧮 math.NA · cs.LG · cs.NA · math.OC
keywords unbalanced optimal transport · neural differential equations · Sinkhorn algorithm · continuum limit · transport dynamics · Pearson divergence · convergence estimates

The pith

Neural differential equations are built whose flows converge to the true unbalanced optimal transport dynamics in the continuum.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that starts from a discrete unbalanced optimal transport problem using Pearson divergence and generalizes it to the continuum setting. A numerical scheme modeled on the Sinkhorn algorithm solves the resulting minimization problem, with a proof of convergence and explicit error bounds. Numerical solutions supply vector fields that define a transport equation, from which a neural differential equation is constructed so that its flow approaches the true UOT dynamics under a suitable limiting regime. This construction supplies a rigorous bridge between discrete optimal transport computations and continuous neural models used in machine learning and control.
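
For orientation, one common way to write a discrete UOT problem with a Pearson ($\chi^2$) marginal penalty is sketched below. The symbols $C$ (cost matrix), $a$, $b$ (source and target weights), $\tau$ (penalty strength), and the exact shape of the objective are assumptions made for illustration. The paper's own formulation may differ, for instance in its regularization or weighting.

```latex
\min_{P \in \mathbb{R}_{\ge 0}^{n \times m}}
  \langle C, P \rangle
  + \tau\, D_{\chi^2}\!\left(P\mathbf{1}_m \,\|\, a\right)
  + \tau\, D_{\chi^2}\!\left(P^{\top}\mathbf{1}_n \,\|\, b\right),
\qquad
D_{\chi^2}(p \,\|\, q) = \sum_i \frac{(p_i - q_i)^2}{q_i}.
```

Here the marginals of the plan $P$ are only penalized for deviating from $a$ and $b$, not constrained to equal them, which is what makes the problem unbalanced.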

Core claim

We develop a novel framework for approximating unbalanced optimal transport (UOT) in the continuum using Neural ODEs. By generalizing a discrete UOT problem with Pearson divergence, we constructively design vector fields for Neural ODEs that converge to the true UOT dynamics. We design a numerical scheme inspired by the Sinkhorn algorithm to solve the corresponding minimization problem and rigorously prove its convergence, providing explicit error estimates. From the obtained numerical solutions, we derive vector fields defining the transport dynamics and construct the corresponding transport equation. Finally, from the numerically obtained transport equation, we construct a neural ODE whose flow converges to the true transport dynamics in an appropriate limiting regime.

What carries the argument

The Sinkhorn-inspired numerical scheme that produces discrete solutions from which vector fields are extracted to define both the transport equation and the limiting neural differential equation.
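
As background, the classical Sinkhorn iteration for balanced, entropy-regularized optimal transport can be sketched as follows. This is not the paper's Pearson-divergence scheme, only the standard alternating scaling method that such schemes are modeled on. The toy histograms, the grid, and the parameter values (`eps`, `n_iter`) are hypothetical choices for illustration.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iter=500):
    """Classical balanced entropic Sinkhorn (alternating scaling).

    Background sketch only; NOT the paper's unbalanced Pearson scheme.
    """
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)               # rescale to match row marginals
        v = b / (K.T @ u)             # rescale to match column marginals
    return u[:, None] * K * v[None, :]  # transport plan

# Toy data (hypothetical): two histograms on a 1D grid
x = np.linspace(0.0, 1.0, 20)
a = np.full(20, 1.0 / 20)
b = np.exp(-((x - 0.7) ** 2) / 0.02)
b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2    # squared-distance cost

P = sinkhorn(a, b, C)
row_err = np.abs(P.sum(axis=1) - a).max()
col_err = np.abs(P.sum(axis=0) - b).max()
print(row_err, col_err)
```

In the balanced case both marginal errors shrink toward zero as the iteration proceeds. An unbalanced variant would replace the exact marginal rescaling with an update derived from the chosen divergence penalty.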

If this is right

  • Explicit error estimates from the Sinkhorn scheme give quantitative control on how well the neural ODE approximates the transport.
  • The derived transport equation supplies a continuous dynamical model that can be inserted into supervised learning pipelines.
  • The limiting convergence justifies using the neural ODE as a practical surrogate for solving continuum UOT problems.
  • The same construction extends the classical Sinkhorn method from discrete to continuous unbalanced settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework may allow Neural ODEs to replace separate optimal transport solvers inside end-to-end training loops.
  • Because the method produces explicit vector fields, it could be combined with control-theoretic objectives that act on the same dynamics.
  • Numerical checks on low-dimensional examples would directly test whether the proven convergence rate appears in practice.
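
One way such a numerical check could be set up is to estimate the observed convergence order from a sequence of errors at decreasing discretization sizes. The sketch below fits a slope in log-log coordinates. The helper name `observed_order` is hypothetical, and the error values are synthetic placeholders generated from a known power law, not results from the paper.

```python
import numpy as np

def observed_order(h, err):
    """Least-squares slope of log(err) against log(h)."""
    slope, _intercept = np.polyfit(np.log(h), np.log(err), 1)
    return slope

# Synthetic placeholder data: errors that decay exactly like h^2.
# In a real check, err would be measured from the numerical scheme.
h = np.array([0.1, 0.05, 0.025, 0.0125])
err = 3.0 * h ** 2

order = observed_order(h, err)
print(order)
```

Comparing the fitted slope against the rate predicted by the error estimates would indicate whether the theoretical bound is attained on the chosen example.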

Load-bearing premise

The vector fields obtained by generalizing the discrete Pearson-divergence UOT problem to the continuum actually generate a neural ODE flow that converges to the true transport dynamics.

What would settle it

Compute the trajectory distance between the flow of the constructed neural ODE and the known solution of the continuous UOT problem on a simple test density; check whether this distance tends to zero under the stated limiting regime.
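
A minimal sketch of this kind of check, under strong simplifying assumptions, is given below. It does not use the paper's construction. It uses a hypothetical 1D affine map `T(x) = A*x + B` whose straight-line trajectories are known in closed form, and it models the approximate vector field as the exact field plus a perturbation of size `eps`, so that `eps -> 0` stands in for the limiting regime. All names and constants are illustrative.

```python
import numpy as np

A, B = 2.0, 0.5   # hypothetical affine target map T(x) = A*x + B

def v_true(t, x):
    """Eulerian velocity of the straight-line interpolation x0 -> T(x0)."""
    x0 = (x - t * B) / (1.0 - t + t * A)   # recover the starting point
    return (A - 1.0) * x0 + B

def flow_error(eps, n_steps=200):
    """Max distance, over time and particles, between the explicit-Euler
    flow of the perturbed field and the exact reference trajectories."""
    x0 = np.linspace(-1.0, 1.0, 11)
    x = x0.copy()
    h = 1.0 / n_steps
    err = 0.0
    for k in range(n_steps):
        t = k * h
        # perturbed field: exact field plus an eps-sized disturbance
        x = x + h * (v_true(t, x) + eps * np.sin(x))
        exact = x0 + (t + h) * ((A - 1.0) * x0 + B)
        err = max(err, float(np.abs(x - exact).max()))
    return err

errors = [flow_error(eps) for eps in (1e-1, 1e-2, 1e-3)]
print(errors)
```

If the trajectory distance shrinks as the perturbation shrinks, the toy flow behaves as the convergence claim would require. A real test would replace `v_true` and the perturbation with the constructed neural ODE field and a reference UOT solution.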

Original abstract

We study the fundamental computational problem of approximating optimal transport (OT) equations using neural differential equations (Neural ODEs). More specifically, we develop a novel framework for approximating unbalanced optimal transport (UOT) in the continuum using Neural ODEs. By generalizing a discrete UOT problem with Pearson divergence, we constructively design vector fields for Neural ODEs that converge to the true UOT dynamics, thereby advancing the mathematical foundations of computational transport and machine learning. To this end, we design a numerical scheme inspired by the Sinkhorn algorithm to solve the corresponding minimization problem and rigorously prove its convergence, providing explicit error estimates. From the obtained numerical solutions, we derive vector fields defining the transport dynamics and construct the corresponding transport equation. Finally, from the numerically obtained transport equation, we construct a neural differential equation whose flow converges to the true transport dynamics in an appropriate limiting regime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript develops a novel framework for approximating unbalanced optimal transport (UOT) in the continuum using Neural ODEs. It generalizes a discrete UOT problem with Pearson divergence to constructively design vector fields for Neural ODEs claimed to converge to the true UOT dynamics. A Sinkhorn-inspired numerical scheme is designed to solve the minimization problem, with rigorous convergence proofs and explicit error estimates provided. From the numerical solutions, vector fields and the corresponding transport equation are derived, and a Neural ODE is constructed whose flow converges to the true transport dynamics in an appropriate limiting regime.

Significance. If the generalization and convergence results hold, the work would strengthen the mathematical foundations linking discrete optimal transport algorithms to continuous Neural ODE models, with the rigorous convergence proofs and explicit error estimates for the discrete scheme constituting a clear strength. This could have implications for computational transport problems in machine learning.

major comments (1)
  1. [Abstract (generalization from discrete to continuum)] The load-bearing generalization step from the discrete Pearson UOT problem (for which the Sinkhorn-style scheme has convergence and error bounds) to continuum vector fields is not justified in a manner that controls discretization error or verifies that the derived vector fields satisfy the continuum optimality conditions (such as first-order conditions involving the c-transform or unbalanced marginal constraints). The discrete analysis does not automatically transfer to the claimed convergence of the Neural ODE flow to true UOT dynamics.
minor comments (2)
  1. [Abstract] The abstract is information-dense; separating the discrete scheme, continuum generalization, and Neural ODE construction into distinct sentences would improve readability.
  2. Notation for the Pearson divergence and the limiting regime should be introduced with explicit definitions early in the text to aid readers unfamiliar with the specific UOT variant.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback on the manuscript. We address the major comment below.

Point-by-point responses
  1. Referee: [Abstract (generalization from discrete to continuum)] The load-bearing generalization step from the discrete Pearson UOT problem (for which the Sinkhorn-style scheme has convergence and error bounds) to continuum vector fields is not justified in a manner that controls discretization error or verifies that the derived vector fields satisfy the continuum optimality conditions (such as first-order conditions involving the c-transform or unbalanced marginal constraints). The discrete analysis does not automatically transfer to the claimed convergence of the Neural ODE flow to true UOT dynamics.

    Authors: We agree that the passage from the discrete Pearson UOT problem to the continuum vector fields and Neural ODE flow requires explicit justification, including discretization error control and verification that the derived fields satisfy continuum optimality conditions. The manuscript constructs the vector fields from the discrete dual potentials obtained via the Sinkhorn scheme on successively refined grids, then defines the transport equation and Neural ODE to match the interpolated field, with convergence claimed in the joint limit of grid size to zero and Sinkhorn iterations to infinity. However, the current text does not provide a detailed derivation showing how the discrete first-order conditions (via the Pearson divergence) pass to the continuum c-transform conditions or unbalanced marginal constraints, nor does it supply explicit bounds on the discretization error. We will revise the manuscript by adding a dedicated subsection that derives the continuum optimality conditions from the discrete ones and establishes the necessary error estimates under suitable regularity assumptions on the data.

    Revision: yes

Circularity Check

0 steps flagged

No circularity detected; derivation chain is self-contained

full rationale

The paper first solves a discrete UOT minimization problem via a Sinkhorn-inspired scheme, proves its convergence with explicit error estimates, then generalizes the obtained solutions to construct continuum vector fields, a transport equation, and finally a Neural ODE whose flow is stated to converge to the true dynamics in a limiting regime. None of these steps reduce the claimed continuum convergence result to the discrete inputs by construction, self-definition, or self-citation chain. The discrete proof stands independently, and the generalization step is presented as a constructive design rather than a tautological renaming or fitted-parameter prediction. The derivation therefore qualifies as self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, invented entities, or non-standard axioms are mentioned. Standard mathematical assumptions for ODE flows and numerical convergence are implicitly used.

axioms (1)
  • standard math: Standard assumptions on existence, uniqueness, and convergence for solutions of neural differential equations and numerical schemes for minimization problems.
    Invoked to support the claimed convergence of the vector fields and the numerical scheme to true UOT dynamics.

pith-pipeline@v0.9.0 · 5685 in / 1336 out tokens · 38372 ms · 2026-05-22T23:47:41.939157+00:00 · methodology


