Recognition: 2 theorem links
Learning Monge maps with constrained drifting models
Pith reviewed 2026-05-15 00:43 UTC · model grok-4.3
The pith
A constrained gradient flow on transport maps converges to the optimal Monge map as time goes to infinity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a gradient flow in the space of transport maps obtained by lifting a divergence functional and constraining the flow to the convex set of Monge maps. Under standard convexity assumptions on the divergence and log-concavity of the target measure, the flow exists globally in time and converges to the optimal transport map. Time discretizations of this flow are used to train convexity-constrained neural networks parameterizing the maps, and the implicit scheme is proven to converge to the OT map.
What carries the argument
The constrained gradient flow of a lifted divergence on the convex set of optimal transport maps, which enforces optimality via the convexity constraint.
If this is right
- Long-time solutions to the flow exist, and the flow converges to the OT map under the stated conditions.
- The implicit time-discrete scheme converges to the OT map as the time step goes to zero and iterations increase.
- Training with the discretizations is equivalent to natural gradient descent of the lifted divergence in the neural network parameter space.
- The resulting maps approximate the OT map more accurately than those from Euclidean gradient descent on the same networks.
- Training remains more stable and outperforms the Adam optimizer applied to the same convexity-constrained networks.
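The convergence claim at the top of this list can be sanity-checked in the simplest possible setting. The sketch below is not the paper's constrained scheme: it runs plain Euclidean gradient descent on a two-parameter affine map between one-dimensional Gaussians, where the lifted KL divergence has a closed form and the optimal Monge map is known to be T*(x) = m + s·x (the parameter values and learning rate are illustrative assumptions).

```python
import numpy as np

# Sketch (not the paper's constrained flow): gradient descent on the
# closed-form KL between the pushforward of N(0,1) under T(x) = a*x + b
# and the target N(m, s^2). The pushforward is N(b, a^2), and
#   KL = log(s/a) + (a^2 + (b - m)^2) / (2 s^2) - 1/2.
# The optimal Monge map here is T*(x) = m + s*x, i.e. a -> s, b -> m.
m, s = 2.0, 0.5
a, b = 1.0, 0.0          # initialize at the identity map
lr = 0.05
for _ in range(2000):
    grad_a = -1.0 / a + a / s**2       # d KL / da
    grad_b = (b - m) / s**2            # d KL / db
    a -= lr * grad_a
    b -= lr * grad_b
# a ~ s and b ~ m: the descent's unique fixed point is the OT map
```

Under these assumptions the fixed point of the descent is exactly the OT map, matching the claimed long-time limit in the one case where everything is computable by hand.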
Where Pith is reading between the lines
- The natural-gradient equivalence could let similar drifting-model techniques scale OT map learning to higher dimensions without explicit density estimation.
- The method might be combined with other parameterizations beyond neural networks to handle non-log-concave targets by relaxing the constraint.
- Links to drifting generative models suggest the flow could be hybridized with diffusion-based samplers for joint OT and sampling tasks.
Load-bearing premise
The target probability measure must be log-concave and the chosen divergence must satisfy standard convexity conditions to guarantee convergence of the flow.
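The log-concavity half of this premise is easy to probe numerically: a density is log-concave exactly when its log has non-positive second derivative. The check below is a toy sketch (the grid, tolerance, and example densities are assumptions); it confirms a Gaussian target qualifies while a well-separated two-mode mixture does not, so the convergence guarantee would not cover the latter.

```python
import numpy as np

# A density p is log-concave iff log p has non-positive second derivative.
xs = np.linspace(-6.0, 6.0, 2001)
h = xs[1] - xs[0]

def second_diff(lp):
    # central second finite difference of log-density samples
    return (lp[:-2] - 2.0 * lp[1:-1] + lp[2:]) / h**2

# Gaussian N(0,1): log-concave (curvature of log p is exactly -1).
d2_gauss = second_diff(-0.5 * xs**2)

# Balanced two-mode Gaussian mixture: NOT log-concave (the log-density
# has a convex bump between the modes).
mix = 0.5 * np.exp(-0.5 * (xs - 3.0) ** 2) + 0.5 * np.exp(-0.5 * (xs + 3.0) ** 2)
d2_mix = second_diff(np.log(mix))

gauss_is_log_concave = bool(np.all(d2_gauss <= 1e-8))
mix_is_log_concave = bool(np.all(d2_mix <= 1e-8))
```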
What would settle it
Simulate the flow for a simple source and log-concave target pair whose optimal transport map is known analytically; if the flowed map fails to approach the known OT map as time increases, the convergence result is disproved.
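In one dimension such a test is cheap, because the monotone rearrangement of sorted samples is the optimal coupling and the Gaussian-to-Gaussian Monge map is affine and known. A minimal sketch of the comparison (sample size, seed, and tolerance are arbitrary choices):

```python
import numpy as np

# 1D falsification-style test: for source N(0,1) and log-concave target
# N(2, 0.5^2), the optimal Monge map is known: T*(x) = 2 + 0.5*x.
# In 1D the monotone coupling of sorted samples approximates the OT map,
# so the two should agree up to sampling error.
rng = np.random.default_rng(0)
n = 20000
xs = np.sort(rng.normal(0.0, 1.0, n))   # source samples, sorted
ys = np.sort(rng.normal(2.0, 0.5, n))   # target samples, sorted
t_star = 2.0 + 0.5 * xs                 # analytic OT map at source points

# Compare on the central 90% of quantiles (the tails are noisy).
lo, hi = int(0.05 * n), int(0.95 * n)
mean_err = float(np.mean(np.abs(ys[lo:hi] - t_star[lo:hi])))
# A flowed map that failed to approach t_star in this metric would
# falsify the convergence result.
```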
Original abstract
We study the estimation of optimal transport (OT) maps between an arbitrary source probability measure and a log-concave target probability measure. Our contributions are twofold. First, we propose a new evolution equation in the set of transport maps. It can be seen as the gradient flow of a lift of some user-chosen divergence (e.g., the KL divergence, or relative entropy) to the space of transport maps, constrained to the convex set of optimal transport maps. We prove the existence of long-time solutions to this flow as well as its convergence toward the OT map as time goes to infinity, under standard convexity conditions on the divergence. Second, we study the practical implementation of this constrained gradient flow. We propose two time-discrete computational schemes (one explicit, one implicit), and we prove the convergence of the latter to the OT map as time goes to infinity. We then parameterize the OT maps with convexity-constrained neural networks and train them with these discretizations of the constrained gradient flow. We show that this is equivalent to performing a natural gradient descent of the lift of the chosen divergence in the neural networks' parameter space, similarly to drifting generative models. Empirically, our scheme outperforms the standard Euclidean gradient descent methods used to train convexity-constrained neural networks in terms of approximation results for the OT map and convergence stability, and it still yields better results than the same approach combined with the widely used Adam optimizer.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a constrained gradient flow on the space of transport maps, defined as the gradient flow of a lifted user-chosen divergence (e.g., KL) restricted to the convex set of optimal transport maps. It proves existence of long-time solutions and convergence to the OT map as t→∞ under log-concavity of the target and standard convexity conditions on the divergence. Two time-discrete schemes (explicit and implicit) are proposed, with a convergence proof given for the implicit scheme; the maps are then parameterized by convexity-constrained neural networks, shown to be equivalent to natural gradient descent in parameter space, and demonstrated empirically to outperform Euclidean gradient descent and Adam on approximation quality and stability.
Significance. If the proofs hold, the work supplies a theoretically grounded continuous-time framework for OT map learning that directly incorporates the OT constraint and connects to drifting generative models. The explicit equivalence to natural gradient descent and the reported gains in convergence stability constitute concrete strengths, especially for high-dimensional settings where standard training of convexity-constrained networks is known to be fragile.
major comments (2)
- [discretization analysis] The convergence statement for the implicit scheme (abstract and the discretization section) is load-bearing for the practical claims; the manuscript must clarify whether the implicit update exactly preserves membership in the OT-map set or relies on an approximate projection, and quantify the resulting discretization error relative to the continuous flow.
- [neural parameterization] The neural-network parameterization is stated to be equivalent to natural gradient descent of the lifted divergence; the derivation of this equivalence (parameter-space section) should be expanded to show how the convexity constraint on the network is enforced at each step without violating the OT-map convexity requirement.
minor comments (2)
- [notation] Notation for the lifted divergence and its gradient flow should be introduced once with a clear reference to the underlying probability measures; repeated re-definition across sections reduces readability.
- [experiments] The empirical section would benefit from an additional baseline that uses the same convexity-constrained network but with a standard projected gradient step, to isolate the benefit of the constrained flow from the network architecture itself.
Simulated Author's Rebuttal
We thank the referee for their careful reading, positive assessment of the work, and constructive suggestions. We address each major comment below and will revise the manuscript accordingly to improve clarity on the discretization and parameterization details.
Point-by-point responses
- Referee: The convergence statement for the implicit scheme (abstract and the discretization section) is load-bearing for the practical claims; the manuscript must clarify whether the implicit update exactly preserves membership in the OT-map set or relies on an approximate projection, and quantify the resulting discretization error relative to the continuous flow.
  Authors: We agree this clarification is needed. In the revised discretization section we will state explicitly that the implicit scheme is a proximal step for the lifted divergence over the convex set of OT maps; because the proximal operator associated with a closed convex set returns a point of that set, the update preserves membership exactly, with no approximate projection. We will also add a quantitative bound showing that the discretization error relative to the continuous flow is O(Δt) in the appropriate metric (consistent with standard analysis of implicit Euler schemes on convex sets); this is already implicit in our convergence proof and will now be stated explicitly. revision: yes
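The stability gap between the two schemes can be illustrated on a one-parameter quadratic slice of a lifted divergence. The objective F(b) = (b − m)²/(2s²) below is an assumed stand-in, not the paper's actual functional; it shows the implicit (proximal) update converging at a step size for which explicit Euler diverges.

```python
# Toy comparison of explicit and implicit Euler for the gradient flow of
# F(b) = (b - m)^2 / (2 s^2), a quadratic slice of a lifted divergence
# (assumed for illustration; not the paper's F).
m, s, tau = 2.0, 0.5, 1.0   # step tau exceeds the explicit threshold 2*s^2

b_exp = 0.0
b_imp = 0.0
for _ in range(20):
    # explicit Euler: b <- b - tau * F'(b); unstable when tau / s^2 > 2
    b_exp = b_exp - tau * (b_exp - m) / s**2
    # implicit Euler = proximal step: argmin_y F(y) + (y - b)^2 / (2 tau),
    # which has a closed form for a quadratic; stable for every tau > 0
    b_imp = (s**2 * b_imp + tau * m) / (s**2 + tau)
# b_imp converges to m while b_exp oscillates and diverges
```

The same contrast motivates proving convergence for the implicit scheme rather than the explicit one.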
- Referee: The neural-network parameterization is stated to be equivalent to natural gradient descent of the lifted divergence; the derivation of this equivalence (parameter-space section) should be expanded to show how the convexity constraint on the network is enforced at each step without violating the OT-map convexity requirement.
  Authors: We will expand the parameter-space section with a detailed derivation. The convexity-constrained network (built from input-convex layers) computes a convex potential for every parameter vector, so its gradient is always the gradient of a convex function and hence, by Brenier's characterization, an element of the OT-map set. The equivalence to natural gradient descent follows because the Riemannian metric induced by the network parameterization respects this architectural constraint; each parameter update therefore corresponds to a tangent step that remains inside the convex set, with no extra projection required. We will include the explicit chain-rule computation linking the parameter-space natural gradient to the constrained flow in function space. revision: yes
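The architectural guarantee invoked here can be sketched in one dimension with a toy input-convex construction (not the paper's network): nonnegative outer weights over convex activations give a convex potential, and its derivative is automatically a monotone map for every parameter draw, with no projection step.

```python
import numpy as np

# Toy 1D input-convex construction (illustrative, not the paper's
# architecture): phi(x) = sum_j w_j * softplus(a_j * x + c_j) with
# w_j >= 0 is convex, since each softplus(affine) is convex and a
# nonnegative sum of convex functions is convex. Its derivative
# T = phi' is then monotone, i.e. the gradient of a convex potential,
# hence a valid 1D Monge-map candidate for ANY parameter vector.
rng = np.random.default_rng(1)
k = 16
a = rng.normal(size=k)            # inner weights: unconstrained
c = rng.normal(size=k)            # biases: unconstrained
w = np.abs(rng.normal(size=k))    # outer weights: constrained >= 0

def transport_map(x):
    z = np.outer(x, a) + c                                   # (n, k)
    return (w * a * (1.0 / (1.0 + np.exp(-z)))).sum(axis=1)  # phi'(x)

xs = np.linspace(-5.0, 5.0, 1001)
tvals = transport_map(xs)
is_monotone = bool(np.all(np.diff(tvals) >= -1e-12))
```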
Circularity Check
No significant circularity; derivation is self-contained
Full rationale
The paper defines a constrained gradient flow directly from a user-chosen divergence lifted to the space of transport maps, with the constraint set being the convex set of OT maps (defined independently via optimal transport). It proves long-time existence and convergence to the OT map using standard convexity assumptions on the divergence and log-concavity of the target measure; these are external conditions, not derived from the flow itself. The neural-network parameterization and its equivalence to natural gradient descent in parameter space are derived computational observations, not a reduction of the central existence/convergence claim to fitted inputs or self-citations. No step reduces by construction to its own inputs, and no load-bearing uniqueness theorem or ansatz is smuggled via self-citation.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Target probability measure is log-concave.
- domain assumption: Divergence satisfies standard convexity conditions.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "We propose a new evolution equation … gradient flow of a lift of some user-chosen divergence … constrained to the convex set of optimal transport maps … under standard convexity conditions on the divergence."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "the parameterized constrained gradient flow … is the natural gradient flow of F … with respect to the L²_ρ0-metric"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] [ABB04] Felipe Alvarez, Jérôme Bolte, and Olivier Brahic. "Hessian Riemannian gradient flows in convex programming". SIAM Journal on Control and Optimization 43.2 (2004), pp. 477–501.
- [2] [AHT03] Sigurd Angenent, Steven Haker, and Allen Tannenbaum. "Minimizing flows for the Monge–Kantorovich problem" (2003).
- [3] [BKC22] Charlotte Bunne, Andreas Krause, and Marco Cuturi. "Supervised training of conditional Monge maps". Advances in Neural Information Processing Systems 35 (2022), pp. 6859–6872.
- [4] [BLGL15] Emmanuel Boissard, Thibaut Le Gouic, and Jean-Michel Loubes. "Distribution's template estimate with Wasserstein metrics". Bernoulli (2015), pp. 740–759.
- [5] "From optimal transport to generative modeling: the VEGAN cookbook". arXiv:1705.07642.
- [6] [Bre87] Yann Brenier. "Décomposition polaire et réarrangement monotone des champs de vecteurs". C. R. Acad. Sci. Paris Sér. I Math. 305 (1987), pp. 805–808.
- [7] [CPM23] Shreyas Chaudhari, Srinivasa Pranav, and José M. F. Moura. "Learning gradients of convex functions with monotone gradient networks". ICASSP 2023, pp. 1–5.
- [11] "Generative Modeling via Drifting". arXiv:2602.04770.
- [13] [FM53] Robert Fortet and Edith Mourier. "Convergence de la répartition empirique vers la répartition théorique". Annales scientifiques de l'École normale supérieure (1953), pp. 267–285.
- [14] [Gag+25] Anne Gagneux, Mathurin Massias, Emmanuel Soubies, and Rémi Gribonval. "Convexity in ReLU Neural Networks: beyond ICNNs?". Journal of Mathematical Imaging and Vision 67.4 (2025), p. 40.
- [15] [Goo+20] Ian Goodfellow et al. "Generative adversarial networks". Communications of the ACM 63.11 (2020), pp. 139–144.
- [16] [GT19] Wilfrid Gangbo and Adrian Tudorascu. "On differentiability in the Wasserstein space and well-posedness for Hamilton–Jacobi equations". Journal de Mathématiques Pures et Appliquées 125 (2019), pp. 119–174.
- [17] [Hag+24] Paul Hagemann et al. "Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel". ICLR 2024.
- [18] [Her+24] Johannes Hertrich, Christian Wald, Fabian Altekrüger, and Paul Hagemann. "Generative Sliced MMD Flows with Riesz Kernels". ICLR 2024.
- [19] [HR21] Jan-Christian Hütter and Philippe Rigollet. "Minimax estimation of smooth optimal transport maps". The Annals of Statistics 49.2 (2021), pp. 1166–1194.
- [20] [Kuo08] Timo Kuosmanen. "Representation theorem for convex nonparametric least squares". The Econometrics Journal 11.2 (2008), pp. 308–325.
- [22] [LM18] Wuchen Li and Guido Montúfar. "Natural gradient via optimal transport". Information Geometry 1 (2018), pp. 181–214.
- [23] [LS22] Hugo Lavenant and Filippo Santambrogio. "The flow map of the Fokker–Planck equation does not provide optimal transport". Applied Mathematics Letters 133 (2022), p. 108225.
- [24] [Mak+20] Ashok Makkuva, Amirhossein Taghvaei, Sewoong Oh, and Jason Lee. "Optimal transport mapping via input convex neural networks". International Conference on Machine Learning, PMLR, 2020, pp. 6672–6681.
- [26] [RS06] Riccarda Rossi and Giuseppe Savaré. "Gradient flows of non convex functionals in Hilbert spaces and applications". ESAIM: Control, Optimisation and Calculus of Variations 12.3 (2006), pp. 564–614.
- [28] [Seg+17] Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet, and Mathieu Blondel. "Large-scale optimal transport and mapping estimation".
- [29] [SG+23] Carl-Johann Simon-Gabriel, Alessandro Barp, Bernhard Schölkopf, and Lester Mackey. "Metrizing weak convergence with maximum mean discrepancies". Journal of Machine Learning Research 24.184 (2023), pp. 1–20.
- [30] "On integral probability metrics, φ-divergences and binary classification". arXiv:0901.2698.
- [31] [VV22] Adrien Vacher and François-Xavier Vialard. "Parameter tuning and model selection in optimal transport with semi-dual Brenier formulation". Advances in Neural Information Processing Systems 35 (2022), pp. 23098–23108.
discussion (0)