Recognition: unknown
Entropic Riemannian Neural Optimal Transport
Pith reviewed 2026-05-08 17:21 UTC · model grok-4.3
The pith
A neural pullback of the target Schrödinger potential recovers the entropic optimal coupling on Riemannian manifolds for any fixed regularization strength.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Entropic RNOT parameterizes the target Schrödinger potential with a neural pullback, recovers the induced Gibbs coupling, and constructs intrinsic surrogates such as barycentric projections on Cartan-Hadamard manifolds and heat-smoothed conditionals on stochastically complete manifolds. For fixed regularization ε > 0 the hypothesis class recovers the entropic optimal coupling in strong probabilistic metrics. Consequently the barycentric surrogates converge in L² and the heat-smoothed surrogates are stable at fixed heat time while asymptotically unbiased as heat time vanishes, all for compactly supported data on possibly noncompact manifolds.
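For orientation, the entropic optimal coupling referred to here is, in standard notation (ours, not necessarily the paper's) and assuming the squared geodesic cost $c(x,y) = \tfrac{1}{2} d(x,y)^2$, the minimizer of the regularized transport problem between marginals $\mu, \nu \in \mathcal{P}(M)$:

$$\pi^\star_\varepsilon \;=\; \operatorname*{arg\,min}_{\pi \in \Pi(\mu,\nu)} \int_{M \times M} c(x,y)\, d\pi(x,y) \;+\; \varepsilon\, \mathrm{KL}\big(\pi \,\|\, \mu \otimes \nu\big), \qquad \varepsilon > 0.$$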
What carries the argument
The neural pullback parameterization of the single target-side Schrödinger potential, which directly induces the Gibbs coupling used to build all transport surrogates.
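A minimal sketch of how a single target-side potential induces that coupling, in the standard semi-dual form of entropic OT (our notation; the paper's exact parameterization may differ). Given a potential $g$ on the support of $\nu$, its $(c,\varepsilon)$-transform and the induced Gibbs coupling are

$$f_g(x) = -\varepsilon \log \int_M \exp\!\Big(\tfrac{g(y) - c(x,y)}{\varepsilon}\Big)\, d\nu(y), \qquad d\pi_g(x,y) = \exp\!\Big(\tfrac{f_g(x) + g(y) - c(x,y)}{\varepsilon}\Big)\, d\mu(x)\, d\nu(y),$$

so that $\pi_g$ always has first marginal $\mu$. The neural pullback replaces $g$ by $\hat g_\theta = h_\theta \circ \iota$, where $\iota : M \to \mathbb{R}^D$ is a smooth embedding or chart and $h_\theta$ is a Euclidean network; every transport surrogate is then a functional of the conditionals $\pi_{\hat g_\theta}(\cdot \mid x)$.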
If this is right
- The recovered coupling matches the true entropic optimum in strong metrics for any fixed regularization ε.
- Barycentric projections converge in L² to the entropic map on Cartan-Hadamard manifolds (see the sketch after this list).
- Heat-smoothed conditional surrogates remain stable for fixed heat time and converge to the true coupling as heat time approaches zero.
- The method supplies amortized out-of-sample maps that scale better than discrete manifold Sinkhorn while respecting intrinsic geometry.
- Empirical results match or exceed Euclidean, tangent-space, and log-Euclidean baselines on S², SO(3), SPD(3), SE(3), and H².
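The barycentric surrogate referenced above is, in effect, a weighted Fréchet mean of target samples under the conditional Gibbs weights. A minimal NumPy sketch of that construction, assuming user-supplied intrinsic exp_map and log_map and a finite target sample; this is our illustration, not the paper's code:

```python
import numpy as np

def barycentric_projection(x, ys, g_vals, cost, exp_map, log_map, eps,
                           n_iter=50, step=1.0):
    """Weighted Fréchet (Karcher) mean of target samples ys under the
    conditional Gibbs weights induced by a target-side potential.
    exp_map(p, v) and log_map(p, q) are assumed intrinsic manifold operations."""
    # Conditional Gibbs weights: pi(y_j | x) proportional to exp((g(y_j) - c(x, y_j)) / eps)
    logits = (g_vals - np.array([cost(x, y) for y in ys])) / eps
    w = np.exp(logits - logits.max())
    w /= w.sum()
    # Riemannian fixed-point iteration for the weighted mean
    z = ys[np.argmax(w)]                     # initialize at the heaviest sample
    for _ in range(n_iter):
        v = sum(wi * log_map(z, yi) for wi, yi in zip(w, ys))
        z = exp_map(z, step * v)
    return z
```

On a Cartan-Hadamard manifold the weighted Fréchet mean is unique, which is why the L² statement is phrased for that class; on positively curved spaces such as S² the same iteration is only a heuristic.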
Where Pith is reading between the lines
- The same pullback idea could be adapted to amortize other intrinsic operations such as geodesic interpolation or curvature-aware regression on the same manifolds.
- The stability guarantees imply that practitioners can safely use the heat-smoothed version even when exact convergence is not needed, by selecting an appropriate fixed heat time.
- Extending the framework to time-varying or multi-marginal problems would be feasible if the potential network can be conditioned on additional inputs without losing the recovery property.
- The approach opens the possibility of jointly learning transport maps and manifold-valued generative models by sharing the same neural potential parameterization.
Load-bearing premise
The neural network hypothesis class can approximate the target Schrödinger potential to arbitrary accuracy on the given manifold.
What would settle it
On a low-dimensional manifold such as the sphere, take a small discrete support on which the exact entropic coupling is computable by Sinkhorn, compute the total-variation or Wasserstein distance between the neural-recovered coupling and the exact coupling, and check whether it exceeds a fixed positive threshold for any chosen ε > 0.
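A minimal NumPy sketch of that check on S², assuming sample points are stored as unit vectors in R³ and the cost is the squared geodesic distance; sphere_cost, sinkhorn_coupling, tv_distance, and the recovered coupling P_hat are our own placeholders, not the paper's code:

```python
import numpy as np

def sphere_cost(X, Y):
    # Squared geodesic cost on S^2: c(x, y) = arccos(<x, y>)^2 / 2
    G = np.clip(X @ Y.T, -1.0, 1.0)
    return 0.5 * np.arccos(G) ** 2

def sinkhorn_coupling(a, b, C, eps, n_iter=2000):
    # Exact entropic coupling on a small discrete support via Sinkhorn iterations
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def tv_distance(P, Q):
    # Total-variation distance between two couplings on the same support
    return 0.5 * np.abs(P - Q).sum()

# Hypothetical usage, with P_hat the coupling recovered by the trained model:
#   C = sphere_cost(X, Y)
#   P_star = sinkhorn_coupling(a, b, C, eps)
#   gap = tv_distance(P_star, P_hat)   # the test asks whether gap stays above a threshold
```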
Original abstract
Many machine learning problems involve data supported on curved spaces such as spheres, rotation groups, hyperbolic spaces, and general Riemannian manifolds, where Euclidean geometry can distort distances, averages, and the resulting optimal transport (OT) problem. Existing manifold OT methods have pursued amortized out-of-sample maps, while entropic regularization has made discrete OT more scalable, but these advantages have remained largely disjoint. We propose Entropic Riemannian Neural Optimal Transport (Entropic RNOT), a unified framework that combines intrinsic entropic OT with amortized out-of-sample evaluation on Riemannian manifolds. Our method learns a single target-side Schrödinger potential through a neural pullback parameterization, recovers the induced Gibbs coupling, and uses the resulting conditional laws to construct intrinsic transport surrogates. These include barycentric projections on Cartan-Hadamard manifolds and heat-smoothed conditional surrogates on stochastically complete manifolds, the latter turning possibly atomic target laws into absolutely continuous ones. For fixed regularization $\varepsilon>0$, we prove that the proposed hypothesis class recovers the entropic optimal coupling in strong probabilistic metrics. As consequences, barycentric surrogates converge in $L^2$, while heat-smoothed surrogates are stable at fixed heat time and asymptotically unbiased as the heat time vanishes. The guarantees hold for compactly supported data on possibly noncompact manifolds. Empirically, our method matches or improves over Euclidean, tangent-space, and log-Euclidean baselines on benchmarks over $\mathbb{S}^2$, $\mathrm{SO}(3)$, $\mathrm{SPD}(3)$, $\mathrm{SE}(3)$, and $\mathbb{H}^2$, scales favorably relative to discrete manifold Sinkhorn, and in a protein-ligand docking application, refines poses on $\mathrm{SE}(3)$ without retraining or per-instance optimization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Entropic Riemannian Neural Optimal Transport (Entropic RNOT), a framework that learns a single target-side Schrödinger potential via neural pullback parameterization on Riemannian manifolds, recovers the induced Gibbs coupling from entropic OT, and constructs intrinsic transport surrogates (barycentric projections on Cartan-Hadamard manifolds and heat-smoothed conditionals on stochastically complete manifolds). For fixed regularization ε > 0, it claims to prove that the hypothesis class recovers the entropic optimal coupling in strong probabilistic metrics, implying L² convergence of barycentric surrogates and stability/asymptotic unbiasedness of heat-smoothed surrogates for compactly supported data on possibly noncompact manifolds. Empirical evaluations show competitive or superior performance to Euclidean, tangent-space, and log-Euclidean baselines on S², SO(3), SPD(3), SE(3), and H², with favorable scaling versus discrete manifold Sinkhorn and an application to SE(3) protein-ligand pose refinement.
Significance. If the central recovery guarantees hold, the work meaningfully unifies entropic regularization with amortized neural OT on curved spaces, providing both theoretical convergence results in probabilistic metrics and practical out-of-sample evaluation without per-instance optimization. The fixed-ε setting yields parameter-free aspects in the recovery statements and falsifiable predictions via the surrogate convergence claims; these are explicit strengths. The approach addresses a clear gap between discrete manifold OT scalability and neural amortization, with potential impact on geometric machine learning tasks.
major comments (2)
- [Abstract / recovery theorem] Abstract and the recovery theorem (presumably §4): The strong recovery of the entropic optimal coupling in probabilistic metrics for fixed ε > 0 is load-bearing for all subsequent claims (L² convergence of barycentric surrogates, stability of heat-smoothed surrogates). This recovery requires the neural pullback parameterization to be dense in the space of target-side Schrödinger potentials. Standard NN universality theorems apply to Euclidean domains; the manuscript provides no separate density lemma establishing that the pullback of a Euclidean network through the manifold chart or embedding is dense in C(M) or the relevant weighted continuous functions on a general Riemannian manifold. Without this step, the hypothesis class may lie in a proper closed subspace, undermining the claimed recovery.
- [§3 / heat-smoothed surrogate analysis] §3 (hypothesis class definition) and assumptions for heat-smoothed surrogates: The guarantees for heat-smoothed conditional surrogates invoke stochastic completeness of the manifold and vanishing heat time for asymptotic unbiasedness. However, the neural approximation error in the potential must be propagated through the heat kernel; the manuscript does not appear to supply explicit bounds showing that the approximation error remains controlled uniformly as heat time → 0 while keeping ε fixed. This is load-bearing for the unbiasedness claim.
minor comments (3)
- [Methods / parameterization] Notation in the methods: The pullback map and its interaction with the manifold metric should be defined with an explicit diagram or equation (e.g., the composition defining the pullback) to clarify how the Euclidean network output is lifted back to the manifold.
- [Experiments] Empirical section: Tables reporting performance on S², SO(3), etc., would be strengthened by including standard deviations over random seeds or statistical tests against baselines, rather than point estimates alone.
- [Theorem statements] The abstract states that guarantees hold for compactly supported data; the main text should explicitly restate the support assumption in the theorem statements for clarity.
Simulated Author's Rebuttal
We thank the referee for their careful and constructive review. The two major comments identify substantive gaps in the supporting analysis for the recovery theorem and the heat-smoothed surrogate guarantees. We agree that both points require additional rigor and will incorporate the requested elements in the revision.
Point-by-point responses
Referee: [Abstract / recovery theorem] Abstract and the recovery theorem (presumably §4): The strong recovery of the entropic optimal coupling in probabilistic metrics for fixed ε > 0 is load-bearing for all subsequent claims (L² convergence of barycentric surrogates, stability of heat-smoothed surrogates). This recovery requires the neural pullback parameterization to be dense in the space of target-side Schrödinger potentials. Standard NN universality theorems apply to Euclidean domains; the manuscript provides no separate density lemma establishing that the pullback of a Euclidean network through the manifold chart or embedding is dense in C(M) or the relevant weighted continuous functions on a general Riemannian manifold. Without this step, the hypothesis class may lie in a proper closed subspace, undermining the claimed recovery.
Authors: We agree that an explicit density result is necessary to justify that the neural pullback class is dense in the relevant function space on the manifold. Although the construction uses smooth embeddings or charts (valid for the compactly supported data and the manifolds considered), the manuscript does not contain a dedicated lemma. In the revision we will add Lemma 4.3 in §4, which shows that the pullback of a dense Euclidean NN class through a smooth embedding yields a dense subset of C(K) for compact K ⊂ M, using the Stone-Weierstrass theorem on manifolds together with standard NN universality on the ambient Euclidean space. This lemma will be invoked directly in the proof of the recovery theorem, leaving all other statements unchanged. revision: yes
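For concreteness, one route to such a density lemma (our sketch, via Tietze extension rather than the Stone-Weierstrass route mentioned above; notation is ours): let $\iota : M \to \mathbb{R}^D$ be a smooth embedding and $K \subset M$ compact. Given $h \in C(K)$ and $\delta > 0$,

$$h \circ \iota^{-1} \in C(\iota(K)) \;\xrightarrow{\ \text{Tietze}\ }\; H \in C(\mathbb{R}^D), \qquad \sup_{\iota(K)} \big|H_\theta - H\big| < \delta \ \ \text{(Euclidean universality)}, \qquad \text{hence}\ \sup_{K} \big|H_\theta \circ \iota - h\big| < \delta,$$

so the pullback class $\{H_\theta \circ \iota\}$ is dense in $C(K)$.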
Referee: [§3 / heat-smoothed surrogate analysis] §3 (hypothesis class definition) and assumptions for heat-smoothed surrogates: The guarantees for heat-smoothed conditional surrogates invoke stochastic completeness of the manifold and vanishing heat time for asymptotic unbiasedness. However, the neural approximation error in the potential must be propagated through the heat kernel; the manuscript does not appear to supply explicit bounds showing that the approximation error remains controlled uniformly as heat time → 0 while keeping ε fixed. This is load-bearing for the unbiasedness claim.
Authors: The referee correctly notes the absence of quantitative error propagation through the heat kernel. The current argument relies on qualitative continuity of the heat semigroup under stochastic completeness, but does not bound the effect of a fixed potential approximation error as heat time t → 0. We will add Proposition 3.7 in the revised §3 that supplies an explicit uniform bound: the total-variation distance between the true and approximated heat-smoothed conditionals is controlled by the potential error (from the recovery theorem) times a constant depending only on the heat-kernel Lipschitz constant on the compact support, which remains finite for small t. This establishes that the approximation error stays controlled uniformly down to t = 0, supporting the asymptotic unbiasedness claim without altering the fixed-ε setting. revision: yes
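One standard ingredient pointing in this direction (our note, not the paper's proposition): on a stochastically complete manifold the heat semigroup $P_t$ is conservative, hence a Markov kernel for every $t$, and Markov kernels contract total variation, so smoothing never amplifies the conditional error:

$$\big\| P_t\, \hat\pi(\cdot \mid x) - P_t\, \pi^\star_\varepsilon(\cdot \mid x) \big\|_{\mathrm{TV}} \;\le\; \big\| \hat\pi(\cdot \mid x) - \pi^\star_\varepsilon(\cdot \mid x) \big\|_{\mathrm{TV}} \qquad \text{for all } t > 0.$$

The remaining content of a Proposition 3.7 would then be the bias term, i.e. the rate at which $P_t\, \pi^\star_\varepsilon(\cdot \mid x)$ converges to $\pi^\star_\varepsilon(\cdot \mid x)$ as $t \to 0$.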
Circularity Check
No circularity: the recovery theorem is conditional on an external expressiveness assumption, with the regularization ε fixed externally.
full rationale
The paper states a conditional theorem: for fixed ε>0, if the neural pullback hypothesis class is sufficiently expressive to approximate the target Schrödinger potential arbitrarily well, then it recovers the entropic optimal coupling in strong metrics (with consequences for the surrogates). This assumption is listed explicitly among the weakest assumptions and is not derived from the method itself. No step renames a fitted quantity as a prediction, no self-citation is invoked to establish uniqueness or density of the pullback map, and the derivation chain relies on standard entropic OT theory plus manifold regularity conditions rather than reducing to its own inputs by construction. The skeptic concern identifies a possible missing density lemma but does not exhibit any quoted reduction of the claimed result to a tautology or self-fit.
Axiom & Free-Parameter Ledger
free parameters (2)
- neural network weights
- regularization ε
axioms (2)
- domain assumption: Standard theory of entropic optimal transport extends to Riemannian manifolds with the stated regularity.
- ad hoc to paper: The neural hypothesis class is dense enough to recover the true potential in the limit.
invented entities (2)
- neural pullback parameterization of the Schrödinger potential: no independent evidence
- heat-smoothed conditional surrogates: no independent evidence
Reference graph
Works this paper leans on
- [1] Dimitri P. Bertsekas and Steven E. Shreve. Stochastic Optimal Control: The Discrete-Time Case, volume 139 of Mathematics in Science and Engineering. Academic Press, San Diego, CA, 1978.
- [2] Gabriele Corso, Hannes Stärk, Bowen Jing, Regina Barzilay, and Tommi Jaakkola. DiffDock: Diffusion steps, twists, and turns for molecular docking. arXiv [q-bio.BM], October 2022.
- [3] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. 2013. https://proceedings.neurips.cc/paper_files/paper/2013/file/af21d0c97db2e27e13572cbf59eb343d-Paper.pdf
- [4] Marco Cuturi, Laetitia Meng-Papaxanthos, Yingtao Tian, Charlotte Bunne, Geoff Davis, and Olivier Teboul. Optimal transport tools (OTT): A JAX toolbox for all things Wasserstein. arXiv [cs.LG], January 2022.
- [5] Marco Cuturi and Gabriel Peyré. Semidual regularized optimal transport. SIAM Rev. Soc. Ind. Appl. Math., 60(4):941–965, January 2018.
- [6] Paul Francoeur, Tomohide Masuda, and David Koes. 3D convolutional neural networks and a CrossDocked dataset for structure-based drug design. ChemRxiv, March 2020.
- [7] Alexander Grigor'yan. Analytic and geometric background of recurrence and non-explosion of the Brownian motion on Riemannian manifolds. Bull. New Ser. Am. Math. Soc., 36(2):135–249, February 1999.
- [8] Alexander Grigor'yan. Heat Kernel and Analysis on Manifolds. AMS/IP Studies in Advanced Mathematics. American Mathematical Society, Providence, RI, January 2013.
- [9] Mikhael Gromov. Filling Riemannian manifolds. J. Differential Geom., 18(1):1–147, January 1983.
- [10] Olav Kallenberg. Foundations of Modern Probability. Probability and Its Applications. Springer, New York, NY, 2nd edition, January 2002.
- [11] Anastasis Kratsios and Eugene Bilokopytov. Non-Euclidean universal approximation. arXiv [cs.LG], pages 10635–10646, June 2020.
- [12] John Lee. Introduction to Smooth Manifolds. Graduate Texts in Mathematics. Springer, New York, NY, 2nd edition, May 2012.
- [13] John M. Lee. Introduction to Riemannian Manifolds. Graduate Texts in Mathematics. Springer International Publishing, Cham, Switzerland, 2nd edition, January 2019.
- [14] Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw., 6(6):861–867, January 1993.
- [15] Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, and Liwei Wang. The expressive power of neural networks: A view from the width. arXiv [cs.LG], September 2017.
- [16] Andrew T. McNutt, Yanjing Li, Rocco Meli, Rishal Aggarwal, and David Ryan Koes. GNINA 1.3: the next increment in molecular docking with deep learning. J. Cheminform., 17(1):28, March 2025.
- [17] C. Micchelli, Yuesheng Xu, and Haizhang Zhang. Universal kernels. J. Mach. Learn. Res., 7:2651–2667, December 2006.
- [18] Alessandro Micheli, Yueqi Cao, Anthea Monod, and Samir Bhatt. Riemannian neural optimal transport. arXiv [cs.LG], February 2026.
- [19] Marcel Nutz. Introduction to entropic optimal transport. 2021.
- [20] Gabriel Peyré and Marco Cuturi. Computational optimal transport. arXiv [stat.ML], March 2018.
- [21] Yury Polyanskiy and Yihong Wu. Information Theory: From Coding to Learning. Cambridge University Press, Cambridge, England, January 2025.
- [22] Rodrigo Quiroga and Marcos A. Villarreal. Vinardo: A scoring function based on AutoDock Vina improves scoring, docking, and virtual screening. PLoS One, 11(5):e0155183, May 2016.
- [23] Danilo J. Rezende and Sébastien Racanière. Implicit Riemannian Concave Potential Maps. arXiv [stat.ML], October 2021.
- [24] Karl-Theodor Sturm. Probability measures on metric spaces of nonpositive curvature, 2003.
- [25] Yann Thanwerdas and Xavier Pennec. O(n)-invariant Riemannian metrics on SPD matrices. Linear Algebra Appl., 661:163–201, March 2023.
- [26] Oleg Trott and Arthur J. Olson. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem., 31(2):455–461, January 2010.
- [27] Cédric Villani. Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009.
- [28] Ding-Xuan Zhou. Universality of deep convolutional neural networks. Appl. Comput. Harmon. Anal., 48(2):787–794, March 2020.