Recognition: unknown
Entropic Riemannian Neural Optimal Transport
Pith reviewed 2026-05-08 17:21 UTC · model grok-4.3
The pith
A neural pullback of the target Schrödinger potential recovers the entropic optimal coupling on Riemannian manifolds for any fixed regularization strength.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Entropic RNOT parameterizes the target Schrödinger potential with a neural pullback, recovers the induced Gibbs coupling, and constructs intrinsic surrogates such as barycentric projections on Cartan-Hadamard manifolds and heat-smoothed conditionals on stochastically complete manifolds. For fixed regularization ε > 0 the hypothesis class recovers the entropic optimal coupling in strong probabilistic metrics. Consequently the barycentric surrogates converge in L² and the heat-smoothed surrogates are stable at fixed heat time while asymptotically unbiased as heat time vanishes, all for compactly supported data on possibly noncompact manifolds.
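For orientation, the entropic optimal coupling referred to here is, in standard notation (ours, not necessarily the paper's) and assuming the squared geodesic cost $c(x,y) = \tfrac{1}{2} d(x,y)^2$, the minimizer of the regularized transport problem between marginals $\mu, \nu \in \mathcal{P}(M)$:

$$\pi^\star_\varepsilon \;=\; \operatorname*{arg\,min}_{\pi \in \Pi(\mu,\nu)} \int_{M \times M} c(x,y)\, d\pi(x,y) \;+\; \varepsilon\, \mathrm{KL}\big(\pi \,\|\, \mu \otimes \nu\big), \qquad \varepsilon > 0.$$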
What carries the argument
The neural pullback parameterization of the single target-side Schrödinger potential, which directly induces the Gibbs coupling used to build all transport surrogates.
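A minimal sketch of how a single target-side potential induces that coupling, in the standard semi-dual form of entropic OT (our notation; the paper's exact parameterization may differ). Given a potential $g$ on the support of $\nu$, its $(c,\varepsilon)$-transform and the induced Gibbs coupling are

$$f_g(x) = -\varepsilon \log \int_M \exp\!\Big(\tfrac{g(y) - c(x,y)}{\varepsilon}\Big)\, d\nu(y), \qquad d\pi_g(x,y) = \exp\!\Big(\tfrac{f_g(x) + g(y) - c(x,y)}{\varepsilon}\Big)\, d\mu(x)\, d\nu(y),$$

so that $\pi_g$ always has first marginal $\mu$. The neural pullback replaces $g$ by $\hat g_\theta = h_\theta \circ \iota$, where $\iota : M \to \mathbb{R}^D$ is a smooth embedding or chart and $h_\theta$ is a Euclidean network; every transport surrogate is then a functional of the conditionals $\pi_{\hat g_\theta}(\cdot \mid x)$.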
If this is right
- The recovered coupling matches the true entropic optimum in strong metrics for any fixed regularization ε.
- Barycentric projections converge in L² to the entropic map on Cartan-Hadamard manifolds (see the sketch after this list).
- Heat-smoothed conditional surrogates remain stable for fixed heat time and converge to the true coupling as heat time approaches zero.
- The method supplies amortized out-of-sample maps that scale better than discrete manifold Sinkhorn while respecting intrinsic geometry.
- Empirical results match or exceed Euclidean, tangent-space, and log-Euclidean baselines on S², SO(3), SPD(3), SE(3), and H².
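The barycentric surrogate referenced above is, in effect, a weighted Fréchet mean of target samples under the conditional Gibbs weights. A minimal NumPy sketch of that construction, assuming user-supplied intrinsic exp_map and log_map and a finite target sample; this is our illustration, not the paper's code:

```python
import numpy as np

def barycentric_projection(x, ys, g_vals, cost, exp_map, log_map, eps,
                           n_iter=50, step=1.0):
    """Weighted Fréchet (Karcher) mean of target samples ys under the
    conditional Gibbs weights induced by a target-side potential.
    exp_map(p, v) and log_map(p, q) are assumed intrinsic manifold operations."""
    # Conditional Gibbs weights: pi(y_j | x) proportional to exp((g(y_j) - c(x, y_j)) / eps)
    logits = (g_vals - np.array([cost(x, y) for y in ys])) / eps
    w = np.exp(logits - logits.max())
    w /= w.sum()
    # Riemannian fixed-point iteration for the weighted mean
    z = ys[np.argmax(w)]                     # initialize at the heaviest sample
    for _ in range(n_iter):
        v = sum(wi * log_map(z, yi) for wi, yi in zip(w, ys))
        z = exp_map(z, step * v)
    return z
```

On a Cartan-Hadamard manifold the weighted Fréchet mean is unique, which is why the L² statement is phrased for that class; on positively curved spaces such as S² the same iteration is only a heuristic.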
Where Pith is reading between the lines
- The same pullback idea could be adapted to amortize other intrinsic operations such as geodesic interpolation or curvature-aware regression on the same manifolds.
- The stability guarantees imply that practitioners can safely use the heat-smoothed version even when exact convergence is not needed, by selecting an appropriate fixed heat time.
- Extending the framework to time-varying or multi-marginal problems would be feasible if the potential network can be conditioned on additional inputs without losing the recovery property.
- The approach opens the possibility of jointly learning transport maps and manifold-valued generative models by sharing the same neural potential parameterization.
Load-bearing premise
The neural network hypothesis class can approximate the target Schrödinger potential to arbitrary accuracy on the given manifold.
What would settle it
On a low-dimensional manifold such as the sphere, take a small discrete support on which the exact entropic coupling is computable by Sinkhorn, compute the total-variation or Wasserstein distance between the neural-recovered coupling and the exact coupling, and check whether it exceeds a fixed positive threshold for any chosen ε > 0.
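A minimal NumPy sketch of that check on S², assuming sample points are stored as unit vectors in R³ and the cost is the squared geodesic distance; sphere_cost, sinkhorn_coupling, tv_distance, and the recovered coupling P_hat are our own placeholders, not the paper's code:

```python
import numpy as np

def sphere_cost(X, Y):
    # Squared geodesic cost on S^2: c(x, y) = arccos(<x, y>)^2 / 2
    G = np.clip(X @ Y.T, -1.0, 1.0)
    return 0.5 * np.arccos(G) ** 2

def sinkhorn_coupling(a, b, C, eps, n_iter=2000):
    # Exact entropic coupling on a small discrete support via Sinkhorn iterations
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def tv_distance(P, Q):
    # Total-variation distance between two couplings on the same support
    return 0.5 * np.abs(P - Q).sum()

# Hypothetical usage, with P_hat the coupling recovered by the trained model:
#   C = sphere_cost(X, Y)
#   P_star = sinkhorn_coupling(a, b, C, eps)
#   gap = tv_distance(P_star, P_hat)   # the test asks whether gap stays above a threshold
```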
Original abstract
Many machine learning problems involve data supported on curved spaces such as spheres, rotation groups, hyperbolic spaces, and general Riemannian manifolds, where Euclidean geometry can distort distances, averages, and the resulting optimal transport (OT) problem. Existing manifold OT methods have pursued amortized out-of-sample maps, while entropic regularization has made discrete OT more scalable, but these advantages have remained largely disjoint. We propose Entropic Riemannian Neural Optimal Transport (Entropic RNOT), a unified framework that combines intrinsic entropic OT with amortized out-of-sample evaluation on Riemannian manifolds. Our method learns a single target-side Schrödinger potential through a neural pullback parameterization, recovers the induced Gibbs coupling, and uses the resulting conditional laws to construct intrinsic transport surrogates. These include barycentric projections on Cartan-Hadamard manifolds and heat-smoothed conditional surrogates on stochastically complete manifolds, the latter turning possibly atomic target laws into absolutely continuous ones. For fixed regularization $\varepsilon>0$, we prove that the proposed hypothesis class recovers the entropic optimal coupling in strong probabilistic metrics. As consequences, barycentric surrogates converge in $L^2$, while heat-smoothed surrogates are stable at fixed heat time and asymptotically unbiased as the heat time vanishes. The guarantees hold for compactly supported data on possibly noncompact manifolds. Empirically, our method matches or improves over Euclidean, tangent-space, and log-Euclidean baselines on benchmarks over $\mathbb{S}^2$, $\mathrm{SO}(3)$, $\mathrm{SPD}(3)$, $\mathrm{SE}(3)$, and $\mathbb{H}^2$, scales favorably relative to discrete manifold Sinkhorn, and in a protein-ligand docking application, refines poses on $\mathrm{SE}(3)$ without retraining or per-instance optimization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Entropic Riemannian Neural Optimal Transport (Entropic RNOT), a framework that learns a single target-side Schrödinger potential via neural pullback parameterization on Riemannian manifolds, recovers the induced Gibbs coupling from entropic OT, and constructs intrinsic transport surrogates (barycentric projections on Cartan-Hadamard manifolds and heat-smoothed conditionals on stochastically complete manifolds). For fixed regularization ε > 0, it claims to prove that the hypothesis class recovers the entropic optimal coupling in strong probabilistic metrics, implying L² convergence of barycentric surrogates and stability/asymptotic unbiasedness of heat-smoothed surrogates for compactly supported data on possibly noncompact manifolds. Empirical evaluations show competitive or superior performance to Euclidean, tangent-space, and log-Euclidean baselines on S², SO(3), SPD(3), SE(3), and H², with favorable scaling versus discrete manifold Sinkhorn and an application to SE(3) protein-ligand pose refinement.
Significance. If the central recovery guarantees hold, the work meaningfully unifies entropic regularization with amortized neural OT on curved spaces, providing both theoretical convergence results in probabilistic metrics and practical out-of-sample evaluation without per-instance optimization. The fixed-ε setting yields parameter-free aspects in the recovery statements and falsifiable predictions via the surrogate convergence claims; these are explicit strengths. The approach addresses a clear gap between discrete manifold OT scalability and neural amortization, with potential impact on geometric machine learning tasks.
major comments (2)
- [Abstract / recovery theorem] Abstract and the recovery theorem (presumably §4): The strong recovery of the entropic optimal coupling in probabilistic metrics for fixed ε > 0 is load-bearing for all subsequent claims (L² convergence of barycentric surrogates, stability of heat-smoothed surrogates). This recovery requires the neural pullback parameterization to be dense in the space of target-side Schrödinger potentials. Standard NN universality theorems apply to Euclidean domains; the manuscript provides no separate density lemma establishing that the pullback of a Euclidean network through the manifold chart or embedding is dense in C(M) or the relevant weighted continuous functions on a general Riemannian manifold. Without this step, the hypothesis class may lie in a proper closed subspace, undermining the claimed recovery.
- [§3 / heat-smoothed surrogate analysis] §3 (hypothesis class definition) and assumptions for heat-smoothed surrogates: The guarantees for heat-smoothed conditional surrogates invoke stochastic completeness of the manifold and vanishing heat time for asymptotic unbiasedness. However, the neural approximation error in the potential must be propagated through the heat kernel; the manuscript does not appear to supply explicit bounds showing that the approximation error remains controlled uniformly as heat time → 0 while keeping ε fixed. This is load-bearing for the unbiasedness claim.
minor comments (3)
- [Methods / parameterization] Notation in the methods: The pullback map and its interaction with the manifold metric should be defined with an explicit diagram or equation (e.g., the composition defining the pullback) to clarify how the Euclidean network output is lifted back to the manifold.
- [Experiments] Empirical section: Tables reporting performance on S², SO(3), etc., would be strengthened by including standard deviations over random seeds or statistical tests against baselines, rather than point estimates alone.
- [Theorem statements] The abstract states that guarantees hold for compactly supported data; the main text should explicitly restate the support assumption in the theorem statements for clarity.
Simulated Author's Rebuttal
We thank the referee for their careful and constructive review. The two major comments identify substantive gaps in the supporting analysis for the recovery theorem and the heat-smoothed surrogate guarantees. We agree that both points require additional rigor and will incorporate the requested elements in the revision.
Point-by-point responses
Referee: [Abstract / recovery theorem] Abstract and the recovery theorem (presumably §4): The strong recovery of the entropic optimal coupling in probabilistic metrics for fixed ε > 0 is load-bearing for all subsequent claims (L² convergence of barycentric surrogates, stability of heat-smoothed surrogates). This recovery requires the neural pullback parameterization to be dense in the space of target-side Schrödinger potentials. Standard NN universality theorems apply to Euclidean domains; the manuscript provides no separate density lemma establishing that the pullback of a Euclidean network through the manifold chart or embedding is dense in C(M) or the relevant weighted continuous functions on a general Riemannian manifold. Without this step, the hypothesis class may lie in a proper closed subspace, undermining the claimed recovery.
Authors: We agree that an explicit density result is necessary to justify that the neural pullback class is dense in the relevant function space on the manifold. Although the construction uses smooth embeddings or charts (valid for the compactly supported data and the manifolds considered), the manuscript does not contain a dedicated lemma. In the revision we will add Lemma 4.3 in §4, which shows that the pullback of a dense Euclidean NN class through a smooth embedding yields a dense subset of C(K) for compact K ⊂ M, using the Stone-Weierstrass theorem on manifolds together with standard NN universality on the ambient Euclidean space. This lemma will be invoked directly in the proof of the recovery theorem, leaving all other statements unchanged. revision: yes
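For concreteness, one route to such a density lemma (our sketch, via Tietze extension rather than the Stone-Weierstrass route mentioned above; notation is ours): let $\iota : M \to \mathbb{R}^D$ be a smooth embedding and $K \subset M$ compact. Given $h \in C(K)$ and $\delta > 0$,

$$h \circ \iota^{-1} \in C(\iota(K)) \;\xrightarrow{\ \text{Tietze}\ }\; H \in C(\mathbb{R}^D), \qquad \sup_{\iota(K)} \big|H_\theta - H\big| < \delta \ \ \text{(Euclidean universality)}, \qquad \text{hence}\ \sup_{K} \big|H_\theta \circ \iota - h\big| < \delta,$$

so the pullback class $\{H_\theta \circ \iota\}$ is dense in $C(K)$.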
Referee: [§3 / heat-smoothed surrogate analysis] §3 (hypothesis class definition) and assumptions for heat-smoothed surrogates: The guarantees for heat-smoothed conditional surrogates invoke stochastic completeness of the manifold and vanishing heat time for asymptotic unbiasedness. However, the neural approximation error in the potential must be propagated through the heat kernel; the manuscript does not appear to supply explicit bounds showing that the approximation error remains controlled uniformly as heat time → 0 while keeping ε fixed. This is load-bearing for the unbiasedness claim.
Authors: The referee correctly notes the absence of quantitative error propagation through the heat kernel. The current argument relies on qualitative continuity of the heat semigroup under stochastic completeness, but does not bound the effect of a fixed potential approximation error as heat time t → 0. We will add Proposition 3.7 in the revised §3 that supplies an explicit uniform bound: the total-variation distance between the true and approximated heat-smoothed conditionals is controlled by the potential error (from the recovery theorem) times a constant depending only on the heat-kernel Lipschitz constant on the compact support, which remains finite for small t. This establishes that the approximation error stays controlled uniformly down to t = 0, supporting the asymptotic unbiasedness claim without altering the fixed-ε setting. revision: yes
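One standard ingredient pointing in this direction (our note, not the paper's proposition): on a stochastically complete manifold the heat semigroup $P_t$ is conservative, hence a Markov kernel for every $t$, and Markov kernels contract total variation, so smoothing never amplifies the conditional error:

$$\big\| P_t\, \hat\pi(\cdot \mid x) - P_t\, \pi^\star_\varepsilon(\cdot \mid x) \big\|_{\mathrm{TV}} \;\le\; \big\| \hat\pi(\cdot \mid x) - \pi^\star_\varepsilon(\cdot \mid x) \big\|_{\mathrm{TV}} \qquad \text{for all } t > 0.$$

The remaining content of a Proposition 3.7 would then be the bias term, i.e. the rate at which $P_t\, \pi^\star_\varepsilon(\cdot \mid x)$ converges to $\pi^\star_\varepsilon(\cdot \mid x)$ as $t \to 0$.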
Circularity Check
No circularity: the recovery theorem is conditional on an external expressiveness assumption, with the regularization ε fixed externally.
full rationale
The paper states a conditional theorem: for fixed ε>0, if the neural pullback hypothesis class is sufficiently expressive to approximate the target Schrödinger potential arbitrarily well, then it recovers the entropic optimal coupling in strong metrics (with consequences for the surrogates). This assumption is listed explicitly among the weakest assumptions and is not derived from the method itself. No step renames a fitted quantity as a prediction, no self-citation is invoked to establish uniqueness or density of the pullback map, and the derivation chain relies on standard entropic OT theory plus manifold regularity conditions rather than reducing to its own inputs by construction. The skeptic concern identifies a possible missing density lemma but does not exhibit any quoted reduction of the claimed result to a tautology or self-fit.
Axiom & Free-Parameter Ledger
free parameters (2)
- neural network weights
- regularization ε
axioms (2)
- domain assumption: Standard theory of entropic optimal transport extends to Riemannian manifolds with the stated regularity.
- ad hoc to paper: The neural hypothesis class is dense enough to recover the true potential in the limit.
invented entities (2)
- neural pullback parameterization of the Schrödinger potential: no independent evidence
- heat-smoothed conditional surrogates: no independent evidence
Reference graph
Works this paper leans on
- [1] Dimitri P. Bertsekas and Steven E. Shreve. Stochastic Optimal Control: The Discrete-Time Case, volume 139 of Mathematics in Science and Engineering. Academic Press, San Diego, CA, 1978.
- [2] Gabriele Corso, Hannes Stärk, Bowen Jing, Regina Barzilay, and Tommi Jaakkola. DiffDock: Diffusion steps, twists, and turns for molecular docking. arXiv [q-bio.BM], October 2022.
- [3] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. 2013. https://proceedings.neurips.cc/paper_files/paper/2013/file/af21d0c97db2e27e13572cbf59eb343d-Paper.pdf
- [4] Marco Cuturi, Laetitia Meng-Papaxanthos, Yingtao Tian, Charlotte Bunne, Geoff Davis, and Olivier Teboul. Optimal transport tools (OTT): A JAX toolbox for all things Wasserstein. arXiv [cs.LG], January 2022.
- [5] Marco Cuturi and Gabriel Peyré. Semidual regularized optimal transport. SIAM Rev. Soc. Ind. Appl. Math., 60(4):941–965, January 2018.
- [6] Paul Francoeur, Tomohide Masuda, and David Koes. 3D convolutional neural networks and a CrossDocked dataset for structure-based drug design. ChemRxiv, March 2020.
- [7] Alexander Grigor'yan. Analytic and geometric background of recurrence and non-explosion of the Brownian motion on Riemannian manifolds. Bull. New Ser. Am. Math. Soc., 36(2):135–249, February 1999.
- [8] Alexander Grigor'yan. Heat Kernel and Analysis on Manifolds. AMS/IP Studies in Advanced Mathematics. American Mathematical Society, Providence, RI, January 2013.
- [9] Mikhael Gromov. Filling Riemannian manifolds. J. Differential Geom., 18(1):1–147, January 1983.
- [10] Olav Kallenberg. Foundations of Modern Probability. Probability and Its Applications. Springer, New York, NY, 2nd edition, January 2002.
- [11] Anastasis Kratsios and Eugene Bilokopytov. Non-Euclidean universal approximation. arXiv [cs.LG], pages 10635–10646, June 2020.
- [12] John Lee. Introduction to Smooth Manifolds. Graduate Texts in Mathematics. Springer, New York, NY, 2nd edition, May 2012.
- [13] John M. Lee. Introduction to Riemannian Manifolds. Graduate Texts in Mathematics. Springer International Publishing, Cham, Switzerland, 2nd edition, January 2019.
- [14] Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw., 6(6):861–867, January 1993.
- [15] Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, and Liwei Wang. The expressive power of neural networks: A view from the width. arXiv [cs.LG], September 2017.
- [16] Andrew T. McNutt, Yanjing Li, Rocco Meli, Rishal Aggarwal, and David Ryan Koes. GNINA 1.3: the next increment in molecular docking with deep learning. J. Cheminform., 17(1):28, March 2025.
- [17] C. Micchelli, Yuesheng Xu, and Haizhang Zhang. Universal kernels. J. Mach. Learn. Res., 7:2651–2667, December 2006.
- [18] Alessandro Micheli, Yueqi Cao, Anthea Monod, and Samir Bhatt. Riemannian neural optimal transport. arXiv [cs.LG], February 2026.
- [19] Marcel Nutz. Introduction to entropic optimal transport. 2021.
- [20] Gabriel Peyré and Marco Cuturi. Computational optimal transport. arXiv [stat.ML], March 2018.
- [21] Yury Polyanskiy and Yihong Wu. Information Theory: From Coding to Learning. Cambridge University Press, Cambridge, England, January 2025.
- [22] Rodrigo Quiroga and Marcos A. Villarreal. Vinardo: A scoring function based on AutoDock Vina improves scoring, docking, and virtual screening. PLoS One, 11(5):e0155183, May 2016.
- [23] Danilo J. Rezende and Sébastien Racanière. Implicit Riemannian Concave Potential Maps. arXiv [stat.ML], October 2021.
- [24] Karl-Theodor Sturm. Probability measures on metric spaces of nonpositive curvature, 2003.
- [25] Yann Thanwerdas and Xavier Pennec. O(n)-invariant Riemannian metrics on SPD matrices. Linear Algebra Appl., 661:163–201, March 2023.
- [26] Oleg Trott and Arthur J. Olson. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem., 31(2):455–461, January 2010.
- [27] Cédric Villani. Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009.
- [28] Ding-Xuan Zhou. Universality of deep convolutional neural networks. Appl. Comput. Harmon. Anal., 48(2):787–794, March 2020.