Onsager-Machlup Posterior Transport for Deep Gaussian Processes
Pith reviewed 2026-06-30 16:06 UTC · model grok-4.3
The pith
Deep Gaussian process inference is recast as learning a deterministic transport map from a reference measure to inducing variables, regularized by the Onsager-Machlup action on a Doob-bridged diffusion path.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OM-Path realises posterior transport for DGPs by solving Song's probability-flow ODE on DBVI's Doob-bridged forward SDE; the reference drift is closed-form from the bridge marginal coefficients, the path regulariser is the Onsager-Machlup action, and the finite-ε objective is the negative log unnormalised density of the tempered Doob-bridge path posterior whose small-noise limit is the MAP path under the Freidlin-Wentzell LDP.
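To make the probability-flow construction concrete, here is a minimal sketch in pure Python. It does not use the paper's Doob-bridged SDE. As an assumption for illustration, it swaps in a 1-D Ornstein-Uhlenbeck forward SDE, dX = -X dt + sqrt(2) dW, started from a Gaussian, because its marginals (and therefore its score) are also closed-form. The names `M0`, `S0`, `pf_drift`, `integrate`, and `closed_form_map` are hypothetical. The probability-flow ODE used is the standard one, dx/dt = f(x, t) - (1/2) g(t)^2 ∇ log p_t(x).

```python
import math

# Hypothetical initial Gaussian N(M0, S0^2); not taken from the paper.
M0, S0 = 2.0, 0.5

def mean(t):
    # Marginal mean of the OU process dX = -X dt + sqrt(2) dW.
    return M0 * math.exp(-t)

def var(t):
    # Marginal variance of the same OU process.
    return S0 ** 2 * math.exp(-2.0 * t) + 1.0 - math.exp(-2.0 * t)

def score(x, t):
    # Closed-form score of the Gaussian marginal p_t; no score matching needed.
    return -(x - mean(t)) / var(t)

def pf_drift(x, t):
    # Probability-flow drift f - 0.5 * g^2 * score, with f = -x and g^2 = 2.
    return -x - 0.5 * 2.0 * score(x, t)

def integrate(x, t0, t1, steps=1000):
    # Classical RK4; a negative step (t1 < t0) runs the deterministic map backwards.
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = pf_drift(x, t)
        k2 = pf_drift(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = pf_drift(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = pf_drift(x + h * k3, t + h)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return x

def closed_form_map(x0, t):
    # For Gaussian marginals the flow is affine: it rescales the deviation from the mean.
    return mean(t) + math.sqrt(var(t) / var(0.0)) * (x0 - M0)

x_T = integrate(2.7, 0.0, 1.5)
print(x_T, closed_form_map(2.7, 1.5))
print(integrate(x_T, 1.5, 0.0))  # backward pass returns close to the start point 2.7
```

The point of the sketch is only that a closed-form marginal gives a closed-form drift and a deterministic, invertible sampler; in a transport setting one would integrate from the reference end towards the target.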
What carries the argument
The Onsager-Machlup action serving as path regulariser inside the probability-flow ODE sampler derived from the Doob-bridged forward SDE.
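For orientation, the textbook form of the two functionals named here can be written as follows. This is a sketch under assumed notation, not the paper's own equations: the drift $b$, the path $x$, the horizon $T$, and the additive noise scale $\sqrt{\epsilon}$ are illustrative symbols for a generic diffusion $dX_t = b(X_t, t)\,dt + \sqrt{\epsilon}\,dW_t$.

```latex
% Onsager--Machlup functional at finite noise level \epsilon
S_{\mathrm{OM}}^{\epsilon}[x]
  = \frac{1}{2\epsilon} \int_0^T \bigl\| \dot{x}_t - b(x_t, t) \bigr\|^2 \, dt
  + \frac{1}{2} \int_0^T \nabla \!\cdot b(x_t, t) \, dt

% Freidlin--Wentzell rate function (small-noise large deviations)
I[x] = \frac{1}{2} \int_0^T \bigl\| \dot{x}_t - b(x_t, t) \bigr\|^2 \, dt,
\qquad
\epsilon \, S_{\mathrm{OM}}^{\epsilon}[x] = I[x] + \frac{\epsilon}{2} \int_0^T \nabla \!\cdot b(x_t, t) \, dt
\;\xrightarrow[\epsilon \to 0]{}\; I[x]
```

In this generic setting the divergence term is what separates the finite-$\epsilon$ action from the small-noise rate function; how the paper's tempered objective treats that term is not stated on this page.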
If this is right
- OM-Path records statistically significant wins over DBVI on the power data set (NLL 0.012, p=0.014) and on the protein data set (RMSE 0.716 vs 0.764, NLL 1.086 vs 1.149, p=0.002).
- The method ties DBVI on yacht and qsar and loses to DBVI on the three smallest noisy data sets.
- The two strict path-space ELBO ablations (FFJORD log-det and OM-regularised CNF) fail to beat DBVI on any UCI metric.
- In the reported regime, lowering variance of the path objective is more effective than exact density tracking.
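The matched-seed paired Wilcoxon comparison behind these p-values can be illustrated with a small pure-Python sketch. It implements the exact two-sided signed-rank test by enumerating sign assignments, which is only practical for a handful of seeds. The function name `wilcoxon_signed_rank` and the per-seed numbers are hypothetical placeholders, not values from the paper, and the paper may have used a library routine or a normal approximation instead.

```python
from itertools import product

def wilcoxon_signed_rank(a, b):
    """Exact two-sided paired Wilcoxon signed-rank test.

    Zero differences are dropped; tied |differences| share average ranks.
    Returns (statistic, p_value) with statistic = min(W+, W-).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    stat = min(w_plus, total - w_plus)
    # Under the null, each rank carries a + or - sign with probability 1/2.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= stat:
            extreme += 1
    return stat, extreme / 2 ** n

# Hypothetical per-seed test NLLs for two methods run on the same 8 seeds.
method_a = [1.09, 1.07, 1.10, 1.08, 1.06, 1.11, 1.09, 1.08]
method_b = [1.15, 1.14, 1.13, 1.16, 1.12, 1.17, 1.15, 1.13]
print(wilcoxon_signed_rank(method_a, method_b))
```

Pairing by seed removes seed-to-seed variation from the comparison, which is why a consistent small gap can reach significance while a larger but inconsistent gap does not.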
Where Pith is reading between the lines
- The closed-form bridge drift may extend to other latent-variable models that already employ Doob bridges, removing the need for score matching in those settings.
- Because the method is deterministic, it could be combined with existing inducing-point selection heuristics to scale beyond the current UCI benchmarks without additional stochasticity.
- If the Onsager-Machlup regulariser remains stable under changes of the reference diffusion, the same transport construction could be tested on DGP classification or on non-Gaussian likelihoods.
Load-bearing premise
At the finite noise level used for training, the objective exactly equals the negative log unnormalised density of the tempered Doob-bridge path posterior.
What would settle it
An explicit counter-example in which the small-noise limit of the OM-Path objective fails to recover the MAP path of the tempered Doob-bridge posterior would falsify Theorem 1.
Original abstract
Approximate inference over inducing variables is the central computational bottleneck of Deep Gaussian Processes (DGPs). Existing methods either fit an explicit density $q_\phi(\bU)$ by an ELBO (DSVI, IPVI, DDVI, DBVI) or sample by MCMC (SGHMC). We instead frame DGP inference as \emph{posterior transport}: learn a deterministic sampler that maps a tractable reference measure to posterior-relevant inducing variables, regularised by a path prior derived from the Doob-bridged reference diffusion. Our realisation, \textbf{OM-Path} (formally FBVI-bridge-Path), uses Song's probability-flow ODE applied to DBVI's Doob-bridged forward SDE; the reference drift is closed-form from the bridge marginal coefficients (no score matching) and the path regulariser is the \textbf{Onsager--Machlup action}. At the finite-$\epsilon$ value used at training, the objective is the negative log unnormalised density of a tempered Doob-bridge path posterior, and Theorem 1 identifies it with the same posterior's small-noise MAP path via the Freidlin--Wentzell LDP. Two strict path-space ELBO variants on the same bridge backbone (FFJORD log-det; OM-regularised CNF) are derived as ablations. Under a matched-seed paired Wilcoxon test against DBVI on seven UCI regression benchmarks, OM-Path delivers statistically significant wins on the two largest datasets (\textit{power}: $p\!=\!0.014$, NLL $\mathbf{0.012}$ matching the DSVI baseline of $0.017$; \textit{protein}: $p\!=\!0.002$, RMSE $\mathbf{0.716}$ vs.\ $0.764$, NLL $\mathbf{1.086}$ vs.\ $1.149$), statistical ties on \textit{yacht} / \textit{qsar}, and concedes \textit{boston} / \textit{energy} / \textit{concrete} to DBVI on small-$N$ noisy data. The strict-ELBO variants do not clear DBVI on any UCI metric: in this regime, reducing the variance of the path objective dominates exact-density tracking.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces OM-Path (FBVI-bridge-Path), a posterior transport approach to approximate inference for deep Gaussian processes. It learns a deterministic map from a reference measure to inducing variables via Song's probability-flow ODE applied to DBVI's Doob-bridged forward SDE, with closed-form reference drift and Onsager-Machlup action as the path regularizer. At the finite-ε training value the objective equals the negative log unnormalised density of a tempered Doob-bridge path posterior; Theorem 1 identifies this with the small-noise MAP path under the Freidlin-Wentzell LDP. On seven UCI regression benchmarks, matched-seed paired Wilcoxon tests versus DBVI yield statistically significant wins on the two largest datasets (power, protein), ties on two others, and losses on the three smallest noisy sets; two strict path-space ELBO ablations on the same backbone fail to beat DBVI on any metric.
Significance. If the finite-ε identification and LDP link hold, the work supplies a targeted, variance-reducing regularizer for path-space DGP inference that improves upon density-tracking baselines on larger data while reusing existing bridge machinery. The explicit reporting of per-dataset wins/ties/losses together with p-values and the ablation result that exact-density tracking underperforms the regularized objective constitute concrete, falsifiable contributions.
minor comments (2)
- The abstract states that the finite-ε objective equals the negative log unnormalised density of the tempered Doob-bridge path posterior, but the precise cancellation of terms between the Onsager-Machlup regularizer and the probability-flow ODE Jacobian at finite ε is not shown in the provided summary; a short derivation or reference to the relevant equation would clarify this equivalence.
- The two strict path-space ELBO ablations (FFJORD log-det and OM-regularised CNF) are mentioned only by name; a one-sentence statement of how each differs from the OM-Path objective in the loss or in the density estimator would aid reproducibility.
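As background for the FFJORD-style ablation, the generic continuous-normalising-flow density rule is d log q_t(x_t)/dt = -tr(∂f/∂x). The sketch below illustrates that rule only; it is not the paper's ablation objective. The 1-D velocity field `velocity` (a tanh) and the names `divergence` and `flow_with_logdet` are hypothetical, chosen because the flow has an exact solution to check against. In higher dimensions FFJORD replaces the exact trace with a stochastic (Hutchinson) estimate, which is one source of the objective variance discussed in the abstract.

```python
import math

def velocity(x):
    # Hypothetical 1-D velocity field for illustration.
    return math.tanh(x)

def divergence(x):
    # Trace of the (1x1) Jacobian of the velocity field.
    return 1.0 - math.tanh(x) ** 2

def flow_with_logdet(x0, T=1.0, steps=2000):
    """RK4 on the augmented state (x, logdet), with d(logdet)/dt = divergence(x)."""
    h = T / steps
    x, logdet = x0, 0.0
    for _ in range(steps):
        k1x, k1l = velocity(x), divergence(x)
        x2 = x + 0.5 * h * k1x
        k2x, k2l = velocity(x2), divergence(x2)
        x3 = x + 0.5 * h * k2x
        k3x, k3l = velocity(x3), divergence(x3)
        x4 = x + h * k3x
        k4x, k4l = velocity(x4), divergence(x4)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        logdet += h * (k1l + 2.0 * k2l + 2.0 * k3l + k4l) / 6.0
    return x, logdet

def log_standard_normal(x):
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

x0 = 0.3
x_T, logdet = flow_with_logdet(x0)
# Change of variables: log q_T(x_T) = log q_0(x_0) - integral of the divergence.
log_q_T = log_standard_normal(x0) - logdet
print(x_T, logdet, log_q_T)
```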
Simulated Author's Rebuttal
We thank the referee for the detailed summary of our work on OM-Path and for the positive assessment of its contributions, including the explicit per-dataset results and ablation findings. The recommendation of minor revision is noted, and we are happy to address any editorial or minor points that may arise. No specific major comments were raised in the report.
Circularity Check
No significant circularity; derivation self-contained against external benchmarks
full rationale
The paper re-uses the Doob-bridged SDE and marginal coefficients from prior DBVI work (cited, not self-derived here) to obtain a closed-form reference drift, then introduces the Onsager-Machlup action as an independent path regularizer. The finite-ε objective is explicitly set to the negative log density of the tempered Doob-bridge path posterior, with Theorem 1 invoking the standard Freidlin-Wentzell LDP (external mathematical fact) to link it to the MAP path; this is a justification, not a reduction by construction. Empirical claims rest on matched-seed Wilcoxon tests against DBVI on seven UCI datasets with reported p-values, wins, ties, and losses. No quoted equation or step equates a prediction or central result to a fitted input or self-citation chain. The central contribution (OM regularizer on the bridge backbone) retains independent content.
Axiom & Free-Parameter Ledger
free parameters (1)
- finite-epsilon
axioms (2)
- domain assumption: Freidlin-Wentzell large deviation principle identifies the finite-epsilon objective with the small-noise MAP path
- standard math: Doob bridge marginal coefficients yield a closed-form reference drift without score matching
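As a reminder of why a Doob bridge can give a closed-form drift, the simplest textbook case is the Brownian bridge. This is an illustration under assumed notation ($\sigma$, $y$, $T$, $x_0$, $h$ are not taken from the paper), and DBVI's bridge may use a different reference diffusion.

```latex
% Doob h-transform of dX_t = b(X_t, t)\,dt + \sigma\,dW_t, conditioned on X_T = y
dX_t = \bigl[\, b(X_t, t) + \sigma^2 \nabla_x \log h(X_t, t) \,\bigr]\,dt + \sigma\,dW_t,
\qquad h(x, t) = p\bigl(X_T = y \mid X_t = x\bigr)

% Special case b = 0 (Brownian bridge): drift and marginal are explicit
dX_t = \frac{y - X_t}{T - t}\,dt + \sigma\,dW_t,
\qquad
X_t \mid X_0 = x_0 \;\sim\; \mathcal{N}\!\Bigl( x_0 + \tfrac{t}{T}\,(y - x_0),\; \sigma^2\,\tfrac{t\,(T - t)}{T} \Bigr)
```

Because the marginal is Gaussian with known mean and variance, its score is available analytically, which is the sense in which no score matching is required.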
Forward citations
Cited by 2 Pith papers
- What Do Flow-Based Inverse Solvers Approximate? A Posterior-Transport View
  Flow-based inverse solvers approximate posterior transport via source reweighting; guidance methods incur large Wasserstein bias while a new velocity-correction solver produces diverse samples with correlated uncertai...
- What Do Flow-Based Inverse Solvers Approximate? A Posterior-Transport View
  Provides a posterior-transport analysis of flow-based inverse solvers, demonstrating that source reweighting yields exact posteriors while trajectory guidance methods are zeroth-order/Gaussian/proximal approximations ...