
arxiv: 2605.23434 · v2 · pith:HN4RIII5new · submitted 2026-05-22 · 💻 cs.LG

Onsager-Machlup Posterior Transport for Deep Gaussian Processes

Pith reviewed 2026-06-30 16:06 UTC · model grok-4.3

classification 💻 cs.LG
keywords: deep Gaussian processes · posterior transport · Onsager-Machlup action · Doob bridge · variational inference · inducing variables · probability-flow ODE

The pith

Deep Gaussian process inference is recast as learning a deterministic transport map from a reference measure to inducing variables, regularized by the Onsager-Machlup action on a Doob-bridged diffusion path.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats approximate inference over inducing variables in deep Gaussian processes as a posterior transport problem rather than the fitting of an explicit variational density. It learns a deterministic sampler that pushes a tractable reference measure forward to posterior-relevant inducing variables, regularised by a path prior constructed from the Doob-bridged reference diffusion. The concrete method, OM-Path, applies the probability-flow ODE to the Doob-bridged forward SDE; its reference drift is obtained in closed form from the bridge marginal coefficients, and the path regulariser is the Onsager-Machlup action. At the finite noise level used during training, the objective equals the negative log unnormalised density of a tempered Doob-bridge path posterior; Theorem 1 identifies it, in the small-noise limit, with the MAP path of the same posterior under the Freidlin-Wentzell large-deviation principle. On seven UCI regression benchmarks, OM-Path yields statistically significant gains over the DBVI baseline on the two largest data sets (power and protein), whereas the strict path-space ELBO ablations do not beat DBVI on any UCI metric.

Core claim

OM-Path realises posterior transport for DGPs by solving Song's probability-flow ODE on DBVI's Doob-bridged forward SDE; the reference drift is closed-form from the bridge marginal coefficients, the path regulariser is the Onsager-Machlup action, and the finite-ε objective is the negative log unnormalised density of the tempered Doob-bridge path posterior whose small-noise limit is the MAP path under the Freidlin-Wentzell LDP.
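For orientation, the generic probability-flow construction that the core claim refers to can be written out as follows. This is the standard result for a diffusion with known marginals, not a reproduction of the paper's own drift equation: the symbols f, g, p_s and m_s are placeholders assumed here for illustration, and the Gaussian-marginal line is an assumption about how marginal coefficients could make the score explicit without score matching (κ(s) is the variance coefficient named in the Figure 1 caption).

```latex
\begin{align}
  % Generic forward SDE with marginals p_s (placeholder notation)
  \mathrm{d}U_s &= f(U_s, s)\,\mathrm{d}s + g(s)\,\mathrm{d}W_s, \\
  % Probability-flow ODE: deterministic, same marginals p_s
  \frac{\mathrm{d}U_s}{\mathrm{d}s} &= f(U_s, s) - \tfrac{1}{2}\, g(s)^2\, \nabla_u \log p_s(U_s), \\
  % Assumed Gaussian marginal p_s = N(m_s, \kappa(s) I): the score is explicit
  \nabla_u \log p_s(u) &= -\frac{u - m_s}{\kappa(s)} .
\end{align}
```

Under that Gaussian assumption the right-hand side of the ODE involves only f, g, m_s and κ(s), which is one way a drift could be "closed-form from the bridge marginal coefficients".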

What carries the argument

The Onsager-Machlup action serving as path regulariser inside the probability-flow ODE sampler derived from the Doob-bridged forward SDE.
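As a minimal sketch of what an Onsager-Machlup path penalty looks like once discretised, the Python below evaluates a left-point (Euler) approximation of the action along a one-dimensional path. Everything here is assumed for illustration: the linear drift `-lam * x` is a hypothetical stand-in for the bridge reference drift, the diffusion coefficient `g` is taken constant, and the function names are not from the paper.

```python
def om_action(path, dt, drift, div_drift, g=1.0, eps=0.0):
    """Left-point discretisation of the eps-scaled Onsager-Machlup action
    0.5 * int |x' - b(x)|^2 / g^2 dt + (eps / 2) * int div b(x) dt.
    With eps = 0 this is the Freidlin-Wentzell action."""
    total = 0.0
    for x0, x1 in zip(path[:-1], path[1:]):
        velocity = (x1 - x0) / dt            # finite-difference path velocity
        residual = velocity - drift(x0)      # deviation from the reference drift
        total += 0.5 * residual ** 2 / g ** 2 * dt
        total += 0.5 * eps * div_drift(x0) * dt
    return total


def euler_path(x0, drift, dt, steps):
    """Integrate dx/dt = drift(x) with explicit Euler steps."""
    path = [x0]
    for _ in range(steps):
        path.append(path[-1] + dt * drift(path[-1]))
    return path


lam = 1.0                          # hypothetical drift strength
drift = lambda x: -lam * x         # stand-in reference drift b(x)
div_drift = lambda x: -lam         # divergence of b in one dimension
steps, dt = 100, 0.01

flow = euler_path(1.0, drift, dt, steps)   # path that follows the drift
held = [1.0] * (steps + 1)                 # path that ignores the drift

action_flow = om_action(flow, dt, drift, div_drift)
action_held = om_action(held, dt, drift, div_drift)
action_held_eps = om_action(held, dt, drift, div_drift, eps=0.1)
```

A path that follows the reference drift incurs essentially zero quadratic cost, while a path that departs from it is penalised in proportion to the squared deviation; the `eps` term is the finite-noise divergence correction.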

If this is right

  • OM-Path records statistically significant wins over DBVI on the power data set (NLL 0.012, p=0.014) and on the protein data set (RMSE 0.716 vs 0.764, NLL 1.086 vs 1.149, p=0.002).
  • The method ties DBVI on yacht and qsar and loses to DBVI on boston, energy and concrete, the small-N noisy data sets.
  • The two strict path-space ELBO ablations (FFJORD log-det and OM-regularised CNF) fail to beat DBVI on any UCI metric.
  • In the reported regime, lowering variance of the path objective is more effective than exact density tracking.
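The significance claims above rest on a matched-seed paired Wilcoxon test. The sketch below implements an exact two-sided signed-rank test in plain Python so the mechanics are visible. The per-seed scores are synthetic placeholders, not results from the paper, and the function assumes no zero differences and no tied absolute differences.

```python
from itertools import product


def signed_rank_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.
    Assumes no zero differences and no ties among |differences|."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = n * (n + 1) // 2
    observed = min(w_plus, total - w_plus)
    # Under the null every sign pattern is equally likely: enumerate all 2^n.
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= observed:
            count += 1
    return w_plus, count / 2 ** n


# Synthetic per-seed scores (lower is better); NOT taken from the paper.
method_a = [1.00 + 0.001 * i for i in range(10)]
method_b = [a + 0.01 * (i + 1) for i, a in enumerate(method_a)]

w_plus, p_value = signed_rank_exact(method_a, method_b)
```

With 10 matched seeds, the smallest exact two-sided p-value this test can produce is 2/1024 ≈ 0.00195, reached when one method wins on every seed. That would be consistent with the reported p=0.002 on protein if the test there used 10 seeds and the exact null distribution, though this page does not state those settings.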

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The closed-form bridge drift may extend to other latent-variable models that already employ Doob bridges, removing the need for score matching in those settings.
  • Because the method is deterministic, it could be combined with existing inducing-point selection heuristics to scale beyond the current UCI benchmarks without additional stochasticity.
  • If the Onsager-Machlup regulariser remains stable under changes of the reference diffusion, the same transport construction could be tested on DGP classification or on non-Gaussian likelihoods.

Load-bearing premise

At the finite noise level used for training, the objective exactly equals the negative log unnormalised density of the tempered Doob-bridge path posterior.

What would settle it

An explicit counter-example in which the small-noise limit of the OM-Path objective fails to recover the MAP path of the tempered Doob-bridge posterior would falsify Theorem 1.
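The small-noise limit at stake can be stated in its textbook form. The display below is the standard relation between the Onsager-Machlup functional and the Freidlin-Wentzell rate function for a diffusion with constant noise scale; b, g and T are generic placeholders, and the data-fit term that would turn the path prior into a path posterior is omitted because its form in the paper is not reproduced on this page.

```latex
\begin{align}
  % Small-noise diffusion with constant g (assumed for illustration)
  \mathrm{d}X_t &= b(X_t)\,\mathrm{d}t + \sqrt{\epsilon}\, g\,\mathrm{d}W_t, \\
  % Onsager-Machlup functional, scaled by \epsilon
  \epsilon\, S^{\mathrm{OM}}_{\epsilon}[x]
    &= \frac{1}{2}\int_0^T \frac{\lVert \dot{x}_t - b(x_t) \rVert^2}{g^2}\,\mathrm{d}t
     + \frac{\epsilon}{2}\int_0^T \nabla\!\cdot b(x_t)\,\mathrm{d}t, \\
  % The divergence correction vanishes as \epsilon \to 0
  \lim_{\epsilon \to 0} \epsilon\, S^{\mathrm{OM}}_{\epsilon}[x]
    &= S^{\mathrm{FW}}[x]
     = \frac{1}{2}\int_0^T \frac{\lVert \dot{x}_t - b(x_t) \rVert^2}{g^2}\,\mathrm{d}t .
\end{align}
```

A counter-example of the kind described would need the minimiser of the finite-ε objective to fail to converge to the minimiser of the limiting action plus data-fit term.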

Figures

Figures reproduced from arXiv: 2605.23434 by Delu Zeng, Jian Xu, John Paisley, Qibin Zhao.

Figure 1. Bridge marginal coefficients ϕ(s) (mean attenuation) and κ(s) (variance) from the closed-form ODE system in Prop. 1, for three diffusion strengths λ. With our default λ = g = σ0 = 1 we obtain ϕ(1)≈0.37 and κ(1)≈0.50, both of which appear in the reference drift equation 6 and in the bridge initial distribution sampled in line 4 of Algorithm 1. For posterior transport the situation is reversed: the unnormali…
Figure 2. Per-dataset test RMSE at L = 2 (mean±std, 10 seeds). Bars exceeding 1.5 are clipped and the actual value annotated above. FBVI-bridge-Path and DBVI are jointly the strongest methods on the small/medium UCI datasets, with the per-dataset winner alternating between them; see Appendix Q for the matched-seed Wilcoxon verdict.
Figure 3. Test-NLL trajectories on protein (L = 2). Solid lines are seed-mean; shaded bands are ±1 std. FBVI-bridge-Path (OM, our main method, 10 seeds) and FBVI-bridge (implicit-q, 3 seeds re-run under matched protocol) converge to NLL ≈1.10 by epoch 100, with the OM variant nominally lowest (1.086±0.034); DSVI plateaus higher at ≈1.16; unbridged FBVI has a single-seed instability event near epoch 68. SGHMC / IPVI …
Figure 4. Depth-scaling test RMSE on the seven small/medium UCI datasets (mean over 10 seeds; …
Figure 5. Few-step inference: test RMSE as a function of Euler integration steps at evaluation time …
Figure 6. Simple regret vs. BO iteration on four synthetic black-box functions, mean …
Original abstract

Approximate inference over inducing variables is the central computational bottleneck of Deep Gaussian Processes (DGPs). Existing methods either fit an explicit density $q_\phi(\bU)$ by an ELBO (DSVI, IPVI, DDVI, DBVI) or sample by MCMC (SGHMC). We instead frame DGP inference as \emph{posterior transport}: learn a deterministic sampler that maps a tractable reference measure to posterior-relevant inducing variables, regularised by a path prior derived from the Doob-bridged reference diffusion. Our realisation, \textbf{OM-Path} (formally FBVI-bridge-Path), uses Song's probability-flow ODE applied to DBVI's Doob-bridged forward SDE; the reference drift is closed-form from the bridge marginal coefficients (no score matching) and the path regulariser is the \textbf{Onsager--Machlup action}. At the finite-$\epsilon$ value used at training, the objective is the negative log unnormalised density of a tempered Doob-bridge path posterior, and Theorem 1 identifies it with the same posterior's small-noise MAP path via the Freidlin--Wentzell LDP. Two strict path-space ELBO variants on the same bridge backbone (FFJORD log-det; OM-regularised CNF) are derived as ablations. Under a matched-seed paired Wilcoxon test against DBVI on seven UCI regression benchmarks, OM-Path delivers statistically significant wins on the two largest datasets (\textit{power}: $p\!=\!0.014$, NLL $\mathbf{0.012}$ matching the DSVI baseline of $0.017$; \textit{protein}: $p\!=\!0.002$, RMSE $\mathbf{0.716}$ vs.\ $0.764$, NLL $\mathbf{1.086}$ vs.\ $1.149$), statistical ties on \textit{yacht} / \textit{qsar}, and concedes \textit{boston} / \textit{energy} / \textit{concrete} to DBVI on small-$N$ noisy data. The strict-ELBO variants do not clear DBVI on any UCI metric: in this regime, reducing the variance of the path objective dominates exact-density tracking.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces OM-Path (FBVI-bridge-Path), a posterior transport approach to approximate inference for deep Gaussian processes. It learns a deterministic map from a reference measure to inducing variables via Song's probability-flow ODE applied to DBVI's Doob-bridged forward SDE, with closed-form reference drift and Onsager-Machlup action as the path regularizer. At the finite-ε training value the objective equals the negative log unnormalised density of a tempered Doob-bridge path posterior; Theorem 1 identifies this with the small-noise MAP path under the Freidlin-Wentzell LDP. On seven UCI regression benchmarks, matched-seed paired Wilcoxon tests versus DBVI yield statistically significant wins on the two largest datasets (power, protein), ties on two others (yacht, qsar), and losses on three small-N noisy sets (boston, energy, concrete); two strict path-space ELBO ablations on the same backbone fail to beat DBVI on any metric.

Significance. If the finite-ε identification and LDP link hold, the work supplies a targeted, variance-reducing regularizer for path-space DGP inference that improves upon density-tracking baselines on larger data while reusing existing bridge machinery. The explicit reporting of per-dataset wins/ties/losses together with p-values and the ablation result that exact-density tracking underperforms the regularized objective constitute concrete, falsifiable contributions.

minor comments (2)
  1. The abstract states that the finite-ε objective equals the negative log unnormalised density of the tempered Doob-bridge path posterior, but the precise cancellation of terms between the Onsager-Machlup regularizer and the probability-flow ODE Jacobian at finite ε is not shown in the provided summary; a short derivation or reference to the relevant equation would clarify this equivalence.
  2. The two strict path-space ELBO ablations (FFJORD log-det and OM-regularised CNF) are mentioned only by name; a one-sentence statement of how each differs from the OM-Path objective in the loss or in the density estimator would aid reproducibility.
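As background for the second comment, the generic density-tracking identity behind continuous normalising flows is sketched below. It is the standard instantaneous change-of-variables formula together with the Hutchinson trace estimator that FFJORD uses; the symbols v_θ, q_t and η are placeholders, and nothing here is taken from the paper's own ablation objectives.

```latex
\begin{align}
  % Flow ODE with a learned velocity field (placeholder notation)
  \frac{\mathrm{d}z_t}{\mathrm{d}t} &= v_\theta(z_t, t), \\
  % Instantaneous change of variables: exact log-density along the flow
  \frac{\mathrm{d}}{\mathrm{d}t} \log q_t(z_t)
    &= -\operatorname{tr}\!\left( \frac{\partial v_\theta}{\partial z}(z_t, t) \right), \\
  % Hutchinson estimator of the trace (stochastic, unbiased)
  \operatorname{tr}(A) &= \mathbb{E}_{\eta}\!\left[ \eta^{\top} A\, \eta \right],
    \qquad \mathbb{E}\!\left[ \eta \eta^{\top} \right] = I .
\end{align}
```

A strict path-space ELBO has to carry the log-density term along the trajectory, whereas an action-regularised objective of the kind summarised above does not; how each ablation does this in the paper is exactly what the comment asks the authors to state.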

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed summary of our work on OM-Path and for the positive assessment of its contributions, including the explicit per-dataset results and ablation findings. The recommendation of minor revision is noted, and we are happy to address any editorial or minor points that may arise. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained against external benchmarks

full rationale

The paper re-uses the Doob-bridged SDE and marginal coefficients from prior DBVI work (cited, not self-derived here) to obtain a closed-form reference drift, then introduces the Onsager-Machlup action as an independent path regularizer. The finite-ε objective is explicitly set to the negative log density of the tempered Doob-bridge path posterior, with Theorem 1 invoking the standard Freidlin-Wentzell LDP (external mathematical fact) to link it to the MAP path; this is a justification, not a reduction by construction. Empirical claims rest on matched-seed Wilcoxon tests against DBVI on seven UCI datasets with reported p-values, wins, ties, and losses. No quoted equation or step equates a prediction or central result to a fitted input or self-citation chain. The central contribution (OM regularizer on the bridge backbone) retains independent content.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central construction rests on the Freidlin-Wentzell large-deviation principle applied to the tempered Doob-bridge path measure and on the existence of a closed-form reference drift for the Doob bridge; no new entities are postulated and the only free parameter mentioned is the finite training epsilon.

free parameters (1)
  • finite-epsilon
    The temperature-like epsilon used to define the tempered Doob-bridge path posterior at training time.
axioms (2)
  • domain assumption: Freidlin-Wentzell large deviation principle identifies the finite-epsilon objective with the small-noise MAP path
    Invoked by Theorem 1 to equate the training objective to the MAP path of the posterior.
  • standard math: Doob bridge marginal coefficients yield a closed-form reference drift without score matching
    Stated as enabling the probability-flow ODE implementation.

pith-pipeline@v0.9.1-grok · 5958 in / 1612 out tokens · 52900 ms · 2026-06-30T16:06:19.492224+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. What Do Flow-Based Inverse Solvers Approximate? A Posterior-Transport View

    cs.CV 2026-06 unverdicted novelty 7.0

    Flow-based inverse solvers approximate posterior transport via source reweighting; guidance methods incur large Wasserstein bias while a new velocity-correction solver produces diverse samples with correlated uncertai...

  2. What Do Flow-Based Inverse Solvers Approximate? A Posterior-Transport View

    cs.CV 2026-06 unverdicted novelty 7.0

    Provides a posterior-transport analysis of flow-based inverse solvers, demonstrating that source reweighting yields exact posteriors while trajectory guidance methods are zeroth-order/Gaussian/proximal approximations ...
