pith.

arxiv: 2605.05606 · v1 · submitted 2026-05-07 · 📊 stat.ML · cs.LG · math.PR

Variational Smoothing and Inference for SDEs from Sparse Data with Dynamic Neural Flows

Pith reviewed 2026-05-08 05:46 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · math.PR
keywords stochastic differential equations · variational inference · neural networks · smoothing · sparse observations · Kolmogorov equation · posterior sampling · evidence lower bound

The pith

A neural network solves a Kolmogorov backward equation with jumps to define the posterior SDE for smoothing sparse observations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a variational method for inferring both latent state trajectories and parameters in stochastic differential equations when only sparse noisy observations are available. It characterizes the posterior dynamics through a conditional backward score that is the gradient of a function solving the Kolmogorov backward equation, with multiplicative updates applied at each observation time. Neural networks are trained to satisfy both the continuous PDE and the discrete jump conditions, yielding a posterior SDE whose drift is adjusted by the learned score while diffusion remains unchanged. This construction supports direct sampling of trajectories and supplies an evidence lower bound that is maximized in an EM-style procedure to estimate the original SDE parameters. The resulting procedure avoids path degeneracy and scales better than MCMC on nonlinear test systems with very few observations.
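The residual training described above can be illustrated as a loss on a candidate function h(t, x). This is a minimal sketch under stated assumptions, not the authors' code: a scalar state, a Brownian prior (b = 0, σ = 1), a single Gaussian observation y at the final time T (so the only jump condition is h(T⁻, x) = g(y | x)·h(T⁺, x) with h(T⁺, ·) = 1), and central finite differences in place of the automatic differentiation a neural network would use. `kbe_residual_loss`, `gauss_pdf`, and the two candidate functions are hypothetical; in the paper the candidate would be a trained network.

```python
import numpy as np

def gauss_pdf(y, mean, var):
    """Gaussian density N(y; mean, var), used as the observation likelihood g(y | x)."""
    return np.exp(-(y - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def kbe_residual_loss(h, b, sigma, y, r2, T, t_col, x_col, x_jump, eps=1e-3):
    """Squared residuals of the backward equation plus the jump condition at T.

    PDE term:  d_t h + b d_x h + 0.5 sigma^2 d_xx h = 0 at collocation points t < T.
    Jump term: h(T^-, x) = g(y | x) * h(T^+, x), with h(T^+, x) = 1 after the last observation.
    Derivatives use central finite differences (a network would use autodiff).
    """
    h0 = h(t_col, x_col)
    h_t = (h(t_col + eps, x_col) - h(t_col - eps, x_col)) / (2 * eps)
    h_x = (h(t_col, x_col + eps) - h(t_col, x_col - eps)) / (2 * eps)
    h_xx = (h(t_col, x_col + eps) - 2 * h0 + h(t_col, x_col - eps)) / eps**2
    pde = h_t + b(t_col, x_col) * h_x + 0.5 * sigma(t_col, x_col) ** 2 * h_xx
    jump = h(T, x_jump) - gauss_pdf(y, x_jump, r2) * 1.0
    return float(np.mean(pde**2) + np.mean(jump**2))

T, r2, y = 1.0, 1.0, 2.0                          # illustrative values
rng = np.random.default_rng(0)
t_col = rng.uniform(0.0, T - 0.01, 500)           # collocation points strictly before T
x_col = rng.uniform(-3.0, 5.0, 500)
x_jump = np.linspace(-3.0, 5.0, 200)
zero_drift = lambda t, x: 0.0 * x
unit_sigma = lambda t, x: 1.0 + 0.0 * x
h_exact = lambda t, x: gauss_pdf(y, x, T - t + r2)   # solves the PDE and the jump condition
h_wrong = lambda t, x: gauss_pdf(y, x, r2)           # satisfies the jump condition only
loss_exact = kbe_residual_loss(h_exact, zero_drift, unit_sigma, y, r2, T, t_col, x_col, x_jump)
loss_wrong = kbe_residual_loss(h_wrong, zero_drift, unit_sigma, y, r2, T, t_col, x_col, x_jump)
print(loss_exact, loss_wrong)   # the first is near zero, the second is clearly positive
```

Minimizing such a loss over network weights is the training step; a candidate that matches the observation update but ignores the dynamics between observations is penalized by the PDE term.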

Core claim

The posterior SDE for the latent process has the same diffusion coefficient as the prior but a modified drift: the prior drift plus the diffusion coefficient times the conditional backward-in-time score. This score is the gradient of the solution to the Kolmogorov backward equation subject to multiplicative jump conditions at observation times. The score function is approximated by a neural network trained to satisfy both the governing PDE and the jump conditions, which integrates the continuous-time dynamics with the discrete Bayesian updates induced by the data. The same approximation supplies a likelihood-based objective whose maximization yields an evidence lower bound on

What carries the argument

The conditional backward-in-time score, defined as the gradient of the solution to the Kolmogorov backward equation with multiplicative jump conditions at observation times, which is learned by a neural network and used to modify the drift of the posterior SDE.
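For reference, one standard way to write this construction is Doob's h-transform. The display below is a sketch under assumptions, not the paper's own notation: the prior is dX = b dt + σ dW with generator 𝒜, observations y_1, …, y_N at times t_1 < ⋯ < t_N ≤ T have likelihood g(y_j | x), and h(t, x) is the conditional likelihood of the observations after time t given X(t) = x. In this form the "diffusion coefficient" multiplying the score is the diffusion matrix a = σσ^⊤ and the score is ∇ₓ log h; the paper may parametrize a closely related quantity.

```latex
\begin{aligned}
&dX(t) = b(X(t))\,dt + \sigma(X(t))\,dW(t), \qquad a(x) = \sigma(x)\sigma(x)^\top,\\
&\mathcal{A}f(x) = \sum_{i=1}^{d} b_i(x)\,\frac{\partial f(x)}{\partial x_i}
  + \frac{1}{2}\sum_{i,j=1}^{d} a_{ij}(x)\,\frac{\partial^2 f(x)}{\partial x_i\,\partial x_j},\\
&\partial_t h(t,x) + \mathcal{A}h(t,x) = 0 \quad \text{for } t \in (t_{j-1}, t_j),
  \qquad h(t,x) = 1 \ \text{ for } t \in (t_N, T],\\
&h(t_j^-, x) = g(y_j \mid x)\, h(t_j^+, x), \qquad j = 1, \dots, N,\\
&dX(t) = \bigl[\, b(X(t)) + a(X(t))\,\nabla_x \log h(t, X(t)) \,\bigr]\,dt + \sigma(X(t))\,dW(t).
\end{aligned}
```

The last line is the posterior SDE: the diffusion term is unchanged and only the drift is corrected, which is the structure the core claim describes.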

If this is right

  • Posterior trajectories can be sampled directly from the induced SDE without the path-degeneracy problems of particle-based smoothers.
  • SDE parameters are learned by maximizing the evidence lower bound using Monte Carlo samples from the approximate posterior.
  • Continuous-time dynamics and discrete observation updates are handled in a single training objective that scales to nonlinear systems with very few data points.
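  • The same neural score can be reused across multiple observation sequences for a given prior SDE only if the network is also conditioned on the observations, since the jump conditions, and hence the score, depend on the data.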
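The first consequence can be made concrete with Euler–Maruyama. A minimal sketch, assuming a scalar state and a score callable `score(t, x)` approximating ∇ₓ log h(t, x), so that the posterior drift is b + σ²·score as in the Doob h-transform; `sample_posterior_sde` is a hypothetical helper, and the exact score below stands in for a trained network. The toy check uses a Brownian prior observed once at T with Gaussian noise of variance r², where the exact score is (y − x)/(T − t + r²) and the posterior of X(T) is N(yT/(T + r²), Tr²/(T + r²)).

```python
import numpy as np

def sample_posterior_sde(drift, sigma, score, x0, t_grid, n_paths, seed=0):
    """Euler-Maruyama paths of dX = [b(t, X) + sigma(t, X)^2 * score(t, X)] dt + sigma(t, X) dW.

    Scalar state. `score(t, x)` stands in for the learned grad_x log h(t, x);
    the diffusion coefficient is the prior's, only the drift is corrected.
    Each path is simulated independently (no resampling step).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    paths = np.empty((len(t_grid), n_paths))
    paths[0] = x
    for k in range(len(t_grid) - 1):
        t, dt = t_grid[k], t_grid[k + 1] - t_grid[k]
        s = sigma(t, x)
        mu = drift(t, x) + s**2 * score(t, x)
        x = x + mu * dt + s * np.sqrt(dt) * rng.standard_normal(n_paths)
        paths[k + 1] = x
    return paths

# Toy check: Brownian prior, one noisy observation y at T (values are illustrative).
T, r2, y = 1.0, 1.0, 2.0
t_grid = np.linspace(0.0, T, 401)
exact_score = lambda t, x: (y - x) / (T - t + r2)   # grad_x log N(y; x, T - t + r2)
paths = sample_posterior_sde(lambda t, x: 0.0 * x, lambda t, x: 1.0,
                             exact_score, 0.0, t_grid, n_paths=10_000)
print(paths[-1].mean(), paths[-1].var())   # close to y*T/(T+r2) = 1.0 and T*r2/(T+r2) = 0.5
```

Because every trajectory is drawn independently from the corrected SDE, there is no resampling step of the kind that causes path degeneracy in particle smoothers; the accuracy of the samples rests entirely on the accuracy of the score.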

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The PDE-based training procedure could be extended to irregular or asynchronous observation times by adjusting the jump condition schedule accordingly.
  • Replacing the neural network with other function approximators might reduce training cost while preserving the same posterior SDE guarantee.
  • The framework naturally suggests a score-based generative model for imputing missing segments in time series governed by known SDEs.

Load-bearing premise

A neural network can be trained to accurately approximate the solution of the Kolmogorov backward equation with multiplicative jump conditions at observation times, and this approximation produces a valid posterior SDE whose samples yield a tight evidence lower bound.

What would settle it

On a linear Gaussian SDE whose exact smoothing distribution is known in closed form, generate posterior trajectories from the learned neural score and check whether their empirical mean and covariance match the analytical Kalman smoother results to within Monte Carlo error.
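The reference side of this check can be computed exactly. A minimal sketch, assuming a scalar Ornstein–Uhlenbeck prior (an illustrative linear Gaussian SDE, not necessarily one used in the paper), Gaussian observation noise, and a time grid on which unobserved times are marked NaN; `ou_rts_smoother` is a hypothetical helper implementing the standard Kalman filter with a Rauch–Tung–Striebel backward pass. Its smoothed means `ms` and variances `Ps` are what the empirical moments of trajectories sampled from the learned posterior SDE would be compared against.

```python
import numpy as np

def ou_rts_smoother(t, y, theta, sigma, r2, m0, P0):
    """Exact Kalman filter + Rauch-Tung-Striebel smoother for a scalar OU prior.

    Prior: dX = -theta * X dt + sigma dW, X(t[0]) ~ N(m0, P0).
    Data:  y[k] = X(t[k]) + N(0, r2); y[k] = nan marks "no observation at t[k]".
    Returns smoothed means and variances on the grid t.
    """
    n = len(t)
    mp, Pp = np.empty(n), np.empty(n)   # one-step predictions
    mf, Pf = np.empty(n), np.empty(n)   # filtered moments
    A = np.ones(n)                      # A[k]: exact transition factor from t[k-1] to t[k]
    m, P = m0, P0
    for k in range(n):
        if k > 0:
            dt = t[k] - t[k - 1]
            A[k] = np.exp(-theta * dt)
            Q = sigma**2 * (1.0 - np.exp(-2.0 * theta * dt)) / (2.0 * theta)
            m, P = A[k] * m, A[k] ** 2 * P + Q
        mp[k], Pp[k] = m, P
        if not np.isnan(y[k]):          # Bayesian update only at observation times
            K = P / (P + r2)
            m, P = m + K * (y[k] - m), (1.0 - K) * P
        mf[k], Pf[k] = m, P
    ms, Ps = mf.copy(), Pf.copy()
    for k in range(n - 2, -1, -1):      # backward RTS pass
        G = Pf[k] * A[k + 1] / Pp[k + 1]
        ms[k] = mf[k] + G * (ms[k + 1] - mp[k + 1])
        Ps[k] = Pf[k] + G**2 * (Ps[k + 1] - Pp[k + 1])
    return ms, Ps

t = np.linspace(0.0, 5.0, 501)
y = np.full_like(t, np.nan)
y[[100, 300, 500]] = [0.8, -0.4, 0.5]   # three sparse, made-up observations
ms, Ps = ou_rts_smoother(t, y, theta=1.0, sigma=0.5, r2=0.05, m0=0.0, P0=0.25)
```

Agreement of the sampled mean and variance with `ms` and `Ps` on this grid, within Monte Carlo error, is the pass criterion described above.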

Figures

Figures reproduced from arXiv: 2605.05606 by Arnab Ganguly, Yu Wang.

Figure 1. Parameter inference and trajectory results for the stochastic Michaelis–Menten system.
Figure 2. Parameter inference for the 4D ring-coupled double well system.
Figure 3. Trajectory samples for the 4D double well system.
Figure 4. Comparison of smoothings.
Figure 5. Parameter inferences via our proposed method.
Figure 6. Parameter inferences via the FDM-based method.
Original abstract

Stochastic differential equations (SDEs) provide a flexible framework for modeling temporal dynamics in partially observed systems. A central task is to calibrate such models from data, which requires inferring latent trajectories and parameters from sparse, noisy observations. Classical smoothing methods for this problem are often limited by path degeneracy and poor scalability. In this work, we develop a novel method based on a characterization of the posterior SDE in terms of a conditional backward-in-time score, defined as the gradient of a function solving a Kolmogorov backward equation with multiplicative updates at observation times. We learn this conditional score using neural networks trained to satisfy both the governing PDE and the observation-induced jump conditions, thereby integrating continuous-time dynamics with discrete Bayesian updates. The resulting score induces a posterior SDE with the same diffusion coefficient but a modified drift, enabling efficient posterior trajectory sampling. We further derive a likelihood-based objective for learning the SDE parameters, yielding an evidence lower bound (ELBO) for joint state smoothing and parameter estimation. This leads to a variational EM-style procedure, in which the neural conditional score is optimized to approximate the smoothing distribution, followed by a maximization step over the SDE parameters using samples from the induced posterior. Experiments on nonlinear systems demonstrate accurate and stable inference with very few observations, and significantly improved scalability compared to classical MCMC methods.
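The ELBO mentioned in the abstract can be illustrated with Girsanov's theorem. A minimal sketch, assuming a scalar state, Gaussian observation noise with variance `r2`, a shared deterministic start `x0`, the same diffusion coefficient under prior and posterior, and Euler–Maruyama paths; `mc_elbo` is a hypothetical helper, not the authors' implementation. It uses KL(Q‖P) = ½ E_Q ∫ ((b_q − b_p)/σ)² dt for the two path measures and ELBO = E_Q[Σ_j log g(y_j | X(t_j))] − KL(Q‖P).

```python
import numpy as np

def mc_elbo(prior_drift, post_drift, sigma, obs_idx, y, r2, x0, t_grid, n_paths, seed=0):
    """Monte Carlo ELBO for a scalar SDE observed with Gaussian noise (a sketch).

    Paths are simulated from the variational posterior dX = post_drift dt + sigma dW.
    By Girsanov, for a common start x0 and a common sigma,
    KL(Q || P) = 0.5 * E_Q int ((post_drift - prior_drift) / sigma)^2 dt,
    and ELBO = E_Q[sum_j log N(y_j; X(t_j), r2)] - KL(Q || P).
    """
    rng = np.random.default_rng(seed)
    obs = dict(zip(obs_idx, y))             # grid index -> observed value
    x = np.full(n_paths, float(x0))
    loglik = np.zeros(n_paths)
    kl = np.zeros(n_paths)
    for k in range(len(t_grid)):
        if k in obs:                        # discrete Bayesian contribution at t_k
            loglik += -0.5 * np.log(2 * np.pi * r2) - (obs[k] - x) ** 2 / (2 * r2)
        if k == len(t_grid) - 1:
            break
        t, dt = t_grid[k], t_grid[k + 1] - t_grid[k]
        bq, bp, s = post_drift(t, x), prior_drift(t, x), sigma(t, x)
        kl += 0.5 * ((bq - bp) / s) ** 2 * dt   # left-point Riemann sum
        x = x + bq * dt + s * np.sqrt(dt) * rng.standard_normal(n_paths)
    return float(np.mean(loglik - kl))

# Toy check: Brownian prior, one noisy observation at T (values are illustrative).
T, r2, y_obs = 1.0, 1.0, 2.0
t_grid = np.linspace(0.0, T, 1001)
prior = lambda t, x: 0.0 * x                          # b = 0
exact_post = lambda t, x: (y_obs - x) / (T - t + r2)  # b + sigma^2 * exact score
unit = lambda t, x: 1.0
elbo_exact = mc_elbo(prior, exact_post, unit, [1000], [y_obs], r2, 0.0, t_grid, 20_000)
elbo_prior = mc_elbo(prior, prior, unit, [1000], [y_obs], r2, 0.0, t_grid, 20_000)
log_evidence = -0.5 * np.log(2 * np.pi * (T + r2)) - y_obs**2 / (2 * (T + r2))
print(elbo_exact, log_evidence, elbo_prior)   # exact score: ELBO ~ log evidence; prior drift: lower
```

Under these assumptions, if the posterior path measure is held fixed and only the prior drift b_θ depends on θ, the observation term does not depend on θ, so the maximization step reduces to minimizing ½ E_Q ∫ ((b_q − b_θ)/σ)² dt, a weighted least-squares fit of b_θ to the posterior drift along the sampled paths.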

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a variational approach to smoothing and parameter estimation for SDEs observed at sparse, noisy times. It characterizes the posterior process via a conditional score (gradient of a function solving the backward Kolmogorov PDE between observations, with multiplicative likelihood jumps at data times), approximates this score by a neural network trained on a residual loss enforcing the PDE and jumps, induces a posterior SDE with unmodified diffusion but corrected drift, and derives an ELBO for joint state-parameter learning that is optimized in a variational EM loop. Experiments on nonlinear examples claim accurate trajectories and better scalability than MCMC.

Significance. If the neural approximation to the score is sufficiently accurate, the method would supply a scalable, differentiable alternative to particle MCMC or forward-filtering backward-sampling for nonlinear SDE inference from very few observations, while preserving the exact diffusion coefficient of the prior. The explicit coupling of continuous-time PDE residuals with discrete Bayesian updates is technically distinctive and could generalize to other jump-diffusion or marked-point-process settings.

major comments (2)
  1. [§3] (score approximation and posterior SDE construction): the claim that the trained network yields a valid posterior SDE whose samples produce a tight ELBO rests on the residual loss being small enough that the induced drift error does not corrupt the variational objective. No a priori error bounds, convergence rates, or residual-to-score propagation analysis are supplied; in nonlinear systems the solution can develop sharp features between sparse observations, so residual training alone does not guarantee that the sampled trajectories remain consistent with the true smoothing distribution.
  2. [§4] (experimental validation): the reported accuracy and stability on nonlinear test systems are presented without quantitative diagnostics of approximation quality (e.g., residual norms at test points, comparison against exact score on linear-Gaussian cases, or effective sample size of the induced posterior). Without such checks it is impossible to separate genuine improvement over MCMC from cases where the network happens to fit well.
minor comments (2)
  1. [Introduction] The term 'Dynamic Neural Flow' is used without a precise definition distinguishing it from neural ODEs or continuous normalizing flows; a short clarifying paragraph would help readers.
  2. [§2] Notation for the jump operator at observation times (multiplicative update) should be introduced once with an explicit equation rather than inline text.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their detailed and constructive comments on our manuscript. We address each of the major concerns point by point below. Where appropriate, we have made revisions to strengthen the presentation of the method's limitations and the experimental validation.

  1. Referee: [§3] (score approximation and posterior SDE construction): the claim that the trained network yields a valid posterior SDE whose samples produce a tight ELBO rests on the residual loss being small enough that the induced drift error does not corrupt the variational objective. No a priori error bounds, convergence rates, or residual-to-score propagation analysis are supplied; in nonlinear systems the solution can develop sharp features between sparse observations, so residual training alone does not guarantee that the sampled trajectories remain consistent with the true smoothing distribution.

    Authors: We agree that the manuscript lacks a priori error bounds, convergence rates, or a detailed analysis of how residual errors propagate to the score approximation and subsequently to the ELBO. The method relies on the neural network minimizing the residual loss to approximate the true conditional score, which in turn defines the posterior SDE. While the variational framework ensures that the ELBO is a valid lower bound for any approximate score (as it corresponds to a valid variational distribution induced by the approximate drift), the tightness depends on the approximation quality. In the revised version, we will include a new subsection discussing these limitations, particularly the challenges in nonlinear systems with sparse data where sharp features may arise. We will also add empirical evaluations of the residual norms on test trajectories to provide quantitative evidence of approximation accuracy. However, establishing general theoretical convergence rates for arbitrary nonlinear SDEs is a significant theoretical undertaking that we consider beyond the current scope. revision: partial

  2. Referee: [§4] (experimental validation): the reported accuracy and stability on nonlinear test systems are presented without quantitative diagnostics of approximation quality (e.g., residual norms at test points, comparison against exact score on linear-Gaussian cases, or effective sample size of the induced posterior). Without such checks it is impossible to separate genuine improvement over MCMC from cases where the network happens to fit well.

    Authors: We concur that incorporating quantitative diagnostics would enhance the credibility of the experimental results. In the updated manuscript, we will augment the experimental section with: (i) plots and statistics of the residual loss evaluated at additional test points not used during training; (ii) for the linear-Gaussian SDE example (which we will add if it is not already present, or otherwise emphasize), direct comparisons of the learned score against the analytically computable exact score; and (iii) effective sample size (ESS) metrics for the trajectories sampled from the induced posterior SDE to assess the efficiency and quality of the variational approximation. These additions will allow readers to better evaluate the approximation quality and distinguish between successful fitting and true methodological advantages over MCMC. revision: yes
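For item (iii), the rebuttal does not say how ESS would be computed. One common choice, assumed here, is to importance-weight each sampled path by the ratio of the true smoothing density to the variational one; by Girsanov, its logarithm (up to a constant) is log p(y | X) − ∫ u dW − ½ ∫ ‖u‖² dt, where u is the drift correction expressed in noise units and W is the Brownian motion driving the posterior SDE. Given those log-weights, Kish's ESS is (Σw)²/Σw². A minimal sketch; `ess_from_log_weights` is a hypothetical helper.

```python
import numpy as np

def ess_from_log_weights(log_w):
    """Kish effective sample size (sum w)^2 / sum w^2, computed stably from log-weights."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())   # rescaling by a constant cancels in the ratio
    return w.sum() ** 2 / np.sum(w ** 2)

print(ess_from_log_weights(np.zeros(1000)))        # 1000.0: equal weights, no loss
print(ess_from_log_weights([0.0, -50.0, -50.0]))  # about 1.0: one path dominates
```

An ESS close to the number of sampled paths indicates that the induced posterior is close to the true smoothing distribution; a collapse toward 1 signals a poorly fitted score.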

standing simulated objections not resolved
  • Providing a priori error bounds or convergence rates for the neural approximation of the conditional score in general nonlinear SDEs.

Circularity Check

0 steps flagged

No significant circularity: standard posterior SDE characterization plus PINN-style approximation and variational ELBO


The derivation begins from the known characterization of the posterior SDE for diffusion processes (same diffusion, drift modified by the conditional score), where the score is the gradient of the solution to the backward Kolmogorov PDE with multiplicative jump conditions at observation times. Neural networks are trained via residual loss to approximate this PDE solution, which is an independent approximation step (physics-informed NN) rather than a redefinition. The ELBO is obtained by applying the standard variational inference bound to the induced posterior process for joint smoothing and parameter estimation, yielding a variational EM procedure. No equation reduces by construction to a fitted input, no load-bearing self-citation is invoked to justify uniqueness or the core identity, and the central claims remain falsifiable via approximation error on the PDE and tightness of the ELBO on held-out data.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The method rests on the existence of a well-defined conditional backward score for the posterior SDE and on the capacity of neural networks to approximate solutions to the associated Kolmogorov PDE with jumps. No new physical entities are postulated.

free parameters (1)
  • Neural network weights
    Parameters of the dynamic neural flow trained to match the score PDE and jump conditions.
axioms (1)
  • domain assumption: The posterior process admits a characterization via the gradient of a function solving the Kolmogorov backward equation with multiplicative updates at observation times.
    Invoked in the abstract as the starting point for the neural approximation.
invented entities (1)
  • Dynamic Neural Flow (no independent evidence)
    purpose: Neural network architecture that approximates the conditional backward-in-time score.
    Introduced as the trainable model for the score function.

pith-pipeline@v0.9.0 · 5532 in / 1393 out tokens · 26672 ms · 2026-05-08T05:46:29.857852+00:00 · methodology

