Variational Smoothing and Inference for SDEs from Sparse Data with Dynamic Neural Flows
Pith reviewed 2026-05-08 05:46 UTC · model grok-4.3
The pith
A neural network solves a Kolmogorov backward equation with jumps to define the posterior SDE for smoothing sparse observations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The posterior SDE for the latent process has the same diffusion coefficient as the prior but a modified drift: the prior drift plus the diffusion coefficient times the conditional backward-in-time score. This score is the gradient of the solution to the Kolmogorov backward equation subject to multiplicative jump conditions at observation times. The score function is approximated by a neural network trained to satisfy both the governing PDE and the jump conditions, which integrates the continuous-time dynamics with the discrete Bayesian updates induced by the data. The same approximation supplies a likelihood-based objective whose maximization yields an evidence lower bound on
What carries the argument
The conditional backward-in-time score, defined as the gradient of the solution to the Kolmogorov backward equation with multiplicative jump conditions at observation times, which is learned by a neural network and used to modify the drift of the posterior SDE.
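In symbols, the construction described above can be sketched as follows. The notation is assumed here for illustration and is not taken from the paper: $b$ and $\sigma$ are the prior drift and diffusion, $a = \sigma\sigma^\top$ (reading "diffusion coefficient" as $a$), $t_1 < \dots < t_K$ are the observation times with data $y_k$, $g_k(x) = p(y_k \mid x)$ is the observation likelihood, and $h$ is the function whose log-gradient plays the role of the conditional score.

```latex
\begin{aligned}
&\text{prior:} && dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \\
&\text{between observations:} && \partial_t h(t,x) + A h(t,x) = 0, \qquad
   A f = \sum_{i} b_i\,\partial_{x_i} f + \tfrac12 \sum_{i,j} a_{ij}\,\partial_{x_i}\partial_{x_j} f, \\
&\text{at each } t_k: && h(t_k^-,x) = g_k(x)\,h(t_k^+,x), \\
&\text{after the last observation:} && h(t,x) = 1 \quad \text{for } t > t_K, \\
&\text{posterior:} && dX_t = \bigl[\,b(X_t) + a(X_t)\,\nabla_x \log h(t,X_t)\,\bigr]\,dt + \sigma(X_t)\,dW_t .
\end{aligned}
```

Solving backward from the final time, each observation multiplies $h$ by its likelihood, which is the "multiplicative jump condition"; the drift correction $a\,\nabla_x \log h$ is the standard Doob $h$-transform form.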
If this is right
- Posterior trajectories can be sampled directly from the induced SDE without the path-degeneracy problems of particle-based smoothers.
- SDE parameters are learned by maximizing the evidence lower bound using Monte Carlo samples from the approximate posterior.
- Continuous-time dynamics and discrete observation updates are handled in a single training objective that scales to nonlinear systems with very few data points.
- The same neural score can be reused across multiple observation sequences once the network is trained for a given prior SDE.
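The evidence lower bound mentioned in the second bullet can be written in a standard form via Girsanov's theorem. This is a sketch under assumptions not stated in the source: the initial state $x_0$ is known, $s_\phi$ denotes the network's approximate score, $Q_\phi$ is the law of the SDE with drift $b_\theta + a_\theta s_\phi$, and $a_\theta = \sigma_\theta\sigma_\theta^\top$. The paper's exact objective may differ in detail.

```latex
\log p_\theta(y_{1:K})
\;\ge\;
\mathbb{E}_{Q_\phi}\!\Bigl[\,\sum_{k=1}^{K} \log p\bigl(y_k \mid X_{t_k}\bigr)\Bigr]
\;-\;
\frac12\,\mathbb{E}_{Q_\phi}\!\Bigl[\int_0^T s_\phi(t,X_t)^\top a_\theta(X_t)\, s_\phi(t,X_t)\,dt\Bigr].
```

The second term is the KL divergence between the approximate posterior path measure and the prior path measure; it is finite because both processes share the same diffusion coefficient. Both expectations can be estimated from trajectories sampled from the posterior SDE.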
Where Pith is reading between the lines
- The PDE-based training procedure could be extended to irregular or asynchronous observation times by adjusting the jump condition schedule accordingly.
- Replacing the neural network with other function approximators might reduce training cost while preserving the same posterior SDE guarantee.
- The framework naturally suggests a score-based generative model for imputing missing segments in time series governed by known SDEs.
Load-bearing premise
A neural network can be trained to accurately approximate the solution of the Kolmogorov backward equation with multiplicative jump conditions at observation times, and this approximation produces a valid posterior SDE whose samples yield a tight evidence lower bound.
What would settle it
On a linear Gaussian SDE whose exact smoothing distribution is known in closed form, generate posterior trajectories from the learned neural score and check whether their empirical mean and covariance match the analytical Kalman smoother results to within Monte Carlo error.
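A minimal version of this check can be sketched in Python for the simplest linear Gaussian case. Everything specific here is an assumption chosen for illustration: the prior is scaled Brownian motion `dX = sigma dW`, there is a single observation `y = X_T + N(0, r^2)`, and the closed-form score stands in for a trained network. Only the endpoint marginal is compared, where the smoother coincides with the Kalman update.

```python
import numpy as np

def exact_score(t, x, y, T, sigma, r):
    """Closed-form conditional score for dX = sigma dW observed once at time T."""
    return (y - x) / (sigma**2 * (T - t) + r**2)

def sample_posterior_endpoints(score, x0, y, T, sigma, r,
                               n_paths=20000, n_steps=200, seed=0):
    """Euler-Maruyama simulation of the posterior SDE; returns X_T for each path."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        # Prior drift is zero, so the posterior drift is sigma^2 times the score.
        drift = sigma**2 * score(t, x, y, T, sigma, r)
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return x

def kalman_endpoint(x0, y, T, sigma, r):
    """Analytical posterior mean and variance of X_T given y."""
    prior_var = sigma**2 * T
    gain = prior_var / (prior_var + r**2)
    return x0 + gain * (y - x0), (1.0 - gain) * prior_var

# Illustrative parameter values (hypothetical, not from the paper).
x0, y, T, sigma, r = 0.0, 1.0, 1.0, 1.0, 0.5
samples = sample_posterior_endpoints(exact_score, x0, y, T, sigma, r)
mean_exact, var_exact = kalman_endpoint(x0, y, T, sigma, r)
print("mean:", samples.mean(), "exact:", mean_exact)
print("var: ", samples.var(), "exact:", var_exact)
```

To test a learned score, one would pass the network in place of `exact_score` and compare the same statistics; a mismatch beyond Monte Carlo error would indicate a score approximation problem.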
Original abstract
Stochastic differential equations (SDEs) provide a flexible framework for modeling temporal dynamics in partially observed systems. A central task is to calibrate such models from data, which requires inferring latent trajectories and parameters from sparse, noisy observations. Classical smoothing methods for this problem are often limited by path degeneracy and poor scalability. In this work, we develop a novel method based on a characterization of the posterior SDE in terms of a conditional backward-in-time score, defined as the gradient of a function solving a Kolmogorov backward equation with multiplicative updates at observation times. We learn this conditional score using neural networks trained to satisfy both the governing PDE and the observation-induced jump conditions, thereby integrating continuous-time dynamics with discrete Bayesian updates. The resulting score induces a posterior SDE with the same diffusion coefficient but a modified drift, enabling efficient posterior trajectory sampling. We further derive a likelihood-based objective for learning the SDE parameters, yielding an evidence lower bound (ELBO) for joint state smoothing and parameter estimation. This leads to a variational EM-style procedure, where the neural conditional score is optimized to approximate the smoothing distribution, followed by a maximization step over the SDE parameters using samples from the induced posterior. Experiments on nonlinear systems demonstrate accurate and stable inference from very few observations, with significantly improved scalability compared to classical MCMC methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a variational approach to smoothing and parameter estimation for SDEs observed at sparse, noisy times. It characterizes the posterior process via a conditional score (gradient of a function solving the backward Kolmogorov PDE between observations, with multiplicative likelihood jumps at data times), approximates this score by a neural network trained on a residual loss enforcing the PDE and jumps, induces a posterior SDE with unmodified diffusion but corrected drift, and derives an ELBO for joint state-parameter learning that is optimized in a variational EM loop. Experiments on nonlinear examples claim accurate trajectories and better scalability than MCMC.
Significance. If the neural approximation to the score is sufficiently accurate, the method would supply a scalable, differentiable alternative to particle MCMC or forward-filtering backward-sampling for nonlinear SDE inference from very few observations, while preserving the exact diffusion coefficient of the prior. The explicit coupling of continuous-time PDE residuals with discrete Bayesian updates is technically distinctive and could generalize to other jump-diffusion or marked-point-process settings.
major comments (2)
- [§3] §3 (score approximation and posterior SDE construction): the claim that the trained network yields a valid posterior SDE whose samples produce a tight ELBO rests on the residual loss being small enough that the induced drift error does not corrupt the variational objective. No a priori error bounds, convergence rates, or residual-to-score propagation analysis is supplied; in nonlinear systems the solution can develop sharp features between sparse observations, so residual training alone does not guarantee that the sampled trajectories remain consistent with the true smoothing distribution.
- [§4] §4 (experimental validation): the reported accuracy and stability on nonlinear test systems are presented without quantitative diagnostics of approximation quality (e.g., residual norms at test points, comparison against exact score on linear-Gaussian cases, or effective sample size of the induced posterior). Without such checks it is impossible to separate genuine improvement over MCMC from cases where the network happens to fit well.
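The residual-norm diagnostic requested in the second comment could look roughly like the following sketch. It is illustrative only: it uses the closed-form solution for scaled Brownian motion with one Gaussian observation (hypothetical parameters `y`, `T`, `sigma`, `r`) in place of a trained network, and central finite differences in place of automatic differentiation.

```python
import numpy as np

def h_exact(t, x, y=1.0, T=1.0, sigma=1.0, r=0.5):
    """Exact backward-equation solution: Gaussian density of y given X_t = x."""
    s = sigma**2 * (T - t) + r**2
    return np.exp(-0.5 * (y - x)**2 / s) / np.sqrt(2.0 * np.pi * s)

def h_bad(t, x):
    """Deliberately perturbed candidate, standing in for a poorly fitted network."""
    return h_exact(t, x) * (1.0 + 0.1 * np.sin(3.0 * x))

def kbe_residual(h, t, x, sigma=1.0, eps=1e-3):
    """Residual of dh/dt + 0.5 * sigma^2 * d2h/dx2 = 0 via central differences."""
    dh_dt = (h(t + eps, x) - h(t - eps, x)) / (2.0 * eps)
    d2h_dx2 = (h(t, x + eps) - 2.0 * h(t, x) + h(t, x - eps)) / eps**2
    return dh_dt + 0.5 * sigma**2 * d2h_dx2

# Held-out test points, drawn independently of any training collocation set.
rng = np.random.default_rng(0)
t_test = rng.uniform(0.05, 0.95, size=1000)
x_test = rng.uniform(-2.0, 3.0, size=1000)

rms_exact = np.sqrt(np.mean(kbe_residual(h_exact, t_test, x_test)**2))
rms_bad = np.sqrt(np.mean(kbe_residual(h_bad, t_test, x_test)**2))
print("RMS residual, exact solution:    ", rms_exact)
print("RMS residual, perturbed candidate:", rms_bad)
```

The exact solution gives a residual at the level of finite-difference error, while the perturbed candidate does not, which is the kind of separation a reported residual norm would need to show for a trained network.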
minor comments (2)
- [Introduction] The term 'Dynamic Neural Flow' is used without a precise definition distinguishing it from neural ODEs or continuous normalizing flows; a short clarifying paragraph would help readers.
- [§2] Notation for the jump operator at observation times (multiplicative update) should be introduced once with an explicit equation rather than inline text.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive comments on our manuscript. We address each of the major concerns point by point below. Where appropriate, we have made revisions to strengthen the presentation of the method's limitations and the experimental validation.
- Referee: [§3] §3 (score approximation and posterior SDE construction): the claim that the trained network yields a valid posterior SDE whose samples produce a tight ELBO rests on the residual loss being small enough that the induced drift error does not corrupt the variational objective. No a priori error bounds, convergence rates, or residual-to-score propagation analysis is supplied; in nonlinear systems the solution can develop sharp features between sparse observations, so residual training alone does not guarantee that the sampled trajectories remain consistent with the true smoothing distribution.
Authors: We agree that the manuscript lacks a priori error bounds, convergence rates, or a detailed analysis of how residual errors propagate to the score approximation and subsequently to the ELBO. The method relies on the neural network minimizing the residual loss to approximate the true conditional score, which in turn defines the posterior SDE. While the variational framework ensures that the ELBO is a valid lower bound for any approximate score (as it corresponds to a valid variational distribution induced by the approximate drift), the tightness depends on the approximation quality. In the revised version, we will include a new subsection discussing these limitations, particularly the challenges in nonlinear systems with sparse data where sharp features may arise. We will also add empirical evaluations of the residual norms on test trajectories to provide quantitative evidence of approximation accuracy. However, establishing general theoretical convergence rates for arbitrary nonlinear SDEs is a significant theoretical undertaking that we consider beyond the current scope. revision: partial
- Referee: [§4] §4 (experimental validation): the reported accuracy and stability on nonlinear test systems are presented without quantitative diagnostics of approximation quality (e.g., residual norms at test points, comparison against exact score on linear-Gaussian cases, or effective sample size of the induced posterior). Without such checks it is impossible to separate genuine improvement over MCMC from cases where the network happens to fit well.
Authors: We concur that incorporating quantitative diagnostics would enhance the credibility of the experimental results. In the updated manuscript, we will augment the experimental section with: (i) plots and statistics of the residual loss evaluated at additional test points not used during training; (ii) for the linear-Gaussian SDE example (which we will add if not already present, or emphasize), direct comparisons of the learned score against the analytically computable exact score; and (iii) effective sample size (ESS) metrics for the trajectories sampled from the induced posterior SDE to assess the efficiency and quality of the variational approximation. These additions will allow readers to better evaluate the approximation quality and distinguish between successful fitting and true methodological advantages over MCMC. revision: yes
- Providing a priori error bounds or convergence rates for the neural approximation of the conditional score in general nonlinear SDEs.
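The effective sample size (ESS) metric promised in the rebuttal is commonly computed with the Kish formula on importance weights. The sketch below assumes that log importance weights for the sampled trajectories are already available; how those weights are defined for the induced posterior SDE is not specified in the source, and the function name is hypothetical.

```python
import numpy as np

def effective_sample_size(log_weights):
    """Kish effective sample size, (sum w)^2 / sum(w^2), from unnormalized log weights."""
    lw = np.asarray(log_weights, dtype=float)
    w = np.exp(lw - lw.max())  # subtract the max for numerical stability
    return w.sum()**2 / np.sum(w**2)

# Equal weights: every sample counts fully.
ess_uniform = effective_sample_size(np.zeros(500))
# One dominant weight: the sample set is effectively a single trajectory.
ess_degenerate = effective_sample_size(np.array([0.0] + [-50.0] * 499))
print(ess_uniform, ess_degenerate)
```

An ESS close to the number of trajectories would suggest the variational posterior is close to the target; an ESS near one would signal the kind of degeneracy the method is meant to avoid.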
Circularity Check
No significant circularity: standard posterior SDE characterization plus PINN-style approximation and variational ELBO
full rationale
The derivation begins from the known characterization of the posterior SDE for diffusion processes (same diffusion, drift modified by the conditional score), where the score is the gradient of the solution to the backward Kolmogorov PDE with multiplicative jump conditions at observation times. Neural networks are trained via residual loss to approximate this PDE solution, which is an independent approximation step (physics-informed NN) rather than a redefinition. The ELBO is obtained by applying the standard variational inference bound to the induced posterior process for joint smoothing and parameter estimation, yielding a variational EM procedure. No equation reduces by construction to a fitted input, no load-bearing self-citation is invoked to justify uniqueness or the core identity, and the central claims remain falsifiable via approximation error on the PDE and tightness of the ELBO on held-out data.
Axiom & Free-Parameter Ledger
free parameters (1)
- Neural network weights
axioms (1)
- Domain assumption: the posterior process admits a characterization via the gradient of a function solving the Kolmogorov backward equation with multiplicative updates at observation times.
invented entities (1)
- Dynamic Neural Flow: no independent evidence
A. Kolmogorov Forward and Backward Equations
The generator $A$ of a diffusion process $X(t)$ is given by
$$A f(x) = \sum_{i=1}^{d} b_i(x)\,\frac{\partial f(x)}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{d} a_{ij}(x)\,\frac{\partial^2 f(x)}{\partial x_i\,\partial x_j}, \qquad f \in C^2(\mathbb{R}^d, \mathbb{R}), \tag{18}$$
with $a(x) \stackrel{\text{def}}{=} \sigma(x)\sigma(x)^\top$. For notational convenie...