Variational Smoothing and Inference for SDEs from Sparse Data with Dynamic Neural Flows

Arnab Ganguly; Yu Wang

arXiv: 2605.05606 · v1 · submitted 2026-05-07 · stat.ML · cs.LG · math.PR

Pith reviewed 2026-05-08 05:46 UTC · model grok-4.3

keywords: stochastic differential equations · variational inference · neural networks · smoothing · sparse observations · Kolmogorov equation · posterior sampling · evidence lower bound

The pith

A neural network solves a Kolmogorov backward equation with jumps to define the posterior SDE for smoothing sparse observations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a variational method for inferring both latent state trajectories and parameters in stochastic differential equations when only sparse noisy observations are available. It characterizes the posterior dynamics through a conditional backward score that is the gradient of a function solving the Kolmogorov backward equation, with multiplicative updates applied at each observation time. Neural networks are trained to satisfy both the continuous PDE and the discrete jump conditions, yielding a posterior SDE whose drift is adjusted by the learned score while diffusion remains unchanged. This construction supports direct sampling of trajectories and supplies an evidence lower bound that is maximized in an EM-style procedure to estimate the original SDE parameters. The resulting procedure avoids path degeneracy and scales better than MCMC on nonlinear test systems with very few observations.

Core claim

The posterior SDE for the latent process is given by the same diffusion coefficient as the prior but a modified drift equal to the prior drift plus the diffusion coefficient times the conditional backward-in-time score. This score is the gradient of the solution to the Kolmogorov backward equation subject to multiplicative jump conditions at observation times. The score function is approximated by a neural network trained to satisfy both the governing PDE and the jump conditions, which integrates the continuous-time dynamics with the discrete Bayesian updates induced by the data. The same approximation supplies a likelihood-based objective whose maximization yields an evidence lower bound on

What carries the argument

The conditional backward-in-time score, defined as the gradient of the solution to the Kolmogorov backward equation with multiplicative jump conditions at observation times, which is learned by a neural network and used to modify the drift of the posterior SDE.
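
For orientation, the objects named here can be written out as follows. This is a sketch in the standard Doob h-transform form and is an assumption about notation, not a transcription of the paper: $b$ and $\sigma$ are the prior drift and diffusion coefficient, $a = \sigma\sigma^\top$, $t_k$ and $y_k$ are the observation times and values, $g_k$ is the observation likelihood, and $h$ is the backward function. The symbols $h$ and $g_k$, and the convention of taking the score to be $\nabla \log h$, are illustrative choices.

```latex
\begin{aligned}
&\partial_t h(t,x) + \sum_{i} b_i(x)\,\partial_{x_i} h(t,x)
  + \tfrac12 \sum_{i,j} a_{ij}(x)\,\partial_{x_i}\partial_{x_j} h(t,x) = 0,
  && t \in (t_{k-1}, t_k), \\
&h(t_k^-, x) = g_k(y_k \mid x)\, h(t_k^+, x),
  && \text{(multiplicative jump at } t_k\text{)} \\
&dX(t) = \big[\, b(X(t)) + a(X(t))\,\nabla \log h(t, X(t)) \,\big]\,dt
  + \sigma(X(t))\,dW(t).
  && \text{(posterior SDE)}
\end{aligned}
```

Under this reading, only the drift changes; the noise term $\sigma(X(t))\,dW(t)$ is the same as in the prior, which matches the statement that the diffusion coefficient is unchanged.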

If this is right

Posterior trajectories can be sampled directly from the induced SDE without the path-degeneracy problems of particle-based smoothers.
SDE parameters are learned by maximizing the evidence lower bound using Monte Carlo samples from the approximate posterior.
Continuous-time dynamics and discrete observation updates are handled in a single training objective that scales to nonlinear systems with very few data points.
The same neural score can be reused across multiple observation sequences once the network is trained for a given prior SDE.
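
To make the parameter-update step concrete, here is a minimal sketch of a maximization step over a single drift parameter, given sampled trajectories. Everything in it is an assumption for illustration: the model is a scalar Ornstein-Uhlenbeck process with drift `-kappa * x`, the objective is the Euler-discretized path log-likelihood (not necessarily the paper's ELBO), and the trajectories are simulated from the model itself as a stand-in for samples from a learned posterior SDE. The function names `simulate_ou_paths` and `m_step_kappa` are hypothetical.

```python
import numpy as np

def simulate_ou_paths(kappa, sigma, x0, dt, n_steps, n_paths, rng):
    """Euler-Maruyama paths of dX = -kappa * X dt + sigma dW (stand-in samples)."""
    x = np.empty((n_paths, n_steps + 1))
    x[:, 0] = x0
    for k in range(n_steps):
        noise = rng.standard_normal(n_paths)
        x[:, k + 1] = x[:, k] - kappa * x[:, k] * dt + sigma * np.sqrt(dt) * noise
    return x

def m_step_kappa(paths, dt):
    """Closed-form maximizer over kappa of the discretized path log-likelihood.

    For drift b(x) = -kappa * x and a diffusion coefficient that does not
    depend on kappa, the Euler log-likelihood is quadratic in kappa, so the
    maximizer is a least-squares ratio pooled over all sampled paths.
    """
    x_left = paths[:, :-1]          # state at the start of each time step
    dx = np.diff(paths, axis=1)     # increments over each time step
    return -np.sum(x_left * dx) / (np.sum(x_left ** 2) * dt)

rng = np.random.default_rng(0)
paths = simulate_ou_paths(kappa=1.5, sigma=1.0, x0=1.0, dt=0.01,
                          n_steps=500, n_paths=200, rng=rng)
kappa_hat = m_step_kappa(paths, dt=0.01)
print(f"estimated kappa: {kappa_hat:.3f}")
```

For nonlinear drifts the maximizer would generally not be available in closed form, and a gradient-based optimizer over the same Monte Carlo objective would take its place.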

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The PDE-based training procedure could be extended to irregular or asynchronous observation times by adjusting the jump condition schedule accordingly.
Replacing the neural network with other function approximators might reduce training cost while preserving the same posterior SDE guarantee.
The framework naturally suggests a score-based generative model for imputing missing segments in time series governed by known SDEs.

Load-bearing premise

A neural network can be trained to accurately approximate the solution of the Kolmogorov backward equation with multiplicative jump conditions at observation times, and this approximation produces a valid posterior SDE whose samples yield a tight evidence lower bound.

What would settle it

On a linear Gaussian SDE whose exact smoothing distribution is known in closed form, generate posterior trajectories from the learned neural score and check whether their empirical mean and covariance match the analytical Kalman smoother results to within Monte Carlo error.
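
A reduced version of this check can be sketched on the simplest linear Gaussian case. The setup below is an assumption chosen for illustration: the prior is a scalar Brownian motion started at zero, there is a single noisy observation `y` at the final time `T` with noise standard deviation `r`, and the exact score (available in closed form for this toy model) stands in for the trained network. The comparison is made only at the terminal time, using the conjugate Gaussian update rather than a full Kalman smoother. The function name `sample_posterior_terminal` is hypothetical.

```python
import numpy as np

def sample_posterior_terminal(y, sigma, r, T, n_steps, n_paths, rng):
    """Euler-Maruyama samples of X(T) under the score-corrected SDE."""
    dt = T / n_steps
    x = np.zeros(n_paths)  # prior starts at X(0) = 0
    for k in range(n_steps):
        t = k * dt
        # Exact score for this toy model: grad_x log h(t, x), where
        # h(t, x) is the Gaussian density of y given X(t) = x.
        score = (y - x) / (sigma**2 * (T - t) + r**2)
        drift = sigma**2 * score  # prior drift is zero; a = sigma^2
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return x

y, sigma, r, T = 1.5, 1.0, 0.5, 1.0
rng = np.random.default_rng(0)
x_T = sample_posterior_terminal(y, sigma, r, T, n_steps=200, n_paths=20000, rng=rng)

# Closed-form Gaussian posterior of X(T) given y (conjugate update)
prior_var = sigma**2 * T
exact_mean = y * prior_var / (prior_var + r**2)
exact_var = prior_var * r**2 / (prior_var + r**2)
print("mean: sampled", x_T.mean(), "exact", exact_mean)
print("var:  sampled", x_T.var(), "exact", exact_var)
```

Replacing the closed-form `score` with a learned network's output would turn this into the test described above; a mismatch beyond Monte Carlo and time-discretization error would point to score approximation error.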

Figures

Figures reproduced from arXiv: 2605.05606 by Arnab Ganguly, Yu Wang.

**Figure 1.** Parameter inference and trajectory results for the stochastic Michaelis–Menten system.

**Figure 2.** Parameter inference for the 4D ring-coupled double well system.

**Figure 3.** Trajectory samples for the 4D double well system.

**Figure 4.** Comparison of smoothings.

**Figure 5.** Parameter inferences via our proposed method.

**Figure 6.** Parameter inferences via FDM-based method.

Abstract

Stochastic differential equations (SDEs) provide a flexible framework for modeling temporal dynamics in partially observed systems. A central task is to calibrate such models from data, which requires inferring latent trajectories and parameters from sparse, noisy observations. Classical smoothing methods for this problem are often limited by path degeneracy and poor scalability. In this work, we develop a novel method based on a characterization of the posterior SDE in terms of a conditional backward-in-time score, defined as the gradient of a function solving a Kolmogorov backward equation with multiplicative updates at observation times. We learn this conditional score using neural networks trained to satisfy both the governing PDE and the observation-induced jump conditions, thereby integrating continuous-time dynamics with discrete Bayesian updates. The resulting score induces a posterior SDE with the same diffusion coefficient but a modified drift, enabling efficient posterior trajectory sampling. We further derive a likelihood-based objective for learning the SDE parameters, yielding an evidence lower bound (ELBO) for joint state smoothing and parameter estimation. This leads to a variational EM-style procedure, where the neural conditional score is optimized to approximate the smoothing distribution, followed by a maximization step over the SDE parameters using samples from the induced posterior. Experiments on nonlinear systems demonstrate accurate and stable inference from very few observations and significantly improved scalability compared to classical MCMC methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Neural approximation of the backward Kolmogorov score with jumps gives a clean variational route to SDE posterior sampling, but the lack of error bounds on the residual training leaves the validity of the induced drift unclear.

The paper's core move is to characterize the smoothing distribution via the gradient of a function that solves the backward Kolmogorov equation between observations and receives multiplicative likelihood jumps at the data times. A neural net is trained on a residual loss to satisfy both the PDE and the jumps; the resulting score modifies only the drift of the original SDE, so posterior trajectories can be sampled by integrating the adjusted process forward. They then close the loop with a likelihood-based ELBO that supports joint state and parameter estimation in a variational EM loop.

That combination of PDE residual training with jump conditions and the induced posterior SDE is the concrete new piece relative to standard particle or score-based smoothing work. It directly targets the scalability bottleneck that MCMC faces on sparse nonlinear trajectories, and the abstract claims stable results with very few observations, which would be practically useful if it holds. The experiments are presented as demonstrating improved scaling, so the method at least reaches the stage where someone can test it on their own SDE models.

The soft spot is exactly where the stress-test note lands: residual training gives no a priori guarantee that the network stays close to the true solution, especially when the backward function develops sharp features between distant observations. Any leftover PDE error feeds straight into the drift and can bias the sampled paths, which in turn loosens the ELBO and undermines the parameter updates. The abstract does not report quantitative checks on approximation error, posterior coverage, or how much the ELBO gap shrinks with network size, so it is still possible the method works mainly on the examples shown and degrades elsewhere.

This is aimed at people who already work with latent SDEs in physics or biology and need something faster than particle MCMC for calibration. A reader who knows the backward smoothing literature will see the technical step clearly and can judge whether the neural PDE solver is worth trying. It deserves a serious referee because the procedure is well-motivated, the variational construction is internally consistent, and the practical payoff is real if the numerics can be made reliable; the paper is not ready as-is but is worth the review time to sort out the approximation quality.

Referee Report

2 major / 2 minor

Summary. The paper proposes a variational approach to smoothing and parameter estimation for SDEs observed at sparse, noisy times. It characterizes the posterior process via a conditional score (gradient of a function solving the backward Kolmogorov PDE between observations, with multiplicative likelihood jumps at data times), approximates this score by a neural network trained on a residual loss enforcing the PDE and jumps, induces a posterior SDE with unmodified diffusion but corrected drift, and derives an ELBO for joint state-parameter learning that is optimized in a variational EM loop. Experiments on nonlinear examples claim accurate trajectories and better scalability than MCMC.

Significance. If the neural approximation to the score is sufficiently accurate, the method would supply a scalable, differentiable alternative to particle MCMC or forward-filtering backward-sampling for nonlinear SDE inference from very few observations, while preserving the exact diffusion coefficient of the prior. The explicit coupling of continuous-time PDE residuals with discrete Bayesian updates is technically distinctive and could generalize to other jump-diffusion or marked-point-process settings.

major comments (2)

[§3] §3 (score approximation and posterior SDE construction): the claim that the trained network yields a valid posterior SDE whose samples produce a tight ELBO rests on the residual loss being small enough that the induced drift error does not corrupt the variational objective. No a priori error bounds, convergence rates, or residual-to-score propagation analysis is supplied; in nonlinear systems the solution can develop sharp features between sparse observations, so residual training alone does not guarantee that the sampled trajectories remain consistent with the true smoothing distribution.
[§4] §4 (experimental validation): the reported accuracy and stability on nonlinear test systems are presented without quantitative diagnostics of approximation quality (e.g., residual norms at test points, comparison against exact score on linear-Gaussian cases, or effective sample size of the induced posterior). Without such checks it is impossible to separate genuine improvement over MCMC from cases where the network happens to fit well.
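
The "residual norms at test points" diagnostic can be illustrated with a small sketch. The assumptions are loud ones: the model is scalar Brownian motion with one Gaussian observation at time `T`, for which the backward function is known in closed form, and derivatives are taken by central finite differences rather than automatic differentiation. The closed-form `h_exact` stands in for a trained network, and the names `h_exact` and `kbe_residual` are hypothetical.

```python
import numpy as np

def h_exact(t, x, y=1.5, sigma=1.0, r=0.5, T=1.0):
    """Backward function for Brownian motion with one noisy observation at T."""
    s = sigma**2 * (T - t) + r**2  # variance of y given X(t) = x
    return np.exp(-((y - x) ** 2) / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)

def kbe_residual(h, t, x, sigma=1.0, eps=1e-3):
    """Central finite-difference estimate of d_t h + 0.5 * sigma^2 * d_xx h."""
    dh_dt = (h(t + eps, x) - h(t - eps, x)) / (2.0 * eps)
    d2h_dx2 = (h(t, x + eps) - 2.0 * h(t, x) + h(t, x - eps)) / eps**2
    return dh_dt + 0.5 * sigma**2 * d2h_dx2

# Held-out test points drawn uniformly in time and space
rng = np.random.default_rng(0)
t_test = rng.uniform(0.05, 0.9, size=1000)
x_test = rng.uniform(-3.0, 3.0, size=1000)
residuals = kbe_residual(h_exact, t_test, x_test)
print("max |residual|:", np.abs(residuals).max())
print("RMS residual:", np.sqrt(np.mean(residuals**2)))
```

Because `h_exact` solves the equation exactly, the printed values reflect only finite-difference error. Passing a trained approximation in place of `h_exact` would give the kind of residual statistic the comment asks for, though a small residual alone does not bound the error in the score.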

minor comments (2)

[Introduction] The term 'Dynamic Neural Flow' is used without a precise definition distinguishing it from neural ODEs or continuous normalizing flows; a short clarifying paragraph would help readers.
[§2] Notation for the jump operator at observation times (multiplicative update) should be introduced once with an explicit equation rather than inline text.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their detailed and constructive comments on our manuscript. We address each of the major concerns point by point below. Where appropriate, we have made revisions to strengthen the presentation of the method's limitations and the experimental validation.

Referee: [§3] §3 (score approximation and posterior SDE construction): the claim that the trained network yields a valid posterior SDE whose samples produce a tight ELBO rests on the residual loss being small enough that the induced drift error does not corrupt the variational objective. No a priori error bounds, convergence rates, or residual-to-score propagation analysis is supplied; in nonlinear systems the solution can develop sharp features between sparse observations, so residual training alone does not guarantee that the sampled trajectories remain consistent with the true smoothing distribution.

Authors: We agree that the manuscript lacks a priori error bounds, convergence rates, or a detailed analysis of how residual errors propagate to the score approximation and subsequently to the ELBO. The method relies on the neural network minimizing the residual loss to approximate the true conditional score, which in turn defines the posterior SDE. While the variational framework ensures that the ELBO is a valid lower bound for any approximate score (as it corresponds to a valid variational distribution induced by the approximate drift), the tightness depends on the approximation quality. In the revised version, we will include a new subsection discussing these limitations, particularly the challenges in nonlinear systems with sparse data where sharp features may arise. We will also add empirical evaluations of the residual norms on test trajectories to provide quantitative evidence of approximation accuracy. However, establishing general theoretical convergence rates for arbitrary nonlinear SDEs is a significant theoretical undertaking that we consider beyond the current scope.
revision: partial

Referee: [§4] §4 (experimental validation): the reported accuracy and stability on nonlinear test systems are presented without quantitative diagnostics of approximation quality (e.g., residual norms at test points, comparison against exact score on linear-Gaussian cases, or effective sample size of the induced posterior). Without such checks it is impossible to separate genuine improvement over MCMC from cases where the network happens to fit well.

Authors: We concur that incorporating quantitative diagnostics would enhance the credibility of the experimental results. In the updated manuscript, we will augment the experimental section with: (i) plots and statistics of the residual loss evaluated at additional test points not used during training; (ii) for the linear-Gaussian SDE example (which we will add if not already present, or emphasize), direct comparisons of the learned score against the analytically computable exact score; and (iii) effective sample size (ESS) metrics for the trajectories sampled from the induced posterior SDE to assess the efficiency and quality of the variational approximation. These additions will allow readers to better evaluate the approximation quality and distinguish between successful fitting and true methodological advantages over MCMC.
revision: yes
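
The response does not say how ESS would be computed. One common choice, assumed here purely for illustration, is the Kish effective sample size of importance weights attached to the sampled trajectories (for example, weights correcting the variational path measure toward the true posterior). The helper name `effective_sample_size` is hypothetical.

```python
import numpy as np

def effective_sample_size(log_weights):
    """Kish effective sample size from unnormalized log importance weights."""
    log_weights = np.asarray(log_weights, dtype=float)
    # Subtract the maximum before exponentiating to avoid overflow;
    # the ESS ratio is invariant to this common rescaling.
    w = np.exp(log_weights - log_weights.max())
    return w.sum() ** 2 / np.sum(w ** 2)

# Equal weights: every sample counts fully.
equal = effective_sample_size(np.zeros(100))
# One dominant weight: the sample set behaves like a single draw.
skewed = effective_sample_size(np.array([0.0] + [-50.0] * 99))
print(equal, skewed)
```

An ESS close to the number of sampled paths would suggest the variational posterior is close to the target in the directions the weights can detect; an ESS near one would indicate the kind of weight collapse the method is meant to avoid.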

standing simulated objections not resolved

Providing a priori error bounds or convergence rates for the neural approximation of the conditional score in general nonlinear SDEs.

Circularity Check

0 steps flagged

No significant circularity: standard posterior SDE characterization plus PINN-style approximation and variational ELBO

The derivation begins from the known characterization of the posterior SDE for diffusion processes (same diffusion, drift modified by the conditional score), where the score is the gradient of the solution to the backward Kolmogorov PDE with multiplicative jump conditions at observation times. Neural networks are trained via residual loss to approximate this PDE solution, which is an independent approximation step (physics-informed NN) rather than a redefinition. The ELBO is obtained by applying the standard variational inference bound to the induced posterior process for joint smoothing and parameter estimation, yielding a variational EM procedure. No equation reduces by construction to a fitted input, no load-bearing self-citation is invoked to justify uniqueness or the core identity, and the central claims remain falsifiable via approximation error on the PDE and tightness of the ELBO on held-out data.
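
The standard bound this rationale refers to can be written as follows. This is a sketch under stated assumptions rather than the paper's own equation: $Q$ is the path law of the approximate posterior SDE, $P_\theta$ is the path law of the prior SDE with parameters $\theta$, both share the same initial law and diffusion coefficient $\sigma$, $g_k$ is the observation likelihood at time $t_k$, and $u$ is the control defined by $\sigma u = b_Q - b_\theta$, the difference between the two drifts. The symbols $Q$, $P_\theta$, $g_k$, and $u$ are illustrative.

```latex
\begin{aligned}
\log p_\theta(y_{1:K})
  &\;\ge\; \mathbb{E}_{Q}\Big[\sum_{k=1}^{K} \log g_k\big(y_k \mid X(t_k)\big)\Big]
     \;-\; \mathrm{KL}\big(Q \,\|\, P_\theta\big), \\
\mathrm{KL}\big(Q \,\|\, P_\theta\big)
  &\;=\; \tfrac12\, \mathbb{E}_{Q}\Big[\int_0^T \big\|u\big(t, X(t)\big)\big\|^2 \, dt\Big].
\end{aligned}
```

The second line is the Girsanov expression for the divergence between two diffusions with a common diffusion coefficient. With a score-corrected drift of the form $b_\theta + \sigma\sigma^\top s$, the control is $u = \sigma^\top s$, so any error in the learned score $s$ enters the bound through this quadratic term.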

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The method rests on the existence of a well-defined conditional backward score for the posterior SDE and on the capacity of neural networks to approximate solutions to the associated Kolmogorov PDE with jumps. No new physical entities are postulated.

free parameters (1)

Neural network weights
Parameters of the dynamic neural flow trained to match the score PDE and jump conditions.

axioms (1)

Domain assumption: The posterior process admits a characterization via the gradient of a function solving the Kolmogorov backward equation with multiplicative updates at observation times.
Invoked in the abstract as the starting point for the neural approximation.

invented entities (1)

Dynamic Neural Flow (no independent evidence)
purpose: Neural network architecture that approximates the conditional backward-in-time score.
Introduced as the trainable model for the score function.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages

[1] Yacine Aït-Sahalia. Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach. Econometrica, 70(1):223–262, 2002.

[2] Yacine Aït-Sahalia. Closed-form likelihood expansions for multivariate diffusions. Ann. Statist., 36(2):906–937, 2008.

[3] Cédric Archambeau and Manfred Opper. Approximate inference for continuous-time Markov processes. In Bayesian time series models, pages 125–140. Cambridge Univ. Press, Cambridge, 2011.

[4] Alexandros Beskos, Omiros Papaspiliopoulos, Gareth O. Roberts, and Paul Fearnhead. Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes. J. R. Stat. Soc. Ser. B Stat. Methodol., 68(3):333–382, 2006. With discussions and a reply by the authors.

[5] Alexandros Beskos, Gareth Roberts, Andrew Stuart, and Jochen Voss. MCMC methods for diffusion bridges. Stoch. Dyn., 8(3):319–350, 2008.

[6] Jaya P. N. Bishwal. Parameter estimation in stochastic volatility models. Springer, Cham, 2022.

[7] Mogens Bladt and Michael Sørensen. Simple simulation of diffusion bridges with application to likelihood inference for diffusions. Bernoulli, 20(2):645–675, 2014.

[8] Hanqun Cao, Cheng Tan, Zhangyang Gao, Yilun Xu, Guangyong Chen, Pheng-Ann Heng, and Stan Z Li. A survey on generative diffusion models. IEEE transactions on knowledge and data engineering, 36(7):2814–2830, 2024.

[9] Jinyuan Chang and Song Xi Chen. On the approximate maximum likelihood estimation for diffusion processes. Ann. Statist., 39(6):2820–2851, 2011.

[10] Xiaoli Chen, Liu Yang, Jinqiao Duan, and George Em Karniadakis. Solving inverse stochastic problems from discrete particle observations using the Fokker–Planck equation and physics-informed neural networks. SIAM Journal on Scientific Computing, 43(3):B811–B830, 2021.

[11] Botond Cseke, Manfred Opper, and Guido Sanguinetti. Approximate inference in latent Gaussian-Markov models from continuous time observations. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 971–979. Curran Associates, Inc., 2013.

[12] Bernard Delyon and Ying Hu. Simulation of conditioned diffusion and application to parameter estimation. Stochastic Process. Appl., 116(11):1660–1675, 2006.

[13] Ola Elerian, Siddhartha Chib, and Neil Shephard. Likelihood inference for discretely observed nonlinear diffusions. Econometrica, 69(4):959–993, 2001.

[14] Paul Fearnhead, Omiros Papaspiliopoulos, and Gareth O. Roberts. Particle filters for partially observed diffusions. J. R. Stat. Soc. Ser. B Stat. Methodol., 70(4):755–777, 2008.

[15] Arnab Ganguly, Riten Mitra, and Jinpu Zhou. Infinite-dimensional optimization and Bayesian nonparametric learning of stochastic differential equations. J. Mach. Learn. Res., 24:Paper No. [159], 39, 2023.

[16] Arnab Ganguly, Riten Mitra, and Jinpu Zhou. Nonparametric learning of stochastic differential equations from sparse and noisy data, 2025.

[17] A. Golightly and D. J. Wilkinson. Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics, 61(3):781–788, 2005.

[18] A. Golightly and D. J. Wilkinson. Bayesian inference for nonlinear multivariate diffusion models observed with error. Comput. Statist. Data Anal., 52(3):1674–1693, 2008.

[19] Andrew Golightly and Darren J. Wilkinson. Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo. Interface Focus, 1(6):807–820, 2011.

[20] Stefano M. Iacus. Simulation and inference for stochastic differential equations. Springer Series in Statistics. Springer, New York, 2008. With R examples.

[21] Mathieu Kessler. Estimation of an ergodic diffusion from discrete observations. Scand. J. Statist., 24(2):211–229, 1997.

[22] Yury A. Kutoyants. Statistical inference for ergodic diffusion processes. Springer Series in Statistics. Springer-Verlag London, Ltd., London, 2004.

[23] Chenxu Li. Maximum-likelihood estimation for diffusion processes via closed-form density expansions. Ann. Statist., 41(3):1350–1380, 2013.

[24] Ming Lin, Rong Chen, and Per Mykland. On generating Monte Carlo samples of continuous diffusion bridges. J. Amer. Statist. Assoc., 105(490):820–838, 2010.

[25] Charles C. Margossian, Loucas Pillaud-Vivien, and Lawrence K. Saul. Variational inference for uncertainty quantification: an analysis of trade-offs. J. Mach. Learn. Res., 26:1–41, 2025.

[26] Chirag Modi, Charles C. Margossian, Yuling Yao, Robert M. Gower, David M. Blei, and Lawrence K. Saul. Variational inference with Gaussian score matching. In Advances in Neural Information Processing Systems (NeurIPS) 36, pages 29935–29950, 2023.

[27] Manfred Opper and Cédric Archambeau. The variational Gaussian approximation revisited. Neural Comput., 21(3):786–792, 2009.

[28] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.

[29] G. O. Roberts and O. Stramer. On inference for partially observed nonlinear diffusion models using the Metropolis-Hastings algorithm. Biometrika, 88(3):603–621, 2001.

[30] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.

[31] Tobias Sutter, Arnab Ganguly, and Heinz Koeppl. A variational approach to path estimation and parameter inference of hidden diffusion processes. J. Mach. Learn. Res., 17:Paper No. 190, 37, 2016.

[32] Belinda Tzen and Maxim Raginsky. Neural stochastic differential equations: Deep latent Gaussian models in the diffusion limit. arXiv preprint arXiv:1905.09883, 2019.

[33] Belinda Tzen and Maxim Raginsky. Theoretical guarantees for sampling and inference in generative models with latent diffusions. In Conference on Learning Theory, pages 3084–3114. PMLR, 2019.

[34] Gavin A. Whitaker, Andrew Golightly, Richard J. Boys, and Chris Sherlock. Bayesian inference for diffusion-driven mixed-effects models. Bayesian Anal., 12(2):435–463, 2017.

[35] Gavin A. Whitaker, Andrew Golightly, Richard J. Boys, and Chris Sherlock. Improved bridge constructs for stochastic differential equations. Stat. Comput., 27(4):885–900, 2017.

[36] Nakahiro Yoshida. Estimation for diffusion processes from discrete observation. J. Multivariate Anal., 41(2):220–242, 1992.

A Kolmogorov Forward and Backward Equations

The generator $A$ of a diffusion process $X(t)$ is given by

$$A f(x) = \sum_{i=1}^{d} b_i(x)\,\frac{\partial f(x)}{\partial x_i} + \frac{1}{2} \sum_{i,j=1}^{d} a_{ij}(x)\,\frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \qquad f \in C^2(\mathbb{R}^d, \mathbb{R}), \tag{18}$$

with $a(x) \overset{\text{def}}{=} \sigma(x)\sigma(x)^\top$. For notational convenie...

$\int_0^T \|u(X_1(t))\|^2\,dt\,\Big]$. (21) In particular, when the initial distributions coincide, the KL divergence reduces to $\mathrm{KL}(\Pi_1 \,\|\, \Pi_0) = \tfrac{1}{2}\,\mathbb{E}_{\Pi_1}$

[1] [1]

Maximum likelihood estimation of discretely sampled diffusions: a closed- form approximation approach.Econometrica, 70(1):223–262, 2002

Yacine Aït-Sahalia. Maximum likelihood estimation of discretely sampled diffusions: a closed- form approximation approach.Econometrica, 70(1):223–262, 2002

work page 2002

[2] [2]

Closed-form likelihood expansions for multivariate diffusions.Ann

Yacine Aït-Sahalia. Closed-form likelihood expansions for multivariate diffusions.Ann. Statist., 36(2):906–937, 2008

work page 2008

[3] [3]

Approximate inference for continuous-time Markov processes

Cédric Archambeau and Manfred Opper. Approximate inference for continuous-time Markov processes. InBayesian time series models, pages 125–140. Cambridge Univ. Press, Cambridge, 2011

work page 2011

[4] [4]

Roberts, and Paul Fearnhead

Alexandros Beskos, Omiros Papaspiliopoulos, Gareth O. Roberts, and Paul Fearnhead. Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes.J. R. Stat. Soc. Ser. B Stat. Methodol., 68(3):333–382, 2006. With discussions and a reply by the authors

work page 2006

[5] [5]

MCMC methods for diffusion bridges.Stoch

Alexandros Beskos, Gareth Roberts, Andrew Stuart, and Jochen V oss. MCMC methods for diffusion bridges.Stoch. Dyn., 8(3):319–350, 2008

work page 2008

[6] [6]

Jaya P. N. Bishwal.Parameter estimation in stochastic volatility models. Springer, Cham, 2022

work page 2022

[7] [7]

Simple simulation of diffusion bridges with application to likelihood inference for diffusions.Bernoulli, 20(2):645–675, 2014

Mogens Bladt and Michael Sø rensen. Simple simulation of diffusion bridges with application to likelihood inference for diffusions.Bernoulli, 20(2):645–675, 2014

work page 2014

[8] [8]

A survey on generative diffusion models.IEEE transactions on knowledge and data engineering, 36(7):2814–2830, 2024

Hanqun Cao, Cheng Tan, Zhangyang Gao, Yilun Xu, Guangyong Chen, Pheng-Ann Heng, and Stan Z Li. A survey on generative diffusion models.IEEE transactions on knowledge and data engineering, 36(7):2814–2830, 2024

work page 2024

[9] [9]

On the approximate maximum likelihood estimation for diffusion processes.Ann

Jinyuan Chang and Song Xi Chen. On the approximate maximum likelihood estimation for diffusion processes.Ann. Statist., 39(6):2820–2851, 2011

work page 2011

[10] [10]

Xiaoli Chen, Liu Yang, Jinqiao Duan, and George Em Karniadakis. Solving inverse stochastic problems from discrete particle observations using the fokker–planck equation and physics- informed neural networks.SIAM Journal on Scientific Computing, 43(3):B811–B830, 2021

work page 2021

[11] [11]

Approximate inference in latent gaussian-markov models from continuous time observations

Botond Cseke, Manfred Opper, and Guido Sanguinetti. Approximate inference in latent gaussian-markov models from continuous time observations. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors,Advances in Neural Information Processing Systems 26, pages 971–979. Curran Associates, Inc., 2013

work page 2013

[12] [12]

Simulation of conditioned diffusion and application to parameter estimation.Stochastic Process

Bernard Delyon and Ying Hu. Simulation of conditioned diffusion and application to parameter estimation.Stochastic Process. Appl., 116(11):1660–1675, 2006

work page 2006

[13] [13]

Likelihood inference for discretely observed nonlinear diffusions.Econometrica, 69(4):959–993, 2001

Ola Elerian, Siddhartha Chib, and Neil Shephard. Likelihood inference for discretely observed nonlinear diffusions.Econometrica, 69(4):959–993, 2001

work page 2001

[14] [14]

Paul Fearnhead, Omiros Papaspiliopoulos, and Gareth O. Roberts. Particle filters for partially observed diffusions.J. R. Stat. Soc. Ser. B Stat. Methodol., 70(4):755–777, 2008

work page 2008

[15] Arnab Ganguly, Riten Mitra, and Jinpu Zhou. Infinite-dimensional optimization and Bayesian nonparametric learning of stochastic differential equations. J. Mach. Learn. Res., 24:Paper No. 159, 39, 2023

[16] Arnab Ganguly, Riten Mitra, and Jinpu Zhou. Nonparametric learning of stochastic differential equations from sparse and noisy data, 2025

[17] A. Golightly and D. J. Wilkinson. Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics, 61(3):781–788, 2005

[18] A. Golightly and D. J. Wilkinson. Bayesian inference for nonlinear multivariate diffusion models observed with error. Comput. Statist. Data Anal., 52(3):1674–1693, 2008

[19] Andrew Golightly and Darren J. Wilkinson. Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo. Interface Focus, 1(6):807–820, 2011

[20] Stefano M. Iacus. Simulation and inference for stochastic differential equations. Springer Series in Statistics. Springer, New York, 2008. With R examples

[21] Mathieu Kessler. Estimation of an ergodic diffusion from discrete observations. Scand. J. Statist., 24(2):211–229, 1997

[22] Yury A. Kutoyants. Statistical inference for ergodic diffusion processes. Springer Series in Statistics. Springer-Verlag London, Ltd., London, 2004

[23] Chenxu Li. Maximum-likelihood estimation for diffusion processes via closed-form density expansions. Ann. Statist., 41(3):1350–1380, 2013

[24] Ming Lin, Rong Chen, and Per Mykland. On generating Monte Carlo samples of continuous diffusion bridges. J. Amer. Statist. Assoc., 105(490):820–838, 2010

[25] Charles C. Margossian, Loucas Pillaud-Vivien, and Lawrence K. Saul. Variational inference for uncertainty quantification: an analysis of trade-offs. J. Mach. Learn. Res., 26:1–41, 2025

[26] Chirag Modi, Charles C. Margossian, Yuling Yao, Robert M. Gower, David M. Blei, and Lawrence K. Saul. Variational inference with Gaussian score matching. In Advances in Neural Information Processing Systems (NeurIPS) 36, pages 29935–29950, 2023

[27] Manfred Opper and Cédric Archambeau. The variational Gaussian approximation revisited. Neural Comput., 21(3):786–792, 2009

[28] Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019

[29] G. O. Roberts and O. Stramer. On inference for partially observed nonlinear diffusion models using the Metropolis-Hastings algorithm. Biometrika, 88(3):603–621, 2001

[30] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021

[31] Tobias Sutter, Arnab Ganguly, and Heinz Koeppl. A variational approach to path estimation and parameter inference of hidden diffusion processes. J. Mach. Learn. Res., 17:Paper No. 190, 37, 2016

[32] Belinda Tzen and Maxim Raginsky. Neural stochastic differential equations: Deep latent Gaussian models in the diffusion limit. arXiv preprint arXiv:1905.09883, 2019

[33] Belinda Tzen and Maxim Raginsky. Theoretical guarantees for sampling and inference in generative models with latent diffusions. In Conference on Learning Theory, pages 3084–3114. PMLR, 2019

[34] Gavin A. Whitaker, Andrew Golightly, Richard J. Boys, and Chris Sherlock. Bayesian inference for diffusion-driven mixed-effects models. Bayesian Anal., 12(2):435–463, 2017

[35] Gavin A. Whitaker, Andrew Golightly, Richard J. Boys, and Chris Sherlock. Improved bridge constructs for stochastic differential equations. Stat. Comput., 27(4):885–900, 2017

[36] Nakahiro Yoshida. Estimation for diffusion processes from discrete observation. J. Multivariate Anal., 41(2):220–242, 1992

A Kolmogorov Forward and Backward Equations

The generator $A$ of a diffusion process $X(t)$ is given by

\[
A f(x) = \sum_{i=1}^{d} b_i(x)\,\frac{\partial f(x)}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{d} a_{ij}(x)\,\frac{\partial^2 f(x)}{\partial x_i\,\partial x_j}, \qquad f \in C^2(\mathbb{R}^d, \mathbb{R}), \tag{18}
\]

with $a(x) \overset{\text{def}}{=} \sigma(x)\sigma(x)^{\top}$. For notational convenie...