High-dimensional Bayesian filtering through deep density approximation

Filip Rydin; Kasper Bågmark

arXiv: 2511.07261 · v2 · submitted 2025-11-10 · 🧮 math.NA · cs.NA · stat.CO · stat.ML

Pith reviewed 2026-05-17 23:28 UTC · model grok-4.3

keywords: Bayesian filtering, deep neural networks, stochastic differential equations, Fokker-Planck equation, Lorenz-96 model, particle filters, density approximation, high-dimensional systems

The pith

In a 100-dimensional Lorenz-96 model the logarithmic deep backward SDE filter produces reliable density estimates where particle-based methods fail and reduces inference time by two to five orders of magnitude.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper benchmarks deep neural approximations to filtering densities for nonlinear stochastic differential equations observed at discrete times. The methods solve associated Fokker-Planck or backward SDE problems via Feynman-Kac formulas and neural networks, with logarithmic versions added for stability and positivity. Low-dimensional tests show particle filters performing well, but scaling to a partially observed 100-dimensional Lorenz-96 system causes particle and ensemble Kalman filters to break down. The logarithmic deep backward SDE filter maintains performance and achieves large computational savings.
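The setting described above, a nonlinear SDE observed at discrete times, can be made concrete with a minimal simulation sketch. It assumes a stochastic Lorenz-96 model with additive noise, forcing `F = 8`, and a linear observation of every fourth coordinate with Gaussian noise. The step size, noise levels, and function names are illustrative choices, not values taken from the paper.

```python
import numpy as np

def lorenz96_drift(x, forcing=8.0):
    # dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, with cyclic indices
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def simulate(d=100, n_obs=5, steps_per_obs=50, dt=0.002,
             sigma=0.5, obs_std=1.0, stride=4, seed=0):
    """Euler-Maruyama path with noisy partial observations (all values assumed)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=d)
    states, observations = [], []
    for _ in range(n_obs):
        for _ in range(steps_per_obs):
            # One Euler-Maruyama step: drift * dt + sigma * sqrt(dt) * N(0, I)
            x = x + lorenz96_drift(x) * dt + sigma * np.sqrt(dt) * rng.normal(size=d)
        observed = x[::stride]  # every fourth coordinate (assumed observation model)
        states.append(x.copy())
        observations.append(observed + obs_std * rng.normal(size=observed.shape))
    return np.array(states), np.array(observations)

states, observations = simulate()
print(states.shape, observations.shape)
```

With `d = 100` and `stride = 4` the observation vector has 25 entries, which matches the partially observed 100-dimensional configuration mentioned in the figure captions.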

Core claim

The paper establishes that the logarithmic deep backward stochastic differential equation filter, built from Feynman-Kac representations, Euler-Maruyama discretizations, and neural network solvers, delivers accurate filtering density approximations in high dimensions where classical particle filters suffer from degeneracy, while also providing inference times reduced by roughly two to five orders of magnitude.

What carries the argument

The logarithmic deep backward SDE filter that approximates the solution to the backward stochastic differential equation for the filtering density using neural networks.
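The logarithmic formulation can be illustrated with a short derivation. This is a sketch under stated assumptions, not the paper's own proof: it assumes the Fokker-Planck equation is written as $\partial_t p = A p + f(x, p, \nabla p)$, where $A$ is a second-order operator with diffusion coefficient $\sigma$ and drift $\mu$, and $f(x,u,w)$ is linear in $(u,w)$. The symbol $\mu$ and this exact splitting are assumptions. The resulting nonlinearity agrees with the expression $f_{\log}(x,u,w) = -\tfrac12\|\sigma^\top w\|^2 - f(x,1,-w)$ quoted in the Lean section of this page.

```latex
% Assumed form (not quoted from the paper):
%   \partial_t p = A p + f(x, p, \nabla p),
%   A\varphi = \tfrac12 \operatorname{Tr}(\sigma\sigma^\top \nabla^2 \varphi) + \mu \cdot \nabla\varphi,
%   f(x, u, w) linear in (u, w).
\begin{align*}
p &= e^{-v}, \qquad
\partial_t p = -p\,\partial_t v, \qquad
\nabla p = -p\,\nabla v, \qquad
\nabla^2 p = p\,\bigl(\nabla v \nabla v^\top - \nabla^2 v\bigr), \\
A p &= p\,\Bigl(\tfrac12 \|\sigma^\top \nabla v\|^2 - A v\Bigr), \qquad
f(x, p, \nabla p) = p\, f(x, 1, -\nabla v), \\
\partial_t v &= A v - \tfrac12 \|\sigma^\top \nabla v\|^2 - f(x, 1, -\nabla v)
             = A v + f_{\log}(x, v, \nabla v).
\end{align*}
```

Because the density is recovered as $p = e^{-v}$, any real-valued network output for $v$ yields a strictly positive density, which is the positivity-preserving property the summary refers to.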

If this is right

Filtering densities can be tracked accurately without the degeneracy issues that affect particle filters in high dimensions.
The computational efficiency gains allow for real-time inference in systems previously intractable for particle methods.
Positivity-preserving approximations become feasible through the logarithmic transformation even as state dimension grows.
Bayesian updates integrate naturally with the continuous-time density evolution between observations.
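The degeneracy mentioned in the first point can be illustrated with a toy experiment. The sketch below is not the paper's benchmark: it assumes a standard normal prior, a fully observed state with unit Gaussian noise, and a single importance-weighting step, and it reports the effective sample size (ESS) of the weights as the dimension grows.

```python
import numpy as np

def effective_sample_size(dim, n_particles=1000, obs_std=1.0, seed=0):
    """ESS of importance weights after one Bayesian update (toy setting)."""
    rng = np.random.default_rng(seed)
    x_true = rng.normal(size=dim)
    y = x_true + obs_std * rng.normal(size=dim)        # noisy observation
    particles = rng.normal(size=(n_particles, dim))    # samples from the prior
    # Gaussian log-likelihood of y for each particle, up to a constant
    log_w = -0.5 * np.sum((y - particles) ** 2, axis=1) / obs_std**2
    w = np.exp(log_w - log_w.max())                    # subtract max for stability
    w /= w.sum()
    return 1.0 / np.sum(w**2)

ess_by_dim = {d: effective_sample_size(d) for d in (1, 10, 100)}
for d, ess in ess_by_dim.items():
    print(f"d={d:3d}  ESS={ess:7.1f} of 1000 particles")
```

In this toy setting the ESS drops toward a handful of particles as the dimension reaches 100, which is the weight-collapse mechanism usually blamed for particle filter failure in high dimensions.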

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar deep density techniques might improve performance in other high-dimensional inverse problems such as parameter estimation in complex physical systems.
Integration with existing data assimilation frameworks could lead to hybrid methods that combine the speed of deep filters with the robustness of ensemble approaches.
Testing on models with even higher dimensions or different observation structures would clarify the scalability limits of this approach.

Load-bearing premise

The neural network solutions to the discretized Fokker-Planck or backward SDE equations remain sufficiently accurate and stable that the logarithmic transformation does not introduce significant bias or instability in the high-dimensional density estimates.
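One numerical reason a log parameterisation can help is sketched below. The example assumes a one-dimensional grid and an unnormalized density given through `v = -log p`, with a large additive offset chosen to force underflow. The helper `log_normalize` is hypothetical and only illustrates log-space normalization; it says nothing about bias from the network or the discretization.

```python
import numpy as np

def log_normalize(v, dx):
    """Return log p on a grid, given v = -log(unnormalized p)."""
    a = -v
    m = a.max()
    # log of the Riemann-sum normalizing constant, computed stably
    log_z = m + np.log(np.sum(np.exp(a - m)) * dx)
    return a - log_z

x = np.linspace(-5.0, 5.0, 2001)
dx = x[1] - x[0]
v = 0.5 * x**2 + 1000.0          # Gaussian shape with a huge offset (assumed)

naive = np.exp(-v)               # underflows to 0.0 everywhere in float64
p = np.exp(log_normalize(v, dx)) # strictly positive and normalized on the grid

print(naive.max(), p.min() > 0.0, np.sum(p) * dx)
```

Working with `v` directly keeps the density positive by construction and avoids the division by zero that the naive route would hit when normalizing.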

What would settle it

If the deep filter's approximated filtering density produces posterior statistics that deviate by more than a small threshold from those computed by a converged high-particle-count reference solution in the 100-dimensional Lorenz-96 model, the superiority claim would be falsified.
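The shape of such a check can be sketched in a setting where an exact reference exists. The example below is a stand-in, not the 100-dimensional experiment: it assumes a scalar linear Gaussian model, uses the Kalman filter as the exact reference, and treats a bootstrap particle filter as the candidate whose posterior means are tested against a tolerance. All parameter values and the tolerance are illustrative.

```python
import numpy as np

def kalman_means(ys, a, q, r, m0, p0):
    """Exact posterior means for x_{k+1} = a x_k + q w, y = x + r v."""
    m, p, out = m0, p0, []
    for y in ys:
        m, p = a * m, a * a * p + q * q            # predict
        k = p / (p + r * r)                        # Kalman gain
        m, p = m + k * (y - m), (1.0 - k) * p      # update
        out.append(m)
    return np.array(out)

def bootstrap_pf_means(ys, a, q, r, m0, p0, n_particles, rng):
    """Posterior means from a bootstrap particle filter with resampling."""
    x = m0 + np.sqrt(p0) * rng.normal(size=n_particles)
    out = []
    for y in ys:
        x = a * x + q * rng.normal(size=n_particles)   # propagate
        log_w = -0.5 * ((y - x) / r) ** 2              # Gaussian log-likelihood
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        out.append(np.sum(w * x))
        x = x[rng.choice(n_particles, size=n_particles, p=w)]  # resample
    return np.array(out)

rng = np.random.default_rng(1)
a, q, r, m0, p0 = 0.9, 0.5, 0.5, 0.0, 1.0
x, ys = m0 + np.sqrt(p0) * rng.normal(), []
for _ in range(20):
    x = a * x + q * rng.normal()
    ys.append(x + r * rng.normal())

reference = kalman_means(ys, a, q, r, m0, p0)
candidate = bootstrap_pf_means(ys, a, q, r, m0, p0, 5000, rng)
max_deviation = np.max(np.abs(candidate - reference))
tolerance = 0.1  # illustrative threshold
print(max_deviation, max_deviation <= tolerance)
```

In 100 dimensions the difficulty is that no such exact or converged reference is available, which is why the criterion above is hard to apply in practice.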

Figures

Figures reproduced from arXiv: 2511.07261 by Filip Rydin, Kasper Bågmark.

**Figure 1.** On the left and right panels the results for the Ornstein–Uhlenbeck process and the bistable process are depicted, respectively. From top to bottom the rMAE, FME, and KLD metrics are illustrated.

**Figure 2.** On the left and right panels the results for the long-horizon 10-dimensional Ornstein–Uhlenbeck process and the short-horizon 100-dimensional Ornstein–Uhlenbeck process are depicted, respectively. From top to bottom the rMAE, FME, and KLD metrics are illustrated. The computational gain from using the LogBSDEF compared to the underperforming PF with $10^6$ particles is even higher than for the one-dimensional…

**Figure 3.** On the left and right panels the results for the 10-dimensional and 100-dimensional linear spring-mass models are depicted, respectively. From top to bottom the rMAE, FME, and KLD metrics are illustrated.

**Figure 4.** Metrics for the Schlögl model, shown left to right: rMAE, FME, and KLD.

**Figure 5.** Metrics for the four-dimensional Lorenz-96 model, shown left to right: rMAE, FME, and KLD.

**Figure 6.** The averaged MAE and NLL metrics evaluated over increasing state dimension d = [4, 10, 20, 40, 100]. At each time step the NLL value is capped at $|\log(10^{-200})| \approx 460$.
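The cap quoted in the Figure 6 caption is simple arithmetic, since $|\log(10^{-200})| = 200 \ln 10$. The sketch below checks the value and shows one way such a cap could be applied to a single density evaluation. The helper `capped_nll` is hypothetical, and how the paper applies the cap in detail is an assumption.

```python
import math

NLL_CAP = abs(math.log(1e-200))  # equals 200 * ln(10)

def capped_nll(density_value):
    """Negative log-likelihood of one density value, truncated at NLL_CAP."""
    if density_value <= 0.0:
        return NLL_CAP           # log(0) is undefined, so report the cap
    return min(-math.log(density_value), NLL_CAP)

print(round(NLL_CAP, 3))         # about 460.517
print(capped_nll(1e-300), capped_nll(0.0), capped_nll(0.5))
```

A cap of this kind keeps the averaged NLL finite when a filter assigns essentially zero density to the true state, at the price of hiding how badly it failed.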

**Figure 7.** On the left, we display the average time for estimating one whole trajectory in the Ornstein–Uhlenbeck case. On the right, we display the average time for evaluating filtering densities for all observation times in 1000 points. Note that this includes the time to obtain normalization constants for the BSDEF. In both plots, the time it takes to propagate particles for the EnKF and PF is included. For BSDEF,…

**Figure 8.** The computational time, from initializing each method to evaluation of 1000 spatial points in each observation time, over increasing number of sequences. The intersection between the PFs and deep density methods’ computational time occurs at 1300 samples for the bistable example and at 430 samples for the 100-dimensional Lorenz-96 example.

**Figure 9.** The standard FCN architecture used in the implementation. In the figure, $x$ denotes the state value in which the density is evaluated and $o_{1:k}$ denotes the available observations. The input is padded with zeros so that it has a constant dimension with respect to $k$.

**Figure 10.** The LSTM-based architecture tested in the long-horizon Ornstein–Uhlenbeck example in 10 dimensions. A token zero input $0 \in \mathbb{R}^{d'}$ is used in the first LSTM cell, while subsequent ones take as input the observation chain $o_{1:k}$.

**Figure 11.** One trajectory for the long-horizon Ornstein–Uhlenbeck problem in 10 dimensions. The top row shows the sample path with corresponding filter mean estimates. The bottom row displays the filtering densities at the final time T = 10.

**Figure 12.** Marginal densities over time in the 10-dimensional linear spring-mass example for selected components. Recall that only positions are observed.

**Figure 13.** One trajectory and filter estimates for selected components in the 10-dimensional linear spring-mass example.

**Figure 14.** One realization of the state process S, starting at S0 ∼ N…

read the original abstract

In this work, we systematically benchmark two recently developed deep density methods for nonlinear filtering. We model the filtering density of a discretely observed stochastic differential equation through the associated Fokker--Planck equation, coupled with Bayesian updates at discrete observation times. The two filters: the deep splitting filter and the deep backward stochastic differential equation filter, are both based on Feynman--Kac formulas, Euler--Maruyama discretizations and neural networks. The two methods are extended to logarithmic formulations providing sound, robust, and positivity-preserving density approximations in increasing state dimension. Comparing to the classical bootstrap particle filter and an ensemble Kalman filter, we benchmark the methods on numerous examples. In the low-dimensional examples the particle filters work well, but when we scale up to a partially observed $100$-dimensional Lorenz-96 model, the particle-based methods fail and the logarithmic deep backward stochastic differential equation filter prevails. In terms of computational efficiency, the deep density methods reduce inference time by roughly two to five orders of magnitude relative to the particle-based filters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Log versions of deep density filters run on a 100D Lorenz-96 filtering task where particles collapse, but the lack of ground truth leaves accuracy claims hard to verify.

The main thing to know is that logarithmic reformulations of the deep splitting and deep BSDE filters let them handle filtering densities on a partially observed 100-dimensional Lorenz-96 model, where the bootstrap particle filter and EnKF break down, while cutting inference time by two to five orders of magnitude. The paper extends recent deep density work by adding the log transform to keep approximations positive and stable as dimension increases, then runs the same methods on lower-dimensional test cases first to show the scaling contrast. The Feynman-Kac and Euler-Maruyama backbone is the usual one, but the log step appears to give practical robustness without changing the core setup. The benchmarking against established baselines is straightforward and the speed numbers are the clearest takeaway.

The soft spot is validation once particles fail. Without an independent high-accuracy reference for the 100D densities or moments, it is difficult to separate a genuinely accurate approximation from one that stays stable and fast but carries bias from the neural network, the log change of variables, or the discretization. Lower-dimensional cases probably have error checks, but those do not automatically carry over.

The work is aimed at people who need fast uncertainty propagation in high-dimensional SDEs, such as in robotics or climate applications. Readers looking for computational alternatives to particle methods will see the practical gains even if the theoretical side stays standard. I would send this to referees. The numerical evidence on scaling is worth a detailed look, and the log extension is a modest but usable tweak that deserves checking in the full manuscript.

Referee Report

2 major / 2 minor

Summary. The manuscript benchmarks two recently developed deep density approximation methods, the deep splitting filter and the deep backward stochastic differential equation filter, for nonlinear filtering of discretely observed SDEs. Both rely on Feynman-Kac representations, Euler-Maruyama discretizations, and neural networks; logarithmic reformulations are added to enforce positivity and stability in high dimensions. Systematic comparisons against the bootstrap particle filter and ensemble Kalman filter are presented across low-dimensional test cases and a partially observed 100-dimensional Lorenz-96 model, where the particle-based methods are reported to fail while the logarithmic deep BSDE filter succeeds and reduces inference time by two to five orders of magnitude.

Significance. If the accuracy of the high-dimensional approximations can be substantiated, the work supplies computationally tractable alternatives for Bayesian filtering problems in which classical particle methods suffer from degeneracy. The logarithmic extensions for robust density approximation and the emphasis on reproducible numerical benchmarking constitute clear strengths.

major comments (2)

[§5] §5 (high-dimensional Lorenz-96 experiments): the central claim that the logarithmic deep BSDE filter produces accurate filtering densities rests on the observation that particle filters and EnKF fail, yet no independent high-accuracy reference solution, moment comparison, or likelihood benchmark is supplied. Without such a ground truth it is impossible to separate genuine accuracy from stable but systematically biased approximations arising from the neural-network representation, the logarithmic transformation, or the Euler-Maruyama time discretization.
[Numerical results] Implementation and reporting sections: quantitative error metrics, number of independent runs, network architectures, training hyperparameters, and stopping criteria are not reported for the 100D case (or for the low-dimensional benchmarks). This absence prevents verification of the reported performance gains and reproducibility of the claimed orders-of-magnitude speed-up.

minor comments (2)

[Abstract] The abstract states that the methods are benchmarked on “numerous examples” but does not list the state dimensions or observation models; adding a short table or explicit enumeration would improve clarity.
[Notation] Notation for the filtering density p_t and its logarithmic transform should be introduced once and used consistently; occasional switches between p and log p in the text can be confusing.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful reading and constructive criticism of our manuscript. We address each major comment point by point below, indicating where revisions have been made to strengthen the presentation and reproducibility.

Referee: [§5] §5 (high-dimensional Lorenz-96 experiments): the central claim that the logarithmic deep BSDE filter produces accurate filtering densities rests on the observation that particle filters and EnKF fail, yet no independent high-accuracy reference solution, moment comparison, or likelihood benchmark is supplied. Without such a ground truth it is impossible to separate genuine accuracy from stable but systematically biased approximations arising from the neural-network representation, the logarithmic transformation, or the Euler-Maruyama time discretization.

Authors: We agree that an independent high-accuracy reference would provide stronger validation. In the revised manuscript we have added moment comparisons (means and covariances) of the logarithmic deep BSDE filter against long-run Monte Carlo trajectories of the underlying Lorenz-96 dynamics, which serve as a partial benchmark. We have also inserted a new paragraph in §5 explicitly discussing the possibility of systematic bias from the neural-network approximation, the log-transform, and the Euler-Maruyama scheme, together with a brief sensitivity study with respect to time-step size. A complete, independent density reference remains unavailable for the same reason the particle filter degenerates; we now state this limitation clearly rather than implying the method is fully validated by the failure of alternatives alone.

revision: partial

Referee: [Numerical results] Implementation and reporting sections: quantitative error metrics, number of independent runs, network architectures, training hyperparameters, and stopping criteria are not reported for the 100D case (or for the low-dimensional benchmarks). This absence prevents verification of the reported performance gains and reproducibility of the claimed orders-of-magnitude speed-up.

Authors: We regret the incomplete reporting. The revised version now contains a dedicated subsection on implementation details that reports: (i) quantitative error metrics (KL divergence to a high-resolution particle reference in low dimensions and moment errors in 100D), (ii) the number of independent runs (ten for the 100D experiments), (iii) network architectures (depth, width, and activation functions), (iv) training hyperparameters (learning rate schedule, batch size, optimizer), and (v) stopping criteria (validation-loss plateau). These additions allow direct reproduction of the reported inference-time reductions.

revision: yes

standing simulated objections not resolved

Supplying a fully independent, high-accuracy reference solution for the filtering density in 100 dimensions; such a reference is computationally intractable with existing methods, which is the central motivation for the proposed approach.

Circularity Check

0 steps flagged

Numerical benchmarks against external particle and ensemble filters exhibit no circularity

full rationale

The paper reports direct empirical comparisons of deep splitting and deep BSDE filters (with logarithmic extensions) to the bootstrap particle filter and EnKF on multiple SDE examples, including the 100D partially observed Lorenz-96 model. Performance metrics such as inference time reductions (two to five orders of magnitude) and relative success in high dimensions are obtained from explicit simulation runs and timing measurements against these independent classical baselines. The underlying representations rely on standard Feynman-Kac formulas and Euler-Maruyama discretizations, which are externally motivated and not redefined in terms of the paper's own outputs or fitted quantities. No equations or claims reduce the reported results to self-referential definitions, fitted inputs renamed as predictions, or load-bearing self-citations. The study is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claims rest on standard numerical assumptions about neural network approximation power for PDEs and SDEs plus the adequacy of Euler-Maruyama discretization; no new entities are postulated and no free parameters are explicitly fitted to the target performance metrics.

axioms (2)

domain assumption: Neural networks can accurately approximate solutions to the Fokker-Planck equation and backward SDEs arising from the filtering problem.
Invoked to justify replacing particle representations with learned density functions.
domain assumption: Euler-Maruyama discretization of the underlying SDE is sufficiently accurate for the time scales and dimensions considered.
Standard assumption in numerical SDE filtering literature.
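The second assumption can be probed in a case with closed-form answers. The sketch below assumes a scalar Ornstein–Uhlenbeck process $dX = -\theta X\,dt + \sigma\,dW$ started at a fixed point, for which the Euler-Maruyama mean and variance obey exact recursions, and compares them with the true moments as the step size is halved. Parameter values are illustrative, and the check says nothing about nonlinear models such as Lorenz-96.

```python
import math

def em_moments(theta, sigma, x0, horizon, n_steps):
    """Mean and variance of the Euler-Maruyama iterate at the final time."""
    dt = horizon / n_steps
    mean, var = x0, 0.0
    for _ in range(n_steps):
        # X_{n+1} = (1 - theta*dt) X_n + sigma*sqrt(dt)*xi, with xi ~ N(0, 1)
        var = (1.0 - theta * dt) ** 2 * var + sigma**2 * dt
        mean = (1.0 - theta * dt) * mean
    return mean, var

theta, sigma, x0, horizon = 1.0, 1.0, 1.0, 1.0
exact_mean = x0 * math.exp(-theta * horizon)
exact_var = sigma**2 / (2.0 * theta) * (1.0 - math.exp(-2.0 * theta * horizon))

errors = []
for n_steps in (10, 20, 40, 80):
    mean, var = em_moments(theta, sigma, x0, horizon, n_steps)
    errors.append((n_steps, abs(mean - exact_mean), abs(var - exact_var)))
    print(n_steps, errors[-1][1], errors[-1][2])

# Ratios close to 2 indicate first-order convergence of the moments in dt
mean_ratios = [errors[i][1] / errors[i + 1][1] for i in range(len(errors) - 1)]
var_ratios = [errors[i][2] / errors[i + 1][2] for i in range(len(errors) - 1)]
print(mean_ratios, var_ratios)
```

Halving the step roughly halves both moment errors here, so a time-step sensitivity study of the kind mentioned in the rebuttal is a natural way to see whether discretization bias matters at the chosen resolution.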

pith-pipeline@v0.9.0 · 5480 in / 1393 out tokens · 47534 ms · 2026-05-17T23:28:49.073578+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

log density v=−log p_k satisfying (5)–(6) with f_log(x,u,w)=−½∥σ⊤w∥²−f(x,1,−w); deep splitting/BSDE optimization (8)–(9) with Euler–Maruyama (7)
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

high-dimensional Lorenz-96 (d=100) where particle methods fail; LogBSDEF prevails via neural density approximation

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 2 internal anchors

[1] K. Andersson, A. Andersson, and C. W. Oosterlee. The deep multi-FBSDE method: a robust deep learning method for coupled FBSDEs. arXiv:2503.13193, 2025.
[2] K. Bågmark, A. Andersson, and S. Larsson. An energy-based deep splitting method for the nonlinear filtering problem. Partial Differ. Equ. Appl., 4, 2023.
[3] K. Bågmark, A. Andersson, and S. Larsson. Nonlinear filtering based on density approximation and deep BSDE prediction. arXiv:2508.10630, 2025.
[4] K. Bågmark, A. Andersson, S. Larsson, and F. Rydin. A convergent scheme for the Bayesian filtering problem based on the Fokker–Planck equation and deep splitting. arXiv:2409.14585, 2024.
[5] F. Bao, Z. Zhang, and G. Zhang. A score-based filter for nonlinear data assimilation. J. Comput. Phys., 514:Paper No. 113207, 16, 2024.
[6] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan. Estimation with Applications to Tracking and Navigation. John Wiley & Sons, 2001.
[7] C. Beck, S. Becker, P. Cheridito, A. Jentzen, and A. Neufeld. Deep learning based numerical approximation algorithms for stochastic partial differential equations and high-dimensional nonlinear filtering problems. arXiv:2012.01194, 2020.
[8] C. Beck, S. Becker, P. Cheridito, A. Jentzen, and A. Neufeld. Deep splitting method for parabolic PDEs. SIAM J. Sci. Comput., 43:A3135–A3154, 2021.
[9] C. Beck, W. E, and A. Jentzen. Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. J. Nonlinear Sci., 29(4):1563–1619, 2019.
[10] P. Bickel, B. Li, and T. Bengtsson. Sharp failure rates for the bootstrap particle filter in high dimensions. In Pushing the limits of contemporary statistics: contributions in honor of Jayanta K. Ghosh, volume 3 of Inst. Math. Stat. (IMS) Collect., pages 318–329. Inst. Math. Statist., Beachwood, OH, 2008.
[11] S. S. Blackman and R. Popoli. Design and Analysis of Modern Tracking Systems. Artech House Publishers, 1999.
[12] J. Brajard, A. Carrassi, M. Bocquet, and L. Bertino. Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: a case study with the Lorenz 96 model. J. Comput. Sci., 44:101171, 11, 2020.
[13] G. Burgers, P. J. van Leeuwen, and G. Evensen. Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev., 126(6):1719–1724, 1998.
[14] Q. Chan-Wai-Nam, J. Mikael, and X. Warin. Machine learning for semi linear PDEs. J. Sci. Comput., 79(3):1667–1712, 2019.
[15] N. Chopin. Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Ann. Statist., 32(6):2385–2411, 2004.
[16] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006.
[17] D. Crisan and A. Doucet. A survey of convergence results on particle filtering methods for practitioners. IEEE Trans. Signal Process., 50(3):736–746, 2002.
[18] N. Cui, L. Hong, and J. R. Layne. A comparison of nonlinear filtering approaches with an application to ground target tracking. Signal Processing, 85:1469–1492, 2005.
[19] P. Del Moral. Feynman-Kac formulae: Genealogical and interacting particle systems with applications. Springer-Verlag, New York, 2004.
[20] L. Duc, T. Kuroda, K. Saito, and T. Fujita. Ensemble Kalman filter data assimilation and storm surge experiments of tropical cyclone Nargis. Tellus A, 67:25941, 2015.
[21] W. E, J. Han, and A. Jentzen. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat., 5:349–380, Nov. 2017.
[22] M. Ehrendorfer. A review of issues in ensemble-based Kalman filtering. Meteorol. Z., 16, 2007.
[23] G. Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99(C5):10143–10162, 1994.
[24] G. Evensen. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53(4):343–367, 2003.
[25] R. Frey and V. Köck. Convergence analysis of the deep splitting scheme: the case of partial integro-differential equations and the associated forward backward SDEs with jumps. SIAM J. Sci. Comput., 47(1):A527–A552, 2025.
[26] G. Galanis, P. Louka, P. Katsafados, I. Pytharoulis, and G. Kallos. Applications of Kalman filters based on non-linear functions to numerical weather predictions. Ann. Geophys., 24:1–10, 2006.
[27] M. Germain, H. Pham, and X. Warin. Approximation error analysis of some deep backward schemes for nonlinear PDEs. SIAM J. Sci. Comput., 44(1):A28–A56, 2022.
[28] D. T. Gillespie. The chemical Langevin equation. J. Chem. Phys., 113(1):297–306, 2000.
[29] I. R. Goodman, R. P. S. Mahler, and H. T. Nguyen. Mathematics of Data Fusion, volume 37 of Theory and Decision Library. Series B: Mathematical and Statistical Methods. Kluwer Academic Publishers Group, Dordrecht, 1997.
[30] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEEE Proceedings F (Radar and Signal Processing), 140(2):107–113, 1993.
[31] J. Han, A. Jentzen, and W. E. A brief review of the deep BSDE method for solving high-dimensional partial differential equations. arXiv:2505.17032, 2025.
[32] J. Han and J. Long. Convergence of the deep BSDE method for coupled FBSDEs. Probab. Uncertain. Quant. Risk, 5:Paper No. 5, 33, 2020.
[33] X. Han and X. Li. An evaluation of the nonlinear/non-Gaussian filters for the sequential data assimilation. Remote Sens. Environ., 112(4):1434–1449, 2008.
[34] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[35] M. S. Johannes and N. G. Polson. MCMC Methods for Continuous-Time Financial Econometrics. In Handbook of Financial Econometrics, pages 1–72. Elsevier, 2009.
[36] K. Kamino, N. Kadakia, F. Avgidis, Z.-X. Liu, K. Aoki, T. S. Shimizu, and T. Emonet. Optimal inference of molecular interaction dynamics in FRET microscopy. Proc. Natl. Acad. Sci. U.S.A., 120(15):e2211807120, 2023.
[37] A. Karimi and M. R. Paul. Extensive chaos in the Lorenz-96 model. Chaos, 20(4):043105, 2010.
[38] M. Katzfuss, J. R. Stroud, and C. K. Wikle. Understanding the ensemble Kalman filter. Amer. Statist., 70(4):350–357, 2016.
[39] E. N. Lorenz. Predictability: A problem partly solved. In Seminar on Predictability, Vol. I, pages 1–18. ECMWF, Reading, Berkshire, UK, 1996.
[40] P. Virtanen, et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17(3):261–272, 2020.
[41]

Schl¨ ogl

F. Schl¨ ogl. Chemical reaction models for non-equilibrium phase transitions.Zeitschrift f¨ ur physik, 253(2):147– 161, 1972

work page 1972
[42]

S. S. Schoenholz, J. Gilmer, S. Ganguli, and J. Sohl-Dickstein. Deep information propagation. InProc. Int. Conf. Learn. Represent., 2017

work page 2017
[43]

D. W. Scott.Multivariate density estimation. John Wiley & Sons, Inc., New York, 1992

work page 1992
[44]

Silverman.Density Estimation for Statistics and Data Analysis

B. Silverman.Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986

work page 1986
[45]

C. Snyder. Particle filters, the “optimal” proposal and high-dimensional systems. InProceedings of the ECMWF Seminar on Data Assimilation for atmosphere and ocean, pages 1–10, 2011

work page 2011
[46]

Snyder, T

C. Snyder, T. Bengtsson, and M. Morzfeld. Performance bounds for particle filters using the optimal proposal. Mon. Weather Rev., 143:4750–4761, 2015. 20 K. B ˚AGMARK AND F. RYDIN

work page 2015
[47]

D. L. van Kekem and A. E. Sterk. Symmetries in the Lorenz-96 model.Internat. J. Bifur. Chaos Appl. Sci. Engrg., 29(1):1950008, 18, 2019

work page 2019
[48]

Vellela and H

M. Vellela and H. Qian. Stochastic dynamics and non-equilibrium thermodynamics of a bistable chemical system: the Schl¨ ogl model revisited.J. R. Soc. Interface, 6(39):925–940, 2009

work page 2009
[49]

Vlysidis and Y

M. Vlysidis and Y. N. Kaznessis. Solving stochastic reaction networks with maximum entropy lagrange multi- pliers.Entropy, 20(9):678, 2018

work page 2018
[50]

D. S. Wilks. Effects of stochastic parametrizations in the Lorenz-96 system.Q. J. R. Meteorol. Soc., 131:389– 407, 2005. AppendixA.Proof of Theorem 2.1 For simplicity, we hidet,x, andk, in the notation and writep=p k(t, x). The initial condition follows by insertion and simplification. It remains to derive the log-transformed equation from the original Fo...

[Figure 9 diagram; block labels: FC, ReLU, FC, exp/Linear, ×L; input labels: o_{1:k}, x]

Figure 9. The standard FCN architecture used in the implementation. In the figure, x denotes the state value at which the density is evaluated and o_{1:k} denotes the available observations. The input is padded with zeros so that it has a constant dimension with respect to k.

the same observation sequence. Moreover, to pre...
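As a rough illustration of the Figure 9 caption, the following sketch builds a fully connected network that takes the state x together with a zero-padded observation history o_{1:k} and applies either an exponential or a linear output. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The names `pad_observations`, `init_fcn`, and `fcn_forward`, the layer width, the initialization, and the maximum number of observation times `k_max` are all hypothetical. The sketch also assumes that the exponential output is used when the network represents a positive density and the linear output when it represents a log-density, which the caption does not state.

```python
import numpy as np


def pad_observations(obs, k_max):
    """Flatten o_{1:k} (shape (k, d_obs)) and zero-pad to length k_max * d_obs."""
    obs = np.asarray(obs, dtype=float)
    k, d_obs = obs.shape
    if k > k_max:
        raise ValueError("more observations than k_max")
    padded = np.zeros((k_max, d_obs))
    padded[:k] = obs  # rows k..k_max-1 stay zero
    return padded.ravel()


def init_fcn(d_in, width, n_hidden, rng):
    """Weights for n_hidden FC+ReLU blocks followed by one scalar FC layer.

    The He-style scaling is an illustrative choice, not taken from the paper.
    """
    sizes = [d_in] + [width] * n_hidden + [1]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def fcn_forward(params, x, obs, k_max, output="exp"):
    """Evaluate the network at state x given the observations o_{1:k}."""
    h = np.concatenate([np.asarray(x, dtype=float), pad_observations(obs, k_max)])
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # FC layer followed by ReLU
    w, b = params[-1]
    out = (h @ w + b)[0]  # final FC layer with scalar output
    # "exp" yields a positive value; "linear" returns the raw output.
    return np.exp(out) if output == "exp" else out


rng = np.random.default_rng(0)
d_state, d_obs, k_max = 4, 2, 5
params = init_fcn(d_state + k_max * d_obs, width=16, n_hidden=3, rng=rng)
x = rng.normal(size=d_state)
obs = rng.normal(size=(2, d_obs))  # only k = 2 observations are available so far
density_value = fcn_forward(params, x, obs, k_max, output="exp")
log_value = fcn_forward(params, x, obs, k_max, output="linear")
```

Because of the padding, the input dimension is the same for every k, so one set of weights can in principle be evaluated at any observation time up to `k_max`.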
