pith. sign in

arxiv: 2509.16896 · v1 · submitted 2025-09-21 · 🧮 math.OC

An Improved Yau-Yau Algorithm for High Dimensional Nonlinear Filtering Problems

Pith reviewed 2026-05-18 15:18 UTC · model grok-4.3

classification 🧮 math.OC
keywords nonlinear filteringYau-Yau algorithmhigh-dimensional systemsquasi-Monte Carlo samplingerror boundsstate estimationreal-time filteringcurse of dimensionality
0
0 comments X

The pith

An improved Yau-Yau algorithm delivers accurate nonlinear filtering for systems with up to 1000 state dimensions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Nonlinear state estimation grows intractable as the number of variables increases because standard numerical methods suffer from exponential complexity. The paper develops an enhanced Yau-Yau filtering procedure that combines quasi-Monte Carlo sampling, offline-online updates, high-order kernel approximations, and parallel computation to reach real-time performance with explicit error bounds. A sympathetic reader would care because the approach promises to make reliable estimation feasible in high-dimensional control, signal processing, and dynamical systems where current tools either lose accuracy or become too slow. The reported experiments show sub-quadratic run times and better results than extended Kalman, unscented Kalman, and particle filters under strong nonlinearity.

Core claim

The paper claims that integrating quasi-Monte Carlo low-discrepancy sampling, a novel offline-online scheme, high-order multi-scale kernel approximations, fully log-domain likelihood evaluation, and a local resampling-restart step produces a Yau-Yau-type filter whose local truncation error is O(Δt² + D*(n)) and global error is O(Δt + D*(n)/Δt), where Δt is the time step and D*(n) is the star discrepancy of the chosen points, thereby extending accurate real-time nonlinear filtering to systems with thousands of dimensions.

What carries the argument

The improved Yau-Yau filtering framework realized through quasi-Monte Carlo low-discrepancy sampling combined with offline-online updates and multi-scale kernel approximations.

If this is right

  • Real-time nonlinear filtering becomes feasible for state dimensions up to at least 1000.
  • Runtime scales sub-quadratically and error grows sub-linearly with dimension in the tested regimes.
  • The method outperforms extended and unscented Kalman filters as well as particle filters when nonlinearity is strong.
  • Performance matches or exceeds the optimal Kalman-Bucy filter on linear Gaussian problems.
  • Accurate high-dimensional nonlinear filtering becomes available for science and engineering applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sampling and kernel strategy could be tested on other stochastic estimation tasks such as smoothing or parameter identification.
  • Hardware acceleration on modern GPUs suggests further speed-ups are possible as compute resources improve.
  • The error bounds invite direct comparison experiments against other dimension-reduction techniques on shared benchmark problems.
  • Applications in fields with naturally high-dimensional states, such as fluid dynamics or neural population models, become worth exploring.

Load-bearing premise

The star-discrepancy of the chosen quasi-Monte Carlo points stays small enough relative to the chosen time step and the local resampling-restart step does not introduce uncontrolled bias.

What would settle it

Numerical runs on a 1000-dimensional nonlinear cubic sensor problem where the observed global error grows faster than O(Δt + D*(n)/Δt) or where the method fails to produce real-time results would falsify the central performance claims.

Figures

Figures reproduced from arXiv: 2509.16896 by Shing-Tung Yau, Yi-Shuai Niu.

Figure 1
Figure 1. Figure 1: Required sample size n vs. ∆t on a log-log scale, comparing MC (n ∼ ∆t −4 , dashed blue) and QMC (n ∼ ∆t −2 (log(1/∆t))3 , solid red) for r = 3. 6 Local Resampling-Restart Mechanism In high‐dimensional nonlinear filtering, globally distributed samples can leave large regions of the state space severely under‐covered, giving rise to the so‐called “great‐wall” phenomenon: within a single time step ∆t, the la… view at source ↗
Figure 2
Figure 2. Figure 2: The “great‐wall” phenomenon: due to excessively sparse sampling, the filter correction [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Performance metrics for large-scale nonlinear filtering over 1000 time steps as state [PITH_FULL_IMAGE:figures/full_fig_p021_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Improved Yau–Yau filtering results for the large-scale nonlinear filtering over 1000 time [PITH_FULL_IMAGE:figures/full_fig_p022_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Performance comparison of four filtering algorithms on the 1-D cubic sensor test problem. [PITH_FULL_IMAGE:figures/full_fig_p023_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Average state estimates over 20 runs on the double‐well test: (a) EKF and (b) UKF both [PITH_FULL_IMAGE:figures/full_fig_p024_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: State estimation trajectories for the improved Yau–Yau (dashed) and Kalman–Bucy [PITH_FULL_IMAGE:figures/full_fig_p025_7.png] view at source ↗
read the original abstract

Nonlinear state estimation under noisy observations is rapidly intractable as system dimension increases. We introduce an improved Yau-Yau filtering framework that breaks the curse of dimensionality and extends real-time nonlinear filtering to systems with up to thousands of state dimensions, achieving high-accuracy estimates in just a few seconds with rigorous theoretical error guarantees. This new approach integrates quasi-Monte Carlo low-discrepancy sampling, a novel offline-online update, high-order multi-scale kernel approximations, fully log-domain likelihood computation, and a local resampling-restart mechanism, all realized with CPU/GPU-parallel computation. Theoretical analysis guarantees local truncation error \(O(\Delta t^2 + D^*(n))\) and global error \(O(\Delta t + D^*(n)/\Delta t)\), where \(\Delta t\) is the time step and \(D^*(n)\) the star-discrepancy. Numerical experiments, spanning large-scale nonlinear cubic sensors up to 1000 dimensions, highly nonlinear small-scale problems, and linear Gaussian benchmarks, demonstrate sub-quadratic runtime scaling, sub-linear error growth, and excellent performance that surpasses the extended and unscented Kalman filters (EKF, UKF) and the particle filter (PF) under strong nonlinearity, while matching or exceeding the optimal Kalman-Bucy filter in linear regimes. By breaking the curse of dimensionality, our method enables accurate, real-time, high-dimensional nonlinear filtering, opening broad opportunities for applications in science and engineering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an improved Yau-Yau algorithm for high-dimensional nonlinear filtering. It combines quasi-Monte Carlo low-discrepancy sampling, a novel offline-online update, high-order multi-scale kernel approximations, fully log-domain likelihood computation, and a local resampling-restart mechanism, all implemented with CPU/GPU parallelism. Theoretical analysis is claimed to yield local truncation error O(Δt² + D*(n)) and global error O(Δt + D*(n)/Δt). Numerical experiments on cubic sensors up to 1000 dimensions, highly nonlinear small-scale problems, and linear Gaussian cases report sub-quadratic runtime, sub-linear error growth, and performance superior to EKF, UKF, and PF while matching or exceeding the Kalman-Bucy filter in linear regimes.

Significance. If the error bounds are rigorously established without hidden dimension dependence in the discrepancy term, the work would constitute a notable advance for real-time high-dimensional nonlinear state estimation. The integration of multiple techniques, the parallel implementation, and the breadth of numerical benchmarks (including 1000-d cases) provide practical value. Explicit credit is due for attempting a theoretical guarantee alongside reproducible large-scale experiments.

major comments (2)
  1. Theoretical analysis (global error bound): The claimed global error O(Δt + D*(n)/Δt) requires D*(n) ≪ Δt for the bound to remain controlled. Standard low-discrepancy bounds give D*(n) ≲ (log n)^d / n, which for d=1000 and practical n (e.g., 10^5–10^6) yields values far larger than typical Δt; the manuscript provides neither a dimension-independent estimate nor explicit computation of D*(n) for the reported 1000-dimensional experiments to justify the headline claim of breaking the curse of dimensionality with rigorous control.
  2. Resampling-restart mechanism: The assertion that the local resampling-restart does not introduce uncontrolled bias lacks an explicit propagation of the discrepancy term through successive restart steps. The global error bound appears to treat segments independently without accounting for potential accumulation, which is load-bearing for the overall guarantee.
minor comments (2)
  1. The choice of sampling size n and time step Δt for the 1000-d experiments should be detailed with respect to the discrepancy condition, ideally with a table or explicit values.
  2. Notation for the multi-scale kernel approximation would benefit from an explicit definition or equation reference in the methods section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. We appreciate the recognition of the practical value of the parallel implementation and the breadth of the numerical benchmarks. We address each major comment below and will make the corresponding revisions to strengthen the theoretical presentation.

read point-by-point responses
  1. Referee: Theoretical analysis (global error bound): The claimed global error O(Δt + D*(n)/Δt) requires D*(n) ≪ Δt for the bound to remain controlled. Standard low-discrepancy bounds give D*(n) ≲ (log n)^d / n, which for d=1000 and practical n (e.g., 10^5–10^6) yields values far larger than typical Δt; the manuscript provides neither a dimension-independent estimate nor explicit computation of D*(n) for the reported 1000-dimensional experiments to justify the headline claim of breaking the curse of dimensionality with rigorous control.

    Authors: We thank the referee for highlighting this key point in the error analysis. The global error bound is obtained by summing the local truncation errors O(Δt² + D*(n)) over successive time steps under the assumption that the star-discrepancy remains sufficiently small relative to Δt. While the manuscript invokes standard low-discrepancy properties, we acknowledge that it does not supply explicit numerical estimates of D*(n) for the 1000-dimensional experiments nor a fully dimension-independent bound. In the revised version we will add a dedicated paragraph in the theoretical section that (i) recalls the specific low-discrepancy sequence employed, (ii) provides computed or analytically bounded values of D*(n) for the reported n and d=1000 cases, and (iii) clarifies that the multi-scale kernel approximation effectively reduces the integration dimension, thereby keeping the practical discrepancy well below typical Δt. These additions will make the conditions for the headline claim fully explicit. revision: yes

  2. Referee: Resampling-restart mechanism: The assertion that the local resampling-restart does not introduce uncontrolled bias lacks an explicit propagation of the discrepancy term through successive restart steps. The global error bound appears to treat segments independently without accounting for potential accumulation, which is load-bearing for the overall guarantee.

    Authors: We agree that a more detailed accounting of discrepancy propagation across restarts is necessary for a complete guarantee. The current analysis treats each inter-restart interval as an independent segment whose local error is bounded by O(Δt² + D*(n)), with the restart resetting the sampling distribution. To address possible accumulation we will insert a short lemma in the revised manuscript that propagates the discrepancy term through a finite number of restarts and shows that the local resampling condition keeps the accumulated contribution inside the same O(D*(n)/Δt) order. The argument will be supported by a brief numerical check of error growth over long horizons with and without restarts. revision: yes

Circularity Check

0 steps flagged

No significant circularity; error bounds reference independent discrepancy measure

full rationale

The paper states local truncation error O(Δt² + D*(n)) and global error O(Δt + D*(n)/Δt) where D*(n) is the star-discrepancy of the chosen quasi-Monte Carlo points. This quantity is an external, standard property of low-discrepancy sequences and is not defined or fitted from the filtering outputs or error expressions themselves. No self-definitional step, fitted-input prediction, or load-bearing self-citation chain appears in the abstract or described derivation; the bounds are presented as consequences of combining the Yau-Yau framework with QMC approximation theory under the maintained assumption that D*(n) remains controlled relative to Δt. The central performance claims therefore retain independent content relative to the inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The method rests on standard quasi-Monte Carlo discrepancy theory and kernel approximation properties; no new physical entities are postulated and the free parameters (sampling size n, time step Δt, kernel scales) are algorithmic choices rather than data-fitted constants.

free parameters (2)
  • sampling size n
    Number of quasi-Monte Carlo points chosen to control star-discrepancy; selected for computational budget rather than fitted to output data.
  • time step Δt
    Discretization interval whose size trades off truncation error against accumulated discrepancy term.
axioms (2)
  • standard math Star-discrepancy D*(n) of low-discrepancy sequences satisfies known upper bounds independent of dimension for the chosen point sets.
    Invoked to obtain the stated local and global error orders.
  • domain assumption The underlying nonlinear filtering PDE admits a sufficiently smooth solution so that high-order kernel approximations remain accurate.
    Required for the truncation-error term O(Δt²) to hold.

pith-pipeline@v0.9.0 · 5788 in / 1691 out tokens · 43611 ms · 2026-05-18T15:18:26.135207+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

  1. [1]

    V. J. Aidala, Kalman filter behavior in bearings-only tracking applications , IEEE Transactions on Aerospace and Electronic Systems 15 (1979), no. 1, 29–39

  2. [2]

    Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind, Automatic differentiation in machine learning: a survey , Journal of Machine Learning Research 18 (2018), no

    Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind, Automatic differentiation in machine learning: a survey , Journal of Machine Learning Research 18 (2018), no. 153, 1–43

  3. [3]

    Bensoussan, R

    A. Bensoussan, R. Glowinski, and A. Rascanu, Approximation of the zakai equation by the splitting up method , SIAM Journal on Control and Optimization 28 (1990), 1420–1431

  4. [4]

    , Approximation of some stochastic differential equations by the splitting up method , Applied Mathematics and Optimization 25 (1992), 81–106

  5. [5]

    Bratley, B

    P. Bratley, B. L. Fox, and H. Niederreiter, Implementation and tests of low‐discrepancy se- quences, ACM Transactions on Modeling and Computer Simulation 2 (1992), no. 3, 195–213

  6. [6]

    Cassola and M

    F. Cassola and M. Burlando, Wind speed and wind energy forecast through kalman filtering of numerical weather prediction model output , Applied Energy 99 (2012), 154–166

  7. [7]

    Xiuqiong Chen, Zeju Sun, Yangtianze Tao, and Stephen S.-T. Yau, A uniform framework of yau–yau algorithm based on deep learning with the capability of overcoming the curse of dimensionality, IEEE Transactions on Automatic Control 70 (2025), no. 1, 339–354

  8. [8]

    D. B. Cox and J. D. Brading, Integration of lambda ambiguity resolution with kalman filter for relative navigation of spacecraft , Navigation 47 (2000), no. 3, 205–210. 33

  9. [9]

    T. E. Duncan, Probability density for diffusion processes with applications to nonlinear filtering theory, Ph.D. thesis, Stanford University, 1967

  10. [10]

    4, 337–351

    Henri Faure, Discrépance de suites associées à un système de numération (en dimension s) , Acta Arithmetica 41 (1982), no. 4, 337–351

  11. [11]

    W. H. Fleming and S. K. Mitter, Optimal control and nonlinear filtering for nondegenerate diffusion processes, Stochastics 8 (1982), 63–77

  12. [12]

    Florchinger and F

    P. Florchinger and F. Le Gland, Time discretization of the zakai equation for diffusion processes observed in correlated noise , Stoch. Stoch. Rep. 35 (1991), 233–256

  13. [13]

    Gyongy and N

    I. Gyongy and N. Krylov, On the splitting-up method and stochastic partial differential equa- tion, Ann. Probab. 31 (2003), 564–591

  14. [14]

    J. H. Halton, On the efficiency of certain quasi‐random sequences of points in evaluating multi‐dimensional integrals, Numerische Mathematik 2 (1960), 84–90

  15. [15]

    A. H. Jazwinski, Stochastic processes and filtering theory , Academic Press, 1970

  16. [16]

    Joe and F

    S. Joe and F. Y. Kuo, Constructing sobol sequences with better two‐dimensional projections , SIAM Journal on Scientific Computing 30 (2008), no. 5, 2635–2654

  17. [17]

    S. J. Julier and J. K. Uhlmann, A new extension of the kalman filter to nonlinear systems , Proceedings of AeroSense: The 11th International Symposium on Aerospace/Defense Sensing, Simulation and Controls, 1997, pp. 182–193

  18. [18]

    R. E. Kalman, A new approach to linear filtering and prediction problems , Transactions of the ASME, Journal of Basic Engineering, Series D 82 (1960), no. 1, 35–44

  19. [19]

    R. E. Kalman and R. S. Bucy, New results in linear filtering and prediction theory , Trans. ASME 83 (1961), 95–108

  20. [20]

    Kocis and W

    L. Kocis and W. J. Whiten, Computational investigations of low‐discrepancy sequences , ACM Transactions on Mathematical Software 23 (1997), no. 2, 266–294

  21. [21]

    Kuipers and H

    L. Kuipers and H. Niederreiter, Uniform distribution of sequences , Wiley–Interscience, 1974

  22. [22]

    H. J. Kushner, Dynamical equations for optimal nonlinear filtering , Journal of Differential Equations 3 (1967), 170–180

  23. [23]

    H. J. Kushner, A robust discrete-state approximation to the optimal nonlinear filter for a diffusion, Stochastics 3 (1979), 75–83

  24. [24]

    Lototsky, R

    S. Lototsky, R. Mikulevicius, and B. L. Rozovskii, Nonlinear filtering revisited: A spectral approach, SIAM J. Control Optim. 35 (1997), 435–461

  25. [25]

    Joaquim R. R. A. Martins, Peter Sturdza, and Juan J. Alonso, The complex-step derivative approximation, ACM Transactions on Mathematical Software 29 (2003), no. 3, 245–262 (en)

  26. [26]

    M. D. McKay, R. J. Beckman, and W. J. Conover, A comparison of three methods for selecting values of input variables in the analysis of output from a computer code , Technometrics 21 (1979), no. 2, 239–245

  27. [27]

    Del Moral, J

    P. Del Moral, J. Jacod, and P. Protter, The monte-carlo method for filtering with discrete-time observations, Probab. Theory Related Fields 120 (2001), 346–368

  28. [28]

    Morokoff and Russel E

    William J. Morokoff and Russel E. Caflisch, Quasi–monte carlo integration , Journal of Com- putational Physics 122 (1995), no. 2, 218–230. 34

  29. [29]

    R. E. Mortensen, Optional control of continuous time stochastic systems , Ph.D. thesis, Uni- versity of California, Berkeley, 1966

  30. [30]

    63, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1992

    Harald Niederreiter, Random number generation and quasi-monte carlo methods , SIAM CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 63, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1992

  31. [31]

    Owen, Orthogonal arrays for computer experiments, integration and visualization , Statistica Sinica 2 (1992), no

    Art B. Owen, Orthogonal arrays for computer experiments, integration and visualization , Statistica Sinica 2 (1992), no. 2, 439–452

  32. [32]

    Pardoux, Stochastic partial differential equations and filtering of diffusion processes , Stochastics 3 (1979), 127–128

    E. Pardoux, Stochastic partial differential equations and filtering of diffusion processes , Stochastics 3 (1979), 127–128

  33. [33]

    , équations du filtrage nonlinéaire, de la prédiction et du lissage , Stochastics 6 (1982), 193–231

  34. [34]

    Flour XIX, Lecture Notes in Math., vol

    , Filtrage nonlinéaire et équations aux dérivées partielles stochastiques associées , École d’Été de Probabilités de St. Flour XIX, Lecture Notes in Math., vol. 1464, Springer-Verlag, Berlin, 1991, pp. 67–163

  35. [35]

    S. F. Schmidt, The kalman filter—its recognition and development for aerospace applications , Journal of Guidance, Control, and Dynamics 4 (1981), no. 1, 4–7

  36. [36]

    I. M. Sobol, On the distribution of points in a cube and the approximate evaluation of integrals , USSR Computational Mathematics and Mathematical Physics 7 (1967), no. 4, 86–112

  37. [37]

    Sebastian Thrun, Particle filters in robotics , Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 2002, pp. 511–518

  38. [38]

    Jr. W. E. Hopkins and W. S. Wong, Lie-trotter product formulas for nonlinear filtering , Stochastics 17 (1986), 313–337

  39. [39]

    Wells, The kalman filter in finance , Springer Science and Business Media, 1996

    C. Wells, The kalman filter in finance , Springer Science and Business Media, 1996

  40. [40]

    Yau, Real time solution of nonlinear filtering problem without memory i , Math

    Shing-Tung Yau and Stephen S.-T. Yau, Real time solution of nonlinear filtering problem without memory i , Math. Res. Lett. 7 (2000), 671–693

  41. [41]

    1, 163–195

    , Real time solution of the nonlinear filtering problem without memory ii , Siam Journal on Control and Optimization 47 (2008), no. 1, 163–195

  42. [42]

    4, 243–262

    Mei-Heng Yueh, Wen-Wei Lin, and Shing-Tung Yau, An efficient numerical method for solving high-dimensional nonlinear filtering problems , Communications in Information and Systems 14 (2014), no. 4, 243–262

  43. [43]

    Zakai, On the optimal filtering of diffusion processes , Z

    M. Zakai, On the optimal filtering of diffusion processes , Z. Wahrsch. Verw. Gebiete 11 (1969), 230–243. 35