Natural Gradient Gaussian Approximation Filter with Positive Definiteness Guarantee
Pith reviewed 2026-05-10 16:06 UTC · model grok-4.3
The pith
Two modifications ensure the NANO filter keeps a positive definite covariance and avoids divergence during natural gradient updates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The inverse of the posterior covariance equals the sum of the inverse prior covariance and the expected Hessian of the negative log-likelihood; indefiniteness of the Hessian term is the source of occasional divergence. Approximating the Hessian by the Gauss-Newton method yields a matrix that is guaranteed to be positive semi-definite. Alternatively, an exponential-form update of the Cholesky factor followed by Gram-matrix reconstruction guarantees positive definiteness. On three classical nonlinear test systems the resulting filter produces lower estimation error than both the original NANO filter and popular members of the Kalman family.
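Written out under assumed notation, the claim amounts to roughly the following. The symbols here are illustrative and not taken from the paper: P is the prior covariance, q the current Gaussian iterate, ℓ the negative log-likelihood, r the normalized measurement residual, and J its Jacobian. The sketch also assumes the sum is read as the posterior precision and that the measurement noise is Gaussian, so that ℓ is half the squared norm of r.

```latex
\Sigma_{+}^{-1} \;=\; P^{-1} + \mathbb{E}_{x \sim q}\!\left[ \nabla_x^2 \ell(x, y) \right],
\qquad \ell(x, y) = -\log p(y \mid x).

% Gaussian-noise case: \ell(x, y) = \tfrac{1}{2} \lVert r(x) \rVert^2, \quad J = \partial r / \partial x
\nabla_x^2 \ell(x, y) \;=\; J^\top J + \sum_i r_i(x)\, \nabla_x^2 r_i(x)
\;\approx\; J^\top J .
```

Under these assumptions the dropped second-order term is the only part that can be indefinite, which is why discarding it leaves a positive semi-definite contribution.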
What carries the argument
Gauss-Newton approximation of the log-likelihood Hessian (or the equivalent Cholesky exponential covariance update) that supplies the positive semi-definite term inside the natural-gradient Bayesian update.
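The Gauss-Newton term can be sketched numerically as follows. This is a minimal illustration, not the paper's implementation: the helper names (`whitened_residual`, `jacobian_fd`, `expected_gn_hessian`), the finite-difference Jacobian, the Monte Carlo expectation, and the range-bearing measurement function `h` are all assumptions made for the example.

```python
import numpy as np

def whitened_residual(x, y, h, R):
    """r(x) = L^{-1} (y - h(x)) with R = L L^T, so that 0.5 * ||r||^2 is the
    Gaussian negative log-likelihood up to a constant."""
    L = np.linalg.cholesky(R)
    return np.linalg.solve(L, y - h(x))

def jacobian_fd(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x (stand-in for an analytic Jacobian)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        cols.append((f(x + dx) - f(x - dx)) / (2.0 * eps))
    return np.stack(cols, axis=1)

def expected_gn_hessian(mu, Sigma, y, h, R, n_samples=200, seed=0):
    """Monte Carlo estimate of E_q[J^T J] under q = N(mu, Sigma)."""
    rng = np.random.default_rng(seed)
    xs = rng.multivariate_normal(mu, Sigma, size=n_samples)
    H = np.zeros((mu.size, mu.size))
    for x in xs:
        J = jacobian_fd(lambda z: whitened_residual(z, y, h, R), x)
        H += J.T @ J  # each summand is positive semi-definite
    return H / n_samples

# Hypothetical range-bearing measurement, used only for illustration.
def h(x):
    return np.array([np.hypot(x[0], x[1]), np.arctan2(x[1], x[0])])

mu = np.array([1.0, 2.0])
Sigma = 0.01 * np.eye(2)
y = np.array([2.0, 1.0])
R = np.diag([0.1, 0.01])

H_gn = expected_gn_hessian(mu, Sigma, y, h, R)
print(np.linalg.eigvalsh(H_gn))  # eigenvalues are non-negative up to round-off
```

Because every sample contributes a matrix of the form J^T J, the average stays positive semi-definite regardless of how nonlinear `h` is; what the approximation gives up is the second-order residual term, not definiteness.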
If this is right
- The filter can now be run on strongly nonlinear dynamics without ad-hoc safeguards against loss of positive definiteness in the covariance.
- Natural-gradient moment matching remains usable for both prediction and update while positive definiteness is enforced.
- The same two remedies can be inserted into any other Bayesian filter whose update relies on an indefinite Hessian term.
Where Pith is reading between the lines
- The same positive-definiteness fix might transfer to natural-gradient methods used in variational inference or expectation propagation.
- Real-time implementations could drop monitoring logic that previously checked for negative eigenvalues.
- Scaling tests on high-dimensional state spaces would reveal whether the Jacobian computation remains practical.
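For context, the kind of monitoring logic mentioned in the list above might look like the following. This is a hypothetical helper (`is_positive_definite` is not a name from the paper or from NumPy); it relies on the fact that `np.linalg.cholesky` raises `LinAlgError` when the factorization fails.

```python
import numpy as np

def is_positive_definite(S):
    """Return True if S is symmetric and admits a Cholesky factorization."""
    S = np.asarray(S, dtype=float)
    # np.linalg.cholesky only reads one triangle, so check symmetry separately.
    if not np.allclose(S, S.T):
        return False
    try:
        np.linalg.cholesky(S)
        return True
    except np.linalg.LinAlgError:
        return False

ok = is_positive_definite(np.eye(2))
bad = is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]]))  # eigenvalues 3 and -1
print(ok, bad)
```

A Cholesky attempt is usually cheaper than a full eigendecomposition, which is one reason it is a common choice for this kind of guard.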
Load-bearing premise
The Gauss-Newton Hessian approximation or Cholesky reformulation keeps enough accuracy in the natural gradient step that no new bias or instability appears in general nonlinear problems.
What would settle it
A nonlinear system on which the modified NANO filter still produces diverging covariance or worse mean-squared error than the original NANO filter would falsify the claim.
original abstract
Popular Bayes filters often apply linearization techniques, such as Taylor expansion or stochastic linear regression, to enable the use of the Kalman filter structure, but this can lead to large errors in strongly nonlinear systems. The recently proposed NANO filter addresses this issue by interpreting the prediction and update steps of Bayesian filtering as two distinct optimization problems and solving them through moment matching and natural gradient descent, thereby avoiding model linearization errors. However, the natural gradient update in NANO can occasionally diverge because the posterior covariance in its iteration may lose positive definiteness. Our analysis shows that the posterior covariance is the sum of the inverse prior covariance and the expected Hessian of the log-likelihood function, and that the indefiniteness of the latter term is the root cause of update failure. To address this issue, we propose two remedies. The first approximates the log-likelihood Hessian using the Gauss-Newton method, representing it as the self-adjoint product of the Jacobian of the normalized measurement residual, which is guaranteed to be positive semi-definite. The second reformulates the covariance update as an exponential-form update of the Cholesky factor and reconstructs the covariance via its Gram matrix, which ensures positive definiteness. Experiments on three classical nonlinear systems demonstrate that the proposed NANO filter with guaranteed positive definiteness outperforms popular members of the Kalman filter family and the original NANO filter.
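The second remedy can be illustrated with a small sketch. It is not the paper's update rule: the driving matrix `M`, the step size, the restriction to the lower triangle, and the helper names `expm_series` and `exp_cholesky_update` are assumptions chosen to show why an exponential-form factor update followed by Gram-matrix reconstruction cannot lose positive definiteness.

```python
import numpy as np

def expm_series(A, n_terms=30):
    """Matrix exponential by scaling-and-squaring with a Taylor series.
    A plain NumPy stand-in; scipy.linalg.expm is the usual choice."""
    A = np.asarray(A, dtype=float)
    norm = np.linalg.norm(A, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A_scaled = A / (2.0 ** s)
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms + 1):
        term = term @ A_scaled / k
        E = E + term
    for _ in range(s):
        E = E @ E  # undo the scaling: exp(A) = exp(A / 2^s)^(2^s)
    return E

def exp_cholesky_update(L, M, step=1.0):
    """Multiply the Cholesky factor by exp(step * tril(M)), then rebuild
    the covariance as the Gram matrix L_new @ L_new.T."""
    T = np.tril(M)  # lower-triangular, so exp(T) is lower-triangular too
    L_new = L @ expm_series(step * T)
    Sigma_new = L_new @ L_new.T
    return L_new, Sigma_new

L = np.linalg.cholesky(np.array([[2.0, 0.5], [0.5, 1.0]]))
M = np.array([[-3.0, 1.0], [1.0, 2.0]])  # deliberately indefinite
L_new, Sigma_new = exp_cholesky_update(L, M, step=0.5)
print(np.linalg.eigvalsh(Sigma_new))  # strictly positive
```

The exponential of a lower-triangular matrix is lower-triangular with a strictly positive diagonal, so `L_new` remains a valid Cholesky factor even though `M` is indefinite.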
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes two modifications to the recently introduced NANO filter for nonlinear Bayesian filtering. The NANO approach frames prediction and update as optimization problems solved via moment matching and natural gradient descent, avoiding explicit linearization. The authors identify that the inverse posterior covariance equals the inverse prior covariance plus the expected Hessian of the negative log-likelihood and that indefiniteness of this Hessian can cause divergence. Remedy 1 replaces the Hessian by its Gauss-Newton form J^T J (PSD by construction). Remedy 2 replaces the covariance update by an exponential map on the Cholesky factor followed by Gram-matrix reconstruction. Experiments on three classical nonlinear systems are reported to show that both modified filters outperform members of the Kalman family and the unmodified NANO filter.
Significance. If the directional changes introduced by the two remedies remain benign across nonlinear regimes, the work supplies a practical, theoretically grounded fix for a known failure mode of natural-gradient filters while preserving their avoidance of linearization error. The experimental comparison against standard baselines on well-known benchmarks is a concrete strength; reproducible code or machine-checked derivations would further elevate the contribution.
major comments (3)
- §3.1 (Gauss-Newton remedy): the replacement of E[∇² log p(y|x)] by J^T J is exact only for linear measurements or Gaussian residuals. The manuscript should quantify the resulting change in the natural-gradient direction (e.g., angle between the two vectors or difference in the information matrix) on the three test systems, because this difference, not merely the restoration of positive definiteness, determines whether the observed performance gain is attributable to the fix or to an unintended alteration of the update.
- §4 (experiments): the headline claim that the modified NANO “outperforms … original NANO filter” rests on the premise that the two remedies do not systematically bias the filter. No table or figure compares the approximated versus (when available) exact natural-gradient steps, nor reports the frequency of indefiniteness in the baseline NANO runs. Without these data the performance advantage cannot be unambiguously attributed to the positive-definiteness guarantee.
- §2.2 (posterior covariance derivation): the statement that the posterior covariance equals inverse prior plus expected Hessian is central to identifying the root cause. The derivation should be expanded to show the precise expectation and any assumptions on the measurement model; if the expectation is taken under the predictive distribution, the resulting matrix is not guaranteed to be the Fisher information, which affects the interpretation of the natural gradient.
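The diagnostics requested in the first major comment could be computed along these lines. The function names and the toy matrices are hypothetical and carry no results from the paper; the sketch only shows one way to measure the angle between two update directions and the relative difference between two information matrices.

```python
import numpy as np

def direction_angle_deg(d_exact, d_approx):
    """Angle in degrees between two update directions."""
    cos = d_exact @ d_approx / (np.linalg.norm(d_exact) * np.linalg.norm(d_approx))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def relative_matrix_difference(A_exact, A_approx):
    """Frobenius-norm difference relative to the exact matrix."""
    return np.linalg.norm(A_exact - A_approx, "fro") / np.linalg.norm(A_exact, "fro")

# Toy numbers for illustration only.
P_inv = np.array([[2.0, 0.0], [0.0, 1.0]])      # inverse prior covariance
H_exact = np.array([[1.0, 0.3], [0.3, 0.5]])    # exact expected Hessian
H_gn = np.array([[1.1, 0.2], [0.2, 0.6]])       # Gauss-Newton surrogate
g = np.array([0.4, -0.7])                       # gradient with respect to the mean

info_exact = P_inv + H_exact
info_gn = P_inv + H_gn
d_exact = np.linalg.solve(info_exact, g)        # preconditioned direction, exact Hessian
d_gn = np.linalg.solve(info_gn, g)              # preconditioned direction, surrogate

print(direction_angle_deg(d_exact, d_gn))
print(relative_matrix_difference(info_exact, info_gn))
```

Averaging both quantities over time steps and Monte Carlo runs, restricted to steps where the exact information matrix is positive definite, would give the per-system summary the referee asks for.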
minor comments (3)
- Notation: the symbol for the normalized measurement residual should be introduced once and used consistently; its definition appears only after the first use of J.
- Figure 2 (or equivalent): the covariance trajectories for the original NANO should be plotted on the same axes as the modified versions so that the frequency and severity of indefiniteness are visually evident.
- References: the original NANO paper and the classic works on natural-gradient methods in filtering should be cited in the introduction when the method is first described.
Simulated Author's Rebuttal
We thank the referee for the thorough review and valuable suggestions. We address each of the major comments in detail below and will revise the manuscript accordingly to strengthen the presentation and provide additional supporting analysis.
point-by-point responses
- Referee: §3.1 (Gauss-Newton remedy): the replacement of E[∇² log p(y|x)] by J^T J is exact only for linear measurements or Gaussian residuals. The manuscript should quantify the resulting change in the natural-gradient direction (e.g., angle between the two vectors or difference in the information matrix) on the three test systems, because this difference, not merely the restoration of positive definiteness, determines whether the observed performance gain is attributable to the fix or to an unintended alteration of the update.
  Authors: We agree that the Gauss-Newton approximation is not exact for general nonlinear measurement models. It serves as a computationally efficient positive semi-definite surrogate to the expected Hessian. To directly address this point, we will include in the revised manuscript a quantitative comparison of the natural gradient directions with and without the approximation for the three test systems. Specifically, we will report the average angle between the original and approximated natural gradient vectors, as well as the relative difference in the information matrices, over the simulation runs. This analysis will help attribute the performance improvements more precisely. We believe this addition will clarify the impact of the remedy.
  revision: yes
- Referee: §4 (experiments): the headline claim that the modified NANO “outperforms … original NANO filter” rests on the premise that the two remedies do not systematically bias the filter. No table or figure compares the approximated versus (when available) exact natural-gradient steps, nor reports the frequency of indefiniteness in the baseline NANO runs. Without these data the performance advantage cannot be unambiguously attributed to the positive-definiteness guarantee.
  Authors: We appreciate this observation. In the revised version, we will add a table reporting the frequency of positive definiteness violations in the original NANO filter across all experiments and systems. For the comparison of approximated versus exact steps, we note that the exact natural gradient is only defined when the Hessian is positive definite; in cases of indefiniteness, the update fails, which is the issue being remedied. Where the original NANO remains positive definite, we will provide side-by-side comparisons of the gradient steps and resulting performance. This will support the claim that the remedies primarily address the divergence issue without introducing systematic bias.
  revision: partial
- Referee: §2.2 (posterior covariance derivation): the statement that the posterior covariance equals inverse prior plus expected Hessian is central to identifying the root cause. The derivation should be expanded to show the precise expectation and any assumptions on the measurement model; if the expectation is taken under the predictive distribution, the resulting matrix is not guaranteed to be the Fisher information, which affects the interpretation of the natural gradient.
  Authors: We thank the referee for highlighting the need for a more rigorous derivation. We will expand Section 2.2 to provide a step-by-step derivation of the posterior covariance expression. The expectation is indeed taken with respect to the predictive distribution p(x | y_{1:t-1}). We will clarify the assumptions, including that the prior is approximated as Gaussian and the measurement model is general. Additionally, we will discuss the distinction from the Fisher information matrix, noting that the natural gradient in this context uses the observed information (Hessian) rather than the expected Fisher information, which is appropriate for the local approximation in the filter update. This expanded derivation will better justify the root cause analysis.
  revision: yes
Circularity Check
No circularity; derivations rest on standard matrix identities and approximations.
full rationale
The paper derives the inverse posterior covariance as the inverse prior covariance plus the expected Hessian of the negative log-likelihood from the natural-gradient formulation of Bayesian filtering, then applies two standard fixes: Gauss-Newton replacement of the Hessian by the self-adjoint product J^T J (PSD by algebraic construction) and a Cholesky-factor exponential map whose Gram-matrix reconstruction is positive definite by definition. Neither step redefines its inputs, renames a fitted quantity as a prediction, or relies on a self-citation chain; the cited matrix properties are external and falsifiable. Experimental comparisons are presented separately and do not reduce to the same quantities used in the derivation. The chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The expected Hessian of the log-likelihood can be approximated by the Gauss-Newton form as the self-adjoint product of the Jacobian of the normalized measurement residual.
- standard math: Reconstructing covariance via Gram matrix of the Cholesky factor in exponential form preserves positive definiteness.
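The two matrix facts behind these ledger entries can be written out in a few lines. The symbols J, M, L_0, and v are illustrative notation assumed here: J is any real Jacobian, M any real square matrix driving the exponential update, L_0 an invertible starting factor, and v an arbitrary vector.

```latex
% Gauss-Newton term: positive semi-definite for any real J
v^\top (J^\top J)\, v \;=\; \lVert J v \rVert^2 \;\ge\; 0 \quad \text{for all } v.

% Exponential-form factor: always invertible
\det\!\left(e^{M}\right) \;=\; e^{\operatorname{tr} M} \;>\; 0
\;\;\Longrightarrow\;\; L = L_0\, e^{M} \text{ is invertible whenever } L_0 \text{ is.}

% Gram-matrix reconstruction: positive definite for invertible L
v^\top (L L^\top)\, v \;=\; \lVert L^\top v \rVert^2 \;>\; 0 \quad \text{for all } v \neq 0.
```

The first line gives only semi-definiteness, so the definiteness of the full update in the Gauss-Newton remedy still depends on the inverse prior covariance being positive definite.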