Natural Gradient Gaussian Approximation Filter with Positive Definiteness Guarantee

Shengbo Eben Li; Tianyi Zhang; Wenhan Cao

arxiv: 2604.10053 · v1 · submitted 2026-04-11 · 📡 eess.SY · cs.SY

Pith reviewed 2026-05-10 16:06 UTC · model grok-4.3

keywords: NANO filter · natural gradient · positive definiteness · Bayesian filtering · nonlinear estimation · Gauss-Newton approximation · Cholesky factor

The pith

Two modifications ensure the NANO filter keeps positive definite covariance and avoids divergence during natural gradient updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the original NANO filter can fail because the posterior covariance loses positive definiteness when the expected Hessian of the log-likelihood is indefinite. The authors replace that Hessian with a Gauss-Newton approximation built from the self-adjoint product of the Jacobian of the normalized measurement residual, which is positive semi-definite by construction. They also offer a second route that updates the Cholesky factor in exponential form and rebuilds the covariance as its Gram matrix. Either change preserves the natural-gradient treatment of the Bayesian update and prediction steps without linearizing the model. A reader would care because the resulting filter stays stable on strongly nonlinear dynamics where standard Kalman-family methods accumulate large linearization errors.

Core claim

The paper's analysis attributes the occasional divergence to one term: the posterior covariance update combines the inverse prior covariance with the expected Hessian of the log-likelihood, and that Hessian term can be indefinite. Approximating the Hessian by the Gauss-Newton method yields a matrix that is positive semi-definite by construction. Alternatively, an exponential-form update of the Cholesky factor followed by Gram-matrix reconstruction guarantees positive definiteness. On three classical nonlinear test systems, the abstract reports that the resulting filter outperforms both the original NANO filter and popular members of the Kalman family.
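The failure mode can be seen on a toy problem. The sketch below is not taken from the paper: the measurement function `g`, the numbers, and the choice to evaluate the Hessian at a single point (rather than as an expectation) are all assumptions made for illustration. It writes the update in terms of the inverse covariance and the Hessian of the negative log-likelihood.

```python
import numpy as np

def g(x):
    # Hypothetical scalar measurement function (assumption, not from the paper).
    return x[0] ** 2 + x[1] ** 2

def grad_g(x):
    return 2.0 * x

def hess_g(x):
    return 2.0 * np.eye(2)

def nll_hessian(x, y, R):
    """Exact Hessian of 0.5 * (y - g(x))**2 / R with respect to x."""
    residual = y - g(x)
    return (np.outer(grad_g(x), grad_g(x)) - residual * hess_g(x)) / R

def gauss_newton_hessian(x, R):
    """J^T J, with J the Jacobian of the normalized residual (y - g(x)) / sqrt(R)."""
    J = -grad_g(x).reshape(1, -1) / np.sqrt(R)
    return J.T @ J

x = np.array([1.0, 0.5])      # point at which the Hessian is evaluated
y, R = 5.0, 1.0               # measurement with a large residual, unit noise variance
prior_precision = np.eye(2)   # inverse prior covariance

exact_precision = prior_precision + nll_hessian(x, y, R)
gn_precision = prior_precision + gauss_newton_hessian(x, R)

exact_eigs = np.linalg.eigvalsh(exact_precision)
gn_eigs = np.linalg.eigvalsh(gn_precision)
print("exact Hessian, precision eigenvalues:", exact_eigs)
print("Gauss-Newton, precision eigenvalues:", gn_eigs)
```

With these numbers the exact-Hessian precision has eigenvalues -6.5 and -1.5, so it is not a valid inverse covariance, while the Gauss-Newton version has eigenvalues 1 and 6. The large residual is what makes the dropped second-derivative term dominate.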

What carries the argument

Gauss-Newton approximation of the log-likelihood Hessian (or the equivalent Cholesky exponential covariance update) that supplies the positive semi-definite term inside the natural-gradient Bayesian update.
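For a measurement model with additive Gaussian noise (an assumption here, since the abstract does not state the noise model), the Gauss-Newton form can be written out explicitly. The symbols $g$, $R$, $r$, and $J$ are illustrative notation and may differ from the paper's:

```latex
\begin{aligned}
r(x) &= R^{-1/2}\bigl(y - g(x)\bigr), \qquad J(x) = \frac{\partial r(x)}{\partial x},\\
-\log p(y \mid x) &= \tfrac{1}{2}\, r(x)^\top r(x) + \text{const},\\
\nabla_x^2 \bigl[-\log p(y \mid x)\bigr] &= J(x)^\top J(x) + \sum_i r_i(x)\, \nabla_x^2 r_i(x)
\;\approx\; J(x)^\top J(x) \succeq 0 .
\end{aligned}
```

The dropped sum is the only part that can be indefinite. It vanishes when $g$ is linear and is small when the residual is small, which is the usual argument for the approximation in nonlinear least squares.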

If this is right

The filter can now be run on strongly nonlinear dynamics without ad-hoc safeguards against covariance collapse.
Natural-gradient moment matching remains usable for both prediction and update while positive definiteness is enforced.
The same two remedies can be inserted into any other Bayesian filter whose update relies on an indefinite Hessian term.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same positive-definiteness fix might transfer to natural-gradient methods used in variational inference or expectation propagation.
Real-time implementations could drop monitoring logic that previously checked for negative eigenvalues.
Scaling tests on high-dimensional state spaces would reveal whether the Jacobian computation remains practical.
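The monitoring logic mentioned above can be as simple as attempting a Cholesky factorization. The following is a minimal sketch of such a check, assuming NumPy; the function name `is_positive_definite` and the tolerance are hypothetical and not part of any NANO implementation.

```python
import numpy as np

def is_positive_definite(matrix, tol=1e-10):
    """Return True if `matrix` is symmetric (within tol) and its Cholesky factorization succeeds."""
    if not np.allclose(matrix, matrix.T, atol=tol):
        return False
    try:
        np.linalg.cholesky(matrix)
        return True
    except np.linalg.LinAlgError:
        # NumPy raises LinAlgError when the matrix is not positive definite.
        return False

valid_cov = np.array([[2.0, 0.5], [0.5, 1.0]])
broken_cov = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues 3 and -1

print(is_positive_definite(valid_cov))
print(is_positive_definite(broken_cov))
```

A factorization attempt is usually cheaper than a full eigenvalue decomposition, which matters if the check runs at every filter iteration.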

Load-bearing premise

The Gauss-Newton Hessian approximation or Cholesky reformulation keeps enough accuracy in the natural gradient step that no new bias or instability appears in general nonlinear problems.

What would settle it

A nonlinear system on which the modified NANO filter still produces diverging covariance or worse mean-squared error than the original NANO filter would falsify the claim.

Figures

Figures reproduced from arXiv: 2604.10053 by Shengbo Eben Li, Tianyi Zhang, Wenhan Cao.

Figure 2: Estimation errors in FM Demodulator system.

Figure 5: Estimation errors in satellite attitude estimation.

Figure 6: Mean RMSE of satellite attitude system under model mis

Figure 7: Box plot of RMSE over all MC experiments for Duffing

Figure 8: Estimation errors in Duffing Oscillator system.

Figure 9: Mean RMSE of Duffing Oscillator system under model mis

Original abstract

Popular Bayes filters often apply linearization techniques, such as Taylor expansion or stochastic linear regression, to enable the use of the Kalman filter structure, but this can lead to large errors in strongly nonlinear systems. The recently proposed NANO filter addresses this issue by interpreting the prediction and update steps of Bayesian filtering as two distinct optimization problems and solving them through moment matching and natural gradient descent, thereby avoiding model linearization errors. However, the natural gradient update in NANO can occasionally diverge because the posterior covariance in its iteration may lose positive definiteness. Our analysis shows that the posterior covariance is the sum of the inverse prior covariance and the expected Hessian of the log-likelihood function, and that the indefiniteness of the latter term is the root cause of update failure. To address this issue, we propose two remedies. The first approximates the log-likelihood Hessian using the Gauss-Newton method, representing it as the self-adjoint product of the Jacobian of the normalized measurement residual, which is guaranteed to be positive semi-definite. The second reformulates the covariance update as an exponential-form update of the Cholesky factor and reconstructs the covariance via its Gram matrix, which ensures positive definiteness. Experiments on three classical nonlinear systems demonstrate that the proposed NANO filter with guaranteed positive definiteness outperforms popular members of the Kalman filter family and original NANO filter.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Two fixes keep the NANO filter's covariance positive definite, with experiments showing gains over the original and standard alternatives, though the fixes modify the natural gradient step.

The paper's main advance is two concrete fixes that stop the NANO filter from losing positive definiteness during the natural gradient update. The first replaces the log-likelihood Hessian with its Gauss-Newton form, which is a product of Jacobians and therefore positive semi-definite. The second updates the Cholesky factor with an exponential map and recovers the covariance as a Gram matrix. Both are straightforward and avoid the divergence the original NANO can hit. Experiments on three classical nonlinear systems show the fixed versions outperform the original NANO and several Kalman variants. That is useful evidence.

The soft spot is that these are approximations. The Gauss-Newton version ignores second-derivative terms in the true Hessian, so the update is no longer the exact natural gradient. The Cholesky approach is a retraction that differs from the original covariance update. The abstract does not give quantitative bounds on the resulting change in the information matrix or on approximation error. If those shifts are small on the tested problems, fine, but it would be good to know how general that is.

This work is for people who already use or are considering the NANO filter in applications like robotics or signal processing. It is a practical patch rather than a new framework. I would send it to peer review. The problem is real, the solutions are explicit, and the experiments give a starting point for evaluation.

Referee Report

3 major / 3 minor

Summary. The manuscript proposes two modifications to the recently introduced NANO filter for nonlinear Bayesian filtering. The NANO approach frames prediction and update as optimization problems solved via moment matching and natural gradient descent, avoiding explicit linearization. The authors identify that the posterior covariance equals the inverse prior covariance plus the expected Hessian of the log-likelihood and that indefiniteness of this Hessian can cause divergence. Remedy 1 replaces the Hessian by its Gauss-Newton form J^T J (PSD by construction). Remedy 2 replaces the covariance update by an exponential map on the Cholesky factor followed by Gram-matrix reconstruction. Experiments on three classical nonlinear systems are reported to show that both modified filters outperform members of the Kalman family and the unmodified NANO filter.

Significance. If the directional changes introduced by the two remedies remain benign across nonlinear regimes, the work supplies a practical, theoretically grounded fix for a known failure mode of natural-gradient filters while preserving their avoidance of linearization error. The experimental comparison against standard baselines on well-known benchmarks is a concrete strength; reproducible code or machine-checked derivations would further elevate the contribution.

major comments (3)

§3.1 (Gauss-Newton remedy): the replacement of E[∇² log p(y|x)] by J^T J is exact only for linear measurements or Gaussian residuals. The manuscript should quantify the resulting change in the natural-gradient direction (e.g., angle between the two vectors or difference in the information matrix) on the three test systems, because this difference, not merely the restoration of positive definiteness, determines whether the observed performance gain is attributable to the fix or to an unintended alteration of the update.
§4 (experiments): the headline claim that the modified NANO “outperforms … original NANO filter” rests on the premise that the two remedies do not systematically bias the filter. No table or figure compares the approximated versus (when available) exact natural-gradient steps, nor reports the frequency of indefiniteness in the baseline NANO runs. Without these data the performance advantage cannot be unambiguously attributed to the positive-definiteness guarantee.
§2.2 (posterior covariance derivation): the statement that the posterior covariance equals inverse prior plus expected Hessian is central to identifying the root cause. The derivation should be expanded to show the precise expectation and any assumptions on the measurement model; if the expectation is taken under the predictive distribution, the resulting matrix is not guaranteed to be the Fisher information, which affects the interpretation of the natural gradient.
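The diagnostics requested in the first major comment are cheap to compute. Below is a minimal sketch of both quantities, assuming NumPy; the matrices and gradient are illustrative numbers chosen so that the exact information matrix is still positive definite, and none of them come from the paper's experiments.

```python
import numpy as np

def angle_between_deg(u, v):
    """Angle in degrees between two nonzero vectors."""
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

def relative_difference(A, B):
    """Frobenius-norm difference between A and B, relative to the norm of A."""
    return float(np.linalg.norm(A - B) / np.linalg.norm(A))

# Illustrative numbers only (assumptions): two candidate information matrices
# at the same iterate, plus the gradient of the update objective.
info_exact = np.array([[4.5, 2.0], [2.0, 1.5]])  # prior precision + exact Hessian
info_gn = np.array([[5.0, 2.0], [2.0, 2.0]])     # prior precision + J^T J
gradient = np.array([-1.0, 0.75])

# Preconditioned descent steps under each information matrix.
step_exact = -np.linalg.solve(info_exact, gradient)
step_gn = -np.linalg.solve(info_gn, gradient)

angle_deg = angle_between_deg(step_exact, step_gn)
rel_diff = relative_difference(info_exact, info_gn)
print(f"angle between steps: {angle_deg:.2f} degrees")
print(f"relative information-matrix difference: {rel_diff:.3f}")
```

Averaging these two numbers over filter iterations and Monte Carlo runs would give the kind of summary the referee asks for. The comparison is only defined at iterates where the exact information matrix is invertible.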

minor comments (3)

Notation: the symbol for the normalized measurement residual should be introduced once and used consistently; its definition appears only after the first use of J.
Figure 2 (or equivalent): the covariance trajectories for the original NANO should be plotted on the same axes as the modified versions so that the frequency and severity of indefiniteness are visually evident.
References: the original NANO paper and the classic works on natural-gradient methods in filtering should be cited in the introduction when the method is first described.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and valuable suggestions. We address each of the major comments in detail below and will revise the manuscript accordingly to strengthen the presentation and provide additional supporting analysis.

Referee: §3.1 (Gauss-Newton remedy): the replacement of E[∇² log p(y|x)] by J^T J is exact only for linear measurements or Gaussian residuals. The manuscript should quantify the resulting change in the natural-gradient direction (e.g., angle between the two vectors or difference in the information matrix) on the three test systems, because this difference, not merely the restoration of positive definiteness, determines whether the observed performance gain is attributable to the fix or to an unintended alteration of the update.

Authors: We agree that the Gauss-Newton approximation is not exact for general nonlinear measurement models. It serves as a computationally efficient positive semi-definite surrogate to the expected Hessian. To directly address this point, we will include in the revised manuscript a quantitative comparison of the natural gradient directions with and without the approximation for the three test systems. Specifically, we will report the average angle between the original and approximated natural gradient vectors, as well as the relative difference in the information matrices, over the simulation runs. This analysis will help attribute the performance improvements more precisely. We believe this addition will clarify the impact of the remedy.
revision: yes

Referee: §4 (experiments): the headline claim that the modified NANO “outperforms … original NANO filter” rests on the premise that the two remedies do not systematically bias the filter. No table or figure compares the approximated versus (when available) exact natural-gradient steps, nor reports the frequency of indefiniteness in the baseline NANO runs. Without these data the performance advantage cannot be unambiguously attributed to the positive-definiteness guarantee.

Authors: We appreciate this observation. In the revised version, we will add a table reporting the frequency of positive definiteness violations in the original NANO filter across all experiments and systems. For the comparison of approximated versus exact steps, we note that the exact natural gradient is only defined when the Hessian is positive definite; in cases of indefiniteness, the update fails, which is the issue being remedied. Where the original NANO remains positive definite, we will provide side-by-side comparisons of the gradient steps and resulting performance. This will support the claim that the remedies primarily address the divergence issue without introducing systematic bias.
revision: partial

Referee: §2.2 (posterior covariance derivation): the statement that the posterior covariance equals inverse prior plus expected Hessian is central to identifying the root cause. The derivation should be expanded to show the precise expectation and any assumptions on the measurement model; if the expectation is taken under the predictive distribution, the resulting matrix is not guaranteed to be the Fisher information, which affects the interpretation of the natural gradient.

Authors: We thank the referee for highlighting the need for a more rigorous derivation. We will expand Section 2.2 to provide a step-by-step derivation of the posterior covariance expression. The expectation is indeed taken with respect to the predictive distribution p(x | y_{1:t-1}). We will clarify the assumptions, including that the prior is approximated as Gaussian and the measurement model is general. Additionally, we will discuss the distinction from the Fisher information matrix, noting that the natural gradient in this context uses the observed information (Hessian) rather than the expected Fisher information, which is appropriate for the local approximation in the filter update. This expanded derivation will better justify the root cause analysis.
revision: yes
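For orientation, one standard route to an expression of this shape comes from variational Gaussian approximation. The sketch below is not claimed to be the paper's derivation: it assumes a Gaussian prior $\mathcal{N}(m, P)$, a Gaussian approximation $q = \mathcal{N}(\mu, \Sigma)$, an expectation taken under $q$, and a sign convention based on the negative log-likelihood.

```latex
\begin{aligned}
\mathcal{F}(\mu, \Sigma) &= \mathbb{E}_{q}\bigl[-\log p(y \mid x)\bigr] + \mathrm{KL}\bigl(q \,\|\, \mathcal{N}(m, P)\bigr), \qquad q = \mathcal{N}(\mu, \Sigma),\\
\frac{\partial}{\partial \Sigma}\, \mathbb{E}_{q}\bigl[-\log p(y \mid x)\bigr] &= \tfrac{1}{2}\, \mathbb{E}_{q}\bigl[\nabla_x^2 \bigl(-\log p(y \mid x)\bigr)\bigr],\\
\frac{\partial}{\partial \Sigma}\, \mathrm{KL}\bigl(q \,\|\, \mathcal{N}(m, P)\bigr) &= \tfrac{1}{2}\bigl(P^{-1} - \Sigma^{-1}\bigr),\\
\frac{\partial \mathcal{F}}{\partial \Sigma} = 0 \;\Longrightarrow\; \Sigma^{-1} &= P^{-1} + \mathbb{E}_{q}\bigl[\nabla_x^2 \bigl(-\log p(y \mid x)\bigr)\bigr].
\end{aligned}
```

In this form the right-hand side is an inverse covariance, and it fails to be positive definite whenever the expected Hessian term is negative enough to outweigh $P^{-1}$. Which distribution the paper actually averages over is the point the referee asks the authors to state.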

Circularity Check

0 steps flagged

No circularity; derivations rest on standard matrix identities and approximations.

The paper derives the posterior covariance as inverse prior plus expected log-likelihood Hessian from the natural-gradient formulation of Bayesian filtering, then applies two standard fixes: Gauss-Newton replacement of the Hessian by the self-adjoint product J^T J (PSD by algebraic construction) and Cholesky-factor exponential map whose Gram-matrix reconstruction is positive definite by definition. Neither step redefines its inputs, renames a fitted quantity as a prediction, nor relies on a self-citation chain; the cited matrix properties are external and falsifiable. Experimental comparisons are presented separately and do not reduce to the same quantities used in the derivation. The chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review; the paper relies on standard properties of Hessians, Jacobians, and Cholesky factors from linear algebra and optimization. No free parameters, ad-hoc axioms, or invented entities are described.

axioms (2)

Domain assumption: The expected Hessian of the log-likelihood can be approximated by the Gauss-Newton form as the self-adjoint product of the Jacobian of the normalized measurement residual.
Invoked to guarantee positive semi-definiteness; standard in nonlinear least-squares but treated as sufficient here.
Standard math: Reconstructing covariance via Gram matrix of the Cholesky factor in exponential form preserves positive definiteness.
Follows from properties of positive-definite matrices and matrix exponentials.
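The second axiom can be checked numerically. The abstract does not give the update formula, so the multiplicative form `L_new = L @ exp(step * tril(M))` below is an assumption, and `M` is an arbitrary stand-in for whatever direction the natural gradient supplies. The helper `expm_taylor` is a small hand-written matrix exponential so the sketch needs only NumPy.

```python
import numpy as np

def expm_taylor(M, terms=18):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=np.inf)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / (2.0 ** squarings)          # scaled so the series converges quickly
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ A / k             # A^k / k!
        result = result + term
    for _ in range(squarings):
        result = result @ result        # undo the scaling
    return result

def exponential_cholesky_update(L, M, step=1.0):
    """Multiply the Cholesky factor by exp(step * tril(M)); hypothetical update rule."""
    return L @ expm_taylor(step * np.tril(M))

Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)
M = np.array([[-1.0, 0.0], [2.0, -1.5]])   # arbitrary direction (assumption)

L_new = exponential_cholesky_update(L, M)
Sigma_new = L_new @ L_new.T                # Gram-matrix reconstruction
print(np.linalg.eigvalsh(Sigma_new))
```

The exponential of a lower-triangular matrix is lower triangular with a strictly positive diagonal, so `L_new` stays a valid Cholesky factor and its Gram matrix is positive definite for any `M` and any step size. An additive update of the covariance offers no such guarantee.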
