pith. machine review for the scientific record.

arxiv: 2603.19910 · v2 · submitted 2026-03-20 · 📡 eess.SY · cs.SY

Recognition: 2 theorem links · Lean Theorem

Learning Adaptive Parameter Policies for Nonlinear Bayesian Filtering

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 08:39 UTC · model grok-4.3

classification 📡 eess.SY · cs.SY
keywords nonlinear Bayesian filtering · reinforcement learning · adaptive parameters · unscented Kalman filter · stochastic integration filter · state estimation · sequential decision making

The pith

Reinforcement learning trains policies to choose filter parameters dynamically in nonlinear Bayesian estimation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper frames the selection of tunable parameters in nonlinear filters, such as scaling factors or iteration counts, as a sequential decision process where each choice affects immediate accuracy and the quality of all future estimates. It trains reinforcement-learning policies that pick parameters at every step using a reward that penalizes both estimation error and inconsistency. Experiments apply the method to the unscented Kalman filter and the stochastic integration filter and report gains in both accuracy and consistency over fixed-parameter baselines. A reader would care because approximation errors in these filters accumulate across time steps in tracking and navigation tasks, and learned adaptation removes the need for manual or heuristic tuning.
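A minimal sketch of that sequential framing, with placeholder `policy` and `ukf_step` functions that are illustrative rather than the paper's implementation:

```python
import numpy as np

# Minimal sketch of the sequential-decision framing, assuming a placeholder
# measurement update `ukf_step(x, P, z, kappa)` and a `policy` mapping filter
# statistics to a parameter choice. None of these names come from the paper.

def run_episode(x0, P0, measurements, truths, policy, ukf_step):
    x, P = x0, P0
    rewards = []
    for z, x_true in zip(measurements, truths):
        obs = np.concatenate([x, np.diag(P)])   # what the agent observes
        kappa = policy(obs)                     # action: filter parameter
        x, P = ukf_step(x, P, z, kappa)         # approximation depends on kappa
        e = x_true - x
        rewards.append(-float(e @ e))           # accuracy part of the reward
    return sum(rewards)                         # return shaped by all choices
```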

Core claim

The paper claims that casting adaptive parameter selection for nonlinear Bayesian filters as a Markov decision process and solving it via reinforcement learning yields policies that improve both the quality and the consistency of state estimates, as shown by experiments with the unscented Kalman filter and the stochastic integration filter.
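For context, the concrete action in the UKF case is a scaling parameter that sets the sigma-point spread. A minimal sketch of that dependence, using the textbook unscented transform of Julier and Uhlmann [2] rather than anything specific to this paper:

```python
import numpy as np

# Textbook unscented-transform sigma points: the scaling parameter kappa
# controls how far the 2n+1 points spread from the mean x, and hence the
# quality of the moment approximation. Requires n + kappa > 0.

def sigma_points(x, P, kappa):
    n = x.size
    L = np.linalg.cholesky((n + kappa) * P)            # matrix square root
    pts = np.vstack([x, x + L.T, x - L.T])             # rows are sigma points
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))  # weights for spread points
    w[0] = kappa / (n + kappa)                         # weight on the mean
    return pts, w
```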

What carries the argument

A reinforcement-learning agent that selects filter parameters at each time step according to a reward combining estimation accuracy and consistency.
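The reward itself is not defined in this review; one plausible shape, assuming (as the rebuttal below suggests) a weighted combination of squared error and a NEES-based consistency penalty, is sketched here with illustrative names:

```python
import numpy as np

# Hypothetical per-step reward: negative squared error plus a penalty when
# the normalized estimation error squared (NEES) departs from its ideal
# value, which for a consistent filter equals the state dimension n.
# `alpha` and the log-ratio penalty are assumptions, not the paper's choice.

def step_reward(x_true, x_est, P_est, alpha=0.1):
    e = x_true - x_est
    nees = float(e @ np.linalg.solve(P_est, e))   # e^T P^{-1} e
    n = x_true.size
    consistency_penalty = np.log(nees / n) ** 2   # penalize over/underconfidence
    return -float(e @ e) - alpha * consistency_penalty
```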

If this is right

  • The same learned policy can be reused across multiple time steps without recomputation.
  • Any parameter-dependent nonlinear filter becomes eligible for the same training procedure.
  • Consistency gains reduce the chance of filter divergence over long sequences.
  • Average computational cost can stay the same or drop while accuracy rises.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The method could be extended to particle filters by treating the number of particles as one of the selectable actions.
  • If the training scenarios miss important operating regimes, the policy may require occasional online adaptation or a richer reward signal.
  • The approach links parameter tuning directly to long-horizon estimation performance rather than single-step error.

Load-bearing premise

A reward defined on accuracy and consistency will produce policies that generalize to new time-varying nonlinearities without retraining or instability.

What would settle it

Apply a trained policy to a filtering problem whose degree of nonlinearity exceeds the training distribution and check whether the consistency metrics fall below those of a well-tuned fixed-parameter filter.
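For reference, the consistency metric used in the figures below is the time-averaged ANEES; a generic sketch of how the proposed check would compute it over Monte Carlo runs (values near the state dimension indicate a consistent filter), not code from the paper:

```python
import numpy as np

# Time-averaged ANEES over Monte Carlo runs.
# errors: (runs, T, n) estimation errors; covs: (runs, T, n, n) filter covariances.

def time_averaged_anees(errors, covs):
    nees = np.einsum('rti,rtij,rtj->rt', errors, np.linalg.inv(covs), errors)
    return nees.mean()  # consistent filters give values near n = errors.shape[-1]
```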

Figures

Figures reproduced from arXiv: 2603.19910 by Felipe Giraldo-Grueso, Ondrej Straka, Renato Zanetti.

Figure 5: Time-averaged ANEES for UKF and CTM (RMSE–ANEES trade-off across parameter values; policies: fixed, adaptive, default, myopic, optimal).
Figure 2: Time-averaged ANEES for UKF and UNGM (Pareto RMSE–ANEES trade-off across parameter values; policies: fixed, adaptive, default, myopic, optimal).
Figure 3: Time-averaged ANEES vs. RMSE for UKF and UNGM.
Figure 4: Time-averaged RMSE for UKF and CTM (comparison against fixed parameter values; policies: adaptive, default, myopic, optimal).
original abstract

For many nonlinear Bayesian state estimation problems, the posterior recursion is not analytically tractable, leading to algorithms that are influenced by numerical approximation errors. These algorithms depend on parameters that affect the approximation's accuracy and computational cost. The parameters include, for example, the number of particles, scaling parameters, and the number of iterations in iterative computations. Typically, these parameters are fixed or adjusted heuristically, although the approximation accuracy can change over time with the local degree of nonlinearity and uncertainty. The approximation errors introduced at a time step propagate through subsequent updates, affecting the accuracy, consistency, and robustness of future estimates. This paper presents adaptive parameter selection in nonlinear Bayesian filtering as a sequential decision-making problem, where parameters influence not only the immediate estimation outcome but also the future estimates. The decision-making problem is addressed using reinforcement learning to learn adaptive parameter policies for nonlinear Bayesian filters. Experiments with the unscented Kalman filter and stochastic integration filter demonstrate that the learned policies improve both estimate quality and consistency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript frames adaptive parameter selection (e.g., scaling parameters, iteration counts) in nonlinear Bayesian filters as a sequential decision-making problem solved via reinforcement learning. The goal is to learn policies that account for how approximation errors propagate across time steps and thereby improve both immediate and future estimate quality and consistency. Experiments with the unscented Kalman filter and stochastic integration filter are reported to demonstrate that the learned policies outperform fixed-parameter baselines.

Significance. If the empirical results hold under scrutiny, the work supplies a principled, data-driven alternative to heuristic parameter tuning in nonlinear filtering. By optimizing for long-horizon estimation metrics rather than instantaneous accuracy alone, the approach could improve robustness in time-varying nonlinear regimes without requiring manual retuning. The formulation is a natural application of RL to filtering and, if reproducible, would constitute a concrete methodological contribution.

major comments (1)
  1. [Abstract and §4 (Experiments)] The central claim that 'learned policies improve both estimate quality and consistency' is presented without quantitative metrics, baseline comparisons, a reward-function definition, training hyperparameters, or a description of how generalization to unseen trajectories was assessed. Because the performance benefit is the sole empirical support for the method, the absence of these details leaves a load-bearing claim unevaluable.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review and for identifying the need to strengthen the empirical presentation. We agree that the performance claims require more explicit quantitative support to be fully evaluable and will revise the manuscript accordingly.

point-by-point responses
  1. Referee: [Abstract and §4 (Experiments)] The central claim that 'learned policies improve both estimate quality and consistency' is presented without quantitative metrics, baseline comparisons, a reward-function definition, training hyperparameters, or a description of how generalization to unseen trajectories was assessed. Because the performance benefit is the sole empirical support for the method, the absence of these details leaves a load-bearing claim unevaluable.

    Authors: We acknowledge that the current manuscript summarizes the experimental outcomes without providing the requested quantitative details. In the revised version we will expand the abstract to report key metrics (RMSE and NEES) and will rewrite §4 to include: (i) explicit tables comparing learned policies against fixed-parameter baselines on both filters, (ii) the precise reward-function definition (a weighted combination of estimation error and consistency penalty), (iii) the complete list of RL training hyperparameters, and (iv) the train/test split protocol used to evaluate generalization to unseen trajectories. These additions will make the performance benefit directly verifiable while preserving the original experimental design.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper formulates adaptive parameter selection for nonlinear filters as a reinforcement learning problem and reports empirical improvements on held-out test trajectories for UKF and SIF. No derivation chain, equations, or self-citations reduce the central performance claims to fitted inputs or self-definitions by construction; the reported gains are measured outcomes rather than tautological restatements of the reward or training objective, and they are assessed against external fixed-parameter baselines rather than against the method's own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the existence of a well-defined reward signal that correctly trades off immediate accuracy against long-term consistency; this reward is not specified in the abstract and must be treated as an ad-hoc modeling choice. No new physical entities are introduced.

axioms (1)
  • domain assumption The approximation error at each time step can be meaningfully quantified by a scalar reward that the reinforcement-learning agent can optimize.
    Implicit in the decision to cast parameter selection as an RL problem; appears in the abstract's description of the sequential decision-making formulation.

pith-pipeline@v0.9.0 · 5469 in / 1356 out tokens · 38053 ms · 2026-05-15T08:39:18.924732+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear

    Relation between the paper passage and the cited Recognition theorem.

    The decision-making problem is addressed using reinforcement learning to learn adaptive parameter policies for nonlinear Bayesian filters. Experiments with the unscented Kalman filter and stochastic integration filter demonstrate that the learned policies improve both estimate quality and consistency.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

  1. [1] S. Särkkä, Bayesian Filtering and Smoothing. Cambridge University Press, 2013.

  2. [2] S. J. Julier and J. K. Uhlmann, “Unscented filtering and nonlinear estimation,” Proceedings of the IEEE, vol. 92, no. 3, pp. 401–421, 2004.

  3. [3] J. Duník, O. Straka, and M. Šimandl, “Stochastic integration filter,” IEEE Transactions on Automatic Control, vol. 58, no. 6, pp. 1561–1566, 2013.

  4. [4] G. Evensen, “Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics,” Journal of Geophysical Research: Oceans, vol. 99, no. C5, pp. 10143–10162, 1994.

  5. [5] J. Anderson and S. Anderson, “A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts,” Monthly Weather Review, vol. 127, no. 12, pp. 2741–2758, 1999.

  6. [6] S. Yun, R. Zanetti, and B. A. Jones, “Kernel-based ensemble Gaussian mixture filtering for orbit determination with sparse data,” Advances in Space Research, vol. 69, no. 12, pp. 4179–4197, June 2022. https://doi.org/10.1016/j.asr.2022.03.041

  7. [7] O. Straka and U. D. Hanebeck, “Efficient Gaussian mixture filters based on transition density approximation,” in 2025 28th International Conference on Information Fusion (FUSION), 2025, pp. 1–8.

  8. [8] A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice. Springer, 2001.

  9. [9] M. Šimandl and O. Straka, “Sampling density design for particle filters,” in Proceedings of the 13th IFAC Symposium on System Identification, Rotterdam, 2003.

  10. [10] J. Matoušek, J. Duník, and M. Brandner, “Design of efficient point-mass filter with illustration in terrain aided navigation,” in 26th International Conference on Information Fusion (FUSION), Charleston, USA, 2023.

  11. [11] O. Straka, J. Duník, and M. Šimandl, “Unscented Kalman filter with advanced adaptation of scaling parameter,” Automatica, vol. 50, no. 10, pp. 2657–2664, 2014.

  12. [12] J. Duník, M. Šimandl, and O. Straka, “Unscented Kalman filter: Aspects and adaptive setting of scaling parameter,” IEEE Transactions on Automatic Control, vol. 57, no. 9, pp. 2411–2416, 2012.

  13. [13] J. Havlík, O. Straka, J. Duník, and J. Ajgl, “On nonlinearity measuring aspects of stochastic integration filter,” in Proceedings of the 13th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2016). Setúbal, PRT: SCITEPRESS, 2016, pp. 353–361.

  14. [14] A. A. Popov and R. Zanetti, “An adaptive covariance parameterization technique for the ensemble Gaussian mixture filter,” SIAM Journal on Scientific Computing, vol. 46, no. 3, 2024.

  15. [15] Z. Fan, D. Shen, Y. Bao, K. Pham, E. Blasch, and G. Chen, “RNN-UKF: Enhancing hyperparameter auto-tuning in unscented Kalman filters through recurrent neural networks,” in 2024 27th International Conference on Information Fusion (FUSION), 2024, pp. 1–8.

  16. [16] L. A. Scardua and J. J. da Cruz, “Complete offline tuning of the unscented Kalman filter,” Automatica, vol. 80, pp. 54–61, 2017.

  17. [17] Z. Bekhtaoui, A. Meche, M. Dahmani, and K. A. Meraim, “Maneuvering target tracking using Q-learning based Kalman filter,” in 2017 5th International Conference on Electrical Engineering - Boumerdes (ICEE-B), 2017, pp. 1–5.

  18. [18] G. Shaaban, H. Fourati, C. Prieur, and A. Kibangou, “Q-learning-based noise covariance matrices adaptation in Kalman filter for inertial navigation,” IFAC-PapersOnLine, vol. 58, no. 21, pp. 96–101, 2024 (4th IFAC Conference on Modelling, Identification and Control of Nonlinear Systems, MICNON 2024).

  19. [19] L. Hu, C. Wu, and W. Pan, “Lyapunov-based reinforcement learning state estimator,” 2021. Available: https://arxiv.org/abs/2010.13529

  20. [20] G. Revach, N. Shlezinger, R. J. G. van Sloun, and Y. C. Eldar, “KalmanNet: Data-driven Kalman filtering,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 3905–3909.

  21. [21] A. Ghosh, A. Honoré, and S. Chatterjee, “DANSE: Data-driven non-linear state estimation of model-free process in unsupervised learning setup,” IEEE Transactions on Signal Processing, vol. 72, pp. 1824–1838, 2024.

  22. [22] B. Chen, L. Xing, H. Zhao, N. Zheng, and J. C. Príncipe, “Generalized correntropy for robust adaptive filtering,” IEEE Transactions on Signal Processing, vol. 64, no. 13, pp. 3376–3387, 2016.

  23. [23] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, 2007.

  24. [24] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.

  25. [25] O. Straka and I. Punčochář, “Distributed design for active fault diagnosis,” International Journal of Systems Science, vol. 53, no. 3, pp. 562–574, 2022.

  26. [26] M. J. A. Strens, “A Bayesian framework for reinforcement learning,” in Proceedings of the Seventeenth International Conference on Machine Learning (ICML '00). San Francisco, CA, USA: Morgan Kaufmann, 2000, pp. 943–950.

  27. [27] C. D. Karlgaard and H. Schaub, “Huber-based divided difference filtering,” Journal of Guidance, Control, and Dynamics, vol. 30, no. 3, pp. 885–891, 2007.

  28. [28] O. Straka and J. Duník, “Stochastic integration Student's-t filter,” in 2017 20th International Conference on Information Fusion (Fusion), 2017.

  29. [29] K. Åström, “Optimal control of Markov processes with incomplete state information,” Journal of Mathematical Analysis and Applications, vol. 10, no. 1, pp. 174–205, 1965.

  30. [30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. The MIT Press, 2018.

  31. [31] J. Adamczyk, V. Makarenko, S. Tiomkin, and R. V. Kulkarni, “Average-reward soft actor-critic,” 2025. Available: https://arxiv.org/abs/2501.09080

  32. [32] J. Duník, M. Šimandl, and O. Straka, “Adaptive choice of scaling parameter in derivative-free local filters,” in 2010 International Conference on Information Fusion, Edinburgh, Great Britain, 2010.

  33. [33] X. R. Li and Z. Zhao, “Measuring estimator's credibility: Noncredibility index,” in Proceedings of 2006 International Conference on Information Fusion, Florence, Italy, 2006.