Learning a Contracting KKL-observer with Local Optimal Guarantees

Clara Lucía Galimberti; Daniele Astolfi; Johan Peralez; Madiha Nadri; Vincent Andrieu

arxiv: 2605.13453 · v1 · pith:6ZFXME7Jnew · submitted 2026-05-13 · 📡 eess.SY · cs.SY

Pith reviewed 2026-05-14 18:57 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords KKL observer · nonlinear state estimation · deep learning · contracting systems · minimum energy estimator · Mortensen observer · neural network approximation

The pith

Neural networks learn KKL observers that stay globally contracting yet locally match the minimum-energy estimator.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a condition on the latent dynamics of a KKL observer under which its error behavior near the true state matches that of the Mortensen minimum-energy estimator. It then trains neural networks to approximate both the immersion map and those latent dynamics, choosing architectures that enforce the contraction property by construction. This combination supplies global stability guarantees while improving local accuracy and noise rejection on nonlinear systems. The result matters because standard KKL designs rely on heuristic latent dynamics whose performance is hard to predict, especially under noise.

Core claim

A condition is derived on the latent dynamics such that the KKL observer locally mimics the behavior of a Minimum Energy Estimator. Deep learning is then used to approximate the KKL transformation and the latent dynamics with neural network architectures that structurally enforce the contraction property, yielding both global stability and local optimality.

What carries the argument

Contracting neural network architectures that structurally enforce contraction while approximating the KKL immersion map and latent dynamics.
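To make "structurally enforce contraction" concrete, the following is a minimal sketch of one way a vector field can be contracting by construction. It is not the architecture used in the paper. The function names, the layer sizes, and the specific form (uniform damping, a skew-symmetric term, and the negative gradient of a convex potential) are all assumptions chosen for illustration. The point is that the symmetric part of the Jacobian is bounded above by `-eps * I` for every weight value, so no constraint needs to be checked during training.

```python
import numpy as np

def make_contracting_field(n_z, n_h, n_y, eps=0.5, seed=0):
    """Build f(z, y) whose Jacobian in z has symmetric part <= -eps * I.

    Hypothetical construction (not taken from the paper):
      -eps * z                 uniform damping
      + S z                    skew-symmetric term, no effect on the symmetric part
      - W^T tanh(W z + b)      negative gradient of a convex potential
      + B y                    injection of the measured output
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_h, n_z))
    b = rng.normal(size=n_h)
    M = rng.normal(size=(n_z, n_z))
    S = M - M.T                      # skew-symmetric by construction
    B = rng.normal(size=(n_z, n_y))

    def f(z, y):
        return -eps * z + S @ z - W.T @ np.tanh(W @ z + b) + B @ y

    def jac_z(z):
        d = 1.0 - np.tanh(W @ z + b) ** 2          # tanh' >= 0 elementwise
        return -eps * np.eye(n_z) + S - W.T @ (d[:, None] * W)

    return f, jac_z

def contraction_rate(J):
    """Largest eigenvalue of the symmetric part of J (negative means contracting)."""
    return np.linalg.eigvalsh(0.5 * (J + J.T)).max()

f, jac_z = make_contracting_field(n_z=4, n_h=16, n_y=1)
rng = np.random.default_rng(1)
# Sample the latent space widely; the bound should hold at every point.
worst = max(contraction_rate(jac_z(rng.normal(scale=5.0, size=4)))
            for _ in range(200))
print("worst-case rate over samples:", worst)
```

Because the bound holds for any `W`, `b`, `S`, and `B`, gradient-based training can update these parameters freely while the contraction property (here with respect to the identity metric, another simplifying assumption) is preserved.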

If this is right

The resulting observer guarantees global asymptotic stability through enforced contraction.
Local error dynamics replicate those of the Mortensen observer near the true state.
Estimation accuracy holds under combined state and measurement noise on standard nonlinear benchmarks.
The method applies to nonlinear systems for which a qualifying latent dynamics can be found.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same contraction-enforcing training could be tried with other immersion-based observer families beyond KKL.
Lightweight versions of the networks might allow embedded real-time implementation on resource-limited hardware.
Validation on physical plants with model mismatch would test whether the local optimality survives unmodeled effects.

Load-bearing premise

A suitable latent dynamics satisfying the local-optimality condition exists for the target system and neural networks with contraction-enforcing architectures can accurately approximate the required maps.

What would settle it

A simulation in which the learned observer's estimate deviates from the true minimum-energy estimate inside a neighborhood of the origin or fails to converge under added state and measurement noise.

Figures

Figures reproduced from arXiv: 2605.13453 by Clara Lucía Galimberti, Daniele Astolfi, Johan Peralez, Madiha Nadri, Vincent Andrieu.

**Figure 1.** True states x(t), estimated states x̂(t), and noisy measurements y(t) for the Van der Pol system. Panels: x1(t), x2(t), and y(t) against t over 0 to 20.

**Figure 2.** True states x(t), estimated states x̂(t), and noisy measurements y(t) for the inverse Duffing oscillator.
6.2 Training Domain and Datasets: We consider a compact set X that will contain all the trajectories considered for evaluation. For Σ_VdP, we set X_VdP = [−2.5, 2.5] × [−3.5, 3.5], and for Σ_duff, we consider X_duff = [−4, 4]^2. For both benchmarks, the training dataset (17) contains N = 5 × 10^4 state point…

**Figure 3.** Comparison of learned observers for Σ_VdP trained with Q = I and different values of R. Trajectories were generated with v(t) = 0 and w(t) ∼ N(0, 0.25^2). Left: R = 1. Right: R = 10^−2. Panels: x1(t) and x2(t) against t over 0 to 20.

**Figure 4.** Comparison of learned observers for Σ_duff trained with Q = I and different values of R. Trajectories were generated with v(t) = 0 and w(t) ∼ N(0, 0.25^2). Left: R = 1. Right: R = 10^−2. We observe that predicted states maintain a stabilizing behavior even when initialized far from the true initial conditions, thanks to the contraction property of φ̂. Furthermore, we observe that the learned estimators a…
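The training domains quoted in the Figure 2 excerpt above can be turned into a dataset of state points along the following lines. This is only a sketch: the excerpt is truncated before it says how the points are drawn, so uniform sampling over each box is an assumption, and the names `sample_box`, `X_VDP`, and `X_DUFF` are hypothetical.

```python
import numpy as np

# Box bounds taken from the excerpt: one row per state, columns are [low, high].
X_VDP = np.array([[-2.5, 2.5], [-3.5, 3.5]])
X_DUFF = np.array([[-4.0, 4.0], [-4.0, 4.0]])
N = 50_000  # N = 5 x 10^4 state points, as stated in the excerpt

def sample_box(bounds, n, seed=0):
    """Draw n points uniformly from an axis-aligned box (sampling law is an assumption)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(bounds[:, 0], bounds[:, 1], size=(n, bounds.shape[0]))

x_vdp = sample_box(X_VDP, N)
x_duff = sample_box(X_DUFF, N, seed=1)
print(x_vdp.shape, x_duff.shape)
```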

read the original abstract

The Kazantzis-Kravaris-Luenberger (KKL) observer provides a general framework for nonlinear state estimation by immersing the system dynamics into a stable linear or nonlinear latent dynamics. However, the performance of KKL observers relies heavily on the specific choice of these latent dynamics, which is often heuristic. This paper proposes a methodology to learn a KKL observer that combines global stability guarantees with local optimality. We derive a condition on the latent dynamics such that the observer locally mimics the behavior of a Minimum Energy Estimator (Mortensen observer). We then employ Deep Learning to approximate the KKL transformation and the latent dynamics, using neural network architectures that structurally enforce the contraction property. The proposed strategy is validated through numerical simulations on nonlinear benchmarks, demonstrating a good performance in the presence of state and measurement noise.
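For readers new to the KKL idea, the immersion step is easiest to see on a linear toy problem, where the map T is a matrix solving a Sylvester equation. The sketch below is an illustration under stated assumptions, not the paper's method: the plant (a harmonic oscillator), the latent matrices `A` and `B`, and the helper names are all chosen here for the example. The latent state follows ż = A z + B y with A Hurwitz, T satisfies T F = A T + B C, and the estimate is recovered as x̂ = T⁻¹ z.

```python
import numpy as np

def solve_kkl_map(F, C, A, B):
    """Solve T F = A T + B C for T by vectorization (column-major vec)."""
    m, n = A.shape[0], F.shape[0]
    # A T - T F = -B C  becomes  (I kron A - F^T kron I) vec(T) = -vec(B C)
    lhs = np.kron(np.eye(n), A) - np.kron(F.T, np.eye(m))
    rhs = -(B @ C).flatten(order="F")
    return np.linalg.solve(lhs, rhs).reshape((m, n), order="F")

# Toy plant (assumption): harmonic oscillator with position measurement.
F = np.array([[0.0, 1.0], [-1.0, 0.0]])
C = np.array([[1.0, 0.0]])
# Stable latent dynamics (assumption): diagonal A, all-ones input matrix.
A = np.diag([-1.0, -2.0])
B = np.array([[1.0], [1.0]])
T = solve_kkl_map(F, C, A, B)

def rk4_step(M, s, dt):
    """One classical Runge-Kutta step for the linear system s' = M s."""
    k1 = M @ s
    k2 = M @ (s + 0.5 * dt * k1)
    k3 = M @ (s + 0.5 * dt * k2)
    k4 = M @ (s + dt * k3)
    return s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Stack plant state x and latent state z; z is driven by y = C x.
M = np.block([[F, np.zeros((2, 2))], [B @ C, A]])
s = np.array([1.0, 0.0, 0.0, 0.0])   # x(0) = (1, 0), z(0) = 0
for _ in range(2000):                # 20 time units with dt = 0.01
    s = rk4_step(M, s, 0.01)

x, z = s[:2], s[2:]
x_hat = np.linalg.solve(T, z)        # invert the immersion map
print("estimation error:", np.linalg.norm(x_hat - x))
```

In this linear setting the error z − T x obeys ė = A e, so its decay rate is set entirely by the choice of A. That is the design freedom the abstract calls heuristic, and which the paper proposes to tie to the minimum-energy estimator.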

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper derives a local-optimality condition for KKL latent dynamics to mimic the Mortensen observer, then learns both the immersion and dynamics with contraction-enforcing networks, but approximation error leaves the optimality claim empirical.

The core contribution is a condition on the latent vector field so that the KKL observer matches the first-order behavior of the Mortensen minimum-energy estimator, followed by neural-network approximations of the immersion map and the latent dynamics that keep contraction by construction. This moves the choice of latent dynamics from heuristic to something tied to an existing estimator, and the structural enforcement of contraction is a clean way to retain the global stability property of KKL observers. The numerical examples on standard nonlinear benchmarks with added noise show the learned observers perform well in practice, which is useful evidence that the approach is at least workable on the systems tested.

The main limitation is that the local-optimality condition holds exactly only for the true immersion; once both the map and the dynamics are replaced by networks, the condition is satisfied only up to residual error. No a-priori bound on that residual is given, and it is not shown that the contraction architecture preserves the Lie-derivative identities needed for the local mimicry. Global contraction survives, but the optimality guarantee becomes an empirical outcome rather than a theorem. The abstract supplies no explicit equations or proof outline, so the derivation itself needs verification in the full text.

This work is aimed at control theorists who design nonlinear observers and want a more systematic route than pure trial-and-error on the latent dynamics. A reader already familiar with KKL and contraction analysis will see the value in the architectural idea and the benchmark results. I would send it to peer review; the construction is coherent enough to deserve referee scrutiny even if the local-optimality claim requires tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a learning-based KKL observer design that derives a condition on the latent dynamics so the observer locally mimics the first-order behavior of the Mortensen minimum-energy estimator, then approximates both the immersion map and latent dynamics by neural networks whose architectures structurally enforce contraction, thereby combining global stability with local optimality; the approach is illustrated on nonlinear benchmark systems subject to state and measurement noise.

Significance. If the approximation errors can be shown not to destroy the local-optimality condition, the work would supply a principled, data-driven route to tune KKL observers for performance while retaining rigorous contraction-based stability, addressing a long-standing heuristic choice in nonlinear observer design.

major comments (2)

[Section 3] The central construction (Section 3) derives a PDE-like condition that the latent vector field must satisfy for the KKL observer to reproduce the first-order behavior of the Mortensen estimator. Because both the immersion and the latent dynamics are replaced by neural networks, the learned pair satisfies the condition only up to approximation error; no a-priori bound on the residual is supplied, nor is it shown that the contraction architecture preserves the required Lie-derivative identities.
[Section 5] Table 1 and the numerical examples (Section 5) report good empirical performance under noise, yet provide no quantitative metric (e.g., local linearization error or distance to the true Mortensen trajectory near the origin) that would confirm the local-optimality claim survives the neural-network approximation.

minor comments (2)

[Abstract] The abstract states the main claims but contains no equations or quantitative performance figures; adding a brief statement of the derived condition and a representative error metric would improve readability.
[Section 2] Notation for the contraction metric and the neural-network parameterizations is introduced without a consolidated table; a short notation summary would help readers track the structural enforcement arguments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and will revise the manuscript accordingly to strengthen the theoretical and empirical support for the local-optimality claim.

Referee: [Section 3] The central construction (Section 3) derives a PDE-like condition that the latent vector field must satisfy for the KKL observer to reproduce the first-order behavior of the Mortensen estimator. Because both the immersion and the latent dynamics are replaced by neural networks, the learned pair satisfies the condition only up to approximation error; no a-priori bound on the residual is supplied, nor is it shown that the contraction architecture preserves the required Lie-derivative identities.

Authors: We agree that the neural-network approximations satisfy the derived PDE-like condition only up to a nonzero residual and that the original manuscript did not supply an a-priori bound. The contraction architecture guarantees global stability of the observer error independently of approximation quality, because the latent vector field remains strictly contracting by construction. To address the local-optimality concern, the revised Section 3 will include an explicit residual bound derived from the universal approximation theorem on compact sets together with the Lipschitz constants of the system vector field and output map. We will also clarify that the architecture enforces contraction but does not automatically preserve the Lie-derivative identities; these identities are enforced by the training loss, and the same approximation argument yields a bound on the resulting Lie-derivative residual.

revision: yes

Referee: [Section 5] Table 1 and the numerical examples (Section 5) report good empirical performance under noise, yet provide no quantitative metric (e.g., local linearization error or distance to the true Mortensen trajectory near the origin) that would confirm the local-optimality claim survives the neural-network approximation.

Authors: We concur that a direct quantitative metric is needed to confirm that local optimality survives the approximation. In the revised Section 5 we will add the local linearization error, defined as the Frobenius norm between the Jacobian of the learned observer at the origin and the corresponding Jacobian of the Mortensen estimator. For the benchmark systems we will also report the integrated state-estimation error over small neighborhoods of the origin, comparing the learned observer against a numerically integrated Mortensen trajectory. These new metrics will be included in an extended Table 1 and in additional plots.

revision: yes
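The local linearization error described in this response can be sketched as follows. The metric itself (Frobenius norm of the difference between two Jacobians at the origin) comes from the text; everything else is an assumption for illustration, including the use of central finite differences, the helper names, and the two stand-in vector fields, which are not the learned observer or the Mortensen estimator.

```python
import numpy as np

def numerical_jacobian(f, x0, h=1e-6):
    """Central-difference Jacobian of f at x0 (finite differences are an assumption)."""
    x0 = np.asarray(x0, dtype=float)
    n_out = np.asarray(f(x0)).size
    J = np.zeros((n_out, x0.size))
    for i in range(x0.size):
        e = np.zeros_like(x0)
        e[i] = h
        J[:, i] = (np.asarray(f(x0 + e)) - np.asarray(f(x0 - e))) / (2.0 * h)
    return J

def local_linearization_error(f_learned, f_reference, x0):
    """Frobenius norm of the Jacobian mismatch between two vector fields at x0."""
    diff = numerical_jacobian(f_learned, x0) - numerical_jacobian(f_reference, x0)
    return np.linalg.norm(diff, ord="fro")

# Stand-in vector fields (hypothetical): a reference linear map and a perturbed
# version with a cubic term that does not affect the Jacobian at the origin.
A_ref = np.array([[-1.0, 2.0], [0.0, -3.0]])
E = np.array([[0.01, 0.0], [-0.02, 0.03]])

def f_reference(x):
    return A_ref @ x

def f_learned(x):
    return (A_ref + E) @ x + 0.1 * x ** 3

err = local_linearization_error(f_learned, f_reference, np.zeros(2))
print("local linearization error:", err)
```

For these stand-ins the metric recovers the Frobenius norm of `E` up to finite-difference error, which is the behavior one would want from a check that isolates first-order mismatch near the origin.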

Circularity Check

0 steps flagged

Local optimality condition derived from external Mortensen estimator; contraction enforced structurally by NN architecture

The central derivation produces a condition on latent dynamics so that the KKL observer locally reproduces first-order behavior of the independent Mortensen minimum-energy estimator. This condition is not defined in terms of the paper's own fitted quantities. Neural-network architectures are chosen to enforce contraction by construction rather than by fitting parameters that would render subsequent predictions tautological. No self-citation chain is load-bearing for the uniqueness or existence of the immersion, and the approximation error is treated as an empirical matter rather than claimed to vanish by definition. The result therefore remains self-contained against external benchmarks and receives only a minor self-citation penalty.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the existence of a latent dynamics satisfying the local-optimality condition and on the capacity of neural networks to approximate the required functions while preserving contraction; no new physical entities are introduced.

free parameters (1)

Neural network weights and biases
Parameters of the networks approximating the KKL transformation and latent dynamics are fitted during training.

axioms (2)

domain assumption: The nonlinear system admits a KKL immersion into contracting latent dynamics.
Core assumption of the KKL observer framework invoked throughout the abstract.
domain assumption: The derived condition on latent dynamics ensures local equivalence to the Mortensen minimum-energy estimator.
Load-bearing premise for the local-optimality claim.

pith-pipeline@v0.9.0 · 5450 in / 1333 out tokens · 53500 ms · 2026-05-14T18:57:21.558890+00:00 · methodology

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1] Andrieu, V. and Praly, L. (2006). On the existence of a Kazantzis–Kravaris/Luenberger observer. SIAM Journal on Control and Optimization, 45(2), 432–456.
[2] Beik Mohammadi, H., Hauberg, S., Arvanitidis, G., Figueroa, N., Neumann, G., and Rozo, L. (2024). Neural contractive dynamical systems. In ICLR, 49097–49120.
[3] Bernard, P., Andrieu, V., and Astolfi, D. (2022). Observer design for continuous-time dynamical systems. Annual Reviews in Control, 53, 224–248.
[4] Brivadis, L., Andrieu, V., Bernard, P., and Serres, U. (2023). Further remarks on KKL observers. Systems & Control Letters, 172, 105429.
[5] Buisson-Fenet, M., Bahr, L., Morgenthaler, V., and Di Meglio, F. (2023). Towards gain tuning for numerical KKL observers. IFAC-PapersOnLine, 56(2), 4061–4067.
[6] Forgione, M. and Piga, D. (2021). dynoNet: A neural network architecture for learning dynamical systems. International Journal of Adaptive Control and Signal Processing, 35(4), 612–626.
[7] Gu, A., Goel, K., and Ré, C. (2022). Efficiently modeling long sequences with structured state spaces. In International Conference on Learning Representations, 1–27.
[8] Martinelli, D., Galimberti, C.L., Manchester, I.R., Furieri, L., and Ferrari-Trecate, G. (2023). Unconstrained parametrization of dissipative and contracting neural ordinary differential equations. In 62nd IEEE Conference on Decision and Control, 3043–3048.
[9] Miao, K. and Gatsis, K. (2023). Learning robust state observers using neural ODEs. In Learning for Dynamics and Control Conference, 208–219. PMLR.
[10] Mortensen, R.E. (1968). Maximum-likelihood recursive nonlinear filtering. Journal of Optimization Theory and Applications, 2(6), 386–394.
[11] Niazi, M.U.B., Cao, J., Sun, X., Das, A., and Johansson, K.H. (2022). Learning-based design of Luenberger observers for autonomous nonlinear systems. In American Control Conference.
[12] Praly, L. (2024). On the existence of KKL observers with nonlinear contracting dynamics. IFAC-PapersOnLine, 58(21), 262–267.
[13] Peralez, J. and Nadri, M. (2021). Deep learning-based Luenberger observer design for discrete-time nonlinear systems. In 60th IEEE Conference on Decision and Control, 4370–4375.
[14] Peralez, J. and Nadri, M. (2024). Deep model-free KKL observer: A switching approach. In 6th Annual Learning for Dynamics & Control Conference, 929–940. PMLR.
[15] Ramos, L.d.C., Di Meglio, F., Morgenthaler, V., da Silva, L.F.F., and Bernard, P. (2020). Numerical design of Luenberger observers for nonlinear systems. In 59th IEEE Conference on Decision and Control, 5435–5442.
[16] Revay, M., Wang, R., and Manchester, I.R. (2023). Recurrent equilibrium networks: Flexible dynamic models with guaranteed stability and robustness. IEEE Trans. Autom. Control, 69(5), 2855–2870.
[17] Sanfelice, R.G. and Praly, L. (2015). Solution of a Riccati equation for the design of an observer contracting a Riemannian distance. In 54th IEEE Conference on Decision and Control, 4996–5001. IEEE.
[18] Zakwan, M., Xu, L., and Ferrari-Trecate, G. (2022). Robust classification using contractive Hamiltonian neural ODEs. IEEE Control Systems Letters, 7, 145–150.

Appendix A. REVIEW OF THE LINEAR CASE It is instructive to instantiate the proposed framework for a linear system ẋ = Ax, y = Cx. Consistent with Section 4, let P ≻ 0 be the steady-state information matr...
