arxiv: 2605.06934 · v1 · submitted 2026-05-07 · 💻 cs.LG

Learned Lyapunov Shielding for Adaptive Control

Giansalvo Cirrincione, Adriano Fagiolini

Pith reviewed 2026-05-11 00:57 UTC · model grok-4.3

classification 💻 cs.LG

keywords Lyapunov shielding · adaptive control · safety filter · physics-informed neural network · Euler-Lagrange systems · robot manipulators · reinforcement learning

The pith

A closed-form safety filter from a learned Lyapunov function lets adaptive controllers safely incorporate neural policies and unmodeled dynamics estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors aim to show that a learned Lyapunov function can be used to construct a closed-form safety filter that safely incorporates learned policy corrections and dynamics estimates into classical adaptive control for Euler-Lagrange systems. If correct, this would let robot controllers improve on nominal performance under friction and payload changes while keeping formal stability guarantees and avoiding online quadratic programs. The construction relies on a structured quadratic certificate whose decay condition defines an explicit projection, and the paper supplies feasibility, stability, convergence, and generalization results together with manipulator experiments.

Core claim

By parameterizing a quadratic Lyapunov function with Cholesky factors and combining it with a residual reinforcement-learning policy and a physics-informed network, the authors obtain a closed-form filter that enforces the affine decay constraint without an online solver. They prove global feasibility whenever a drift-decay condition holds, exponential stability under exact shielding with a robustness margin set by the network error, almost-sure convergence of the three-timescale learning to a KKT point, and a PAC bound on the certificate. On a 2-DOF manipulator the approach reduces tracking error by 41% with nominal friction and 24% with aggressive friction, and the method scales cleanly to

What carries the argument

The single affine constraint on the time derivative of the Cholesky-parameterized Lyapunov function, which yields an explicit projection that maps any policy torque onto the safe set.
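
The explicit projection described above can be sketched as a Euclidean projection onto a half-space. This is a minimal illustration, not the authors' implementation: it assumes the decay constraint has been reduced to the affine form a·u + b ≤ 0, where `a` and `b` are hypothetical stand-ins for the control-dependent and drift terms of the Lyapunov derivative, and the function name `shield` is invented here.

```python
import numpy as np

def shield(u_pi, a, b, eps=1e-9):
    """Project a proposed torque u_pi onto the half-space {u : a @ u + b <= 0}.

    a, b are hypothetical stand-ins for the control term (L_g V) and the
    drift term (L_f V + alpha * V) of the decay constraint.
    Returns (u_safe, feasible).
    """
    u_pi = np.asarray(u_pi, dtype=float)
    a = np.asarray(a, dtype=float)
    violation = float(a @ u_pi + b)
    if violation <= 0.0:
        # The proposed torque already satisfies the constraint.
        return u_pi.copy(), True
    norm_sq = float(a @ a)
    if norm_sq < eps:
        # Control-degenerate point: the torque cannot change the constraint,
        # so the constraint reduces to b <= 0, which fails here.
        return u_pi.copy(), False
    # Closed-form minimizer of ||u - u_pi||^2 subject to a @ u + b <= 0.
    return u_pi - (violation / norm_sq) * a, True

u_safe, ok = shield([1.0, 2.0], [2.0, 0.0], -1.0)
print(u_safe, ok)
```

The degenerate branch shows why some condition on the drift term is needed where `a` vanishes: no torque can restore feasibility there.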

If this is right

Global feasibility of the filter holds under the stated drift-decay condition on the control-degeneracy set.
Exponential stability is guaranteed under exact shielding, with a robust extension whose margin depends on the PINN approximation error remaining small.
The three-timescale policy-certificate-multiplier updates converge almost surely to a KKT point.
A PAC generalization bound holds for the certificate over compact sets.
Tracking error on a 2-DOF manipulator drops 41% under nominal friction and 24% under aggressive friction when the learned certificate is active.
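
For readers who want the step from the decay constraint to exponential stability spelled out, the standard comparison argument runs as follows. The constants c_1 and c_2 are hypothetical quadratic bounds on V_ψ introduced for illustration; the paper's own constants and robust margin are not reproduced here.

```latex
\dot V_\psi(x(t)) \le -\alpha V_\psi(x(t))
\;\Longrightarrow\;
V_\psi(x(t)) \le V_\psi(x(0))\, e^{-\alpha t},
\qquad
c_1 \|x\|^2 \le V_\psi(x) \le c_2 \|x\|^2
\;\Longrightarrow\;
\|x(t)\| \le \sqrt{c_2 / c_1}\; \|x(0)\|\, e^{-\alpha t / 2}.
```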

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The warm-start pathology noted in the 7-DOF study suggests that practical deployment may require periodic certificate re-initialization to avoid transient performance loss.
The same closed-form filter structure could be paired with other adaptive baselines beyond the Slotine-Li controller, provided the drift-decay condition can be verified.
Performance gains over exact model-based control appear only inside the training distribution; outside it the learned components are not guaranteed to help.

Load-bearing premise

The drift-decay condition on the control-degeneracy set must hold for the closed-form filter to remain globally feasible.
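
One plausible reading of this premise, assuming control-affine dynamics ẋ = f(x) + g(x)u, is written out below. The symbols a(x), b(x), and 𝒟 are notation introduced here, and the exact statement in the paper may differ.

```latex
\dot V_\psi + \alpha V_\psi
  = \underbrace{L_f V_\psi(x) + \alpha V_\psi(x)}_{b(x)}
  + \underbrace{L_g V_\psi(x)}_{a(x)^\top}\, u \;\le\; 0,
\qquad
\mathcal{D} = \{\, x : L_g V_\psi(x) = 0 \,\},
\qquad
\text{drift-decay (assumed form): } b(x) \le 0 \quad \forall x \in \mathcal{D}.
```

Under this reading, the constraint can always be met by a suitable u wherever a(x) ≠ 0, so feasibility can only fail on 𝒟, which is where the condition is imposed.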

What would settle it

A trajectory on which the filter projection becomes infeasible even though the drift-decay condition is satisfied, or a case of instability when the PINN approximation error lies below the derived stability margin.

Figures

Figures reproduced from arXiv: 2605.06934 by Adriano Fagiolini, Giansalvo Cirrincione.

**Figure 1.** Architecture overview. The SAC policy π_ϕ proposes torques projected onto the safe half-space via a closed-form QP derived from the learned Lyapunov function V_ψ; the PINN f_θ supplies the dynamics model. Solid arrows: forward flow; dashed: dual path. 2.1 Adaptive control of manipulators The passivity-based adaptive controller of Slotine and Li [26] exploits the linear-in-parameters property of rigid-body dyn…

**Figure 2.** Tracking RMSE on payload range [0.2, 1.0] kg under nominal (left) and aggressive friction (right). Dotted line: training centroid p = 0.4 kg. +32.0% at p = 0.2 under aggressive friction (0.272 vs. 0.400), and +13.4% at p = 0.6 under aggressive friction (0.389 vs. 0.450). These gains arise even though the proposed controller is a residual correction to the Slotine–Li signal, demonstrating that the learned L…

read the original abstract

We augment the Slotine--Li adaptive controller for Euler--Lagrange systems with three learned components: a structured-quadratic Lyapunov function \(V_\psi\) whose positive-definiteness follows from a Cholesky parameterization, a residual Soft Actor--Critic policy that adds bounded torque corrections to the analytic baseline, and a physics-informed neural network that estimates unmodeled dynamics. A closed-form safety filter, derived from the single affine constraint \(\dot V_\psi + \alpha V_\psi \le 0\), projects every policy output onto the safe set without requiring an online QP solver. We prove: global feasibility of the filter under a drift-decay condition on the control-degeneracy set; exponential stability under exact shielding, with a robust extension whose margin depends on the PINN approximation error; almost-sure convergence of the three-timescale policy--certificate--multiplier updates to a KKT point; and a PAC generalization bound for the certificate over compacts. On a 2-DOF manipulator with nonlinear friction and variable payload, the learned certificate accounts for most of the empirical gain: tracking error drops by 41\% on nominal friction and 24\% on aggressive friction at the centroid of the training distribution. A 7-DOF scalability study on a Franka Emika Panda confirms clean convergence of the full pipeline at industrial scale, identifies the conditions under which gains over exact model-based baselines should and should not be expected, and documents a warm-start pathology of the learned certificate that has practical implications for deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper combines a Cholesky-structured learned Lyapunov function, residual SAC corrections, and a closed-form single-constraint filter for the Slotine-Li controller, but the global feasibility claim depends on a drift-decay condition that is never verified for the trained networks.

The core contribution is a pipeline that augments the classic Slotine-Li adaptive law with three learned pieces: a quadratic Lyapunov function kept positive definite by Cholesky parameterization, a residual Soft Actor-Critic policy that supplies bounded torque corrections, and a physics-informed network that estimates the unmodeled dynamics. The safety filter is then a simple projection onto the single affine constraint coming from the Lie derivative of that Lyapunov function, avoiding any online quadratic program. They state proofs for global feasibility under a drift-decay condition, exponential stability when the shield is exact, almost-sure convergence of the three-timescale updates to a KKT point, and a PAC bound on the certificate. On the 2-DOF manipulator the tracking error drops by 41% under nominal friction and 24% under aggressive friction, and the 7-DOF Franka study shows the pipeline scales without obvious numerical blow-up. That combination of structured certificate, residual policy, and closed-form filter is not a direct copy of any single prior result cited in the abstract, and the empirical numbers on nominal versus aggressive friction give a concrete sense of where the learned certificate helps most.

The main weakness is that the drift-decay condition required for the filter to stay feasible everywhere is never verified for the trained V_ψ or on the actual trajectories. The abstract treats it as given, yet nothing shows that the learned function satisfies it on a positive-measure set or that the PINN error stays below the threshold needed for the robust margin. Without that check the global safety guarantee is conditional rather than demonstrated. The KKT convergence and PAC bound are formally stated but inherit the same limitation.

This work is aimed at researchers who already know the Slotine-Li controller and want a practical route to add learned components while keeping a safety filter that runs in closed form. It deserves a serious referee because the pipeline is concrete, the experiments are at realistic robot scales (2-DOF and 7-DOF manipulators), and the claimed proofs are specific enough that a reviewer can check whether the missing verification can be supplied or whether the condition is in fact automatic. I would send it out rather than desk-reject.

Referee Report

2 major / 3 minor

Summary. The paper augments the Slotine-Li adaptive controller for Euler-Lagrange systems with three learned components: a Cholesky-parameterized quadratic Lyapunov function V_ψ, a residual Soft Actor-Critic policy adding bounded torque corrections, and a PINN estimating unmodeled dynamics. A closed-form safety filter is derived from the single affine constraint V̇_ψ + α V_ψ ≤ 0 and projects policy outputs onto the safe set without an online QP. The authors claim proofs of global feasibility under a drift-decay condition on the control-degeneracy set, exponential stability under exact shielding with a robust margin depending on PINN error, almost-sure convergence of three-timescale updates to a KKT point, and a PAC generalization bound for the certificate. On a 2-DOF manipulator, tracking error drops 41% (nominal friction) and 24% (aggressive friction) at the training centroid; a 7-DOF scalability study on the Franka Emika Panda is also reported.

Significance. If the proofs hold and the unverified conditions are satisfied, the work offers a practical route to combine adaptive control with learned certificates and closed-form shielding, avoiding online optimization while retaining formal safety margins. The structured Cholesky parameterization, three-timescale KKT convergence, and empirical gains on manipulators with friction/payload variation are strengths; the avoidance of QP solvers aids real-time robotics deployment. The PAC bound and robust extension could support generalization claims if the PINN error is quantified.

major comments (2)

[Abstract and feasibility proof] Abstract and feasibility theorem: global feasibility of the closed-form filter is stated to follow from a drift-decay condition on the control-degeneracy set when the learned V_ψ (Cholesky) and PINN dynamics are substituted into the Lie derivatives. No argument or post-training verification is provided that this condition holds on a positive-measure set for the trained networks, nor is it checked on the 2-DOF or 7-DOF trajectories. Violation would make the single-affine projection infeasible, directly undermining the claimed global safety guarantee.
[Robust stability section] Robust stability extension: the exponential stability margin under approximate shielding is stated to depend on the PINN approximation error remaining below an unspecified threshold. Without an a-priori bound, a-posteriori error metric, or sensitivity analysis on the 2-DOF/7-DOF experiments, the robust claim remains conditional on an unquantified assumption.

minor comments (3)

[Notation and assumptions] Clarify notation for the drift-decay condition and its dependence on the learned V_ψ parameters.
[Experiments] Report quantitative PINN residual errors and their relation to the stability margin in the experimental tables.
[Abstract] The abstract is dense; consider moving some proof sketches or condition statements to a dedicated assumptions subsection.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to incorporate the suggested verifications.

Referee: [Abstract and feasibility proof] Abstract and feasibility theorem: global feasibility of the closed-form filter is stated to follow from a drift-decay condition on the control-degeneracy set when the learned V_ψ (Cholesky) and PINN dynamics are substituted into the Lie derivatives. No argument or post-training verification is provided that this condition holds on a positive-measure set for the trained networks, nor is it checked on the 2-DOF or 7-DOF trajectories. Violation would make the single-affine projection infeasible, directly undermining the claimed global safety guarantee.

Authors: The global feasibility result is conditional on the drift-decay condition holding after substitution of the learned V_ψ and PINN. The original manuscript stated the theoretical condition but omitted post-training verification. In revision we will add explicit checks: after training we will evaluate the Lie derivatives along the 2-DOF and 7-DOF trajectories, confirm that the drift-decay inequality holds on a positive-measure subset of the control-degeneracy set, and report the fraction of time steps satisfying the condition. This verification will be placed in the feasibility section and referenced from the abstract. revision: yes
Referee: [Robust stability section] Robust stability extension: the exponential stability margin under approximate shielding is stated to depend on the PINN approximation error remaining below an unspecified threshold. Without an a-priori bound, a-posteriori error metric, or sensitivity analysis on the 2-DOF/7-DOF experiments, the robust claim remains conditional on an unquantified assumption.

Authors: The referee is correct that the robust margin is stated in terms of an unspecified threshold on PINN error. We will revise the robust stability section to compute an a-posteriori L^∞ bound on the PINN residual over the experimental trajectories, derive the resulting explicit stability margin, and include a sensitivity plot showing how tracking error and stability degrade as the PINN error is artificially increased up to and beyond the observed value. This will make the robust claim quantitative for the reported experiments. revision: yes
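
The two post-training checks promised in these responses could look roughly like the sketch below. It is illustrative only: the array names, the tolerance used to decide which steps count as control-degenerate, and the function `posthoc_checks` are all assumptions, and the data in the usage lines are toy values rather than results from the paper.

```python
import numpy as np

def posthoc_checks(a, b, residual, tol=1e-6):
    """Summarize logged trajectory data after training.

    a        : (T, m) hypothetical samples of the control term L_g V
    b        : (T,)   hypothetical samples of the drift term L_f V + alpha * V
    residual : (T, n) hypothetical PINN prediction errors
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    residual = np.asarray(residual, dtype=float)
    # Steps where the control term is numerically zero.
    degenerate = np.linalg.norm(a, axis=1) <= tol
    n_deg = int(degenerate.sum())
    # Fraction of degenerate steps on which the drift term alone decays.
    frac_ok = float((b[degenerate] <= 0.0).mean()) if n_deg else float("nan")
    # Sup-norm of the PINN residual over the whole trajectory.
    linf = float(np.abs(residual).max())
    return {"n_degenerate": n_deg, "frac_drift_decay_ok": frac_ok, "pinn_linf": linf}

# Toy data, not taken from the paper.
report = posthoc_checks(
    a=[[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]],
    b=[-1.0, 5.0, 2.0],
    residual=[[0.1, -0.3], [0.2, 0.0], [0.05, 0.1]],
)
print(report)
```

A trajectory-level check of this kind only covers visited states; it would not by itself establish the condition on the full control-degeneracy set.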

Circularity Check

0 steps flagged

No significant circularity; claims rest on explicit assumptions and standard theory

full rationale

The derivation begins with a Cholesky-parametrized V_ψ that enforces positive-definiteness by construction (standard and independent of data). The closed-form filter is obtained directly from the single affine constraint on Lie derivatives, without reduction to fitted outputs. Global feasibility is stated under the drift-decay condition (an assumption, not derived from the learned networks). Exponential stability under exact shielding includes an explicit robust margin for PINN error (conditional, not tautological). Three-timescale convergence to a KKT point and the PAC bound follow from stochastic approximation and statistical learning results, not from re-using the fitted V_ψ or policy as inputs. No step equates a theorem to a training objective or renames a fit as a prediction. Experiments report empirical gains separately from the theorems.
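
The stochastic-approximation setting behind a three-timescale result typically relies on step sizes that decay at separated rates. The sketch below shows one common polynomial choice; the exponents are illustrative assumptions, and the source does not say which of the policy, certificate, or multiplier updates runs on which timescale.

```python
def step_sizes(k, exps=(0.6, 0.8, 1.0)):
    """Return (fast, middle, slow) step sizes at iteration k >= 0.

    Each exponent lies in (0.5, 1], so every sequence has a divergent sum
    and a convergent sum of squares; strictly increasing exponents make the
    ratio of a slower step to a faster step tend to zero.
    The exponents are illustrative, not taken from the paper.
    """
    return tuple((k + 1) ** (-p) for p in exps)

for k in (9, 999, 99999):
    fast, mid, slow = step_sizes(k)
    print(k, fast, mid, slow, mid / fast, slow / mid)
```

Printing the ratios shows the timescale separation growing with the iteration count, which is the property such convergence arguments lean on.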

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claims rest on standard Lyapunov stability theory, the existence of a drift-decay condition, boundedness of approximation errors, and the ability of neural networks to represent the required functions; several neural-network weight vectors are free parameters fitted during training.

free parameters (1)

neural network weights for V_ψ, policy, and PINN
All three learned components contain trainable parameters whose values are determined by optimization on training trajectories.

axioms (2)

[standard math] Positive-definiteness of V_ψ follows from Cholesky parameterization
Invoked to guarantee the Lyapunov function is a valid certificate without additional constraints.
[domain assumption] Drift-decay condition on the control-degeneracy set
Required for global feasibility of the closed-form safety filter.
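
The first axiom can be illustrated with a small sketch of a Cholesky-style parameterization. It assumes a constant matrix P built from unconstrained parameters plus a small diagonal shift `eps`; both the shift and the state-independent form are simplifications made here, and the paper's structured-quadratic certificate may differ.

```python
import numpy as np

def cholesky_quadratic(theta, n, eps=1e-3):
    """Build P = L @ L.T + eps * I from n(n+1)/2 unconstrained parameters.

    L is lower triangular, so L @ L.T is positive semidefinite for any theta;
    the eps shift (an assumption of this sketch) makes P strictly positive
    definite without constraining the parameters.
    """
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = theta
    return L @ L.T + eps * np.eye(n)

def V(x, P):
    """Quadratic candidate V(x) = x^T P x."""
    x = np.asarray(x, dtype=float)
    return float(x @ P @ x)

P = cholesky_quadratic([1.0, -2.0, 0.0], n=2)
print(np.linalg.eigvalsh(P))      # all eigenvalues are positive
print(V([2.0, 1.0], P))           # positive even though L @ L.T is singular here
```

The example parameters are chosen so that L @ L.T alone is singular, which shows why some strictly positive term is needed if positive-definiteness is to hold for every parameter value.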

pith-pipeline@v0.9.0 · 5573 in / 1541 out tokens · 32140 ms · 2026-05-11T00:57:57.059359+00:00 · methodology

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 2 internal anchors

[1] J. Achiam, D. Held, A. Tamar, and P. Abbeel. Constrained policy optimization. In Proc. 34th Int. Conf. on Machine Learning (ICML), vol. 70, pp. 22–31, 2017
[2] E. Altman. Constrained Markov Decision Processes. Chapman and Hall/CRC, Boca Raton, FL, 1999
[3] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada. Control barrier function based quadratic programs for safety critical systems. IEEE Transactions on Automatic Control, 62(8):3861–3876, 2017
[4] B. Amos, L. Xu, and J. Z. Kolter. Input convex neural networks. In Proc. 34th Int. Conf. on Machine Learning (ICML), vol. 70, pp. 146–155, 2017
[5] E. A. Antonelo, E. Camponogara, L. O. Seman, J. P. Jordanou, E. R. de Souza, and J. F. Hübner. Physics-informed neural nets for control of dynamical systems. Neurocomputing, 579:127419, 2024
[6] V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, Cambridge, UK, 2008
[7] Y.-C. Chang, N. Roohi, and S. Gao. Neural Lyapunov control. In Advances in Neural Information Processing Systems (NeurIPS), vol. 32, pp. 3245–3254, 2019
[8] R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick. End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. In Proc. AAAI Conf. on Artificial Intelligence, vol. 33, pp. 3387–3395, 2019
[9] Y. Chow, O. Nachum, E. Duenez-Guzman, and M. Ghavamzadeh. A Lyapunov-based approach to safe reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS), vol. 31, pp. 8092–8101, 2018
[10] C. Dawson, Z. Qin, S. Gao, and C. Fan. Safe nonlinear control using robust neural Lyapunov-barrier functions. In Proc. Conf. on Robot Learning (CoRL), pp. 1724–1735, 2022
[11] C. Dawson, S. Gao, and C. Fan. Learning safe, generalizable perception-based hybrid control with certificates. IEEE Robotics and Automation Letters, 7(2):3140–3147, 2022
[12] Y. Emam, P. Glotfelter, S. Wilson, G. Notomista, and M. Egerstedt. Safe reinforcement learning using robust control barrier functions. IEEE Robotics and Automation Letters, 7(4):11201–11208, 2022
[13] R. Fareh, T. Siddique, K. Choutri, and D. V. Dylov. Physics-informed reward shaped reinforcement learning control of a robot manipulator. Alexandria Engineering Journal, in press, 2025. DOI: 10.1016/j.aej.2025.04.027
[14] S. Greydanus, M. Dzamba, and J. Yosinski. Hamiltonian neural networks. In Advances in Neural Information Processing Systems (NeurIPS), vol. 32, pp. 15353–15363, 2019
[15] H. K. Khalil. Nonlinear Systems, 3rd ed. Prentice Hall, Upper Saddle River, NJ, 2002
[16] V. R. Konda and J. N. Tsitsiklis. On actor-critic algorithms. SIAM Journal on Control and Optimization, 42(4):1143–1166, 2003
[17] J. Liu, P. Borja, and C. Della Santina. Physics-informed neural networks to model and control robots: a theoretical and experimental investigation. Advanced Intelligent Systems, 6(5):2300385, 2024
[18] M. Lutter, C. Ritter, and J. Peters. Deep Lagrangian networks: using physics as model prior for deep learning. In Proc. 7th Int. Conf. on Learning Representations (ICLR), 2019
[19] P. Mestres, K. Long, N. Atanasov, and J. Cortés. Explicit control barrier function-based safety filters and their resource-aware computation. arXiv preprint arXiv:2512.10118, 2025
[20] J. Nicodemus, J. Kneifl, J. Fehr, and B. Unger. Physics-informed neural networks-based model predictive control for multi-link manipulators. IFAC-PapersOnLine, 55(20):331–336, 2022
[21] R. Ortega and M. W. Spong. Adaptive motion control of rigid robots: a tutorial. Automatica, 25(6):877–888, 1989
[22] S. Paternain, L. F. O. Chamon, M. Calvo-Fullana, and A. Ribeiro. Constrained reinforcement learning has zero duality gap. In Advances in Neural Information Processing Systems (NeurIPS), vol. 32, pp. 7555–7565, 2019
[23] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019
[24] A. Ray, J. Achiam, and D. Amodei. Benchmarking safe exploration in deep reinforcement learning. arXiv preprint arXiv:1910.01708, 2019
[25] S. M. Richards, F. Berkenkamp, and A. Krause. The Lyapunov neural network: adaptive stability certification for safe learning of dynamical systems. In Proc. Conf. on Robot Learning (CoRL), pp. 466–476, 2018
[26] J.-J. E. Slotine and W. Li. On the adaptive control of robot manipulators. International Journal of Robotics Research, 6(3):49–59, 1987
[27] M. W. Spong, S. Hutchinson, and M. Vidyasagar. Robot Modeling and Control, 2nd ed. Wiley, Hoboken, NJ, 2020
[28] A. Stooke, J. Achiam, and P. Abbeel. Responsive safety in reinforcement learning by PID Lagrangian methods. In Proc. 37th Int. Conf. on Machine Learning (ICML), pp. 9133–9143, 2020
[29] C. Tessler, D. J. Mankowitz, and S. Mannor. Reward constrained policy optimization. In Proc. 7th Int. Conf. on Learning Representations (ICLR), 2019
[30] E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 5026–5033, 2012
[31] M. J. Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press, Cambridge, UK, 2019
[32] J. Wang, Y. Liu, and J. Luo. PINN-based predictive control combined with unknown payload identification for robots with prismatic quasi-direct-drives. IEEE Robotics and Automation Letters, 2025. DOI: 10.1109/LRA.2025.3589127
[33] J. Wu, H. Dai, L. Yang, and R. Tedrake. Lyapunov-stable neural control for state and output feedback: a novel formulation. arXiv preprint arXiv:2404.07956, 2024
[34] R. Zhou, T. Quartz, H. De Sterck, and J. Liu. Neural Lyapunov control of unknown nonlinear systems with stability guarantees. In Advances in Neural Information Processing Systems (NeurIPS), vol. 35, pp. 29113–29125, 2022