Learned Lyapunov Shielding for Adaptive Control
Pith reviewed 2026-05-11 00:57 UTC · model grok-4.3
The pith
A closed-form safety filter from a learned Lyapunov function lets adaptive controllers safely incorporate neural policies and unmodeled dynamics estimates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By parameterizing a quadratic Lyapunov function with Cholesky factors and combining it with a residual reinforcement-learning policy and a physics-informed network, the authors obtain a closed-form filter that enforces the affine decay constraint without an online solver. They prove global feasibility whenever a drift-decay condition holds, exponential stability under exact shielding with a robustness margin set by the network error, almost-sure convergence of the three-timescale learning to a KKT point, and a PAC bound on the certificate. On a 2-DOF manipulator the approach reduces tracking error by 41% with nominal friction and 24% with aggressive friction, and the method scales cleanly to
What carries the argument
The single affine constraint on the time derivative of the Cholesky-parameterized Lyapunov function, which yields an explicit projection that maps any policy torque onto the safe set.
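The projection can be sketched as follows, assuming a control-affine form in which the derivative of V_ψ splits into a drift term and an input term, so that the constraint reads a + bᵀu ≤ 0 with a = L_f V_ψ + α V_ψ and b = (L_g V_ψ)ᵀ. The function name `shield_torque` and the tolerance `eps` are hypothetical, and this is the standard Euclidean half-space projection, which may differ from the paper's exact filter.

```python
import numpy as np

def shield_torque(u_pi, a, b, eps=1e-9):
    """Project u_pi onto {u : a + b @ u <= 0} in the Euclidean norm.

    a : scalar, assumed to equal L_f V + alpha * V at the current state
    b : vector, assumed to equal the transposed L_g V at the current state
    Returns (u_safe, feasible).
    """
    u_pi = np.asarray(u_pi, dtype=float)
    b = np.asarray(b, dtype=float)
    bb = float(b @ b)
    if bb <= eps:
        # Control-degenerate state: the input cannot change V-dot, so the
        # constraint holds only if the drift term alone decays (a <= 0).
        return u_pi, bool(a <= 0.0)
    violation = a + float(b @ u_pi)
    if violation <= 0.0:
        # The policy torque is already safe, so it passes through unchanged.
        return u_pi, True
    # Minimal-norm correction that lands exactly on the constraint boundary.
    return u_pi - (violation / bb) * b, True
```

The degenerate branch is where a condition on the drift term becomes necessary: when b vanishes, no torque can restore the inequality, so feasibility rests entirely on the sign of a.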
If this is right
- Global feasibility of the filter holds under the stated drift-decay condition on the control-degeneracy set.
- Exponential stability is guaranteed under exact shielding, with a robust extension whose margin depends on the PINN approximation error remaining small.
- The three-timescale policy-certificate-multiplier updates converge almost surely to a KKT point.
- A PAC generalization bound holds for the certificate over compact sets.
- Tracking error on a 2-DOF manipulator drops 41% under nominal friction and 24% under aggressive friction when the learned certificate is active.
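For reference, the step from the decay constraint to exponential stability is the standard comparison argument. The quadratic bounds with constants c_1 and c_2 are an assumption introduced here (they hold for any positive-definite quadratic form), and the notation is illustrative rather than the paper's.

```latex
\dot V_\psi(x(t)) \le -\alpha V_\psi(x(t))
\;\Longrightarrow\;
V_\psi(x(t)) \le V_\psi(x(0))\, e^{-\alpha t},
\qquad
c_1 \|x\|^2 \le V_\psi(x) \le c_2 \|x\|^2
\;\Longrightarrow\;
\|x(t)\| \le \sqrt{c_2 / c_1}\; \|x(0)\|\, e^{-\alpha t / 2}.
```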
Where Pith is reading between the lines
- The warm-start pathology noted in the 7-DOF study suggests that practical deployment may require periodic certificate re-initialization to avoid transient performance loss.
- The same closed-form filter structure could be paired with other adaptive baselines beyond the Slotine-Li controller, provided the drift-decay condition can be verified.
- Performance gains over exact model-based control appear only inside the training distribution; outside it the learned components are not guaranteed to help.
Load-bearing premise
The drift-decay condition on the control-degeneracy set must hold for the closed-form filter to remain globally feasible.
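One way to probe this premise on logged trajectories is sketched below. It assumes per-step values of a = L_f V_ψ + α V_ψ and b = L_g V_ψ are available as arrays. The function name `drift_decay_report`, the degeneracy tolerance `tol`, and the reading of the drift-decay condition as a ≤ 0 wherever ‖b‖ is numerically zero are all assumptions, not the authors' definitions.

```python
import numpy as np

def drift_decay_report(a, b, tol=1e-6):
    """Summarize the drift term on (numerically) control-degenerate steps.

    a : array of shape (T,), assumed drift-plus-decay term per step
    b : array of shape (T, m), assumed input-gain row per step
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # A step counts as degenerate when the input gain is below tol in norm.
    degenerate = np.linalg.norm(b, axis=1) <= tol
    n_deg = int(degenerate.sum())
    if n_deg == 0:
        return {"degenerate_steps": 0, "fraction_ok": None, "worst_drift": None}
    a_deg = a[degenerate]
    return {
        "degenerate_steps": n_deg,
        "fraction_ok": float(np.mean(a_deg <= 0.0)),  # share with a <= 0
        "worst_drift": float(a_deg.max()),            # largest drift value seen
    }
```

A check of this kind only samples the visited states, so a clean report is evidence for the condition along those trajectories and not a proof of it on the whole degeneracy set.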
What would settle it
A trajectory on which the filter projection becomes infeasible even though the drift-decay condition is satisfied, or a case of instability when the PINN approximation error lies below the derived stability margin.
Original abstract
We augment the Slotine--Li adaptive controller for Euler--Lagrange systems with three learned components: a structured-quadratic Lyapunov function \(V_\psi\) whose positive-definiteness follows from a Cholesky parameterization, a residual Soft Actor--Critic policy that adds bounded torque corrections to the analytic baseline, and a physics-informed neural network that estimates unmodeled dynamics. A closed-form safety filter, derived from the single affine constraint \(\dot V_\psi + \alpha V_\psi \le 0\), projects every policy output onto the safe set without requiring an online QP solver. We prove: global feasibility of the filter under a drift-decay condition on the control-degeneracy set; exponential stability under exact shielding, with a robust extension whose margin depends on the PINN approximation error; almost-sure convergence of the three-timescale policy--certificate--multiplier updates to a KKT point; and a PAC generalization bound for the certificate over compacts. On a 2-DOF manipulator with nonlinear friction and variable payload, the learned certificate accounts for most of the empirical gain: tracking error drops by 41\% on nominal friction and 24\% on aggressive friction at the centroid of the training distribution. A 7-DOF scalability study on a Franka Emika Panda confirms clean convergence of the full pipeline at industrial scale, identifies the conditions under which gains over exact model-based baselines should and should not be expected, and documents a warm-start pathology of the learned certificate that has practical implications for deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper augments the Slotine-Li adaptive controller for Euler-Lagrange systems with three learned components: a Cholesky-parameterized quadratic Lyapunov function V_ψ, a residual Soft Actor-Critic policy adding bounded torque corrections, and a PINN estimating unmodeled dynamics. A closed-form safety filter, derived from the single affine constraint V̇_ψ + α V_ψ ≤ 0, projects policy outputs onto the safe set without an online QP. The authors claim proofs of global feasibility under a drift-decay condition on the control-degeneracy set, exponential stability under exact shielding with a robust margin depending on PINN error, almost-sure convergence of three-timescale updates to a KKT point, and a PAC generalization bound for the certificate. On a 2-DOF manipulator, tracking error drops 41% (nominal friction) and 24% (aggressive friction) at the training centroid; a separate 7-DOF scalability study is run on the Franka Emika Panda.
Significance. If the proofs hold and the unverified conditions are satisfied, the work offers a practical route to combine adaptive control with learned certificates and closed-form shielding, avoiding online optimization while retaining formal safety margins. The structured Cholesky parameterization, three-timescale KKT convergence, and empirical gains on manipulators with friction/payload variation are strengths; the avoidance of QP solvers aids real-time robotics deployment. The PAC bound and robust extension could support generalization claims if the PINN error is quantified.
major comments (2)
- [Abstract and feasibility proof] Abstract and feasibility theorem: global feasibility of the closed-form filter is stated to follow from a drift-decay condition on the control-degeneracy set when the learned V_ψ (Cholesky) and PINN dynamics are substituted into the Lie derivatives. No argument or post-training verification is provided that this condition holds on a positive-measure set for the trained networks, nor is it checked on the 2-DOF or 7-DOF trajectories. Violation would make the single-affine projection infeasible, directly undermining the claimed global safety guarantee.
- [Robust stability section] Robust stability extension: the exponential stability margin under approximate shielding is stated to depend on the PINN approximation error remaining below an unspecified threshold. Without an a-priori bound, a-posteriori error metric, or sensitivity analysis on the 2-DOF/7-DOF experiments, the robust claim remains conditional on an unquantified assumption.
minor comments (3)
- [Notation and assumptions] Clarify notation for the drift-decay condition and its dependence on the learned V_ψ parameters.
- [Experiments] Report quantitative PINN residual errors and their relation to the stability margin in the experimental tables.
- [Abstract] The abstract is dense; consider moving some proof sketches or condition statements to a dedicated assumptions subsection.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to incorporate the suggested verifications.
Point-by-point responses
- Referee: [Abstract and feasibility proof] Abstract and feasibility theorem: global feasibility of the closed-form filter is stated to follow from a drift-decay condition on the control-degeneracy set when the learned V_ψ (Cholesky) and PINN dynamics are substituted into the Lie derivatives. No argument or post-training verification is provided that this condition holds on a positive-measure set for the trained networks, nor is it checked on the 2-DOF or 7-DOF trajectories. Violation would make the single-affine projection infeasible, directly undermining the claimed global safety guarantee.
  Authors: The global feasibility result is conditional on the drift-decay condition holding after substitution of the learned V_ψ and PINN. The original manuscript stated the theoretical condition but omitted post-training verification. In revision we will add explicit checks: after training we will evaluate the Lie derivatives along the 2-DOF and 7-DOF trajectories, confirm that the drift-decay inequality holds on a positive-measure subset of the control-degeneracy set, and report the fraction of time steps satisfying the condition. This verification will be placed in the feasibility section and referenced from the abstract.
  revision: yes
- Referee: [Robust stability section] Robust stability extension: the exponential stability margin under approximate shielding is stated to depend on the PINN approximation error remaining below an unspecified threshold. Without an a-priori bound, a-posteriori error metric, or sensitivity analysis on the 2-DOF/7-DOF experiments, the robust claim remains conditional on an unquantified assumption.
  Authors: The referee is correct that the robust margin is stated in terms of an unspecified threshold on PINN error. We will revise the robust stability section to compute an a-posteriori L^∞ bound on the PINN residual over the experimental trajectories, derive the resulting explicit stability margin, and include a sensitivity plot showing how tracking error and stability degrade as the PINN error is artificially increased up to and beyond the observed value. This will make the robust claim quantitative for the reported experiments.
  revision: yes
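The a-posteriori residual metric proposed in the second response could be computed along the lines of the sketch below. It assumes the true unmodeled term is recoverable from the simulator and logged next to the PINN prediction; the function name `empirical_sup_residual` and both argument names are hypothetical.

```python
import numpy as np

def empirical_sup_residual(d_true, d_pinn):
    """Largest per-step Euclidean norm of (d_true - d_pinn).

    Both inputs are arrays of shape (T, n): one row per logged time step.
    Returns (max_norm, step_index_of_max).
    """
    err = np.asarray(d_true, dtype=float) - np.asarray(d_pinn, dtype=float)
    # Flatten any trailing dimensions so each step yields one norm.
    norms = np.linalg.norm(err.reshape(len(err), -1), axis=1)
    return float(norms.max()), int(norms.argmax())
```

The maximum over logged samples can only underestimate the supremum over a compact set, so a figure obtained this way supports the margin for the reported trajectories and would need a covering or Lipschitz argument to extend further.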
Circularity Check
No significant circularity; claims rest on explicit assumptions and standard theory
full rationale
The derivation begins with a Cholesky-parameterized V_ψ that enforces positive-definiteness by construction (standard and independent of data). The closed-form filter is obtained directly from the single affine constraint on Lie derivatives, without reduction to fitted outputs. Global feasibility is stated under the drift-decay condition (an assumption, not derived from the learned networks). Exponential stability under exact shielding includes an explicit robust margin for PINN error (conditional, not tautological). Three-timescale convergence to a KKT point and the PAC bound follow from stochastic approximation and statistical learning results, not from re-using the fitted V_ψ or policy as inputs. No step equates a theorem to a training objective or renames a fit as a prediction. Experiments report empirical gains separately from the theorems.
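Multi-timescale stochastic approximation arguments typically require step sizes that are square-summable but not summable, with each slower rate vanishing relative to the faster one. A minimal sketch follows, assuming polynomial schedules with hypothetical exponents. This summary does not say which of the policy, certificate, and multiplier runs on which timescale, so the names are generic.

```python
def step_sizes(n, exponents=(0.6, 0.8, 1.0)):
    """Return (fast, medium, slow) step sizes at iteration n >= 1.

    Each schedule n**(-p) with 0.5 < p <= 1 is non-summable but
    square-summable; strictly increasing exponents make slow/medium and
    medium/fast tend to zero, which gives the timescale separation.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    p_fast, p_mid, p_slow = exponents
    if not (0.5 < p_fast < p_mid < p_slow <= 1.0):
        raise ValueError("need 0.5 < p_fast < p_mid < p_slow <= 1")
    return n ** -p_fast, n ** -p_mid, n ** -p_slow
```

Calling `step_sizes(k)` inside a training loop and applying the three values to the three parameter groups is enough to reproduce the separation; the exponents trade convergence speed against how cleanly the timescales decouple.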
Axiom & Free-Parameter Ledger
free parameters (1)
- neural network weights for V_ψ, policy, and PINN
axioms (2)
- standard math: Positive-definiteness of V_ψ follows from Cholesky parameterization
- domain assumption: Drift-decay condition on the control-degeneracy set
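The first axiom can be illustrated with a short sketch, assuming the simplest constant-matrix form V(x) = xᵀ(LLᵀ + εI)x with L lower-triangular. The paper's structured-quadratic certificate may be richer than this; the function names and the floor `eps` are hypothetical.

```python
import numpy as np

def cholesky_quadratic(theta, n, eps=1e-3):
    """Build P = L @ L.T + eps * I from n*(n+1)//2 unconstrained parameters."""
    theta = np.asarray(theta, dtype=float)
    if theta.size != n * (n + 1) // 2:
        raise ValueError("theta must have n*(n+1)//2 entries")
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = theta      # fill the lower triangle row by row
    # L @ L.T is positive semidefinite for any theta; the eps floor makes
    # P strictly positive definite even when L has zero diagonal entries.
    return L @ L.T + eps * np.eye(n)

def lyapunov_value(P, x):
    """Evaluate the quadratic form x^T P x."""
    x = np.asarray(x, dtype=float)
    return float(x @ P @ x)
```

Because positive-definiteness holds for every value of `theta`, a gradient-based learner can update the parameters freely without a projection step or an eigenvalue check.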