arxiv: 2604.09004 · v1 · submitted 2026-04-10 · 📡 eess.SY · cs.SY

Synthesizing Safety in Infinite-Horizon Optimal Control for Disturbed High-Relative-Degree Systems via Barrier-Regulating Auxiliary Variables

Pith reviewed 2026-05-10 17:34 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords optimal control · safety-critical systems · control barrier functions · barrier Lyapunov functions · H-infinity control · online critic learning · high relative degree · disturbance attenuation

The pith

Embedding a barrier-regulating auxiliary variable converts safety-constrained infinite-horizon optimal control into an unconstrained problem on an extended state space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to balance long-term performance with strict safety requirements in nonlinear systems that may face disturbances and high relative degree. It achieves this by folding a barrier-Lyapunov safeguarding action directly into the system dynamics through a new auxiliary variable, which removes the need for separate pointwise safety filters. The reformulation allows the use of H-infinity methods in which certain excitations are treated as bounded disturbances, and online critic learning approximates the solution while preserving safety. A sympathetic reader would care because existing filters can trap the system locally when safety actions conflict with nominal control, whereas this approach integrates safety into the core optimization from the start.

Core claim

The paper claims that embedding a BLF-based safeguarding action into the system dynamics and introducing a barrier-regulating auxiliary variable reformulates the original constrained problem as an unconstrained one on an extended state space. For high-relative-degree systems under disturbances, recursive high-order safe-set construction augmented with barrier compensation terms produces a high-order BLF, turning the task into an adversarial disturbance attenuation problem that is solved approximately by safe-exploration-enhanced online critic learning. An adaptive alignment-conditioned tangential excitation orthogonal to the safety direction is added to reduce local trapping and is handled as an admissible L2 disturbance in the H-infinity formulation.

What carries the argument

The barrier-regulating auxiliary variable, which augments the state to embed safety constraints directly into the dynamics and enables the unconstrained H-infinity formulation.
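
A minimal sketch of what such a state augmentation might look like, assuming a control-affine plant. The symbols f, g, k, u_s, z, F, G and K, and the way λ scales the safeguarding action, are illustrative assumptions rather than the paper's equations; only the relation between the auxiliary state λ and the virtual input v comes from the source (caption of Figure 6).

```latex
\begin{aligned}
\dot{x} &= f(x) + g(x)\bigl(u + \lambda\, u_s(x)\bigr) + k(x)\, d
  && \text{(assumed plant with embedded safeguard } u_s\text{)} \\
\dot{\lambda} &= v
  && \text{(auxiliary state driven by the virtual input } v\text{)} \\
z &= \begin{bmatrix} x \\ \lambda \end{bmatrix}, \qquad
\dot{z} = F(z) + G(z) \begin{bmatrix} u \\ v \end{bmatrix} + K(z)\, d
  && \text{(extended, unconstrained dynamics)}
\end{aligned}
```

Under these assumptions the optimization runs over the pair (u, v) on the extended state z, and the safety constraint no longer appears as a separate pointwise condition.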

If this is right

  • The original safety-constrained optimal control problem reduces to an unconstrained problem on the extended state space.
  • Local trapping is mitigated because the tangential excitation is incorporated as an admissible L2 disturbance in the H-infinity formulation.
  • High-relative-degree systems under disturbance remain safe through the high-order BLF constructed with recursive safe-set augmentation and compensation terms.
  • Safe-exploration-enhanced online critic learning yields a policy that is approximately optimal while respecting safety.
  • Pointwise myopic behavior of standard CBF safety filters is avoided by integrating the safeguard into the dynamics from the outset.
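
The tangential-excitation idea in the list above can be illustrated with a small sketch. It assumes the excitation is obtained by projecting a probing direction onto the orthogonal complement of the safeguarding direction and scaling it by a misalignment weight that vanishes when the nominal and safeguarding controls point the same way. The function name, the `probe` argument and the weight `0.5 * (1 - cos)` are hypothetical choices, not the paper's design.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def tangential_excitation(u_nom, u_safe, probe, gain=1.0, eps=1e-9):
    """Hypothetical excitation orthogonal to u_safe, scaled by misalignment."""
    n_nom, n_safe = norm(u_nom), norm(u_safe)
    if n_nom < eps or n_safe < eps:
        return [0.0] * len(u_safe)  # no meaningful direction to compare

    # Unit vector along the (assumed) safety direction.
    s = [x / n_safe for x in u_safe]

    # Remove the component of the probe along the safety direction.
    c = dot(probe, s)
    t = [p - c * si for p, si in zip(probe, s)]
    n_t = norm(t)
    if n_t < eps:
        return [0.0] * len(u_safe)  # probe is parallel to the safety direction
    t = [x / n_t for x in t]

    # Alignment in [-1, 1]; the weight is 0 when aligned and 1 when opposed.
    cos_align = max(-1.0, min(1.0, dot(u_nom, u_safe) / (n_nom * n_safe)))
    weight = 0.5 * (1.0 - cos_align)
    return [gain * weight * x for x in t]


# Nominal and safeguarding controls oppose each other: full tangential push.
w_conflict = tangential_excitation([1.0, 0.0], [-1.0, 0.0], [0.3, 1.0])
# Nominal and safeguarding controls agree: excitation switches off.
w_aligned = tangential_excitation([2.0, 0.0], [1.0, 0.0], [0.3, 1.0])
```

In this sketch the excitation never has a component along the safeguarding direction, which is the property that would let it be treated as a separate disturbance channel.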

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same auxiliary-variable reformulation could be applied to other pointwise constraints such as actuator limits or performance envelopes.
  • The approach suggests a route to remove separate safety layers in control stacks for robots or vehicles that require both long-horizon planning and hard safety.
  • Convergence of the critic under real-world model mismatch remains an open question that could be tested by injecting parametric uncertainty into the simulation environment.
  • The method may extend naturally to stochastic or switched systems by treating the additional randomness as further admissible disturbances.

Load-bearing premise

The adaptive tangential excitations and high-order barrier compensation terms can be treated as admissible disturbances in the H-infinity problem without destroying safety guarantees or optimality of the learned policy.

What would settle it

A closed-loop simulation or experiment in which the controlled system enters the unsafe region despite the critic-learned policy, or in which the critic fails to converge while safety is violated, would falsify the claim.
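
A falsification test of this kind can be sketched on a toy problem. The example below assumes a scalar plant with a bounded sinusoidal disturbance and a safe set x ≤ 1. The "safe" policy is a simple robust barrier-style clamp used only as a stand-in, not the paper's critic-learned controller, and all names and numbers are hypothetical.

```python
import math

X_MAX = 1.0  # hypothetical safety limit: the safe set is x <= X_MAX


def h(x):
    """Barrier value; the state is safe while h(x) >= 0."""
    return X_MAX - x


def simulate(policy, x0=0.0, dt=0.01, steps=2000, d_bound=0.2):
    """Forward-Euler rollout of the toy plant x' = u + d."""
    x, xs = x0, [x0]
    for k in range(steps):
        d = d_bound * math.sin(0.5 * k * dt)  # bounded disturbance
        x = x + dt * (policy(x) + d)
        xs.append(x)
    return xs


def first_violation(xs, barrier, tol=0.0):
    """Index of the first sample with barrier(x) < -tol, or None if safe."""
    for k, x in enumerate(xs):
        if barrier(x) < -tol:
            return k
    return None


def unsafe_policy(x):
    # Nominal control drives the state toward x = 2, outside the safe set.
    return 2.0 - x


def safe_policy(x, alpha=5.0, d_bound=0.2):
    # Stand-in safeguard: clamp the nominal input so that h' >= -alpha * h
    # holds for every disturbance with |d| <= d_bound.
    return min(unsafe_policy(x), alpha * h(x) - d_bound)


violation_safe = first_violation(simulate(safe_policy), h)
violation_unsafe = first_violation(simulate(unsafe_policy), h)
```

A non-`None` result for the policy under test would be the kind of counterexample described above; a `None` result on one disturbance realization is only evidence, not a guarantee.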

Figures

Figures reproduced from arXiv: 2604.09004 by Bo Yang, Qi Li, Wei Xiao, Xinping Guan, Zhanglin Shangguan.

Figure 1. Barrier-regulating variable-enhanced optimal control
Figure 2. Effect of safeguarding gain without tangential excitation. Left: …
Figure 3. Adaptive tangential excitation under a weak safeguarding gain ( …
Figure 4. Critic weight evolution with Isaacs residual strip for the first-order …
Figure 5. Control decomposition. The nominal control …
Figure 6. Virtual control input v(t) for the multiplier dynamics. The auxiliary state λ evolves according to $\dot{\lambda} = v$
Figure 9. Adaptive multiplier λ(t) and its state-dependent lower reference λ_ref(x(t)) (implemented via a CBF lower bound). The multiplier increases when approaching the constraint and relaxes afterward.
… which serves as an early-warning threshold that increases the safeguarding authority when the state approaches safety-critical regions. In the present high-order simulation, the parameters are selected as λ_min = 0.5…
Figure 10. Trajectory comparison under the same disturbance realization.
Figure 13. (a) Evolution of the adaptive safeguarding multiplier …
Figure 14. (a) Control signals under safeguarding. The total input is decomposed …
Figure 16. Minefield navigation for the high-relative-degree plant under flow-field disturbance. The three trajectories start from (−9.0, 1.0, 0.8, −2.0), (6.2, 6.2, −1.0, −1.0), and (4.5, −7.0, −1.0, 1.0), respectively.
Original abstract

Optimal stabilization of safety-critical nonlinear systems requires balancing long-term performance and strict safety constraints. Existing quadratic-programming-based control barrier function (CBF) safety filters are point-wise and may exhibit myopic behavior and local trapping when the safeguarding action conflicts with the nominal optimal control. This paper develops a safety-aware infinite-horizon optimal control framework by embedding a barrier-Lyapunov function (BLF)-based safeguarding action into the system dynamics and introducing a barrier-regulating auxiliary variable, thereby reformulating the original constrained problem as an unconstrained one on an extended state space. To mitigate local trapping, we introduce an adaptive alignment-conditioned tangential excitation orthogonal to the safety direction, with activation adaptively modulated by the degree of directional alignment between the nominal and safeguarding controllers, and incorporate it as an admissible $\mathcal{L}_2$ disturbance in an $H_\infty$ formulation. For high-relative-degree systems under disturbances, we further augment the recursive high-order safe-set construction with barrier compensation terms to obtain a high-order BLF and formulate an adversarial disturbance attenuation problem, which is approximately solved via safe-exploration-enhanced online critic learning. Simulations demonstrate reduced local trapping, improved safety-performance trade-offs, and safe operation under disturbances.
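
For orientation, the recursive high-order safe-set construction mentioned in the abstract is usually written as follows in the high-order CBF literature (Xiao and Belta). This is the standard recursion for a constraint h(x) ≥ 0 of relative degree m with class-K functions α_i. The paper's barrier compensation terms are not given in the abstract and are not reproduced here.

```latex
\begin{aligned}
\psi_0(x) &= h(x), \\
\psi_i(x) &= \dot{\psi}_{i-1}(x) + \alpha_i\bigl(\psi_{i-1}(x)\bigr),
  \qquad i = 1, \dots, m, \\
C_i &= \{\, x : \psi_{i-1}(x) \ge 0 \,\}, \qquad i = 1, \dots, m .
\end{aligned}
```

In the standard result, the intersection of the sets C_1, …, C_m is rendered forward invariant when the input keeps ψ_m(x) ≥ 0; how the paper modifies this step under disturbances is not visible from the abstract.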

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a framework for infinite-horizon optimal control of safety-critical nonlinear systems with high relative degree and disturbances. By embedding a barrier-Lyapunov function (BLF) safeguarding action into the dynamics via a barrier-regulating auxiliary variable, the constrained problem is reformulated as an unconstrained optimal control problem on an extended state space. An adaptive alignment-conditioned tangential excitation, orthogonal to the safety direction and modulated by alignment, is introduced and treated as an L2 disturbance in an H∞ adversarial attenuation problem. A high-order BLF with barrier compensation terms is used for high-relative-degree systems, and the problem is solved approximately using safe-exploration-enhanced online critic learning. Simulations illustrate reduced local trapping and improved safety-performance trade-offs.

Significance. If the safety properties are preserved under the adaptive excitation and the critic learning converges to a safe policy, this work could advance the integration of safety constraints into long-horizon optimal control without relying on pointwise QP filters that may cause trapping. The reformulation as an unconstrained problem and the handling of high relative degree via a recursive construction are notable contributions. The paper would be strengthened by machine-checked proofs or explicit bounds on the disturbance class.

major comments (2)
  1. [H∞ formulation and disturbance treatment] The abstract claims that the adaptive alignment-conditioned tangential excitation is incorporated as an admissible L2 disturbance in the H∞ formulation. However, there is no explicit bound or analysis showing that the modulation by the degree of directional alignment ensures the excitation amplitude remains within the L2 gain assumed by the high-order BLF, which is necessary to guarantee that the barrier function stays positive and safety is not violated before critic convergence. This is load-bearing for the claim of strict safety under disturbances.
  2. [Online critic learning and high-order BLF construction] The assumption that the adaptive excitation and high-order compensation terms can be treated as admissible disturbances without destroying safety guarantees or optimality requires a concrete condition or proof that the critic-learning fixed point remains inside the safe set when excitation is active. The abstract supplies no derivations, stability proofs, or quantitative results to verify this.
minor comments (1)
  1. [Abstract] The abstract mentions 'simulations demonstrate' but provides no quantitative metrics or comparison baselines; consider adding specific performance improvements in the abstract if space allows.
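
The first major comment asks whether the excitation stays inside an assumed L2 budget. A numerical sanity check (not a proof) might look like the sketch below, which integrates the squared norm of an excitation signal with the trapezoidal rule and compares it with a budget. The exponentially decaying signal, its parameters and the budget value are hypothetical placeholders for whatever the paper's modulation actually produces.

```python
import math


def l2_norm_squared(signal, t_end, n=20000):
    """Trapezoidal estimate of the integral of ||w(t)||^2 over [0, t_end]."""
    dt = t_end / n
    total = 0.0
    prev = sum(c * c for c in signal(0.0))
    for k in range(1, n + 1):
        cur = sum(c * c for c in signal(k * dt))
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total


def decaying_excitation(t, amp=0.5, rate=1.0):
    # Hypothetical two-channel excitation with an exponentially decaying
    # envelope, so that ||w(t)||^2 = amp^2 * exp(-2 * rate * t).
    envelope = amp * math.exp(-rate * t)
    return [envelope * math.cos(3.0 * t), envelope * math.sin(3.0 * t)]


energy = l2_norm_squared(decaying_excitation, t_end=20.0)
budget = 0.2  # hypothetical admissible L2 energy
within_budget = energy <= budget
```

For this placeholder signal the integral has the closed form amp² / (2 · rate) · (1 − e^(−2 · rate · t_end)), which is roughly 0.125 here, so the numerical estimate can be checked against it.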

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address the major comments point by point below, providing clarifications and indicating the revisions we will implement to strengthen the safety analysis.

Point-by-point responses
  1. Referee: [H∞ formulation and disturbance treatment] The abstract claims that the adaptive alignment-conditioned tangential excitation is incorporated as an admissible L2 disturbance in the H∞ formulation. However, there is no explicit bound or analysis showing that the modulation by the degree of directional alignment ensures the excitation amplitude remains within the L2 gain assumed by the high-order BLF, which is necessary to guarantee that the barrier function stays positive and safety is not violated before critic convergence. This is load-bearing for the claim of strict safety under disturbances.

    Authors: We acknowledge that the current version of the manuscript does not provide an explicit derivation of the L2 bound for the modulated excitation. The adaptive alignment-conditioned tangential excitation is designed to be orthogonal to the safety direction and its amplitude is scaled by a factor that approaches zero as the nominal control aligns with the safeguarding action, thereby limiting its impact on the barrier function. To address this, we will add a new lemma in the revised manuscript that derives an upper bound on the L2 norm of the excitation term as a function of the alignment measure and demonstrates that this bound is compatible with the disturbance attenuation level required by the high-order BLF. This will ensure the barrier remains positive during the transient phase before critic convergence. revision: yes

  2. Referee: [Online critic learning and high-order BLF construction] The assumption that the adaptive excitation and high-order compensation terms can be treated as admissible disturbances without destroying safety guarantees or optimality requires a concrete condition or proof that the critic-learning fixed point remains inside the safe set when excitation is active. The abstract supplies no derivations, stability proofs, or quantitative results to verify this.

    Authors: The manuscript outlines that the safe-exploration-enhanced online critic learning maintains trajectories within the safe set by construction of the high-order BLF with barrier compensation. However, we agree that a more rigorous statement on the fixed point of the critic iteration is needed. In the revision, we will include a theorem providing sufficient conditions on the critic approximation error and the decay rate of the excitation such that the learned policy satisfies the safety constraint. This will involve showing that the H∞ attenuation ensures the BLF derivative remains negative definite within the safe set, with quantitative bounds derived from the learning convergence analysis. revision: yes
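
For reference, the disturbance attenuation property that the exchange above refers to is commonly stated as an L2-gain bound together with a sufficient dissipation inequality. The form below is the standard textbook statement on an extended state z. The symbols V, Q, R and γ are assumptions, since the abstract does not give the paper's cost or attenuation level.

```latex
\begin{aligned}
&\int_0^T \bigl( Q(z) + u^\top R\, u \bigr)\, dt
  \;\le\; \gamma^2 \int_0^T \lVert d \rVert^2\, dt + V\bigl(z(0)\bigr)
  \qquad \text{for all } T \ge 0,\; d \in \mathcal{L}_2[0, T], \\
&\text{which holds if } V \ge 0 \text{ satisfies }\;
\dot{V}(z) \;\le\; \gamma^2 \lVert d \rVert^2 - Q(z) - u^\top R\, u
  \;\text{ along closed-loop trajectories.}
\end{aligned}
```

The referee's concern is about this same quantity: the inequality says nothing by itself about the barrier staying positive unless the excitation's energy is bounded in a way compatible with γ.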

Circularity Check

2 steps flagged

Reformulation to unconstrained problem is self-definitional via auxiliary variable; adaptive excitation treated as disturbance by construction

specific steps
  1. self-definitional [Abstract]
    "by embedding a barrier-Lyapunov function (BLF)-based safeguarding action into the system dynamics and introducing a barrier-regulating auxiliary variable, thereby reformulating the original constrained problem as an unconstrained one on an extended state space"

    The auxiliary variable is defined precisely to extend the state space and absorb the safety constraint, so the resulting unconstrained formulation on the extended space holds by the definition of the variable itself rather than from any independent derivation or first-principles analysis of the original dynamics.

  2. self-definitional [Abstract]
    "we introduce an adaptive alignment-conditioned tangential excitation orthogonal to the safety direction, with activation adaptively modulated by the degree of directional alignment between the nominal and safeguarding controllers, and incorporate it as an admissible L2 disturbance in an H∞ formulation"

    The excitation is constructed with adaptive modulation and then declared admissible for the H-infinity attenuation problem; the admissibility is asserted by construction without an exhibited bound independent of the modulation parameters, making the disturbance class membership tautological with the definition of the excitation term.

full rationale

The central claim reduces the constrained optimal control problem to an unconstrained one on an extended state by introducing a barrier-regulating auxiliary variable whose explicit purpose is to embed the BLF safeguarding action and remove constraints. This matches the self-definitional pattern exactly. The subsequent step of declaring the adaptive alignment-conditioned tangential excitation an admissible L2 disturbance for the H-infinity problem is likewise definitional rather than derived, as the paper states it is incorporated as such without an independent bound showing it remains within the assumed disturbance class. No self-citations or fitted predictions are quoted in the provided text, so the circularity is limited to these two construction steps. The remainder of the derivation (online critic learning, high-order BLF) is not shown to collapse to inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

Abstract-only review prevents exhaustive audit; the method rests on standard nonlinear control assumptions plus newly introduced constructs whose parameters are not specified.

free parameters (1)
  • alignment modulation gain
    Adaptive modulation of tangential excitation depends on directional alignment, implying at least one tunable scalar or function.
axioms (1)
  • domain assumption: The nominal system is control-affine and nonlinear, with well-defined relative degree and bounded disturbances.
    Required for recursive high-order safe-set construction and H-infinity formulation.
invented entities (1)
  • barrier-regulating auxiliary variable (no independent evidence)
    purpose: Extend the state space to convert the constrained safety problem into unconstrained optimal control.
    New variable introduced to embed the BLF safeguarding action directly into the dynamics.

pith-pipeline@v0.9.0 · 5530 in / 1550 out tokens · 137199 ms · 2026-05-10T17:34:20.633052+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
