
arxiv: 2506.22117 · v2 · pith:OHJ7PVQN · submitted 2025-06-27 · 📡 eess.SY · cs.SY

Safe Multi-Agent Navigation via Constrained HJB-Informed Learning

Pith reviewed 2026-05-19 08:10 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords multi-agent navigation · control barrier functions · Hamilton-Jacobi-Bellman equation · graph neural networks · Lagrange multipliers · collision avoidance · drone swarms · safe learning

The pith

HJB-GNN derives graph-dependent Lagrange multipliers from the constrained HJB equation to balance safety and goal-reaching for multi-agent navigation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing multi-agent navigation methods apply safety filters after planning or use heuristic penalties, which often produce overly cautious motion when agents and obstacles interact densely. The paper introduces HJB-GNN, a framework that jointly learns a graph neural network for the control barrier function, a distributed navigation policy, and a value function. It exploits the analytical solution of the constrained Hamilton-Jacobi-Bellman equation to obtain multipliers that depend on the current interaction graph. These multipliers adaptively trade off collision avoidance against goal progress, supporting centralized training yet distributed execution. Simulations and Crazyflie drone experiments show improved safety and task completion in cluttered, previously unseen environments.

Core claim

By exploiting the analytical solution of the constrained HJB equation, the proposed method derives graph-dependent Lagrange multipliers that adaptively balance collision-avoidance and goal-reaching across diverse multi-agent navigation scenarios. The framework learns a GNN-parameterized control barrier function for explicit safety enforcement, a distributed GNN-based navigation policy, and a value function that induces goal-reaching behavior, while supporting centralized training with distributed deployment.
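Distributed deployment of a centrally trained model relies on each agent evaluating the shared GNN only over its local interaction graph. The sketch below illustrates that locality with one round of mean-aggregation message passing over a sensing-radius graph. It is an illustrative toy, not the paper's architecture: the function names, the weights `W_msg` and `W_upd`, the tanh nonlinearity, mean aggregation and the radius rule are all assumptions.

```python
import numpy as np


def interaction_graph(positions: np.ndarray, radius: float) -> np.ndarray:
    """Boolean adjacency: agent i sees agent j (j != i) if ||p_i - p_j|| <= radius."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = dist <= radius
    np.fill_diagonal(adj, False)  # no self-edges
    return adj


def local_message_passing(positions, features, radius, W_msg, W_upd):
    """One round of message passing in which each agent reads only its neighbors.

    positions: (N, 2), features: (N, d), W_msg: (2 + d, m), W_upd: (d + m, o).
    Returns an (N, o) array of per-agent embeddings.
    """
    adj = interaction_graph(positions, radius)
    n = positions.shape[0]
    out = np.empty((n, W_upd.shape[1]))
    for i in range(n):
        nbrs = np.flatnonzero(adj[i])
        if nbrs.size:
            rel = positions[nbrs] - positions[i]              # relative positions
            msg_in = np.concatenate([rel, features[nbrs]], axis=1)
            agg = np.tanh(msg_in @ W_msg).mean(axis=0)        # permutation-invariant
        else:
            agg = np.zeros(W_msg.shape[1])                    # isolated agent
        out[i] = np.tanh(np.concatenate([features[i], agg]) @ W_upd)
    return out
```

Because each row of the output depends only on neighbors within `radius`, the same weights apply unchanged to teams of any size, and moving an agent that lies outside another agent's radius leaves that agent's embedding untouched.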

What carries the argument

Graph-dependent Lagrange multipliers obtained from the analytical solution of the constrained Hamilton-Jacobi-Bellman equation, which balance the safety constraints of the GNN-parameterized control barrier function against the goal-reaching objective of the learned policy.
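To make "analytical solution" concrete, here is the standard KKT calculation for the simplest setting in which a closed form exists. The assumptions are ours and not necessarily the paper's: control-affine dynamics $\dot x = f(x) + g(x)u$, running cost $q(x) + u^\top R u$ with $R \succ 0$, a single CBF constraint with class-$\mathcal{K}$ function $\alpha$, and an undiscounted infinite-horizon value $V$. Write $L_f h = \nabla h^\top f$, $L_g h = \nabla h^\top g$ and $L_g V = \nabla V^\top g$ (row vectors).

```latex
\begin{aligned}
0 &= \min_{u}\; \big\{ q(x) + u^\top R u + \nabla V(x)^\top \big(f(x) + g(x)u\big) \big\}
  \quad \text{s.t.}\quad L_f h(x) + L_g h(x)\,u + \alpha\big(h(x)\big) \ge 0, \\
\mathcal{L}(u,\lambda) &= q + u^\top R u + \nabla V^\top (f + g u)
  - \lambda \big( L_f h + L_g h\, u + \alpha(h) \big), \qquad \lambda \ge 0, \\
\nabla_u \mathcal{L} = 0 &\;\Rightarrow\; u^\star(\lambda) = -\tfrac{1}{2} R^{-1} \big( L_g V^\top - \lambda\, L_g h^\top \big), \\
\lambda^\star &= \max\left\{ 0,\; \frac{L_g h\, R^{-1} L_g V^\top - 2\big(L_f h + \alpha(h)\big)}{L_g h\, R^{-1} L_g h^\top} \right\},
  \qquad L_g h \neq 0.
\end{aligned}
```

$\lambda^\star > 0$ exactly when the unconstrained minimizer $u^\star(0)$ violates the CBF condition, in which case substituting $u^\star(\lambda^\star)$ makes the constraint hold with equality. In a multi-agent setting $h$ (and hence $L_f h$ and $L_g h$) depends on neighbors' states, which is one way such a multiplier becomes graph-dependent; the text above does not confirm that the paper's multiplier has exactly this form.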

If this is right

  • The learned multipliers enable explicit safety enforcement without relying on separate post-hoc filtering steps.
  • Centralized training with distributed deployment allows the same model to scale across teams of different sizes.
  • Performance holds in dense, previously unseen environments rather than only in training scenarios.
  • Real-world drone swarm tests confirm both safety and goal-reaching gains over prior filtering or penalty approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same analytical-HJB multiplier technique could apply to other networked control tasks where agents share a communication graph.
  • Testing the method with moving obstacles or communication dropouts would expose whether the graph-dependent adaptation remains effective.
  • The reliance on GNN parameterization suggests the approach may generalize to other graph-structured multi-agent problems beyond navigation.

Load-bearing premise

An analytical solution to the constrained HJB equation exists and can be directly exploited to derive the graph-dependent multipliers when the control barrier function, policy, and value function are parameterized by graph neural networks.
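The premise is easiest to probe in the setting where the closed form is known to hold. The function below is a minimal sketch, assuming control-affine dynamics, quadratic control cost $u^\top R u$, a single CBF constraint and a linear class-$\mathcal{K}$ function $\alpha h$. The name `closed_form_multiplier` is hypothetical, and the inputs `grad_V` and `grad_h` stand in for gradients that HJB-GNN would obtain from its learned networks. It computes the KKT multiplier of the one-constraint problem, which is not confirmed to be the paper's exact expression.

```python
import numpy as np


def closed_form_multiplier(grad_V, grad_h, f, g, h, R, alpha=1.0):
    """KKT multiplier and input for the single-constraint problem

        min_u  u^T R u + grad_V^T (f + g u)
        s.t.   grad_h^T (f + g u) + alpha * h >= 0

    (the state cost q(x) does not depend on u and is omitted).
    Assumes R is symmetric positive definite and alpha(h) = alpha * h.
    """
    R_inv = np.linalg.inv(R)
    Lf_h = grad_h @ f            # scalar
    Lg_h = grad_h @ g            # shape (m,)
    Lg_V = grad_V @ g            # shape (m,)
    denom = Lg_h @ R_inv @ Lg_h
    if denom <= 1e-12:
        raise ValueError("L_g h is (numerically) zero: u cannot influence the constraint")
    lam = max(0.0, (Lg_h @ R_inv @ Lg_V - 2.0 * (Lf_h + alpha * h)) / denom)
    u = -0.5 * R_inv @ (Lg_V - lam * Lg_h)   # stationarity of the Lagrangian
    return lam, u


# Single integrator (f = 0, g = I, R = I) at the origin, goal (2, 0),
# V = ||x - goal||^2, obstacle at (1, 0) with r = 0.5, h = ||x - o||^2 - r^2.
lam, u = closed_form_multiplier(np.array([-4.0, 0.0]), np.array([-2.0, 0.0]),
                                np.zeros(2), np.eye(2), 0.75, np.eye(2))
# lam == 1.625, u == [0.375, 0.0]
```

Here λ > 0 because the unconstrained input u = (2, 0) would violate the CBF condition; the returned u places ∇h·u + h exactly on the constraint boundary. In HJB-GNN the analogous gradients would come from learned message-passing networks, and whether the expression remains meaningful there is precisely what this premise asks.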

What would settle it

An experiment or simulation in which the derived multipliers produce either frequent collisions or systematic failure to reach goals in dense agent-obstacle interactions, despite the learned functions satisfying the constrained HJB equation, would disprove the adaptive balancing claim.
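Such a test needs two numbers per run: the fraction of agents that never collide, and the fraction that also reach their goals. The sketch below computes both from recorded trajectories, assuming disk-shaped agents, circular obstacles and a fixed goal tolerance. The function name, parameter names and these definitions are assumptions; the paper's "safety rate" and "safe-reaching rate" (Figure 3) may be defined differently, for example over continuous time or with rectangular obstacles.

```python
import numpy as np


def safety_and_reach_rates(traj, goals, obstacles, agent_radius, obs_radius, goal_tol):
    """traj: (T, N, 2) sampled positions; goals: (N, 2); obstacles: (K, 2) circle centers.

    safe[i]:       agent i never comes within 2 * agent_radius of another agent, nor
                   within agent_radius + obs_radius of an obstacle center.
    safe_reach[i]: safe[i] and agent i ends within goal_tol of its goal.
    Returns (mean of safe, mean of safe_reach).
    """
    n = traj.shape[1]
    # Pairwise agent distances at every time step, shape (T, N, N).
    d_aa = np.linalg.norm(traj[:, :, None, :] - traj[:, None, :, :], axis=-1)
    idx = np.arange(n)
    d_aa[:, idx, idx] = np.inf                     # ignore self-distances
    collide_agent = (d_aa < 2 * agent_radius).any(axis=(0, 2))
    if len(obstacles):
        # Agent-to-obstacle distances, shape (T, N, K).
        d_ao = np.linalg.norm(traj[:, :, None, :] - obstacles[None, None, :, :], axis=-1)
        collide_obs = (d_ao < agent_radius + obs_radius).any(axis=(0, 2))
    else:
        collide_obs = np.zeros(n, dtype=bool)
    safe = ~(collide_agent | collide_obs)
    reached = np.linalg.norm(traj[-1] - goals, axis=-1) <= goal_tol
    return safe.mean(), (safe & reached).mean()
```

Checking only sampled positions can miss collisions between samples, so a dense time grid (or a swept-volume check) would be needed before reading the numbers as evidence for or against the claim.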

Figures

Figures reproduced from arXiv: 2506.22117 by Fenglan Wang, Lei He, Lin Zhao, Xinguo Shu.

Figure 1. Comparison between QP-GCBF+ of [32] in (a) and our HJB-GNN in (b). An agent (a gradient-colored circle) aims to reach the goal (blue square) while avoiding static obstacles (gray rectangles) within a 4 m × 4 m region. The black arrow indicates the agent's velocity direction. QP-GCBF+ deadlocks near obstacles, whereas our HJB-GNN approach reaches the goal without collision.
Figure 2. Illustration of our HJB-GNN approach on the unmanned …
Figure 3. Safe-reaching rates ((a), (c), (e)) and safety rates ((b), (d), (f)) for 2D unmanned surface vessel under fixed area width. (a)-(b): …
Figure 5. Safe-reaching rates and safety rates for increasing numbers …
Figure 6. Safe-reaching rates and safety rates for increasing numbers …
Figure 7. Safe-reaching rates and safety rates for increasing numbers …
Figure 9. Hardware experiment results: Snapshots of Crazyflie drones …
Figure 10. Hardware experiment results: Minimum distances between …
Figure 11. Simulation trajectories of all Crazyflie drones correspond …
Original abstract

Multi-agent navigation in unknown and cluttered environments has broad applications, yet remains fundamentally challenging. In particular, dense agent-agent and agent-obstacle reactive interactions can exacerbate the inherent competition between collision-avoidance constraints and goal-reaching objectives. Most existing approaches mitigate this by applying per-step safety filtering on top of a predefined goal-reaching controller or by designing heuristic loss functions that penalize violations of the safety-constraint gradient condition. While effective in sparse environments, these methods still suffer from overly conservative behaviors when interactions become dense. To overcome these limitations, we propose HJB-GNN, a Hamilton-Jacobi-Bellman (HJB)-based learning framework that jointly learns a graph neural network (GNN)-parameterized control barrier function for explicit safety enforcement, a distributed GNN-based navigation policy, and a value function that induces goal-reaching behavior. By exploiting the analytical solution of the constrained HJB equation, the proposed method derives graph-dependent Lagrange multipliers that adaptively balance collision-avoidance and goal-reaching across diverse multi-agent navigation scenarios. Moreover, HJB-GNN supports centralized training with distributed deployment. Extensive simulations and real-world experiments with Crazyflie drone swarms demonstrate its superior safety and goal-reaching performance, as well as strong scalability and generalizability to large-scale teams operating in previously unseen, dense environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes HJB-GNN, a framework for safe multi-agent navigation in cluttered environments. It jointly learns a GNN-parameterized control barrier function for safety, a distributed GNN navigation policy, and a value function for goal-reaching. By exploiting an analytical solution to the constrained HJB equation, it derives graph-dependent Lagrange multipliers to adaptively balance collision avoidance and goal reaching. The method supports centralized training with distributed deployment and is validated in simulations and Crazyflie drone experiments showing improved safety and scalability in dense, unseen settings.

Significance. If the analytical exploitation of the constrained HJB holds under GNN parameterization, the work provides a principled mechanism for trading off safety and performance in multi-agent systems, potentially reducing conservatism compared to per-step filtering or heuristic penalties while enabling scalable distributed execution.

major comments (2)
  1. [Abstract and §3 (Method)] The central claim relies on exploiting an analytical solution of the constrained HJB to obtain graph-dependent Lagrange multipliers. However, when the CBF, policy, and value function are all replaced by GNN parameterizations, the resulting PDE is defined over a high-dimensional graph-structured state space with message-passing nonlinearities. Standard closed-form solutions for constrained HJB (typically requiring linear dynamics or specific quadratic forms) do not automatically transfer; the manuscript must either prove preservation of the analytical form or demonstrate that the derived multipliers remain valid despite the approximation.
  2. [§4 (Experiments)] The reported superior safety and goal-reaching performance in dense scenarios is promising, but without ablation or verification showing that the Lagrange multipliers adaptively balance the objectives as predicted by the analytical derivation (rather than emerging from the learned GNN components alone), it is unclear whether the theoretical advantage is realized or if the method reduces to a data-driven heuristic.
minor comments (2)
  1. [Notation] Clarify in the notation section how the graph-dependent multipliers are computed from the GNN outputs and whether they remain independent of the fitted value function.
  2. [§3] Add a brief discussion of the conditions under which the analytical HJB solution is assumed to exist for the GNN-parameterized case.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address each major comment below and have revised the manuscript to provide additional clarifications and empirical verification.

Point-by-point responses
  1. Referee: [Abstract and §3 (Method)] The central claim relies on exploiting an analytical solution of the constrained HJB to obtain graph-dependent Lagrange multipliers. However, when the CBF, policy, and value function are all replaced by GNN parameterizations, the resulting PDE is defined over a high-dimensional graph-structured state space with message-passing nonlinearities. Standard closed-form solutions for constrained HJB (typically requiring linear dynamics or specific quadratic forms) do not automatically transfer; the manuscript must either prove preservation of the analytical form or demonstrate that the derived multipliers remain valid despite the approximation.

    Authors: We thank the referee for this insightful comment on the validity of the analytical derivation under GNN parameterization. The constrained HJB is solved analytically to obtain the explicit form of the graph-dependent Lagrange multiplier from the optimality condition, which depends on the gradients of the value function and CBF. The GNNs approximate these functions while the training objective enforces that the HJB residual is driven to zero; the multiplier is then evaluated using the same closed-form expression at each step. This structural derivation does not require the functions to be exactly linear or quadratic, only that they satisfy the HJB condition at optimality. We have added a remark and expanded derivation in the revised Section 3 that clarifies why the multiplier formula remains valid for GNN approximations that approximately satisfy the HJB equation. revision: yes

  2. Referee: [§4 (Experiments)] The reported superior safety and goal-reaching performance in dense scenarios is promising, but without ablation or verification showing that the Lagrange multipliers adaptively balance the objectives as predicted by the analytical derivation (rather than emerging from the learned GNN components alone), it is unclear whether the theoretical advantage is realized or if the method reduces to a data-driven heuristic.

    Authors: We agree that explicit verification of the adaptive role of the Lagrange multipliers would strengthen the connection between theory and experiments. In the revised manuscript we have added (i) an ablation study comparing the full method against a variant with fixed (non-adaptive) multipliers and (ii) plots showing the evolution of the learned multipliers as a function of local agent density during navigation. These results demonstrate that the multipliers increase with interaction density in a manner consistent with the analytical HJB prediction and that removing adaptivity degrades both safety and goal-reaching performance. The new material appears in the updated Section 4. revision: yes
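Point 1 of the rebuttal says the multiplier is evaluated with the closed-form expression while training drives the HJB residual to zero. A toy version of that residual and loss is sketched below, assuming a 2D single integrator, $q = \|x - \text{goal}\|^2$, $R = I$, one circular obstacle with $h = \|x - o\|^2 - r^2$, $\alpha(h) = h$, and a value candidate $V = p\,\|x - \text{goal}\|^2$ with a single scalar parameter $p$ in place of a GNN. All constants and function names are hypothetical, and the multiplier formula is the standard single-constraint KKT expression, not necessarily the paper's.

```python
import numpy as np

# Hypothetical toy setup.
GOAL = np.array([2.0, 0.0])
OBS, OBS_R, ALPHA = np.array([1.0, 1.0]), 0.5, 1.0


def hjb_residual(x, p):
    """Constrained-HJB residual q + u*^T u* + grad_V^T u* for dx/dt = u (f = 0, g = I, R = I),
    with V = p * ||x - GOAL||^2 and u* from the closed-form KKT multiplier.
    Undefined at x == OBS, where grad_h vanishes."""
    e = x - GOAL
    grad_V = 2.0 * p * e
    h = np.sum((x - OBS) ** 2) - OBS_R ** 2
    grad_h = 2.0 * (x - OBS)
    lam = max(0.0, (grad_h @ grad_V - 2.0 * ALPHA * h) / (grad_h @ grad_h))
    u = -0.5 * (grad_V - lam * grad_h)
    return e @ e + u @ u + grad_V @ u


def hjb_loss(states, p):
    """Mean squared residual over sampled states: the quantity training would drive to zero."""
    return np.mean([hjb_residual(x, p) ** 2 for x in states])
```

Where the CBF constraint is inactive, $p = 1$ (the unconstrained LQR value) makes the residual vanish; at the origin, where the constraint binds, the residual at $p = 1$ is 0.6328125, so a learned $V$ must deviate from the unconstrained solution near obstacles. In this toy, the closed-form multiplier is only as trustworthy as the learned functions' approximate satisfaction of the HJB equation, which is the substance of the referee's concern.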

Circularity Check

0 steps flagged

No significant circularity; derivation uses claimed analytical HJB solution as independent step

Full rationale

The paper jointly parameterizes CBF, policy, and value function via GNNs, then states that it exploits an analytical solution of the constrained HJB equation to derive graph-dependent Lagrange multipliers that balance collision avoidance and goal reaching. This step is presented as following from the analytical form rather than redefining the multipliers in terms of the fitted GNN outputs or treating a fitted quantity as a prediction. No equation or passage in the provided text reduces the multipliers to a self-definition, a direct fit to the value function, or a self-citation chain that bears the central load. The derivation therefore remains self-contained against the external benchmark of the constrained HJB equation and does not exhibit the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the domain assumption that constrained HJB equations admit exploitable analytical solutions when functions are GNN-parameterized, plus the modeling choice that GNNs can represent the required barrier, policy, and value functions for multi-agent interactions.

axioms (1)
  • domain assumption: The constrained Hamilton-Jacobi-Bellman equation admits an analytical solution that yields graph-dependent Lagrange multipliers when control barrier functions, policies, and value functions are GNN-parameterized.
    Invoked in the abstract to derive adaptive balancing of collision avoidance and goal reaching.

pith-pipeline@v0.9.0 · 5764 in / 1414 out tokens · 68325 ms · 2026-05-19T08:10:40.283471+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Synthesizing Safety in Infinite-Horizon Optimal Control for Disturbed High-Relative-Degree Systems via Barrier-Regulating Auxiliary Variables

    eess.SY 2026-04 unverdicted novelty 5.0

    A framework reformulates safety-constrained infinite-horizon optimal control as an unconstrained problem on an extended state space using barrier-Lyapunov functions, auxiliary variables, adaptive excitation, and onlin...

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · cited by 1 Pith paper

  [1] H. Abaunza, P. Castillo, and S. V. Drakunov. Quadrotor fleet autonomous navigation: Fusing virtual points control and nonlinear potential fields. IEEE Transactions on Control Systems Technology, 33(3):903–914, 2025.
  [2] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada. Control barrier function based quadratic programs for safety critical systems. IEEE Transactions on Automatic Control, 62(8):3861–3876, 2017.
  [3] S. Bandyopadhyay and S. Bhasin. HJB based online safe reinforcement learning for state-constrained systems. arXiv e-prints, 2023.
  [4] M. H. Cohen and C. Belta. Safe exploration in model-based reinforcement learning using control barrier functions. Automatica, 147:110684, 2023.
  [5] M. Dawood, S. Pan, N. Dengler, S. Zhou, A. P. Schoellig, and M. Bennewitz. Safe multi-agent reinforcement learning for behavior-based cooperative navigation. IEEE Robotics and Automation Letters, 10(6):6256–6263, 2025.
  [6] H. Farivarnejad, A. S. Lafmejani, and S. Berman. Local navigation-like functions for safe robot navigation in bounded domains with unknown convex obstacles. Automatica, 161:111452, 2024.
  [7] J. Fu, G. Wen, and X. Yu. Safe consensus tracking with guaranteed full state and input constraints: A control barrier function-based approach. IEEE Transactions on Automatic Control, 68(12):8075–8081, 2023.
  [8] Z. Gao, G. Yang, and A. Prorok. Online control barrier functions for decentralized multi-agent navigation. In 2023 International Symposium on Multi-Robot and Multi-Agent Systems (MRS), pages 107–113, 2023.
  [9] M. Han, Y. Tian, L. Zhang, J. Wang, and W. Pan. Reinforcement learning control of constrained dynamic systems with uniformly ultimate boundedness stability guarantee. Automatica, 129:109689, 2021.
  [10] F. Huang, X. Chen, and Z. Chen. Cooperative target enclosing control for multiple unmanned surface vehicles with unknown dynamics and safety assurance. IEEE Transactions on Intelligent Vehicles, 2024.
  [11] M. Jankovic, M. Santillo, and Y. Wang. Multiagent systems with CBF-based controllers: Collision avoidance and liveness from instability. IEEE Transactions on Control Systems Technology, 32(2):705–712, 2024.
  [12] H. K. Khalil. Nonlinear systems. Prentice-Hall, New Jersey, 1996.
  [13] Y. Ma, Q. Khan, and D. Cremers. Multi agent navigation in unconstrained environments using a centralized attention based graphical neural network controller. In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), pages 2893–2900, 2023.
  [14] P. Mestres, C. Nieto-Granda, and J. Cortés. Distributed safe navigation of multi-agent systems using control barrier function-based controllers. IEEE Robotics and Automation Letters, 9(7):6760–6767, 2024.
  [15] R. Namerikawa, A. Wiltz, F. Mehdifar, T. Namerikawa, and D. V. Dimarogonas. On the equivalence between prescribed performance control and control barrier functions. In 2024 American Control Conference (ACC), pages 2458–2463, 2024.
  [16] E. Sabouni, C. G. Cassandras, W. Xiao, and N. Meskin. Optimal control of connected automated vehicles with event/self-triggered control barrier functions. Automatica, 162:111530, 2024.
  [17] S. Safaoui, A. P. Vinod, A. Chakrabarty, R. Quirynen, N. Yoshikawa, and S. Di Cairano. Safe multiagent motion planning under uncertainty for drones using filtered reinforcement learning. IEEE Transactions on Robotics, 40:2529–2542, 2024.
  [18] A. D. Saravanos, Y. Aoyama, H. Zhu, and E. A. Theodorou. Distributed differential dynamic programming architectures for large-scale multiagent control. IEEE Transactions on Robotics, 39(6):4387–4407, 2023.
  [19] K. G. Vamvoudakis and F. L. Lewis. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica, 46(5):878–888, 2010.
  [20] A. P. Vinod, S. Safaoui, T. H. Summers, N. Yoshikawa, and S. Di Cairano. Decentralized, safe, multiagent motion planning for drones under uncertainty via filtered reinforcement learning. IEEE Transactions on Control Systems Technology, 32(6):2492–2499, 2024.
  [21] L. Wang, A. D. Ames, and M. Egerstedt. Safety barrier certificates for collisions-free multirobot systems. IEEE Transactions on Robotics, 33(3):661–674, 2017.
  [22] Z. Wang, T. Hu, and L. Long. Multi-UAV safe collaborative transportation based on adaptive control barrier function. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 53(11):6975–6983, 2023.
  [23] S. Wu and L. Long. Obstacle avoidance and safe coverage of moving domains for multiagent systems via adaptive control barrier function. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 55(7):5080–5090, 2025.
  [24] W. Xiao, C. A. Belta, and C. G. Cassandras. Sufficient conditions for feasibility of optimal control problems using control barrier functions. Automatica, 135:109960, 2022.
  [25] B. Yan, P. Shi, C. P. Lim, Y. Sun, and R. K. Agarwal. Security and safety-critical learning-based collaborative control for multiagent systems. IEEE Transactions on Neural Networks and Learning Systems, 36(2):2777–2788, 2025.
  [26] S. Yang, G. J. Pappas, R. Mangharam, and L. Lindemann. Safe perception-based control under stochastic sensor uncertainty using conformal prediction. In 2023 62nd IEEE Conference on Decision and Control (CDC), pages 6072–6078, 2023.
  [27] Y. Yang, Y. Zhang, W. Zou, J. Chen, Y. Yin, and S. E. Li. Synthesizing control barrier functions with feasible region iteration for safe reinforcement learning. IEEE Transactions on Automatic Control, 69(4):2713–2720, 2024.
  [28] C. Yu, H. Yu, and S. Gao. Learning control admissibility models with graph neural networks for multi-agent navigation. In Conference on Robot Learning, pages 934–945. PMLR, 2023.
  [29] L. Zhang, R. Zhang, T. Wu, R. Weng, M. Han, and Y. Zhao. Safe reinforcement learning with stability guarantee for motion planning of autonomous vehicles. IEEE Transactions on Neural Networks and Learning Systems, 32(12):5435–5444, 2021.
  [30] S. Zhang, K. Garg, and C. Fan. Neural graph control barrier functions guided distributed collision-avoidance multi-agent control. In Conference on Robot Learning, pages 2373–2392. PMLR, 2023.
  [31] S. Zhang, O. So, M. Black, and C. Fan. Discrete GCBF proximal policy optimization for multi-agent safe optimal control. In Conference on Learning Representations, 2025.
  [32] S. Zhang, O. So, K. Garg, and C. Fan. GCBF+: A neural graph control barrier function framework for distributed safe multiagent control. IEEE Transactions on Robotics, 41:1533–1552, 2025.
  [33] X. Zhang, W. Pan, C. Li, X. Xu, X. Wang, R. Zhang, and D. Hu. Toward scalable multirobot control: Fast policy learning in distributed MPC. IEEE Transactions on Robotics, 41:1491–1512, 2025.