
arxiv: 2606.01605 · v2 · pith:LT7BKPDXnew · submitted 2026-06-01 · 💻 cs.RO

Embedding Semantic Risk into Distance Fields and CBFs for Online Monocular Safe Control

Pith reviewed 2026-06-28 14:42 UTC · model grok-4.3

classification 💻 cs.RO
keywords: semantic risk · Euclidean signed distance field · control barrier functions · monocular SLAM · safe navigation · risk-aware control · semantic fusion

The pith

Semantic class labels are fused into the Euclidean signed distance field before control optimization so high-risk objects impose larger margins on CBF navigation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to encode object class risk directly into the spatial distance field used by a barrier-function controller instead of treating all obstacles uniformly or adjusting the controller afterward. Geometry is reconstructed from monocular video and semantic labels are fused in, after which class-dependent inflation expands high-risk regions before the ESDF is computed. The resulting field supplies distances and gradients to the CBF while class-dependent gains further shape the response, all at online rates. Readers would care because the risk information is baked into the geometry representation ahead of time, producing context-sensitive avoidance without extra runtime cost.

Core claim

The framework reconstructs dense 3-D geometry from monocular RGB video via foundation-model SLAM, fuses per-frame semantic segmentation labels into the map, applies class-dependent inflation to safety-relevant regions, and computes an ESDF on the inflated geometry. This semantic-aware ESDF supplies the local distances and spatial derivatives required by the CBF controller, with additional class-dependent gains regulating the response, enabling 10-20 Hz online operation and semantic-aware safe behavior in teleoperation and autonomous navigation.

What carries the argument

The semantic-aware ESDF formed by class-dependent inflation of reconstructed geometry before field computation, which encodes risk into distances and derivatives supplied to the CBF.
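The inflate-then-compute ordering can be illustrated with a minimal sketch. It is not the paper's implementation: the 2-D grid, the brute-force distance search, the function names and the radii in `INFLATION_M` are all assumptions chosen for readability, and an online system would use an efficient distance transform instead.

```python
import math

# Hypothetical per-class inflation radii in metres (illustrative values only).
INFLATION_M = {"ball": 0.0, "dog": 0.4}


def inflate(labels, resolution, radii):
    """Grow every labelled cell by its class radius and return a boolean occupancy grid."""
    rows, cols = len(labels), len(labels[0])
    occ = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cls = labels[r][c]
            if cls is None:  # unlabelled cells are treated as free space
                continue
            reach = radii.get(cls, 0.0)
            k = int(math.floor(reach / resolution + 1e-9))  # radius in cells
            for rr in range(max(0, r - k), min(rows, r + k + 1)):
                for cc in range(max(0, c - k), min(cols, c + k + 1)):
                    if math.hypot(rr - r, cc - c) * resolution <= reach + 1e-9:
                        occ[rr][cc] = True
    return occ


def esdf(occ, resolution):
    """Brute-force signed distance: positive in free space, negative inside obstacles."""
    rows, cols = len(occ), len(occ[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    field = [[0.0] * cols for _ in range(rows)]
    for r, c in cells:
        inside = occ[r][c]
        # Distance to the nearest cell whose occupancy state differs from this one.
        nearest = min(
            (math.hypot(r - rr, c - cc) for rr, cc in cells if occ[rr][cc] != inside),
            default=math.inf,
        )
        field[r][c] = (-nearest if inside else nearest) * resolution
    return field


def single_obstacle_row(cls):
    """One-row map, 11 cells wide, with a single labelled cell in the middle."""
    return [[cls if c == 5 else None for c in range(11)]]


RES = 0.2  # metres per cell (assumed)
dog_field = esdf(inflate(single_obstacle_row("dog"), RES, INFLATION_M), RES)
ball_field = esdf(inflate(single_obstacle_row("ball"), RES, INFLATION_M), RES)
```

With these assumed radii, the same query cell reports a shorter distance to the "dog" obstacle than to the "ball" obstacle, even though both occupy the same map cell. The runtime query itself is unchanged; only the field it reads differs.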

If this is right

  • The CBF receives risk-adjusted distances using only standard ESDF queries at runtime.
  • High-risk object classes influence larger spatial regions in the safety field by design.
  • Efficient distance and gradient queries are retained because the representation remains an ESDF.
  • The pipeline achieves 10-20 Hz operation in both simulation and hardware experiments.
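A minimal sketch of how a safety filter could consume such ESDF queries follows. It assumes single-integrator dynamics (the velocity command is applied directly), a barrier h = d - margin and a single active constraint, for which the usual CBF quadratic program has a closed-form solution. The robot model, the function names and the gain values are assumptions, not details taken from the paper.

```python
def esdf_gradient(field, resolution, r, c):
    """Central-difference gradient (d/d_row, d/d_col) at an interior grid cell."""
    d_row = (field[r + 1][c] - field[r - 1][c]) / (2 * resolution)
    d_col = (field[r][c + 1] - field[r][c - 1]) / (2 * resolution)
    return (d_row, d_col)


def cbf_filter(u_nom, dist, grad, alpha, margin=0.0):
    """Return the command closest to u_nom with grad . u + alpha * (dist - margin) >= 0."""
    h = dist - margin
    slack = grad[0] * u_nom[0] + grad[1] * u_nom[1] + alpha * h
    norm_sq = grad[0] ** 2 + grad[1] ** 2
    if slack >= 0.0:
        return u_nom  # nominal command already satisfies the constraint
    if norm_sq == 0.0:
        return u_nom  # zero gradient: no command can change the constraint value
    scale = -slack / norm_sq  # smallest correction along the gradient direction
    return (u_nom[0] + scale * grad[0], u_nom[1] + scale * grad[1])


# Toy field whose distance shrinks as the column index grows (obstacle to the right).
RES = 0.2
field = [[0.7, 0.5, 0.3] for _ in range(3)]
dist = field[1][1]
grad = esdf_gradient(field, RES, 1, 1)

u_nom = (0.0, 1.0)  # drive toward the obstacle at 1 m/s along the column axis
u_low_risk = cbf_filter(u_nom, dist, grad, alpha=3.0)   # assumed gain, low-risk class
u_high_risk = cbf_filter(u_nom, dist, grad, alpha=0.5)  # assumed gain, high-risk class
```

In this sketch a smaller gain makes the filter intervene earlier, so the high-risk command is slowed while the low-risk command passes through unchanged. Whether the paper maps higher risk to smaller or larger gains is not stated in the material above.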

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Continuous updating of the semantic field could support safer motion among moving objects whose classes are re-labeled over time.
  • The inflation rules could be extended to predicted object trajectories to reduce risk in dynamic scenes.
  • Field performance under noisy or incomplete segmentation would indicate how much label accuracy is required for the approach to remain effective.

Load-bearing premise

Per-frame semantic segmentation labels can be fused reliably into the reconstructed 3-D geometry to produce class-dependent inflation that correctly represents risk for the downstream controller.
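One simple way to make this premise concrete is per-voxel majority voting over frames, sketched below. The paper states only that labels are temporally fused with the reconstructed geometry, so the voting rule, the `SemanticVoxelMap` class and the `min_votes` threshold are assumptions used for illustration.

```python
from collections import Counter, defaultdict


class SemanticVoxelMap:
    """Accumulates per-frame class observations and reports a stable label per voxel."""

    def __init__(self, min_votes=3):
        self.votes = defaultdict(Counter)  # voxel index -> class vote counts
        self.min_votes = min_votes         # votes required before a label is trusted

    def integrate(self, observations):
        """Add one frame of (voxel_index, class_label) observations."""
        for voxel, label in observations:
            self.votes[voxel][label] += 1

    def label(self, voxel):
        """Return the majority label, or None until it has at least min_votes votes."""
        counts = self.votes.get(voxel)
        if not counts:
            return None
        best, n = counts.most_common(1)[0]
        return best if n >= self.min_votes else None


fused = SemanticVoxelMap(min_votes=3)
for _ in range(3):
    fused.integrate([((1, 2, 0), "dog")])
# A single conflicting frame, plus a voxel that has been seen only once.
fused.integrate([((1, 2, 0), "ball"), ((0, 0, 0), "ball")])

stable = fused.label((1, 2, 0))
unconfirmed = fused.label((0, 0, 0))
```

A threshold of this kind trades latency for robustness: a mislabelled frame does not immediately change the inflation applied to a voxel, but a newly seen object is not inflated until enough frames agree.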

What would settle it

A hardware trial in which the robot maintains the same clearance from a high-risk object under semantic inflation as under uniform inflation, or collides despite the inflated field, would show the embedding adds no benefit.

Figures

Figures reproduced from arXiv: 2606.01605 by Dawei Zhang, Nuo Chen, Roberto Tron, Shuo Liu, Zhiwen Fan.

Figure 1: Online semantic-aware safe navigation based on monocular dense SLAM, semantic mapping, and CBFs.
  Early CBF-based safe control methods typically assume that the environment is known or can be represented by simple geometric obstacles, such as circles, spheres, or other analytical sets [1], [2], [5]. These formulations provide clean safety constraints, but their reliance on manually specified obstacle geome…

Figure 2: Overview of the proposed online semantic-aware safe control framework. Monocular RGB frames are processed by semantic segmentation and MASt3R-SLAM-based dense geometry estimation. Semantic labels are temporally fused with reconstructed 3-D geometry, which is integrated into a local TSDF and converted into obstacle-aware occupancy before ESDF construction. Obstacle filtering and class-dependent inflation en…

Figure 3: Teleoperated robot trajectories under different obstacle semantics. (a) Low-risk obstacle (ball), where the robot allows closer interaction with minimal intervention. (b) High-risk obstacle (dog), where the safety filter activates earlier and maintains a larger clearance.
  …matching progress across methods, its clearance advantage from stopping earlier is removed, and the ESDF-based methods achieve comparabl…

Figure 4: Teleoperation results with semantic CBF-based safety filtering. Each row shows the obstacle distance (left), linear velocity (middle), and angular velocity (right). The first row corresponds to the low-risk ball, where the robot approaches with smaller clearance and the safety filter intervenes later. The second row corresponds to the high-risk dog, where the safety filter activates earlier and maintains a…

Figure 5: Overhead view of the navigation experiment. The robot starts from the lower-left corner. (a) The robot completes a loop using the RGB-only SLAM map. (b) An unseen obstacle is introduced along the path, and the robot updates the local ESDF, deviates to avoid it, and continues the task.
  2) Navigation: We further evaluate the proposed framework in an autonomous navigation task to demonstrate its integration w…
Original abstract

We propose an online monocular perception-to-control framework that embeds semantic risk into the distance field used by Control Barrier Function (CBF)-based safe navigation and teleoperation. Many perception-based safety filters assign the same distance-based safety margin to all mapped obstacles or use semantics only as a downstream controller adjustment, rather than encoding semantic risk in the spatial representation. Our framework instead reasons online about obstacle geometry and class-dependent risk by embedding semantic information directly into the Euclidean Signed Distance Field (ESDF). This design encodes semantic risk before control optimization, so high-risk objects exert a larger spatial influence in the safety field while retaining efficient ESDF queries at runtime. Specifically, a foundation-model-based SLAM front end reconstructs dense 3-D geometry from monocular RGB video, while per-frame semantic segmentation provides pixel-level class labels that are fused into the reconstructed geometry. The resulting geometric-semantic representation is then converted into an ESDF, where semantic labels identify safety-relevant regions and impose class-dependent inflation before field computation. The semantic-aware ESDF provides the local distance values and spatial derivatives required by the CBF controller, while class-dependent gains further regulate the controller response. Extensive simulation and hardware experiments demonstrate online operation at 10-20 Hz and semantic-aware safe behavior in both teleoperation and autonomous navigation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes an online monocular perception-to-control pipeline that embeds semantic risk directly into an ESDF by fusing foundation-model SLAM geometry with per-frame semantic segmentation labels, applying class-dependent inflation to the geometry before ESDF computation, and supplying the resulting field and gradients to a CBF controller (with class-dependent gains) for safe teleoperation and navigation. The central claim is that this pre-control encoding yields semantic-aware safety margins while supporting 10-20 Hz runtime operation, as demonstrated in simulation and hardware experiments.

Significance. If the fusion and inflation steps can be shown to produce reliable risk-adjusted margins under realistic monocular perception noise, the approach would offer a useful alternative to post-hoc semantic adjustments in CBF safety filters by shifting class-dependent risk into the spatial representation itself. This could improve safety in environments with heterogeneous obstacles while preserving efficient distance queries.

major comments (2)
  1. [Abstract] The claim that experiments demonstrate 'semantic-aware safe behavior' at 10-20 Hz is not backed by reported evidence: no quantitative metrics, error bars, ablation results, fusion accuracy measures, or closed-loop violation rates under perception noise are given, leaving the central safety claim unsupported.
  2. [Abstract] Fusion and inflation steps: the method assumes that per-frame semantic labels can be fused reliably into monocular dense geometry and that class-dependent inflation produces appropriate CBF margins, yet it provides no analysis of error propagation from scale drift, depth inaccuracies, label confusion, or temporal inconsistency. This assumption is load-bearing for the claim that h(x) ≥ 0 encodes the intended risk-adjusted safety.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger quantitative support and analysis of modeling assumptions. We address each major comment below and will revise the manuscript to incorporate additional metrics and discussion.

point-by-point responses
  1. Referee: [Abstract] The claim that experiments demonstrate 'semantic-aware safe behavior' at 10-20 Hz is not backed by reported evidence: no quantitative metrics, error bars, ablation results, fusion accuracy measures, or closed-loop violation rates under perception noise are given, leaving the central safety claim unsupported.

    Authors: The manuscript reports measured runtimes of 10-20 Hz via timing benchmarks in simulation and on hardware, along with qualitative demonstrations of collision-free teleoperation and navigation that differentiate semantic-aware inflation from uniform margins. We agree, however, that the abstract and results section would benefit from explicit quantitative safety metrics (e.g., minimum achieved distances, closed-loop constraint violation counts, ablation on inflation radii, and fusion accuracy under noise). We will add these metrics, error bars, and baseline comparisons in the revised version to better substantiate the safety claims. revision: yes

  2. Referee: [Abstract] Fusion and inflation steps: the method assumes that per-frame semantic labels can be fused reliably into monocular dense geometry and that class-dependent inflation produces appropriate CBF margins, yet it provides no analysis of error propagation from scale drift, depth inaccuracies, label confusion, or temporal inconsistency. This assumption is load-bearing for the claim that h(x) ≥ 0 encodes the intended risk-adjusted safety.

    Authors: The pipeline relies on off-the-shelf foundation-model SLAM and segmentation whose per-frame outputs are fused into the ESDF; the manuscript does not contain a dedicated propagation analysis or sensitivity study for scale drift, depth error, label noise, or temporal inconsistency. We will revise the manuscript to include an explicit discussion of these error sources and their potential effect on the semantic ESDF and the resulting CBF constraint h(x) ≥ 0. Where data permits, we will also add a limited sensitivity experiment or conservative bounds. revision: yes
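The safety metrics named in the first response (minimum achieved distances and closed-loop constraint violation counts, reported with error bars) could be computed from logged barrier values along the lines of the sketch below. The log format, the function names and the numbers in `runs` are hypothetical; no such data appears in the material above.

```python
import statistics


def run_metrics(h_log):
    """Summarise one run from its logged barrier values h(x_t)."""
    violations = sum(1 for h in h_log if h < 0.0)
    return {
        "min_h": min(h_log),                         # closest approach to the boundary
        "violations": violations,                    # time steps with h(x_t) < 0
        "violation_rate": violations / len(h_log),
    }


def summarize(runs):
    """Aggregate per-run metrics into a mean and sample standard deviation."""
    per_run = [run_metrics(h_log) for h_log in runs]
    mins = [m["min_h"] for m in per_run]
    return {
        "mean_min_h": statistics.mean(mins),
        "std_min_h": statistics.stdev(mins) if len(mins) > 1 else 0.0,
        "total_violations": sum(m["violations"] for m in per_run),
    }


# Hypothetical logs from two runs, for illustration only.
runs = [[0.50, 0.20, 0.10], [0.40, -0.05, 0.30]]
summary = summarize(runs)
```

Running the same summary for semantic and uniform inflation on matched trials would give the kind of baseline comparison the referee asks for.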

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper describes a perception-to-control pipeline that fuses external foundation-model SLAM geometry with per-frame semantic segmentation labels, applies class-dependent inflation, computes an ESDF, and supplies it to a CBF controller. No equations, parameters, or steps are shown that reduce the claimed semantic-risk encoding or safety behavior to fitted values defined inside the paper or to self-citations whose content is itself unverified. The central construction is a design choice whose correctness is asserted to rest on external models rather than internal self-definition or renaming of results. This matches the default expectation of a non-circular pipeline.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the accuracy of external foundation models for SLAM and segmentation plus the appropriateness of manually chosen class-dependent inflation factors; these are not derived inside the paper.

free parameters (2)
  • class-dependent inflation distances
    Abstract states that semantic labels impose class-dependent inflation before field computation; the specific distances per class are not derived from first principles and must be chosen.
  • class-dependent controller gains
    The abstract mentions that class-dependent gains further regulate the controller response; these scalars are introduced without derivation.
axioms (2)
  • domain assumption: Foundation-model SLAM produces sufficiently accurate dense 3-D geometry from monocular RGB video for online use.
    Invoked in the description of the SLAM front-end that reconstructs geometry before semantic fusion.
  • domain assumption: Per-frame semantic segmentation yields reliable pixel-level class labels that survive projection and fusion into the map.
    Required for the step that identifies safety-relevant regions and imposes class-dependent inflation.
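The two free parameters in the ledger amount to a small per-class table, sketched below as it might look in configuration code. The class names follow the ball and dog examples in the figures. The numeric values, the `ClassRisk` type and the conservative fallback for unlisted classes are assumptions rather than settings reported by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassRisk:
    """Hand-chosen safety parameters for one semantic class."""
    inflation_m: float  # extra margin added around the obstacle before the ESDF is built
    cbf_gain: float     # class-dependent gain used in the CBF constraint

    def __post_init__(self):
        if self.inflation_m < 0.0:
            raise ValueError("inflation_m must be non-negative")
        if self.cbf_gain <= 0.0:
            raise ValueError("cbf_gain must be positive")


# Illustrative values only; the paper's actual settings are not given here.
RISK_TABLE = {
    "ball": ClassRisk(inflation_m=0.0, cbf_gain=2.0),
    "dog": ClassRisk(inflation_m=0.4, cbf_gain=0.5),
}

# Assumed design choice: classes missing from the table get the most cautious entry.
DEFAULT_RISK = ClassRisk(inflation_m=0.4, cbf_gain=0.5)


def risk_for(label):
    """Look up the parameters for a class label, falling back to the default."""
    return RISK_TABLE.get(label, DEFAULT_RISK)
```

Keeping both parameters in one table makes explicit that every class adds two tunable scalars, which is the dependence the ledger flags.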

pith-pipeline@v0.9.1-grok · 5767 in / 1556 out tokens · 20131 ms · 2026-06-28T14:42:04.414427+00:00 · methodology


Reference graph

Works this paper leans on

27 extracted references · 3 canonical work pages · 1 internal anchor

  [1] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,” IEEE Trans. Autom. Control, vol. 62, no. 8, pp. 3861–3876, 2017
  [2] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” in Proc. Eur. Control Conf., 2019, pp. 3420–3431
  [3] W. Xiao and C. Belta, “Control barrier functions for systems with high relative degree,” in Proc. IEEE Conf. Decis. Control, 2019, pp. 474–479
  [4] S. Liu, W. Xiao, and C. A. Belta, “Auxiliary-variable adaptive control barrier functions for safety critical systems,” in Proc. IEEE Conf. Decis. Control, 2023, pp. 8602–8607
  [5] K. Long, C. Qian, J. Cortés, and N. Atanasov, “Learning barrier functions with memory for robust safe navigation,” IEEE Robot. Autom. Lett., vol. 6, no. 3, pp. 4931–4938, 2021
  [6] M. Tong, C. Dawson, and C. Fan, “Enforcing safety for vision-based controllers via control barrier functions and neural radiance fields,” in Proc. IEEE Int. Conf. Robot. Automat., 2023, pp. 10 511–10 517
  [7] M. De Sa, P. Kotaru, and K. Sreenath, “Point cloud-based control barrier function regression for safe and efficient vision-based control,” in Proc. IEEE Int. Conf. Robot. Automat., 2024, pp. 366–372
  [8] S. Zhou, S. Papatheodorou, S. Leutenegger, and A. P. Schoellig, “Control-barrier-aided teleoperation with visual-inertial SLAM for safe MAV navigation in complex environments,” in Proc. IEEE Int. Conf. Robot. Automat., 2024, pp. 17 836–17 842
  [9] T. Chen, A. Swann, J. Yu, O. Shorinwa, R. Murai, M. Kennedy, and M. Schwager, “A control barrier function for safe navigation with online Gaussian splatting maps,” in Proc. IEEE Int. Conf. Robot. Automat., 2025, pp. 11 758–11 765
  [10] J. Qian, S. Zhou, N. J. Ren, V. Chatrath, and A. P. Schoellig, “Closing the perception-action loop for semantically safe navigation in semi-static environments,” in Proc. IEEE Int. Conf. Robot. Automat., 2024, pp. 11 641–11 648
  [11] S. Sanyal and K. Roy, “ASMA: An adaptive safety margin algorithm for vision-language drone navigation via scene-aware control barrier functions,” IEEE Robot. Autom. Lett., 2025
  [12] J. Chen and R. Chandra, “Dynamic control barrier function regulation with vision-language models for safe, adaptive, and realtime visual navigation,” arXiv:2603.21142, 2026
  [13] S. Wang, V. Leroy, Y. Cabon, B. Chidlovskii, and J. Revaud, “DUSt3R: Geometric 3D vision made easy,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2024, pp. 20 697–20 709
  [14] V. Leroy, Y. Cabon, and J. Revaud, “Grounding image matching in 3D with MASt3R,” in Proc. Eur. Conf. Comput. Vis., 2024, pp. 71–91
  [15] J. Wang, M. Chen, N. Karaev, A. Vedaldi, C. Rupprecht, and D. Novotny, “VGGT: Visual geometry grounded transformer,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2025
  [16] R. Murai, E. Dexheimer, and A. J. Davison, “MASt3R-SLAM: Real-time dense SLAM with 3D reconstruction priors,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2025, pp. 16 695–16 705
  [17] D. Maggio and L. Carlone, “VGGT-SLAM 2.0: Real-time dense feed-forward scene reconstruction,” arXiv preprint arXiv:2601.19887, 2026
  [18] P. Weinzaepfel, V. Leroy, T. Lucas, R. Brégier, Y. Cabon, V. Arora, L. Antsfeld, B. Chidlovskii, G. Csurka, and J. Revaud, “CroCo: Self-supervised pre-training for 3D vision tasks by cross-view completion,” in Adv. Neural Inf. Process. Syst., vol. 35, 2022, pp. 3502–3516
  [19] H. Cai, J. Li, M. Hu, C. Gan, and S. Han, “EfficientViT: Lightweight multi-scale attention for high-resolution dense prediction,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2023, pp. 17 302–17 313
  [20] W. Xiao, T.-H. Wang, R. Hasani, M. Chahine, A. Amini, X. Li, and D. Rus, “BarrierNet: Differentiable control barrier functions for learning of safe robot control,” IEEE Trans. Robot., vol. 39, no. 3, pp. 2289–2307, 2023
  [21] E. Sabouni, H. Sabbir Ahmad, V. Giammarino, C. G. Cassandras, I. C. Paschalidis, and W. Li, “Reinforcement learning-based receding horizon control using adaptive control barrier functions for safety-critical systems,” in Proc. IEEE Conf. Decis. Control, 2024, pp. 401–406
  [22] Z. Gao, G. Yang, and A. Prorok, “Online control barrier functions for decentralized multi-agent navigation,” in Proc. Int. Symp. Multi-Robot Multi-Agent Syst., 2023, pp. 107–113
  [23] M. Coffey, D. Zhang, R. Tron, and A. Pierson, “Reactive and safe co-navigation with haptic guidance,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2023, pp. 213–220
  [24] M. Zhou, M. Shaikh, V. Chaubey, P. Haggerty, S. Koga, D. Panagou, and N. Atanasov, “Control strategies for pursuit-evasion under occlusion using visibility and safety barrier functions,” in Proc. IEEE Int. Conf. Robot. Automat., 2025, pp. 12 863–12 869
  [25] G. Raja, T. Mökkönen, and R. Ghabcheloo, “Safe control using occupancy grid map-based control barrier function (OGM-CBF),” arXiv:2405.10703, 2024 (listed on this page under the title “OGM-CBF: Occupancy Grid Map-based Control Barrier Function for Safe Mobile Robot Control with Memory of out of View Obstacles”)
  [26] C. Yeshwanth, Y.-C. Liu, M. Nießner, and A. Dai, “ScanNet++: A high-fidelity dataset of 3D indoor scenes,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2023
  [27] S. Diamond and S. Boyd, “CVXPY: A python-embedded modeling language for convex optimization,” J. Mach. Learn. Res., vol. 17, no. 83, pp. 1–5, 2016