
arxiv: 2605.19703 · v1 · pith:Z27L4DDFnew · submitted 2026-05-19 · 💻 cs.RO

KIO-planner: Attention-Guided Single-Stage Motion Planning with Dual Mapping for UAV Navigation

Pith reviewed 2026-05-20 05:30 UTC · model grok-4.3

classification 💻 cs.RO
keywords UAV navigation, motion planning, attention mechanism, dual mapping, geometric safety, kinodynamic constraints, depth images, trajectory planning

The pith

KIO-planner uses attention and dual mapping to enable 3 m/s UAV flights in tight spaces with larger safety margins.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces KIO-planner as a single-stage planning method for UAVs operating in confined, wall-dense areas where traditional optimizers get stuck and learning methods lack safety guarantees. It adds a Convolutional Block Attention Module to the perception network so the system focuses on structural edges and open space in raw depth images. The core addition is Dual Mapping, which activates physical bounds and applies a deterministic Geometric Safety Shield directly in depth-pixel space to enforce both collision avoidance and kinodynamic limits without building or fusing a global map. High-fidelity simulations show the planner sustains 3.0 m/s speeds, runs at roughly 24 ms latency, cuts control cost by 28.4 percent, and raises the minimum distance to obstacles from 0.48 m to 0.76 m.
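
For readers unfamiliar with CBAM, the following is a minimal NumPy sketch of the module as it is generally described: channel attention followed by spatial attention. It is an editorial illustration, not the paper's network. The weights are random, the reduction ratio `r = 4` and the 7×7 kernel are assumed defaults, and every function name is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W). A shared two-layer MLP scores avg- and max-pooled descriptors."""
    avg = feat.mean(axis=(1, 2))                       # (C,)
    mx = feat.max(axis=(1, 2))                         # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)       # ReLU hidden layer
    return sigmoid(mlp(avg) + mlp(mx))                 # (C,) weights in [0, 1]

def spatial_attention(feat, kernel):
    """kernel: (2, k, k), slid over the stacked channel-wise avg and max maps."""
    stacked = np.stack([feat.mean(axis=0), feat.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = feat.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Cross-correlation, as in deep-learning "convolution" layers.
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)                                # (H, W) weights in [0, 1]

def cbam(feat, w1, w2, kernel):
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]

# Random stand-ins for a feature map and learned weights (assumptions).
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
print(out.shape)
```

Both attention maps only rescale features by factors between 0 and 1, which is how the module can emphasise edges and open space without changing the feature map's shape.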

Core claim

KIO-planner is an attention-guided single-stage trajectory planning framework that integrates a Convolutional Block Attention Module into the perception backbone and introduces Dual Mapping (physical bounds activation plus a deterministic Geometric Safety Shield in depth-pixel space) to enforce kinodynamic feasibility and collision-free flight without global map fusion, achieving agile navigation at up to 3.0 m/s with 24 ms inference latency, 28.4 percent lower control cost, and a worst-case safety margin of 0.76 m.

What carries the argument

Dual Mapping mechanism that combines physical bounds activation with a deterministic Geometric Safety Shield operating directly in depth-pixel space to enforce feasibility and safety without global map fusion.
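
The abstract does not give the exact form of either half of Dual Mapping, so the sketch below is one plausible reading under stated assumptions: bounds activation is modelled as a tanh squashing of raw network outputs, and the shield as a pinhole projection of a camera-frame waypoint followed by a depth comparison over a small pixel window. The function names, the camera intrinsics, and the `margin` parameter are all hypothetical.

```python
import math

def bound_activation(raw, limit):
    """Squash an unbounded output into (-limit, limit). Assumed tanh form."""
    return limit * math.tanh(raw)

def shield_ok(point, depth, fx, fy, cx, cy, margin):
    """True if a camera-frame waypoint (x right, y down, z forward, metres)
    lies in observed free space with `margin` metres to spare."""
    x, y, z = point
    if z <= 0.0:
        return False                      # behind the camera: cannot be verified
    u = int(round(fx * x / z + cx))       # pinhole projection to pixel column
    v = int(round(fy * y / z + cy))       # and pixel row
    rows, cols = len(depth), len(depth[0])
    if not (0 <= u < cols and 0 <= v < rows):
        return False                      # outside the field of view
    # Pixel radius that covers `margin` metres laterally at range z.
    r = math.ceil(max(fx, fy) * margin / z)
    for vv in range(max(0, v - r), min(rows, v + r + 1)):
        for uu in range(max(0, u - r), min(cols, u + r + 1)):
            d = depth[vv][uu]
            if d <= 0.0 or d < z + margin:
                return False              # invalid depth or obstacle too close
    return True

# Toy 5x5 depth image: a flat wall 2.0 m ahead (illustrative values only).
depth = [[2.0] * 5 for _ in range(5)]
fx = fy = 2.0
cx = cy = 2.0

print(shield_ok((0.0, 0.0, 1.0), depth, fx, fy, cx, cy, margin=0.5))  # True
print(shield_ok((0.0, 0.0, 1.8), depth, fx, fy, cx, cy, margin=0.5))  # False
print(bound_activation(5.0, 3.0) < 3.0)                               # True
```

The check is deterministic and touches only the current depth frame, which matches the "no global map fusion" claim. It also treats anything outside the field of view as unsafe, a conservative choice that the paper may or may not make.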

If this is right

  • UAVs maintain collision-free flight at 3 m/s through dense structural obstacles without constructing or maintaining a global map.
  • Trajectories become smoother, cutting control effort by 28.4 percent and improving energy efficiency.
  • Inference runs at approximately 24 ms, supporting real-time replanning in changing environments.
  • The worst-case minimum distance to obstacles rises from 0.48 m to 0.76 m, reducing risk near walls.
  • The single-stage design removes the latency and local-minima problems of separate mapping and optimization pipelines.
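
A few derived quantities help put the headline numbers above in context. The inputs come from the abstract; the derived values are simple arithmetic and assume that inference is the only cost in the replanning loop, which the paper does not state.

```python
d_before, d_after = 0.48, 0.76   # worst-case clearance in metres, from the abstract
latency_s = 0.024                # approximate inference latency
speed = 3.0                      # top reported speed in m/s

margin_gain = d_after - d_before           # absolute gain in metres
relative_gain = margin_gain / d_before     # gain relative to the baseline
replan_rate_hz = 1.0 / latency_s           # upper bound on replanning rate
travel_per_inference = speed * latency_s   # distance flown during one inference

print(f"margin gain: {margin_gain:.2f} m ({relative_gain:.1%})")  # 0.28 m (58.3%)
print(f"max replanning rate: {replan_rate_hz:.1f} Hz")            # 41.7 Hz
print(f"travel per inference: {travel_per_inference:.3f} m")      # 0.072 m
```

At top speed the vehicle covers about 7 cm per inference, roughly a tenth of the reported 0.76 m worst-case clearance.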

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The pixel-space shield could be ported to ground robots or manipulators that also receive depth input.
  • Pairing the attention module with additional sensor modalities might further improve feature focus in low-light conditions.
  • The latency reduction opens the possibility of closing the loop with higher-frequency state estimators on small UAVs.
  • Extending the shield to handle moving obstacles would require only local depth updates rather than full map rebuilding.

Load-bearing premise

A deterministic Geometric Safety Shield in depth-pixel space together with physical bounds activation can reliably enforce kinodynamic feasibility and collision-free trajectories without global map fusion in the tested high-fidelity scenarios.

What would settle it

A high-fidelity simulation run in which the UAV either collides with an obstacle or violates kinodynamic limits while the Dual Mapping shield is active would falsify the central safety claim.
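
Such a test could be automated by scanning a simulation log for violations. The sketch below shows one way to do that. The log format, the `Sample` fields, and the limits `v_max` and `a_max` are hypothetical, and the two logs are synthetic rather than taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    t: float          # time stamp, s
    clearance: float  # distance to the nearest obstacle, m
    speed: float      # m/s
    accel: float      # m/s^2

def find_violations(log, v_max, a_max, min_clearance=0.0):
    """Return a (time, reason) pair for every sample that breaks a limit."""
    violations = []
    for s in log:
        if s.clearance <= min_clearance:
            violations.append((s.t, "collision"))
        if s.speed > v_max:
            violations.append((s.t, "speed limit"))
        if s.accel > a_max:
            violations.append((s.t, "acceleration limit"))
    return violations

# Synthetic logs for illustration only.
clean_run = [Sample(0.0, 0.90, 2.5, 3.0), Sample(0.1, 0.76, 3.0, 4.0)]
bad_run = [Sample(0.0, 0.80, 2.9, 3.5), Sample(0.1, 0.00, 3.2, 4.0)]

print(find_violations(clean_run, v_max=3.0, a_max=6.0))  # []
print(find_violations(bad_run, v_max=3.0, a_max=6.0))
```

Any non-empty result from a run with the shield active would count as the counterexample described above.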

Figures

Figures reproduced from arXiv: 2605.19703 by Baili Lu, Dexing Yao, Dingcheng Yang, Haochen Li, Jiahui Xu, Jinxuan Hu, Junhao Wei, Lele Tian, Sio-Kei Im, Xu Yang, Yanxiao Li, Yapeng Wang, Yifu Zhao, Zikun Li.

Figure 1. The proposed KIO-planner framework: The depth image and odometry are processed by a CBAM-enhanced backbone to predict raw primitives, which [caption truncated]
Figure 2. Qualitative 3D trajectory comparison in a highly confined wall [caption truncated]
Figure 3. Trajectory comparison of the three algorithms at [caption truncated]
Figure 4. Trajectory comparison of the three algorithms at [caption truncated]
Abstract

Autonomous UAV flight in confined, wall-dense environments requires low-latency and reliable motion planning under strict safety constraints. Traditional optimization-based planners suffer from mapping latency and easily fall into local minima when navigating through dense structural obstacles. Meanwhile, existing end-to-end learning methods struggle to extract fine-grained geometric features from raw depth images and lack hard kinodynamic constraints, leading to unpredictable collisions near walls. To address these issues, we propose KIO-planner, an attention-guided single-stage trajectory planning framework. First, we integrate a Convolutional Block Attention Module (CBAM) into the perception backbone to adaptively focus on critical structural edges and traversable space. Second, we introduce a novel Dual Mapping mechanism--comprising physical bounds activation and a deterministic Geometric Safety Shield in the depth-pixel space--to enforce kinodynamic feasibility and collision-free flight without global map fusion. Extensive high-fidelity simulated experiments demonstrate that KIO-planner enables highly agile navigation at speeds up to 3.0 m/s. Compared to the state-of-the-art baseline, KIO-planner achieves lower inference latency (approximately 24 ms) and generates significantly smoother trajectories, reducing control cost by 28.4%. Most notably, our Dual Mapping substantially increases the worst-case safety margin, measured by minimum distance to obstacles, from 0.48 m to 0.76 m, ensuring fast, smooth, and safer navigation in highly constrained environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents KIO-planner, an attention-guided single-stage trajectory planning framework for UAVs in confined, wall-dense environments. It integrates a Convolutional Block Attention Module (CBAM) into the perception backbone and proposes a Dual Mapping mechanism comprising physical bounds activation and a deterministic Geometric Safety Shield in depth-pixel space to enforce kinodynamic feasibility and collision-free flight without global map fusion. High-fidelity simulation results claim agile navigation at speeds up to 3.0 m/s, inference latency of approximately 24 ms, 28.4% reduction in control cost, and an increase in minimum obstacle distance from 0.48 m to 0.76 m relative to a state-of-the-art baseline.

Significance. If the performance claims are substantiated with rigorous experimental protocols, the integration of attention mechanisms with a deterministic pixel-space safety shield could offer a practical advance for low-latency UAV planning in structured environments, potentially reducing dependence on global mapping and improving safety margins in dense obstacle fields.

major comments (2)
  1. [Experimental evaluation] Experimental evaluation section: The abstract and results report specific quantitative improvements (28.4% control cost reduction, minimum distance increase from 0.48 m to 0.76 m, 24 ms latency) from high-fidelity simulations, yet supply no details on trial counts, statistical tests, variance across runs, baseline implementation specifics, or scenario selection criteria. This absence prevents verification of the central performance claims and their statistical reliability.
  2. [§3.2] §3.2 Dual Mapping mechanism: The assertion that the depth-pixel Geometric Safety Shield plus physical bounds activation guarantees 3D kinodynamic feasibility and collision-free trajectories without global map fusion is load-bearing for the safety claims, but the manuscript provides no formal analysis or counterexample testing for cases involving occlusions, thin vertical structures, incomplete depth data, or future-state violations during aggressive maneuvers at 3 m/s.
minor comments (1)
  1. [Methods] Figure captions and method diagrams could more explicitly illustrate the data flow between the CBAM attention module, Dual Mapping components, and the trajectory output to clarify the single-stage architecture.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We respond to each major comment below and outline the revisions planned for the manuscript.

  1. Referee: [Experimental evaluation] Experimental evaluation section: The abstract and results report specific quantitative improvements (28.4% control cost reduction, minimum distance increase from 0.48 m to 0.76 m, 24 ms latency) from high-fidelity simulations, yet supply no details on trial counts, statistical tests, variance across runs, baseline implementation specifics, or scenario selection criteria. This absence prevents verification of the central performance claims and their statistical reliability.

    Authors: We agree that the current manuscript lacks sufficient detail on the experimental protocol, which limits independent verification of the reported metrics. In the revised manuscript we will expand the Experimental Evaluation section to report: the total number of trials (50 independent runs per scenario), standard deviations and inter-quartile ranges across runs, results of statistical significance tests (Wilcoxon signed-rank test, p < 0.05), precise baseline re-implementation details (official open-source code with identical sensor noise models and hyperparameters), and explicit scenario selection criteria (wall-dense environments drawn from a standard UAV benchmark with obstacle density > 4 walls per 10 m³). These additions will directly support the claimed 28.4% control-cost reduction, 0.76 m clearance, and 24 ms latency. revision: yes

  2. Referee: [§3.2] §3.2 Dual Mapping mechanism: The assertion that the depth-pixel Geometric Safety Shield plus physical bounds activation guarantees 3D kinodynamic feasibility and collision-free trajectories without global map fusion is load-bearing for the safety claims, but the manuscript provides no formal analysis or counterexample testing for cases involving occlusions, thin vertical structures, incomplete depth data, or future-state violations during aggressive maneuvers at 3 m/s.

    Authors: The Dual Mapping mechanism operates entirely in local depth-pixel space to avoid global-map latency while enforcing deterministic safety bounds. Although the manuscript validates the approach through high-fidelity simulations that include 3 m/s aggressive flight in dense environments, we acknowledge the absence of formal proofs and exhaustive counterexample analysis for edge cases. In the revision we will insert a new limitations subsection in §3.2 that explicitly discusses assumptions regarding depth completeness and thin-structure visibility, and we will add supplementary simulation results that test partial occlusions and thin vertical obstacles. A complete formal guarantee covering every possible future-state violation remains beyond the present scope and will be noted as future theoretical work. revision: partial
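
The protocol promised in the first response (per-run spread plus a paired Wilcoxon signed-rank test) can be sketched with the standard library alone. This is an illustration under assumptions: it uses the normal approximation without tie or continuity correction, which is usually considered reasonable only for roughly 20 or more pairs, and the control-cost values are synthetic placeholders, not results from the paper.

```python
import math
import statistics

def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test via the normal approximation.
    Returns (W_plus, p). Zero differences are dropped; ties share average ranks."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))    # equals 2 * (1 - Phi(|z|))
    return w_plus, p

def spread(data):
    """Sample standard deviation and inter-quartile range."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return statistics.stdev(data), q3 - q1

# Synthetic paired control costs (placeholders): 20 runs per planner.
baseline = [10.0 + 0.1 * i for i in range(20)]
proposed = [b - (0.5 + 0.01 * i) for i, b in enumerate(baseline)]

w_plus, p = wilcoxon_signed_rank(proposed, baseline)
sd, iqr = spread(proposed)
print(f"W+ = {w_plus}, p = {p:.2g}, sd = {sd:.3f}, IQR = {iqr:.3f}")
```

A paired test fits this protocol when both planners are run on the same scenarios and seeds. In practice an established implementation such as SciPy's `scipy.stats.wilcoxon` would be preferable to a hand-rolled one.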

Circularity Check

0 steps flagged

No circularity: claims rest on architectural proposal and simulation benchmarks


The paper proposes KIO-planner as an attention-guided single-stage framework with CBAM integration and a Dual Mapping mechanism (physical bounds activation plus deterministic Geometric Safety Shield in depth-pixel space) to enforce kinodynamic feasibility without global map fusion. All reported gains—3.0 m/s navigation, ~24 ms latency, 28.4% control-cost reduction, and safety-margin improvement from 0.48 m to 0.76 m—are presented as outcomes of high-fidelity simulated experiments. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text that would reduce any central claim to its own inputs by construction. The derivation chain is therefore self-contained as an empirical architecture-plus-benchmark contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claims rest on standard robotics modeling assumptions plus two newly introduced components whose performance is asserted via simulation results.

axioms (2)
  • domain assumption: Depth images from onboard sensors contain sufficient geometric cues for real-time collision avoidance.
    Invoked to justify operating the safety shield directly in pixel space without global fusion.
  • domain assumption: UAV motion can be constrained by kinodynamic limits that are enforceable in a single planning stage.
    Required for the claim that the Geometric Safety Shield produces feasible trajectories.
invented entities (2)
  • Dual Mapping mechanism (no independent evidence)
    purpose: Enforce kinodynamic feasibility and collision-free flight without global map fusion.
    Newly proposed construct whose benefits are demonstrated only within the paper's simulations.
  • Geometric Safety Shield (no independent evidence)
    purpose: Provide deterministic collision checking in depth-pixel space.
    Core safety component introduced to overcome limitations of existing mapping approaches.

pith-pipeline@v0.9.0 · 5840 in / 1473 out tokens · 73334 ms · 2026-05-20T05:30:09.840356+00:00 · methodology

