HAVEN: Hierarchical Adversary-aware Visibility-Enabled Navigation with Cover Utilization using Deep Transformer Q-Networks

Aniket Bera; Damon Conover; Mihir Chauhan

arxiv: 2512.00592 · v2 · submitted 2025-11-29 · 💻 cs.RO

Pith reviewed 2026-05-17 02:55 UTC · model grok-4.3

classification 💻 cs.RO

keywords hierarchical navigation · transformer Q-network · visibility-aware planning · cover utilization · partially observable environments · reinforcement learning · autonomous robotics · subgoal selection

The pith

A hierarchical controller with transformer Q-networks lets agents pick cover-using subgoals that anticipate occlusions better than memoryless planners.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a two-level navigation system, where a transformer-based Q-network selects subgoals from visibility-masked candidates and a potential-field controller executes them, improves success rate, safety margins, and time-to-goal in partially observable settings. It does so by feeding short histories of odometry, goal direction, obstacle proximity, and visibility cues into the network so that exposure penalties and cover rewards shape the high-level decisions. The same feature schema trained in 2D transfers directly to 3D point-cloud input without architectural changes. If this holds, agents can commit to paths that exploit occlusions for protection rather than reacting only to what is currently visible. Ablation results tie the gains to the combination of temporal memory and the visibility-aware candidate generator.

Core claim

The central claim is that a Deep Transformer Q-Network consuming histories of task-aware features can rank candidate subgoals that incorporate masking and exposure penalties, thereby enabling a hierarchical planner to utilize cover and maintain higher safety margins than classical methods or memoryless reinforcement learning while still reaching goals efficiently.
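To make the "histories of task-aware features" idea concrete, here is a minimal sketch of a fixed-length history window that a sequence model could consume. The class name `FeatureHistory`, the zero-padding choice, and the feature layout are assumptions made for illustration; the paper's actual history length and feature encoding are not given on this page.

```python
from collections import deque
from typing import List, Sequence


class FeatureHistory:
    """Hypothetical rolling window of per-step feature vectors.

    Each vector is assumed to concatenate odometry, goal direction,
    obstacle proximity, and visibility cues (layout is illustrative).
    """

    def __init__(self, length: int, feature_dim: int) -> None:
        self.length = length
        self.feature_dim = feature_dim
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._buf: deque = deque(maxlen=length)

    def push(self, features: Sequence[float]) -> None:
        if len(features) != self.feature_dim:
            raise ValueError("unexpected feature dimension")
        self._buf.append([float(v) for v in features])

    def window(self) -> List[List[float]]:
        """Return exactly `length` rows, oldest first, zero-padded at the front."""
        missing = self.length - len(self._buf)
        pad = [[0.0] * self.feature_dim for _ in range(missing)]
        return pad + [list(row) for row in self._buf]
```

A memoryless baseline corresponds to `length=1`; the ablation on temporal memory described above would amount to comparing such a window against a longer one.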

What carries the argument

The argument rests on the Deep Transformer Q-Network, which encodes short histories of odometry, goal direction, obstacle proximity, and visibility cues and produces Q-values for ranking visibility-aware subgoals.
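The ranking step can be sketched as a masked argmax over candidate subgoals. This is an illustration under assumptions, not the authors' implementation: the function name `select_subgoal`, the `exposure_weight` coefficient, and the choice to subtract the exposure penalty at selection time are all hypothetical. The page does not say whether the penalty enters the training reward, the candidate score, or both.

```python
from typing import Optional, Sequence


def select_subgoal(
    q_values: Sequence[float],
    valid_mask: Sequence[bool],
    exposure: Sequence[float],
    exposure_weight: float = 1.0,
) -> Optional[int]:
    """Pick the index of the best unmasked candidate subgoal.

    q_values:        one Q-value per candidate (e.g. from the high-level network)
    valid_mask:      False for candidates removed by masking
    exposure:        assumed per-candidate exposure score (higher = more visible)
    exposure_weight: hypothetical penalty coefficient
    """
    best_idx: Optional[int] = None
    best_score = float("-inf")
    for i, (q, ok, e) in enumerate(zip(q_values, valid_mask, exposure)):
        if not ok:
            continue  # masked candidates are never selected
        score = q - exposure_weight * e
        if score > best_score:  # strict '>' keeps the earliest index on ties
            best_idx, best_score = i, score
    return best_idx  # None when every candidate is masked
```

With `exposure_weight=0.0` the function reduces to a plain masked argmax, which gives a simple way to see how much the penalty changes the chosen subgoal.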

If this is right

Agents reach goals at higher rates while keeping larger minimum distances to obstacles.
Temporal memory in the transformer improves decisions over single-frame inputs.
Visibility penalties in candidate generation reduce exposure without slowing overall progress.
The same trained network operates in both 2D grid worlds and 3D Unity-ROS scenes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same visibility-masking logic could be added to existing potential-field controllers on physical robots to gain anticipatory safety with little extra compute.
If the feature projection generalizes, the approach may reduce the need for separate 3D-specific planners in mixed indoor-outdoor deployments.
Extending the history length or adding adversary models might further improve performance in contested environments.

Load-bearing premise

Projecting 3D point-cloud observations into the identical 2D-derived feature schema works without architectural changes or performance loss.

What would settle it

A controlled 3D trial in which the projected features cause the system to select exposed subgoals and produce measurably lower success rates or safety margins than a native 3D baseline planner would falsify the direct-transfer claim.

Figures

Figures reproduced from arXiv: 2512.00592 by Aniket Bera, Damon Conover, Mihir Chauhan.

**Figure 2.** Proposed hierarchical architecture. The high-level DTQN outputs …

**Figure 3.** 2D training and evaluation setup with 5 enemy agents (red) and …

**Figure 4.** 3D evaluation pipeline. A 2D-trained DTQN + low-level controller …

Original abstract

Autonomous navigation in partially observable environments requires agents to reason beyond immediate sensor input, exploit occlusion, and ensure safety while progressing toward a goal. These challenges arise in many robotics domains, from urban driving and warehouse automation to defense and surveillance. Classical path planning approaches and memoryless reinforcement learning often fail under limited fields of view (FoVs) and occlusions, committing to unsafe or inefficient maneuvers. We propose a hierarchical navigation framework that integrates a Deep Transformer Q-Network (DTQN) as a high-level subgoal selector with a modular low-level controller for waypoint execution. The DTQN consumes short histories of task-aware features, encoding odometry, goal direction, obstacle proximity, and visibility cues, and outputs Q-values to rank candidate subgoals. Visibility-aware candidate generation introduces masking and exposure penalties, rewarding the use of cover and anticipatory safety. A low-level potential field controller then tracks the selected subgoal, ensuring smooth short-horizon obstacle avoidance. We validate our approach in 2D simulation and extend it directly to a 3D Unity-ROS environment by projecting point-cloud perception into the same feature schema, enabling transfer without architectural changes. Results show consistent improvements over classical planners and RL baselines in success rate, safety margins, and time to goal, with ablations confirming the value of temporal memory and visibility-aware candidate design. These findings highlight a generalizable framework for safe navigation under uncertainty, with broad relevance across robotic platforms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper integrates DTQN with visibility penalties and cover use in a hierarchical setup for occluded navigation, with a direct 2D-to-3D projection trick, but lacks numbers and may lose key 3D occlusion details.

The main thing here is a hierarchical RL pipeline that uses a Deep Transformer Q-Network to pick subgoals based on short histories of odometry, goal direction, obstacles, and visibility cues, then passes them to a potential-field low-level controller. It adds masking and exposure penalties to reward staying in cover and avoiding exposure to adversaries. They run it in 2D simulation first, then project 3D point clouds into the same feature format for direct transfer to a Unity-ROS setup without retraining the network. Ablations check the value of the temporal memory and the visibility-aware candidate design.
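For readers unfamiliar with the low-level half of the pipeline, the following is a minimal sketch of one step of a textbook attractive/repulsive potential-field controller tracking a subgoal. The gains `k_att` and `k_rep`, the `influence` radius, and the speed cap are illustrative assumptions; the paper's controller may differ in form and tuning.

```python
import math
from typing import Iterable, Tuple

Vec2 = Tuple[float, float]


def potential_field_step(
    pos: Vec2,
    subgoal: Vec2,
    obstacles: Iterable[Vec2],
    k_att: float = 1.0,
    k_rep: float = 0.5,
    influence: float = 2.0,
    max_speed: float = 1.0,
) -> Vec2:
    """Return a velocity command (vx, vy) toward the subgoal, away from obstacles."""
    # Attractive term: proportional to the offset from the subgoal.
    fx = k_att * (subgoal[0] - pos[0])
    fy = k_att * (subgoal[1] - pos[1])

    # Repulsive term: active only inside the influence radius, grows as d -> 0.
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < influence:
            mag = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d

    # Cap the command so the step stays short-horizon.
    speed = math.hypot(fx, fy)
    if speed > max_speed:
        fx, fy = fx * max_speed / speed, fy * max_speed / speed
    return fx, fy
```

Potential fields are known to stall in local minima, which is one reason to pair them with a high-level selector that keeps supplying nearby, reachable subgoals.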

Referee Report

1 major / 2 minor

Summary. The paper proposes HAVEN, a hierarchical navigation framework for partially observable environments that uses a Deep Transformer Q-Network (DTQN) as a high-level subgoal selector consuming short histories of task-aware features (odometry, goal direction, obstacle proximity, visibility cues). Visibility-aware candidate generation applies masking and exposure penalties to reward cover utilization and anticipatory safety. A modular low-level potential-field controller executes the selected subgoals. The approach is validated in 2D simulation and extended directly to a 3D Unity-ROS environment via projection of point-cloud perception into the identical 2D-derived feature schema, enabling transfer without architectural changes. Results are reported to show consistent gains over classical planners and RL baselines in success rate, safety margins, and time-to-goal, with ablations confirming the contributions of temporal memory and visibility-aware design.

Significance. If the quantitative results hold, the work offers a modular, generalizable framework for safe navigation under uncertainty that explicitly exploits occlusion and cover. The combination of transformer-based temporal reasoning with visibility-aware candidate selection and hierarchical decomposition is a concrete strength that could be adopted across robotic platforms. The direct 2D-to-3D transfer claim, if substantiated, would further increase practical impact.

major comments (1)

[Abstract / 3D Extension] Abstract and § on 3D extension: the central claim that point-cloud projection into the 2D-derived feature schema enables transfer 'without architectural changes' or loss of performance is load-bearing for the reported safety-margin and adversary-aware improvements. Visibility penalties and line-of-sight masking are defined on 2D geometry; the manuscript provides no quantitative check (e.g., cue-fidelity metrics, 3D-vs-2D performance delta, or ablation on vertical occlusion) that the projected features preserve the necessary 3D cover and ray-occlusion information. This assumption therefore remains unverified and directly affects the headline transfer result.

minor comments (2)

[Abstract] Abstract: quantitative improvements are asserted without any numerical values, error bars, statistical tests, or exclusion criteria; adding these (even in summary form) would make the strength of the claims immediately assessable.
[Method] Notation: the precise definition of the visibility cue vector and the exposure penalty coefficient should be stated explicitly (ideally with an equation) rather than described only in prose, to allow reproduction of the candidate-ranking step.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thorough review and for identifying a key point regarding the 3D extension. We address the comment below and will incorporate the suggested quantitative checks into the revised manuscript.

Referee: [Abstract / 3D Extension] Abstract and § on 3D extension: the central claim that point-cloud projection into the 2D-derived feature schema enables transfer 'without architectural changes' or loss of performance is load-bearing for the reported safety-margin and adversary-aware improvements. Visibility penalties and line-of-sight masking are defined on 2D geometry; the manuscript provides no quantitative check (e.g., cue-fidelity metrics, 3D-vs-2D performance delta, or ablation on vertical occlusion) that the projected features preserve the necessary 3D cover and ray-occlusion information. This assumption therefore remains unverified and directly affects the headline transfer result.

Authors: We agree that the 3D transfer claim would be strengthened by explicit quantitative validation of the projected features. The current implementation extracts horizontal slices from the point cloud and computes 2D visibility and proximity within the projected plane to match the original feature schema exactly, preserving the input format for the DTQN and visibility penalties without any architectural modification. However, we acknowledge that the manuscript does not yet report cue-fidelity metrics, direct 3D-vs-2D performance deltas, or an ablation isolating vertical occlusion effects. In the revision we will add these analyses, including a comparison of ray-occlusion accuracy between the projected 2D features and full 3D ray casting, as well as success-rate and safety-margin differences when the same policy is evaluated in the native 3D environment versus its 2D projection. This will directly substantiate the transfer result.

revision: yes
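The slice-and-project step described in the response can be sketched as follows. This is a rough illustration under assumptions, not the authors' code: the function names, the single height band `[z_min, z_max]`, the grid cell size, and the use of Bresenham's line algorithm for 2D line of sight are all hypothetical choices.

```python
import math
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]


def slice_to_grid(
    points: Iterable[Tuple[float, float, float]],
    z_min: float,
    z_max: float,
    cell: float,
    width: int,
    height: int,
) -> Set[Cell]:
    """Mark grid cells hit by any point whose height lies in [z_min, z_max]."""
    occupied: Set[Cell] = set()
    for x, y, z in points:
        if z_min <= z <= z_max:
            cx, cy = math.floor(x / cell), math.floor(y / cell)
            if 0 <= cx < width and 0 <= cy < height:
                occupied.add((cx, cy))
    return occupied


def line_of_sight(occupied: Set[Cell], a: Cell, b: Cell) -> bool:
    """Bresenham walk from a to b; False if an occupied cell lies strictly between."""
    (x0, y0), (x1, y1) = a, b
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    x, y = x0, y0
    while (x, y) != (x1, y1):
        if (x, y) != (x0, y0) and (x, y) in occupied:
            return False
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x += sx
        if e2 <= dx:
            err += dx
            y += sy
    return True
```

Note that any point outside the height band is discarded before the visibility test, so an overhang or a low wall outside the band leaves no trace in the 2D features. That is exactly the vertical-occlusion gap the referee asks the authors to quantify.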

Circularity Check

0 steps flagged

No significant circularity; claims rest on external benchmarks

The paper describes a hierarchical DTQN-based navigation framework with visibility-aware candidate generation and evaluates it via direct comparisons to classical planners and RL baselines in 2D simulation plus 3D Unity-ROS transfer. No equations, fitted parameters, or self-citations are shown reducing the reported success-rate, safety-margin, or time-to-goal improvements to inputs by construction. The 2D-to-3D projection step is presented as an implementation choice enabling transfer without architectural changes, not as a self-defining loop. Evaluation relies on independent simulation environments rather than self-referential definitions, satisfying the criteria for a self-contained result against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach rests on standard RL assumptions and simulation fidelity without introducing new physical entities or unstated mathematical axioms beyond typical MDP framing.

free parameters (2)

DTQN network weights and hyperparameters
Learned during training on simulation episodes; central to Q-value ranking.
Visibility masking and exposure penalty coefficients
Chosen to reward cover use; affect subgoal ranking.

axioms (2)

domain assumption: Short histories of task-aware features suffice to capture relevant temporal dependencies for subgoal selection
Invoked in DTQN input design.
domain assumption: Point-cloud projection preserves necessary visibility information for 3D transfer
Required for the direct 2D-to-3D extension without architecture changes.

pith-pipeline@v0.9.0 · 5565 in / 1224 out tokens · 41430 ms · 2026-05-17T02:55:58.292687+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: "We validate our approach in 2D simulation and extend it directly to a 3D Unity-ROS environment by projecting point-cloud perception into the same feature schema, enabling transfer without architectural changes."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Visibility-aware candidate generation introduces masking and exposure penalties, rewarding the use of cover and anticipatory safety."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1] P. Fiorini and Z. Shiller, “Motion planning in dynamic environments using velocity obstacles,” The International Journal of Robotics Research, vol. 17, no. 7, pp. 760–772, 1998. [Online]. Available: https://doi.org/10.1177/027836499801700706

[2] J. Hossain, A.-Z. Faridee, N. Roy, A. Basak, and D. E. Asher, “Covernav: Cover following navigation planning in unstructured outdoor environment with deep reinforcement learning,” in 2023 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS). IEEE, 2023, pp. 127–132.

[3] H. Wang and B. Liu, “Path planning and path tracking for collision avoidance of autonomous ground vehicles,” IEEE Systems Journal, vol. 16, no. 3, pp. 3658–3667, 2022.

[4] P. Lin, E. Javanmardi, J. Nakazato, and M. Tsukada, “Occlusion-aware path planning for collision avoidance: Leveraging potential field method with responsibility-sensitive safety,” in 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2023, pp. 2561–2567.

[5] O. A. Guvenkaya, S. A. Iz, and M. Unel, “Local path planning with dynamic obstacle avoidance in unstructured environments,” in IECON 2024-50th Annual Conference of the IEEE Industrial Electronics Society. IEEE, 2024, pp. 1–6.

[6] J. Hossain, A.-Z. Faridee, N. Roy, D. E. Asher, J. Freeman, T. Gregory, and T. Trout, “Encomp: Enhanced covert maneuver planning with adaptive target-aware visibility estimation using offline reinforcement learning,” in 2024 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS). IEEE, 2024, pp. 51–60.

[7] G. S. Aoude, B. D. Luders, D. S. Levine, and J. P. How, “Threat-aware path planning in uncertain urban environments,” in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010, pp. 6058–6063.

[8] J. Zhang, B.-L. Ye, X. Wang, L. Li, and B. Song, “A trajectory planning and tracking method based on deep hierarchical reinforcement learning,” Journal of Intelligent and Connected Vehicles, vol. 8, no. 2, pp. 9 210 056–1, 2025.

[9] K. B. Naveed, Z. Qiao, and J. M. Dolan, “Trajectory planning for autonomous vehicles using hierarchical reinforcement learning,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 601–606.

[10] Y. Zheng, B. Li, D. An, and N. Li, “A multi-agent path planning algorithm based on hierarchical reinforcement learning and artificial potential field,” in 2015 11th International Conference on Natural Computation (ICNC), 2015, pp. 363–369.

[11] Y. Lu, X. Xu, X. Zhang, L. Qian, and X. Zhou, “Hierarchical reinforcement learning for autonomous decision making and motion planning of intelligent vehicles,” IEEE Access, vol. 8, pp. 209 776–209 789, 2020.

[12] K. Esslinger, R. Platt, and C. Amato, “Deep transformer q-networks for partially observable reinforcement learning,” arXiv preprint arXiv:2206.01078, 2022.

[13] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,” Artificial Intelligence, vol. 101, no. 1, pp. 99–134, 1998. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S000437029800023X
