AION: Aerial Indoor Object-Goal Navigation Using Dual-Policy Reinforcement Learning

Lin Zhao; Rui Huang; Shenao Wang; Yichao Gao; Yuchen Hou; Zichen Yan

arxiv: 2601.15614 · v2 · pith:6IKXV6NWnew · submitted 2026-01-22 · 💻 cs.RO

Zichen Yan, Yuchen Hou, Shenao Wang, Yichao Gao, Rui Huang, Lin Zhao

classification 💻 cs.RO

keywords aion, aerial, navigation, objectnav, dual-policy, efficiency, exploration

Object-Goal Navigation (ObjectNav) requires an agent to autonomously explore an unknown environment and navigate toward target objects specified by a semantic label. While prior work has primarily studied zero-shot ObjectNav under 2D locomotion, extending it to aerial platforms with 3D locomotion capability remains underexplored. Aerial robots offer superior maneuverability and search efficiency, but they also introduce new challenges in spatial perception, dynamic control, and safety assurance. In this paper, we propose AION for vision-based aerial ObjectNav without relying on external localization or global maps. AION is an end-to-end dual-policy reinforcement learning (RL) framework that decouples exploration and goal-reaching behaviors into two specialized policies. We evaluate AION on the AI2-THOR benchmark and further assess its real-time performance in IsaacSim using high-fidelity drone models. Experimental results show that AION achieves superior performance across comprehensive evaluation metrics in exploration, navigation efficiency, and safety. The video can be found at https://youtu.be/TgsUm6bb7zg; code and model checkpoints are available at https://github.com/Zichen-Yan/AION.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models
cs.RO 2026-04 unverdicted novelty 4.0

This survey organizes aerial vision-language navigation methods into five architectural categories, critically reviews evaluation infrastructure, and synthesizes seven open problems for LLM/VLM integration.