Recognition: no theorem link
Beyond Isolation: A Unified Benchmark for General-Purpose Navigation
Pith reviewed 2026-05-12 04:30 UTC · model grok-4.3
The pith
Current unified navigation methods struggle with the interleaved, cross-embodiment demands of general-purpose tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OmniNavBench advances evaluation by introducing composite instructions that interleave sub-tasks from PointNav, VLN, ObjectNav, SocialNav, Human Following, and EQA categories, forcing agents to switch between exploration, interaction, and social behaviors. The platform supports multiple robot morphologies through a modular sensor interface across 170 environments mixing synthetic and real scans. Expert trajectories are collected via human teleoperation to capture natural behaviors like exploratory glances and anticipatory avoidance. Evaluations on this setup demonstrate that current methods fail to handle the interleaved nature of the tasks effectively, underscoring the need for better generalist navigators.
What carries the argument
The OmniNavBench benchmark, which enables testing of cross-skill coordination via composite instructions drawn from six navigation categories, and of cross-embodiment generalization across humanoid, quadrupedal, and wheeled robots using human teleoperation data.
Load-bearing premise
The composite instructions interleaving the six navigation categories and the human teleoperation trajectories sufficiently represent the demands and behavioral nuances of real-world general-purpose navigation scenarios.
What would settle it
A concrete test would be whether any existing or new navigation method can achieve high success rates on the composite instruction episodes across multiple robot morphologies in the OmniNavBench environments.
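Such a test would presumably score episodes with the standard navigation metrics, e.g. Success weighted by Path Length (SPL) from Anderson et al.'s "On evaluation of embodied navigation agents" ([1] in the reference graph below). A minimal sketch of SPL, assuming per-episode success flags, geodesic shortest-path lengths, and executed path lengths are available (how OmniNavBench aggregates across composite sub-tasks is not specified here):

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length (Anderson et al., 2018).

    successes:        list of bools, one per episode
    shortest_lengths: geodesic shortest-path length l_i for each episode
    path_lengths:     length p_i of the path the agent actually took
    """
    assert len(successes) == len(shortest_lengths) == len(path_lengths)
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        # A successful episode contributes l / max(p, l), else 0;
        # max(p, l) caps the ratio at 1 even if the reported path
        # is numerically shorter than the geodesic optimum.
        total += float(s) * l / max(p, l)
    return total / len(successes)

# Two episodes: one success with a 20% longer path, one failure.
print(spl([True, False], [10.0, 8.0], [12.0, 15.0]))  # → 0.4166666666666667
```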
Original abstract
The pursuit of general-purpose embodied agents is hindered by fragmented evaluation protocols that isolate navigation skills and fixate on specific robot morphologies, failing to reflect real-world scenarios where agents must orchestrate diverse behaviors across varying embodiments. To bridge this gap, we introduce OmniNavBench, a benchmark for cross-skill coordination and cross-embodiment generalization. OmniNavBench introduces three paradigm shifts: (1) Compositional Complexity. We propose composite instructions that interleave sub-tasks from 6 categories (PointNav, VLN, ObjectNav, SocialNav, Human Following and EQA), compelling agents to transition between exploration, interaction, and social compliance within a single episode. (2) Morphological Universality and Sensor Flexibility. We present a simulation platform that breaks the reliance on single-morphology evaluation, enabling generalization tests across humanoid, quadrupedal, and wheeled robots, with a modular sensor interface and 170 environments blending synthetic assets with real-world scans. (3) Demonstrations Quality. Moving beyond shortest-path algorithms, we curate 1779 expert trajectories via human teleoperation, capturing behavioral nuances such as exploratory glance and anticipatory avoidance. Extensive evaluations demonstrate that current methods, despite their claimed unified design, struggle with the complex, interleaved nature of general-purpose navigation. This exposes a critical disparity between existing capabilities and real-world deployment demands, underscoring OmniNavBench as a testbed for the next generation of generalist navigators. Dataset, code, and leaderboard are available at http://omninavbench.cloud-ip.cc.
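To make the "composite instruction" idea concrete: a single episode can be thought of as an ordered list of typed sub-goals the agent must complete in sequence. The sketch below is purely illustrative; the class names, field names, and task-type strings are hypothetical, not OmniNavBench's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical task types mirroring the six categories in the abstract.
TASK_TYPES = {"PointNav", "VLN", "ObjectNav", "SocialNav", "HumanFollowing", "EQA"}

@dataclass
class SubTask:
    task_type: str    # one of TASK_TYPES
    instruction: str  # natural-language sub-instruction

    def __post_init__(self):
        if self.task_type not in TASK_TYPES:
            raise ValueError(f"unknown task type: {self.task_type}")

@dataclass
class CompositeEpisode:
    episode_id: str
    embodiment: str                   # e.g. "humanoid", "quadruped", "wheeled"
    subtasks: list = field(default_factory=list)

    def skills_used(self):
        # A composite episode interleaves several of the six categories.
        return {t.task_type for t in self.subtasks}

ep = CompositeEpisode(
    episode_id="demo-001",
    embodiment="quadruped",
    subtasks=[
        SubTask("VLN", "go down the hallway and turn left at the plant"),
        SubTask("ObjectNav", "find the red sofa"),
        SubTask("EQA", "how many chairs are next to it?"),
    ],
)
print(sorted(ep.skills_used()))  # → ['EQA', 'ObjectNav', 'VLN']
```

An agent is forced to switch behaviors whenever consecutive sub-tasks differ in type, which is what the benchmark's "compositional complexity" claim rests on.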
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces OmniNavBench, a unified benchmark for general-purpose navigation that shifts from isolated skill evaluation to composite instructions interleaving six categories (PointNav, VLN, ObjectNav, SocialNav, Human Following, EQA) within single episodes. It supports cross-embodiment testing across humanoid, quadrupedal, and wheeled robots via a modular sensor interface in 170 environments (synthetic + real-world scans) and replaces shortest-path trajectories with 1779 human-teleoperated demonstrations that capture exploratory and anticipatory behaviors. Evaluations show that existing methods struggle with the interleaved tasks, which the authors interpret as exposing a critical gap between current capabilities and real-world deployment demands. The dataset, code, and leaderboard are released publicly.
Significance. If the benchmark's proxy assumptions hold, OmniNavBench could meaningfully advance embodied AI by providing a more realistic testbed that rewards cross-skill coordination and morphological generalization rather than narrow specialization. The open release of data, code, and leaderboard is a clear strength that supports reproducibility and community follow-up. The emphasis on human trajectories over algorithmic paths adds behavioral nuance that is often missing from simulation benchmarks.
Major comments (1)
- [Abstract] The claim that results 'expose a critical disparity between existing capabilities and real-world deployment demands' is load-bearing for the paper's broader impact statement, yet rests on an unvalidated proxy assumption. All reported evaluations occur inside the simulator (including the human trajectories themselves), with no physical-robot experiments, sim-to-real transfer metrics, or quantification of how composite-task performance degrades outside simulation. This gap directly affects whether the observed struggles can be extrapolated to real-world deployment.
Minor comments (2)
- The description of the 170 environments would benefit from explicit details on how real-world scans are integrated, any domain randomization applied, and quantitative measures of visual or geometric fidelity to the source scans.
- Consider adding a dedicated limitations or future-work subsection that explicitly discusses the simulation-only nature of the current results and planned physical-robot validation steps.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address the major comment on the abstract claim below, acknowledging the simulation-only nature of the evaluations while defending the benchmark's design as a meaningful proxy.
Point-by-point responses
Referee: [Abstract] The claim that results 'expose a critical disparity between existing capabilities and real-world deployment demands' is load-bearing for the paper's broader impact statement, yet rests on an unvalidated proxy assumption. All reported evaluations occur inside the simulator (including the human trajectories themselves), with no physical-robot experiments, sim-to-real transfer metrics, or quantification of how composite-task performance degrades outside simulation. This gap directly affects whether the observed struggles can be extrapolated to real-world deployment.
Authors: We agree that the evaluations, including the human-teleoperated trajectories, are conducted entirely in simulation and that no physical-robot experiments or explicit sim-to-real transfer metrics are provided. This is a limitation of the current work. However, the benchmark incorporates design choices intended to strengthen its relevance as a proxy: the 170 environments blend synthetic assets with real-world scans, the modular sensor interface supports cross-embodiment testing, and the 1779 trajectories were collected via human teleoperation specifically to capture exploratory glances and anticipatory behaviors absent from shortest-path baselines. The composite instructions further require interleaving of skills in ways that reflect real deployment scenarios. The observed struggles of existing methods even under these controlled yet more realistic conditions suggest that the gap to viable real-world performance is substantial. We will revise the abstract to clarify that the results highlight challenges likely to be amplified outside simulation, rather than directly asserting an unvalidated disparity in deployment demands.
Revision: yes
Circularity Check
No circularity: the evaluation conclusions follow from measured outcomes, not from the benchmark's definition
Full rationale
The paper defines OmniNavBench via new composite instructions interleaving six navigation categories, a modular simulation platform spanning multiple embodiments, and 1779 human-teleoperated trajectories in 170 environments. Evaluations measure external methods' performance on these tasks. No equations, fitted parameters, or self-referential derivations appear in the provided text. Claims that existing methods struggle follow directly from measured outcomes on the new benchmark rather than being true by construction of the benchmark's definition. Self-citations are absent from the abstract and description; the work is self-contained as an empirical testbed without load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Simulation environments with synthetic and real-world scan assets accurately model physics and sensor observations for navigation tasks.
Reference graph
Works this paper leans on
- [1] Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, et al. On evaluation of embodied navigation agents. arXiv preprint arXiv:1807.06757, 2018.
- [2] Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, and Anton Van Den Hengel. Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3674-3683, 2018.
- [3] Abhijat Biswas, Allan Wang, Gustavo Silvera, Aaron Steinfeld, and Henny Admoni. SocNavBench: A grounded simulation testing framework for evaluating social navigation. ACM Transactions on Human-Robot Interaction (THRI), 11(3):1-24, 2022.
- [4] Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niessner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. Matterport3D: Learning from RGB-D data in indoor environments. International Conference on 3D Vision (3DV), 2017.
- [5] Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niessner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. Matterport3D: Learning from RGB-D data in indoor environments. arXiv preprint arXiv:1709.06158, 2017.
- [6] Devendra Singh Chaplot, Dhiraj Prakashchand Gandhi, Abhinav Gupta, and Russ R Salakhutdinov. Object goal navigation using goal-oriented semantic exploration. Advances in Neural Information Processing Systems, 33:4247-4258, 2020.
- [7] An-Chieh Cheng, Yandong Ji, Zhaojing Yang, Zaitian Gongye, Xueyan Zou, Jan Kautz, Erdem Bıyık, Hongxu Yin, Sifei Liu, and Xiaolong Wang. NaVILA: Legged robot vision-language-action model for navigation. arXiv preprint arXiv:2412.04453, 2024.
- [8] Yifei Dong, Fengyi Wu, Qi He, Heng Li, Minghan Li, Zebang Cheng, Yuxuan Zhou, Jingdong Sun, Qi Dai, Zhi-Qi Cheng, et al. HA-VLN: A benchmark for human-aware navigation in discrete-continuous environments with dynamic multi-human interactions, real-world validation, and an open leaderboard. arXiv preprint arXiv:2503.14229, 2025.
- [9] Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Chenxi Wang, Junbo Wang, Haoyi Zhu, and Cewu Lu. RH20T: A comprehensive robotic dataset for learning diverse skills in one-shot. arXiv preprint arXiv:2307.00595, 2023.
- [10] Chen Gao, Si Liu, Jinyu Chen, Luting Wang, Qi Wu, Bo Li, and Qi Tian. Room-object entity prompting and reasoning for embodied referring expression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(2):994-1010, 2023.
- [11] Chen Gao, Liankai Jin, Xingyu Peng, Jiazhao Zhang, Yue Deng, Annan Li, He Wang, and Si Liu. OctoNav: Towards generalist embodied navigation. arXiv preprint arXiv:2506.09839, 2025.
- [12] Edmund T Hall and Edward T Hall. The Hidden Dimension, volume 609. Anchor, 1966.
- [13] Yulong Huang, Yonggang Zhang, Peng Shi, Zhemin Wu, Junhui Qian, and Jonathon A Chambers. Robust Kalman filters based on Gaussian scale mixture distributions with application to target tracking. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 49(10):2082-2096, 2017.
- [14] Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, and Roozbeh Mottaghi. GOAT-Bench: A benchmark for multi-modal lifelong navigation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16373-16383, 2024.
- [15] Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, and Stefan Lee. Beyond the nav-graph: Vision-and-language navigation in continuous environments. In European Conference on Computer Vision, pages 104-
- [16] Alexander Ku, Peter Anderson, Roma Patel, Eugene Ie, and Jason Baldridge. Room-Across-Room: Multilingual vision-and-language navigation with dense spatiotemporal grounding. arXiv preprint arXiv:2010.07954, 2020.
- [17] Jungdae Lee, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Yutaka Matsuo, and Nakamasa Inoue. CityNav: Language-goal aerial navigation dataset with geographic information. arXiv preprint arXiv:2406.14240, 2024.
- [18] Sihao Lin, Zerui Li, Xunyi Zhao, Gengze Zhou, Liuyi Wang, Rong Wei, Rui Tang, Juncheng Li, Hanqing Wang, Jiangmiao Pang, Anton van den Hengel, Jiajun Liu, and Qi Wu. VLNVerse: A benchmark for vision-language navigation with versatile, embodied, realistic simulation and evaluation, 2025. URL https://arxiv.org/abs/2512.19021.
- [19] Arjun Majumdar, Anurag Ajay, Xiaohan Zhang, Pranav Putta, Sriram Yenamandra, Mikael Henaff, Sneha Silwal, Paul Mcvay, Oleksandr Maksymets, Sergio Arnaud, et al. OpenEQA: Embodied question answering in the era of foundation models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16488-16498, 2024.
- [20] Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, and Roberto Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. arXiv preprint arXiv:2108.03298, 2021.
- [21]
- [22] Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, and Anton van den Hengel. REVERIE: Remote embodied visual referring expression in real indoor environments. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9982-9991, 2020.
- [23] Zhiwen Qiu, Ziang Liu, Wenqian Niu, Tapomayukh Bhattacharjee, and Saleh Kalantari. EgoCogNav: Cognition-aware human egocentric navigation. arXiv preprint arXiv:2511.17581, 2025.
- [24] Tim Schreiter, Tiago Rodrigues de Almeida, Yufei Zhu, Eduardo Gutierrez Maestro, Lucas Morillo-Mendez, Andrey Rudenko, Luigi Palmieri, Tomasz P Kucner, Martin Magnusson, and Achim J Lilienthal. THÖR-MAGNI: A large-scale indoor motion capture recording of human movement and robot interaction. The International Journal of Robotics Research, 44(4):568-591, 2025.
- [25] Xinshuai Song, Weixing Chen, Yang Liu, Weikai Chen, Guanbin Li, and Liang Lin. Towards long-horizon vision-language navigation: Platform, benchmark and method. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12078-12088, 2025.
- [26] Andrew Szot, Alexander Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Singh Chaplot, Oleksandr Maksymets, et al. Habitat 2.0: Training home assistants to rearrange their habitat. Advances in Neural Information Processing Systems, 34:251-266, 2021.
- [27] Hanqing Wang, Jiahe Chen, Wensi Huang, Qingwei Ben, Tai Wang, Boyu Mi, Tao Huang, Siheng Zhao, Yilun Chen, Sizhe Yang, et al. GRUtopia: Dream general robots in a city at scale. arXiv preprint arXiv:2407.10943, 2024.
- [28] Liuyi Wang, Xinyuan Xia, Hui Zhao, Hanqing Wang, Tai Wang, Yilun Chen, Chengju Liu, Qijun Chen, and Jiangmiao Pang. Rethinking the embodied gap in vision-and-language navigation: A holistic study of physical and visual disparities. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9455-9465, 2025.
- [29] Shaoan Wang, Jiazhao Zhang, Minghan Li, Jiahang Liu, Anqi Li, Kui Wu, Fangwei Zhong, Junzhi Yu, Zhizheng Zhang, and He Wang. TrackVLA: Embodied visual tracking in the wild. arXiv preprint arXiv:2505.23189, 2025.
- [30] Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, and Yu Qiao. Scaling data generation in vision-and-language navigation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12009-12020, 2023.
- [31] Meng Wei, Chenyang Wan, Jiaqi Peng, Xiqian Yu, Yuqiang Yang, Delin Feng, Wenzhe Cai, Chenming Zhu, Tai Wang, Jiangmiao Pang, et al. Ground slow, move fast: A dual-system foundation model for generalizable vision-and-language navigation. arXiv preprint arXiv:2512.08186, 2025.
- [32] Meng Wei, Chenyang Wan, Xiqian Yu, Tai Wang, Yuqiang Yang, Xiaohan Mao, Chenming Zhu, Wenzhe Cai, Hanqing Wang, Yilun Chen, et al. StreamVLN: Streaming vision-and-language navigation via slowfast context modeling. arXiv preprint arXiv:2507.05240, 2025.
- [33] Xinda Xue, Junjun Hu, Minghua Luo, Xie Shichao, Jintao Chen, Zixun Xie, Quan Kuichen, Guo Wei, Mu Xu, and Zedong Chu. OmniNav: A unified framework for prospective exploration and visual-language navigation. arXiv preprint arXiv:2509.25687, 2025.
- [34] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang...
- [35] Hang Yin, Xiuwei Xu, Linqing Zhao, Ziwei Wang, Jie Zhou, and Jiwen Lu. UniGoal: Towards universal zero-shot goal-oriented navigation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 19057-19066, 2025.
- [36] Naoki Yokoyama, Ram Ramrakhya, Abhishek Das, Dhruv Batra, and Sehoon Ha. HM3D-OVON: A dataset and benchmark for open-vocabulary object goal navigation. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5543-5550. IEEE, 2024.
- [37] Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L Berg, and Dhruv Batra. Multi-target embodied question answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6309-6318, 2019.
- [38] Kuo-Hao Zeng, Zichen Zhang, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Alvaro Herrasti, Ross Girshick, Aniruddha Kembhavi, and Luca Weihs. PoliFormer: Scaling on-policy RL with transformers results in masterful navigators. arXiv preprint arXiv:2406.20083, 2024.
- [39] Jiazhao Zhang, Kunyu Wang, Shaoan Wang, Minghan Li, Haoran Liu, Songlin Wei, Zhongyuan Wang, Zhizheng Zhang, and He Wang. Uni-NaVid: A video-based vision-language-action model for unifying embodied navigation tasks. arXiv preprint arXiv:2412.06224, 2024.
- [40] Jiazhao Zhang, Anqi Li, Yunpeng Qi, Minghan Li, Jiahang Liu, Shaoan Wang, Haoran Liu, Gengze Zhou, Yuze Wu, Xingxing Li, et al. Embodied navigation foundation model. arXiv preprint arXiv:2509.12129, 2025.
- [41] Zhen Zhang, Jiaqing Yan, Xin Kong, Guangyao Zhai, and Yong Liu. Efficient motion planning based on kinodynamic model for quadruped robots following persons in confined spaces. IEEE/ASME Transactions on Mechatronics, 26(4):1997-2006, 2021.
- [42] Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, and Alexander G Schwing. The surprising effectiveness of visual odometry techniques for embodied pointgoal navigation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16127-16136, 2021.
- [43] Duo Zheng, Shijia Huang, Lin Zhao, Yiwu Zhong, and Liwei Wang. Towards learning a generalist model for embodied navigation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13624-13634, 2024.
- [44] Gengze Zhou, Yicong Hong, Zun Wang, Chongyang Zhao, Mohit Bansal, and Qi Wu. SAME: Learning generic language-guided visual navigation with state-adaptive mixture of experts. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7794-7807, 2025.
- [45] Ziyu Zhu, Xilin Wang, Yixuan Li, Zhuofan Zhang, Xiaojian Ma, Yixin Chen, Baoxiong Jia, Wei Liang, Qian Yu, Zhidong Deng, et al. Move to understand a 3D scene: Bridging visual grounding and exploration for efficient and versatile embodied navigation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8120-8132, 2025.
- [46] Robot Embodiment Specifications: The specific parameters for the different robot embodiments used in our "Robot Pool" are detailed in Table VI. Height refers to the overall robot height. Cam (Stand) denotes the camera height in the default standing pose, while Cam (Obs.) indicates the camera height during active navigation, which may differ due to postural cha...
- [47] Simulation of Dynamic Humans: For Social Navigation and Human Following tasks, dynamic humans are simulated using the omni.anim.people extension in Isaac Sim. Human characters follow predefined waypoint sequences specified, which can be chained to form longer trajectories. All characters walk at a fixed speed of about 1.0 m/s. Given the complexity of indoo...
- [48] Metrics Calculation: Distance Computation. a) Geodesic Distance via NavMesh: For VLN and ObjectNav tasks, we compute geodesic distance d_geo(·,·) via Navigation Mesh (NavMesh). Existing ObjectNav evaluations typically measure the Euclidean distance between the robot and the object's geometric center. However, for large objects (e.g., beds, sofas), the geome...
- [49] Matterport3D Scenes: We utilize 85 scenes from the Matterport3D dataset. The complete list of scene identifiers is provided below: 17DRP5sb8fy, 1LXtFkjw3qL, 1pXnuDYAj8r, 29hnd4uzFmX, 2azQ1b91cZZ, 2n8kARJN3HM, 2t7WUuJeko7, 5LpN3gDmAk7, 5q7pvUzZiYa, 759xd9YjKW5, 7y3sRwLe3Va, 8194nk5LbLH, 82sE5b5pLXE, 8WUmhLawc2A, ARNzJeq3xxb, B6ByNegPMKs, D7N2EKCX4Sj, E9uDoF...
- [50] GRScenes-Home Scenes: We utilize 61 residential scenes from GRScenes. The scene identifiers are: MV7J6NIKTKJZ2AABAAAAADI8, MV7J6NIKTKJZ2AABAAAAADQ8, MV7J6NIKTKJZ2AABAAAAADY8, MV7J6NIKTKJZ2AABAAAAAEA8, MV7J6NIKTKJZ2AABAAAAAEI8, MVUCSQAKTKJ5EAABAAAAAAA8, MVUCSQAKTKJ5EAABAAAAAAI8, MVUCSQAKTKJ5EAABAAAAAAQ8, MVUCSQAKTKJ5EAABAAAAAAY8, MVUCSQAKTKJ5EAABAAAAABA8, M...
- [51] GRScenes-Commercial Scenes: We utilize 24 commercial scenes from GRScenes. The scene identifiers are: MV4AFHQKTKJZ2AABAAAAAEA8, MV4AFHQKTKJZ2AABAAAAAEI8, MV5M25QKTKJZ2AABAAAAAAA8, MV5M25QKTKJZ2AABAAAAAAI8, MV5M25QKTKJZ2AABAAAAAAQ8, MV5M25QKTKJZ2AABAAAAAAY8, MV5M25QKTKJZ2AABAAAAAEI8, MV7J6NIKTKJZ2AABAAAAAAA8, MV7J6NIKTKJZ2AABAAAAAAI8, MVJWVGYKTLDAYAABAAAA...
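Entry [48] above notes that Euclidean distance to a large object's geometric center misjudges success: an agent standing at the foot of a bed can appear "far" from its center. The paper's fix uses geodesic NavMesh distance; the simpler sketch below (hypothetical function names, object footprints modeled as 2D axis-aligned bounding boxes) illustrates only the center-vs-surface contrast:

```python
import math

def dist_to_center(agent, box_min, box_max):
    """Euclidean distance from agent (x, y) to the box's geometric center."""
    cx = (box_min[0] + box_max[0]) / 2
    cy = (box_min[1] + box_max[1]) / 2
    return math.hypot(agent[0] - cx, agent[1] - cy)

def dist_to_box(agent, box_min, box_max):
    """Euclidean distance from agent (x, y) to the nearest point of an
    axis-aligned bounding box; 0 if the agent is inside the footprint."""
    dx = max(box_min[0] - agent[0], 0.0, agent[0] - box_max[0])
    dy = max(box_min[1] - agent[1], 0.0, agent[1] - box_max[1])
    return math.hypot(dx, dy)

# A 2 m x 3 m bed footprint; the agent stands 0.5 m from its foot.
agent = (0.0, -0.5)
bed_min, bed_max = (-1.0, 0.0), (1.0, 3.0)
print(dist_to_center(agent, bed_min, bed_max))  # → 2.0 (looks like a miss)
print(dist_to_box(agent, bed_min, bed_max))     # → 0.5 (actually adjacent)
```

With a typical 1 m success radius, the center-based rule would score this episode a failure even though the robot is standing next to the object, which is exactly the bias the entry describes.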