G-DRAGON: Geospatial Reasoning and Dynamic Planning for Retrieval-Augmented Outdoor Navigation

Bing Xiao; Chen Wang; Dongzhihan Wang; Jianan Sun; Liang Xu; Yi Du; Yingchen Zhang; Yuan Xue

arXiv: 2605.25646 · v1 · pith:77HB7VKE · submitted 2026-05-25 · 💻 cs.RO


Dongzhihan Wang, Yi Du, Jianan Sun, Yuan Xue, Yingchen Zhang, Bing Xiao, Chen Wang, Liang Xu

Pith reviewed 2026-06-29 21:47 UTC · model grok-4.3

classification 💻 cs.RO

keywords: outdoor navigation, retrieval-augmented generation, OpenStreetMap, unmanned ground vehicle, frontier exploration, semantic mapping, SLAM, long-range planning


The pith

G-DRAGON maps natural-language commands to local OSM entities via lightweight LLM retrieval to support long-range outdoor robot navigation and last-mile search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a retrieval-augmented framework that turns natural-language instructions into precise coordinates by having a small language model generate matches against versioned OpenStreetMap data. These coordinates feed a global route, which a planning layer projects into the robot's local frame through its SLAM system. Near the goal, the system hands off to frontier exploration paired with open-vocabulary voxel mapping, so the robot can locate targets described only in words. Experiments show the approach beats prior methods in simulation and succeeds on a real unmanned ground vehicle, completing person-search tasks along routes of up to 500 meters in previously unseen city areas. Readers should care because most language-guided navigation remains limited to short distances or depends on remote large models that introduce factual errors over long missions.
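The summary says the retrieved coordinates "feed a global route" without naming the router. As a hedged illustration only, the sketch below runs Dijkstra's algorithm over a tiny hypothetical road graph whose nodes carry (lat, lon) coordinates, weighting each edge by its haversine length on a spherical Earth. The node IDs, coordinates, and the `shortest_route` helper are invented; a real deployment would more likely route over the full local OSM way network with a dedicated routing engine.

```python
import heapq
import math

EARTH_RADIUS_M = 6_371_000.0  # spherical-Earth approximation


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def shortest_route(nodes, edges, start, goal):
    """Dijkstra over an undirected road graph; returns (node list, length in m)."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        w = haversine_m(nodes[u], nodes[v])
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return [], math.inf
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]


# Hypothetical intersections: OSM-like node ID -> (lat, lon) in degrees.
nodes = {
    1: (39.9000, 116.4000),
    2: (39.9000, 116.4020),
    3: (39.9015, 116.4020),
    4: (39.9030, 116.4000),
}
edges = [(1, 2), (2, 3), (1, 4), (4, 3)]
route, length = shortest_route(nodes, edges, 1, 3)
print(route, round(length, 1))
```

Returning both the node sequence and its length lets a higher-level planner compare alternatives; on larger graphs, A* with the haversine distance as its heuristic is a common drop-in speedup because straight-line distance never overestimates road distance.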

Core claim

G-DRAGON maps natural-language commands to versioned, local OSM entities via generative retrieval based on a lightweight LLM, yielding accurate coordinates for global route planning. A high-level planning module bridges global topological routes with the SLAM system, projecting geospatial waypoints into the robot's navigable frame. For the last mile, the framework transitions to frontier-based exploration and open-set semantic voxel mapping to localize open-vocabulary targets.
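How the planning module projects geospatial waypoints into the robot's navigable frame is not spelled out here. One common approach, sketched under stated assumptions, converts each (lat, lon) waypoint to local east/north meters around a reference origin with an equirectangular approximation and then applies a 2D rigid transform (rotation plus translation) that aligns that east-north frame with the SLAM map frame. `GeoToMapAlignment`, the origin, the yaw offset, and the translation are hypothetical calibration values, not parameters from the paper.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # spherical-Earth approximation


@dataclass
class GeoToMapAlignment:
    """Hypothetical calibration linking a geodetic origin to the SLAM map frame."""
    origin_lat: float  # degrees, geodetic reference point
    origin_lon: float  # degrees
    yaw_rad: float     # counter-clockwise rotation applied to (east, north) vectors
    tx: float = 0.0    # map-frame x of the geodetic origin, meters
    ty: float = 0.0    # map-frame y of the geodetic origin, meters

    def to_enu(self, lat, lon):
        """Equirectangular approximation: (lat, lon) -> (east, north) in meters."""
        lat0 = math.radians(self.origin_lat)
        east = math.radians(lon - self.origin_lon) * EARTH_RADIUS_M * math.cos(lat0)
        north = math.radians(lat - self.origin_lat) * EARTH_RADIUS_M
        return east, north

    def to_map(self, lat, lon):
        """(east, north) -> SLAM map frame via a 2D rotation plus translation."""
        e, n = self.to_enu(lat, lon)
        c, s = math.cos(self.yaw_rad), math.sin(self.yaw_rad)
        return (c * e - s * n + self.tx, s * e + c * n + self.ty)


# Example: map x-axis points north, so a waypoint ~100 m north lands near x = -100 after a +90 deg yaw.
align = GeoToMapAlignment(39.9000, 116.4000, yaw_rad=math.pi / 2)
waypoints = [(39.9000, 116.4000), (39.9009, 116.4000)]
local = [align.to_map(lat, lon) for lat, lon in waypoints]
print([(round(x, 1), round(y, 1)) for x, y in local])
```

The spherical, equirectangular shortcut carries scale errors of a few tenths of a percent (on the order of a meter over 500 m); an ellipsoidal projection such as UTM would remove most of that if the local planner cannot absorb it.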

What carries the argument

A generative retrieval module that uses a lightweight LLM to map commands to OSM entities and produce coordinates for global planning.
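Generative retrieval in the autoregressive-entity-retrieval sense constrains decoding so the model can only emit strings that exist in the index, which is one way a local OSM copy can guard against hallucinated place names. The minimal sketch below assumes a character-level prefix trie over local entity names and replaces the LLM with a hypothetical scoring function (`make_toy_score`) that mimics a model whose unconstrained output would be the misspelled "north libary". The entity names and coordinates are invented; the paper's tokenizer, prompt, and model are not reproduced here.

```python
from typing import Callable, Dict, Tuple

END = "\0"  # sentinel marking the end of a complete entity name


def build_trie(names):
    """Character-level prefix trie; every root-to-END path spells one entity name."""
    root: Dict = {}
    for name in names:
        node = root
        for ch in name + END:
            node = node.setdefault(ch, {})
    return root


def constrained_decode(trie, score: Callable[[str, str], float]) -> str:
    """Greedy decoding restricted to trie edges, so the result is always an indexed name."""
    prefix, node = "", trie
    while True:
        allowed = list(node.keys())  # only continuations that exist in the map
        best = max(allowed, key=lambda ch: score(prefix, ch))
        if best == END:
            return prefix
        prefix += best
        node = node[best]


# Hypothetical local OSM index: entity name -> (lat, lon).
osm_index: Dict[str, Tuple[float, float]] = {
    "north gate": (39.9012, 116.4031),
    "north library": (39.9020, 116.4010),
    "south cafeteria": (39.8990, 116.4025),
}


def make_toy_score(raw_guess: str):
    """Stand-in for LLM next-token scores: favors the character the model would
    emit unconstrained at this position (raw_guess may contain errors)."""
    def score(prefix: str, ch: str) -> float:
        pos = len(prefix)
        if ch == END:
            return 1.0 if pos >= len(raw_guess) else 0.0
        return 1.0 if pos < len(raw_guess) and raw_guess[pos] == ch else 0.0
    return score


name = constrained_decode(build_trie(osm_index), make_toy_score("north libary"))
lat, lon = osm_index[name]
print(name, lat, lon)
```

Because the decoder can only follow existing trie edges, the misspelled guess snaps to the valid entity "north library", and the coordinate lookup can never fail. The trade-off is that a wrong but valid entity is still possible, which is why retrieval accuracy still needs to be measured.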

If this is right

The framework outperforms state-of-the-art baselines in simulation.
Real UGV tests succeed on person-search missions with trajectories up to 500 m in unseen urban settings.
The system avoids reliance on cloud-hosted LLMs and the factual errors associated with them.
Global topological planning connects directly to local SLAM while allowing a seamless switch to frontier exploration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same retrieval pattern could be tested with map sources other than OSM.
Local-only operation may lower communication costs and latency for fleets of robots.
Versioned map handling suggests the method could adapt to slowly changing environments by refreshing entity data.
Extending the open-vocabulary mapping step to additional sensor types could improve target recognition under varied lighting.

Load-bearing premise

The lightweight LLM produces accurate coordinates from language without factual hallucination, and OSM entities supply enough geospatial detail for reliable long-range routes.

What would settle it

A real-world trial in which the lightweight LLM outputs incorrect coordinates for a command, so that the robot follows the wrong global route and fails to reach the intended goal.

Figures

Figures reproduced from arXiv: 2605.25646 by Bing Xiao, Chen Wang, Dongzhihan Wang, Jianan Sun, Liang Xu, Yi Du, Yingchen Zhang, Yuan Xue.

**Figure 1.** We propose G-DRAGON, a framework that accepts natural language commands and employs a Large Language Model (LLM) to parse instructions into navigation goals and object search targets. Global positioning is resolved by a generative retrieval module that queries an OpenStreetMap (OSM) database to generate an initial route. These global waypoints, combined with multi-modal sensor streams, guide a local planne…

**Figure 2.** The architecture of G-DRAGON. The system processes natural language instructions within a locally deployed environment. The GeoQA module retrieves OSM metadata to ground semantic targets from abstract user commands into precise geodetic coordinates. The RAPPER module serves as the reasoning orchestrator, leveraging a locally deployed Qwen2.5 LLM to reason and generate task-and-motion planning. The NæVIS modu…

**Figure 3.** Simulation environment for the NæVIS system.

…based exploration mode. We adapt Yamauchi’s strategy [14] by imposing a hard spatial constraint: candidate frontiers are first extracted from the global occupancy grid and then strictly filtered via a point-in-polygon (PiP) test against the target’s OSM boundary. This ensures the robot navigates greedily to the nearest valid frontier without drifting into irrelevant…

**Figure 4.** Illustration of our real-robot setup and real-world experimental results. (a) We deploy our system on a UGV. (b)–(e) present the tasks designed…

**Figure 5.** Visualization of “Last-Mile” Exploration in Simulation (Top) and Real-world (Bottom). Columns show the 1st-person view, 3rd-person view, voxel…
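The Figure 3 text describes a Yamauchi-style frontier step with a hard spatial constraint: extract frontiers from the global occupancy grid, keep only those that pass a point-in-polygon test against the target's OSM boundary, then go greedily to the nearest survivor. A minimal sketch, assuming a 2D grid coded as free (0), occupied (1), and unknown (-1), a boundary polygon already converted to grid coordinates, and an even-odd ray-casting PiP test (the paper's exact PiP routine is not given). The grid, polygon, and function names are hypothetical, and "nearest" is straight-line distance in cells rather than path length.

```python
import math

FREE, OCC, UNKNOWN = 0, 1, -1


def find_frontiers(grid):
    """Frontier cells: free cells with at least one 4-connected unknown neighbor."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: toggle on each edge crossed by a ray toward +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def nearest_valid_frontier(grid, polygon, robot_rc):
    """Keep frontiers inside the target boundary, then pick the closest one."""
    valid = [f for f in find_frontiers(grid)
             if point_in_polygon(f[1], f[0], polygon)]  # x = col, y = row
    if not valid:
        return None
    return min(valid, key=lambda f: math.dist(f, robot_rc))


U = UNKNOWN
grid = [
    [0, 0, 0, U, U, U],
    [0, 0, 0, U, U, U],
    [0, 0, 0, 0, U, U],
    [0, 1, 0, 0, U, U],
    [0, 0, 0, 0, 0, U],
]
# Hypothetical OSM footprint in grid coordinates (x = col, y = row).
target_polygon = [(2.5, 1.5), (6.0, 1.5), (6.0, 5.0), (2.5, 5.0)]
goal = nearest_valid_frontier(grid, target_polygon, robot_rc=(0, 0))
print(find_frontiers(grid), goal)
```

Without the PiP filter the robot at (0, 0) would head to the frontier at (0, 2), which lies outside the target footprint; with it, the chosen frontier is (2, 3), inside the boundary, which is the "no drifting into irrelevant areas" behavior the caption describes.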

Original abstract

Autonomous ground robots operating in large-scale outdoor environments require both robust long-range navigation and fine-grained "last-mile" exploration. Current visual-language navigation (VLN) methods work well on short-range tasks but lack geospatial grounding for long-distance missions. Some OpenStreetMap (OSM)-based methods that rely on cloud-based Large Language Models (LLMs) are prone to factual hallucination and cannot conduct "last-mile" exploration based on human instructions. To address these challenges, we present G-DRAGON, a retrieval-augmented framework for outdoor, open-world navigation. This framework maps natural-language commands to versioned, local OSM entities via generative retrieval based on a lightweight LLM, yielding accurate coordinates for global route planning. A high-level planning module bridges global topological routes with the SLAM system, projecting geospatial waypoints into the robot's navigable frame. For the "last mile," the framework transitions to frontier-based exploration and open-set semantic voxel mapping to localize open-vocabulary targets. Experimental results in simulation demonstrate that our framework outperforms state-of-the-art baselines. Furthermore, we validate the system in unseen real-world urban environments on an Unmanned Ground Vehicle (UGV), successfully completing person-search missions with trajectories of up to 500 m.
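One common way to realize "open-set semantic voxel mapping to localize open-vocabulary targets" is to fuse a vision-language feature vector into each voxel and rank voxels by cosine similarity to the text embedding of the query. The sketch assumes that design; the abstract does not give the paper's fusion rule. The 3-dimensional vectors are toy placeholders for real image and text embeddings (for example from a CLIP-style encoder), and `SemanticVoxelMap` is a hypothetical name.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


class SemanticVoxelMap:
    """Voxel hash map keeping a running mean feature vector per voxel."""

    def __init__(self, voxel_size=0.5):
        self.voxel_size = voxel_size
        self.feat_sum = {}  # voxel key -> element-wise sum of fused features
        self.count = {}     # voxel key -> number of fused observations

    def key(self, point):
        """Integer voxel index of a 3D point (meters)."""
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def insert(self, point, feature):
        k = self.key(point)
        if k not in self.feat_sum:
            self.feat_sum[k] = [0.0] * len(feature)
            self.count[k] = 0
        self.feat_sum[k] = [a + b for a, b in zip(self.feat_sum[k], feature)]
        self.count[k] += 1

    def query(self, text_feature, top_k=1):
        """Rank voxels by cosine similarity of mean feature to the text embedding."""
        t = _normalize(text_feature)
        scored = []
        for k, s in self.feat_sum.items():
            mean = _normalize([x / self.count[k] for x in s])
            sim = sum(a * b for a, b in zip(mean, t))
            center = tuple((i + 0.5) * self.voxel_size for i in k)
            scored.append((sim, center))
        scored.sort(reverse=True)
        return scored[:top_k]


vmap = SemanticVoxelMap(voxel_size=0.5)
# Toy embeddings: axis 0 ~ "person", axis 1 ~ "tree", axis 2 ~ "car" (placeholders).
vmap.insert((1.1, 2.2, 0.4), [0.9, 0.1, 0.0])
vmap.insert((1.2, 2.3, 0.3), [0.8, 0.2, 0.1])  # same voxel, fused with the first
vmap.insert((5.0, 0.1, 0.2), [0.1, 0.9, 0.0])
vmap.insert((3.3, 4.4, 0.1), [0.0, 0.1, 0.95])
best_sim, best_center = vmap.query([1.0, 0.0, 0.0])[0]  # text query standing in for "person"
print(round(best_sim, 3), best_center)
```

Averaging features per voxel keeps memory proportional to the number of observed voxels, and the top-ranked voxel center could then serve as the goal for the last-mile approach.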

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

G-DRAGON shows a workable way to link lightweight LLM retrieval from local OSM to frontier exploration for 500 m outdoor robot runs, but the abstract leaves the retrieval accuracy and baseline comparisons thin.


The main takeaway is that this framework gets global routes from a versioned local OSM via a small LLM, then hands off to SLAM-projected waypoints and open-set voxel mapping for the last-mile person search. That split addresses the short-range limit of most VLN work and the hallucination risk of cloud LLMs.

What the paper actually does is combine existing pieces (OSM entity retrieval, topological planning, and frontier exploration) into one pipeline that runs on a UGV. The real-world tests in unseen urban settings, with successful 500 m trajectories, are the concrete evidence that the handoff works, at least in those cases. Simulation outperformance over baselines adds some support, and the versioned local map is a straightforward guard against factual errors.

The soft spot is that the abstract gives almost no numbers on retrieval precision, no list of the baselines, and no error bars or failure modes. Without those, it is hard to tell whether the gains come from the new integration or from better tuning on the tested environments. The claim that OSM entities supply enough grounding for long routes also sits on top of the results rather than being measured directly.

This is for people building outdoor navigation stacks who need something that runs without constant cloud calls. A reader already working on VLN or exploration would see a usable recipe and the real-robot numbers.

It deserves peer review because the real-world validation is there and the approach is falsifiable; the details just need to be filled in for a full assessment.

Referee Report

2 major / 2 minor

Summary. The paper introduces G-DRAGON, a retrieval-augmented framework for long-range outdoor navigation on ground robots. It maps natural-language instructions to versioned local OpenStreetMap (OSM) entities via generative retrieval with a lightweight LLM to obtain accurate coordinates for global topological route planning. A high-level planner projects these waypoints into the robot's local frame for integration with SLAM; for last-mile tasks the system switches to frontier-based exploration combined with open-set semantic voxel mapping. The central claims are that the framework outperforms state-of-the-art baselines in simulation and that it successfully completes person-search missions on a real UGV in unseen urban environments, with trajectories of up to 500 m.

Significance. If the quantitative results and real-world validation hold, the work would offer a concrete route toward scalable geospatial grounding for open-world robot navigation, addressing hallucination risks in cloud LLM + OSM pipelines while retaining last-mile capability. The versioned local-OSM mechanism and the explicit global-to-local bridging are potentially reusable ideas for the VLN and outdoor robotics communities.

major comments (2)

[Experimental Results] Experimental Results section: the abstract states outperformance over SOTA baselines and reports 500 m real-world trajectories, yet no quantitative metrics, error bars, baseline names, or statistical tests are supplied. This information is load-bearing for the central claim of superiority and must be provided with tables or figures before the result can be evaluated.
[Methods] Methods (generative retrieval and versioned OSM): the claim that lightweight-LLM generative retrieval yields accurate OSM coordinates without factual hallucination rests on the versioned local-OSM mechanism, but the manuscript supplies no ablation, retrieval accuracy numbers, or failure-case analysis. This assumption is load-bearing for both the simulation and real-world claims.

minor comments (2)

[Abstract] The abstract uses the phrase "outperforms state-of-the-art baselines" without naming the baselines; this should be expanded even in the abstract for clarity.
[High-level Planning] Notation for the high-level planning module (projection of geospatial waypoints into the navigable frame) is introduced without an equation or diagram reference; a small schematic would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the current version of the manuscript requires additional quantitative detail and analysis to fully support the central claims, and we will revise accordingly.


Referee: [Experimental Results] Experimental Results section: the abstract states outperformance over SOTA baselines and reports 500 m real-world trajectories, yet no quantitative metrics, error bars, baseline names, or statistical tests are supplied. This information is load-bearing for the central claim of superiority and must be provided with tables or figures before the result can be evaluated.

Authors: We agree that the Experimental Results section must supply the requested quantitative support. The revised manuscript will add tables and figures reporting all performance metrics (with error bars), explicit baseline names, and statistical tests for both simulation and real-world experiments. revision: yes
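The response promises metrics with error bars but does not say which ones. Embodied-navigation work commonly reports success rate (SR) and Success weighted by Path Length (SPL) from Anderson et al., "On evaluation of embodied navigation agents" (listed in the references below): SPL = (1/N) · Σ_i S_i · l_i / max(p_i, l_i), where S_i is binary success, l_i is the shortest-path length to the goal, and p_i is the length actually traveled. Whether G-DRAGON reports exactly these metrics is an assumption; the episodes in the sketch are invented to exercise the formula and are not results from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    success: bool
    shortest_path_m: float  # l_i: shortest-path length to the goal
    traveled_m: float       # p_i: length of the path the robot actually drove


def success_rate(episodes: List[Episode]) -> float:
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: List[Episode]) -> float:
    """Success weighted by Path Length; failed episodes contribute zero."""
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_path_m / max(e.traveled_m, e.shortest_path_m)
    return total / len(episodes)


# Illustrative episodes only; not results from the paper.
episodes = [
    Episode(True, 400.0, 500.0),
    Episode(True, 300.0, 300.0),
    Episode(False, 450.0, 120.0),
    Episode(True, 200.0, 250.0),
]
print(f"SR={success_rate(episodes):.2f}  SPL={spl(episodes):.3f}")
```

SPL penalizes detours as well as failures, so it separates a system that reaches the goal efficiently from one that merely reaches it; reporting its spread across runs is what would supply the requested error bars.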
Referee: [Methods] Methods (generative retrieval and versioned OSM): the claim that lightweight-LLM generative retrieval yields accurate OSM coordinates without factual hallucination rests on the versioned local-OSM mechanism, but the manuscript supplies no ablation, retrieval accuracy numbers, or failure-case analysis. This assumption is load-bearing for both the simulation and real-world claims.

Authors: We acknowledge that the current manuscript lacks explicit ablations, retrieval accuracy metrics, and failure-case analysis for the generative retrieval and versioned OSM components. The revision will incorporate these elements, including quantitative retrieval accuracy figures and representative failure cases, to substantiate the hallucination-mitigation claims. revision: yes
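The promised retrieval-accuracy figures could take several forms. A minimal sketch of two complementary measures, assuming each test command has a ground-truth OSM entity ID and coordinate: exact entity-match accuracy, and the great-circle (haversine) error between retrieved and ground-truth coordinates, summarized as a within-tolerance rate and a median. The 25 m tolerance, the OSM-style IDs, and every sample value are hypothetical. The distance view also captures the "incorrect coordinates" failure mode that would settle the load-bearing premise.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000.0  # spherical-Earth approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def retrieval_report(preds, golds, tol_m=25.0):
    """preds/golds: aligned lists of (entity_id, lat, lon); returns summary metrics."""
    id_hits, errors = 0, []
    for (pid, plat, plon), (gid, glat, glon) in zip(preds, golds):
        id_hits += int(pid == gid)
        errors.append(haversine_m(plat, plon, glat, glon))
    n = len(golds)
    return {
        "entity_accuracy": id_hits / n,
        "within_tol": sum(e <= tol_m for e in errors) / n,
        "median_error_m": median(errors),
    }


# Hypothetical ground truth and predictions (IDs mimic OSM "type/id" strings).
golds = [("way/101", 39.9012, 116.4031), ("node/7", 39.9020, 116.4010), ("way/55", 39.8990, 116.4025)]
preds = [("way/101", 39.9012, 116.4031), ("node/7", 39.9021, 116.4010), ("way/56", 39.8950, 116.4025)]
report = retrieval_report(preds, golds)
print(report)
```

Reporting both numbers matters because they can disagree: a correct entity with a poorly placed coordinate passes exact-match accuracy but fails the distance check, while the distance check alone would hide which entity was chosen.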

Circularity Check

0 steps flagged

No significant circularity


The paper describes an engineering framework for retrieval-augmented navigation that maps language commands to OSM entities via lightweight LLM generative retrieval, then bridges global routes to local frontier exploration. No equations, fitted parameters, or derivation steps appear in the provided text. All performance claims rest on external experimental outcomes (simulation outperformance and real-world 500 m UGV trajectories) rather than any quantity defined in terms of itself or justified solely by self-citation. The central mechanisms are presented as design choices with stated mitigations for hallucination, not as predictions forced by prior fits or uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no free parameters, axioms, or invented entities described.



Reference graph

Works this paper leans on


[1] A. Sridhar, D. Shah, C. Glossop, and S. Levine, “Nomad: Goal masked diffusion policies for navigation and exploration,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 63–70.

[2] D. Shah, A. Sridhar, N. Dashora, K. Stachowicz, K. Black, N. Hirose, and S. Levine, “Vint: A foundation model for visual navigation,” in Conference on Robot Learning, 2023, pp. 711–733.

[3] J. Zhang, A. Li, Y. Qi, M. Li, J. Liu, S. Wang, H. Liu, G. Zhou, Y. Wu, X. Li, et al., “Embodied navigation foundation model,” arXiv preprint arXiv:2509.12129, 2025.

[4] A.-C. Cheng, Y. Ji, Z. Yang, Z. Gongye, X. Zou, J. Kautz, E. Bıyık, H. Yin, S. Liu, and X. Wang, “Navila: Legged robot vision-language-action model for navigation,” arXiv preprint arXiv:2412.04453, 2024.

[5] J. Liu, Y. Qi, J. Zhang, M. Li, S. Wang, K. Wu, H. Ye, H. Zhang, Z. Chen, F. Zhong, et al., “Trackvla++: Unleashing reasoning and memory capabilities in vla models for embodied visual tracking,” arXiv preprint arXiv:2510.07134, 2025.

[6] J. Wang, D. Huo, Z. Xu, Y. Shi, Y. Yan, Y. Wang, C. Gao, Y. Qiao, and G. Zhou, “Openbench: A new benchmark and baseline for semantic navigation in smart logistics,” arXiv preprint arXiv:2502.09238, 2025.

[7] Y. Tay, V. Q. Tran, M. Dehghani, J. Ni, D. Bahri, H. Mehta, Z. Qin, K. Hui, Z. Zhao, J. Gupta, T. Schuster, W. W. Cohen, and D. Metzler, “Transformer memory as a differentiable search index,” [Online]. Available: https://arxiv.org/abs/2202.06991

[8] N. D. Cao, G. Izacard, S. Riedel, and F. Petroni, “Autoregressive entity retrieval,” 2021. [Online]. Available: https://arxiv.org/abs/2010.00904

[9] D. Shah, B. Osinski, B. Ichter, and S. Levine, “Lm-nav: Robotic navigation with large pre-trained models of language, vision, and action,” 2022. [Online]. Available: https://arxiv.org/abs/2207.04429

[10] D. Shah and S. Levine, “Viking: Vision-based kilometer-scale navigation with geographic hints,” in Robotics: Science and Systems XVIII, ser. RSS2022, 2022.

[11] K. Rana, J. Haviland, S. Garg, J. Abou-Chakra, I. Reid, and N. Suenderhauf, “Sayplan: Grounding large language models using 3d scene graphs for scalable robot task planning,” in Proceedings of The 7th Conference on Robot Learning, vol. 229, 2023, pp. 23–72.

[12] H. Xu, Y. Hu, C. Gao, Z. Zhu, Y. Zhao, Y. Li, and Q. Yin, “Geonav: Empowering mllms with explicit geospatial reasoning abilities for language-goal aerial navigation,” arXiv preprint arXiv:2504.09587, 2025.

[13] Q. Xie, S. Y. Min, P. Ji, Y. Yang, T. Zhang, K. Xu, A. Bajaj, R. Salakhutdinov, M. Johnson-Roberson, and Y. Bisk, “Embodied-RAG: General non-parametric embodied memory for retrieval and generation,” arXiv preprint arXiv:2409.18313, 2024.

[14] B. Yamauchi, “A frontier-based approach for autonomous exploration,” in Proceedings 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation, 1997, pp. 146–151.

[15] H. Yin, X. Xu, Z. Wu, J. Zhou, and J. Lu, “Sg-nav: Online 3d scene graph prompting for llm-based zero-shot object navigation,” arXiv preprint arXiv:2410.08189, 2024.

[16] M. Zhang, Y. Du, C. Wu, J. Zhou, Z. Qi, J. Ma, and B. Zhou, “Apex-nav: An adaptive exploration strategy for zero-shot object navigation with target-centric semantic fusion,” 2025.

[17] J. Wei, X. Wang, et al., “Chain-of-thought prompting elicits reasoning in large language models,” 2023. [Online]. Available: https://arxiv.org/abs/2201.11903

[18] T. Scholak, N. Schucher, and D. Bahdanau, “PICARD: Parsing incrementally for constrained auto-regressive decoding from language models,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 9895–9901.

[19] R. A. Izzo, G. Bardaro, and M. Matteucci, “Btgenbot: Behavior tree generation for robotic tasks with lightweight llms,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024.

[20] D. Luxen and C. Vetter, “Real-time routing with openstreetmap data,” in Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2011, pp. 513–516.

[21] I. Singh, V. Blukis, A. Mousavian, A. Goyal, D. Xu, J. Tremblay, D. Fox, J. Thomason, and A. Garg, “Progprompt: Generating situated robot task plans using large language models,” 2022. [Online]. Available: https://arxiv.org/abs/2209.11302

[22] S. Zhao, H. Zhang, P. Wang, L. Nogueira, and S. Scherer, “Super odometry: Imu-centric lidar-visual-inertial estimator for challenging environments,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 8729–8736.

[23] W. Xu, Y. Cai, D. He, J. Lin, and F. Zhang, “Fast-lio2: Fast direct lidar-inertial odometry,” 2021. [Online]. Available: https://arxiv.org/abs/2107.06829

[25] F. Yang, C. Cao, H. Zhu, J. Oh, and J. Zhang, “Far planner: Fast, attemptable route planner using dynamic visibility update,” 2022. [Online]. Available: https://arxiv.org/abs/2504.06994

[26] C. Cao, H. Zhu, Z. Ren, H. Choset, and J. Zhang, “Representation granularity enables time-efficient autonomous exploration in large, complex worlds,” Science Robotics, vol. 8, p. eadf0970, 2023.

[27] S. Kim, O. Alama, D. Kurdydyk, J. Keller, N. Keetha, W. Wang, Y. Bisk, and S. Scherer, “Raven: Resilient aerial navigation via open-set semantic memory and behavior adaptation,” arXiv preprint arXiv:2509.23563, 2025.

[28] P. Anderson, A. Chang, et al., “On evaluation of embodied navigation agents,” 2018. [Online]. Available: https://arxiv.org/abs/1807.06757

[29] S. Robertson and H. Zaragoza, “The probabilistic relevance framework: Bm25 and beyond,” Foundations and Trends® in Information Retrieval, vol. 3, pp. 333–389, 2009.

[30] V. Karpukhin, B. Oğuz, S. Min, P. Lewis, L. Wu, S. Edunov, D. Chen, and W. tau Yih, “Dense passage retrieval for open-domain question answering,” 2020.
