Phi-Nav generates path-level hindsight instructions from on-policy exploration trajectories to supply additional semantic supervision for vision-language navigation agents.
arXiv preprint arXiv:2411.11394 (2024)
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2
representative citing papers
GoViG decomposes goal-conditioned navigation instruction generation into visual state prediction and instruction synthesis using an autoregressive multimodal LLM with one-pass and interleaved reasoning, showing gains on a new R2R-Goal dataset.
citing papers explorer
- Path-level Hindsight Instructions for Semantic Exploration in Vision-Language Navigation
Phi-Nav generates path-level hindsight instructions from on-policy exploration trajectories to supply additional semantic supervision for vision-language navigation agents.
- GoViG: Goal-Conditioned Visual Navigation Instruction Generation via Multimodal Reasoning
GoViG decomposes goal-conditioned navigation instruction generation into visual state prediction and instruction synthesis using an autoregressive multimodal LLM with one-pass and interleaved reasoning, showing gains on a new R2R-Goal dataset.