pith. machine review for the scientific record.

arXiv: 1901.03035 · v1 · submitted 2019-01-10 · cs.AI · cs.CL · cs.CV · cs.RO

Recognition: unknown

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

classification: cs.AI · cs.CL · cs.CV · cs.RO
keywords: instruction · agent · navigation · progress · next · self-monitoring · completed · components

The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in photo-realistic unknown environments. This challenging task demands that the agent be aware of which instruction was completed, which instruction is needed next, which way to go, and its navigation progress towards the goal. In this paper, we introduce a self-monitoring agent with two complementary components: (1) a visual-textual co-grounding module to locate the instruction completed in the past, the instruction required for the next action, and the next moving direction from surrounding images and (2) a progress monitor to ensure the grounded instruction correctly reflects the navigation progress. We test our self-monitoring agent on a standard benchmark and analyze our proposed approach through a series of ablation studies that elucidate the contributions of the primary components. Using our proposed method, we set the new state of the art by a significant margin (8% absolute increase in success rate on the unseen test set). Code is available at https://github.com/chihyaoma/selfmonitoring-agent.
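To make the idea of an auxiliary progress estimate concrete, here is a minimal sketch in plain Python. The abstract does not state how progress is labelled or how the losses are combined, so everything below is an assumption made for illustration: the names `progress_target` and `joint_loss`, the normalized distance-to-goal label, the squared-error progress loss, and the `weight` parameter are all hypothetical. The authors' actual implementation is in the repository linked above.

```python
import math


def progress_target(dist_to_goal, start_dist):
    """Hypothetical progress label: 0.0 at the start, 1.0 at the goal.

    Assumes progress is the normalized reduction in distance to the goal,
    clipped to [-1, 1] so that moving away from the goal yields a negative value.
    """
    if start_dist <= 0:
        return 1.0  # the agent started at the goal
    return max(-1.0, min(1.0, 1.0 - dist_to_goal / start_dist))


def cross_entropy(action_probs, target_index):
    """Negative log-likelihood of the reference action."""
    return -math.log(action_probs[target_index])


def joint_loss(action_probs, target_index, predicted_progress,
               dist_to_goal, start_dist, weight=0.5):
    """Weighted sum of an action loss and an auxiliary progress loss.

    `weight` (an assumed hyperparameter) trades off action selection
    against progress estimation.
    """
    action_loss = cross_entropy(action_probs, target_index)
    target = progress_target(dist_to_goal, start_dist)
    progress_loss = (predicted_progress - target) ** 2
    return weight * action_loss + (1.0 - weight) * progress_loss
```

Under these assumptions, an agent halfway to the goal that predicts a progress of 0.5 incurs no progress loss, so the joint loss reduces to the weighted action loss.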

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SpaAct: Spatially-Activated Transition Learning with Curriculum Adaptation for Vision-Language Navigation

    cs.CV · 2026-04 · unverdicted · novelty 6.0

    SpaAct activates spatial awareness in VLMs using action retrospection, future frame prediction, and progressive curriculum learning to reach SOTA on VLN-CE benchmarks.