VLN-NF: Feasibility-Aware Vision-and-Language Navigation with False-Premise Instructions

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Conventional Vision-and-Language Navigation (VLN) benchmarks assume instructions are feasible and the referenced target exists, leaving agents ill-equipped to handle false-premise goals. We introduce VLN-NF, a benchmark with false-premise instructions where the target is absent from the specified room and agents must navigate, gather evidence through in-room exploration, and explicitly output NOT-FOUND. VLN-NF is constructed via a scalable pipeline that rewrites VLN instructions using an LLM and verifies target absence with a VLM, producing plausible yet factually incorrect goals. We further propose REV-SPL to jointly evaluate room reaching, exploration coverage, and decision correctness. To address this challenge, we present ROAM, a two-stage hybrid that combines supervised room-level navigation with LLM/VLM-driven in-room exploration guided by a free-space clearance prior. ROAM achieves the best REV-SPL among compared methods, while baselines often under-explore and terminate prematurely under unreliable instructions. The VLN-NF project page is at https://vln-nf.github.io/.

fields

cs.AI (1) · cs.RO (1)

years

2026 (2)

representative citing papers

The Yes-Man Syndrome: Benchmarking Abstention in Embodied Robotic Agents

cs.RO · 2026-05-19 · conditional · novelty 8.0

The paper presents RoboAbstention, a new benchmark showing frontier VLMs and embodied planners abstain on only 16.5-39% of 6,069 instructions grounded in robotics images, with prompting interventions raising rates to 88-93% but not solving the problem.

citing papers explorer

  • The Yes-Man Syndrome: Benchmarking Abstention in Embodied Robotic Agents
    cs.RO · 2026-05-19 · conditional · none · ref 35 · internal anchor

    The paper presents RoboAbstention, a new benchmark showing frontier VLMs and embodied planners abstain on only 16.5-39% of 6,069 instructions grounded in robotics images, with prompting interventions raising rates to 88-93% but not solving the problem.

  • ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints
    cs.AI · 2026-04-16 · unverdicted · none · ref 21 · internal anchor

    ADAPT augments planners with affordance reasoning to raise task success in environments with unspecified and time-varying object affordances, and a LoRA-finetuned VLM backend beats GPT-4o on the new DynAfford benchmark.