Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments

Anderson, P · 2018 · arXiv 2018.00387

4 Pith papers cite this work. Polarity classification is still being indexed.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

AirGroundBench: Probing Spatial Intelligence in Multimodal Large Models under Heterogeneous Multi-View Embodied Collaboration

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

AirGroundBench is a new diagnostic benchmark exposing that MLLMs handle basic spatial perception but struggle with cross-view alignment, transformation reasoning, and embodied navigation under heterogeneous air-ground views.

Lost in Aggregation: A Multi-Scale Diagnostic Benchmark for LLM Spatial Navigation

physics.soc-ph · 2026-06-20 · unverdicted · novelty 7.0

A new diagnostic benchmark decomposes LLM spatial navigation into three cognitive scales and shows that cross-scale aggregation, not single-level deficits, causes failure beyond small mazes.

SCOPE: Real-Time Natural Language Camera Agent at the Edge

cs.RO · 2026-06-01 · unverdicted · novelty 5.0

SCOPE introduces an edge-deployable natural-language PTZ camera agent, a simulation benchmark, and evaluations showing that stronger small language models reduce hallucinations while perception remains the main bottleneck.

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

cs.RO · 2025-07-02 · unverdicted · novelty 5.0

The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.

