Sigurdsson, Robinson Piramuthu, Jesse Thomason, and Gaurav S

· 2022 · arXiv 2211.16649

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace

cs.AI · 2026-04-09 · unverdicted · novelty 7.0

Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.

DarkQA: Benchmarking Vision-Language Models on Visual-Primitive Question Answering in Low-Light Indoor Scenes

cs.CV · 2025-12-31 · accept · novelty 7.0

DarkQA is a new benchmark that measures vision-language model performance on basic visual questions under controlled low-light degradations modeled from real camera physics.

NORM-Nav: Zero-Shot Mobile Robot Navigation with Natural Language Behavioral Constraints

cs.RO · 2026-05-16 · unverdicted · novelty 6.0

NORM-Nav is a zero-shot framework that parses natural language behavioral constraints with an LLM, grounds them via vision-LiDAR, and encodes them as multi-layer costmaps for grid-based robot navigation.

FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation

cs.CV · 2026-04-17 · unverdicted · novelty 6.0

FineCog-Nav uses fine-grained cognitive modules driven by foundation models to outperform zero-shot baselines in UAV navigation and introduces the AerialVLN-Fine benchmark with refined instructions.

Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks

cs.RO · 2024-12-09 · unverdicted · novelty 6.0

Uni-NaVid unifies diverse embodied navigation tasks into one video-based vision-language-action model trained on 3.6 million samples from four sub-tasks, achieving state-of-the-art performance on benchmarks and real-world tests.

Agent AI: Surveying the Horizons of Multimodal Interaction

cs.AI · 2024-01-07 · unverdicted · novelty 4.0

The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.

citing papers explorer

Showing 2 of 2 citing papers after filters.

How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace cs.AI · 2026-04-09 · unverdicted · none · ref 5
Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.
Agent AI: Surveying the Horizons of Multimodal Interaction cs.AI · 2024-01-07 · unverdicted · none · ref 15
The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.

Sigurdsson, Robinson Piramuthu, Jesse Thomason, and Gaurav S

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer