Are you looking? Grounding to multiple modalities in vision-and-language navigation

Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko · 2019 · arXiv 1906.00347

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

Environmental Understanding Vision-Language Model for Embodied Agent

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

EUEA fine-tunes VLMs on object perception, task planning, action understanding and goal recognition, with recovery and GRPO, to raise ALFRED success rates by 11.89% over behavior cloning.

citing papers explorer

Showing 1 of 1 citing paper.

Environmental Understanding Vision-Language Model for Embodied Agent · cs.CV · 2026-04-21 · unverdicted · none · ref 15
