VIGIL decouples world-state completion from terminal commitment in embodied agents, exposing up to 19.7 pp gaps in benchmark success despite comparable execution across 20 models.
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
baseline 1
citation-polarity summary
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1roles
baseline 1polarities
baseline 1representative citing papers
citing papers explorer
-
Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents
VIGIL decouples world-state completion from terminal commitment in embodied agents, exposing up to 19.7 pp gaps in benchmark success despite comparable execution across 20 models.