ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
- MirrorBench: Evaluating Self-centric Intelligence in MLLMs by Introducing a Mirror
  MirrorBench reveals that leading MLLMs perform far below humans on tasks requiring self-referential perception and representation, even at the simplest level.
- AnyUser: Translating Sketched User Intent into Domestic Robots
  AnyUser translates free-form sketches on images plus optional language into executable robot actions for domestic tasks using multimodal fusion and a hierarchical policy.