LongAct benchmark reveals top VLMs reach only 59% goal completion and 16% full success on long-horizon household tasks, while HoloMind agent improves results via DAG planner, multimodal spatial memory, episodic memory, and global critic.
Karen Liu, Jiajun Wu, and Li Fei-Fei
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution
LongAct benchmark reveals top VLMs reach only 59% goal completion and 16% full success on long-horizon household tasks, while HoloMind agent improves results via DAG planner, multimodal spatial memory, episodic memory, and global critic.