Embodied agent interface: Benchmarking LLMs for embodied decision making. arXiv preprint arXiv:2410.07166, 2024.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.AI (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
PDDLego iteratively formalizes and refines PDDL representations of partially observable environments to improve planning success without finetuning or in-context examples.
Citing papers:
- SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks
  SpatialWorld is a new multi-simulator benchmark showing top multimodal agents achieve under 18% success on interactive spatial tasks requiring active exploration and long-horizon planning.
- Iterative Formalization and Planning in Partially Observable Environments
  PDDLego iteratively formalizes and refines PDDL representations of partially observable environments to improve planning success without finetuning or in-context examples.