Pith Number

pith:HHQVCX47

pith:2025:HHQVCX474M63G67R7LUOLOWFX2
Status: not attested · not anchored · not stored · refs resolved

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents

Cheng Qian, Hanyang Chen, Heng Ji, Huan Zhang, Junyu Zhang, Kangrui Wang, Manling Li, Mark Zhao, Marziyeh Movahedi, Qineng Wang, Rui Yang, Teja Venkat Koripella, Tong Zhang

MLLMs excel at high-level embodied tasks but score only 28.9 percent on low-level manipulation.

arxiv:2502.09560 v3 · 2025-02-13 · cs.AI · cs.CL · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{HHQVCX474M63G67R7LUOLOWFX2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

MLLMs excel at high-level tasks but struggle with low-level manipulation, with the best model, GPT-4o, scoring only 28.9% on average.

C2 · weakest assumption

That performance in the four chosen simulated environments and the six curated capability subsets accurately reflects real-world embodied agent challenges.

C3 · one-line summary

EmbodiedBench is a new evaluation framework for MLLM-based embodied agents; across the 24 models tested, it reveals strong high-level reasoning but weak low-level manipulation performance.

References

22 extracted · 22 resolved · 0 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

29 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.118759Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

39e1515f9fe33db37bf1fae8e5bac5be8bd65ad66c2761748c703d1323a40c9e

Aliases

arxiv: 2502.09560 · arxiv_version: 2502.09560v3 · doi: 10.48550/arxiv.2502.09560 · pith_short_12: HHQVCX474M63 · pith_short_16: HHQVCX474M63G67R · pith_short_8: HHQVCX47
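The identifiers on this page are consistent with a simple derivation that the record does not spell out. If you take the RFC 4648 base32 encoding of the 32-byte canonical hash and keep the first 26 characters (130 bits), you get the full Pith Number. The pith_short_8, pith_short_12 and pith_short_16 aliases are prefixes of that string. The sketch below checks this inference using Python's standard base64 module. The function name pith_number_from_hash is hypothetical, and the rule itself is inferred from this record, not taken from a Pith specification.

```python
import base64

# Canonical hash as printed on this record page.
CANONICAL_HASH = "39e1515f9fe33db37bf1fae8e5bac5be8bd65ad66c2761748c703d1323a40c9e"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Derive a Pith-style identifier from a SHA-256 hex digest.

    Assumption (inferred from this page, not from a published spec): the
    identifier is the first `length` characters of the RFC 4648 base32
    encoding of the raw 32-byte digest; 26 characters cover 130 bits.
    """
    raw = bytes.fromhex(hex_digest)                     # 32 raw bytes
    encoded = base64.b32encode(raw).decode("ascii")     # 56 chars incl. padding
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)
# The short aliases appear to be plain prefixes of the full number.
aliases = {n: pith_number_from_hash(CANONICAL_HASH, n) for n in (8, 12, 16)}

print(full)  # → HHQVCX474M63G67R7LUOLOWFX2
for n, alias in aliases.items():
    print(f"pith_short_{n}: {alias}")
```

If the inference holds, a reader who has already verified the canonical hash can recompute the Pith Number and its short aliases locally, without relying on how the page renders them.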
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/HHQVCX474M63G67R7LUOLOWFX2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 39e1515f9fe33db37bf1fae8e5bac5be8bd65ad66c2761748c703d1323a40c9e
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "12cbdefdd1c0e9a1766ce06a931d3e50de0b2822ea302800d0d69603632a1381",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-02-13T18:11:34Z",
    "title_canon_sha256": "f197cab08bcf541567652650012478e17043804ab59a7cb2839f5e4a50c9323a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2502.09560",
    "kind": "arxiv",
    "version": 3
  }
}
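The hashing step in the verification pipeline above does not depend on the network. It serializes the record with sorted keys, compact separators and unescaped UTF-8, then takes SHA-256. Below is a minimal offline sketch of that step. It assumes the JSON printed here is exactly the canonical_record object that the Agent API serves. If the two differ in any value, the final comparison prints False. The helper name canonical_sha256 is hypothetical.

```python
import hashlib
import json

# Canonical record copied from this page (assumed identical to the
# `canonical_record` returned by the Agent API).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "12cbdefdd1c0e9a1766ce06a931d3e50de0b2822ea302800d0d69603632a1381",
        "cross_cats_sorted": ["cs.CL", "cs.CV"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2025-02-13T18:11:34Z",
        "title_canon_sha256": "f197cab08bcf541567652650012478e17043804ab59a7cb2839f5e4a50c9323a",
    },
    "schema_version": "1.0",
    "source": {"id": "2502.09560", "kind": "arxiv", "version": 3},
}

EXPECTED = "39e1515f9fe33db37bf1fae8e5bac5be8bd65ad66c2761748c703d1323a40c9e"


def canonical_sha256(record: dict) -> str:
    """Hash a record the way the python3 step of the curl pipeline does:
    sorted keys, no insignificant whitespace, UTF-8 without ASCII escaping."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the key order in which the JSON is delivered. Only the values and structure matter.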