pith:LJHUW6AY
An Embodied Generalist Agent in 3D World
LEO is trained as a 3D embodied generalist agent in two stages, 3D vision-language alignment followed by vision-language-action instruction tuning, on large-scale datasets.
arxiv:2311.12871 v3 · 2023-11-18 · cs.CV · cs.AI · cs.CL · cs.LG
Claims
Through extensive experiments, we demonstrate LEO's remarkable proficiency across a wide spectrum of tasks, including 3D captioning, question answering, embodied reasoning, navigation and manipulation.
The central claim assumes that the collected large-scale 3D VL and VLA datasets plus the two-stage training procedure are sufficient to produce generalist performance that transfers beyond the specific benchmarks shown.
LEO is an embodied generalist agent that performs 3D captioning, question answering, reasoning, navigation, and manipulation after 3D vision-language alignment followed by vision-language-action instruction tuning on large-scale object- and scene-level datasets.
Receipt and verification
| First computed | 2026-05-17T23:38:13.838991Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
5a4f4b781870e8dc531ba04a8af92b97ce9163b929a16e03aaf471e3739edb0f
Verify this Pith Number yourself
```sh
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LJHUW6AYODUNYUY3UBFIV6JLS7 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5a4f4b781870e8dc531ba04a8af92b97ce9163b929a16e03aaf471e3739edb0f
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "4f47ee0a5b4dedd7b27c6fe1061559319ed77d33695b15255171ac542dd21944",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-11-18T01:21:38Z",
    "title_canon_sha256": "d3f6a0a01ad1f88f36aa4d2acac57a9346d7c35c4738df676395e28791d80bc8"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2311.12871",
    "kind": "arxiv",
    "version": 3
  }
}
```
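The hashing step of the verification command can also be run offline on a saved copy of the canonical record. The sketch below is a minimal illustration, assuming the saved file holds exactly the `canonical_record` object and that the canonical form is what the one-liner produces: keys sorted at every level, no whitespace between tokens, and non-ASCII characters kept as UTF-8 rather than escaped. The helper names `canonical_bytes`, `canonical_sha256`, and `verify` are hypothetical, not part of any published Pith tooling.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize like the verification one-liner (assumed canonical form):
    keys sorted recursively, compact separators, non-ASCII kept as UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Lowercase hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def verify(record: dict, expected_hex: str) -> bool:
    """Return True if the record hashes to the expected canonical hash.
    The comparison ignores the case of the expected hex string."""
    return canonical_sha256(record) == expected_hex.strip().lower()
```

For example, after loading a saved record with `record = json.load(open("record.json", encoding="utf-8"))` (the filename is hypothetical), `verify(record, "5a4f4b781870e8dc531ba04a8af92b97ce9163b929a16e03aaf471e3739edb0f")` compares it with the published canonical hash. Because the serialization sorts keys and drops whitespace, the pretty-printed layout of the record has no effect on the digest.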