Pith Number

pith:LJHUW6AY

pith:2023:LJHUW6AYODUNYUY3UBFIV6JLS7

not attested · not anchored · not stored · refs resolved

An Embodied Generalist Agent in 3D World

Baoxiong Jia, Jiangyong Huang, Puhao Li, Qing Li, Silong Yong, Siyuan Huang, Song-Chun Zhu, Xiaojian Ma, Xiongkun Linghu, Yan Wang

LEO is trained as a 3D embodied generalist agent in two stages: 3D vision-language alignment, followed by vision-language-action instruction tuning on large-scale datasets.

arxiv:2311.12871 v3 · 2023-11-18 · cs.CV · cs.AI · cs.CL · cs.LG

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
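
The property described above, that any mirror recomputes the same state from the same bundle, can be illustrated with a minimal sketch. This is not Pith's merge algorithm, which is not specified here. It assumes a hypothetical event shape (`ts`, `field`, `value`) and a hypothetical rule: sort events by timestamp, break ties by content hash, drop exact duplicates, and let the last writer win. The events are made up for illustration.

```python
import hashlib
import json
import random


def event_sort_key(event: dict) -> tuple:
    """Total order over events: timestamp first, then content hash (assumed rule)."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return (event["ts"], hashlib.sha256(body).hexdigest())


def merge_state(canonical_record: dict, events: list) -> dict:
    """Fold events into a state in canonical order, ignoring exact duplicates."""
    state = {"record": canonical_record, "status": {}}
    seen = set()
    for event in sorted(events, key=event_sort_key):
        key = event_sort_key(event)
        if key in seen:
            continue  # the same event delivered twice has no extra effect
        seen.add(key)
        state["status"][event["field"]] = event["value"]  # last writer wins
    return state


# Hypothetical record and events, for illustration only.
record = {"source": {"id": "2311.12871", "kind": "arxiv", "version": 3}}
events = [
    {"ts": 1, "field": "example_field_a", "value": "first"},
    {"ts": 2, "field": "example_field_b", "value": "set"},
    {"ts": 3, "field": "example_field_a", "value": "second"},
]

# A mirror that receives the events reordered, with one duplicated,
# still arrives at the same state.
reordered = events + events[:1]
random.shuffle(reordered)
print(merge_state(record, events) == merge_state(record, reordered))
```

Sorting on a key derived only from event content is what makes the result independent of delivery order. Any real merge rule would also need to verify each event's signature before applying it.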

Claims

C1 strongest claim

Through extensive experiments, we demonstrate LEO's remarkable proficiency across a wide spectrum of tasks, including 3D captioning, question answering, embodied reasoning, navigation and manipulation.

C2 weakest assumption

The central claim assumes that the collected large-scale 3D VL and VLA datasets plus the two-stage training procedure are sufficient to produce generalist performance that transfers beyond the specific benchmarks shown.

C3 one line summary

LEO is an embodied generalist agent that performs 3D captioning, question answering, reasoning, navigation, and manipulation after 3D vision-language alignment followed by vision-language-action instruction tuning on large-scale object- and scene-level datasets.

References

27 extracted · 27 resolved · 10 Pith anchors

[1] A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity 2023 · arXiv:2302.04023

[2] RT-1: Robotics Transformer for Real-World Control at Scale 2022 · arXiv:2212.06817

[3] Scaling Instruction-Finetuned Language Models 2022 · arXiv:2210.11416

[4] LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model 2023 · arXiv:2304.15010

[5] Scaling Laws for Neural Language Models 2020 · arXiv:2001.08361

Formal links

2 machine-checked theorem links

Cited by

20 papers in Pith

C-NAV: Towards Self-Evolving Continual Object Navigation in Open World

Look, Zoom, Understand: The Robotic Eyeball for Embodied Perception

POMA-3D: The Point Map Way to 3D Scene Understanding

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

Receipt and verification

First computed	2026-05-17T23:38:13.838991Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

5a4f4b781870e8dc531ba04a8af92b97ce9163b929a16e03aaf471e3739edb0f

Aliases

arxiv: 2311.12871 · arxiv_version: 2311.12871v3 · doi: 10.48550/arxiv.2311.12871 · pith_short_12: LJHUW6AYODUN · pith_short_16: LJHUW6AYODUNYUY3 · pith_short_8: LJHUW6AY
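
The `pith_short_*` aliases above are prefixes of a single identifier, and in this record that identifier matches the canonical hash: Base32-encoding the raw SHA-256 bytes and keeping the first 26 characters reproduces `LJHUW6AYODUNYUY3UBFIV6JLS7`. The sketch below checks that relationship. Whether this is Pith's specified derivation is an assumption drawn from this one record, and the function name `pith_id_from_hash` is hypothetical.

```python
import base64

# Canonical hash as printed in this record.
canonical_hash = "5a4f4b781870e8dc531ba04a8af92b97ce9163b929a16e03aaf471e3739edb0f"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes (RFC 4648 alphabet) and truncate."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


full_id = pith_id_from_hash(canonical_hash)
print(full_id)       # LJHUW6AYODUNYUY3UBFIV6JLS7
print(full_id[:8])   # LJHUW6AY          (pith_short_8)
print(full_id[:12])  # LJHUW6AYODUN      (pith_short_12)
print(full_id[:16])  # LJHUW6AYODUNYUY3  (pith_short_16)
```

Because each alias is a prefix of the same encoding, a shorter alias can be checked against a longer one with a simple `startswith` comparison.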

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/LJHUW6AYODUNYUY3UBFIV6JLS7 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5a4f4b781870e8dc531ba04a8af92b97ce9163b929a16e03aaf471e3739edb0f

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "4f47ee0a5b4dedd7b27c6fe1061559319ed77d33695b15255171ac542dd21944",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-11-18T01:21:38Z",
    "title_canon_sha256": "d3f6a0a01ad1f88f36aa4d2acac57a9346d7c35c4738df676395e28791d80bc8"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2311.12871",
    "kind": "arxiv",
    "version": 3
  }
}
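
The check from the curl command can also be run offline against the record printed above. The sketch below uses the same serialization as that command (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 encoding). The helper name `canonical_digest` is hypothetical. The digest matches only if the printed record is identical to what the resolver serves, so the script reports the comparison rather than assuming it.

```python
import hashlib
import json

# Canonical record as printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "4f47ee0a5b4dedd7b27c6fe1061559319ed77d33695b15255171ac542dd21944",
        "cross_cats_sorted": ["cs.AI", "cs.CL", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-11-18T01:21:38Z",
        "title_canon_sha256": "d3f6a0a01ad1f88f36aa4d2acac57a9346d7c35c4738df676395e28791d80bc8",
    },
    "schema_version": "1.0",
    "source": {"id": "2311.12871", "kind": "arxiv", "version": 3},
}

EXPECTED = "5a4f4b781870e8dc531ba04a8af92b97ce9163b929a16e03aaf471e3739edb0f"


def canonical_digest(obj: dict) -> str:
    """SHA-256 over compact JSON with sorted keys, as in the curl one-liner."""
    body = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(body).hexdigest()


digest = canonical_digest(record)
print(digest)
print("matches canonical hash:", digest == EXPECTED)
```

Sorting keys before hashing means the digest does not depend on the order in which a JSON parser or mirror happens to emit the fields.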