Pith Number

pith:PJU6IWRD

pith:2026:PJU6IWRD6PMWYZPP66ML6CSIRS
not attested · not anchored · not stored · refs resolved

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

Ayaan Malik, Chen-Hsuan Lin, Dantong Niu, George Kurian, Jiannan Xiang, Jinwei Gu, Jitendra Malik, Joel Jang, Jun Zhang, Kaichun Mo, Kaiyuan Zheng, K.R. Zentner, Linxi "Jim" Fan, Loic Magne, Ming-Yu Liu, Pieter Abbeel, Pooya Jannaty, Qianli Ma, Ruijie Zheng, Seonghyeon Ye, Seungjun Nah, Shenyuan Gao, Sihyun Yu, Suneel Indupuru, Wei-Cheng Tseng, William Liang, You Liang Tan, Yuke Zhu, Yuqi Xie, Yuzhu Dong

A world model pretrained on 44k hours of human videos transfers to robots with accurate physics and control after minimal fine-tuning.

arxiv:2602.06949 v1 · 2026-02-06 · cs.RO · cs.AI · cs.CV · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{PJU6IWRD6PMWYZPP66ML6CSIRS}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
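The page does not specify the merge algorithm, so the following is only a minimal sketch of what "deterministic merge" could mean: events are folded into the record in a fixed total order, so any mirror holding the same set of events arrives at the same state regardless of the order in which it received them. The event fields (`at`, `field`, `value`) and both function names are hypothetical, and signature checking is omitted entirely.

```python
import hashlib
import json
from typing import Any


def event_key(event: dict[str, Any]) -> tuple[str, str]:
    """Total order over events: timestamp first, content hash as tie-breaker.

    Assumes (hypothetically) that `at` is an ISO 8601 UTC string in a single
    fixed format, so that lexicographic order equals chronological order.
    """
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return (event["at"], hashlib.sha256(body).hexdigest())


def merge(record: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a copy of the record in sorted order, skipping duplicates."""
    state = dict(record)
    seen: set[tuple[str, str]] = set()
    for event in sorted(events, key=event_key):
        key = event_key(event)
        if key in seen:
            continue  # the same event delivered twice has no extra effect
        seen.add(key)
        state[event["field"]] = event["value"]
    return state
```

Because the order is derived from the events themselves rather than from arrival order, shuffling or duplicating the input list does not change the result.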

Claims

C1 · strongest claim

After post-training on small-scale target robot data, DreamDojo demonstrates a strong understanding of physics and precise action controllability on multiple challenging out-of-distribution benchmarks.

C2 · weakest assumption

Continuous latent actions learned from unlabeled human videos serve as effective proxy actions that transfer interaction knowledge to robot control without introducing domain gaps that degrade physics prediction.

C3 · one line summary

DreamDojo is a foundation world model pretrained on 44k hours of human video, described as the largest such dataset to date. It uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

References

131 extracted · 131 resolved · 30 Pith anchors

[1] World Simulation with Video Foundation Models for Physical AI 2025 · arXiv:2511.00062
[2] Diffusion for World Modeling: Visual Details Matter in Atari 2024
[3] V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning 2025 · arXiv:2506.09985
[4] Whole-body conditioned egocentric video prediction 2025
[5] Genie 3: A New Frontier for World Models 2025

Formal links

2 machine-checked theorem links

Cited by

23 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:47.189361Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

7a69e45a23f3d96c65eff798bf0a488c8a2a7b9eb7b175fa9bd5fb26d25d470c

Aliases

arxiv: 2602.06949 · arxiv_version: 2602.06949v1 · doi: 10.48550/arxiv.2602.06949 · pith_short_12: PJU6IWRD6PMW · pith_short_16: PJU6IWRD6PMWYZPP · pith_short_8: PJU6IWRD
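The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of that number. This is an observation from the values shown here, not a documented rule, so the sketch below should be read as an assumption. The function names are illustrative.

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "7a69e45a23f3d96c65eff798bf0a488c8a2a7b9eb7b175fa9bd5fb26d25d470c"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the leading characters.

    Assumption: the truncation length of 26 is inferred from this record only.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict[str, str]:
    """Build the pith_short_8 / _12 / _16 aliases as plain prefixes."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


pith_number = pith_number_from_hash(CANONICAL_HASH)
print(pith_number)
print(short_aliases(pith_number))
```

For this record the derived values can be compared directly with the Pith Number in the page header and the aliases listed above.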
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PJU6IWRD6PMWYZPP66ML6CSIRS \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7a69e45a23f3d96c65eff798bf0a488c8a2a7b9eb7b175fa9bd5fb26d25d470c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "83090a0b7e213a14395a56990cba1bc5989cb4fdb3ad92e464a98152840b7853",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CV",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2026-02-06T18:49:43Z",
    "title_canon_sha256": "150dec8e0282a7e6db552a244369c04eb2de1c00c818ae3ac60e8cb4564082bd"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.06949",
    "kind": "arxiv",
    "version": 1
  }
}
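For readers who prefer not to pipe through `curl` and `jq`, here is an offline sketch of the same check as the shell command above: it serializes the record with sorted keys and compact separators, then hashes the UTF-8 bytes. The record is copied from this page; `canonical_sha256` is an illustrative name, and whether the digest matches depends on the copy being byte-faithful to what the API returns.

```python
import hashlib
import json

# Expected value as printed under "Canonical hash" on this page.
EXPECTED = "7a69e45a23f3d96c65eff798bf0a488c8a2a7b9eb7b175fa9bd5fb26d25d470c"

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "83090a0b7e213a14395a56990cba1bc5989cb4fdb3ad92e464a98152840b7853",
    "cross_cats_sorted": ["cs.AI", "cs.CV", "cs.LG"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2026-02-06T18:49:43Z",
    "title_canon_sha256": "150dec8e0282a7e6db552a244369c04eb2de1c00c818ae3ac60e8cb4564082bd"
  },
  "schema_version": "1.0",
  "source": {"id": "2602.06949", "kind": "arxiv", "version": 1}
}
"""


def canonical_sha256(record: dict) -> str:
    """Hash the record using the same serialization as the one-liner above."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON happens to be indented or ordered on the wire.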