Pith Number

pith:WFARVYX4

pith:2025:WFARVYX43DK65L5SXE356SZTPU
not attested · not anchored · not stored · refs resolved

DreamGen: Unlocking Generalization in Robot Learning through Video World Models

Ajay Mandlekar, Avnish Narayan, Dieter Fox, Fengyuan Hu, Guanzhi Wang, Jan Kautz, Jiannan Xiang, Jing Wang, Joel Jang, Johan Bjorck, Kaiyuan Zheng, Kaushil Kundalia, Linxi Fan, Loic Magne, Luke Zettlemoyer, Ming-Yu Liu, Qi Wang, Ruijie Zheng, Scott Reed, Seonghyeon Ye, Spencer Huang, Xiaohui Zeng, Yen-Chen Lin, Yinzhen Xu, You Liang Tan, Yu Fang, Yuke Zhu, Zongyu Lin

A simple pipeline adapts video world models to generate synthetic robot trajectories, letting humanoid policies generalize to 22 new behaviors and to unseen environments using teleoperation data from a single task.

arxiv:2505.12705 v2 · 2025-05-19 · cs.RO · cs.AI · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WFARVYX43DK65L5SXE356SZTPU}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Despite its simplicity, DreamGen unlocks strong behavior and environment generalization: a humanoid robot can perform 22 new behaviors in both seen and unseen environments, while requiring teleoperation data from only a single pick-and-place task in one environment.

C2 · weakest assumption

That the adapted video world models produce synthetic videos realistic and embodiment-consistent enough that pseudo-actions recovered by the latent action model or inverse dynamics model (IDM) yield policies that transfer effectively to the physical robot without large domain gaps.

C3 · one-line summary

DreamGen trains robot policies on synthetic trajectories from adapted video world models, enabling a humanoid robot to perform 22 new behaviors in seen and unseen environments from a single pick-and-place teleoperation dataset.

References

79 extracted · 79 resolved · 24 Pith anchors

[1] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control · 2023 · arXiv:2307.15818
[2] $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control · 2024 · arXiv:2410.24164
[3] Gemini Robotics: Bringing AI into the Physical World · 2025 · arXiv:2503.20020
[4] AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems · 2025 · arXiv:2503.06669
[5] GR00T N1: An Open Foundation Model for Generalist Humanoid Robots · 2025 · arXiv:2503.14734

Cited by

31 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:49.710381Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

b1411ae2fcd8d5eeafb2b937df4b337d148e5fe34b9c713873d7a41e79330ddc

Aliases

arxiv: 2505.12705 · arxiv_version: 2505.12705v2 · doi: 10.48550/arxiv.2505.12705 · pith_short_12: WFARVYX43DK6 · pith_short_16: WFARVYX43DK65L5S · pith_short_8: WFARVYX4
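The identifiers on this page look consistent with a simple derivation, although the page itself does not document one. The sketch below assumes that the full Pith Number is the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and that each `pith_short_N` alias is the first N characters of that number. Both rules are inferred from the values shown here, not taken from a specification, and the function names are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "b1411ae2fcd8d5eeafb2b937df4b337d148e5fe34b9c713873d7a41e79330ddc"
PITH_NUMBER = "WFARVYX43DK65L5SXE356SZTPU"


def pith_number_from_hash(hex_digest: str) -> str:
    """Assumed rule: unpadded Base32 of the first 16 digest bytes."""
    prefix = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(prefix).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict[str, str]:
    """Assumed rule: each short alias is a prefix of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


print(pith_number_from_hash(CANONICAL_HASH) == PITH_NUMBER)  # True
print(short_aliases(PITH_NUMBER))
```

If the inferred rule holds for other records, a mirror could recompute the number and its aliases from the canonical hash alone; treat that as a hypothesis to check against the official schema.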
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WFARVYX43DK65L5SXE356SZTPU \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b1411ae2fcd8d5eeafb2b937df4b337d148e5fe34b9c713873d7a41e79330ddc
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "15a1b058cf62eeef9be0ed98732a07113d59ff33a5cc79dbaa3dedc0b1aa0611",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2025-05-19T04:55:39Z",
    "title_canon_sha256": "5fe7036366dfadbec2fee4d738e16acb05606dedf41aa685db9c3b9c1a71ed51"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2505.12705",
    "kind": "arxiv",
    "version": 2
  }
}
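For readers who would rather not pipe through `curl` and `jq`, the same check can be sketched offline in Python. The example below copies the canonical record shown above into a dictionary and applies the serialization used by the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) before hashing with SHA-256. The helper names are hypothetical, and whether the digest equals the published hash depends on the displayed record being byte-for-byte the record the service signs, which this sketch assumes rather than guarantees.

```python
import hashlib
import json

# Published hash, copied from this page.
EXPECTED = "b1411ae2fcd8d5eeafb2b937df4b337d148e5fe34b9c713873d7a41e79330ddc"

# Canonical record, copied from the JSON shown on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "15a1b058cf62eeef9be0ed98732a07113d59ff33a5cc79dbaa3dedc0b1aa0611",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2025-05-19T04:55:39Z",
        "title_canon_sha256": "5fe7036366dfadbec2fee4d738e16acb05606dedf41aa685db9c3b9c1a71ed51",
    },
    "schema_version": "1.0",
    "source": {"id": "2505.12705", "kind": "arxiv", "version": 2},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, then encode as UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


digest = canonical_hash(RECORD)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the byte string independent of the key order in which the record happens to be stored, which is what lets any mirror reproduce the same digest.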