Pith Number

pith:UCFL54T5

pith:2025:UCFL54T5TYCJIGF4EG4OUKZIID

not attested · not anchored · not stored · refs resolved

Vidar: Embodied Video Diffusion Model for Generalist Manipulation

Chendong Xiang, Guodong Liu, Hang Su, Hengkai Tan, Jun Zhu, Shuhe Huang, Xinyi Mao, Yao Feng

A video diffusion model pre-trained on internet-scale data and 750K robot trajectories adapts to new robot embodiments with only 20 minutes of demonstrations.

arxiv:2507.12898 v4 · 2025-07-17 · cs.LG · cs.AI · cs.CV · cs.RO


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{UCFL54T5TYCJIGF4EG4OUKZIID}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
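
The page does not describe the merge algorithm itself. As a rough illustration of why a deterministic merge lets independent mirrors agree, the sketch below folds a set of events into a state after sorting them by a fixed total order. The event fields (`ts`, `event_id`, `key`, `value`), the sample values, and the last-writer-wins rule are all assumptions made for illustration, not Pith's actual schema or algorithm.

```python
from typing import Any


def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state dict in a fixed total order.

    Hypothetical event shape:
        {"ts": str, "event_id": str, "key": str, "value": Any}
    Sorting by (ts, event_id) makes the result independent of arrival
    order; an event_id that appears more than once is applied only once.
    """
    state: dict[str, Any] = {}
    seen: set[str] = set()
    for ev in sorted(events, key=lambda e: (e["ts"], e["event_id"])):
        if ev["event_id"] in seen:
            continue
        seen.add(ev["event_id"])
        state[ev["key"]] = ev["value"]  # last writer wins (assumed rule)
    return state


# Made-up events, used only to exercise the function.
events = [
    {"ts": "2026-05-17T23:38:48Z", "event_id": "e1", "key": "status", "value": "draft"},
    {"ts": "2026-05-18T09:00:00Z", "event_id": "e2", "key": "status", "value": "final"},
]

state_a = merge_events(events)
state_b = merge_events(list(reversed(events)))  # a mirror seeing another order
```

Because the fold always runs over the same sorted sequence, `state_a` and `state_b` are equal regardless of the order in which a mirror received the events.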

Claims

C1 strongest claim

With only 20 minutes of human demonstrations on an unseen robot (1% of typical data), Vidar outperforms state-of-the-art baselines and generalizes to unseen tasks, backgrounds, and camera layouts.

C2 weakest assumption

That continuous pre-training of an internet-scale video diffusion model on 750K trajectories from only three robot platforms produces a sufficiently general visual-dynamics prior that can be grounded to arbitrary new embodiments via a lightweight masked inverse dynamics adapter.

C3 one-line summary

Vidar shows that a video diffusion prior continuously pre-trained on 750K multi-view robot trajectories plus a label-free masked inverse dynamics adapter can generalize manipulation to new robot embodiments with 1% of typical demonstration data.

References

46 extracted · 46 resolved · 20 Pith anchors

[1] Open X-Embodiment: Robotic Learning Datasets and RT-X Models · Open X-Embodiment Collaboration 2024

[2] OpenVLA: An Open-Source Vision-Language-Action Model 2024

[3] RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation 2024 · arXiv:2410.07864

[4] Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting 2023

[5] $\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization 2025 · arXiv:2504.16054

Formal links

1 machine-checked theorem link

Cited by

19 papers in Pith

WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform

VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

Ctrl-World: A Controllable Generative World Model for Robot Manipulation

PlayWorld: Learning Robot World Models from Autonomous Play

CreFlow: Corrective Reflow for Sparse-Reward Embodied Video Diffusion RL

Receipt and verification

First computed	2026-05-17T23:38:48.275613Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

a08abef27d9e049418bc21b8ea2b2840fd028bb99319803a1c9b926dadbd5add

Aliases

arxiv: 2507.12898 · arxiv_version: 2507.12898v4 · doi: 10.48550/arxiv.2507.12898 · pith_short_12: UCFL54T5TYCJ · pith_short_16: UCFL54T5TYCJIGF4 · pith_short_8: UCFL54T5
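
The page does not state how the identifier is derived. For this record, though, the values shown are consistent with the Pith Number being the first 26 characters of the RFC 4648 Base32 encoding of the canonical SHA-256 hash, with the short aliases as prefixes of that string. The sketch below checks that observation; the helper name `pith_number_from_hash` is hypothetical, and the rule is an inference from this one record rather than documented behavior.

```python
import base64

# Canonical hash and Pith Number exactly as listed on this page.
canonical_hash = "a08abef27d9e049418bc21b8ea2b2840fd028bb99319803a1c9b926dadbd5add"
listed_pith_number = "UCFL54T5TYCJIGF4EG4OUKZIID"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` chars.

    Assumed rule, inferred from this record only.
    """
    b32 = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return b32[:length]


pith_number = pith_number_from_hash(canonical_hash)

# Short aliases as prefixes of the full identifier.
short_aliases = {n: pith_number[:n] for n in (8, 12, 16)}

print(pith_number == listed_pith_number)
print(short_aliases)
```

If the rule holds generally, a client could recompute the identifier from the canonical hash alone instead of trusting the alias list.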

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

# Fetch the record, re-serialize canonical_record with sorted keys and compact separators, then SHA-256 it.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UCFL54T5TYCJIGF4EG4OUKZIID \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a08abef27d9e049418bc21b8ea2b2840fd028bb99319803a1c9b926dadbd5add

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "b93945cdd00928251a5e5d498e11d02981f54dfc1915fba1b499718dfc9f733a",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CV",
      "cs.RO"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-07-17T08:31:55Z",
    "title_canon_sha256": "fa3c128f8b8583963347041a1d34688c91ca0ea5ec73addaaa8e286d2aaa09b0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2507.12898",
    "kind": "arxiv",
    "version": 4
  }
}
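
The hashing step of the shell one-liner above can also be written as a small offline Python function. The sketch below applies the same serialization settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the record as displayed on this page. The function name `canonical_sha256` is hypothetical, and whether the digest matches the listed canonical hash depends on the displayed record being identical to what the resolver returns, which is not verified here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "b93945cdd00928251a5e5d498e11d02981f54dfc1915fba1b499718dfc9f733a",
        "cross_cats_sorted": ["cs.AI", "cs.CV", "cs.RO"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-07-17T08:31:55Z",
        "title_canon_sha256": "fa3c128f8b8583963347041a1d34688c91ca0ea5ec73addaaa8e286d2aaa09b0",
    },
    "schema_version": "1.0",
    "source": {"id": "2507.12898", "kind": "arxiv", "version": 4},
}

digest = canonical_sha256(record)
print(digest)  # compare by hand with the value under "Canonical hash"
```

Sorting keys before hashing is what makes the digest independent of the key order in which a mirror happens to store or transmit the record.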