Pith Number

pith:B3MFR4N7

pith:2024:B3MFR4N7YWZSXPIWNIYG62VWMM
not attested · not anchored · not stored · refs resolved

Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation

Abhinav Gupta, Carl Doersch, Debidatta Dwibedi, Dhruv Shah, Dorsa Sadigh, Fei Xia, Homanga Bharadhwaj, Sean Kirmani, Shubham Tulsiani, Ted Xiao

Generating human videos from web data lets a single robot policy manipulate unseen objects and novel motions without fine-tuning.

arxiv:2409.16283 v1 · 2024-09-24 · cs.RO · cs.CV · cs.LG · eess.IV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{B3MFR4N7YWZSXPIWNIYG62VWMM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Our results on diverse real-world scenarios show how Gen2Act enables manipulating unseen object types and performing novel motions for tasks not present in the robot data.

C2 · weakest assumption

That videos generated by a pre-trained model from web data provide sufficiently accurate and transferable motion information for a robot policy to execute novel tasks without any fine-tuning of the video model or additional domain adaptation.

C3 · one-line summary

Gen2Act enables generalizable robot manipulation for unseen objects and novel motions by using zero-shot human video generation from web data to condition a policy trained on an order of magnitude less robot interaction data.

References

61 extracted · 61 resolved · 12 Pith anchors

[1] RT-1: Robotics Transformer for Real-World Control at Scale 2022 · arXiv:2212.06817
[2] RoboAgent: Generalization and efficiency in robot manipulation via semantic augmentations and action chunking 2024
[4] DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset 2024 · arXiv:2403.12945
[5] R3M: A Universal Visual Representation for Robot Manipulation 2022 · arXiv:2203.12601
[6] Where are we in the search for an artificial visual cortex for embodied intelligence? 2023

Cited by

33 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:52.572500Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

0ed858f1bfc5b32bbd166a306f6ab6632284e5e44e96a97f8ee3ab25c759d4b4

Aliases

arxiv: 2409.16283 · arxiv_version: 2409.16283v1 · doi: 10.48550/arxiv.2409.16283 · pith_short_12: B3MFR4N7YWZS · pith_short_16: B3MFR4N7YWZSXPIW · pith_short_8: B3MFR4N7
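
The identifiers on this page appear to be related to the canonical hash in a simple way: the 26-character Pith Number matches the unpadded base32 encoding of the first 16 bytes of the SHA-256 digest, and the pith_short_* aliases are its 8-, 12-, and 16-character prefixes. This is an observation from the values shown here, not a documented specification; the function names below are hypothetical and only illustrate the relationship.

```python
import base64

# Canonical hash as printed on this record page.
CANONICAL_HASH = "0ed858f1bfc5b32bbd166a306f6ab6632284e5e44e96a97f8ee3ab25c759d4b4"


def pith_number_from_hash(hex_digest: str) -> str:
    """Assumed derivation: unpadded base32 of the first 16 digest bytes."""
    first_16_bytes = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16_bytes).decode("ascii").rstrip("=")


def short_alias(pith_number: str, length: int) -> str:
    """Assumed derivation: a short alias is a plain prefix of the full number."""
    return pith_number[:length]


pith_number = pith_number_from_hash(CANONICAL_HASH)
print(pith_number)  # B3MFR4N7YWZSXPIWNIYG62VWMM
for n in (8, 12, 16):
    print(f"pith_short_{n}: {short_alias(pith_number, n)}")
```

If the relationship holds for other records, a client could sanity-check that an identifier and a canonical hash belong together before fetching anything.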
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/B3MFR4N7YWZSXPIWNIYG62VWMM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0ed858f1bfc5b32bbd166a306f6ab6632284e5e44e96a97f8ee3ab25c759d4b4
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "34fb22a907aed15928b0a7e293e229a831eafa8c063e11d912c7ca183fdceb80",
    "cross_cats_sorted": [
      "cs.CV",
      "cs.LG",
      "eess.IV"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2024-09-24T17:57:33Z",
    "title_canon_sha256": "3f9ca604e7808f1291056c97bf44e06cc35ffd74c485e0535211086a76b907c9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2409.16283",
    "kind": "arxiv",
    "version": 1
  }
}
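
The verification one-liner above can also be run offline against the canonical record shown here. The following is a minimal sketch that mirrors the same serialization settings (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes); the helper names are hypothetical, and whether the printed digest matches the published canonical hash should be confirmed by running it rather than assumed.

```python
import hashlib
import json

# Canonical record copied from this page.
CANONICAL_RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "34fb22a907aed15928b0a7e293e229a831eafa8c063e11d912c7ca183fdceb80",
    "cross_cats_sorted": ["cs.CV", "cs.LG", "eess.IV"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2024-09-24T17:57:33Z",
    "title_canon_sha256": "3f9ca604e7808f1291056c97bf44e06cc35ffd74c485e0535211086a76b907c9"
  },
  "schema_version": "1.0",
  "source": {"id": "2409.16283", "kind": "arxiv", "version": 1}
}
"""


def canonical_bytes(record: dict) -> bytes:
    """Serialize with the same options as the shell one-liner on this page."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(CANONICAL_RECORD_JSON)
digest = canonical_hash(record)
print(digest)  # compare by hand with the "Canonical hash" value above
```

Because keys are sorted and whitespace is removed before hashing, the digest does not depend on how the JSON was pretty-printed or on the key order a mirror happens to emit.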