Pith Number

pith:2RCKNUMH

pith:2023:2RCKNUMH66EZWGGSE6X4BKL32A
not attested · not anchored · not stored · refs resolved

Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

Aviral Kumar, Chelsea Finn, Homer Walke, Kevin Black, Mitsuhiko Nakamoto, Pranav Atreya, Sergey Levine

A finetuned image-editing diffusion model generates subgoal images that let a low-level policy complete manipulation tasks on objects and instructions absent from robot training data.

arxiv:2310.10639 v1 · 2023-10-16 · cs.RO

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{2RCKNUMH66EZWGGSE6X4BKL32A}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
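
This record does not describe the merge algorithm itself, so the following is only a minimal sketch of the general property the sentence above relies on: replaying the same set of events yields the same state regardless of the order in which a mirror received them. The event fields (`id`, `ts`, `field`, `value`) and the sample events are hypothetical and are not taken from the Pith bundle format.

```python
import random

def merge_events(events):
    """Fold events into a state dict in an order that ignores arrival order.

    Hypothetical event shape: {"id", "ts", "field", "value"}.
    Assumes events sharing an id are identical copies.
    """
    # Drop duplicate copies of the same event.
    unique = {event["id"]: event for event in events}
    # Sort by (timestamp, id) so ties are broken the same way everywhere.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state = {}
    for event in ordered:
        state[event["field"]] = event["value"]  # later events overwrite earlier ones
    return state

# Made-up events for illustration only.
events = [
    {"id": "e1", "ts": 1, "field": "author_claim", "value": "open"},
    {"id": "e2", "ts": 2, "field": "author_claim", "value": "claimed"},
    {"id": "e3", "ts": 2, "field": "replications", "value": "open"},
]

reference_state = merge_events(events)

# A mirror that received the events in another order, with a duplicate copy.
shuffled = events + [events[0]]
random.Random(0).shuffle(shuffled)
mirror_state = merge_events(shuffled)
print(reference_state == mirror_state)
```

Sorting on a total order before folding is one common way to make a merge reproducible; the actual bundle may use a different rule.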

Claims

C1 strongest claim

We achieve state-of-the-art results on the CALVIN benchmark, and also demonstrate robust generalization on real-world manipulation tasks, beating strong baselines that have access to privileged information or that utilize orders of magnitude more compute and training data.

C2 weakest assumption

That subgoal images generated by the finetuned diffusion model remain sufficiently accurate and executable for the low-level policy when the robot encounters objects, lighting, or instructions outside the finetuning distribution.

C3 one line summary

SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.

References

66 extracted · 66 resolved · 17 Pith anchors

[1] Anurag Ajay, Yilun Du, Abhi Gupta, Joshua B. Tenenbaum, Tommi S. Jaakkola, and Pulkit Agrawal. Is conditional generative modeling all you need for decision making? In The Eleventh International Confer 2023
[2] Compositional foundation models for hierarchical planning 2023
[3] Fitvid: Overfitting in pixel-level video prediction 2020
[4] Robotic offline rl from internet videos via value-function pre-training 2023
[5] Introducing ChatGPT and Whisper APIs 2023

Cited by

34 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:48.917021Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

d444a6d187f7899b18d227afc0a97bd0068e1f401790682db06cd5a4f03c3c19

Aliases

arxiv: 2310.10639 · arxiv_version: 2310.10639v1 · doi: 10.48550/arxiv.2310.10639 · pith_short_12: 2RCKNUMH66EZ · pith_short_16: 2RCKNUMH66EZWGGS · pith_short_8: 2RCKNUMH
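
The aliases above appear to be related to the canonical hash in a simple way: for this record, Base32-encoding the first 16 bytes of the SHA-256 digest (padding stripped) reproduces the full Pith Number, and the `pith_short_*` aliases are its leading 8, 12, and 16 characters. This relationship is inferred from the values on this page, not from a published specification, and the function names below are illustrative.

```python
import base64

def pith_number_from_hash(canonical_hash_hex: str) -> str:
    """Base32-encode the first 16 digest bytes and strip '=' padding.

    Assumption: this rule is inferred from this record's values only.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")

def short_alias(pith_number: str, length: int) -> str:
    """Return the leading characters, as in pith_short_8 / 12 / 16."""
    return pith_number[:length]

canonical_hash = "d444a6d187f7899b18d227afc0a97bd0068e1f401790682db06cd5a4f03c3c19"
number = pith_number_from_hash(canonical_hash)
print(number)
for n in (8, 12, 16):
    print(f"pith_short_{n}: {short_alias(number, n)}")
```

With the canonical hash listed on this page, the sketch yields 2RCKNUMH66EZWGGSE6X4BKL32A, matching the Pith Number at the top of the record.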
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/2RCKNUMH66EZWGGSE6X4BKL32A \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d444a6d187f7899b18d227afc0a97bd0068e1f401790682db06cd5a4f03c3c19
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "3a5b18f67532489bed26bb37ff3209f173b78c96f1e8dc820dd2ab374d3cfa83",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2023-10-16T17:57:23Z",
    "title_canon_sha256": "1afa1eb71c2abc380fef589399b6b9ce5c674f28e4c2d67bdf69760ca03cf0eb"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.10639",
    "kind": "arxiv",
    "version": 1
  }
}
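
The verification command above can also be run offline against the canonical record JSON. The sketch below applies the same canonicalization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) and hashes the result with SHA-256. The function name `canonical_hash` is illustrative; whether the printed digest equals the canonical hash depends on the record being copied exactly as served.

```python
import hashlib
import json

def canonical_hash(record: dict) -> str:
    """SHA-256 of the record serialized as compact JSON with sorted keys."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "3a5b18f67532489bed26bb37ff3209f173b78c96f1e8dc820dd2ab374d3cfa83",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2023-10-16T17:57:23Z",
        "title_canon_sha256": "1afa1eb71c2abc380fef589399b6b9ce5c674f28e4c2d67bdf69760ca03cf0eb",
    },
    "schema_version": "1.0",
    "source": {"id": "2310.10639", "kind": "arxiv", "version": 1},
}

digest = canonical_hash(record)
print(digest)  # compare with the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which a client happens to store the fields, which is what lets independent verifiers agree on one value.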