pith:2RCKNUMH
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models
A finetuned image-editing diffusion model generates subgoal images that let a low-level policy complete manipulation tasks on objects and instructions absent from robot training data.
arxiv:2310.10639 v1 · 2023-10-16 · cs.RO
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{2RCKNUMH66EZWGGSE6X4BKL32A}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We achieve state-of-the-art results on the CALVIN benchmark, and also demonstrate robust generalization on real-world manipulation tasks, beating strong baselines that have access to privileged information or that utilize orders of magnitude more compute and training data.
The approach assumes that subgoal images generated by the finetuned diffusion model remain sufficiently accurate and executable for the low-level policy when the robot encounters objects, lighting, or instructions outside the finetuning distribution.
SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:48.917021Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
d444a6d187f7899b18d227afc0a97bd0068e1f401790682db06cd5a4f03c3c19
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/2RCKNUMH66EZWGGSE6X4BKL32A \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d444a6d187f7899b18d227afc0a97bd0068e1f401790682db06cd5a4f03c3c19
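The one-liner above can be unpacked into a short Python sketch. It assumes the record is canonicalized exactly as in the command (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding). The names `RECORD`, `canonical_bytes`, and `canonical_sha256` are illustrative and not part of any Pith API; the record is copied from this page rather than fetched.

```python
import hashlib
import json

# Canonical record as shown on this page (copied by hand, not fetched).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "3a5b18f67532489bed26bb37ff3209f173b78c96f1e8dc820dd2ab374d3cfa83",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2023-10-16T17:57:23Z",
        "title_canon_sha256": "1afa1eb71c2abc380fef589399b6b9ce5c674f28e4c2d67bdf69760ca03cf0eb",
    },
    "schema_version": "1.0",
    "source": {"id": "2310.10639", "kind": "arxiv", "version": 1},
}


def canonical_bytes(record):
    """Serialize with sorted keys and compact separators, then UTF-8 encode.

    Mirrors the json.dumps arguments used in the shell command above.
    """
    text = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return text.encode("utf-8")


def canonical_sha256(record):
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    # Compare this digest with the canonical hash listed on the page.
    print(canonical_sha256(RECORD))
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields appear in the JSON response.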
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "3a5b18f67532489bed26bb37ff3209f173b78c96f1e8dc820dd2ab374d3cfa83",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2023-10-16T17:57:23Z",
    "title_canon_sha256": "1afa1eb71c2abc380fef589399b6b9ce5c674f28e4c2d67bdf69760ca03cf0eb"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.10639",
    "kind": "arxiv",
    "version": 1
  }
}