{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2023:AWTLHGQFMS6UY7JHLKNKIPLENX","short_pith_number":"pith:AWTLHGQF","schema_version":"1.0","canonical_sha256":"05a6b39a0564bd4c7d275a9aa43d646dc49a7801a6ef9b2c37e9b7a43ae97c66","source":{"kind":"arxiv","id":"2302.11550","version":1},"attestation_state":"computed","paper":{"title":"Scaling Robot Learning with Semantically Imagined Experience","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Robot policies trained on data augmented by text-to-image inpainting solve unseen tasks with new objects and resist novel distractors.","cross_cats":["cs.AI","cs.CL","cs.CV","cs.LG"],"primary_cat":"cs.RO","authors_text":"Anthony Brohan, Austin Stone, Brian Ichter, Clayton Tan, Dee M, Fei Xia, Jaspiar Singh, Jodilyn Peralta, Jonathan Tompson, Karol Hausman, Su Wang, Ted Xiao, Tianhe Yu","submitted_at":"2023-02-22T18:47:51Z","abstract_excerpt":"Recent advances in robot learning have shown promise in enabling robots to perform a variety of manipulation tasks and generalize to novel scenarios. One of the key contributing factors to this progress is the scale of robot data used to train the models. To obtain large-scale datasets, prior approaches have relied on either demonstrations requiring high human involvement or engineering-heavy autonomous data collection schemes, both of which are challenging to scale. To mitigate this issue, we propose an alternative route and leverage text-to-image foundation models widely used in computer vis"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2302.11550","kind":"arxiv","version":1},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.RO","submitted_at":"2023-02-22T18:47:51Z","cross_cats_sorted":["cs.AI","cs.CL","cs.CV","cs.LG"],"title_canon_sha256":"416b99d59369f421d2a477ea51d7e169215b0c727e7612844bb23588525725bc","abstract_canon_sha256":"55904d64b7891cd5f5b1000f866ce83187fa3ee4c80a632c6b1382e7ba4fc268"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:13.345899Z","signature_b64":"TaLK4rYfarQWvX/ctOkJKEl3Z9j6cjnX3edy0xkddPe1DGj7RiUJwKVPTGf9mBt9BtQhd2n7h1D3kTBSROG1Ag==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"05a6b39a0564bd4c7d275a9aa43d646dc49a7801a6ef9b2c37e9b7a43ae97c66","last_reissued_at":"2026-05-17T23:38:13.345311Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:13.345311Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Scaling Robot Learning with Semantically Imagined Experience","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Robot policies trained on data augmented by text-to-image inpainting solve unseen tasks with new objects and resist novel distractors.","cross_cats":["cs.AI","cs.CL","cs.CV","cs.LG"],"primary_cat":"cs.RO","authors_text":"Anthony Brohan, Austin Stone, Brian Ichter, Clayton Tan, Dee M, Fei Xia, Jaspiar Singh, Jodilyn Peralta, Jonathan Tompson, Karol Hausman, Su Wang, Ted Xiao, Tianhe Yu","submitted_at":"2023-02-22T18:47:51Z","abstract_excerpt":"Recent advances in robot learning have shown promise in enabling robots to perform a variety of manipulation tasks and generalize to novel scenarios. One of the key contributing factors to this progress is the scale of robot data used to train the models. To obtain large-scale datasets, prior approaches have relied on either demonstrations requiring high human involvement or engineering-heavy autonomous data collection schemes, both of which are challenging to scale. To mitigate this issue, we propose an alternative route and leverage text-to-image foundation models widely used in computer vis"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"manipulation policies trained on data augmented this way are able to solve completely unseen tasks with new objects and can behave more robustly w.r.t. novel distractors.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The inpainted images generated by the text-to-image diffusion model are sufficiently realistic and physically plausible that policies trained on them transfer successfully to real-world robot execution without introducing harmful artifacts or distribution shifts.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Augmenting robot datasets via diffusion-based semantic inpainting enables manipulation policies to solve unseen tasks with new objects and improves robustness to novel distractors.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Robot policies trained on data augmented by text-to-image inpainting solve unseen tasks with new objects and resist novel distractors.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"f8221ef6bd7733798d4ff7504725b3d1ddba8f89cdb562c3128162a553a876f7"},"source":{"id":"2302.11550","kind":"arxiv","version":1},"verdict":{"id":"620568ac-7b7f-4ffd-b872-914be9cbeb90","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T18:52:17.658791Z","strongest_claim":"manipulation policies trained on data augmented this way are able to solve completely unseen tasks with new objects and can behave more robustly w.r.t. novel distractors.","one_line_summary":"Augmenting robot datasets via diffusion-based semantic inpainting enables manipulation policies to solve unseen tasks with new objects and improves robustness to novel distractors.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The inpainted images generated by the text-to-image diffusion model are sufficiently realistic and physically plausible that policies trained on them transfer successfully to real-world robot execution without introducing harmful artifacts or distribution shifts.","pith_extraction_headline":"Robot policies trained on data augmented by text-to-image inpainting solve unseen tasks with new objects and resist novel distractors."},"references":{"count":78,"sample":[{"doi":"","year":2022,"title":"VIMA : General robot manipulation with multimodal prompts","work_id":"7b5f6cce-bbaa-40ed-8b09-7330832dd736","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"RT-1: Robotics Transformer for Real-World Control at Scale","work_id":"e11bda85-8531-46bc-a07f-d0ade3643ab1","ref_index":2,"cited_arxiv_id":"2212.06817","is_internal_anchor":true},{"doi":"","year":2022,"title":"M. Shridhar, L. Manuelli, and D. Fox. Cliport: What and where pathways for robotic manipulation. In Conference on Robot Learning, 2022","work_id":"30fc4fa5-1578-4727-8698-e4e6d6d06872","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Perceiver-Actor: A multi-task transformer for robotic manipulation","work_id":"b20db57f-09e7-4916-b4b5-9e7b95f2bd97","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Hierarchical Text-Conditional Image Generation with CLIP Latents","work_id":"0c6a768b-70b8-4242-bb0e-459f1008c9fc","ref_index":5,"cited_arxiv_id":"2204.06125","is_internal_anchor":true}],"resolved_work":78,"snapshot_sha256":"90377220c7b4b8ee0ad5f2180fc6b4908eacd4a2c1df69aff970ff3e751df9cc","internal_anchors":21},"formal_canon":{"evidence_count":2,"snapshot_sha256":"b39de740a6e5c54404793b66f82288d44b63b0d6218f7f686d68e0079cadac7e"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2302.11550","created_at":"2026-05-17T23:38:13.345407+00:00"},{"alias_kind":"arxiv_version","alias_value":"2302.11550v1","created_at":"2026-05-17T23:38:13.345407+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2302.11550","created_at":"2026-05-17T23:38:13.345407+00:00"},{"alias_kind":"pith_short_12","alias_value":"AWTLHGQFMS6U","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_16","alias_value":"AWTLHGQFMS6UY7JH","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_8","alias_value":"AWTLHGQF","created_at":"2026-05-18T12:33:33.725879+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":19,"internal_anchor_count":19,"sample":[{"citing_arxiv_id":"2505.03233","citing_title":"GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data","ref_index":47,"is_internal_anchor":true},{"citing_arxiv_id":"2310.17596","citing_title":"MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations","ref_index":44,"is_internal_anchor":true},{"citing_arxiv_id":"2310.10639","citing_title":"Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models","ref_index":64,"is_internal_anchor":true},{"citing_arxiv_id":"2505.12705","citing_title":"DreamGen: Unlocking Generalization in Robot Learning through Video World Models","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2409.16283","citing_title":"Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13105","citing_title":"What to Ignore, What to React: Visually Robust RL Fine-Tuning of VLA Models","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2307.05973","citing_title":"VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models","ref_index":112,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26509","citing_title":"3D Generation for Embodied AI and Robotic Simulation: A Survey","ref_index":193,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26509","citing_title":"3D Generation for Embodied AI and Robotic Simulation: A Survey","ref_index":193,"is_internal_anchor":true},{"citing_arxiv_id":"2604.23001","citing_title":"Vision-Language-Action in Robotics: A Survey of Datasets, Benchmarks, and Data Engines","ref_index":28,"is_internal_anchor":true},{"citing_arxiv_id":"2605.00244","citing_title":"Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation","ref_index":37,"is_internal_anchor":true},{"citing_arxiv_id":"2604.10809","citing_title":"WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations","ref_index":122,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11386","citing_title":"ComSim: Building Scalable Real-World Robot Data Generation via Compositional Simulation","ref_index":57,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07306","citing_title":"BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulation","ref_index":33,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07474","citing_title":"ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations","ref_index":75,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26509","citing_title":"3D Generation for Embodied AI and Robotic Simulation: A Survey","ref_index":192,"is_internal_anchor":true},{"citing_arxiv_id":"2405.12213","citing_title":"Octo: An Open-Source Generalist Robot Policy","ref_index":97,"is_internal_anchor":true},{"citing_arxiv_id":"2503.14734","citing_title":"GR00T N1: An Open Foundation Model for Generalist Humanoid Robots","ref_index":99,"is_internal_anchor":true},{"citing_arxiv_id":"2604.15483","citing_title":"${\\pi}_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities","ref_index":75,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/AWTLHGQFMS6UY7JHLKNKIPLENX","json":"https://pith.science/pith/AWTLHGQFMS6UY7JHLKNKIPLENX.json","graph_json":"https://pith.science/api/pith-number/AWTLHGQFMS6UY7JHLKNKIPLENX/graph.json","events_json":"https://pith.science/api/pith-number/AWTLHGQFMS6UY7JHLKNKIPLENX/events.json","paper":"https://pith.science/paper/AWTLHGQF"},"agent_actions":{"view_html":"https://pith.science/pith/AWTLHGQFMS6UY7JHLKNKIPLENX","download_json":"https://pith.science/pith/AWTLHGQFMS6UY7JHLKNKIPLENX.json","view_paper":"https://pith.science/paper/AWTLHGQF","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2302.11550&json=true","fetch_graph":"https://pith.science/api/pith-number/AWTLHGQFMS6UY7JHLKNKIPLENX/graph.json","fetch_events":"https://pith.science/api/pith-number/AWTLHGQFMS6UY7JHLKNKIPLENX/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/AWTLHGQFMS6UY7JHLKNKIPLENX/action/timestamp_anchor","attest_storage":"https://pith.science/pith/AWTLHGQFMS6UY7JHLKNKIPLENX/action/storage_attestation","attest_author":"https://pith.science/pith/AWTLHGQFMS6UY7JHLKNKIPLENX/action/author_attestation","sign_citation":"https://pith.science/pith/AWTLHGQFMS6UY7JHLKNKIPLENX/action/citation_signature","submit_replication":"https://pith.science/pith/AWTLHGQFMS6UY7JHLKNKIPLENX/action/replication_record"}},"created_at":"2026-05-17T23:38:13.345407+00:00","updated_at":"2026-05-17T23:38:13.345407+00:00"}