{"paper":{"title":"Ctrl-World: A Controllable Generative World Model for Robot Manipulation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"A controllable world model ranks robot policies and improves them by 44.7 percent through imagined trajectories alone.","cross_cats":["cs.AI"],"primary_cat":"cs.RO","authors_text":"Chelsea Finn, Jianyu Chen, Lucy Xiaoyang Shi, Yanjiang Guo","submitted_at":"2025-10-11T09:13:10Z","abstract_excerpt":"Generalist robot policies can now perform a wide range of manipulation skills, but evaluating and improving their ability with unfamiliar objects and instructions remains a significant challenge. Rigorous evaluation requires a large number of real-world rollouts, while systematic improvement demands additional corrective data with expert labels. Both of these processes are slow, costly, and difficult to scale. World models offer a promising, scalable alternative by enabling policies to rollout within imagination space. However, a key challenge is building a controllable world model that can ha"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"By synthesizing successful trajectories in imagination and using them for supervised fine-tuning, our approach can improve policy success by 44.7%.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The generated trajectories are sufficiently accurate proxies for real-world dynamics on novel objects, instructions, and camera placements to enable reliable policy ranking and effective fine-tuning.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A controllable world model ranks robot policies and improves them by 44.7 percent through imagined trajectories alone.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"151df77c49b7b3312d0c27b87f6f77990c2fcd09b51e91c45bd4a7dd8afada08"},"source":{"id":"2510.10125","kind":"arxiv","version":3},"verdict":{"id":"f26e11c6-4ee4-47a8-b897-324ba90446e0","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T01:08:54.062305Z","strongest_claim":"By synthesizing successful trajectories in imagination and using them for supervised fine-tuning, our approach can improve policy success by 44.7%.","one_line_summary":"A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The generated trajectories are sufficiently accurate proxies for real-world dynamics on novel objects, instructions, and camera placements to enable reliable policy ranking and effective fine-tuning.","pith_extraction_headline":"A controllable world model ranks robot policies and improves them by 44.7 percent through imagined trajectories alone."},"references":{"count":56,"sample":[{"doi":"","year":null,"title":"Cosmos World Foundation Model Platform for Physical AI","work_id":"a2dba24c-318d-476a-8b21-4289c265810c","ref_index":1,"cited_arxiv_id":"2501.03575","is_internal_anchor":true},{"doi":"","year":null,"title":"RoboArena: Distributed real-world evaluation of generalist robot policies","work_id":"a02af411-4d93-4ac8-a15c-930c8f021765","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation","work_id":"a3bde288-aace-40db-8067-3ae6656f9509","ref_index":3,"cited_arxiv_id":"2409.16283","is_internal_anchor":true},{"doi":"","year":null,"title":"Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models","work_id":"954b4359-f4ed-4c73-ae5b-f75d486b6fc8","ref_index":4,"cited_arxiv_id":"2310.10639","is_internal_anchor":true},{"doi":"","year":null,"title":"$\\pi_0$: A Vision-Language-Action Flow Model for General Robot Control","work_id":"f790abdc-a796-482f-a40d-f8ee035ecfc2","ref_index":5,"cited_arxiv_id":"2410.24164","is_internal_anchor":true}],"resolved_work":56,"snapshot_sha256":"243ff9ac1de778d6328c913e9251ca801609863084055bc3a4127ff3483d2c95","internal_anchors":32},"formal_canon":{"evidence_count":2,"snapshot_sha256":"5f4eaeb564048631af71e0063430757edbdb147dd3d0e57b12560c030c4487d8"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}