{"paper":{"title":"Fast-WAM: Do World Action Models Need Test-time Future Imagination?","license":"http://creativecommons.org/licenses/by/4.0/","headline":"World Action Models achieve competitive performance without generating future observations at test time.","cross_cats":["cs.AI"],"primary_cat":"cs.CV","authors_text":"Hang Zhao, Tianyuan Yuan, Yicheng Liu, Zibin Dong","submitted_at":"2026-03-17T15:33:43Z","abstract_excerpt":"World Action Models (WAMs) have emerged as a promising alternative to Vision-Language-Action (VLA) models for embodied control because they explicitly model how visual observations may evolve under action. Most existing WAMs follow an imagine-then-execute paradigm, incurring substantial test-time latency from iterative video denoising, yet it remains unclear whether explicit future imagination is actually necessary for strong action performance. In this paper, we ask whether WAMs need explicit future imagination at test time, or whether their benefit comes primarily from video modeling during "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Fast-WAM remains competitive with imagine-then-execute variants, while removing video co-training causes a much larger performance drop. It achieves competitive results with state-of-the-art methods on simulation benchmarks (LIBERO and RoboTwin) and real-world tasks, without embodied pretraining, running in real time with 190ms latency.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the proposed Fast-WAM variants successfully disentangle the contribution of video modeling during training from explicit future generation at inference, allowing a controlled comparison of the two factors.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Fast-WAM shows that explicit future imagination at test time is not required for strong WAM performance; video modeling during training provides the main benefit.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"World Action Models achieve competitive performance without generating future observations at test time.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"69a585b0269711df9ba37b68dc5d5d631f849824b21db9bdaed01646db3f89ad"},"source":{"id":"2603.16666","kind":"arxiv","version":2},"verdict":{"id":"1157b792-dd93-4ea1-8a04-b5a4b69815a3","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T01:52:48.041740Z","strongest_claim":"Fast-WAM remains competitive with imagine-then-execute variants, while removing video co-training causes a much larger performance drop. It achieves competitive results with state-of-the-art methods on simulation benchmarks (LIBERO and RoboTwin) and real-world tasks, without embodied pretraining, running in real time with 190ms latency.","one_line_summary":"Fast-WAM shows that explicit future imagination at test time is not required for strong WAM performance; video modeling during training provides the main benefit.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the proposed Fast-WAM variants successfully disentangle the contribution of video modeling during training from explicit future generation at inference, allowing a controlled comparison of the two factors.","pith_extraction_headline":"World Action Models achieve competitive performance without generating future observations at test time."},"references":{"count":41,"sample":[{"doi":"","year":2025,"title":"mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs","work_id":"cd5b191a-8f67-43d3-8816-0ade1b9a7c29","ref_index":1,"cited_arxiv_id":"2512.15692","is_internal_anchor":true},{"doi":"","year":2025,"title":"Video Generators are Robot Policies","work_id":"0941f8fe-e893-4e3a-9ac6-0a72f26340e9","ref_index":2,"cited_arxiv_id":"2508.00795","is_internal_anchor":true},{"doi":"","year":2026,"title":"Causal World Modeling for Robot Control","work_id":"a33c4ee0-db06-4f9a-8852-c62e3a72fc27","ref_index":3,"cited_arxiv_id":"2601.21998","is_internal_anchor":true},{"doi":"","year":null,"title":"World action models are zero-shot policies","work_id":"b3e862e1-b68b-436d-9c90-ea27dcf20d53","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"World Action Models are Zero-shot Policies","work_id":"9a85fc69-74df-450e-94cd-69d186e9e830","ref_index":5,"cited_arxiv_id":"2602.15922","is_internal_anchor":true}],"resolved_work":41,"snapshot_sha256":"159210ec1cedf7fbd17cca25da2665dbbc88a218df79827812091a7a2104b6cd","internal_anchors":30},"formal_canon":{"evidence_count":2,"snapshot_sha256":"78553b27229f55e16edce3c61a9795852fe7d7db08a367f300e02e4942f3a96a"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}