{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:SSGN4R6LIKKYFLZTTJP2ICRFMD","short_pith_number":"pith:SSGN4R6L","schema_version":"1.0","canonical_sha256":"948cde47cb429582af339a5fa40a2560e2e14364cd856a55a1aa17f870c450df","source":{"kind":"arxiv","id":"2603.16666","version":2},"attestation_state":"computed","paper":{"title":"Fast-WAM: Do World Action Models Need Test-time Future Imagination?","license":"http://creativecommons.org/licenses/by/4.0/","headline":"World Action Models achieve competitive performance without generating future observations at test time.","cross_cats":["cs.AI"],"primary_cat":"cs.CV","authors_text":"Hang Zhao, Tianyuan Yuan, Yicheng Liu, Zibin Dong","submitted_at":"2026-03-17T15:33:43Z","abstract_excerpt":"World Action Models (WAMs) have emerged as a promising alternative to Vision-Language-Action (VLA) models for embodied control because they explicitly model how visual observations may evolve under action. Most existing WAMs follow an imagine-then-execute paradigm, incurring substantial test-time latency from iterative video denoising, yet it remains unclear whether explicit future imagination is actually necessary for strong action performance. In this paper, we ask whether WAMs need explicit future imagination at test time, or whether their benefit comes primarily from video modeling during "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2603.16666","kind":"arxiv","version":2},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CV","submitted_at":"2026-03-17T15:33:43Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"2de1c9d7f8c5b27c97c726e2efd13279dba35af91b2527fba96388b03aae1c41","abstract_canon_sha256":"44e9f3d67425464f251e16eccbb9f2cf49bd5c87dad154d9f4bb2e0486218d79"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-18T02:45:49.534404Z","signature_b64":"ABaIHGcDfsceE4XFxa16Rbpv206waD/QXIeEChL5PGi/naje2pKxtqiWkt8S39AYhTLTI6DlkQdrfisy5k8XAA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"948cde47cb429582af339a5fa40a2560e2e14364cd856a55a1aa17f870c450df","last_reissued_at":"2026-05-18T02:45:49.533935Z","signature_status":"signed_v1","first_computed_at":"2026-05-18T02:45:49.533935Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Fast-WAM: Do World Action Models Need Test-time Future Imagination?","license":"http://creativecommons.org/licenses/by/4.0/","headline":"World Action Models achieve competitive performance without generating future observations at test time.","cross_cats":["cs.AI"],"primary_cat":"cs.CV","authors_text":"Hang Zhao, Tianyuan Yuan, Yicheng Liu, Zibin Dong","submitted_at":"2026-03-17T15:33:43Z","abstract_excerpt":"World Action Models (WAMs) have emerged as a promising alternative to Vision-Language-Action (VLA) models for embodied control because they explicitly model how visual observations may evolve under action. Most existing WAMs follow an imagine-then-execute paradigm, incurring substantial test-time latency from iterative video denoising, yet it remains unclear whether explicit future imagination is actually necessary for strong action performance. In this paper, we ask whether WAMs need explicit future imagination at test time, or whether their benefit comes primarily from video modeling during "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Fast-WAM remains competitive with imagine-then-execute variants, while removing video co-training causes a much larger performance drop. It achieves competitive results with state-of-the-art methods on simulation benchmarks (LIBERO and RoboTwin) and real-world tasks, without embodied pretraining, running in real time with 190ms latency.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the proposed Fast-WAM variants successfully disentangle the contribution of video modeling during training from explicit future generation at inference, allowing a controlled comparison of the two factors.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Fast-WAM shows that explicit future imagination at test time is not required for strong WAM performance; video modeling during training provides the main benefit.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"World Action Models achieve competitive performance without generating future observations at test time.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"69a585b0269711df9ba37b68dc5d5d631f849824b21db9bdaed01646db3f89ad"},"source":{"id":"2603.16666","kind":"arxiv","version":2},"verdict":{"id":"1157b792-dd93-4ea1-8a04-b5a4b69815a3","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T01:52:48.041740Z","strongest_claim":"Fast-WAM remains competitive with imagine-then-execute variants, while removing video co-training causes a much larger performance drop. It achieves competitive results with state-of-the-art methods on simulation benchmarks (LIBERO and RoboTwin) and real-world tasks, without embodied pretraining, running in real time with 190ms latency.","one_line_summary":"Fast-WAM shows that explicit future imagination at test time is not required for strong WAM performance; video modeling during training provides the main benefit.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the proposed Fast-WAM variants successfully disentangle the contribution of video modeling during training from explicit future generation at inference, allowing a controlled comparison of the two factors.","pith_extraction_headline":"World Action Models achieve competitive performance without generating future observations at test time."},"references":{"count":41,"sample":[{"doi":"","year":2025,"title":"mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs","work_id":"cd5b191a-8f67-43d3-8816-0ade1b9a7c29","ref_index":1,"cited_arxiv_id":"2512.15692","is_internal_anchor":true},{"doi":"","year":2025,"title":"Video Generators are Robot Policies","work_id":"0941f8fe-e893-4e3a-9ac6-0a72f26340e9","ref_index":2,"cited_arxiv_id":"2508.00795","is_internal_anchor":true},{"doi":"","year":2026,"title":"Causal World Modeling for Robot Control","work_id":"a33c4ee0-db06-4f9a-8852-c62e3a72fc27","ref_index":3,"cited_arxiv_id":"2601.21998","is_internal_anchor":true},{"doi":"","year":null,"title":"World action models are zero-shot policies","work_id":"b3e862e1-b68b-436d-9c90-ea27dcf20d53","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"World Action Models are Zero-shot Policies","work_id":"9a85fc69-74df-450e-94cd-69d186e9e830","ref_index":5,"cited_arxiv_id":"2602.15922","is_internal_anchor":true}],"resolved_work":41,"snapshot_sha256":"159210ec1cedf7fbd17cca25da2665dbbc88a218df79827812091a7a2104b6cd","internal_anchors":30},"formal_canon":{"evidence_count":2,"snapshot_sha256":"78553b27229f55e16edce3c61a9795852fe7d7db08a367f300e02e4942f3a96a"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2603.16666","created_at":"2026-05-18T02:45:49.534009+00:00"},{"alias_kind":"arxiv_version","alias_value":"2603.16666v2","created_at":"2026-05-18T02:45:49.534009+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2603.16666","created_at":"2026-05-18T02:45:49.534009+00:00"},{"alias_kind":"pith_short_12","alias_value":"SSGN4R6LIKKY","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"SSGN4R6LIKKYFLZT","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"SSGN4R6L","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":33,"internal_anchor_count":33,"sample":[{"citing_arxiv_id":"2605.15153","citing_title":"Pelican-Unify 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22183","citing_title":"Action with Visual Primitives","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20752","citing_title":"GaussianDream: A Feed-Forward 3D Gaussian World Model for Robotic Manipulation","ref_index":36,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15705","citing_title":"Feedback World Model Enables Precise Guidance of Diffusion Policy","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17912","citing_title":"WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19319","citing_title":"SWEET: Sparse World Modeling with Image Editing for Embodied Task Execution","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19594","citing_title":"MCNav: Memory-Aware Dynamic Cognitive Map for Zero-shot Goal-oriented Navigation","ref_index":45,"is_internal_anchor":true},{"citing_arxiv_id":"2601.07060","citing_title":"PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation","ref_index":134,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15153","citing_title":"Pelican-Unify 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13548","citing_title":"AttenA+: Rectifying Action Inequality in Robotic Foundation Models","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2603.28489","citing_title":"Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms","ref_index":223,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12038","citing_title":"OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation","ref_index":40,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12090","citing_title":"World Action Models: The Next Frontier in Embodied AI","ref_index":110,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12167","citing_title":"From Imagined Futures to Executable Actions: Mixture of Latent Actions for Robot Manipulation","ref_index":53,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11550","citing_title":"The DAWN of World-Action Interactive Models","ref_index":57,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27792","citing_title":"MotuBrain: An Advanced World Action Model for Robot Control","ref_index":37,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27711","citing_title":"ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26694","citing_title":"Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06222","citing_title":"When to Trust Imagination: Adaptive Action Execution for World Action Models","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2604.25859","citing_title":"Privileged Foresight Distillation: Zero-Cost Future Correction for World Action Models","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26694","citing_title":"Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06481","citing_title":"OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation","ref_index":91,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06247","citing_title":"CKT-WAM: Parameter-Efficient Context Knowledge Transfer Between World Action Models","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06222","citing_title":"When to Trust Imagination: Adaptive Action Execution for World Action Models","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2605.00080","citing_title":"World Model for Robot Learning: A Comprehensive Survey","ref_index":63,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/SSGN4R6LIKKYFLZTTJP2ICRFMD","json":"https://pith.science/pith/SSGN4R6LIKKYFLZTTJP2ICRFMD.json","graph_json":"https://pith.science/api/pith-number/SSGN4R6LIKKYFLZTTJP2ICRFMD/graph.json","events_json":"https://pith.science/api/pith-number/SSGN4R6LIKKYFLZTTJP2ICRFMD/events.json","paper":"https://pith.science/paper/SSGN4R6L"},"agent_actions":{"view_html":"https://pith.science/pith/SSGN4R6LIKKYFLZTTJP2ICRFMD","download_json":"https://pith.science/pith/SSGN4R6LIKKYFLZTTJP2ICRFMD.json","view_paper":"https://pith.science/paper/SSGN4R6L","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2603.16666&json=true","fetch_graph":"https://pith.science/api/pith-number/SSGN4R6LIKKYFLZTTJP2ICRFMD/graph.json","fetch_events":"https://pith.science/api/pith-number/SSGN4R6LIKKYFLZTTJP2ICRFMD/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/SSGN4R6LIKKYFLZTTJP2ICRFMD/action/timestamp_anchor","attest_storage":"https://pith.science/pith/SSGN4R6LIKKYFLZTTJP2ICRFMD/action/storage_attestation","attest_author":"https://pith.science/pith/SSGN4R6LIKKYFLZTTJP2ICRFMD/action/author_attestation","sign_citation":"https://pith.science/pith/SSGN4R6LIKKYFLZTTJP2ICRFMD/action/citation_signature","submit_replication":"https://pith.science/pith/SSGN4R6LIKKYFLZTTJP2ICRFMD/action/replication_record"}},"created_at":"2026-05-18T02:45:49.534009+00:00","updated_at":"2026-05-18T02:45:49.534009+00:00"}