Say, dream, and act: Learning video world models for instruction-driven robot manipulation

Songen Gu, Yunuo Cai, Tianyu Wang, Simo Wu, Yanwei Fu · 2026 · arXiv 2602.10717

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

VT-WAM: Visual-Tactile World Action Model for Contact-Rich Manipulation

cs.RO · 2026-07-02 · unverdicted · novelty 6.0

VT-WAM jointly predicts visual futures, tactile deformation, and actions via flow matching with Asymmetric MoT attention and contact-gated AVTAG, reporting 71.67% success on six real-world contact-rich tasks.

Beyond Task Success: Behavioral and Representational Diagnostics for WAM and VLA

cs.RO · 2026-05-31 · unverdicted · novelty 6.0

Empirical study introduces behavioral and representational diagnostics showing architecture-dependent gains in object targeting and predictive structure for WAMs over VLAs on LIBERO and RoboTwin2.0.

World Pilot: Steering Vision-Language-Action Models with World-Action Priors

cs.RO · 2026-06-10 · unverdicted · novelty 5.0

World Pilot augments VLA policies with world-action priors through latent and action steering pathways, reporting 84.7% success on LIBERO-Plus zero-shot OOD and top real-robot results across four tasks.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

World Action Models: A Survey

cs.RO · 2026-06-18 · unverdicted · novelty 3.0

A survey that clarifies boundaries and organizes World Action Models by generation requirements and predictive substrates, identifying a trend toward generating less of the future.

World Model for Robot Learning: A Comprehensive Survey

cs.RO · 2026-04-30 · unverdicted · novelty 3.0

A comprehensive survey that organizes the literature on world models in robot learning, their roles in policy learning, planning, simulation, and video-based generation, with connections to navigation, driving, datasets, and benchmarks.

citing papers explorer

VT-WAM: Visual-Tactile World Action Model for Contact-Rich Manipulation
cs.RO · 2026-07-02 · unverdicted · none · ref 26
Beyond Task Success: Behavioral and Representational Diagnostics for WAM and VLA
cs.RO · 2026-05-31 · unverdicted · none · ref 8
World Pilot: Steering Vision-Language-Action Models with World-Action Priors
cs.RO · 2026-06-10 · unverdicted · none · ref 8
World Action Models: The Next Frontier in Embodied AI
cs.RO · 2026-05-12 · unverdicted · none · ref 12
World Action Models: A Survey
cs.RO · 2026-06-18 · unverdicted · none · ref 48
World Model for Robot Learning: A Comprehensive Survey
cs.RO · 2026-04-30 · unverdicted · none · ref 20
