EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields

· 2026 · cs.CV · arXiv 2605.06192

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

Pretrained video diffusion models provide powerful spatiotemporal generative priors, making them a natural foundation for robotic world models. While recent world-action models jointly optimize future videos and actions, they predominantly treat video generation as an auxiliary representation for policy learning. Consequently, they insufficiently explore the inverse problem: leveraging action signals to guide video synthesis, thereby often failing to preserve precise robot spatial geometry and fine-grained robot-object interaction dynamics in the generated rollouts. To bridge this gap, we present EA-WM, an Event-Aware Generative World Model that effectively closes the loop between kinematic control and visual perception. Rather than injecting joint or end-effector actions as abstract, low-dimensional tokens, EA-WM projects actions and kinematic states directly into the target camera view as Structured Kinematic-to-Visual Action Fields. To fully exploit this geometrically grounded representation, we introduce event-aware bidirectional fusion blocks that modulate cross-branch attention, capturing object state changes and interaction dynamics. Evaluated on the comprehensive WorldArena benchmark, EA-WM achieves state-of-the-art performance, outperforming existing baselines by a significant margin.

representative citing papers

How Should World Models Be Evaluated for Embodied Decision-Making? A Decision-Making-Centric Position

cs.LG · 2026-06-13 · unverdicted · novelty 5.0

The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.

citing papers explorer

Showing 1 of 1 citing paper.

How Should World Models Be Evaluated for Embodied Decision-Making? A Decision-Making-Centric Position cs.LG · 2026-06-13 · unverdicted · none · ref 68 · internal anchor
The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.

EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields

fields

years

verdicts

representative citing papers

citing papers explorer