PhysEditWorld is a new dataset of over 60 million frames from 12 UE5 cinematic scenes with synchronized multimodal signals and explicit gravity labels, built via replay to support physics-editable world models.
Title resolution pending
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 8verdicts
UNVERDICTED 8polarities
background 2representative citing papers
DeformGen uses dynamics-based state expansion via localized disturbances and deformation-field warping for trajectory transfer to improve policy learning on deformable manipulation benchmarks.
Mask World Model predicts semantic mask dynamics with video diffusion and integrates it with a diffusion policy head, outperforming RGB world models on LIBERO and RLBench while showing better real-world generalization and texture robustness.
AnnotateAnything converts passive 3D assets into manipulation-ready assets by combining vision-language reasoning for semantics with parallel physics pipelines for executable action annotations such as grasps and articulations.
TempoVLA learns a single VLA policy with controllable execution speed via variable-speed trajectory augmentation and explicit speed conditioning.
RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
MagicSim is a unified embodied interaction infrastructure built on a deterministic batched runtime and shared MDP that supports diverse world construction, execution, task evaluation, automatic rollout generation, and interactive agent interfaces.
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
citing papers explorer
-
PhysEditWorld: A Large-Scale Dataset Toward Physics-Editable World Models
PhysEditWorld is a new dataset of over 60 million frames from 12 UE5 cinematic scenes with synchronized multimodal signals and explicit gravity labels, built via replay to support physics-editable world models.
-
DeformGen: Dynamics-Based Topology Augmentation for Deformable Manipulation Policy Learning
DeformGen uses dynamics-based state expansion via localized disturbances and deformation-field warping for trajectory transfer to improve policy learning on deformable manipulation benchmarks.
-
Mask World Model: Predicting What Matters for Robust Robot Policy Learning
Mask World Model predicts semantic mask dynamics with video diffusion and integrates it with a diffusion policy head, outperforming RGB world models on LIBERO and RLBench while showing better real-world generalization and texture robustness.
-
AnnotateAnything: Automatic Annotation of 3D Assets for Robot Manipulation
AnnotateAnything converts passive 3D assets into manipulation-ready assets by combining vision-language reasoning for semantics with parallel physics pipelines for executable action annotations such as grasps and articulations.
-
TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies
TempoVLA learns a single VLA policy with controllable execution speed via variable-speed trajectory augmentation and explicit speed conditioning.
-
RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains
RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
-
MagicSim: A Unified Infrastructure for Executable Embodied Interaction
MagicSim is a unified embodied interaction infrastructure built on a deterministic batched runtime and shared MDP that supports diverse world construction, execution, task evaluation, automatic rollout generation, and interactive agent interfaces.
-
World Action Models: The Next Frontier in Embodied AI
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.