NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.
Mixed citations
arXiv preprint arXiv:2601.05230 (2026)
Mixed citation behavior. Most common role is background (60%).
citation-role summary
citation-polarity summary
years
2026 9verdicts
UNVERDICTED 9roles
background 5representative citing papers
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.
DiLA uses content-structure disentanglement driven by predictive bottlenecks to create semantically structured latent actions for high-fidelity video world models.
SCAR proposes a joint inverse-forward dynamics framework to learn transferable continuous action representations across embodiments from visual data using regularization and adversarial invariance.
Extending linear LAMs to model exogenous state shows standard reconstruction encodes future exogenous info in latent actions, while endogenous-focused spaces and auxiliary objectives like action-supervision enforce consistency across noise.
GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
PhyWorld improves temporal consistency and physical plausibility in video world models via flow matching fine-tuning followed by DPO on physics preference pairs, with reported gains on VBench and a custom physical-faithfulness benchmark.
citing papers explorer
-
Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement
NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.
-
Latent State Design for World Models under Sufficiency Constraints
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
-
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.
-
DiLA: Disentangled Latent Action World Models
DiLA uses content-structure disentanglement driven by predictive bottlenecks to create semantically structured latent actions for high-fidelity video world models.
-
SCAR: Self-Supervised Continuous Action Representation Learning
SCAR proposes a joint inverse-forward dynamics framework to learn transferable continuous action representations across embodiments from visual data using regularization and adversarial invariance.
-
Why Latent Actions Fail, and How to Prevent It
Extending linear LAMs to model exogenous state shows standard reconstruction encodes future exogenous info in latent actions, while endogenous-focused spaces and auxiliary objectives like action-supervision enforce consistency across noise.
-
GazeVLA: Learning Human Intention for Robotic Manipulation
GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.
-
Human Cognition in Machines: A Unified Perspective of World Models
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
-
PhyWorld: Physics-Faithful World Model for Video Generation
PhyWorld improves temporal consistency and physical plausibility in video world models via flow matching fine-tuning followed by DPO on physics preference pairs, with reported gains on VBench and a custom physical-faithfulness benchmark.