How far is video generation from world model: A physical law perspective

2024 · arXiv 2411.02385

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · baseline: 1

citation-polarity summary

background: 1 · baseline: 1

representative citing papers

TeDiO: Temporal Diagonal Optimization for Training-Free Coherent Video Diffusion

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

TeDiO regularizes temporal diagonals in diffusion transformer attention maps to produce smoother video motion while keeping per-frame quality intact.

Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

cs.CV · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.

DreamGen: Unlocking Generalization in Robot Learning through Video World Models

cs.RO · 2025-05-19 · unverdicted · novelty 7.0

DreamGen trains robot policies on synthetic trajectories from adapted video world models, enabling a humanoid robot to perform 22 new behaviors in seen and unseen environments from a single pick-and-place teleoperation dataset.

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

The paper presents WorldReasonBench, a benchmark that tests video generators on maintaining physical, social, logical, and informational consistency when predicting future states from initial conditions and actions.

ACWM-Phys: Investigating Generalized Physical Interaction in Action-Conditioned Video World Models

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

ACWM-Phys benchmark shows action-conditioned world models generalize on simple geometric interactions but drop sharply on deformable contacts, high-dimensional control, and complex articulated motion, indicating reliance on visual appearance over learned physics.

Learning to Theorize the World from Observation

cs.LG · 2026-05-05 · unverdicted · novelty 6.0

NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.

TRAP: Tail-aware Ranking Attack for World-Model Planning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

TRAP is a tail-aware ranking attack that plants a backdoor in world models so that a trigger causes the model to reorder a few critical imagined trajectories and redirect planning while preserving normal behavior on clean inputs.

Alice v1: Distillation-Enhanced Video Generation Surpassing Closed-Source Models

cs.GR · 2026-04-27 · unverdicted · novelty 6.0

Alice v1 is an open video model that surpasses its teacher and closed-source systems like Veo3 and Sora2 in quality while running 7x faster through specialized distillation.

SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

SceneScribe-1M is a new dataset of 1 million videos with semantic text, camera parameters, dense depth, and consistent 3D point tracks to support monocular depth estimation, scene reconstruction, point tracking, and text-to-video synthesis.

Video models are zero-shot learners and reasoners

cs.LG · 2025-09-24 · unverdicted · novelty 6.0

Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.

Cosmos World Foundation Model Platform for Physical AI

cs.CV · 2025-01-07 · unverdicted · novelty 3.0

The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.

citing papers explorer

Showing 11 of 11 citing papers.

TeDiO: Temporal Diagonal Optimization for Training-Free Coherent Video Diffusion
cs.CV · 2026-05-13 · unverdicted · none · ref 23
TeDiO regularizes temporal diagonals in diffusion transformer attention maps to produce smoother video motion while keeping per-frame quality intact.

Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement
cs.CV · 2026-05-07 · unverdicted · none · ref 12 · 2 links
NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.

DreamGen: Unlocking Generalization in Robot Learning through Video World Models
cs.RO · 2025-05-19 · unverdicted · none · ref 25
DreamGen trains robot policies on synthetic trajectories from adapted video world models, enabling a humanoid robot to perform 22 new behaviors in seen and unseen environments from a single pick-and-place teleoperation dataset.

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors
cs.CV · 2026-05-11 · unverdicted · none · ref 8
The paper presents WorldReasonBench, a benchmark that tests video generators on maintaining physical, social, logical, and informational consistency when predicting future states from initial conditions and actions.

ACWM-Phys: Investigating Generalized Physical Interaction in Action-Conditioned Video World Models
cs.CV · 2026-05-09 · unverdicted · none · ref 13
ACWM-Phys benchmark shows action-conditioned world models generalize on simple geometric interactions but drop sharply on deformable contacts, high-dimensional control, and complex articulated motion, indicating reliance on visual appearance over learned physics.

Learning to Theorize the World from Observation
cs.LG · 2026-05-05 · unverdicted · none · ref 8
NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.

TRAP: Tail-aware Ranking Attack for World-Model Planning
cs.LG · 2026-05-03 · unverdicted · none · ref 27
TRAP is a tail-aware ranking attack that plants a backdoor in world models so that a trigger causes the model to reorder a few critical imagined trajectories and redirect planning while preserving normal behavior on clean inputs.

Alice v1: Distillation-Enhanced Video Generation Surpassing Closed-Source Models
cs.GR · 2026-04-27 · unverdicted · none · ref 34
Alice v1 is an open video model that surpasses its teacher and closed-source systems like Veo3 and Sora2 in quality while running 7x faster through specialized distillation.

SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations
cs.CV · 2026-04-09 · unverdicted · none · ref 24
SceneScribe-1M is a new dataset of 1 million videos with semantic text, camera parameters, dense depth, and consistent 3D point tracks to support monocular depth estimation, scene reconstruction, point tracking, and text-to-video synthesis.

Video models are zero-shot learners and reasoners
cs.LG · 2025-09-24 · unverdicted · none · ref 46
Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.

Cosmos World Foundation Model Platform for Physical AI
cs.CV · 2025-01-07 · unverdicted · none · ref 88
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.
