PhysEditWorld is a new dataset of over 60 million frames from 12 UE5 cinematic scenes with synchronized multimodal signals and explicit gravity labels, built via replay to support physics-editable world models.
CausalVQA: A physically grounded causal reasoning benchmark for video models
11 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (11)
Roles: background (1)
Polarities: background (1)
Representative citing papers
YoCausal benchmark shows video diffusion models detect the arrow of time but lack genuine causal understanding relative to humans.
What-If World is a new paired-prompt benchmark showing that nine state-of-the-art video generation models score at most 52% on causal intervention tests, with open-source systems clustering near 28%.
CRONOS benchmark shows recent open-source video generators fail to preserve physical consistency under controlled changes to viewpoint, scene, object category, and appearance.
CaST-Bench creates a benchmark with causal-chain annotations and novel metrics showing that current VLMs struggle to construct precise grounded causal chains in video QA.
Act2See trains VLMs via supervised fine-tuning on verified reasoning traces to interleave active frame calls within text CoTs, yielding SOTA results on video reasoning benchmarks.
SCP defines a new benchmark task for predicting spatial causal outcomes beyond direct observation and shows that 23 leading models lag far behind humans on it.
Introduces the CausalPhys benchmark with causal graphs, along with CRFT fine-tuning, to improve VLMs' causal physical reasoning accuracy and interpretability.
Cosmos 3 presents a unified omnimodal world model family based on mixture-of-transformers that processes language, vision, audio, and action for Physical AI applications.
The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.