CRONOS benchmark shows recent open-source video generators fail to preserve physical consistency under controlled changes to viewpoint, scene, object category, and appearance.
T 2 V P hys B ench: A first-principles benchmark for physical consistency in text-to-video generation
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 4polarities
background 4representative citing papers
PhyGround is a new benchmark with curated prompts, a 13-law taxonomy, large-scale human annotations, and an open physics-specialized VLM judge for evaluating physical reasoning in generative video models.
PH-Dreamer integrates a port-Hamiltonian framework into generative world models to enforce physical priors, yielding tighter imagined-real reward alignment and reduced latent space volume on visual control benchmarks.
RIGVid shows that filtered AI-generated videos can serve as effective supervision for complex robotic manipulation tasks without any real demonstrations.
Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.
citing papers explorer
-
CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models
CRONOS benchmark shows recent open-source video generators fail to preserve physical consistency under controlled changes to viewpoint, scene, object category, and appearance.
-
PhyGround: Benchmarking Physical Reasoning in Generative World Models
PhyGround is a new benchmark with curated prompts, a 13-law taxonomy, large-scale human annotations, and an open physics-specialized VLM judge for evaluating physical reasoning in generative video models.
-
PH-Dreamer: A Physics-Driven World Model via Port-Hamiltonian Generative Dynamics
PH-Dreamer integrates a port-Hamiltonian framework into generative world models to enforce physical priors, yielding tighter imagined-real reward alignment and reduced latent space volume on visual control benchmarks.
-
Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
RIGVid shows that filtered AI-generated videos can serve as effective supervision for complex robotic manipulation tasks without any real demonstrations.
-
World Simulation with Video Foundation Models for Physical AI
Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.
- Do Joint Audio-Video Generation Models Understand Physics?