The paper unifies emerging graph-based world models under a new paradigm and proposes a taxonomy organized by spatial, physical, and logical relational inductive biases.
arXiv preprint arXiv:2403.05131 (2024)
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
citing papers explorer
- Graph World Models: Concepts, Taxonomy, and Future Directions
  The paper unifies emerging graph-based world models under a new paradigm and proposes a taxonomy organized by spatial, physical, and logical relational inductive biases.
- Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos
  MIGA introduces two-stage alignment to close train-inference gaps and dual consistency enhancement via self-reflection and long-range guidance to achieve SOTA temporal consistency in infinite-frame video generation on VBench and NarrLV.
- PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models
  A new dataset and fine-tuned VLM detector/explainer called PhyDetEx shows that current T2V models still struggle to generate videos that obey physical laws, with open-source models performing worse.
- We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback
  NeuS-E is a post-generation refinement method that uses neuro-symbolic analysis of a formal video representation to detect and correct semantic and temporal inconsistencies in text-to-video outputs, improving prompt alignment by nearly 40%.
- Generative AI for Video Trailer Synthesis: From Extractive Heuristics to Autoregressive Creativity
  The paper surveys the evolution of video trailer generation from extractive heuristics to generative AI methods and proposes a new taxonomy for future systems based on autoregressive and foundation models.