UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.
Prism: A unified framework for photorealistic reconstruction and intrinsic scene modeling, 2025.https://arxiv.org/abs/2504.14219
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemphasizing perceptual quality.
citing papers explorer
-
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.
-
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemphasizing perceptual quality.