MCSC-Bench is the first large-scale dataset for the Multimodal Context-to-Script Creation task, requiring models to select relevant shots from redundant materials, plan missing shots, and generate coherent scripts with voiceovers.
Animate-a-story: Storytelling with retrieval- augmented video generation
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.
VBench-2.0 is a benchmark suite that automatically evaluates video generative models on five dimensions of intrinsic faithfulness: Human Fidelity, Controllability, Creativity, Physics, and Commonsense using VLMs, LLMs, and anomaly detection methods.
Sa2VA unifies SAM-2 segmentation with MLLM reasoning into a single model for referring segmentation and conversation on images and videos, supported by a new 72k-expression Ref-SAV dataset.
A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.
The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.
citing papers explorer
-
MCSC-Bench: Multimodal Context-to-Script Creation for Realistic Video Production
MCSC-Bench is the first large-scale dataset for the Multimodal Context-to-Script Creation task, requiring models to select relevant shots from redundant materials, plan missing shots, and generate coherent scripts with voiceovers.
-
Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms
Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.
-
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
VBench-2.0 is a benchmark suite that automatically evaluates video generative models on five dimensions of intrinsic faithfulness: Human Fidelity, Controllability, Creativity, Physics, and Commonsense using VLMs, LLMs, and anomaly detection methods.
-
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Sa2VA unifies SAM-2 segmentation with MLLM reasoning into a single model for referring segmentation and conversation on images and videos, supported by a new 72k-expression Ref-SAV dataset.
-
Retrieval-Augmented Generation for AI-Generated Content: A Survey
A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.
-
Evolution of Video Generative Foundations
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.
-
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.