MuSS is a new movie-sourced dataset and benchmark that enables AI models to generate multi-shot videos with improved narrative coherence and subject identity preservation.
Story-adapter: A training-free iterative framework for long story visualization
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 5roles
background 3polarities
background 3representative citing papers
A new data pipeline using real photos, entity removal, and image-to-video models plus a cross-view attention loss enables text-driven generation of actors in reference scenes with improved alignment.
GenHSI is a training-free three-stage pipeline that turns a scene image, character image, and complex HSI prompt into long videos with plausible chained interactions by generating atomic actions, 3D keyframes via 2D inpainting plus optimization, and then feeding them to pre-trained video diffusion.
DreamShot uses video diffusion priors and a role-attention consistency loss to produce coherent, personalized storyboards with better character and scene continuity than text-to-image methods.
StoryBlender generates inter-shot consistent editable 3D storyboards using a three-stage pipeline of semantic-spatial grounding, canonical asset materialization, and spatial-temporal dynamics with agent-based verification.
citing papers explorer
-
MuSS: A Large-Scale Dataset and Cinematic Narrative Benchmark for Multi-Shot Subject-to-Video Generation
MuSS is a new movie-sourced dataset and benchmark that enables AI models to generate multi-shot videos with improved narrative coherence and subject identity preservation.
-
DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior
DreamShot uses video diffusion priors and a role-attention consistency loss to produce coherent, personalized storyboards with better character and scene continuity than text-to-image methods.
-
StoryBlender: Inter-Shot Consistent and Editable 3D Storyboard with Spatial-temporal Dynamics
StoryBlender generates inter-shot consistent editable 3D storyboards using a three-stage pipeline of semantic-spatial grounding, canonical asset materialization, and spatial-temporal dynamics with agent-based verification.