MV-S2V: Multi-View Subject-Consistent Video Generation

2026 · cs.CV · arXiv 2601.17756

1 Pith paper cites this work. Polarity classification is still being indexed.


abstract

Existing Subject-to-Video Generation (S2V) methods have achieved high-fidelity and subject-consistent video generation, yet remain constrained to single-view subject references. This limitation renders the S2V task reducible to an S2I + I2V pipeline, failing to exploit the full potential of video subject control. In this work, we propose and address the challenging Multi-View S2V (MV-S2V) task, which synthesizes videos from multiple reference views to enforce 3D-level subject consistency. Regarding the scarcity of training data, we first develop a synthetic data curation pipeline to generate highly customized synthetic data, complemented by a small-scale real-world captured dataset to boost the training of MV-S2V. Another key issue lies in the potential confusion between cross-subject and cross-view references in conditional generation. To overcome this, we further introduce Temporally Shifted RoPE (TS-RoPE) to distinguish between different subjects and distinct views of the same subject in reference conditioning. Our framework achieves superior 3D subject consistency w.r.t. multi-view reference images and high-quality visual outputs, establishing a new meaningful direction for subject-driven video generation. Code and data are available at: https://szy-young.github.io/mv-s2v
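The abstract does not say how TS-RoPE assigns positions, so the following is only an illustrative sketch of the general idea of separating references along the temporal axis of a rotary position embedding. The function `reference_time_indices`, its parameters `view_gap` and `subject_gap`, and the specific offsets are hypothetical and are not taken from the paper. Only `rope_rotate` follows the standard, well-known RoPE formulation.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Apply standard 1D rotary position embedding to an even-length vector."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        # Pair (i, i+1) rotates by an angle proportional to the position.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def reference_time_indices(num_video_frames, views_per_subject,
                           view_gap=1, subject_gap=8):
    """Hypothetical temporal index assignment for reference images.

    ASSUMPTION (not from the paper): video frames occupy indices
    0..num_video_frames-1, views of one subject are placed close together
    (view_gap apart), and different subjects are pushed further apart
    (subject_gap) so attention can tell the two cases apart.
    """
    indices = []
    next_pos = num_video_frames  # first slot after the video frames
    for n_views in views_per_subject:
        if n_views < 1:
            raise ValueError("each subject needs at least one view")
        views = [next_pos + v * view_gap for v in range(n_views)]
        indices.append(views)
        next_pos = views[-1] + subject_gap
    return indices


# Two subjects: three views of the first, two views of the second.
layout = reference_time_indices(16, [3, 2])
print(layout)  # [[16, 17, 18], [26, 27]]
```

Because RoPE attention scores depend only on the difference between positions, a layout like this would make same-subject views appear at small relative offsets and different subjects at large ones. Whether the actual method uses this spacing, a different shift, or learned offsets is not stated in the abstract.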

representative citing papers

HarmoView: Harmonizing Multi-View Constraints for Identity-Consistent Video Generation

cs.CV · 2026-06-09 · unverdicted · novelty 5.0

HarmoView proposes Multi-level Feature Injection, learnable proxy tokens, Jump-RoPE, and Progressive View Curriculum plus a new multi-view dataset to achieve state-of-the-art identity-consistent video generation from multi-view inputs.

citing papers explorer

Showing 1 of 1 citing paper.

HarmoView: Harmonizing Multi-View Constraints for Identity-Consistent Video Generation

cs.CV · 2026-06-09 · unverdicted · none · ref 14 · internal anchor
